-
Notifications
You must be signed in to change notification settings - Fork 175
Fix lexer skipping tokens when heredoc body is unterminated #3918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When we hit EOF and still have lex modes left, it means some content was unterminated. Heredocs specifically have logic that needs to happen when the body finished lexing. If we don't reset the mode back to how it was before, it will not continue lexing at the correct place.
Prism inserts these to make bookkeeping easier. Ripper does not do so.
| STRING_BEGIN(1,0)-(1,1)("\"") | ||
| EMBEXPR_BEGIN(1,1)-(1,3)("\#{") | ||
| CONSTANT(1,3)-(1,4)("C") | ||
| EOF(1,4)-(1,4)("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This containing EOF twice is preexisting and not caused by this change
| prism = Prism.lex_compat(File.read(__FILE__), version: "current").value | ||
| ripper = Ripper.lex(File.read(__FILE__)) | ||
| def test_lex_compat | ||
| source = "foo bar" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added heredocs below where ripper doesn't do the state correctly. So I just have it parse some other source
| // Only when no mode is remaining will we actually emit the EOF token. | ||
| if (parser->lex_modes.current->mode != PM_LEX_DEFAULT) { | ||
| lex_mode_pop(parser); | ||
| goto switch_lex_mode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure about this but it was the easiest way to accomplish it. There are already gotos in the method (lex_next_token) so maybe it is fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just be return parser_lex?
| end | ||
|
|
||
| def assert_lexed(code, expected) | ||
| actual = Prism.lex(code).value.map { |token| token[0].pretty_inspect } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there will be many tests of this sort, for now writing them like this seems fine. I added some errors tests but not everything will be visible there (like double EOF). The error test highlights + after the heredoc identifer which it didn't do before so if you think that is enough I can remove this
| // Only when no mode is remaining will we actually emit the EOF token. | ||
| if (parser->lex_modes.current->mode != PM_LEX_DEFAULT) { | ||
| lex_mode_pop(parser); | ||
| goto switch_lex_mode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just be return parser_lex?
| assert_lexed(<<~'RUBY'.strip, <<~'LEXED') | ||
| "#{ | ||
| RUBY | ||
| STRING_BEGIN(1,0)-(1,1)("\"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like these kinds of snapshot tests because they're usually very difficult to update, and it's not clear what exactly is being tested. Could you instead lex the content, get the tokens, and make assertions against the specific behavior you're looking for?
Fixes #3911
When we hit EOF and still have lex modes left, it means some content was unterminated.
Heredocs specifically have logic that needs to happen when the body finished lexing.
If we don't reset the mode back to how it was before, it will not continue lexing at the correct place.
Also skips the heredoc_end tokens that prism emits in the ripper translator. They are for bookkeeping only and ripper doesn't do so.