Skip to content

Conversation

@Earlopain
Copy link
Collaborator

Fixes #3911

When we hit EOF and still have lex modes left, it means some content was unterminated.
Heredocs specifically have logic that needs to happen when the body finished lexing.
If we don't reset the mode back to how it was before, it will not continue lexing at the correct place.

Also skips the heredoc_end tokens that prism emits in the ripper translator. They are for bookkeeping only and ripper doesn't do so.

When we hit EOF and still have lex modes left, it means some content was unterminated.
Heredocs specifically have logic that needs to happen when the body finished lexing.
If we don't reset the mode back to how it was before, it will not continue lexing at the correct place.
Prism inserts these to make bookkeeping easier. Ripper does not do so.
STRING_BEGIN(1,0)-(1,1)("\"")
EMBEXPR_BEGIN(1,1)-(1,3)("\#{")
CONSTANT(1,3)-(1,4)("C")
EOF(1,4)-(1,4)("")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This containing EOF twice is preexisting and not caused by this change

prism = Prism.lex_compat(File.read(__FILE__), version: "current").value
ripper = Ripper.lex(File.read(__FILE__))
def test_lex_compat
source = "foo bar"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added heredocs below where ripper doesn't do the state correctly. So I just have it parse some other source

// Only when no mode is remaining will we actually emit the EOF token.
if (parser->lex_modes.current->mode != PM_LEX_DEFAULT) {
lex_mode_pop(parser);
goto switch_lex_mode;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure about this but it was the easiest way to accomplish it. There are already gotos in the method (lex_next_token) so maybe it is fine?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be return parser_lex?

end

def assert_lexed(code, expected)
actual = Prism.lex(code).value.map { |token| token[0].pretty_inspect }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there will be many tests of this sort, for now writing them like this seems fine. I added some errors tests but not everything will be visible there (like double EOF). The error test highlights + after the heredoc identifer which it didn't do before so if you think that is enough I can remove this

// Only when no mode is remaining will we actually emit the EOF token.
if (parser->lex_modes.current->mode != PM_LEX_DEFAULT) {
lex_mode_pop(parser);
goto switch_lex_mode;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be return parser_lex?

assert_lexed(<<~'RUBY'.strip, <<~'LEXED')
"#{
RUBY
STRING_BEGIN(1,0)-(1,1)("\"")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like these kinds of snapshot tests because they're usually very difficult to update, and it's not clear what exactly is being tested. Could you instead lex the content, get the tokens, and make assertions against the specific behavior you're looking for?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prism.lex_compat creates wrong on_sp token when used with heredoc and unclosed embexpr

2 participants