Fix lexer skipping tokens when heredoc body is unterminated #3918

Earlopain · 2026-02-11T12:09:56Z

When we hit EOF and still have lex modes left, it means some content was unterminated.
Heredocs specifically have logic that needs to run once the body has finished lexing.
If we don't reset the mode back to what it was before, the lexer will not continue lexing at the correct place.
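
To illustrate the idea, here is a toy model in Ruby. It is not prism's implementation (the real lexer is C and re-enters lexing in the outer mode after each pop); `ToyLexer` and the mode names are hypothetical and only show that every unterminated mode is popped before EOF is emitted.

```ruby
# Toy model only: a lexer that tracks nested modes on a stack.
# ToyLexer, :default, :string and :embexpr are hypothetical names,
# chosen for illustration; they are not prism identifiers.
class ToyLexer
  def initialize(open_modes)
    # :default is always at the bottom of the stack.
    @modes = [:default] + open_modes
  end

  # Called once the input is exhausted. Every mode that is still open
  # was unterminated, so pop it before emitting EOF.
  def finish
    events = []
    while @modes.last != :default
      events << [:pop, @modes.pop]
    end
    events << [:emit, :EOF]
    events
  end
end

# An unterminated string with an unterminated interpolation inside it.
events = ToyLexer.new([:string, :embexpr]).finish
```

In this sketch the innermost mode is popped first, and EOF is only emitted once the stack is back at the default mode.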

Also skips, in the ripper translator, the heredoc_end tokens that prism emits. They are for bookkeeping only, and ripper does not emit them.

Prism inserts these to make bookkeeping easier. Ripper does not do so.

Earlopain · 2026-02-11T12:10:31Z

test/prism/lex_test.rb

+        STRING_BEGIN(1,0)-(1,1)("\"")
+        EMBEXPR_BEGIN(1,1)-(1,3)("\#{")
+        CONSTANT(1,3)-(1,4)("C")
+        EOF(1,4)-(1,4)("")


The fact that this output contains EOF twice is preexisting behavior and is not caused by this change.

Earlopain · 2026-02-11T12:11:15Z

test/prism/lex_test.rb

-        prism = Prism.lex_compat(File.read(__FILE__), version: "current").value
-        ripper = Ripper.lex(File.read(__FILE__))
+      def test_lex_compat
+        source = "foo bar"


I added heredocs below for which ripper doesn't track the state correctly, so I just have the test lex some other source instead of this file.

Earlopain · 2026-02-11T12:12:50Z

src/prism.c

+                // Only when no mode is remaining will we actually emit the EOF token.
+                if (parser->lex_modes.current->mode != PM_LEX_DEFAULT) {
+                    lex_mode_pop(parser);
+                    goto switch_lex_mode;


I'm unsure about this, but it was the easiest way to accomplish it. There are already gotos in the function (lex_next_token), so maybe it is fine?

Earlopain · 2026-02-11T12:15:20Z

test/prism/lex_test.rb

+    end
+
+    def assert_lexed(code, expected)
+      actual = Prism.lex(code).value.map { |token| token[0].pretty_inspect }


I don't think there will be many tests of this sort, so for now writing them like this seems fine. I added some error tests, but not everything will be visible there (like the double EOF). The error test now highlights the + after the heredoc identifier, which it didn't do before, so if you think that is enough I can remove this.

kddnewton · 2026-02-11T15:45:19Z

src/prism.c

+                // Only when no mode is remaining will we actually emit the EOF token.
+                if (parser->lex_modes.current->mode != PM_LEX_DEFAULT) {
+                    lex_mode_pop(parser);
+                    goto switch_lex_mode;


Could this just be return parser_lex?

kddnewton · 2026-02-11T15:46:06Z

test/prism/lex_test.rb

+      assert_lexed(<<~'RUBY'.strip, <<~'LEXED')
+        "#{
+      RUBY
+        STRING_BEGIN(1,0)-(1,1)("\"")


I really don't like these kinds of snapshot tests because they're usually very difficult to update, and it's not clear what exactly is being tested. Could you instead lex the content, get the tokens, and make assertions against the specific behavior you're looking for?
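
A minimal sketch of what such targeted assertions could look like, assuming the `prism` gem is available. `lexed_types` is a hypothetical helper, not something from this PR; it only relies on `Prism.lex` returning `[token, state]` pairs and on `Prism::Token#type`.

```ruby
require "prism"

# Hypothetical helper (not part of this PR): reduce the lexer output
# to a list of token types so a test can assert on specific tokens.
def lexed_types(source)
  Prism.lex(source).value.map { |token, _state| token.type }
end

# Unterminated string with an unterminated interpolation.
types = lexed_types('"#{C')

# In a test these checks would be assertions against the behavior
# under test, for example:
#   assert_equal :STRING_BEGIN, types.first
#   assert_includes types, :CONSTANT
#   assert_equal :EOF, types.last
```

Assertions like these name the exact token that used to be skipped, so a failure points at the behavior rather than at a diff of the whole token dump.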

Earlopain added 2 commits February 11, 2026 13:05

Skip missing heredoc end in ripper translator

4b5ca09

Prism inserts these to make bookkeeping easier. Ripper does not do so.

kddnewton requested changes Feb 11, 2026
