Skip to content

Conversation

@tenderlove
Copy link
Member

If we hit an EOF token, and the character before the EOF is a newline, we need to rewind the error to the newline so that error messages contain the correct newline.

Before this commit, we would report an error on the "line of the EOF" which would come right after the newline

https://bugs.ruby-lang.org/issues/20918

@kddnewton
Copy link
Collaborator

It looks like this works for this particular case, but there are many other ways to add errors to the list without going through this macro so I think I would like to solve it in another place. I think https://github.com/ruby/ruby/blob/f43585b02c3634ab9a4e54049b08e04ab1a640fd/prism_compile.c#L10242 is the place to fix it, which handles formatting the errors for stdout. That way it's all handled in one place regardless of how the errors are added to the parser struct so we don't have to worry about other entrypoints. I'll see if I can take a look at it this morning.

@tenderlove
Copy link
Member Author

@kddnewton I don't think that's the right place to fix it. The way I understand that function is that it formats the errors for output. But in this case we're reporting the wrong line for the error. Consumers of the Ruby API for example, will get the wrong line number in their objects (if I understand this function correctly)

@tenderlove
Copy link
Member Author

I tried adding a Ruby level test for this bug, but it seems like my fix doesn't do the right thing. 😭

@tenderlove
Copy link
Member Author

Ok, I fixed it. I'm just bad at C.

@tenderlove
Copy link
Member Author

It looks like this works for this particular case, but there are many other ways to add errors to the list without going through this macro so I think I would like to solve it in another place.

If there isn't one entry point when adding an error via token, then it feels like we should make that the case.

I can't imagine knowing to "rewind" without the additional information of "this is the eof of the file!". The token knows that we're at the end of the file (based on the token type), so it seems natural to only do this when there is an EOF token.

@tenderlove
Copy link
Member Author

After thinking about this for a while, I've updated it to a fix I like better. I changed the tokenizer such that when it encounters an EOF, it checks whether or not the previous character was a newline and then changes that token's start position accordingly. This way we don't have to specifically check for an EOF token every time we add an error to the error list.

If we hit an EOF token, and the character before the EOF is a newline,
we should make EOF token start at the previous newline.  That way any
errors reported will occur on that line.

For example "foo(\n" should report an error on line 1 even though the
EOF technically occurs on line 2.

[Bug #20918]
https://bugs.ruby-lang.org/issues/20918
Copy link
Collaborator

@kddnewton kddnewton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great solution!

@tenderlove tenderlove merged commit f2a0fe6 into main Dec 5, 2024
55 of 56 checks passed
@tenderlove tenderlove deleted the eof-tok branch December 5, 2024 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants