Increment cursor to breakpoint + 2 #3290

eileencodes · 2024-12-05T21:21:48Z

When there is a \r\n in the following example: p eval "%\n1\r\n \n" Prism was returning "1\n " whereas parse.y returns "1". In this case the cursor needs to be set to the breakpoint (which is \r) plus 2 to ignore the new line and the space.

I'm having trouble figuring out how to test this but locally here is the before and after output:

Parse.y

$ ./miniruby --parser=parse.y test.rb
"1"

Prism Before:

$ ./miniruby --parser=prism test.rb
"1\n "

Prism after:

$ ./miniruby --parser=prism test.rb
"1"

Fixed example 2 from #3230

When there is a `\r\n` in the following example: `p eval "%\n1\r\n \n"` Prism was returning `"1\n "` whereas parse.y returns `"1"`. In this case the cursor needs to be set to the breakpoint (which is `\r`) plus 2 to ignore the new line and the space. I'm having trouble figuring out how to test this but locally here is the before and after output: Parse.y ``` $ ./miniruby --parser=parse.y test.rb "1" ``` Prism Before: ``` $ ./miniruby --parser=prism test.rb "1\n " ``` Prism after: ``` $ ./miniruby --parser=prism test.rb "1" ```

kddnewton · 2024-12-05T22:46:57Z

I believe that this fixes it but I'm super nervous about it because the fix doesn't make sense... the breakpoint has already been incremented by 1, so it seems like this would specifically move it forward by 3 bytes.

Could you test out some other combinations and see if you can reproduce? I can merge otherwise, but I think it merits some more time just because it's such a confusing area of the codebase.

eileencodes · 2024-12-05T22:59:33Z

Happy to test more but I'm not sure what else to test, the CRuby parser tests pass locally as do all the Prism tests. It also doesn't break the known cases I came up with more than they are already broken. I also am not super confident in this change other than for the testing I did.

tenderlove · 2024-12-09T21:53:07Z

Is this actually a bug in Prism?

@eightbitraptor and I were taking a look in to this issue. We'd expect both programs below to evaluate the same:

program1 = "%\n1_\r\n\n" # => parse.y: 1_, prism: "1_\n"
program2 = "%'1_\r\n'"   # => "1_\n"

p eval(program1)
p eval(program2)

Both program1 and program2 specify one byte as the string delimiter for %. With Prism, both of these programs output the same value, but on parse.y, they output different values. I'd expect both program1 and program2 to output the same value regardless of which parser was being used (IOW, it seems like parse.y is incorrect).

eightbitraptor · 2024-12-09T22:04:39Z

We raised this upstream as a Bug in parse.y

https://bugs.ruby-lang.org/issues/20938

tenderlove · 2024-12-10T19:23:38Z

I guess this fix isn't complete. The problem is that parse.y will do EOL "unescaping" before figuring out the string boundaries.

So this program %\n1\r\n \n becomes this program %\n1\n \n before string tokenization. The second \n becomes the delimiter for the string, and the space after the second newline is just part of the program itself.

Here is a program to show a more exacerbated issue:

def hi
  "hello"
end

str = "%\n1\r\n;hi\n"
p eval str, binding

With parse.y this will return the string hello because it calls the hi method. Prism returns the string "1\n;hi":

$ ruby --parser=parse.y test.rb
"hello"
$ ruby test.rb
"1\n;hi"

It looks like with this patch we still get the wrong value:

#<Prism::ParseResult:0x000000011c2bfeb8 @value=@ ProgramNode (location: (1,0)-(4,0))
├── flags: ∅
├── locals: []
└── statements:
    @ StatementsNode (location: (1,0)-(4,0))
    ├── flags: ∅
    └── body: (length: 1)
        └── @ StringNode (location: (1,0)-(4,0))
            ├── flags: newline
            ├── opening_loc: (1,0)-(2,0) = "%\n"
            ├── content_loc: (2,0)-(3,3) = "1\r\n;hi"
            ├── closing_loc: (3,3)-(4,0) = "\n"
            └── unescaped: "1hi"

Honestly, I think we should make a special case for % that uses \n as the delimiter, but I need to poke at the code a bit first. It seems like if the delimiter is \n, we should just scan to the first \n and call it a day (I think).

kddnewton · 2024-12-10T19:47:54Z

I don't think we can look for the next \n because we need to handle interpolation. But I think you're right that we need to have a special case. It seems like if (*breakpoint == lex_mode->as.string.terminator) { just falls down.

If I understand the problem correctly, then in order to implement this right we need to make a special case for \n or \r\n, because I think %\r\n foo \n would be accepted as well. So maybe we should set the terminator to be 0x80 or something to indicate that it should accept either kind of newline and then have a whole path through that checks this.

tenderlove · 2024-12-10T19:49:24Z

It seems like if (*breakpoint == lex_mode->as.string.terminator) { just falls down.

Ya, that's what I'm toying with right now. It doesn't seem to remove the \r though.

tenderlove closed this Dec 11, 2024

eileencodes deleted the partially-fix-3230 branch December 13, 2024 18:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Increment cursor to breakpoint + 2 #3290

Increment cursor to breakpoint + 2 #3290

Uh oh!

eileencodes commented Dec 5, 2024 •

edited

Loading

Uh oh!

kddnewton commented Dec 5, 2024

Uh oh!

eileencodes commented Dec 5, 2024

Uh oh!

tenderlove commented Dec 9, 2024

Uh oh!

eightbitraptor commented Dec 9, 2024

Uh oh!

tenderlove commented Dec 10, 2024

Uh oh!

kddnewton commented Dec 10, 2024

Uh oh!

tenderlove commented Dec 10, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Increment cursor to breakpoint + 2 #3290

Increment cursor to breakpoint + 2 #3290

Uh oh!

Conversation

eileencodes commented Dec 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kddnewton commented Dec 5, 2024

Uh oh!

eileencodes commented Dec 5, 2024

Uh oh!

tenderlove commented Dec 9, 2024

Uh oh!

eightbitraptor commented Dec 9, 2024

Uh oh!

tenderlove commented Dec 10, 2024

Uh oh!

kddnewton commented Dec 10, 2024

Uh oh!

tenderlove commented Dec 10, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

eileencodes commented Dec 5, 2024 •

edited

Loading