Skip to content

Conversation

@eileencodes
Copy link
Member

@eileencodes eileencodes commented Dec 5, 2024

When there is a \r\n in the following example: p eval "%\n1\r\n \n" Prism was returning "1\n " whereas parse.y returns "1". In this case the cursor needs to be set to the breakpoint (which is \r) plus 2 to ignore the new line and the space.

I'm having trouble figuring out how to test this but locally here is the before and after output:

Parse.y

$ ./miniruby --parser=parse.y test.rb
"1"

Prism Before:

$ ./miniruby --parser=prism test.rb
"1\n "

Prism after:

$ ./miniruby --parser=prism test.rb
"1"

Fixed example 2 from #3230

When there is a `\r\n` in the following example: `p eval "%\n1\r\n \n"`
Prism was returning `"1\n "` whereas parse.y returns `"1"`. In this case
the cursor needs to be set to the breakpoint (which is `\r`) plus 2 to
ignore the new line and the space.

I'm having trouble figuring out how to test this but locally here is the
before and after output:

Parse.y

```
$ ./miniruby --parser=parse.y test.rb
"1"
```

Prism Before:

```
$ ./miniruby --parser=prism test.rb
"1\n "
```

Prism after:

```
$ ./miniruby --parser=prism test.rb
"1"
```
@kddnewton
Copy link
Collaborator

I believe that this fixes it but I'm super nervous about it because the fix doesn't make sense... the breakpoint has already been incremented by 1, so it seems like this would specifically move it forward by 3 bytes.

Could you test out some other combinations and see if you can reproduce? I can merge otherwise, but I think it merits some more time just because it's such a confusing area of the codebase.

@eileencodes
Copy link
Member Author

Happy to test more but I'm not sure what else to test, the CRuby parser tests pass locally as do all the Prism tests. It also doesn't break the known cases I came up with more than they are already broken. I also am not super confident in this change other than for the testing I did.

@tenderlove
Copy link
Member

Is this actually a bug in Prism?

@eightbitraptor and I were taking a look in to this issue. We'd expect both programs below to evaluate the same:

program1 = "%\n1_\r\n\n" # => parse.y: 1_, prism: "1_\n"
program2 = "%'1_\r\n'"   # => "1_\n"

p eval(program1)
p eval(program2)

Both program1 and program2 specify one byte as the string delimiter for %. With Prism, both of these programs output the same value, but on parse.y, they output different values. I'd expect both program1 and program2 to output the same value regardless of which parser was being used (IOW, it seems like parse.y is incorrect).

@eightbitraptor
Copy link
Contributor

We raised this upstream as a Bug in parse.y

https://bugs.ruby-lang.org/issues/20938

@tenderlove
Copy link
Member

I guess this fix isn't complete. The problem is that parse.y will do EOL "unescaping" before figuring out the string boundaries.

So this program %\n1\r\n \n becomes this program %\n1\n \n before string tokenization. The second \n becomes the delimiter for the string, and the space after the second newline is just part of the program itself.

Here is a program to show a more exacerbated issue:

def hi
  "hello"
end

str = "%\n1\r\n;hi\n"
p eval str, binding

With parse.y this will return the string hello because it calls the hi method. Prism returns the string "1\n;hi":

$ ruby --parser=parse.y test.rb
"hello"
$ ruby test.rb
"1\n;hi"

It looks like with this patch we still get the wrong value:

#<Prism::ParseResult:0x000000011c2bfeb8 @value=@ ProgramNode (location: (1,0)-(4,0))
├── flags: ∅
├── locals: []
└── statements:
    @ StatementsNode (location: (1,0)-(4,0))
    ├── flags: ∅
    └── body: (length: 1)
        └── @ StringNode (location: (1,0)-(4,0))
            ├── flags: newline
            ├── opening_loc: (1,0)-(2,0) = "%\n"
            ├── content_loc: (2,0)-(3,3) = "1\r\n;hi"
            ├── closing_loc: (3,3)-(4,0) = "\n"
            └── unescaped: "1hi"

Honestly, I think we should make a special case for % that uses \n as the delimiter, but I need to poke at the code a bit first. It seems like if the delimiter is \n, we should just scan to the first \n and call it a day (I think).

@kddnewton
Copy link
Collaborator

I don't think we can look for the next \n because we need to handle interpolation. But I think you're right that we need to have a special case. It seems like if (*breakpoint == lex_mode->as.string.terminator) { just falls down.

If I understand the problem correctly, then in order to implement this right we need to make a special case for \n or \r\n, because I think %\r\n foo \n would be accepted as well. So maybe we should set the terminator to be 0x80 or something to indicate that it should accept either kind of newline and then have a whole path through that checks this.

@tenderlove
Copy link
Member

It seems like if (*breakpoint == lex_mode->as.string.terminator) { just falls down.

Ya, that's what I'm toying with right now. It doesn't seem to remove the \r though.

@tenderlove tenderlove closed this Dec 11, 2024
@eileencodes eileencodes deleted the partially-fix-3230 branch December 13, 2024 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants