-
Notifications
You must be signed in to change notification settings - Fork 22
pest to chumsky migration #185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
pest to chumsky migration #185
Conversation
|
cc @canndrew may want to keep an eye on progress here |
6db55db to
1b1e751
Compare
|
Right now there is a working parser using the Error reporting is currently broken because we need to replace the logic of The code will be refactored because some parts are only half-finished (such as adding |
|
cc @canndrew |
| } | ||
|
|
||
| #[test] | ||
| #[ignore] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1b1e751 It's nice to see that chumsky seems to be faster than pest here.
This adds parsing via `chumsky` and some necessary changes for it to work: - Change `error::Span` type to use byte offset for position. Also add the `line-index` crate to replace the `line_col` method which was previously used with the `pest` parser. - Replace the `PestParse` trait with the `ChumskyParse` trait and the `ParseFromStr` implementation for it.
it's not slow anymore
1b1e751 to
1e7c61b
Compare
src/error.rs
Outdated
| let mut current_line = 1; | ||
| let mut current_col = 1; | ||
| let mut start_index = None; | ||
| if file.is_empty() && self.start == 0 && self.end == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just do file.get(self.start..self.end) here. That'll also handle indexes into the middle of multi-byte codepoints without panicking.
src/error.rs
Outdated
| debug_assert!(start.line <= end.line); | ||
| debug_assert!(start.line < end.line || start.col <= end.col); | ||
| Span::new(start, end) | ||
| Span::new(0, s.len() - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thanks, it should be without the - 1.
The previous Span struct defined the end as inclusive, but chumsky uses an exclusive end (and to_slice method also assumes that end is exclusive)
src/error.rs
Outdated
| }) | ||
| .map_or(0, |ts| u32::from(ts) as usize); | ||
|
|
||
| let start_col = file[line_start_byte..self.span.start].chars().count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to count columns as being the number of utf8 codepoints? There's no good way to define "number of columns" in general for non-ascii text, but LSP defines it as the number of utf16 codepoints and that's the closest thing to a standard that I'm aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I just checked and LSP now allows you to choose between utf{8,16,32} at your leisure. But it's moot anyway since this is just deciding how long an underline to print and that's going to depend on the terminal.
|
It's weird that the lexer is treating all our built-in macro/function/etc names as being keywords. I realize that's how the compiler currently works, so it's okay to land this PR as-is to keep the changes small. But obviously we'd want to eventually treat these as just being identifiers. |
Also adds new error types, because error messages for parsing stage so errors would be more verbose. Also add `ErrorHandler` for collecting and displaying errors
I hadn't removed ParseFromStr trait, because it would break everything, so I added new trait to parse with new `ErrorHandler` and collect errors into one place. also some changes in parsing, error handling and error reporting
3592b31 to
c03241c
Compare
No description provided.