@gerau
Contributor

gerau commented Dec 18, 2025

No description provided.

@apoelstra
Contributor

cc @canndrew may want to keep an eye on progress here

@gerau
Contributor Author

gerau commented Jan 12, 2026

Right now there is a working parser using the chumsky crate which replicates the behavior of the pest parser in terms of building a correct parse tree -- it should produce the same Simplicity program. This implementation also fixes #79.

Error reporting is currently broken: the logic of parse::ParseFromStr needs to be replaced so that it returns multiple errors or handles recoverable errors differently, and error recovery is proving harder than I estimated.

The code will be refactored because some parts are only half-finished (such as adding Spanned for certain names) and there are better ways to use parser combinators. However, I want to show this progress before implementing error recovery.

@gerau
Contributor Author

gerau commented Jan 12, 2026

cc @canndrew

}

#[test]
#[ignore]
Collaborator

1b1e751 It's nice to see that chumsky seems to be faster than pest here.

gerau added 3 commits January 14, 2026 16:32
The lexer parses incoming code into tokens, which makes it simpler to
process using `chumsky`.
This adds parsing via `chumsky` and some necessary changes for it to
work:

- Change `error::Span` type to use byte offset for position. Also add
the `line-index` crate to replace the `line_col` method which was
previously used with the `pest` parser.
- Replace the `PestParse` trait with the `ChumskyParse` trait and the
`ParseFromStr` implementation for it.
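
With `Span` storing byte offsets, line and column numbers have to be derived from the source text when an error is displayed. The PR uses the `line-index` crate for this; the sketch below is only a std-only stand-in that shows the idea. The names `LineStarts` and `line_col` are hypothetical and are not the crate's API.

```rust
/// Hypothetical minimal line index: the byte offset at which each line starts.
struct LineStarts(Vec<usize>);

impl LineStarts {
    /// Records offset 0 plus the offset just past every `\n`.
    fn new(text: &str) -> Self {
        let mut starts = vec![0];
        starts.extend(text.match_indices('\n').map(|(i, _)| i + 1));
        LineStarts(starts)
    }

    /// Zero-based (line, byte column) for a byte offset into the text.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // `partition_point` returns how many line starts are <= offset.
        // It is at least 1 because the first line starts at offset 0.
        let line = self.0.partition_point(|&start| start <= offset) - 1;
        (line, offset - self.0[line])
    }
}
```

The lookup is a binary search over the line starts, so building the index once per file keeps each span-to-position conversion cheap.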
gerau force-pushed the simc/chumsky-migration branch from 1b1e751 to 1e7c61b Compare January 14, 2026 15:10
src/error.rs Outdated
let mut current_line = 1;
let mut current_col = 1;
let mut start_index = None;
if file.is_empty() && self.start == 0 && self.end == 0 {
Contributor

You can just do file.get(self.start..self.end) here. That'll also handle indexes into the middle of multi-byte codepoints without panicking.
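
A minimal sketch of that suggestion, assuming the span holds plain byte offsets. The helper name `slice_span` is hypothetical; `str::get` is the standard-library method.

```rust
/// Returns the text covered by a byte range, or `None` if the range is out of
/// bounds, reversed, or cuts through a multi-byte character.
fn slice_span(file: &str, start: usize, end: usize) -> Option<&str> {
    // Unlike `&file[start..end]`, `get` never panics.
    file.get(start..end)
}
```

An empty file with a `0..0` span yields `Some("")`, so the explicit empty-file check is not needed for the slicing itself.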

src/error.rs Outdated
debug_assert!(start.line <= end.line);
debug_assert!(start.line < end.line || start.col <= end.col);
Span::new(start, end)
Span::new(0, s.len() - 1)
Contributor

- 1?

Contributor Author

Yeah, thanks, it should be without the - 1.

The previous Span struct defined the end as inclusive, but chumsky uses an exclusive end (the to_slice method also assumes that the end is exclusive).
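
To illustrate the exclusive-end convention, here is a sketch with a hypothetical stand-in for `Span`. The real struct and its `to_slice` signature may differ, and the `whole` constructor is invented for the example.

```rust
/// Hypothetical stand-in for the crate's `Span`: byte offsets, end exclusive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    fn new(start: usize, end: usize) -> Self {
        debug_assert!(start <= end);
        Span { start, end }
    }

    /// Span covering an entire string: `0..s.len()`, with no `- 1`.
    fn whole(s: &str) -> Self {
        Span::new(0, s.len())
    }

    /// Slices the source with the span; `None` if the span is out of range
    /// or does not fall on character boundaries.
    fn to_slice<'a>(&self, file: &'a str) -> Option<&'a str> {
        file.get(self.start..self.end)
    }
}
```

With an exclusive end, `s.len() - 1` would drop the last byte of the input, and on an empty string the subtraction would underflow.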

src/error.rs Outdated
})
.map_or(0, |ts| u32::from(ts) as usize);

let start_col = file[line_start_byte..self.span.start].chars().count();
Contributor

Do we want to count columns as the number of Unicode code points? There's no good way to define "number of columns" in general for non-ASCII text, but LSP defines it as the number of UTF-16 code units, and that's the closest thing to a standard that I'm aware of.

Contributor

Actually I just checked and LSP now allows you to choose between utf{8,16,32} at your leisure. But it's moot anyway since this is just deciding how long an underline to print and that's going to depend on the terminal.
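
For reference, the three ways of counting differ only for non-ASCII text. A small sketch, with a hypothetical `Columns` struct and `columns` helper, computes all three for the part of a line that precedes a byte offset:

```rust
/// Column of a position measured in UTF-8 bytes, UTF-16 code units,
/// and Unicode scalar values (what `chars().count()` gives).
struct Columns {
    utf8: usize,
    utf16: usize,
    utf32: usize,
}

/// Measures `file[line_start..offset]`; `None` if the range is invalid
/// or cuts through a multi-byte character.
fn columns(file: &str, line_start: usize, offset: usize) -> Option<Columns> {
    let prefix = file.get(line_start..offset)?;
    Some(Columns {
        utf8: prefix.len(),
        utf16: prefix.encode_utf16().count(),
        utf32: prefix.chars().count(),
    })
}
```

None of these is the rendered width in a terminal, which is why the choice matters little for drawing an underline.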

@canndrew
Contributor

It's weird that the lexer is treating all our built-in macro/function/etc names as being keywords. I realize that's how the compiler currently works, so it's okay to land this PR as-is to keep the changes small. But obviously we'd want to eventually treat these as just being identifiers.
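
A rough sketch of the alternative: the lexer knows only the true keywords and emits every other word as an identifier, and a later pass decides whether an identifier names a built-in. The word lists and names here are purely illustrative and are not the compiler's actual sets.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Token<'a> {
    Keyword(&'a str),
    Ident(&'a str),
}

/// Illustrative keyword list only; not the compiler's actual set.
const KEYWORDS: &[&str] = &["fn", "let", "match"];

/// Lexer side: only real keywords get their own token kind.
fn classify(word: &str) -> Token<'_> {
    if KEYWORDS.contains(&word) {
        Token::Keyword(word)
    } else {
        Token::Ident(word)
    }
}

/// Later pass: built-in names are looked up among ordinary identifiers.
/// The names matched here are made up for the example.
fn is_builtin(name: &str) -> bool {
    matches!(name, "unwrap" | "assert")
}
```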

gerau added 3 commits January 16, 2026 18:38
Also adds new error types so that errors from the parsing stage can be more
verbose, and adds `ErrorHandler` for collecting and displaying errors.
I haven't removed the ParseFromStr trait because that would break everything,
so I added a new trait that parses with the new `ErrorHandler` and collects
errors in one place.

Also some changes to parsing, error handling and error reporting.
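
As an illustration of the collect-then-report pattern described in these commit messages, here is a toy sketch. Only the name `ErrorHandler` comes from the PR; its fields and methods, the `ParseError` type, the `ParseWithHandler` trait and the `Numbers` parser are all hypothetical and do not reflect the actual code.

```rust
use std::fmt;

/// Hypothetical error type: a half-open byte range plus a message.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ParseError {
    start: usize,
    end: usize,
    message: String,
}

/// Collects every error found during parsing instead of stopping at the first.
#[derive(Debug, Default)]
struct ErrorHandler {
    errors: Vec<ParseError>,
}

impl ErrorHandler {
    fn push(&mut self, error: ParseError) {
        self.errors.push(error);
    }

    fn has_errors(&self) -> bool {
        !self.errors.is_empty()
    }
}

impl fmt::Display for ErrorHandler {
    /// Prints one error per line as `start..end: message`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for e in &self.errors {
            writeln!(f, "{}..{}: {}", e.start, e.end, e.message)?;
        }
        Ok(())
    }
}

/// Hypothetical trait: parse while reporting errors to a shared handler.
trait ParseWithHandler: Sized {
    fn parse_with(src: &str, handler: &mut ErrorHandler) -> Option<Self>;
}

/// Toy grammar: a comma-separated list of unsigned integers.
struct Numbers(Vec<u32>);

impl ParseWithHandler for Numbers {
    fn parse_with(src: &str, handler: &mut ErrorHandler) -> Option<Self> {
        let mut values = Vec::new();
        let mut offset = 0;
        for piece in src.split(',') {
            match piece.trim().parse::<u32>() {
                Ok(v) => values.push(v),
                // Record the error and keep going so later errors are found too.
                Err(_) => handler.push(ParseError {
                    start: offset,
                    end: offset + piece.len(),
                    message: format!("expected a number, found `{}`", piece.trim()),
                }),
            }
            offset += piece.len() + 1; // step over the piece and its comma
        }
        if handler.has_errors() {
            None
        } else {
            Some(Numbers(values))
        }
    }
}
```

Parsing `"1,x,3,y"` with this sketch returns `None` and leaves two errors in the handler, one per bad item, which can then be displayed together.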
gerau force-pushed the simc/chumsky-migration branch from 3592b31 to c03241c Compare January 16, 2026 16:38