pest to chumsky migration #185

gerau · 2025-12-18T11:12:21Z

No description provided.

apoelstra · 2025-12-18T13:21:06Z

cc @canndrew may want to keep an eye on progress here

gerau · 2026-01-12T13:16:25Z

Right now there is a working parser using the chumsky crate which replicates the behavior of the pest parser in terms of building a correct parse tree -- it should produce the same Simplicity program. This implementation also fixes #79.

Error reporting is currently broken: the logic of `parse::ParseFromStr` needs to be replaced so that it can return multiple errors or handle recoverable errors differently, and error recovery is proving to be harder than I estimated.

The code will be refactored, because some parts are only half-finished (such as adding `Spanned` for certain names) and there are better ways to use parser combinators. However, I wanted to show this progress before implementing error recovery.

gerau · 2026-01-12T13:16:48Z

cc @canndrew

uncomputable · 2026-01-12T15:19:49Z

src/lib.rs

    }

    #[test]
-    #[ignore]


1b1e751 It's nice to see that chumsky seems to be faster than pest here.

The lexer parses incoming code into tokens, which makes it simpler to process using `chumsky`.

This adds parsing via `chumsky` and some necessary changes for it to work:

- Change the `error::Span` type to use byte offsets for positions. Also add the `line-index` crate to replace the `line_col` method, which was previously used with the `pest` parser.
- Replace the `PestParse` trait with the `ChumskyParse` trait and the `ParseFromStr` implementation for it.
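For readers unfamiliar with byte-offset spans, here is a rough std-only sketch of how a byte offset can be turned back into a line and column. It does not use the `line-index` crate and does not mirror its API; `LineStarts` and `line_col` are hypothetical names chosen for illustration.

```rust
/// Hypothetical stand-in for a line index: the byte offset at which
/// each line of the source text starts.
pub struct LineStarts {
    starts: Vec<usize>,
}

impl LineStarts {
    /// Scans the text once and records where every line begins.
    pub fn new(text: &str) -> Self {
        let mut starts = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                starts.push(i + 1);
            }
        }
        LineStarts { starts }
    }

    /// Returns the zero-based (line, column) of a byte offset,
    /// where the column is measured in bytes from the line start.
    pub fn line_col(&self, offset: usize) -> (usize, usize) {
        // Number of line starts that are <= offset; always at least 1
        // because the first line starts at offset 0.
        let line = self.starts.partition_point(|&s| s <= offset) - 1;
        (line, offset - self.starts[line])
    }
}
```

With this sketch, `LineStarts::new("ab\ncd\n").line_col(4)` yields `(1, 1)`, i.e. the `d` on the second line. The lookup is a binary search, so it stays cheap even when many errors are reported against one file.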

it's not slow anymore

canndrew · 2026-01-16T07:54:50Z

src/error.rs

-        let mut current_line = 1;
-        let mut current_col = 1;
-        let mut start_index = None;
+        if file.is_empty() && self.start == 0 && self.end == 0 {


You can just do `file.get(self.start..self.end)` here. That'll also handle indexes into the middle of multi-byte codepoints without panicking.
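A minimal sketch of that suggestion, assuming the span stores plain byte offsets. `span_text` is a hypothetical helper, not a function from this PR; the only real API used is `str::get`.

```rust
/// Returns the text covered by the byte range `start..end`, or `None`
/// if the range is out of bounds, reversed, or cuts through a
/// multi-byte character. Unlike `&file[start..end]`, this never panics.
pub fn span_text(file: &str, start: usize, end: usize) -> Option<&str> {
    file.get(start..end)
}
```

For example, in `"héllo"` the `é` occupies bytes 1 and 2, so `span_text("héllo", 0, 3)` is `Some("hé")` while `span_text("héllo", 0, 2)` is `None`. An empty file with a `0..0` span gives `Some("")`, so the empty case does not need separate handling in this sketch.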

canndrew · 2026-01-16T07:57:03Z

src/error.rs

-        debug_assert!(start.line <= end.line);
-        debug_assert!(start.line < end.line || start.col <= end.col);
-        Span::new(start, end)
+        Span::new(0, s.len() - 1)


Yeah, thanks, it should be without the `- 1`.

The previous `Span` struct defined the end as inclusive, but chumsky uses an exclusive end (and the `to_slice` method also assumes that the end is exclusive).
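To make the off-by-one concrete, here is a small sketch of a span with an exclusive end. `ByteSpan` and its methods are hypothetical re-creations for illustration and are not the PR's actual `Span` type.

```rust
/// Hypothetical byte-offset span. `end` is exclusive, so the span
/// covers the bytes `start..end`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteSpan {
    pub start: usize,
    pub end: usize,
}

impl ByteSpan {
    /// Span covering the whole string: the exclusive end is `s.len()`,
    /// not `s.len() - 1`.
    pub fn whole(s: &str) -> Self {
        ByteSpan { start: 0, end: s.len() }
    }

    /// Slices the source with the exclusive-end convention.
    pub fn to_slice<'a>(&self, s: &'a str) -> Option<&'a str> {
        s.get(self.start..self.end)
    }
}
```

With an exclusive end, `ByteSpan::whole("abc").to_slice("abc")` returns `Some("abc")`, whereas a span built with `len() - 1` would return `Some("ab")` and drop the last byte. For an empty string, `s.len() - 1` would also underflow.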

canndrew · 2026-01-16T08:10:06Z

src/error.rs

+                    })
+                    .map_or(0, |ts| u32::from(ts) as usize);
+
+                let start_col = file[line_start_byte..self.span.start].chars().count();


Do we want to count columns as the number of Unicode code points (which is what `chars().count()` gives)? There's no good way to define "number of columns" in general for non-ASCII text, but LSP defines it as the number of UTF-16 code units and that's the closest thing to a standard that I'm aware of.

Actually I just checked and LSP now allows you to choose between utf{8,16,32} at your leisure. But it's moot anyway since this is just deciding how long an underline to print and that's going to depend on the terminal.
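The three ways of counting can be compared with a short sketch. The function names are hypothetical; each takes the text between the start of the line and the position of interest.

```rust
/// Column measured in UTF-8 bytes.
pub fn col_bytes(prefix: &str) -> usize {
    prefix.len()
}

/// Column measured in Unicode scalar values (what `chars().count()` gives).
pub fn col_scalars(prefix: &str) -> usize {
    prefix.chars().count()
}

/// Column measured in UTF-16 code units.
pub fn col_utf16(prefix: &str) -> usize {
    prefix.encode_utf16().count()
}
```

For the prefix `"a\u{1F600}"` (the letter `a` followed by one emoji), these return 5, 2 and 3 respectively. None of them is guaranteed to match the width a terminal actually renders, which is the point made above about underline length.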

canndrew · 2026-01-16T08:19:09Z

It's weird that the lexer is treating all our built-in macro/function/etc names as being keywords. I realize that's how the compiler currently works, so it's okay to land this PR as-is to keep the changes small. But obviously we'd want to eventually treat these as just being identifiers.

Also adds new error types so that error messages for the parsing stage are more verbose. Also adds `ErrorHandler` for collecting and displaying errors.

I haven't removed the `ParseFromStr` trait because that would break everything, so I added a new trait that parses with the new `ErrorHandler` and collects errors in one place. Also includes some changes to parsing, error handling and error reporting.
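The general idea of collecting several errors instead of stopping at the first one can be sketched as follows. This is not the PR's `ErrorHandler`; `Diagnostic`, `ErrorCollector` and `parse_numbers` are hypothetical, and the toy grammar (comma-separated integers) merely stands in for a real parser.

```rust
/// Hypothetical error record with a byte-offset span (exclusive end).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub start: usize,
    pub end: usize,
    pub message: String,
}

/// Hypothetical collector that gathers every error in one place.
#[derive(Debug, Default)]
pub struct ErrorCollector {
    errors: Vec<Diagnostic>,
}

impl ErrorCollector {
    pub fn push(&mut self, d: Diagnostic) {
        self.errors.push(d);
    }

    pub fn errors(&self) -> &[Diagnostic] {
        &self.errors
    }
}

/// Toy parser: reads comma-separated integers, records an error for
/// every bad item, and keeps going instead of aborting.
pub fn parse_numbers(src: &str, errors: &mut ErrorCollector) -> Vec<u32> {
    let mut out = Vec::new();
    let mut offset = 0;
    for item in src.split(',') {
        let (start, end) = (offset, offset + item.len());
        match item.trim().parse::<u32>() {
            Ok(n) => out.push(n),
            Err(_) => errors.push(Diagnostic {
                start,
                end,
                message: format!("expected a number, found `{}`", item.trim()),
            }),
        }
        offset = end + 1; // step over the comma
    }
    out
}
```

Parsing `"1,x,3,?"` with this sketch returns `[1, 3]` and leaves two diagnostics in the collector, one for bytes `2..3` and one for bytes `6..7`. Passing the collector by mutable reference is one possible design; it lets the caller decide when and how to print the errors.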

gerau mentioned this pull request Dec 26, 2025

Refactor parsing and analysis for better tooling support #191

Open

gerau force-pushed the simc/chumsky-migration branch from 6db55db to 1b1e751 Compare January 12, 2026 13:01

uncomputable reviewed Jan 12, 2026

gerau added 3 commits January 14, 2026 16:32

add lexer

a0473c3

remove ignore above fuzz_slow_unit_1

1e7c61b

gerau force-pushed the simc/chumsky-migration branch from 1b1e751 to 1e7c61b Compare January 14, 2026 15:10

canndrew reviewed Jan 16, 2026

gerau added 3 commits January 16, 2026 18:38

implemented default trait for errors in chumsky for RichError

6f7ef73

add multiple error printing

c03241c

gerau force-pushed the simc/chumsky-migration branch from 3592b31 to c03241c Compare January 16, 2026 16:38
