
Conversation

@fereidani

Hi, this PR improves lexer performance by ~5-10% when lexing the entire standard library. It specifically targets the string lexer, comment lexer, and frontmatter lexer.

  • For strings and comments, it replaces the previous logic with a new eat_past2 function that leverages memchr2.
  • For frontmatter, I eliminated the heap allocation from format! and rewrote the lexer using memchr-based scanning, which is roughly 4× faster.
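The eat_past2 idea above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the real implementation delegates the scan to the memchr crate's memchr2, which vectorizes it; the scalar loop here only shows the semantics of "advance the cursor one past the first occurrence of either needle byte".

```rust
/// Hypothetical sketch of an `eat_past2`-style helper: return the
/// position one past the first occurrence of either needle byte, or
/// `None` if neither occurs. The PR uses `memchr2` for this scan.
fn eat_past2(haystack: &[u8], a: u8, b: u8) -> Option<usize> {
    haystack.iter().position(|&c| c == a || c == b).map(|i| i + 1)
}

fn main() {
    // While lexing a string literal, the interesting bytes are the
    // closing quote and the backslash that starts an escape sequence.
    let body = br#"hello\nworld" trailing"#;
    // The backslash sits at index 5, so the cursor lands at index 6.
    assert_eq!(eat_past2(body, b'"', b'\\'), Some(6));
    // No match: the caller would consume the rest of the input.
    assert_eq!(eat_past2(b"plain", b'"', b'\\'), None);
    println!("ok");
}
```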

I also applied a few minor optimizations in other areas.
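As a worked example of the frontmatter allocation point: a fence check that builds the expected closing delimiter with format! heap-allocates a String on every call, while inspecting the candidate line's bytes in place does not. A minimal sketch, assuming a dash-fenced frontmatter block; the helper name and exact fence rules here are invented for illustration, not taken from the PR:

```rust
/// Hypothetical allocation-free fence check: instead of comparing
/// against `format!`-built dashes, verify the bytes directly.
fn is_closing_fence(line: &str, open_len: usize) -> bool {
    let line = line.trim_end();
    // A closing fence is a run of dashes at least as long as the opener.
    line.len() >= open_len && line.bytes().all(|b| b == b'-')
}

fn main() {
    assert!(is_closing_fence("---", 3));
    assert!(is_closing_fence("-----  ", 3)); // trailing whitespace ignored
    assert!(!is_closing_fence("--", 3));     // too short
    assert!(!is_closing_fence("--- x", 3));  // non-dash content
    println!("ok");
}
```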

I’ll send the benchmark repo in the next message. Here are the results on my x86_64 laptop (AMD 6650U):

tokenize_real_world/stdlib_all_files
                        time:   [74.193 ms 74.224 ms 74.256 ms]
                        thrpt:  [423.74 MiB/s 423.92 MiB/s 424.10 MiB/s]
                 change:
                        time:   [−5.4046% −5.3465% −5.2907%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5862% +5.6484% +5.7134%]
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  2 (2.00%) high mild
  19 (19.00%) high severe

strip_shebang/valid_shebang
                        time:   [11.391 ns 11.401 ns 11.412 ns]
                        thrpt:  [1.7954 GiB/s 1.7971 GiB/s 1.7987 GiB/s]
                 change:
                        time:   [−8.1076% −7.8921% −7.6485%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2820% +8.5683% +8.8229%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
strip_shebang/no_shebang
                        time:   [4.8656 ns 4.8680 ns 4.8711 ns]
                        thrpt:  [4.2062 GiB/s 4.2089 GiB/s 4.2110 GiB/s]
                 change:
                        time:   [−0.1156% −0.0139% +0.0821%] (p = 0.78 > 0.05)
                        thrpt:  [−0.0821% +0.0139% +0.1157%]
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) high mild
  19 (19.00%) high severe

tokenize/simple_function
                        time:   [288.86 ns 293.20 ns 297.41 ns]
                        thrpt:  [173.16 MiB/s 175.64 MiB/s 178.28 MiB/s]
                 change:
                        time:   [−2.2198% −0.8716% +0.3321%] (p = 0.20 > 0.05)
                        thrpt:  [−0.3310% +0.8793% +2.2702%]
                        No change in performance detected.
tokenize/strings        time:   [1.1175 µs 1.1379 µs 1.1573 µs]
                        thrpt:  [44.497 MiB/s 45.258 MiB/s 46.083 MiB/s]
                 change:
                        time:   [−14.860% −13.620% −12.359%] (p = 0.00 < 0.05)
                        thrpt:  [+14.101% +15.767% +17.454%]
                        Performance has improved.
tokenize/single_line_comments
                        time:   [159.67 ns 161.52 ns 163.29 ns]
                        thrpt:  [315.39 MiB/s 318.84 MiB/s 322.53 MiB/s]
                 change:
                        time:   [+0.4110% +1.4523% +2.4709%] (p = 0.01 < 0.05)
                        thrpt:  [−2.4113% −1.4315% −0.4093%]
                        Change within noise threshold.
tokenize/multi_line_comments
                        time:   [220.54 ns 223.33 ns 225.99 ns]
                        thrpt:  [227.88 MiB/s 230.60 MiB/s 233.51 MiB/s]
                 change:
                        time:   [−7.7271% −6.7443% −5.7976%] (p = 0.00 < 0.05)
                        thrpt:  [+6.1544% +7.2320% +8.3742%]
                        Performance has improved.
tokenize/literals       time:   [399.63 ns 405.42 ns 410.94 ns]
                        thrpt:  [125.32 MiB/s 127.02 MiB/s 128.86 MiB/s]
                 change:
                        time:   [−1.4649% −0.3653% +0.7608%] (p = 0.54 > 0.05)
                        thrpt:  [−0.7550% +0.3666% +1.4867%]
                        No change in performance detected.

frontmatter/frontmatter_allowed
                        time:   [188.37 ns 189.51 ns 190.85 ns]
                        thrpt:  [264.85 MiB/s 266.71 MiB/s 268.33 MiB/s]
                 change:
                        time:   [−26.032% −25.300% −24.590%] (p = 0.00 < 0.05)
                        thrpt:  [+32.609% +33.869% +35.194%]
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  17 (17.00%) high severe

cursor_first/first      time:   [886.05 ps 886.23 ps 886.43 ps]
                        thrpt:  [42.026 GiB/s 42.035 GiB/s 42.044 GiB/s]
                 change:
                        time:   [−1.7088% −1.6398% −1.5732%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5984% +1.6671% +1.7385%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

cursor_iteration/bump_all
                        time:   [891.48 ns 892.06 ns 892.78 ns]
                        thrpt:  [4.1727 GiB/s 4.1760 GiB/s 4.1788 GiB/s]
                 change:
                        time:   [−50.335% −50.211% −50.037%] (p = 0.00 < 0.05)
                        thrpt:  [+100.15% +100.85% +101.35%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe

cursor_eat_while/eat_while_alpha
                        time:   [34.992 ns 34.999 ns 35.007 ns]
                        thrpt:  [1.7292 GiB/s 1.7297 GiB/s 1.7300 GiB/s]
                 change:
                        time:   [−1.0098% −0.8721% −0.7699%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7759% +0.8798% +1.0201%]
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

cursor_eat_until/eat_until_newline
                        time:   [3.1314 ns 3.1323 ns 3.1332 ns]
                        thrpt:  [15.754 GiB/s 15.759 GiB/s 15.763 GiB/s]
                 change:
                        time:   [−0.4774% −0.3069% −0.1459%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1461% +0.3078% +0.4797%]
                        Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
  14 (14.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 5, 2025
@rustbot
Collaborator

rustbot commented Dec 5, 2025

r? @nnethercote

rustbot has assigned @nnethercote.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer


@fereidani
Author

This is the benchmark repository used to track the performance changes:
https://github.com/fereidani/rustc_lexer_benchmark

@matthiaskrgr
Member

@bors try @rust-timer queue


rust-bors bot added a commit that referenced this pull request Dec 5, 2025
Improve lexer performance by 5-10% overall, improve string lexer performance 15%
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 5, 2025
@rust-bors

rust-bors bot commented Dec 5, 2025

☀️ Try build successful (CI)
Build commit: e0cf684 (e0cf684abe69de9dd471c12c65d8cf3e198875e5, parent: 66428d92bec337ed4785d695d0127276a482278c)


@rust-timer
Collaborator

Finished benchmarking commit (e0cf684): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

                            mean     range             count
Regressions ❌ (primary)      -        -                 0
Regressions ❌ (secondary)    0.7%     [0.0%, 1.7%]      18
Improvements ✅ (primary)     -        -                 0
Improvements ✅ (secondary)   -0.1%    [-0.2%, -0.1%]    2
All ❌✅ (primary)             -        -                 0

Max RSS (memory usage)

Results (secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                            mean     range             count
Regressions ❌ (primary)      -        -                 0
Regressions ❌ (secondary)    4.0%     [1.5%, 6.9%]      9
Improvements ✅ (primary)     -        -                 0
Improvements ✅ (secondary)   -1.3%    [-2.3%, -0.8%]    5
All ❌✅ (primary)             -        -                 0

Cycles

Results (primary 3.1%, secondary 1.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                            mean     range             count
Regressions ❌ (primary)      3.1%     [2.3%, 4.9%]      4
Regressions ❌ (secondary)    3.6%     [2.0%, 6.4%]      12
Improvements ✅ (primary)     -        -                 0
Improvements ✅ (secondary)   -3.7%    [-6.2%, -1.8%]    6
All ❌✅ (primary)             3.1%     [2.3%, 4.9%]      4

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 470.249s -> 469.703s (-0.12%)
Artifact size: 386.85 MiB -> 388.89 MiB (0.53%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 5, 2025