
Conversation

@fereidani

Hi, this PR improves lexer performance by ~5-10% when lexing the entire standard library. It specifically targets the string lexer, comment lexer, and frontmatter lexer.

  • For strings and comments, it replaces the previous logic with a new eat_past2 function that leverages memchr2.
  • For frontmatter, I eliminated the heap allocation from format! and rewrote the lexer using memchr-based scanning, which is roughly 4× faster.
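The eat_past2 idea above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the real implementation delegates the scan to the memchr crate's memchr2, which vectorizes it; the scalar loop here only shows the semantics of "advance the cursor one past the first occurrence of either needle byte".

```rust
/// Hypothetical sketch of an `eat_past2`-style helper: return the
/// position one past the first occurrence of either needle byte, or
/// `None` if neither occurs. The PR uses `memchr2` for this scan.
fn eat_past2(haystack: &[u8], a: u8, b: u8) -> Option<usize> {
    haystack.iter().position(|&c| c == a || c == b).map(|i| i + 1)
}

fn main() {
    // While lexing a string literal, the interesting bytes are the
    // closing quote and the backslash that starts an escape sequence.
    let body = br#"hello\nworld" trailing"#;
    // The backslash sits at index 5, so the cursor lands at index 6.
    assert_eq!(eat_past2(body, b'"', b'\\'), Some(6));
    // No match: the caller would consume the rest of the input.
    assert_eq!(eat_past2(b"plain", b'"', b'\\'), None);
    println!("ok");
}
```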

I also applied a few minor optimizations in other areas.
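As a worked example of the frontmatter allocation point: a fence check that builds the expected closing delimiter with format! heap-allocates a String on every call, while inspecting the candidate line's bytes in place does not. A minimal sketch, assuming a dash-fenced frontmatter block; the helper name and exact fence rules here are invented for illustration, not taken from the PR:

```rust
/// Hypothetical allocation-free fence check: instead of comparing
/// against `format!`-built dashes, verify the bytes directly.
fn is_closing_fence(line: &str, open_len: usize) -> bool {
    let line = line.trim_end();
    // A closing fence is a run of dashes at least as long as the opener.
    line.len() >= open_len && line.bytes().all(|b| b == b'-')
}

fn main() {
    assert!(is_closing_fence("---", 3));
    assert!(is_closing_fence("-----  ", 3)); // trailing whitespace ignored
    assert!(!is_closing_fence("--", 3));     // too short
    assert!(!is_closing_fence("--- x", 3));  // non-dash content
    println!("ok");
}
```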

I’ll send the benchmark repo in the next message. Here are the results on my x86_64 laptop (AMD 6650U):

tokenize_real_world/stdlib_all_files
                        time:   [74.193 ms 74.224 ms 74.256 ms]
                        thrpt:  [423.74 MiB/s 423.92 MiB/s 424.10 MiB/s]
                 change:
                        time:   [−5.4046% −5.3465% −5.2907%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5862% +5.6484% +5.7134%]
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  2 (2.00%) high mild
  19 (19.00%) high severe

strip_shebang/valid_shebang
                        time:   [11.391 ns 11.401 ns 11.412 ns]
                        thrpt:  [1.7954 GiB/s 1.7971 GiB/s 1.7987 GiB/s]
                 change:
                        time:   [−8.1076% −7.8921% −7.6485%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2820% +8.5683% +8.8229%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
strip_shebang/no_shebang
                        time:   [4.8656 ns 4.8680 ns 4.8711 ns]
                        thrpt:  [4.2062 GiB/s 4.2089 GiB/s 4.2110 GiB/s]
                 change:
                        time:   [−0.1156% −0.0139% +0.0821%] (p = 0.78 > 0.05)
                        thrpt:  [−0.0821% +0.0139% +0.1157%]
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) high mild
  19 (19.00%) high severe

tokenize/simple_function
                        time:   [288.86 ns 293.20 ns 297.41 ns]
                        thrpt:  [173.16 MiB/s 175.64 MiB/s 178.28 MiB/s]
                 change:
                        time:   [−2.2198% −0.8716% +0.3321%] (p = 0.20 > 0.05)
                        thrpt:  [−0.3310% +0.8793% +2.2702%]
                        No change in performance detected.
tokenize/strings        time:   [1.1175 µs 1.1379 µs 1.1573 µs]
                        thrpt:  [44.497 MiB/s 45.258 MiB/s 46.083 MiB/s]
                 change:
                        time:   [−14.860% −13.620% −12.359%] (p = 0.00 < 0.05)
                        thrpt:  [+14.101% +15.767% +17.454%]
                        Performance has improved.
tokenize/single_line_comments
                        time:   [159.67 ns 161.52 ns 163.29 ns]
                        thrpt:  [315.39 MiB/s 318.84 MiB/s 322.53 MiB/s]
                 change:
                        time:   [+0.4110% +1.4523% +2.4709%] (p = 0.01 < 0.05)
                        thrpt:  [−2.4113% −1.4315% −0.4093%]
                        Change within noise threshold.
tokenize/multi_line_comments
                        time:   [220.54 ns 223.33 ns 225.99 ns]
                        thrpt:  [227.88 MiB/s 230.60 MiB/s 233.51 MiB/s]
                 change:
                        time:   [−7.7271% −6.7443% −5.7976%] (p = 0.00 < 0.05)
                        thrpt:  [+6.1544% +7.2320% +8.3742%]
                        Performance has improved.
tokenize/literals       time:   [399.63 ns 405.42 ns 410.94 ns]
                        thrpt:  [125.32 MiB/s 127.02 MiB/s 128.86 MiB/s]
                 change:
                        time:   [−1.4649% −0.3653% +0.7608%] (p = 0.54 > 0.05)
                        thrpt:  [−0.7550% +0.3666% +1.4867%]
                        No change in performance detected.

frontmatter/frontmatter_allowed
                        time:   [188.37 ns 189.51 ns 190.85 ns]
                        thrpt:  [264.85 MiB/s 266.71 MiB/s 268.33 MiB/s]
                 change:
                        time:   [−26.032% −25.300% −24.590%] (p = 0.00 < 0.05)
                        thrpt:  [+32.609% +33.869% +35.194%]
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  17 (17.00%) high severe

cursor_first/first      time:   [886.05 ps 886.23 ps 886.43 ps]
                        thrpt:  [42.026 GiB/s 42.035 GiB/s 42.044 GiB/s]
                 change:
                        time:   [−1.7088% −1.6398% −1.5732%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5984% +1.6671% +1.7385%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

cursor_iteration/bump_all
                        time:   [891.48 ns 892.06 ns 892.78 ns]
                        thrpt:  [4.1727 GiB/s 4.1760 GiB/s 4.1788 GiB/s]
                 change:
                        time:   [−50.335% −50.211% −50.037%] (p = 0.00 < 0.05)
                        thrpt:  [+100.15% +100.85% +101.35%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe

cursor_eat_while/eat_while_alpha
                        time:   [34.992 ns 34.999 ns 35.007 ns]
                        thrpt:  [1.7292 GiB/s 1.7297 GiB/s 1.7300 GiB/s]
                 change:
                        time:   [−1.0098% −0.8721% −0.7699%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7759% +0.8798% +1.0201%]
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

cursor_eat_until/eat_until_newline
                        time:   [3.1314 ns 3.1323 ns 3.1332 ns]
                        thrpt:  [15.754 GiB/s 15.759 GiB/s 15.763 GiB/s]
                 change:
                        time:   [−0.4774% −0.3069% −0.1459%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1461% +0.3078% +0.4797%]
                        Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
  14 (14.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 5, 2025
@rustbot
Collaborator

rustbot commented Dec 5, 2025

r? @nnethercote

rustbot has assigned @nnethercote.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer


@fereidani
Author

This is the benchmark repository used to track the performance changes:
https://github.com/fereidani/rustc_lexer_benchmark

@matthiaskrgr
Member

@bors try @rust-timer queue


rust-bors bot added a commit that referenced this pull request Dec 5, 2025
Improve lexer performance by 5-10% overall, improve string lexer performance 15%
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 5, 2025
@rust-bors

rust-bors bot commented Dec 5, 2025

☀️ Try build successful (CI)
Build commit: e0cf684 (e0cf684abe69de9dd471c12c65d8cf3e198875e5, parent: 66428d92bec337ed4785d695d0127276a482278c)


@rust-timer
Collaborator

Finished benchmarking commit (e0cf684): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

                            mean     range             count
Regressions ❌ (primary)      -        -                 0
Regressions ❌ (secondary)    0.7%     [0.0%, 1.7%]      18
Improvements ✅ (primary)     -        -                 0
Improvements ✅ (secondary)   -0.1%    [-0.2%, -0.1%]    2
All ❌✅ (primary)             -        -                 0

Max RSS (memory usage)

Results (secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                            mean     range             count
Regressions ❌ (primary)      -        -                 0
Regressions ❌ (secondary)    4.0%     [1.5%, 6.9%]      9
Improvements ✅ (primary)     -        -                 0
Improvements ✅ (secondary)   -1.3%    [-2.3%, -0.8%]    5
All ❌✅ (primary)             -        -                 0

Cycles

Results (primary 3.1%, secondary 1.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                            mean     range             count
Regressions ❌ (primary)      3.1%     [2.3%, 4.9%]      4
Regressions ❌ (secondary)    3.6%     [2.0%, 6.4%]      12
Improvements ✅ (primary)     -        -                 0
Improvements ✅ (secondary)   -3.7%    [-6.2%, -1.8%]    6
All ❌✅ (primary)             3.1%     [2.3%, 4.9%]      4

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 470.249s -> 469.703s (-0.12%)
Artifact size: 386.85 MiB -> 388.89 MiB (0.53%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 5, 2025