@joyeecheung
Member

@joyeecheung joyeecheung commented Nov 10, 2025

benchmark: use typescript for import cjs benchmark

The original benchmark uses an unrealistic fixture (it has
a huge try-catch block that throws on the first line and then
exports at the end, hardly representative of real-world code).
It also measures the entire import, including evaluation, not just
parsing. This renames the benchmark to import-cjs to be more accurate
and uses typescript.js as the fixture, since it has been reported
to be slow to import, leading users to use require() to work around
the performance impact. It splits the measurement into two
types: parsing CJS for the first time (where the overhead of
loading the lexer makes a difference) and parsing CJS after the
lexer has been loaded.
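To make the cold/warm split concrete, here is a minimal sketch of where the timer starts in each case. It is not the benchmark from this PR: `createLexer()` is a hypothetical stand-in for loading cjs-module-lexer (a toy regex that only recognizes `exports.name =` assignments), and the `'cold'`/`'warm'` values only mirror the `type` names in the benchmark results. Because the toy load step is cheap, the two timings will be close; the point is which work each case includes.

```javascript
'use strict';
const { performance } = require('node:perf_hooks');

// Cached lexer instance; undefined until the first (cold) load.
let lexer;

// HYPOTHETICAL stand-in for loading cjs-module-lexer. It only recognizes
// `exports.name =` assignments and is not the real lexer's API.
function createLexer() {
  const exportRe = /exports\.([A-Za-z_$][\w$]*)\s*=/g;
  return {
    parse(source) {
      const exports = [];
      for (const m of source.matchAll(exportRe)) exports.push(m[1]);
      return { exports };
    },
  };
}

function getLexer() {
  if (lexer === undefined) lexer = createLexer();
  return lexer;
}

// 'cold': every iteration pays for loading the lexer and then parsing.
// 'warm': the lexer is loaded once before timing, so only parsing is timed.
function measure(type, source, n) {
  if (type === 'warm') getLexer();
  let result;
  const start = performance.now();
  for (let i = 0; i < n; i++) {
    if (type === 'cold') lexer = undefined; // force a reload
    result = getLexer().parse(source);
  }
  return { ms: performance.now() - start, result };
}

const src = 'exports.a = 1;\nexports.b = 2;\n';
console.log(measure('cold', src, 1000).ms, measure('warm', src, 1000).ms);
```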

esm: use wasm version of cjs-module-lexer

The synchronous version has been available since cjs-module-lexer 1.4.0.

Refs: #59913
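Why a synchronous version matters: an async initializer returns a Promise, which a synchronous loader path cannot wait on. A minimal sketch of memoized synchronous initialization, using the standard `WebAssembly.Module` and `WebAssembly.Instance` constructors. `initLexerSync()` is a hypothetical helper and `WASM_BYTES` is an empty placeholder module, not cjs-module-lexer's binary or its actual implementation.

```javascript
'use strict';

// Placeholder bytes: the smallest valid WebAssembly module (magic number
// plus version, no sections). NOT the cjs-module-lexer binary.
const WASM_BYTES = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
]);

// Memoized instance; compiled at most once, on first use.
let instance;

// HYPOTHETICAL helper. The WebAssembly.Module and WebAssembly.Instance
// constructors compile and instantiate synchronously, so this can be called
// from a synchronous code path. WebAssembly.instantiate(), by contrast,
// returns a Promise that a synchronous caller cannot wait on.
function initLexerSync() {
  if (instance === undefined) {
    const wasmModule = new WebAssembly.Module(WASM_BYTES);
    instance = new WebAssembly.Instance(wasmModule, {});
  }
  return instance;
}

// Usage: safe to call repeatedly; only the first call pays the compile cost.
const first = initLexerSync();
console.log(first === initLexerSync()); // true
```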

                              confidence improvement accuracy (*)   (**)  (***)
esm/import-cjs.js type='cold'        ***     22.09 %       ±3.65% ±5.00% ±6.81%
esm/import-cjs.js type='warm'        ***     -3.93 %       ±1.69% ±2.29% ±3.06%
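The improvement column can be read as a relative change in mean throughput. A minimal sketch, assuming (as with Node's `benchmark/compare.js` output analyzed by `benchmark/compare.R`) that each sample is a rate in operations per second, so higher is better. Under that reading, `cold` got about 22% faster and `warm` about 4% slower. The star columns (significance levels) and the ± accuracy columns (confidence-interval widths) are not computed here, and the sample rates are made up.

```javascript
'use strict';

function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Relative change of the mean rate (operations per second), in percent.
// Positive = the new build is faster; negative = it is slower.
function improvement(oldRates, newRates) {
  const oldMean = mean(oldRates);
  return ((mean(newRates) - oldMean) / oldMean) * 100;
}

// Made-up sample rates for illustration only (not data from this PR).
console.log(improvement([100, 102, 98], [122, 121, 123]).toFixed(2)); // 22.00
```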

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/performance

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Nov 10, 2025
@joyeecheung joyeecheung changed the title from "Cjs lexer init" to "esm: use wasm version of cjs-module-lexer" Nov 10, 2025
@joyeecheung
Member Author

I am somewhat puzzled about why we are using WASM for this. As far as I can tell, lexer.js jumps through a lot of hoops (e.g. copying strings into a buffer as UTF-16) that could have been avoided if we just parsed natively. @guybedford
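For context on the kind of copying mentioned here: a WASM module can only read its own linear memory, so the JS side has to copy the source text into that memory before the lexer can scan it. A minimal sketch, assuming a hypothetical `copyToUtf16()` helper (not taken from lexer.js):

```javascript
'use strict';

// HYPOTHETICAL helper, not code from lexer.js. Copies a JS string's UTF-16
// code units into WebAssembly memory so a WASM lexer could scan them.
function copyToUtf16(source, memory, byteOffset = 0) {
  if (byteOffset % 2 !== 0) throw new RangeError('byteOffset must be even');
  if (byteOffset + source.length * 2 > memory.buffer.byteLength) {
    throw new RangeError('memory too small for source');
  }
  // Uint16Array uses the host's byte order, while WASM memory is
  // little-endian, so this sketch assumes a little-endian host.
  const view = new Uint16Array(memory.buffer, byteOffset, source.length);
  for (let i = 0; i < source.length; i++) {
    view[i] = source.charCodeAt(i); // one UTF-16 code unit per slot
  }
  return view;
}

const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const units = copyToUtf16('exports.\u03c0 = 1', memory);
console.log(units.length); // 13
```

Every parse pays for this per-character copy (and a real implementation also has to grow memory for large sources), which is the overhead a native parser reading the string directly would avoid.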

@codecov

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.54%. Comparing base (bd3a202) to head (19b3154).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #60663      +/-   ##
==========================================
- Coverage   88.55%   88.54%   -0.01%     
==========================================
  Files         703      703              
  Lines      208077   208082       +5     
  Branches    40083    40086       +3     
==========================================
- Hits       184254   184247       -7     
- Misses      15841    15844       +3     
- Partials     7982     7991       +9     
Files with missing lines Coverage Δ
lib/internal/modules/esm/translators.js 92.98% <100.00%> (+0.05%) ⬆️

... and 30 files with indirect coverage changes


@joyeecheung
Member Author

Updated the benchmark again to split the measurement: one case measures loading the lexer (which is now faster), and the other measures parsing after the lexer is already loaded (which is now slower, indicating that the WASM version parses more slowly than the JS version).

@guybedford
Contributor

This sounds correct to me. The Wasm gain comes from avoiding the warm-up, and that was always the story for es-module-lexer, which this came from.

Doing a safe C++ or Rust port would be beneficial. @anonrig previously started some Rust work in https://github.com/anonrig/commonjs-lexer.

The code would need to be carefully vetted for its safety properties before a full C++ inclusion, but that could very well be a good approach to follow.

@anonrig
Member

anonrig commented Nov 11, 2025

I'm extremely close to convincing @lemire to revive that work. Maybe we should do it sooner.

@joyeecheung
Member Author

FWIW, if we want to rewrite it natively, I think the native API should take UTF-16 (plus Latin-1) input and try not to assume the data comes in as UTF-8, to avoid the transcoding.
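A minimal sketch of the input handling suggested here, assuming a hypothetical `encodeForLexer()` helper on the JS side: a string whose code units all fit in one byte is handed over as Latin-1, otherwise as UTF-16, so no UTF-8 transcoding step is needed. V8 itself stores strings as either one-byte or two-byte internally, which is why these two encodings are natural candidates; a native implementation would more likely read V8's string contents directly than copy them in JS as done below.

```javascript
'use strict';

// HYPOTHETICAL helper: choose an input encoding for a native lexer without
// ever transcoding to UTF-8. Strings whose code units all fit in one byte
// are passed as Latin-1; anything else is passed as UTF-16 code units.
function encodeForLexer(source) {
  let oneByte = true;
  for (let i = 0; i < source.length; i++) {
    if (source.charCodeAt(i) > 0xff) {
      oneByte = false;
      break;
    }
  }
  const data = oneByte
    ? new Uint8Array(source.length)
    : new Uint16Array(source.length);
  for (let i = 0; i < source.length; i++) data[i] = source.charCodeAt(i);
  return { encoding: oneByte ? 'latin1' : 'utf16', data };
}

console.log(encodeForLexer('exports.a = 1').encoding); // latin1
console.log(encodeForLexer('exports.\u03c0 = 1').encoding); // utf16
```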

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Nov 11, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Nov 11, 2025
@nodejs-github-bot
Collaborator

@joyeecheung joyeecheung added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. labels Nov 12, 2025
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Nov 12, 2025
@nodejs-github-bot
Collaborator

Landed in 2388991...04a086a

nodejs-github-bot pushed a commit that referenced this pull request Nov 12, 2025

PR-URL: #60663
Refs: #59913
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
nodejs-github-bot pushed a commit that referenced this pull request Nov 12, 2025

PR-URL: #60663
Refs: #59913
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
targos pushed a commit that referenced this pull request Nov 27, 2025

PR-URL: #60663
Refs: #59913
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
targos pushed a commit that referenced this pull request Nov 27, 2025

PR-URL: #60663
Refs: #59913
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>

Labels

commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run.

6 participants