Modelopt-windows documentation update #812
base: main
Conversation
Signed-off-by: vipandya <vipandya@nvidia.com>
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.

📝 Walkthrough

Documentation refactor expanding ONNX Runtime Execution Provider (EP) support on Windows beyond DirectML to include CUDA, TensorRT-RTX, and CPU options. Includes new 0.41 release notes, updated system requirements tables, revised installation guides, and refreshed support matrices across multiple documentation and example files.
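For context on what the expanded EP support means in practice, here is a minimal sketch, not taken from this PR, of selecting an execution provider when creating an ONNX Runtime session in Python. The model path and provider list are illustrative assumptions.

```python
import onnxruntime as ort

# Show which execution providers the installed onnxruntime package supports.
print(ort.get_available_providers())

# Request CUDA first, falling back to CPU if CUDA is unavailable.
# "model.onnx" is a placeholder path, not a file from this PR.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Report the providers actually assigned to this session.
print(session.get_providers())
```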
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/source/getting_started/windows/_installation_for_Windows.rst (1)
18-18: Clarify CUDA version requirements. The system requirements table specifies CUDA >=12.0 (Line 18), while the note mentions CUDA-12.8+ for Blackwell GPU support (Line 28). This may confuse users about the actual minimum CUDA version required. Consider clarifying whether:
- CUDA 12.0 is the general minimum, with 12.8+ needed only for Blackwell GPUs
- Or if the table should be updated to reflect 12.8+ as the universal minimum
Also applies to: 28-28
docs/source/deployment/2_onnxruntime.rst (1)
42-42: Fix double slash in URL. The URL contains a double slash before the closing: python// should be python/.

🔗 Proposed fix

-- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python//>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
+- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python/>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
🤖 Fix all issues with AI agents
In `@CHANGELOG-Windows.rst`:
- Line 15: Replace the misspelled link text "Perlexity" with the correct
spelling "Perplexity" in the CHANGELOG entry (the link label that currently
reads `Perlexity
<https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_);
ensure the link URL and formatting remain unchanged and only the visible label
is corrected.
In `@docs/source/getting_started/1_overview.rst`:
- Line 14: The Markdown link for ModelOpt-Windows uses a mixed tree/file path
causing a redirect; update the URL in the sentence that references
`ModelOpt-Windows` to use the correct GitHub blob path
`https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md`
or point to the directory
`https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows` so the
link resolves directly without a 301 redirect.
In `@docs/source/getting_started/windows/_installation_standalone.rst`:
- Line 51: There is a typo in the sentence that reads 'The default CUDA version
neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x.' — change "neeedd" to
"needed" so it reads 'The default CUDA version needed for *onnxruntime-gpu*
since v1.19.0 is 12.x.' Update the sentence where "ModelOpt-Windows installs
*onnxruntime-gpu*" is mentioned to correct that single word.
In `@docs/source/getting_started/windows/_installation_with_olive.rst`:
- Line 65: Replace the broken GitHub link target in the rst line that currently
reads "overview
<https://github.com/microsoft/Olive/blob/main/docs/architecture.md>"_ with the
Olive docs site URL (for example "overview
<https://microsoft.github.io/Olive/>"_), keeping the visible link text the same
so the sentence points to the actual Olive architecture documentation.
In `@docs/source/guides/0_support_matrix.rst`:
- Line 101: Replace the incorrect GitHub anchor URL in the README line
referencing the model support matrix (the text containing
"https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix")
with the official Model Optimizer Windows installation/support page URL that
documents supported platform requirements and GPU specifications; update the
link target in the phrase "details <...>" so it points directly to the official
Windows installation/support documentation rather than the GitHub examples
anchor.
In `@examples/windows/onnx_ptq/genai_llm/README.md`:
- Line 32: Update the README sentence to tighten wording and fix typos: change
"ONNX GenAI compatible" to "GenAI-compatible", use "precision" consistently
(e.g., "select precision" or "precision level"), and replace informal phrasing
like "choose from" with "select from" for clarity; apply the same edits to the
other occurrences mentioned (the paragraphs around the same phrasing at the
other locations) so all instances use "GenAI-compatible", consistent "precision"
wording, and "select from" phrasing for a uniform, clearer README.
- Around line 56-57: The README lists the `--dataset` supported values as "cnn,
pilevel" but the description calls it "pile-val"; pick one canonical value
(recommend "pile-val") and update the `--dataset` supported-values list and the
descriptive text to match, and also search for any validation or flag-parsing
logic that references `pilevel` and update it to the chosen canonical token so
the flag, description, and code all match (`--dataset`, cnn, pile-val).
🧹 Nitpick comments (5)
examples/windows/torch_onnx/diffusers/README.md (1)
95-109: Consider improving clarity and consistency. The Support Matrix section rename improves consistency with other README files, and the new footnotes provide valuable context. However, consider the following refinements:
Line 109: The note about "some known performance issues with NVFP4 model execution" is vague. Consider being more specific about what issues users might encounter or providing a reference to a tracking issue.
Lines 103, 105: Footnote formatting is inconsistent - these lines lack ending punctuation while line 107 includes a period.
♻️ Suggested improvements
-> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*
+> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*
-> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*
+> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*
-> *There are some known performance issues with NVFP4 model execution using TRTRTX EP. Stay tuned for further updates!*
+> *NVFP4 model execution using TRTRTX EP has known performance limitations. Stay tuned for further updates!*

CHANGELOG-Windows.rst (1)
14-14: Consider more descriptive link text. The link text "example script" could be more descriptive, similar to line 13's "example for GenAI LLMs". Consider something like "diffusion models quantization example" for consistency and clarity.
docs/source/getting_started/windows/_installation_standalone.rst (1)
72-76: Minor: Consider consistent capitalization in verification checklist. The verification item "Onnxruntime Package" uses different capitalization compared to other items like "Python Interpreter" and "Task Manager" (title case). Consider using "ONNX Runtime Package" for consistency.
docs/source/deployment/2_onnxruntime.rst (2)
9-16: Good addition of multi-EP support overview. The execution provider descriptions effectively communicate the options available to users. The guidance to select based on model, hardware, and deployment requirements is helpful.
Optional: Consider clarifying DirectML EP scope.
Line 12's description "Enables deployment on a wide range of GPUs" could be more specific about which GPU vendors (e.g., AMD, Intel, NVIDIA) or hardware generations are supported to help users make informed decisions.
32-34: Clarify that EP constraints are build-optimization specific, not inherent to ONNX portability. The note's core guidance of rebuilding/re-exporting models for different EPs is sound practice for ONNX Runtime GenAI. However, the explanation should be more precise: models are constrained to their export EP+precision combination because the GenAI model builder produces optimizations specific to that configuration, not because ONNX itself prevents cross-EP portability. While the underlying ONNX/ORT framework supports heterogeneous execution across EPs, GenAI's build process outputs precision- and EP-optimized artifacts that don't always transfer directly. Refine the note to clarify this is a practical build/optimization constraint (rebuild when targeting a different EP) rather than an inherent incompatibility, and optionally reference the model builder's documented EP/precision support matrix.
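To make the rebuild-per-EP guidance above concrete, here is a hedged sketch of re-running the ONNX Runtime GenAI model builder to target a different execution provider. The model name, output folder, precision, and EP values are illustrative assumptions and the exact flags can vary by onnxruntime-genai version.

```python
# Hypothetical invocation of the ORT GenAI model builder; verify flags against
# the onnxruntime-genai documentation for your installed version.
import subprocess

subprocess.run(
    [
        "python", "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-3-mini-4k-instruct",  # example source model
        "-o", "./phi3-int4-cuda",                  # example output folder
        "-p", "int4",                              # target precision
        "-e", "cuda",                              # target execution provider
    ],
    check=True,
)
```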
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
Fix dataset value mismatch (pilevel vs pile-val).
The supported value list says pilevel, but the description refers to “pile-val”. Pick one canonical flag/value so users don’t pass an invalid option.
✅ Suggested edit
-| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
+| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |

After:
| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` around lines 56 - 57, The
README lists the `--dataset` supported values as "cnn, pilevel" but the
description calls it "pile-val"; pick one canonical value (recommend "pile-val")
and update the `--dataset` supported-values list and the descriptive text to
match, and also search for any validation or flag-parsing logic that references
`pilevel` and update it to the chosen canonical token so the flag, description,
and code all match (`--dataset`, cnn, pile-val).
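For illustration only, the following argparse sketch shows one way to keep the flag values, the README description, and the validation logic in sync by defining the accepted dataset names in a single place. This parser is hypothetical and is not the script's actual argument handling; the value names are example choices.

```python
import argparse

# Single source of truth for the calibration dataset names (illustrative values).
DATASET_CHOICES = ("cnn", "pileval")

parser = argparse.ArgumentParser(description="Calibration options (sketch)")
parser.add_argument(
    "--dataset",
    choices=DATASET_CHOICES,
    default="cnn",
    help="Calibration dataset: cnn (cnn_dailymail) or pileval (pile-val).",
)

# Example invocation; argparse rejects any value outside DATASET_CHOICES.
args = parser.parse_args(["--dataset", "pileval"])
print(args.dataset)
```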
Keeping it as is.
@vishalpandya1990, understood. I've noted your decision.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@ Coverage Diff @@
## main #812 +/- ##
==========================================
- Coverage 74.17% 74.13% -0.05%
==========================================
Files 192 192
Lines 19246 19263 +17
==========================================
+ Hits 14276 14280 +4
- Misses 4970 4983 +13

☔ View full report in Codecov by Sentry.
Signed-off-by: vipandya <vipandya@nvidia.com>
What does this PR do?
Documentation
Overview:
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes
Documentation