
Conversation

@vishalpandya1990 (Contributor) commented Jan 23, 2026

What does this PR do?

Documentation

Overview:

  • Update the support matrix, changelog, deployment page, and example READMEs to reflect recent feature and model support on the Windows side.

Testing

  • No testing; this is a documentation-only change.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added ONNX Mixed Precision Weight-only quantization (INT4/INT8) support.
    • Introduced diffusion-model quantization on Windows.
    • Added new accuracy benchmarks (Perplexity and KL-Divergence); see the sketch after this summary.
    • Expanded deployment with multiple ONNX Runtime Execution Providers (CUDA, DirectML, TensorRT-RTX).
  • Bug Fixes

    • Fixed ONNX 1.19 compatibility issue with CuPy during INT4 AWQ quantization.
  • Documentation

    • Updated installation guides with system requirements and multiple backend options.
    • Reorganized deployment documentation with comprehensive execution provider guidance.
    • Expanded example workflows with improved setup instructions and support matrices.
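
For context, here is a minimal, generic sketch of how the two new accuracy metrics are typically computed from model logits. It is illustrative only (not the benchmark scripts added in this PR), and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Perplexity = exp(mean negative log-likelihood of the target tokens).

    logits: (seq_len, vocab_size) raw model outputs per position.
    target_ids: (seq_len,) ground-truth token ids per position.
    """
    nll = F.cross_entropy(logits, target_ids, reduction="mean")
    return torch.exp(nll).item()

def kl_divergence(ref_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """Mean KL(reference || quantized) over positions, comparing the token
    distributions of a reference model against a quantized model."""
    ref_logprobs = F.log_softmax(ref_logits, dim=-1)
    quant_logprobs = F.log_softmax(quant_logits, dim=-1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input)
    # when both arguments are log-probabilities.
    return F.kl_div(quant_logprobs, ref_logprobs,
                    reduction="batchmean", log_target=True).item()
```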

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: vipandya <vipandya@nvidia.com>
@vishalpandya1990 vishalpandya1990 requested a review from a team as a code owner January 23, 2026 13:03
coderabbitai bot commented Jan 23, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Documentation refactor expanding ONNX Runtime Execution Provider (EP) support on Windows beyond DirectML to include CUDA, TensorRT-RTX, and CPU options. Includes new 0.41 release notes, updated system requirements tables, revised installation guides, and refreshed support matrices across multiple documentation and example files.

Changes

- Release Notes (`CHANGELOG-Windows.rst`): Added a new 0.41 (TBD) section with bug fixes for ONNX 1.19/CuPy compatibility and new features for mixed-precision/diffusion-model quantization and accuracy benchmarks. Updated the 0.33 section with refined wording for LLM quantization and DirectML deployment references.
- Deployment Docs (`docs/source/deployment/2_onnxruntime.rst`): Renamed the section from DirectML to ONNX Runtime. Expanded the overview to introduce multiple EPs (CUDA, DirectML, TensorRT-RTX, CPU) with guidance on selection. Added a compatibility note clarifying EP-specific model requirements.
- Getting Started—Overview (`docs/source/getting_started/1_overview.rst`): Updated the Model Optimizer link and added TensorRT-RTX as an additional backend option alongside DirectML in the Windows section.
- Getting Started—Installation (`docs/source/getting_started/windows/_installation_for_Windows.rst`): Added a system requirements table covering OS, architecture, Python, CUDA, ONNX Runtime, driver, and GPU specs.
- Getting Started—Standalone Setup (`docs/source/getting_started/windows/_installation_standalone.rst`): Added CUDA Toolkit and cuDNN prerequisites. Reframed the installation focus to the ONNX module. Introduced explicit EP options (onnxruntime-trt-rtx, onnxruntime-directml, onnxruntime-gpu) with the default changed from DirectML to GPU (CUDA). Added guidance for EP switching and verification requiring exactly one EP installed (a rough verification sketch follows this list).
- Getting Started—Olive Installation (`docs/source/getting_started/windows/_installation_with_olive.rst`): Reworded the intro to emphasize general model optimization. Expanded Prerequisites with explicit DirectML EP packages and example commands. Updated the quantization pass reference link. Removed phi3-specific example references.
- Support Matrix & Guides (`docs/source/guides/0_support_matrix.rst`, `docs/source/guides/windows_guides/_ONNX_PTQ_guide.rst`): Updated feature tables to replace ORT-DirectML with expanded EP coverage (ORT-DML, ORT-CUDA, ORT-TRT-RTX). Clarified EP definitions. Simplified the Windows model section to reference the external matrix. Updated the deployment reference from DirectML-specific to ONNX Runtime guidance.
- FAQs (`docs/source/support/2_faqs.rst`): Minor wording refinements; added a caution about CuPy compatibility with the CUDA toolkit.
- Examples—Windows Root (`examples/windows/README.md`): Updated the deployment reference from DirectML to ONNX Runtime. Replaced the single support matrix reference with a table listing model types and corresponding links.
- Examples—GenAI LLM (`examples/windows/onnx_ptq/genai_llm/README.md`): Major restructuring with a Table of Contents, an expanded Overview (added TensorRT-RTX/CUDA backends), and new Setup and dedicated Quantization sections. Replaced Command Line Arguments with a comprehensive Arguments section including new options (`--output_path`, `--use_zero_point`, `--block_size`, `--awqlite_alpha_step`, etc.). Expanded the example command with an ONNX path and flags. Reorganized the Evaluate and Deployment sections. Replaced the support matrix with a detailed table and a GenAI note. Added a Troubleshoot section.
- Examples—SAM2 & Whisper (`examples/windows/onnx_ptq/sam2/README.md`, `examples/windows/onnx_ptq/whisper/README.md`): Added new Support Matrix sections to the TOC and as dedicated sections with tables for INT8/FP8 modes and explanatory notes. No logic changes.
- Examples—Diffusers (`examples/windows/torch_onnx/diffusers/README.md`): Renamed "Quantization Support Matrix" to "Support Matrix". Reformatted the table. Replaced the external link reference with inline NVFP4 performance notes and new footnotes on Blackwell GPU requirements and RAM recommendations for Flux.1.Dev.
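
As noted for the Standalone Setup changes above, the docs now expect exactly one ONNX Runtime EP package in the environment. A rough illustration (not the exact verification steps from the docs) of checking which EP package and providers are present:

```python
from importlib.metadata import distributions
import onnxruntime as ort

# EP package names taken from the updated docs: only one of these
# should be installed in a given environment.
EP_PACKAGES = {"onnxruntime", "onnxruntime-gpu",
               "onnxruntime-directml", "onnxruntime-trt-rtx"}

installed = sorted(
    dist.metadata["Name"]
    for dist in distributions()
    if (dist.metadata["Name"] or "").lower() in EP_PACKAGES
)
print("Installed ONNX Runtime packages:", installed)
if len(installed) != 1:
    print("Warning: expected exactly one ONNX Runtime EP package.")

# Providers actually usable in this environment.
print("Available execution providers:", ort.get_available_providers())
```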

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)

- Title check (❓ Inconclusive): The title 'Modelopt-windows documentation update' is vague and generic, using the non-descriptive term 'update' without conveying specific details about the primary changes. Resolution: consider a more specific title that highlights the key changes, such as 'Add ONNX Runtime execution provider support and update Windows documentation' or 'Update Windows documentation for TensorRT-RTX and CUDA support'.

✅ Passed checks (2 passed)

- Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
- Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping the docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/source/getting_started/windows/_installation_for_Windows.rst (1)

18-18: Clarify CUDA version requirements.

The system requirements table specifies CUDA >=12.0 (Line 18), while the note mentions CUDA-12.8+ for Blackwell GPU support (Line 28). This may confuse users about the actual minimum CUDA version required.

Consider clarifying whether:

  • CUDA 12.0 is the general minimum, with 12.8+ needed only for Blackwell GPUs
  • Or if the table should be updated to reflect 12.8+ as the universal minimum

Also applies to: 28-28

docs/source/deployment/2_onnxruntime.rst (1)

42-42: Fix double slash in URL.

The URL contains a double slash before the closing: python// should be python/.

🔗 Proposed fix
-- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python//>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
+- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python/>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
🤖 Fix all issues with AI agents
In `@CHANGELOG-Windows.rst`:
- Line 15: Replace the misspelled link text "Perlexity" with the correct
spelling "Perplexity" in the CHANGELOG entry (the link label that currently
reads `Perlexity
<https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_);
ensure the link URL and formatting remain unchanged and only the visible label
is corrected.

In `@docs/source/getting_started/1_overview.rst`:
- Line 14: The Markdown link for ModelOpt-Windows uses a mixed tree/file path
causing a redirect; update the URL in the sentence that references
`ModelOpt-Windows` to use the correct GitHub blob path
`https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md`
or point to the directory
`https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows` so the
link resolves directly without a 301 redirect.

In `@docs/source/getting_started/windows/_installation_standalone.rst`:
- Line 51: There is a typo in the sentence that reads 'The default CUDA version
neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x.' — change "neeedd" to
"needed" so it reads 'The default CUDA version needed for *onnxruntime-gpu*
since v1.19.0 is 12.x.' Update the sentence where "ModelOpt-Windows installs
*onnxruntime-gpu*" is mentioned to correct that single word.

In `@docs/source/getting_started/windows/_installation_with_olive.rst`:
- Line 65: Replace the broken GitHub link target in the rst line that currently
reads "overview
<https://github.com/microsoft/Olive/blob/main/docs/architecture.md>"_ with the
Olive docs site URL (for example "overview
<https://microsoft.github.io/Olive/>"_), keeping the visible link text the same
so the sentence points to the actual Olive architecture documentation.

In `@docs/source/guides/0_support_matrix.rst`:
- Line 101: Replace the incorrect GitHub anchor URL in the README line
referencing the model support matrix (the text containing
"https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix")
with the official Model Optimizer Windows installation/support page URL that
documents supported platform requirements and GPU specifications; update the
link target in the phrase "details <...>" so it points directly to the official
Windows installation/support documentation rather than the GitHub examples
anchor.

In `@examples/windows/onnx_ptq/genai_llm/README.md`:
- Line 32: Update the README sentence to tighten wording and fix typos: change
"ONNX GenAI compatible" to "GenAI-compatible", use "precision" consistently
(e.g., "select precision" or "precision level"), and replace informal phrasing
like "choose from" with "select from" for clarity; apply the same edits to the
other occurrences mentioned (the paragraphs around the same phrasing at the
other locations) so all instances use "GenAI-compatible", consistent "precision"
wording, and "select from" phrasing for a uniform, clearer README.
- Around line 56-57: The README lists the `--dataset` supported values as "cnn,
pilevel" but the description calls it "pile-val"; pick one canonical value
(recommend "pile-val") and update the `--dataset` supported-values list and the
descriptive text to match, and also search for any validation or flag-parsing
logic that references `pilevel` and update it to the chosen canonical token so
the flag, description, and code all match (`--dataset`, cnn, pile-val).
🧹 Nitpick comments (5)
examples/windows/torch_onnx/diffusers/README.md (1)

95-109: Consider improving clarity and consistency.

The Support Matrix section rename improves consistency with other README files, and the new footnotes provide valuable context. However, consider the following refinements:

  1. Line 109: The note about "some known performance issues with NVFP4 model execution" is vague. Consider being more specific about what issues users might encounter or providing a reference to a tracking issue.

  2. Lines 103, 105: Footnote formatting is inconsistent - these lines lack ending punctuation while line 107 includes a period.

♻️ Suggested improvements
-> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*
+> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*

-> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*
+> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*

-> *There are some known performance issues with NVFP4 model execution using TRTRTX EP. Stay tuned for further updates!*
+> *NVFP4 model execution using TRTRTX EP has known performance limitations. Stay tuned for further updates!*
CHANGELOG-Windows.rst (1)

14-14: Consider more descriptive link text.

The link text "example script" could be more descriptive, similar to line 13's "example for GenAI LLMs". Consider something like "diffusion models quantization example" for consistency and clarity.

docs/source/getting_started/windows/_installation_standalone.rst (1)

72-76: Minor: Consider consistent capitalization in verification checklist.

The verification item "Onnxruntime Package" uses different capitalization compared to other items like "Python Interpreter" and "Task Manager" (title case). Consider using "ONNX Runtime Package" for consistency.

docs/source/deployment/2_onnxruntime.rst (2)

9-16: Good addition of multi-EP support overview.

The execution provider descriptions effectively communicate the options available to users. The guidance to select based on model, hardware, and deployment requirements is helpful.

Optional: Consider clarifying DirectML EP scope.

Line 12's description "Enables deployment on a wide range of GPUs" could be more specific about which GPU vendors (e.g., AMD, Intel, NVIDIA) or hardware generations are supported to help users make informed decisions.


32-34: Clarify that EP constraints are build-optimization specific, not inherent to ONNX portability.

The note's core guidance—rebuild/re-export models for different EPs—is sound practice for ONNX Runtime GenAI. However, the explanation should be more precise: models are constrained to their export EP+precision combination because the GenAI model builder produces optimizations specific to that configuration, not because ONNX itself prevents cross-EP portability. While the underlying ONNX/ORT framework supports heterogeneous execution across EPs, GenAI's build process outputs precision- and EP-optimized artifacts that don't always transfer directly. Refine the note to clarify this is a practical build/optimization constraint (rebuild when targeting a different EP) rather than an inherent incompatibility, and optionally reference the model builder's documented EP/precision support matrix.
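
To make the EP-selection guidance concrete, here is a minimal sketch of opening an ONNX model with an explicit, preference-ordered execution-provider list in ONNX Runtime. The model path is a placeholder, and GenAI-built models still need to be exported for the target EP as the note above explains.

```python
import onnxruntime as ort

# Placeholder path; in practice this is the EP/precision-specific artifact
# produced by the quantization / model-builder flow.
model_path = "model.onnx"

# Preference-ordered list; ONNX Runtime falls back to later entries
# (ultimately CPU) when an earlier provider is not available.
preferred = ["CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
available = ort.get_available_providers()
session = ort.InferenceSession(
    model_path,
    providers=[p for p in preferred if p in available],
)
print("Session is using:", session.get_providers())
```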

Comment on lines 56 to 57
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
coderabbitai bot commented Jan 23, 2026

⚠️ Potential issue | 🟡 Minor

Fix dataset value mismatch (pilevel vs pile-val).
The supported value list says pilevel, but the description refers to “pile-val”. Pick one canonical flag/value so users don’t pass an invalid option.

✅ Suggested edit
-| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
+| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` around lines 56 - 57, The
README lists the `--dataset` supported values as "cnn, pilevel" but the
description calls it "pile-val"; pick one canonical value (recommend "pile-val")
and update the `--dataset` supported-values list and the descriptive text to
match, and also search for any validation or flag-parsing logic that references
`pilevel` and update it to the chosen canonical token so the flag, description,
and code all match (`--dataset`, cnn, pile-val).

@vishalpandya1990 (Contributor Author) replied:

Keeping it as is.

coderabbitai bot replied:

@vishalpandya1990, understood. I've noted your decision.

codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.13%. Comparing base (2a08622) to head (62b427a).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #812      +/-   ##
==========================================
- Coverage   74.17%   74.13%   -0.05%     
==========================================
  Files         192      192              
  Lines       19246    19263      +17     
==========================================
+ Hits        14276    14280       +4     
- Misses       4970     4983      +13     

☔ View full report in Codecov by Sentry.

Signed-off-by: vipandya <vipandya@nvidia.com>
