From 0aee443c6703522f595be47f85952eaa26f1ad40 Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Mon, 2 Feb 2026 09:25:04 -0800
Subject: [PATCH 1/4] Fix broken link to quickstart guide

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst         | 2 +-
 examples/README.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.rst b/README.rst
index 55be0e583f..a0b1f4c0f2 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
       for _ in range(10):
         loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.
+For a more comprehensive tutorial, check out our `JAX Integration Tutorial <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb>`_ or the `Getting Started Guide <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/getting_started/index.rst>`_.
 
 .. overview-end-marker-do-not-remove
 
diff --git a/examples/README.md b/examples/README.md
index 004d1631f1..165271c2b6 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -23,8 +23,8 @@ Additionally, we offer [Jupyter notebook tutorials](https://github.com/NVIDIA/Tr
   - **FP8 Weight Caching**: Avoiding redundant FP8 casting during multiple gradient accumulation steps to improve efficiency.
 - [Introduction to FP8](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/fp8_primer.ipynb)
   - Overview of FP8 datatypes (E4M3, E5M2), mixed precision training, delayed scaling strategies, and code examples for FP8 configuration and usage.
-- [TE Quickstart](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb)
-  - Introduction to TE, building a Transformer Layer using PyTorch, and instructions on integrating TE modules like Linear and LayerNorm.
+- [TE JAX Integration Tutorial](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb)
+  - Introduction to integrating TE into an existing JAX model framework, building a Transformer Layer, and instructions on integrating TE modules like Linear and LayerNorm.
 - [Basic MNIST Example](https://github.com/NVIDIA/TransformerEngine/tree/main/examples/pytorch/mnist)
 
 # JAX

From d21e875503eaeef4c292ef7797425588d8e48a36 Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Tue, 10 Feb 2026 15:18:23 -0800
Subject: [PATCH 2/4] Point Getting Started Guide link to hosted docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index a0b1f4c0f2..f674788c6f 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
       for _ in range(10):
         loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our `JAX Integration Tutorial <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb>`_ or the `Getting Started Guide <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/getting_started/index.rst>`_.
+For a more comprehensive tutorial, check out our `JAX Integration Tutorial <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb>`_ or the `Getting Started Guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/getting_started.html>`_.
 
 .. overview-end-marker-do-not-remove
 

From 675756e153c996e646d5d82a14982942a7b1939f Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Tue, 10 Feb 2026 15:22:50 -0800
Subject: [PATCH 3/4] Link only the Getting Started Guide and move JAX
 tutorial out of PyTorch section

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst         | 2 +-
 examples/README.md | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.rst b/README.rst
index f674788c6f..344f8b12b4 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
       for _ in range(10):
         loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our `JAX Integration Tutorial <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb>`_ or the `Getting Started Guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/getting_started.html>`_.
+For a more comprehensive tutorial, check out our the `Getting Started Guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/getting_started.html>`.
 
 .. overview-end-marker-do-not-remove
 
diff --git a/examples/README.md b/examples/README.md
index 165271c2b6..782dc42f58 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -23,8 +23,6 @@ Additionally, we offer [Jupyter notebook tutorials](https://github.com/NVIDIA/Tr
   - **FP8 Weight Caching**: Avoiding redundant FP8 casting during multiple gradient accumulation steps to improve efficiency.
 - [Introduction to FP8](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/fp8_primer.ipynb)
   - Overview of FP8 datatypes (E4M3, E5M2), mixed precision training, delayed scaling strategies, and code examples for FP8 configuration and usage.
-- [TE JAX Integration Tutorial](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb)
-  - Introduction to integrating TE into an existing JAX model framework, building a Transformer Layer, and instructions on integrating TE modules like Linear and LayerNorm.
 - [Basic MNIST Example](https://github.com/NVIDIA/TransformerEngine/tree/main/examples/pytorch/mnist)
 
 # JAX
@@ -34,7 +32,9 @@ Additionally, we offer [Jupyter notebook tutorials](https://github.com/NVIDIA/Tr
   - Model Parallelism: Divide a model across multiple GPUs for parallel training.
   - Multiprocessing with Model Parallelism: Multiprocessing for model parallelism, including multi-node support and hardware affinity setup.
 - [Basic MNIST Example](https://github.com/NVIDIA/TransformerEngine/tree/main/examples/jax/mnist)
- 
+- [TE JAX Integration Tutorial](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb)
+  - Introduction to integrating TE into an existing JAX model framework, building a Transformer Layer, and instructions on integrating TE modules like Linear and LayerNorm.
+
 # Third party
 - [Hugging Face Accelerate + TE](https://github.com/huggingface/accelerate/tree/main/benchmarks/fp8/transformer_engine)
   - Scripts for training with Accelerate and TE. Supports single GPU, and multi-GPU via DDP, FSDP, and DeepSpeed ZeRO 1-3.

From e32978903fd3c0a75e4a941aa38f32740a444403 Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Tue, 10 Feb 2026 16:06:07 -0800
Subject: [PATCH 4/4] Fix typo and RST link syntax in README.rst

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 344f8b12b4..5343aa371d 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
       for _ in range(10):
         loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our the `Getting Started Guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/getting_started.html>`.
+For a more comprehensive tutorial, check out our `Getting Started Guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/getting_started.html>`_.
 
 .. overview-end-marker-do-not-remove