From 0aee443c6703522f595be47f85952eaa26f1ad40 Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Mon, 2 Feb 2026 09:25:04 -0800
Subject: [PATCH 1/4] fix broken link of quickstart guide

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst         | 2 +-
 examples/README.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.rst b/README.rst
index 55be0e583f..a0b1f4c0f2 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
   for _ in range(10):
     loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our `Quickstart Notebook `_.
+For a more comprehensive tutorial, check out our `JAX Integration Tutorial `_ or the `Getting Started Guide `_.
 
 .. overview-end-marker-do-not-remove
 
diff --git a/examples/README.md b/examples/README.md
index 004d1631f1..165271c2b6 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -23,8 +23,8 @@ Additionally, we offer [Jupyter notebook tutorials](https://github.com/NVIDIA/Tr
   - **FP8 Weight Caching**: Avoiding redundant FP8 casting during multiple gradient accumulation steps to improve efficiency.
 - [Introduction to FP8](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/fp8_primer.ipynb)
   - Overview of FP8 datatypes (E4M3, E5M2), mixed precision training, delayed scaling strategies, and code examples for FP8 configuration and usage.
-- [TE Quickstart](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb)
-  - Introduction to TE, building a Transformer Layer using PyTorch, and instructions on integrating TE modules like Linear and LayerNorm.
+- [TE JAX Integration Tutorial](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb)
+  - Introduction to integrating TE into an existing JAX model framework, building a Transformer Layer, and instructions on integrating TE modules like Linear and LayerNorm.
 - [Basic MNIST Example](https://github.com/NVIDIA/TransformerEngine/tree/main/examples/pytorch/mnist)
 
 # JAX

From d21e875503eaeef4c292ef7797425588d8e48a36 Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Tue, 10 Feb 2026 15:18:23 -0800
Subject: [PATCH 2/4] Update README.rst
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index a0b1f4c0f2..f674788c6f 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
   for _ in range(10):
     loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our `JAX Integration Tutorial `_ or the `Getting Started Guide `_.
+For a more comprehensive tutorial, check out our `JAX Integration Tutorial `_ or the `Getting Started Guide `_.
 
 .. overview-end-marker-do-not-remove
 

From 675756e153c996e646d5d82a14982942a7b1939f Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Tue, 10 Feb 2026 15:22:50 -0800
Subject: [PATCH 3/4] moved getting started guide to first and moved jax out of pytorch section

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst         | 2 +-
 examples/README.md | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.rst b/README.rst
index f674788c6f..344f8b12b4 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
   for _ in range(10):
     loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our `JAX Integration Tutorial `_ or the `Getting Started Guide `_.
+For a more comprehensive tutorial, check out our the `Getting Started Guide `.
 
 .. overview-end-marker-do-not-remove
 
diff --git a/examples/README.md b/examples/README.md
index 165271c2b6..782dc42f58 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -23,8 +23,6 @@ Additionally, we offer [Jupyter notebook tutorials](https://github.com/NVIDIA/Tr
   - **FP8 Weight Caching**: Avoiding redundant FP8 casting during multiple gradient accumulation steps to improve efficiency.
 - [Introduction to FP8](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/fp8_primer.ipynb)
   - Overview of FP8 datatypes (E4M3, E5M2), mixed precision training, delayed scaling strategies, and code examples for FP8 configuration and usage.
-- [TE JAX Integration Tutorial](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb)
-  - Introduction to integrating TE into an existing JAX model framework, building a Transformer Layer, and instructions on integrating TE modules like Linear and LayerNorm.
 - [Basic MNIST Example](https://github.com/NVIDIA/TransformerEngine/tree/main/examples/pytorch/mnist)
 
 # JAX
@@ -34,7 +32,9 @@ Additionally, we offer [Jupyter notebook tutorials](https://github.com/NVIDIA/Tr
   - Model Parallelism: Divide a model across multiple GPUs for parallel training.
   - Multiprocessing with Model Parallelism: Multiprocessing for model parallelism, including multi-node support and hardware affinity setup.
 - [Basic MNIST Example](https://github.com/NVIDIA/TransformerEngine/tree/main/examples/jax/mnist)
-
+- [TE JAX Integration Tutorial](https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/te_jax_integration.ipynb)
+  - Introduction to integrating TE into an existing JAX model framework, building a Transformer Layer, and instructions on integrating TE modules like Linear and LayerNorm.
+
 # Third party
 - [Hugging Face Accelerate + TE](https://github.com/huggingface/accelerate/tree/main/benchmarks/fp8/transformer_engine)
   - Scripts for training with Accelerate and TE. Supports single GPU, and multi-GPU via DDP, FSDP, and DeepSpeed ZeRO 1-3.

From e32978903fd3c0a75e4a941aa38f32740a444403 Mon Sep 17 00:00:00 2001
From: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Date: Tue, 10 Feb 2026 16:06:07 -0800
Subject: [PATCH 4/4] Update README.rst

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 344f8b12b4..5343aa371d 100644
--- a/README.rst
+++ b/README.rst
@@ -137,7 +137,7 @@ Flax
   for _ in range(10):
     loss, (param_grads, other_grads) = fwd_bwd_fn(params, other_variables, inp)
 
-For a more comprehensive tutorial, check out our the `Getting Started Guide `.
+For a more comprehensive tutorial, check out our `Getting Started Guide `_.
 
 .. overview-end-marker-do-not-remove