Big updates have landed in LLM Compressor!

Some of the exciting new features include:

* **Batched Calibration Support**: LLM Compressor now supports calibration with batch sizes greater than 1. A new [`batch_size`](src/llmcompressor/args/dataset_arguments.py#L70) argument has been added to `dataset_arguments`, providing an option to speed up quantization. The default `batch_size` is currently 1 (see the sketch after this list).
* **New Model-Free PTQ Pathway**: A new model-free PTQ pathway, [`model_free_ptq`](src/llmcompressor/entrypoints/model_free/__init__.py#L36), has been added to LLM Compressor. This pathway allows you to quantize a model without requiring a Hugging Face model definition and is especially useful in cases where `oneshot` may fail. It currently supports data-free pathways only (i.e. FP8 quantization) and was leveraged to quantize the [Mistral Large 3 model](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512). Additional [examples](examples/model_free_ptq) have been added illustrating how LLM Compressor can be used for Kimi K2.
* **Extended KV Cache and Attention Quantization Support**: LLM Compressor now supports attention quantization. KV cache quantization, which previously only supported per-tensor scales, has been extended to support any quantization scheme, including a new per-head scheme. Support for these checkpoints is ongoing in vLLM, and scripts to get started have been added to the [experimental folder](experimental/attention) (sketched below).
* **Generalized AWQ Support**: The `AWQModifier` has been updated to support quantization schemes beyond W4A16 (e.g. W4AFp8). In particular, AWQ no longer requires `group_size`, `symmetric`, and `num_bits` to be the same across config groups.
* **AutoRound Quantization Support**: Added [`AutoRoundModifier`](examples/autoround/llama3_example.py) for quantization using [AutoRound](https://aclanthology.org/2024.findings-emnlp.662.pdf), an advanced post-training algorithm that optimizes rounding and clipping ranges through sign-gradient descent. This approach combines the efficiency of post-training quantization with the adaptability of parameter tuning, delivering robust compression for large language models while maintaining strong performance.
* **Experimental MXFP4 Support**: Models can now be quantized using the [`MXFP4`](https://github.com/vllm-project/compressed-tensors/blob/main/src/compressed_tensors/quantization/quant_scheme.py#L208) preset scheme. Examples can be found under the [experimental folder](experimental/mxfp4/llama3_mxfp4.py). This pathway is still experimental, as support and validation in vLLM are a work in progress (sketched below).
* **R3 Transform Support**: LLM Compressor now supports applying transforms to attention in the style of SpinQuant's R3 rotation. Note: this feature is not yet supported in vLLM. An example applying R3 can be found in the [experimental folder](experimental/attention/llama3_attention_r3_nvfp4.py).
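
The snippets below are minimal sketches of a few of these features, not the canonical examples: model and dataset names are placeholders, and passing the new `batch_size` dataset argument straight through `oneshot`'s keyword arguments is assumed based on the `dataset_arguments` linked above.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Placeholder model; any causal LM with a Hugging Face definition works here.
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

oneshot(
    model=MODEL_ID,
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=512,
    batch_size=4,  # new dataset argument: run calibration with batches > 1 (default is 1)
)
```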
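
A sketch of the extended KV cache quantization follows. `kv_cache_scheme` on `QuantizationModifier` is an existing option, but the `"attn_head"` strategy string standing in for the new per-head scheme is an assumption; check the experimental scripts above for the exact configuration.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# FP8 weights/activations plus a quantized KV cache.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
    kv_cache_scheme={
        "num_bits": 8,
        "type": "float",
        "strategy": "attn_head",  # assumed name for the new per-head scheme; was "tensor" only
        "dynamic": False,
        "symmetric": True,
    },
)

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    dataset="open_platypus",
    recipe=recipe,
    num_calibration_samples=512,
)
```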
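
Finally, a sketch of the experimental MXFP4 pathway, assuming the preset scheme string can be passed to `QuantizationModifier` like other compressed-tensors presets; see the experimental example for the validated flow.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# MXFP4 preset from compressed-tensors; model name is a placeholder.
recipe = QuantizationModifier(targets="Linear", scheme="MXFP4", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",
    recipe=recipe,
)
```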