Commit 5f6c8db
fp8 awq examples (#2145)
SUMMARY:
Added examples for FP8 AWQ, which now work after the AWQ generalization.
TEST PLAN:
python $REPOS/llm-compressor/examples/awq/fp8_dynamic_llama_example.py 2>&1 | tee fp8_dynamic.log
python $REPOS/llm-compressor/examples/awq/fp8_block_llama_example.py 2>&1 | tee fp8_block.log
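For orientation, the sketch below shows the general shape of a data-free llm-compressor `oneshot` run with an AWQ recipe targeting an FP8 dynamic scheme, roughly the flow the logs below walk through. It is a hypothetical illustration, not the committed example: the model ID, the `scheme="FP8_DYNAMIC"` string, the `output_dir` name, and the assumption that the generalized AWQ path can run without a calibration dataset (suggested by the `DataFreePipeline` line in the logs) are all assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Assumption: any Llama-style checkpoint; the committed examples may pin a
# different model ID.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

# `dtype` (not the deprecated `torch_dtype`) per the warning in the logs.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Assumption: after the AWQ generalization, AWQModifier accepts FP8 presets
# such as "FP8_DYNAMIC" alongside the original W4A16-style schemes.
recipe = AWQModifier(targets=["Linear"], scheme="FP8_DYNAMIC", ignore=["lm_head"])

# No calibration dataset is passed here because the logs show a data-free
# pipeline being inferred. Passing output_dir saves the compressed model;
# the post_process warning in the logs fires when it is omitted.
oneshot(
    model=model,
    recipe=recipe,
    output_dir="Meta-Llama-3-8B-Instruct-AWQ-FP8-Dynamic",
)

# Quick sanity generation, mirroring the "SAMPLE GENERATION" block in the logs.
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```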
<details>
<summary>fp8_dynamic.log</summary>
/home/HDCharles/rhdev/lib/python3.11/site-packages/transformers/utils/hub.py:110:
FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be
removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:00<00:00, 7.60it/s]
Loading checkpoint shards: 50%|█████ | 2/4 [00:00<00:00, 6.70it/s]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:00<00:00, 6.82it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 8.95it/s]
2025-12-17T20:56:18.271169+0000 | reset | INFO - Compression lifecycle
reset
2025-12-17T20:56:18.271896+0000 | from_modifiers | INFO - Creating
recipe from modifiers
2025-12-17T20:56:18.292591+0000 | initialize | INFO - Compression
lifecycle initialized for 1 modifiers
2025-12-17T20:56:18.292874+0000 | IndependentPipeline | INFO - Inferred
`DataFreePipeline` for `QuantizationModifier`
Updating global scales: 0%| | 0/224 [00:00<?, ?it/s]
Updating global scales: 100%|██████████| 224/224 [00:00<00:00,
648394.82it/s]
Fusing global scales: 0it [00:00, ?it/s]
Fusing global scales: 647it [00:00, 511346.28it/s]
Calibrating weights: 0%| | 0/224 [00:00<?, ?it/s]
Calibrating weights: 40%|███▉ | 89/224 [00:00<00:00, 888.99it/s]
Calibrating weights: 100%|██████████| 224/224 [00:00<00:00, 1596.33it/s]
2025-12-17T20:56:53.594142+0000 | finalize | INFO - Compression
lifecycle finalized for 1 modifiers
2025-12-17T20:56:57.580914+0000 | post_process | WARNING - Optimized
model is not saved. To save, please provide`output_dir` as input arg.Ex.
`oneshot(..., output_dir=...)`
The attention mask and the pad token id were not set. As a consequence,
you may observe unexpected behavior. Please pass your input's
`attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because
pad token is same as eos token. As a consequence, you may observe
unexpected behavior. Please pass your input's `attention_mask` to obtain
reliable results.
========== SAMPLE GENERATION ==============
<|begin_of_text|>Hello my name is Sarah and I am a 30-year-old woman who
has been diagnosed with multiple sclerosis (MS). I am here to share my
story and to help raise awareness about this chronic and often
debilitating disease.
I was diagnosed with MS in 2010, when I was 25 years old. At the time, I
was working as a teacher and living a normal life. But suddenly, I
started experiencing strange symptoms such as numbness in my hands and
feet, blurred vision, and fatigue. I went
==========================================
2025-12-17T20:57:24.962901+0000 | get_model_compressor | INFO -
skip_sparsity_compression_stats set to True. Skipping sparsity
compression statistic calculations. No sparsity compressor will be
applied.
Compressing model: 0it [00:00, ?it/s]
Compressing model: 1it [00:00, 3.48it/s]
Compressing model: 5it [00:00, 14.97it/s]
Compressing model: 8it [00:00, 16.31it/s]
Compressing model: 12it [00:00, 22.42it/s]
Compressing model: 15it [00:00, 16.64it/s]
Compressing model: 18it [00:01, 15.70it/s]
Compressing model: 20it [00:01, 12.23it/s]
Compressing model: 25it [00:01, 18.27it/s]
Compressing model: 28it [00:01, 17.00it/s]
Compressing model: 33it [00:01, 22.56it/s]
Compressing model: 36it [00:02, 21.53it/s]
Compressing model: 41it [00:02, 24.09it/s]
Compressing model: 45it [00:02, 27.34it/s]
Compressing model: 49it [00:02, 14.52it/s]
Compressing model: 54it [00:02, 18.97it/s]
Compressing model: 57it [00:03, 18.91it/s]
Compressing model: 61it [00:03, 22.45it/s]
Compressing model: 65it [00:03, 22.69it/s]
Compressing model: 68it [00:03, 18.42it/s]
Compressing model: 71it [00:04, 13.96it/s]
Compressing model: 76it [00:04, 17.50it/s]
Compressing model: 81it [00:04, 22.10it/s]
Compressing model: 84it [00:04, 19.62it/s]
Compressing model: 89it [00:04, 24.35it/s]
Compressing model: 92it [00:04, 22.88it/s]
Compressing model: 96it [00:04, 23.23it/s]
Compressing model: 99it [00:05, 14.21it/s]
Compressing model: 103it [00:05, 17.90it/s]
Compressing model: 106it [00:05, 17.96it/s]
Compressing model: 110it [00:05, 21.75it/s]
Compressing model: 113it [00:06, 15.18it/s]
Compressing model: 116it [00:06, 12.39it/s]
Compressing model: 118it [00:06, 12.76it/s]
Compressing model: 121it [00:06, 15.29it/s]
Compressing model: 125it [00:06, 17.59it/s]
Compressing model: 129it [00:07, 21.70it/s]
Compressing model: 132it [00:07, 20.76it/s]
Compressing model: 137it [00:07, 25.70it/s]
Compressing model: 140it [00:07, 21.44it/s]
Compressing model: 143it [00:07, 14.45it/s]
Compressing model: 146it [00:08, 15.29it/s]
Compressing model: 150it [00:08, 19.35it/s]
Compressing model: 153it [00:08, 19.25it/s]
Compressing model: 158it [00:08, 24.56it/s]
Compressing model: 161it [00:08, 16.54it/s]
Compressing model: 166it [00:09, 17.15it/s]
Compressing model: 169it [00:09, 17.37it/s]
Compressing model: 174it [00:09, 20.55it/s]
Compressing model: 179it [00:09, 25.06it/s]
Compressing model: 182it [00:09, 21.49it/s]
Compressing model: 187it [00:09, 26.07it/s]
Compressing model: 191it [00:10, 25.59it/s]
Compressing model: 194it [00:10, 18.97it/s]
Compressing model: 197it [00:10, 14.35it/s]
Compressing model: 202it [00:10, 17.80it/s]
Compressing model: 206it [00:10, 21.30it/s]
Compressing model: 209it [00:11, 20.75it/s]
Compressing model: 212it [00:11, 12.23it/s]
Compressing model: 215it [00:11, 14.03it/s]
Compressing model: 218it [00:11, 14.99it/s]
Compressing model: 222it [00:12, 19.03it/s]
Compressing model: 224it [00:12, 18.36it/s]
</details>
<details>
<summary>fp8_block.log</summary>
/home/HDCharles/rhdev/lib/python3.11/site-packages/transformers/utils/hub.py:110:
FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be
removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00,
136.99it/s]
2025-12-17T20:57:53.946116+0000 | reset | INFO - Compression lifecycle
reset
2025-12-17T20:57:53.946848+0000 | from_modifiers | INFO - Creating
recipe from modifiers
2025-12-17T20:57:53.966319+0000 | initialize | INFO - Compression
lifecycle initialized for 1 modifiers
2025-12-17T20:57:53.966658+0000 | IndependentPipeline | INFO - Inferred
`DataFreePipeline` for `QuantizationModifier`
Updating global scales: 0%| | 0/224 [00:00<?, ?it/s]
Updating global scales: 100%|██████████| 224/224 [00:00<00:00,
637397.62it/s]
Fusing global scales: 0it [00:00, ?it/s]
Fusing global scales: 647it [00:00, 486415.97it/s]
Calibrating weights: 0%| | 0/224 [00:00<?, ?it/s]
Calibrating weights: 0%| | 1/224 [00:00<00:33, 6.66it/s]
Calibrating weights: 100%|██████████| 224/224 [00:00<00:00, 943.96it/s]
2025-12-17T20:58:00.043737+0000 | finalize | INFO - Compression
lifecycle finalized for 1 modifiers
2025-12-17T20:58:03.951940+0000 | post_process | WARNING - Optimized
model is not saved. To save, please provide`output_dir` as input arg.Ex.
`oneshot(..., output_dir=...)`
The attention mask and the pad token id were not set. As a consequence,
you may observe unexpected behavior. Please pass your input's
`attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because
pad token is same as eos token. As a consequence, you may observe
unexpected behavior. Please pass your input's `attention_mask` to obtain
reliable results.
========== SAMPLE GENERATION ==============
<|begin_of_text|>Hello my name is Kaitlyn and I am a 24-year-old
freelance writer and editor. I have a passion for storytelling and a
knack for crafting compelling narratives. I have a degree in English
Literature and have been writing professionally for over 5 years. I have
experience writing articles, blog posts, and website content for a
variety of clients, including businesses, non-profits, and individuals.
I am also skilled in editing and proofreading, and have worked with
clients to refine their writing and ensure it is error
==========================================
2025-12-17T20:58:34.036482+0000 | get_model_compressor | INFO -
skip_sparsity_compression_stats set to True. Skipping sparsity
compression statistic calculations. No sparsity compressor will be
applied.
Compressing model: 0it [00:00, ?it/s]
Compressing model: 5it [00:00, 34.55it/s]
Compressing model: 9it [00:00, 12.71it/s]
Compressing model: 11it [00:00, 12.76it/s]
Compressing model: 13it [00:00, 13.16it/s]
Compressing model: 17it [00:01, 18.77it/s]
Compressing model: 20it [00:01, 18.86it/s]
Compressing model: 23it [00:01, 20.26it/s]
Compressing model: 27it [00:01, 15.43it/s]
Compressing model: 29it [00:01, 12.88it/s]
Compressing model: 34it [00:02, 17.10it/s]
Compressing model: 39it [00:02, 22.20it/s]
Compressing model: 42it [00:02, 19.60it/s]
Compressing model: 47it [00:02, 24.68it/s]
Compressing model: 50it [00:02, 23.06it/s]
Compressing model: 55it [00:03, 18.85it/s]
Compressing model: 58it [00:03, 16.46it/s]
Compressing model: 62it [00:03, 18.39it/s]
Compressing model: 67it [00:03, 23.19it/s]
Compressing model: 70it [00:03, 20.28it/s]
Compressing model: 75it [00:03, 25.18it/s]
Compressing model: 78it [00:04, 18.17it/s]
Compressing model: 81it [00:04, 19.71it/s]
Compressing model: 84it [00:04, 14.67it/s]
Compressing model: 89it [00:04, 19.78it/s]
Compressing model: 92it [00:04, 19.63it/s]
Compressing model: 97it [00:05, 22.49it/s]
Compressing model: 102it [00:05, 26.98it/s]
Compressing model: 106it [00:05, 17.97it/s]
Compressing model: 110it [00:05, 17.31it/s]
Compressing model: 113it [00:06, 17.63it/s]
Compressing model: 118it [00:06, 20.70it/s]
Compressing model: 122it [00:06, 24.05it/s]
Compressing model: 125it [00:06, 22.60it/s]
Compressing model: 128it [00:06, 13.66it/s]
Compressing model: 131it [00:07, 14.68it/s]
Compressing model: 133it [00:07, 14.59it/s]
Compressing model: 138it [00:07, 20.29it/s]
Compressing model: 141it [00:07, 19.93it/s]
Compressing model: 146it [00:07, 22.96it/s]
Compressing model: 150it [00:07, 26.31it/s]
Compressing model: 153it [00:07, 24.04it/s]
Compressing model: 156it [00:08, 17.59it/s]
Compressing model: 159it [00:08, 14.86it/s]
Compressing model: 161it [00:08, 14.72it/s]
Compressing model: 166it [00:08, 20.20it/s]
Compressing model: 169it [00:08, 19.64it/s]
Compressing model: 173it [00:09, 23.47it/s]
Compressing model: 176it [00:09, 17.13it/s]
Compressing model: 179it [00:09, 18.76it/s]
Compressing model: 182it [00:09, 14.24it/s]
Compressing model: 187it [00:09, 19.21it/s]
Compressing model: 190it [00:10, 19.04it/s]
Compressing model: 195it [00:10, 22.02it/s]
Compressing model: 200it [00:10, 26.51it/s]
Compressing model: 204it [00:10, 18.33it/s]
Compressing model: 207it [00:10, 19.70it/s]
Compressing model: 210it [00:11, 14.98it/s]
Compressing model: 215it [00:11, 19.78it/s]
Compressing model: 218it [00:11, 19.47it/s]
Compressing model: 222it [00:11, 23.07it/s]
Compressing model: 224it [00:11, 19.04it/s]
</details>
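Assuming the block example follows the same structure, the main difference would be the quantization preset in the recipe; the exact scheme name used by the committed example is an assumption here.

```python
from llmcompressor.modifiers.awq import AWQModifier

# Hypothetical: an FP8 block preset (block-wise weight scales) in place of the
# dynamic per-token scheme sketched above; the committed example may differ.
recipe = AWQModifier(targets=["Linear"], scheme="FP8_BLOCK", ignore=["lm_head"])
```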
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
File tree
examples/awq: 2 files changed, +162 −0 lines changed (two new 81-line example files)