Commit bc6434b
[Bugfix] Fix tests, Add Model Free Readme (#2119)

## Purpose ##

* Fix failing tests
* Add a readme for the `model_free_ptq` source code

## Changes ##

* Use `initialize` instead of `on_initialize` so that the `Modifier.initialized_` flag is set
* Remove the expectation that lm_head inputs are on any particular device

## Testing ##

* Replicated the test failures locally and confirmed that the changes fix the tests

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 03e694a commit bc6434b

3 files changed: +10 -3 lines changed
Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+# Quantizing models without a model definition
+
+`model_free_ptq` provides a PTQ pathway for data-free schemes (such as FP8 Dynamic Per Token or FP8 Block). Specifically, this pathway removes the requirement for a model definition and the need to load the model through transformers. If you are interested in applying a data-free scheme, there are two key scenarios in which this pathway may make sense for your model:
+
+1. The model does not have a model definition available through transformers. This may be the case for a brand-new model which has not yet landed in transformers.
+2. The model is very large (such as Kimi K2 Thinking) and runs into issues with `oneshot`.
+
+
+`model_free_ptq` works directly with the safetensors in the checkpoint, applying observers to the weights themselves, thereby removing the requirement for a model definition or transformers.
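To make the mechanism concrete, below is a minimal sketch of data-free FP8 quantization applied straight to a safetensors shard. The function names, the per-tensor max-abs scheme, and the `weight_scale` key naming are illustrative assumptions, not the actual `model_free_ptq` API:

```python
# A minimal sketch, NOT the model_free_ptq implementation: observe each
# weight's max absolute value, derive an FP8 scale, and rewrite the shard.
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn


def quantize_tensor_fp8(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Per-tensor FP8 quantization: a max-abs observer pass, then a cast."""
    scale = weight.abs().amax().clamp(min=1e-12) / FP8_MAX
    qweight = (weight / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return qweight, scale


def quantize_shard(shard_path: Path, out_dir: Path) -> None:
    tensors = load_file(str(shard_path))
    out = {}
    for name, tensor in tensors.items():
        # Simplification: quantize every 2-D ".weight" tensor; a real pathway
        # would target linear layers specifically and skip embeddings.
        if name.endswith(".weight") and tensor.ndim == 2:
            qweight, scale = quantize_tensor_fp8(tensor.to(torch.float32))
            out[name] = qweight
            out[name.removesuffix("weight") + "weight_scale"] = scale
        else:
            out[name] = tensor
    save_file(out, str(out_dir / shard_path.name))


# Usage: iterate over the checkpoint's shards without ever building the model
# for shard in Path("checkpoint/").glob("*.safetensors"):
#     quantize_shard(shard, Path("quantized/"))
```

The key point the sketch illustrates is that the observers operate on tensors loaded from disk, so no transformers model class is ever instantiated.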

tests/llmcompressor/modifiers/transform/test_correctness.py

Lines changed: 1 addition & 2 deletions

@@ -45,8 +45,7 @@ def test_apply_correctness(
     with torch.no_grad():
         true_output = model(**input)
 
-    modifier.on_initialize(state)
-    modifier.on_start(state, None)
+    modifier.initialize(state)
 
     with torch.no_grad():
         output = model(**input)
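For context on why this fixes the test, here is an assumed sketch of the lifecycle distinction the commit message describes; the real `Modifier` class in llm-compressor is more involved:

```python
# Assumed sketch of the Modifier lifecycle, based on the commit message;
# not copied from the llm-compressor source.
class Modifier:
    initialized_: bool = False

    def initialize(self, state) -> None:
        # Public entrypoint: invokes the subclass hook AND records that
        # initialization happened, so later lifecycle checks pass.
        self.on_initialize(state)
        self.initialized_ = True

    def on_initialize(self, state) -> None:
        # Subclass hook only. Calling it directly, as the old test did,
        # skips setting `initialized_` above.
        raise NotImplementedError
```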

tests/llmcompressor/utils/test_helpers.py

Lines changed: 0 additions & 1 deletion

@@ -174,5 +174,4 @@ def hook(module, args):
     with disable_lm_head(model):
         input = {key: value.to("cuda") for key, value in model.dummy_inputs.items()}
         output = model(**input)
-    assert lm_input_device == torch.device("cuda:0")
     assert output.logits.device == torch.device("meta")
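The removed assertion pinned the lm_head's inputs to `cuda:0`, which is incidental to what the test checks: a disabled lm_head on the meta device yields meta outputs. A standalone illustration of that meta-device behavior (generic PyTorch semantics as I understand them, not repository code):

```python
import torch

# A Linear whose parameters live on the "meta" device performs shape-only
# computation: its output is a meta tensor regardless of the input's device.
lm_head = torch.nn.Linear(8, 16, device="meta")
x = torch.randn(2, 8)  # CPU here; the input device is incidental
out = lm_head(x)
assert out.device == torch.device("meta")
```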
