Commit bc6434b
[Bugfix] Fix tests, Add Model Free Readme (#2119)

## Purpose ##

* Fix failing tests
* Add a readme for the `model_free_ptq` source code

## Changes ##

* Use `initialize` instead of `on_initialize` so that the `Modifier.initialized_` flag is set
* Remove the expectation that lm_head inputs are on any particular device

## Testing ##

* Replicated the test failures locally and confirmed that the changes fix the tests

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 03e694a commit bc6434b

3 files changed: +10 -3 lines changed
Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+# Quantizing models without a model definition
+
+`model_free_ptq` provides a PTQ pathway for data-free schemes (such as FP8 Dynamic Per Token or FP8 Block). Specifically, this pathway removes the requirement for a model definition and the need to load the model through transformers. If you are interested in applying a data-free scheme, there are two key scenarios in which this pathway may make sense for your model:
+
+1. The model does not have a model definition available through transformers. This may be the case for a brand-new model which has not yet landed in transformers.
+2. The model is very large (such as Kimi K2 Thinking) and runs into issues with `oneshot`.
+
+
+`model_free_ptq` works directly with the safetensors in the checkpoint, applying observers to the weights themselves, thereby removing the requirement for a model definition or transformers.
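To make the mechanism concrete, below is a minimal sketch of data-free FP8 quantization applied straight to a safetensors shard. The function names, the per-tensor max-abs scheme, and the `weight_scale` key naming are illustrative assumptions, not the actual `model_free_ptq` API:

```python
# A minimal sketch, NOT the model_free_ptq implementation: observe each
# weight's max absolute value, derive an FP8 scale, and rewrite the shard.
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn


def quantize_tensor_fp8(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Per-tensor FP8 quantization: a max-abs observer pass, then a cast."""
    scale = weight.abs().amax().clamp(min=1e-12) / FP8_MAX
    qweight = (weight / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return qweight, scale


def quantize_shard(shard_path: Path, out_dir: Path) -> None:
    tensors = load_file(str(shard_path))
    out = {}
    for name, tensor in tensors.items():
        # Simplification: quantize every 2-D ".weight" tensor; a real pathway
        # would target linear layers specifically and skip embeddings.
        if name.endswith(".weight") and tensor.ndim == 2:
            qweight, scale = quantize_tensor_fp8(tensor.to(torch.float32))
            out[name] = qweight
            out[name.removesuffix("weight") + "weight_scale"] = scale
        else:
            out[name] = tensor
    save_file(out, str(out_dir / shard_path.name))


# Usage: iterate over the checkpoint's shards without ever building the model
# for shard in Path("checkpoint/").glob("*.safetensors"):
#     quantize_shard(shard, Path("quantized/"))
```

The key point the sketch illustrates is that the observers operate on tensors loaded from disk, so no transformers model class is ever instantiated.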

tests/llmcompressor/modifiers/transform/test_correctness.py

Lines changed: 1 addition & 2 deletions

@@ -45,8 +45,7 @@ def test_apply_correctness(
     with torch.no_grad():
         true_output = model(**input)
 
-    modifier.on_initialize(state)
-    modifier.on_start(state, None)
+    modifier.initialize(state)
 
     with torch.no_grad():
         output = model(**input)
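For context on why this fixes the test, here is an assumed sketch of the lifecycle distinction the commit message describes; the real `Modifier` class in llm-compressor is more involved:

```python
# Assumed sketch of the Modifier lifecycle, based on the commit message;
# not copied from the llm-compressor source.
class Modifier:
    initialized_: bool = False

    def initialize(self, state) -> None:
        # Public entrypoint: invokes the subclass hook AND records that
        # initialization happened, so later lifecycle checks pass.
        self.on_initialize(state)
        self.initialized_ = True

    def on_initialize(self, state) -> None:
        # Subclass hook only. Calling it directly, as the old test did,
        # skips setting `initialized_` above.
        raise NotImplementedError
```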

tests/llmcompressor/utils/test_helpers.py

Lines changed: 0 additions & 1 deletion

@@ -174,5 +174,4 @@ def hook(module, args):
     with disable_lm_head(model):
         input = {key: value.to("cuda") for key, value in model.dummy_inputs.items()}
         output = model(**input)
-    assert lm_input_device == torch.device("cuda:0")
     assert output.logits.device == torch.device("meta")
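The removed assertion pinned the lm_head's inputs to `cuda:0`, which is incidental to what the test checks: a disabled lm_head on the meta device yields meta outputs. A standalone illustration of that meta-device behavior (generic PyTorch semantics as I understand them, not repository code):

```python
import torch

# A Linear whose parameters live on the "meta" device performs shape-only
# computation: its output is a meta tensor regardless of the input's device.
lm_head = torch.nn.Linear(8, 16, device="meta")
x = torch.randn(2, 8)  # CPU here; the input device is incidental
out = lm_head(x)
assert out.device == torch.device("meta")
```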
