docs/source/en/quantization/contribute.md (+11 −7: 11 additions & 7 deletions)
@@ -46,26 +46,30 @@ Some quantization methods may require "pre-quantizing" the model through data calibration

## Create new HFQuantizer class

+0. The best starting point would be to have a look at another quantization method such as Finegrained FP8. You will have to update or create three files in total: the [config file](https://github.com/huggingface/transformers/blob/main/src/transformers/utils/quantization_config.py), the [integration file](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/finegrained_fp8.py) and the [quantizer file](https://github.com/huggingface/transformers/blob/main/src/transformers/quantizers/quantizer_finegrained_fp8.py).
+
1. Create a new quantization config class inside [src/transformers/utils/quantization_config.py](https://github.com/huggingface/transformers/blob/abbffc4525566a48a9733639797c812301218b83/src/transformers/utils/quantization_config.py). Add the new quantization config to the [_import_structure](https://github.com/huggingface/transformers/blob/abbffc4525566a48a9733639797c812301218b83/src/transformers/__init__.py#L1088) inside Transformers' [src/transformers/__init__.py](https://github.com/huggingface/transformers/blob/abbffc4525566a48a9733639797c812301218b83/src/transformers/__init__.py) file.
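
For illustration, a minimal sketch of such a config class. The class name, its `bits` argument, and the `"my_method"` string are all hypothetical placeholders; real configs tag themselves with the `QuantizationMethod` enum defined in the same file:

```python
# Hypothetical config for a method called "my_method"; real configs live in
# src/transformers/utils/quantization_config.py and subclass QuantizationConfigMixin.
from transformers.utils.quantization_config import QuantizationConfigMixin


class MyMethodConfig(QuantizationConfigMixin):
    def __init__(self, bits: int = 4, **kwargs):
        # Identifier used by the auto-mapping; existing configs use the
        # QuantizationMethod enum rather than a bare string.
        self.quant_method = "my_method"
        self.bits = bits
```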

2. Create a new file inside [src/transformers/quantizers/](https://github.com/huggingface/transformers/tree/abbffc4525566a48a9733639797c812301218b83/src/transformers/quantizers) named `quantizer_your_method.py`, and make it inherit from [`~quantizers.HfQuantizer`]. Make sure to add the new quantizer and quantization config to the quantization auto-mapping in [src/transformers/quantizers/auto.py](https://github.com/huggingface/transformers/blob/abbffc4525566a48a9733639797c812301218b83/src/transformers/quantizers/auto.py).

-3. Define the following class attributes and property methods for your quantization method.
+3. Define the following class attributes and property methods for your quantization method (a skeleton covering this step and the previous one follows the list below):

- `requires_calibration`: Whether the quantization method requires a data calibration process. If set to `True`, you can only support inference (with quantized weights) and not inference and quantization.
-- `required_packages`: A list of strings of the required packages to use the quantized weights. You might need to define some new utility methods such as `is_auto_awq_available` in [transformers/src/utils/import_utils.py](https://github.com/huggingface/transformers/blob/abbffc4525566a48a9733639797c812301218b83/src/transformers/utils/import_utils.py).
-- `requires_parameters_quantization`: Only required if your quantization method requires extra attention to the underlying [nn.Parameter](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html) object. For example, bitsandbytes uses [`~bitsandbytes.nn.Params4bit`] and [`~bitsandbytes.nn.Int8Params`], which require some extra attention when quantizing the model. Most recent quantization methods pack int2 and int4 weights inside [torch.uint8](https://pytorch.org/docs/stable/tensors.html) weights, so this flag should not really be required (it is set to `False` by default).
- `is_serializable`: A property method to determine whether the method is serializable or not.
- `is_trainable`: A property method to determine whether you can fine-tune models on top of the quantization method (with or without PEFT approaches).
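
A bare skeleton combining steps 2 and 3 might look as follows. The class name is hypothetical, and the exact shape of `is_serializable`/`is_trainable` (plain properties vs. methods taking extra arguments) should be mirrored from an existing quantizer rather than from this sketch:

```python
# Hypothetical skeleton for src/transformers/quantizers/quantizer_my_method.py.
from transformers.quantizers.base import HfQuantizer


class MyMethodHfQuantizer(HfQuantizer):
    # Quantization happens on the fly at load time, so no calibration data is needed.
    requires_calibration = False

    @property
    def is_serializable(self):
        return True  # quantized checkpoints can be saved and reloaded

    @property
    def is_trainable(self):
        return False  # e.g. flip to True once PEFT fine-tuning is supported


# Both new classes are then registered in the auto-mappings in
# src/transformers/quantizers/auto.py (mapping names as of writing):
#   AUTO_QUANTIZER_MAPPING["my_method"] = MyMethodHfQuantizer
#   AUTO_QUANTIZATION_CONFIG_MAPPING["my_method"] = MyMethodConfig
```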

4. Write the `validate_environment` and `update_dtype` methods. These methods are called before creating the quantized model to ensure users use the right configuration. Refer to other quantizers for an example of how this is implemented.
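
For example (the `my_method` package and its availability check are hypothetical stand-ins, and the bfloat16 fallback is just an illustrative choice):

```python
import importlib.util

import torch

from transformers.quantizers.base import HfQuantizer


def is_my_method_available():
    # Stand-in for a utility you would add to src/transformers/utils/import_utils.py.
    return importlib.util.find_spec("my_method") is not None


class MyMethodHfQuantizer(HfQuantizer):
    def validate_environment(self, *args, **kwargs):
        if not is_my_method_available():
            raise ImportError("Using `my_method` quantization requires the `my_method` package.")

    def update_dtype(self, dtype):
        # Fall back to a dtype the kernels support when the user did not pass one.
        if dtype is None:
            dtype = torch.bfloat16
        return dtype
```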
5. Write the `_process_model_before_weight_loading` method. In Transformers, the quantized models are initialized first on the `"meta"` device before loading the weights. This means the `_process_model_before_weight_loading` method takes care of manipulating the model skeleton to replace some modules ([nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)) with the target modules (quantization modules).

-You can define module replacement logic or any other utility method by creating a new file in [transformers/src/integrations/](https://github.com/huggingface/transformers/tree/abbffc4525566a48a9733639797c812301218b83/src/transformers/integrations) and exposing the relevant methods in that folder's `__init__.py` file. The best starting point would be to have a look at another quantization method such as [quantizer_awq.py](https://github.com/huggingface/transformers/blob/abbffc4525566a48a9733639797c812301218b83/src/transformers/quantizers/quantizer_awq.py).
+You can define module replacement logic or any other utility method by creating a new file in [transformers/src/integrations/](https://github.com/huggingface/transformers/tree/abbffc4525566a48a9733639797c812301218b83/src/transformers/integrations) and exposing the relevant methods in that folder's `__init__.py` file.
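
A stripped-down version of what such an integration file could contain; `MyMethodLinear` is a placeholder for a real quantized linear module:

```python
import torch.nn as nn


class MyMethodLinear(nn.Linear):
    """Placeholder for a quantized linear module holding packed weights."""


def replace_with_my_method_linear(model: nn.Module, modules_to_not_convert=("lm_head",)):
    for name, module in model.named_children():
        if isinstance(module, nn.Linear) and name not in modules_to_not_convert:
            # The model still lives on the "meta" device at this point, so we only
            # swap the skeleton; no real weights are allocated or copied here.
            setattr(
                model,
                name,
                MyMethodLinear(module.in_features, module.out_features,
                               bias=module.bias is not None, device="meta"),
            )
        else:
            replace_with_my_method_linear(module, modules_to_not_convert)
    return model


# The quantizer's _process_model_before_weight_loading would then call
# replace_with_my_method_linear(model) on the meta-device skeleton.
```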
+
+6. Add the `get_quantize_ops` method to the quantizer class if the quantization method supports quantizing on the fly. In Transformers, we materialize each tensor and apply a sequence of different operations to it. In our case, the quantization operation happens at the end. You need to create a `XXXQuantize` class, a subclass of `ConversionOps`, and add a `convert` method. In the `convert` method, you need to quantize the weights and return a dictionary of quantized params.
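
A schematic example, with a toy 8-bit absmax quantizer standing in for a real kernel. The `convert` signature below is modeled on the bitsandbytes op shown further down in this PR; check `ConversionOps` in the source for the exact interface and import path:

```python
import torch


class MyMethodQuantize:  # in the real code: a subclass of ConversionOps
    def convert(self, input_dict, **kwargs):
        target_key, value = next(iter(input_dict.items()))
        value = value[0] if isinstance(value, list) else value
        # Toy 8-bit absmax quantization: scale, round, clamp.
        scale = value.abs().amax().clamp(min=1e-8) / 127.0
        quantized = torch.clamp(torch.round(value / scale), -128, 127).to(torch.int8)
        # Return every tensor needed later to reconstruct the weight.
        return {target_key: quantized,
                target_key.replace("weight", "weight_scale"): scale}


# get_quantize_ops on the quantizer would return this op so the loader can
# apply it to each tensor after materialization.
```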
+
+7. Add the `get_weight_conversions` method to the quantizer class if the quantization method supports loading pre-quantized weights. In Transformers, we can collect multiple tensors and apply operations to them. This is particularly useful when the checkpoint contains tensors that need to be regrouped to re-create the quantized tensors.
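
Schematically (the concrete converter classes are defined by the loading utilities, so copy the real pattern from an existing quantizer that loads pre-quantized checkpoints; everything named here is hypothetical):

```python
from transformers.quantizers.base import HfQuantizer


class MyMethodHfQuantizer(HfQuantizer):
    def get_weight_conversions(self):
        # Group the packed weight with its companion tensors (e.g. scales) so the
        # ops receive them together and can re-create one quantized parameter.
        # Hypothetical pattern; see an existing quantizer for the concrete classes:
        #   return [WeightConverter(["*.weight", "*.weight_scale"], "*.weight",
        #                           operations=[MyMethodLoad()])]
        return []
```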

-6. Write the `_process_model_after_weight_loading` method. This method enables implementing additional features that require manipulating the model after loading the weights.
+8. Write the `_process_model_after_weight_loading` method if needed. This method enables implementing additional features that require manipulating the model after loading the weights.
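
For instance (freezing parameters is just an illustrative use of the hook, not something the base class requires):

```python
from transformers.quantizers.base import HfQuantizer


class MyMethodHfQuantizer(HfQuantizer):
    def _process_model_after_weight_loading(self, model, **kwargs):
        # Runs once all weights are loaded; a natural place for finishing touches
        # such as freezing the quantized parameters.
        for param in model.parameters():
            param.requires_grad_(False)
        return model
```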

-7. Document everything! Make sure your quantization method is documented by adding a new file under `docs/source/en/quantization`.
+9. Document everything! Make sure your quantization method is documented by adding a new file under `docs/source/en/quantization`.

-8. You should add tests by adding the package in our nightly Dockerfile inside `docker/transformers-quantization-latest-gpu` and then adding a new test file in `tests/quantization/xxx`. Feel free to check out existing quantization methods to see how it is implemented.
+10. You should add tests by adding the package in our nightly Dockerfile inside `docker/transformers-quantization-latest-gpu` and then adding a new test file in `tests/quantization/xxx`. Feel free to check out existing quantization methods to see how it is implemented.
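
A minimal smoke test, loosely modeled on the existing quantization test suites. The model id is hypothetical, and decorator names should be double-checked against `transformers.testing_utils`:

```python
import unittest

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch_gpu


class MyMethodQuantizationTest(unittest.TestCase):
    model_id = "my-org/tiny-model-my-method"  # hypothetical pre-quantized checkpoint

    @require_torch_gpu
    def test_quantized_generation(self):
        tokenizer = AutoTokenizer.from_pretrained(self.model_id)
        model = AutoModelForCausalLM.from_pretrained(self.model_id, device_map="auto")
        inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=5)
        # The quantized model should produce new tokens without crashing.
        self.assertGreater(output.shape[-1], inputs["input_ids"].shape[-1])
```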

src/transformers/integrations/bitsandbytes.py (+1 −1: 1 addition & 1 deletion)
@@ -44,7 +44,7 @@ def convert(
we need to store some parameters to create the quantized weight. For example, bnb requires 6 values that are stored in the checkpoint to recover the quantized weight. So we store them in a dict that is stored in hf_quantizer for now, as we can't save it in the op since we create an op per tensor.
"""
value = list(input_dict.values())[0]
-value = value[0] if isinstance(value, list) else value
+value = value[0]

# update param name to get the weights instead of the quantized stats