Releases · foundation-model-stack/fms-model-optimizer
v0.8.0
What's Changed
- fix: Fix build and check packages flow by @tharapalanivel in #192
- chore: upgrade torch to allow 2.9 by @Ssukriti in #190
- fix: Fixes for paged fp8 attention with chunked prefill by @ani300 in #191
Full Changelog: v0.7.0...v0.8.0
v0.7.0
What's Changed
- feat: Quantization Refactor by @BrandonGroth in #169
- fix: remove custom scaled bmm op on cpu and fix fp8 test by @andrea-fasoli in #187
- chore(deps): Update torch requirement from <2.8,>=2.2.0 to >=2.2.0,<2.9 by @dependabot[bot] in #177
- chore(deps): Update accelerate requirement from !=0.34,<1.10,>=0.20.3 to >=0.20.3,!=0.34,<1.11 by @dependabot[bot] in #179
- chore(deps): Update transformers requirement from <4.56,>=4.45 to >=4.45,<4.58 by @dependabot[bot] in #186
Full Changelog: v0.6.0...v0.7.0
v0.6.0
What's Changed
- fix: enabling block-by-block evaluation for granite-3.x-models by @bayo-ibm in #165
- fix: pylint false alarm on libdevice functions by @chichun-charlie-liu in #166
- fix: Add version limits for torchao, ensure compat with 0.12 + AIU by @ani300 in #168
- feat: Change paged FP8 prefill back to regular attention by @ani300 in #171
- feat: FP8 requested changes by @ani300 in #173
- chore(deps): Update triton requirement from <3.4,>=3.0 to >=3.0,<3.5 by @dependabot[bot] in #170
- chore(deps): Update transformers requirement from <4.54,>=4.45 to >=4.45,<4.56 by @dependabot[bot] in #172
- fix: FP8 TP fixes by @ani300 in #176
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- chore(deps): Update transformers requirement from <4.53,>=4.45 to >=4.45,<4.54 by @dependabot[bot] in #151
- fix: Mark FP8 scale to have the same batch size as input by @ani300 in #163
- chore: Update torch requirement from <2.6,>=2.2.0 to >=2.2.0,<2.8 by @dependabot[bot] in #100
- feat: Add QmaxDynamic to unify Qmax, Qminmax, pertokenmax by @iqbal-saraf in #139
- feat: GPTQv2 enablement for fms_mo by @bayo-ibm in #138
- chore(deps): Update accelerate requirement from !=0.34,<1.9,>=0.20.3 to >=0.20.3,!=0.34,<1.10 by @dependabot[bot] in #164
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
Full Changelog: v0.4.0...v0.4.1
v0.4.0
What's Changed
- feat: add guards to sawb recomputation by @andrea-fasoli in #131
- build: Move torchvision to an optional dependency by @BrandonGroth in #144
- fix: fix for new transformers (>4.48) and new QLinear for INT8 training with HW emulation by @chichun-charlie-liu in #141
- chore(deps): Update transformers requirement from <4.52,>=4.45 to >=4.45,<4.53 by @dependabot[bot] in #127
- build: Move triton to an optional dependency by @BrandonGroth in #146
- chore(deps): Update accelerate requirement from !=0.34,<1.7,>=0.20.3 to >=0.20.3,!=0.34,<1.9 by @dependabot[bot] in #143
- build: Make non-essential dependencies optional by @BrandonGroth in #147
- fix: fix available_packages by @chichun-charlie-liu in #153
- fix: Saved qconfig recipe being overwritten with defaults by @BrandonGroth in #152
- fix: Remove gptqmodel Warning on startup by @BrandonGroth in #156
- fix: Remove llmcompressor oneshot import deprecation warning by @BrandonGroth in #157
- feat: addons for FP8 attention bmm, paged attention, and linear in FMS by @ani300 in #154
- feat: addons for FP8 attention bmm and linear in FMS by @andrea-fasoli in #149
- feat: add QA and MaskedLM task for FP8 encoder instantiation by @andrea-fasoli in #148
- feat: AIU sim for FP8 (DL8/DL16) added to triton kernel by @chichun-charlie-liu in #159
- fix: qkvsync bug fix by @chichun-charlie-liu in #161
- chore(deps): Update datasets requirement from <4.0,>=3.0.0 to >=3.0.0,<5.0 by @dependabot[bot] in #160
Full Changelog: v0.3.0...v0.4.0
v0.3.0
Highlights
- AIU support: new example added for model conversion for AIU (see the `examples/AIU_CONVERSION` folder) and new add-ons for `fms`
  - triton kernel for specialized matmul HW simulation and verification
- microscaling format support by integrating functionalities from Microsoft's `mx` package (see `examples/MX` for more details)
- other upgrades and improvements:
  - `qmodel_prep` tracing speed improvement, e.g., for Llama3-70B the tracing time has been reduced from ~20 min to ~2 min
  - upgraded base dependencies to `torch 2.5` and `python 3.12`, and migrated from `auto_gptq` to `gptqmodel`
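The migration from `auto_gptq` to `gptqmodel` changes how GPTQ checkpoints are produced outside of fms-mo's own recipes. A rough sketch of the `gptqmodel` flow, based on that project's documented usage pattern (the model id is illustrative, and names and signatures may differ across versions):

```python
# Hedged sketch: producing a 4-bit GPTQ checkpoint with gptqmodel, the library
# this release migrates to from auto_gptq. The load/quantize/save calls follow
# gptqmodel's documented usage pattern; exact signatures may vary by version.
from gptqmodel import GPTQModel, QuantizeConfig

# 4-bit weights with group size 128, a common GPTQ recipe.
quant_config = QuantizeConfig(bits=4, group_size=128)

# Load a Hugging Face model with the quantization recipe attached
# (model id is illustrative).
model = GPTQModel.load("ibm-granite/granite-3.1-2b-instruct", quant_config)

# Calibrate on a small set of text samples, then persist the quantized weights.
calibration_data = ["FMS Model Optimizer enables quantization of LLMs."]
model.quantize(calibration_data)
model.save("granite-gptq-4bit")
```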
What's Changed
- Add spell checker to alleviate spelling errors by @hickeyma in #32
- chore: Remove Makefile by @hickeyma in #37
- chore: Add GitHub badges to project README by @hickeyma in #38
- ci: Replace coverage with pytest-cov plugin for code coverage by @hickeyma in #39
- fix: Update the quantization notebook tutorial by @hickeyma in #41
- Small updates to the docs by @hickeyma in #40
- fix: Error in quantization notebook tutorial when retrieving image by @hickeyma in #42
- OptArguments by @tharapalanivel in #43
- Add logging and tests for run_quant.py by @tharapalanivel in #44
- Aiu addons by @andrea-fasoli in #46
- fix Qbmm tracing issue by @chichun-charlie-liu in #47
- Add build backend by @tharapalanivel in #50
- Add mypy static checker tool by @hickeyma in #49
- [utils] check if folder exists before attempting to create directory by @kcirred in #52
- fms_mo docker image by @tharapalanivel in #48
- Lint for fx by @tharapalanivel in #54
- Update accelerate requirement from !=0.34,<1.1,>=0.20.3 to >=0.20.3,!=0.34,<1.4 by @dependabot in #56
- Update transformers requirement from <4.48,>=4.45 to >=4.45,<4.49 by @dependabot in #55
- Add FP/INT triton kernels and unit tests, also update QAT example by @chichun-charlie-liu in #58
- ci: Add workflow for PR labels by @tharapalanivel in #57
- fix: Fix labelpr workflow by @tharapalanivel in #63
- feat: added granite support; fixed adapters to ignore model_config by @JRosenkranz in #53
- fix: Triton kernel bug fix by @chichun-charlie-liu in #61
- feat: Support for int8 smoothquant by @andrea-fasoli in #65
- test: Unit test int8 by @andrea-fasoli in #62
- fix: bug fix and minor changes on triton kernel by @chichun-charlie-liu in #69
- fix: handle linear_type callable at int8 linear instantiation by @andrea-fasoli in #68
- fix: multiple bug fixes by @chichun-charlie-liu in #70
- feat: improve transformers tracing for last layers by @chichun-charlie-liu in #72
- fix: in DQ example, when nbits_kvcache=8, context manager will detect incorrect frame by @chichun-charlie-liu in #74
- fix: Fix build and check packages flow by @tharapalanivel in #79
- fix: make triton optional for systems without GPUs by @chichun-charlie-liu in #78
- fix: a bug that prevented dynamo from working with PT 2.5.1 has been fixed by @chichun-charlie-liu in #81
- test: int8 unit tests for aiu add-ons by @iqbal-saraf in #77
- feat: confirmed py3.12 with pt2.5.1 by @chichun-charlie-liu in #83
- fix: finish missed items for upgrading to python 3.12 by @chichun-charlie-liu in #84
- fix: minor fix from last PR regarding py3.12 upgrades by @chichun-charlie-liu in #85
- feat: Update accelerate requirement from !=0.34,<1.4,>=0.20.3 to >=0.20.3,!=0.34,<1.7 by @dependabot in #86
- feat: Update transformers requirement from <4.49,>=4.45 to >=4.45,<4.51 by @dependabot in #80
- feat: triton matmul kernel adjusted, now is closer to HW behavior by @chichun-charlie-liu in #82
- fix: fix QBmm detection and default behavior by @chichun-charlie-liu in #87
- feat: expand detection of data types in model size estimation by @andrea-fasoli in #88
- fix: Fix push to pypi flow by @tharapalanivel in #90
- feat: int8 granite addon by @andrea-fasoli in #92
- feat: INT8 LLM TP>1 enablement by @andrea-fasoli in #94
- dependencies: Update transformers requirement from <4.51,>=4.45 to >=4.45,<4.52 by @dependabot in #91
- dependencies: Update triton requirement from <3.2,>=3.0 to >=3.0,<3.4 by @dependabot in #93
- feat: Update syntax of custom torch ops by @andrea-fasoli in #96
- feat: add granite architecture support for DQ with smoothquant by @andrea-fasoli in #101
- feat: trimming config save by @BrandonGroth in #103
- feat: Add int8 sd conversion function for aiu by @andrea-fasoli in #95
- fix: Config save cleanup by @BrandonGroth in #113
- feat: add verbosity to smoothquant during conversion for AIU by @andrea-fasoli in #115
- feat: Conversion example by @andrea-fasoli in #118
- feat: adjust int8 triton to enable msb/lsb truncation by @chichun-charlie-liu in #120
- feat: mx integration by @chichun-charlie-liu in #110
- feat: GPTQModel Migration by @tharapalanivel in #102
- fix: disable granite in custom gptq as gptqmodel already supports it, fix … by @chichun-charlie-liu in #130
- test: Add tests for save_for_aiu functionality w/ tiny models by @BrandonGroth in #126
- fix: Update GPTQ example README.md for typo by @chichun-charlie-liu in #132
- docs: Fix README typo by @tharapalanivel in #135
- build: Update test/verification section of PR template by @tharapalanivel in #136
- fix: Fix versioning by @tharapalanivel in #137
New Contributors
- @chichun-charlie-liu made their first contribution in #47
- @kcirred made their first contribution in #52
- @JRosenkranz made their first contribution in #53
- @iqbal-saraf made their first contribution in #77
- @BrandonGroth made their first contribu...
v0.2.0
This is the first release of FMS Model Optimizer. It provides the core functionality:
- Python API to enable model quantization: With the addition of a few lines of code, module-level and/or function-level operation replacement is performed (see the sketch after this list).
- Robust: Verified for INT 8- and 4-bit quantization on key vision, speech, NLP, object detection, and LLM models.
- Flexible: Options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer param group settings.
- State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
- Supports key compute-intensive operations such as Conv2d, Linear, LSTM, MM, and BMM.
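To make the "few lines of code" claim concrete, here is a minimal sketch of the quantization flow, assuming the `qmodel_prep` entry point referenced in the v0.3.0 highlights above; `qconfig_init`, the config keys, and the exact call signature are illustrative assumptions rather than a verbatim API reference:

```python
# Hedged sketch of the fms_mo quantization flow. qmodel_prep is referenced in
# the v0.3.0 highlights above; qconfig_init and the exact call signatures are
# assumptions drawn from the project's examples and may differ per release.
import torch
from fms_mo import qconfig_init, qmodel_prep

# Any torch.nn.Module works; a tiny self-contained example model:
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(16 * 30 * 30, 10),
)
sample_input = torch.randn(1, 3, 32, 32)  # example batch used for tracing

qcfg = qconfig_init()   # start from default quantization settings
qcfg["nbits_w"] = 8     # 8-bit weights
qcfg["nbits_a"] = 8     # 8-bit activations

# Trace the model (e.g., with PyTorch Dynamo) and swap supported ops such as
# Conv2d/Linear for their quantized counterparts in place.
qmodel_prep(model, sample_input, qcfg)
```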
What's Changed
- Initial setup by @tharapalanivel in #1
- Initial commit for optimization techniques by @tharapalanivel in #9
- Add dynamic build versioning by @hickeyma in #12
- [ci]: Restructure GitHub workflows by @hickeyma in #13
- Clear notebook output by @tharapalanivel in #15
- Improve README readability by @tharapalanivel in #19
- Change project name to correspond to pypi package name by @hickeyma in #18
- Set smoothq_alpha as buffer by @andrea-fasoli in #20
- Fix device for smoothquant activation scales by @andrea-fasoli in #21
- test: Add checks for unit tests that require Nvidia GPU by @hickeyma in #14
- tox: Add base Python version to tox environment by @hickeyma in #24
- Fix symmetric behavior (issue #22) by @andrea-fasoli in #26
- ci: Add Ruff for lint and code formatting by @hickeyma in #30
- Update pre-commit requirement from <4.0,>=3.0.4 to >=3.0.4,<5.0 by @dependabot in #16
- doc: Update dev env section of the contributing guide by @hickeyma in #29
New Contributors
- @tharapalanivel made their first contribution in #1
- @hickeyma made their first contribution in #12
- @andrea-fasoli made their first contribution in #20
- @dependabot made their first contribution in #16
Full Changelog: https://github.com/foundation-model-stack/fms-model-optimizer/commits/v0.2.0