Releases · foundation-model-stack/fms-model-optimizer
v0.8.0
What's Changed
- fix: Fix build and check packages flow by @tharapalanivel in #192
- chore: upgrade torch to allow 2.9 by @Ssukriti in #190
- fix: Fixes for paged fp8 attention with chunked prefill by @ani300 in #191
Full Changelog: v0.7.0...v0.8.0
v0.7.0
What's Changed
- feat: Quantization Refactor by @BrandonGroth in #169
- fix: remove custom scaled bmm op on cpu and fix fp8 test by @andrea-fasoli in #187
- chore(deps): Update torch requirement from <2.8,>=2.2.0 to >=2.2.0,<2.9 by @dependabot[bot] in #177
- chore(deps): Update accelerate requirement from !=0.34,<1.10,>=0.20.3 to >=0.20.3,!=0.34,<1.11 by @dependabot[bot] in #179
- chore(deps): Update transformers requirement from <4.56,>=4.45 to >=4.45,<4.58 by @dependabot[bot] in #186
Full Changelog: v0.6.0...v0.7.0
v0.6.0
What's Changed
- fix: enabling block-by-block evaluation for granite-3.x-models by @bayo-ibm in #165
- fix: pylint false alarm on libdevice functions by @chichun-charlie-liu in #166
- fix: Add version limits for torchao, ensure compat with 0.12 + AIU by @ani300 in #168
- feat: Change paged FP8 prefill back to regular attention by @ani300 in #171
- feat: FP8 requested changes by @ani300 in #173
- chore(deps): Update triton requirement from <3.4,>=3.0 to >=3.0,<3.5 by @dependabot[bot] in #170
- chore(deps): Update transformers requirement from <4.54,>=4.45 to >=4.45,<4.56 by @dependabot[bot] in #172
- fix: FP8 TP fixes by @ani300 in #176
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- chore(deps): Update transformers requirement from <4.53,>=4.45 to >=4.45,<4.54 by @dependabot[bot] in #151
- fix: Mark FP8 scale to have the same batch size as input by @ani300 in #163
- chore: Update torch requirement from <2.6,>=2.2.0 to >=2.2.0,<2.8 by @dependabot[bot] in #100
- feat: Add QmaxDynamic to unify Qmax, Qminmax, pertokenmax by @iqbal-saraf in #139
- feat: GPTQv2 enablement for fms_mo by @bayo-ibm in #138
- chore(deps): Update accelerate requirement from !=0.34,<1.9,>=0.20.3 to >=0.20.3,!=0.34,<1.10 by @dependabot[bot] in #164
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
Full Changelog: v0.4.0...v0.4.1
v0.4.0
What's Changed
- feat: add guards to sawb recomputation by @andrea-fasoli in #131
- build: Move torchvision to an optional dependency by @BrandonGroth in #144
- fix: fix for new transformers (>4.48) and new QLinear for INT8 training with HW emulation by @chichun-charlie-liu in #141
- chore(deps): Update transformers requirement from <4.52,>=4.45 to >=4.45,<4.53 by @dependabot[bot] in #127
- build: Move triton to an optional dependency by @BrandonGroth in #146
- chore(deps): Update accelerate requirement from !=0.34,<1.7,>=0.20.3 to >=0.20.3,!=0.34,<1.9 by @dependabot[bot] in #143
- build: Make non-essential dependencies optional by @BrandonGroth in #147
- fix: fix available_packages by @chichun-charlie-liu in #153
- fix: Saved qconfig recipe being overwritten with defaults by @BrandonGroth in #152
- fix: Remove gptqmodel Warning on startup by @BrandonGroth in #156
- fix: Remove llmcompressor oneshot import deprecation warning by @BrandonGroth in #157
- feat: addons for FP8 attention bmm, paged attention, and linear in FMS by @ani300 in #154
- feat: addons for FP8 attention bmm and linear in FMS by @andrea-fasoli in #149
- feat: add QA and MaskedLM task for FP8 encoder instantiation by @andrea-fasoli in #148
- feat: AIU sim for FP8 (DL8/DL16) added to triton kernel by @chichun-charlie-liu in #159
- fix: qkvsync bug fix by @chichun-charlie-liu in #161
- chore(deps): Update datasets requirement from <4.0,>=3.0.0 to >=3.0.0,<5.0 by @dependabot[bot] in #160
Full Changelog: v0.3.0...v0.4.0
v0.3.0
Highlights
- AIU support: new example added for model conversion for AIU (see the `examples/AIU_CONVERSION` folder) and new add-ons for `fms`
  - triton kernel for specialized matmul HW simulation and verification
- microscaling format support by integrating functionalities from Microsoft's `mx` package (see `examples/MX` for more details)
- other upgrades and improvements:
  - `qmodel_prep` tracing speed improvement, e.g., for Llama3-70B the tracing time has been reduced from ~20 min to ~2 min
  - upgraded base dependencies to `torch 2.5` and `python 3.12`, and migrated from `auto_gptq` to `gptqmodel`
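The migration from `auto_gptq` to `gptqmodel` changes how GPTQ checkpoints are produced outside of fms-mo's own recipes. A rough sketch of the `gptqmodel` flow, based on that project's documented usage pattern (the model id is illustrative, and names and signatures may differ across versions):

```python
# Hedged sketch: producing a 4-bit GPTQ checkpoint with gptqmodel, the library
# this release migrates to from auto_gptq. The load/quantize/save calls follow
# gptqmodel's documented usage pattern; exact signatures may vary by version.
from gptqmodel import GPTQModel, QuantizeConfig

# 4-bit weights with group size 128, a common GPTQ recipe.
quant_config = QuantizeConfig(bits=4, group_size=128)

# Load a Hugging Face model with the quantization recipe attached
# (model id is illustrative).
model = GPTQModel.load("ibm-granite/granite-3.1-2b-instruct", quant_config)

# Calibrate on a small set of text samples, then persist the quantized weights.
calibration_data = ["FMS Model Optimizer enables quantization of LLMs."]
model.quantize(calibration_data)
model.save("granite-gptq-4bit")
```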
What's Changed
- Add spell checker to alleviate spelling errors by @hickeyma in #32
- chore: Remove Makefile by @hickeyma in #37
- chore: Add GitHub badges to project README by @hickeyma in #38
- ci: Replace coverage with pytest-cov plugin for code coverage by @hickeyma in #39
- fix: Update the quantization notebook tutorial by @hickeyma in #41
- Small updates to the docs by @hickeyma in #40
- fix: Error in quantization notebook tutorial when retrieving image by @hickeyma in #42
- OptArguments by @tharapalanivel in #43
- Add logging and tests for run_quant.py by @tharapalanivel in #44
- Aiu addons by @andrea-fasoli in #46
- fix Qbmm tracing issue by @chichun-charlie-liu in #47
- Add build backend by @tharapalanivel in #50
- Add mypy static checker tool by @hickeyma in #49
- [utils] check if folder exists before attempting to create directory by @kcirred in #52
- fms_mo docker image by @tharapalanivel in #48
- Lint for fx by @tharapalanivel in #54
- Update accelerate requirement from !=0.34,<1.1,>=0.20.3 to >=0.20.3,!=0.34,<1.4 by @dependabot in #56
- Update transformers requirement from <4.48,>=4.45 to >=4.45,<4.49 by @dependabot in #55
- Add FP/INT triton kernels and unit tests, also update QAT example by @chichun-charlie-liu in #58
- ci: Add workflow for PR labels by @tharapalanivel in #57
- fix: Fix labelpr workflow by @tharapalanivel in #63
- feat: added granite support; fixed adapters to ignore model_config by @JRosenkranz in #53
- fix: Triton kernel bug fix by @chichun-charlie-liu in #61
- feat: Support for int8 smoothquant by @andrea-fasoli in #65
- test: Unit test int8 by @andrea-fasoli in #62
- fix: bug fix and minor changes on triton kernel by @chichun-charlie-liu in #69
- fix: handle linear_type callable at int8 linear instantiation by @andrea-fasoli in #68
- fix: multiple bug fixes by @chichun-charlie-liu in #70
- feat: improve transformers tracing for last layers by @chichun-charlie-liu in #72
- fix: in DQ example, when nbits_kvcache=8, context manager will detect incorrect frame by @chichun-charlie-liu in #74
- fix: Fix build and check packages flow by @tharapalanivel in #79
- fix: make triton optional for systems without GPUs by @chichun-charlie-liu in #78
- fix: a bug that prevented dynamo from working with PT 2.5.1 has been fixed by @chichun-charlie-liu in #81
- test: int8 unit tests for aiu add-ons by @iqbal-saraf in #77
- feat: confirmed py3.12 with pt2.5.1 by @chichun-charlie-liu in #83
- fix: finish missed items for upgrading to python 3.12 by @chichun-charlie-liu in #84
- fix: minor fix from last PR regarding py3.12 upgrades by @chichun-charlie-liu in #85
- feat: Update accelerate requirement from !=0.34,<1.4,>=0.20.3 to >=0.20.3,!=0.34,<1.7 by @dependabot in #86
- feat: Update transformers requirement from <4.49,>=4.45 to >=4.45,<4.51 by @dependabot in #80
- feat: triton matmul kernel adjusted, now is closer to HW behavior by @chichun-charlie-liu in #82
- fix: fix QBmm detection and default behavior by @chichun-charlie-liu in #87
- feat: expand detection of data types in model size estimation by @andrea-fasoli in #88
- fix: Fix push to pypi flow by @tharapalanivel in #90
- feat: int8 granite addon by @andrea-fasoli in #92
- feat: INT8 LLM TP>1 enablement by @andrea-fasoli in #94
- dependencies: Update transformers requirement from <4.51,>=4.45 to >=4.45,<4.52 by @dependabot in #91
- dependencies: Update triton requirement from <3.2,>=3.0 to >=3.0,<3.4 by @dependabot in #93
- feat: Update syntax of custom torch ops by @andrea-fasoli in #96
- feat: add granite architecture support for DQ with smoothquant by @andrea-fasoli in #101
- feat: trimming config save by @BrandonGroth in #103
- feat: Add int8 sd conversion function for aiu by @andrea-fasoli in #95
- fix: Config save cleanup by @BrandonGroth in #113
- feat: add verbosity to smoothquant during conversion for AIU by @andrea-fasoli in #115
- feat: Conversion example by @andrea-fasoli in #118
- feat: adjust int8 triton to enable msb/lsb truncation by @chichun-charlie-liu in #120
- feat: mx integration by @chichun-charlie-liu in #110
- feat: GPTQModel Migration by @tharapalanivel in #102
- fix: disable granite in custom gptq as gptqmodel already supports it, fix … by @chichun-charlie-liu in #130
- test: Add tests for save_for_aiu functionality w/ tiny models by @BrandonGroth in #126
- fix: Update GPTQ example README.md for typo by @chichun-charlie-liu in #132
- docs: Fix README typo by @tharapalanivel in #135
- build: Update test/verification section of PR template by @tharapalanivel in #136
- fix: Fix versioning by @tharapalanivel in #137
New Contributors
- @chichun-charlie-liu made their first contribution in #47
- @kcirred made their first contribution in #52
- @JRosenkranz made their first contribution in #53
- @iqbal-saraf made their first contribution in #77
- @BrandonGroth made their first contribu...
v0.2.0
This is the first release of FMS Model Optimizer. It provides the core functionality:
- Python API to enable model quantization: With the addition of a few lines of code, module-level and/or function-level operation replacement is performed (see the sketch after this list).
- Robust: Verified for INT 8- and 4-bit quantization on key vision, speech, NLP, object detection, and LLM models.
- Flexible: Options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer param group settings.
- State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
- Supports key compute-intensive operations such as Conv2d, Linear, LSTM, MM, and BMM.
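To make the "few lines of code" claim concrete, here is a minimal sketch of the quantization flow, assuming the `qmodel_prep` entry point referenced in the v0.3.0 highlights above; `qconfig_init`, the config keys, and the exact call signature are illustrative assumptions rather than a verbatim API reference:

```python
# Hedged sketch of the fms_mo quantization flow. qmodel_prep is referenced in
# the v0.3.0 highlights above; qconfig_init and the exact call signatures are
# assumptions drawn from the project's examples and may differ per release.
import torch
from fms_mo import qconfig_init, qmodel_prep

# Any torch.nn.Module works; a tiny self-contained example model:
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(16 * 30 * 30, 10),
)
sample_input = torch.randn(1, 3, 32, 32)  # example batch used for tracing

qcfg = qconfig_init()   # start from default quantization settings
qcfg["nbits_w"] = 8     # 8-bit weights
qcfg["nbits_a"] = 8     # 8-bit activations

# Trace the model (e.g., with PyTorch Dynamo) and swap supported ops such as
# Conv2d/Linear for their quantized counterparts in place.
qmodel_prep(model, sample_input, qcfg)
```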
What's Changed
- Initial setup by @tharapalanivel in #1
- Initial commit for optimization techniques by @tharapalanivel in #9
- Add dynamic build versioning by @hickeyma in #12
- [ci]: Restructure GitHub workflows by @hickeyma in #13
- Clear notebook output by @tharapalanivel in #15
- Improve README readability by @tharapalanivel in #19
- Change project name to correspond to pypi package name by @hickeyma in #18
- Set smoothq_alpha as buffer by @andrea-fasoli in #20
- Fix device for smoothquant activation scales by @andrea-fasoli in #21
- test: Add checks for unit tests that require Nvidia GPU by @hickeyma in #14
- tox: Add base Python version to tox environment by @hickeyma in #24
- Fix symmetric behavior (issue #22) by @andrea-fasoli in #26
- ci: Add Ruff for lint and code formatting by @hickeyma in #30
- Update pre-commit requirement from <4.0,>=3.0.4 to >=3.0.4,<5.0 by @dependabot in #16
- doc: Update dev env section of the contributing guide by @hickeyma in #29
New Contributors
- @tharapalanivel made their first contribution in #1
- @hickeyma made their first contribution in #12
- @andrea-fasoli made their first contribution in #20
- @dependabot made their first contribution in #16
Full Changelog: https://github.com/foundation-model-stack/fms-model-optimizer/commits/v0.2.0