
Releases: foundation-model-stack/fms-model-optimizer

v0.8.0

08 Dec 17:38
e3f1310


Full Changelog: v0.7.0...v0.8.0

v0.7.0

28 Oct 18:39
65acdec


What's Changed

  • feat: Quantization Refactor by @BrandonGroth in #169
  • fix: remove custom scaled bmm op on cpu and fix fp8 test by @andrea-fasoli in #187
  • chore(deps): Update torch requirement from <2.8,>=2.2.0 to >=2.2.0,<2.9 by @dependabot[bot] in #177
  • chore(deps): Update accelerate requirement from !=0.34,<1.10,>=0.20.3 to >=0.20.3,!=0.34,<1.11 by @dependabot[bot] in #179
  • chore(deps): Update transformers requirement from <4.56,>=4.45 to >=4.45,<4.58 by @dependabot[bot] in #186

Full Changelog: v0.6.0...v0.7.0

v0.6.0

07 Aug 15:40
207eb06


What's Changed

  • fix: enabling block-by-block evaluation for granite-3.x-models by @bayo-ibm in #165
  • fix: pylint false alarm on libdevice functions by @chichun-charlie-liu in #166
  • fix: Add version limits for torchao, ensure compat with 0.12 + AIU by @ani300 in #168
  • feat: Change paged FP8 prefill back to regular attention by @ani300 in #171
  • feat: FP8 requested changes by @ani300 in #173
  • chore(deps): Update triton requirement from <3.4,>=3.0 to >=3.0,<3.5 by @dependabot[bot] in #170
  • chore(deps): Update transformers requirement from <4.54,>=4.45 to >=4.45,<4.56 by @dependabot[bot] in #172
  • fix: FP8 TP fixes by @ani300 in #176

Full Changelog: v0.5.0...v0.6.0

v0.5.0

17 Jul 16:40
7777b49


What's Changed

  • chore(deps): Update transformers requirement from <4.53,>=4.45 to >=4.45,<4.54 by @dependabot[bot] in #151
  • fix: Mark FP8 scale to have the same batch size as input by @ani300 in #163
  • chore: Update torch requirement from <2.6,>=2.2.0 to >=2.2.0,<2.8 by @dependabot[bot] in #100
  • feat: Add QmaxDynamic to allow unifying Qmax, Qminmax, pertokenmax by @iqbal-saraf in #139
  • feat: GPTQv2 enablement for fms_mo by @bayo-ibm in #138
  • chore(deps): Update accelerate requirement from !=0.34,<1.9,>=0.20.3 to >=0.20.3,!=0.34,<1.10 by @dependabot[bot] in #164

Full Changelog: v0.4.1...v0.5.0

v0.4.1

11 Jul 23:18
c920911


What's Changed

  • feat: Per-sequence scaling in FP8 attention, FP8 fixes by @ani300 in #162

Full Changelog: v0.4.0...v0.4.1

v0.4.0

11 Jul 01:50
67a5e55


Full Changelog: v0.3.0...v0.4.0

v0.3.0

10 Jun 16:01
7467f68


Highlights

  1. AIU support: a new example for converting models for the AIU (see the examples/AIU_CONVERSION folder) and new add-ons for fms
  2. Triton kernel for specialized matmul hardware simulation and verification
  3. Microscaling (MX) format support, integrating functionality from Microsoft's mx package (see examples/MX for more details and the sketch after this list)
  4. Other upgrades and improvements:
    • qmodel_prep tracing speed improvement: for Llama3-70B, preparation time drops from ~20 min to ~2 min
    • Base dependencies upgraded to torch 2.5 and Python 3.12; migrated from auto_gptq to gptqmodel
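For orientation, the upstream Microsoft microxcaling package drives MX emulation through an MxSpecs config plus PyTorch op injection. The sketch below follows that package's README; the exact MxSpecs keys and the mx_mapping.inject_pyt_ops call are assumptions taken from upstream, and the supported fms_mo integration in examples/MX may expose this differently:

```python
# Minimal sketch of MX (microscaling) simulation via Microsoft's mx package.
# The MxSpecs keys and mx_mapping.inject_pyt_ops call follow the upstream
# microxcaling README; treat them as assumptions and see examples/MX for the
# fms_mo-supported path.
import torch
from mx import MxSpecs, mx_mapping

mx_specs = MxSpecs()
mx_specs["scale_bits"] = 8              # bits for the shared block scale
mx_specs["w_elem_format"] = "fp8_e4m3"  # per-element weight format
mx_specs["a_elem_format"] = "fp8_e4m3"  # per-element activation format
mx_specs["block_size"] = 32             # elements sharing one scale
mx_specs["bfloat"] = 16                 # background precision for other math

# Replace PyTorch ops (linear, matmul, ...) with MX-simulating versions so an
# unmodified model runs with emulated microscaling numerics.
mx_mapping.inject_pyt_ops(mx_specs)

model = torch.nn.Linear(64, 64)
print(model(torch.randn(2, 64)).shape)
```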


v0.2.0

13 Dec 17:50
e8bc88e


This is the first release of FMS Model Optimizer. It provides the core functionality:

  • Python API to enable model quantization: with the addition of a few lines of code, module-level and/or function-level operations are replaced with their quantized counterparts (see the sketch after this list).
  • Robust: verified for INT 8/4-bit quantization on key vision, speech, NLP, object-detection, and LLM models.
  • Flexible: options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer parameter-group settings.
  • State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
  • Supports key compute-intensive operations such as Conv2d, Linear, LSTM, MM, and BMM.
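As a rough illustration of the "few lines of code" flow, the sketch below prepares a small model for quantization. It assumes the qconfig_init/qmodel_prep entry points exposed by fms_mo; the recipe name and config keys are illustrative assumptions, not a verbatim recipe:

```python
# Minimal sketch: quantizing a model with fms_mo's Python API.
# qconfig_init/qmodel_prep are the fms_mo entry points; the recipe name and
# config keys below are illustrative assumptions, not a verbatim recipe.
import torch
from fms_mo import qconfig_init, qmodel_prep

# Toy model with the kinds of compute-intensive modules listed above.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 32 * 32, 10),
)

# Start from a quantization recipe, then override settings as needed.
qcfg = qconfig_init(recipe="qat")  # "qat" recipe name is an assumption
qcfg["nbits_w"] = 8                # weight bits (assumed config key)
qcfg["nbits_a"] = 8                # activation bits (assumed config key)

# An example batch lets the tracer (e.g., PyTorch Dynamo) find the Conv2d /
# Linear ops to swap for quantized equivalents.
example_inp = torch.randn(1, 3, 32, 32)
qmodel_prep(model, example_inp, qcfg)
print(model)  # modules now replaced by quantization-aware versions
```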


Full Changelog: https://github.com/foundation-model-stack/fms-model-optimizer/commits/v0.2.0