
Conversation

@kmehant (Collaborator) commented on Nov 24, 2025

Results

Legend

| abbreviation | meaning |
|---|---|
| cp | context parallel degree |
| ep | expert parallel degree |
| dp | data parallel degree |
| gas | gradient accumulation steps |
| ebs | effective batch size |
| s | sequence length |
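
The experiment-setting strings below combine these abbreviations with their values, e.g. cp8-ebs4-s8192-gas1-ep8. As a small illustration (not part of this PR), a hypothetical helper that decodes such a string against the legend:

```python
import re

# Hypothetical helper, for illustration only: decode an experiment-setting
# string such as "cp8-ebs4-s8192-gas1-ep8" into a dict using the legend above.
def parse_setting(setting: str) -> dict:
    keys = {"cp", "ep", "dp", "gas", "ebs", "s"}
    parsed = {}
    for token in setting.split("-"):
        match = re.fullmatch(r"([a-z]+)(\d+)", token)
        if match and match.group(1) in keys:
            parsed[match.group(1)] = int(match.group(2))
    return parsed

print(parse_setting("cp1-dp4-ep4-ebs4-s8192-gas1"))
# {'cp': 1, 'dp': 4, 'ep': 4, 'ebs': 4, 's': 8192, 'gas': 1}
```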

Ablations

Parity Experiments

| model | experiment setting | loss | TPS per GPU |
|---|---|---|---|
| ibm-granite/granite-4.0-h-tiny | cp8-ebs4-s8192-gas1 | 0.8059140625 | 973.6 |
| ibm-granite/granite-4.0-h-tiny | cp8-ebs4-s8192-gas1-ep8 | 0.80224609375 | 2367.6 |
| ibm-granite/granite-4.0-h-tiny | cp8-ebs4-s8192-gas2 | 0.8059765625 | NA |
| ibm-granite/granite-4.0-h-tiny | cp4-dp2-ebs4-s8192-gas1 | 0.802953125 | 953.4 |
| ibm-granite/granite-4.0-h-tiny | cp1-dp4-ep4-ebs4-s8192-gas1 | 0.7967056884765625 | 2576 |

Long Context (sequence length 131072, i.e. 128k)

| model | experiment setting | TPS per GPU | GPU memory utilization ratio |
|---|---|---|---|
| ibm-granite/granite-4.0-h-tiny | cp8-ebs1-s131072-gas1-ep8 | 1462.8 | 0.5140136719 |
| ibm-granite/granite-4.0-h-small | cp8-ebs1-s131072-gas1-ep8 | 682.7 | 0.9887207031 |

Training Resumption

Settings used: mk-cp8-ebs4-s8192-gas1

(screenshot from the training-resumption run: Screenshot 2025-11-28 at 1 25 03 PM)

Summary of external dependencies

fsdp2-nov: https://github.com/kmehant/transformers.git (additional changes not in upstream)

Changes:

  1. Preparing batches to be CP-compatible when CP is enabled.
  2. Wrapping the training loop with the torch CP context (see the sketch after this list).
  3. Preparing shifted labels for correct loss calculation.
  4. Loss reduction specific to the case where CP and DP are combined.
  5. Model saving fix.
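
To make changes 1-4 concrete, here is a minimal sketch of the kind of training-step wrapping involved, assuming PyTorch's experimental `context_parallel` API, an 8-GPU dp2 x cp4 mesh, and a Hugging Face causal-LM forward. It is an illustration under those assumptions, not the fork's actual code:

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.experimental import context_parallel

# Hypothetical 2x4 dp-by-cp layout over 8 GPUs, for illustration only.
world_mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "cp"))
cp_mesh = world_mesh["cp"]
loss_fn = torch.nn.CrossEntropyLoss(ignore_index=-100)

def training_step(model, batch):
    input_ids = batch["input_ids"]   # (batch, seq), full sequence length
    labels = batch["labels"]

    # Change 3: build shift labels on the *full* sequence before sharding, so
    # each CP rank scores its shard against the tokens that follow it.
    shift_labels = torch.nn.functional.pad(labels, (0, 1), value=-100)[..., 1:]

    # Changes 1-2: the CP context shards the listed buffers along their
    # sequence dimension (dim 1) and runs attention context-parallel.
    with context_parallel(
        cp_mesh,
        buffers=[input_ids, shift_labels],
        buffer_seq_dims=[1, 1],
    ):
        logits = model(input_ids).logits              # assumes an HF causal LM
        loss = loss_fn(logits.flatten(0, 1), shift_labels.flatten())
        loss.backward()

    # Change 4: when CP is combined with DP, average the reported loss over
    # the CP group so it reflects the whole sequence, not a single shard.
    reported = loss.detach().clone()
    dist.all_reduce(reported, op=dist.ReduceOp.AVG, group=cp_mesh.get_group())
    return reported
```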

fsdp2-fix: https://github.com/kmehant/accelerate.git (additional changes not in upstream)

Changes:

  1. Mixed precision fix when using FSDP2 (see the sketch below).
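
For context, a minimal sketch of the FSDP2 mixed-precision setup this fix concerns, expressed at the raw PyTorch level rather than through accelerate; `build_model` and the `model.model.layers` attribute path are assumptions for illustration:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
# Recent PyTorch exports these from torch.distributed.fsdp; older releases
# expose them from torch.distributed._composable.fsdp.
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

# Illustrative dp-only mesh over 8 GPUs.
mesh = init_device_mesh("cuda", (8,), mesh_dim_names=("dp",))

# bf16 for parameter/compute dtype, fp32 for gradient reduction.
mp_policy = MixedPrecisionPolicy(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.float32,
)

model = build_model()  # hypothetical helper; stands in for HF model loading

# Shard each decoder block, then the root module (standard FSDP2 recipe).
for block in model.model.layers:  # attribute path assumed for granite-style HF models
    fully_shard(block, mesh=mesh, mp_policy=mp_policy)
fully_shard(model, mesh=mesh, mp_policy=mp_policy)
```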

mamba-cp: https://github.com/kmehant/fms-acceleration.git (will become main after foundation-model-stack/fms-acceleration#164 is merged)

  1. Enables CP for Mamba layers so that it works hand in hand with self-attention CP.

mamba-cp: https://github.com/garrett361/mamba (thanks to Garrett)

  1. The mamba_ssm kernels must be installed from this fork and branch to leverage CP.

Summary of PRs merged into HF repos to enable CP and FSDP2

  1. feat: add ignored_params support for fsdp2 huggingface/accelerate#3731
  2. fix: CPU RAM efficient loading for nd or HSDP parallelisms huggingface/accelerate#3740
  3. feat: allow mixed precision policy as dtype huggingface/accelerate#3751
  4. refactor: nit change for get_parameters_from_modules (code debt) huggingface/accelerate#3815
  5. nit: needed sanity checks for fsdp2 huggingface/accelerate#3499
  6. feat: support tensor parallel & Data loader huggingface/accelerate#3173 (dataloader part is reused for CP)
  7. fix: fsdp sharded state dict wont work for save_only_model knob huggingface/transformers#36627

@github-actions (bot) commented

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@kmehant changed the title from "CP support for mamba layer" to "feat: CP support for mamba layer" on Nov 24, 2025
The github-actions bot added the feat label on Nov 24, 2025
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
@kmehant (Collaborator, Author) commented on Nov 29, 2025

Blockers:

  1. Depends on the custom mamba_ssm, transformers, and accelerate packages listed above.

@dushyantbehl added the on hold label ("This PR is on hold and will not be merged right away") on Dec 3, 2025