feat: addons for FP8 attention bmm, paged attention, and linear in FMS #154
Conversation
This PR needs minimal changes and in my opinion it is good to go. FP8 addons are an experimental feature that may require further validation of math and model outputs, but they don't interact with other parts of FMS-MO, so they won't break existing code. The only exception is the additional import of torchao, needed only for FP8, which is being added to the build as an optional requirement. @tharapalanivel @chichun-charlie-liu please check that this is done appropriately. @ani300 we have barebones unit tests for the other addons that check op registration in the torch namespace and validate output shapes: https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/tests/aiu_addons/test_gptq_addon.py
andrea-fasoli left a comment
this PR looks ready to go
Now that ibm-fms is an optional package, we will need to add guards to each file that imports fms, as is done in the other aiu_addon files.
@BrandonGroth done
Description of the change
This is an updated version of @andrea-fasoli's refactor of my FP8 work, adding Paged Attention kernels and cleaning up the code.
Related issues or PRs
#149
How to verify the PR
Code review (including math) is required.
Was the PR tested
Checklist for passing CI/CD:

- `git commit --signoff` or equivalent
- `tox -e fix`
- `tox -e lint`
- `tox -e spellcheck`
- `tox -e unit`

Note: CI/CD performs unit tests on multiple versions of Python from a fresh install. There may be differences between your local environment and the test environment.