[Draft] Incorporate vllm 0.5.5 kernels #39

cyang49 · 2024-08-29T15:23:05Z

This PR incorporates the vllm 0.5.5 CUDA kernels by copying the csrc folder and build scripts. This gives fms access to performant CUDA kernels. The original paged attention kernel is outdated and is replaced by the new ones.

Notes:

The generation test has a circular dependency on fms.models, which causes issues on the FP8 branch (a conflict in torch.library; I still need to find a way to solve that). With fms/main it works fine.
The draft includes ALL kernels from vllm, and some build scripts are copied verbatim. We probably don't want unnecessary code, but keeping it this way minimizes the porting effort if a manual sync is needed again later, so I'll keep it this way for now.
Alternatively, we can add vllm as a submodule and use a symlink to vllm/csrc, together with the modified build scripts included in this PR.
torch needs to be updated to 2.4.0.
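
Since the kernels need torch 2.4.0, a small guard could fail early on an older install. The following is only a sketch, not part of this PR: the helper names (version_tuple, torch_is_new_enough, check_installed_torch) are hypothetical, and it assumes that comparing the leading numeric release components is enough, so a pre-release such as 2.4.0a0 is treated the same as 2.4.0.

```python
import re
from typing import Tuple

# Minimum version stated in the notes above.
MIN_TORCH = "2.4.0"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release from strings like '2.4.0+cu121'.

    Suffixes such as '+cu121' or '.dev20240801' are ignored, and the
    result is padded with zeros to at least three components.
    """
    release = re.match(r"\d+(?:\.\d+)*", version)
    if release is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in release.group().split("."))
    return parts + (0,) * (3 - len(parts))


def torch_is_new_enough(installed: str, minimum: str = MIN_TORCH) -> bool:
    """Return True if the installed release is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed_torch() -> None:
    """Raise if the torch in the current environment is older than MIN_TORCH."""
    import torch  # imported lazily so the helpers above work without torch

    if not torch_is_new_enough(torch.__version__):
        raise RuntimeError(
            f"torch {torch.__version__} found, but >= {MIN_TORCH} is required"
        )
```

Calling check_installed_torch() at the top of a build or test script would surface a version mismatch before any kernel compilation starts.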

Incorporate vllm 0.5.5 kernels

4858bdb

cyang49 mentioned this pull request Aug 29, 2024

[Draft] fp8 ckpt support foundation-model-stack/foundation-model-stack#328

Draft

4 tasks
