@cyang49 cyang49 commented Aug 29, 2024

This PR incorporates the vllm 0.5.5 CUDA kernels by copying vllm's csrc folder and build scripts, which gives fms access to performant CUDA kernels. The original paged attention kernel is outdated and is replaced by the new ones.

Notes:

  • The generation test has a circular dependency on fms.models, which causes issues on the FP8 branch (a conflict in torch.library that I still need to find a way to solve). It works fine with fms/main.
  • The draft includes ALL kernels from vllm, and some build scripts are copied verbatim. We probably don't want unnecessary code, but keeping it this way minimizes the porting effort if a manual sync is needed again later, so I'll leave it as is for now.
  • Alternatively, we can add vllm as a submodule and symlink to vllm/csrc, using the modified build scripts included in this PR.
  • torch needs to be updated to 2.4.0
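
Since the new kernels require torch 2.4.0, a version guard can catch an outdated install before the extension is built or loaded. The following is only a minimal sketch, not code from this PR: the names `MIN_TORCH`, `parse_version`, and `torch_is_new_enough` are hypothetical, and it assumes "needs 2.4.0" means "2.4.0 or newer".

```python
import re

# Minimum torch release stated in the notes above (assumed to mean ">= 2.4.0").
MIN_TORCH = (2, 4, 0)


def parse_version(version: str) -> tuple:
    """Return the leading numeric release as a (major, minor, patch) tuple.

    Local suffixes such as '+cu121' are ignored. Pre-release tags such as
    '.dev' or 'a0' are ignored too, so a 2.4.0 pre-release is treated as 2.4.0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.4') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def torch_is_new_enough(version: str, minimum: tuple = MIN_TORCH) -> bool:
    """Compare a version string against the minimum, component by component."""
    return parse_version(version) >= minimum
```

In a build script or at import time, this could be called as `torch_is_new_enough(torch.__version__)` and followed by a clear error message when it returns `False`. Comparing integer tuples rather than raw strings avoids ordering "2.10.0" below "2.4.0".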
