Description
🚀 The feature, motivation and pitch
Are there plans to add multi-LoRA support to the Qualcomm backend, including runtime switching between LoRA adapters, to cover multi-LoRA use cases?
The Feature
Add support for multiple LoRA adapters on the Qualcomm backend, including the ability to dynamically switch LoRA adapters at runtime without reloading or recompiling the base model.
Specifically, this feature would enable:
Loading and managing multiple LoRA adapters simultaneously
Selecting or switching the active LoRA adapter during inference
Keeping the base model static while applying different LoRA weights on demand
This is particularly useful for scenarios where a single base model serves multiple tasks, domains, or user profiles.
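To make "keep the base model static, swap LoRA weights on demand" concrete, here is a minimal, framework-free sketch of the math on a single linear layer. It assumes the standard LoRA formulation y = W x + (alpha / r) · B (A x), where only the small A/B pair changes per adapter. The names `LoRALinear`, `add_adapter` and `set_active` are hypothetical and are not part of ExecuTorch or the Qualcomm QNN SDK. How the Qualcomm backend would actually expose this (for example, a base graph compiled once with adapter weights supplied separately) is exactly what this issue asks about, so the sketch only illustrates the intended behavior.

```python
# Conceptual sketch of multi-LoRA switching on one linear layer.
# Hypothetical API (LoRALinear / add_adapter / set_active): NOT an ExecuTorch
# or Qualcomm QNN interface, only an illustration of the requested behavior.
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Linear layer whose frozen base weight W is shared by every adapter.

    Each adapter adds a low-rank update (alpha / r) * B (A x), where A is
    (r x in_features) and B is (out_features x r).
    """

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # base weight: loaded once, never modified
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}
        self.active: Optional[str] = None  # None means "base model only"

    def add_adapter(self, name: str, a: Matrix, b: Matrix, alpha: float) -> None:
        rank = len(a)  # r is the number of rows in A
        self.adapters[name] = (a, b, alpha / rank)

    def set_active(self, name: Optional[str]) -> None:
        # Switching only selects a different small A/B pair; W is untouched,
        # so nothing about the base layer needs to be reloaded or rebuilt.
        if name is not None and name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)
        if self.active is None:
            return y
        a, b, scale = self.adapters[self.active]
        delta = matvec(b, matvec(a, x))
        return [yi + scale * di for yi, di in zip(y, delta)]


# Usage: one shared 2x2 identity base weight, two rank-1 adapters.
layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("task_a", a=[[1.0, 1.0]], b=[[1.0], [0.0]], alpha=1.0)
layer.add_adapter("task_b", a=[[1.0, -1.0]], b=[[0.0], [2.0]], alpha=2.0)

x = [1.0, 2.0]
print(layer.forward(x))  # [1.0, 2.0]  (base model only)
layer.set_active("task_a")
print(layer.forward(x))  # [4.0, 2.0]
layer.set_active("task_b")
print(layer.forward(x))  # [1.0, -2.0]
```

Each adapter here stores only r · (in_features + out_features) extra values, which is why serving several adapters on top of one shared base model is attractive on device.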
Alternatives
No response
Additional context
No response
RFC (Optional)
No response
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin