Are there plans to add multi-LoRA support to the Qualcomm backend #16999

@cxn-selfie

🚀 The feature, motivation and pitch

Are there plans to add multi-LoRA support to the Qualcomm backend, including runtime LoRA switching for multi-LoRA scenarios?

The Feature

Add support for multiple LoRA adapters on the Qualcomm backend, including the ability to dynamically switch LoRA adapters at runtime without reloading or recompiling the base model.

Specifically, this feature would enable:

- Loading and managing multiple LoRA adapters simultaneously
- Selecting or switching the active LoRA adapter during inference
- Keeping the base model static while applying different LoRA weights on demand

This is particularly useful for scenarios where a single base model serves multiple tasks, domains, or user profiles.
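To make the request concrete, here is a minimal, framework-agnostic sketch of the behavior described above: one frozen base weight, several named adapters held in memory, and a switch that changes which low-rank update is applied at inference time. It is not ExecuTorch or QNN API. The names `LoraAdapter`, `MultiLoraLinear`, `load_adapter`, and `set_active` are hypothetical and exist only for illustration, and the standard LoRA form `y = W x + scale * B (A x)` is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoraAdapter:
    """Hypothetical container for one adapter's low-rank factors."""
    down: Matrix  # A, shape (rank, in_features)
    up: Matrix    # B, shape (out_features, rank)
    scale: float = 1.0


@dataclass
class MultiLoraLinear:
    """Hypothetical linear layer: frozen base weight plus swappable adapters."""
    base: Matrix  # W, shape (out_features, in_features); never modified
    adapters: Dict[str, LoraAdapter] = field(default_factory=dict)
    active: Optional[str] = None

    def load_adapter(self, name: str, adapter: LoraAdapter) -> None:
        # Several adapters can be resident at once.
        self.adapters[name] = adapter

    def set_active(self, name: Optional[str]) -> None:
        # Switching only changes a reference; the base weight is untouched.
        if name is not None and name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.base, x)
        if self.active is None:
            return y  # plain base-model output
        a = self.adapters[self.active]
        delta = matvec(a.up, matvec(a.down, x))  # B (A x)
        return [yi + a.scale * di for yi, di in zip(y, delta)]


# Usage: one base layer, two adapters, switched between calls.
layer = MultiLoraLinear(base=[[1.0, 0.0], [0.0, 1.0]])
layer.load_adapter("task_a", LoraAdapter(down=[[1.0, 1.0]], up=[[1.0], [0.0]]))
layer.load_adapter("task_b", LoraAdapter(down=[[1.0, -1.0]], up=[[0.0], [2.0]], scale=0.5))

x = [2.0, 3.0]
out_base = layer.forward(x)
layer.set_active("task_a")
out_a = layer.forward(x)
layer.set_active("task_b")
out_b = layer.forward(x)
```

In this sketch the adapter update is computed as a side branch rather than merged into the base weight, which is what keeps the base static across switches. Whether a delegate backend would implement switching this way, or by swapping weight buffers in a compiled graph, is an open design question that the sketch does not answer.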

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin

Metadata

Labels

- `module: qnn`: Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/
- `partner: qualcomm`: For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm
