feat(whisperx): add whisperx backend for transcription with speaker diarization #8299
✅ Deploy Preview for localai ready!
Force-pushed from 4dcf358 to 7bf3852.
Hi @mudler, I have updated the code based on your feedback.
Force-pushed from 3e5133d to 0d10ffb.
Hi @mudler, hope you're having a good weekend.
mudler left a comment
Looking good, I will test as well on my setup once on master. Thanks!
CI seems to be failing (the error looks genuine).
Head branch was pushed to by a user without write access
Force-pushed from 17d5edb to 3f93840.
Can you run the CI again?
Hi @mudler, this failure isn't from my code update; it's a pre-existing issue unrelated to this PR.
The error seems to be coming from whisperx; here are the CI logs:
Oh okay, give me some time to check again.
Force-pushed from 606ddf0 to df8946a.
Hi, can you re-run the CI to confirm?
Add speaker field to the gRPC TranscriptSegment message and map it through the Go schema, enabling backends to return speaker labels. Signed-off-by: eureka928 <meobius123@gmail.com>
Add Python gRPC backend using WhisperX for speech-to-text with word-level timestamps, forced alignment, and speaker diarization via pyannote-audio when HF_TOKEN is provided. Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
…ments Address review feedback: - Use --extra-index-url for CPU torch wheels to reduce size - Remove torch version pins, let uv resolve compatible versions Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra index instead of picking torch==2.8.0+cu128 from PyPI, which pulls unresolvable CUDA dependencies. Signed-off-by: eureka928 <meobius123@gmail.com>
…ion failure uv's default first-match strategy finds torch on PyPI before checking the extra index, causing it to pick torch==2.8.0+cu128 instead of the CPU variant. This makes whisperx's transitive torch dependency unresolvable. Using unsafe-best-match lets uv consider all indexes. Signed-off-by: eureka928 <meobius123@gmail.com>
…ilure PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the issue where uv cannot locate an explicit +cpu local version specifier. This aligns with the pattern used by all other CPU backends. Signed-off-by: eureka928 <meobius123@gmail.com>
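The matching rule these commits rely on can be illustrated with a small sketch. It is a simplified model of PEP 440 `==` matching that only handles the local version label (the part after `+`); it is not uv's resolver, and the helper names `split_local` and `exact_match` are hypothetical, introduced only for illustration.

```python
from typing import Optional, Tuple


def split_local(version: str) -> Tuple[str, Optional[str]]:
    """Split 'X.Y.Z+local' into (public, local); local is None when absent."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def exact_match(spec_version: str, candidate: str) -> bool:
    """Simplified PEP 440 `==` check that models only local version labels.

    Assumption: versions are already normalized (no zero-padding, wildcards,
    or pre/post-release handling), which the real specification also covers.
    """
    spec_public, spec_local = split_local(spec_version)
    cand_public, cand_local = split_local(candidate)
    if spec_public != cand_public:
        return False
    # A specifier without a local label ignores the candidate's local label.
    if spec_local is None:
        return True
    # A specifier with a local label requires the same label on the candidate.
    return spec_local == cand_local


candidates = ["2.8.0+cpu", "2.8.0+cu128", "2.7.1+rocm6.3"]
# Unsuffixed pin: both 2.8.0 builds are acceptable matches.
print([c for c in candidates if exact_match("2.8.0", c)])
# Suffixed pin: only the +cpu build is an acceptable match.
print([c for c in candidates if exact_match("2.8.0+cpu", c)])
```

Under this rule an unsuffixed `torch==2.8.0` pin accepts any local variant, so which wheel is actually chosen depends on the indexes consulted and the index strategy rather than on the pin itself.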
Force-pushed from 8373bec to 7f5d72e.
I tried it on my own repo and it passed all tests.
hipblas jobs are now failing:
…ments to fix uv resolution uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4, +rocm6.3) in pinned requirements. The --extra-index-url already points to the correct ROCm wheel index and --index-strategy unsafe-best-match (set in libbackend.sh) ensures the ROCm variant is preferred. Applies the same fix as 7f5d72e (which resolved this for +cpu) across all 14 hipblas requirements files. Signed-off-by: eureka928 <meobius123@gmail.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: eureka928 <meobius123@gmail.com>
@@ -1,5 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.3
torch==2.7.1+rocm6.3
This change (and the same on other backends) is unnecessary as other backends are building fine. Please keep it scoped to whisperX
Reverts changes to non-whisperx hipblas requirements files per maintainer review — other backends are building fine with the +rocm local version suffix. Signed-off-by: eureka928 <meobius123@gmail.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: eureka928 <meobius123@gmail.com>
Can you run the test again? I updated
mudler left a comment
Looking good!
Description
This PR adds a new WhisperX Python backend that provides transcription with word-level timestamps, forced alignment, and speaker diarization (identifying who is speaking) via pyannote-audio.
Closes #3375
Key changes:
- Extends the gRPC `TranscriptSegment` message with a `speaker` field (backward-compatible: existing backends leave it empty)
- Maps the `Speaker` field through the Go schema (`core/schema/transcription.go`) and backend mapper (`core/backend/transcript.go`)
- Adds the `backend/python/whisperx/` backend with gRPC server, requirements for CPU/CUDA 12/CUDA 13/ROCm, and unit tests
- `backend/index.yaml`, and CI workflow

Speaker diarization requires a HuggingFace token (`HF_TOKEN` env var) with access to pyannote models, and is activated by setting `diarize=true` in the transcription request.

Notes for Reviewers
Signed commits
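
For the backward-compatible `speaker` field described under the key changes, a client-side sketch of how a consumer might handle segments with and without a speaker label. The payload shape, the JSON key names, and the `SPEAKER_00`-style labels are assumptions for illustration and are not taken from this PR; `group_by_speaker` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical transcription result; key names and labels are assumed.
response = {
    "segments": [
        {"start": 0.0, "end": 1.8, "text": "Hello there.", "speaker": "SPEAKER_00"},
        {"start": 1.8, "end": 3.1, "text": "Hi, how are you?", "speaker": "SPEAKER_01"},
        {"start": 3.1, "end": 4.0, "text": "Fine, thanks.", "speaker": "SPEAKER_00"},
        # A backend without diarization leaves the speaker empty or absent.
        {"start": 4.0, "end": 5.2, "text": "Background announcement."},
    ]
}


def group_by_speaker(segments: List[dict], unknown: str = "unknown") -> Dict[str, List[str]]:
    """Collect segment texts per speaker, tolerating a missing or empty label."""
    grouped: Dict[str, List[str]] = {}
    for seg in segments:
        # Treat both a missing key and an empty string as "no speaker".
        label = seg.get("speaker") or unknown
        grouped.setdefault(label, []).append(seg["text"])
    return grouped


grouped = group_by_speaker(response["segments"])
for speaker, texts in grouped.items():
    print(f"{speaker}: {' '.join(texts)}")
```

Treating an empty or missing label as a single fallback bucket keeps the same client code working against backends that never populate the field.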