Conversation
@eureka928 eureka928 commented Jan 30, 2026

Description

This PR adds a new WhisperX Python backend that provides transcription with speaker diarization (identifying who is speaking), word-level timestamps, and forced alignment via pyannote-audio.

Closes #3375

Key changes:

  • Extends the gRPC TranscriptSegment message with a speaker field (backward-compatible — existing backends leave it empty)
  • Maps the new Speaker field through the Go schema (core/schema/transcription.go) and backend mapper (core/backend/transcript.go)
  • Adds the full backend/python/whisperx/ backend with gRPC server, requirements for CPU/CUDA 12/CUDA 13/ROCm, and unit tests
  • Registers the backend in the Makefile, backend/index.yaml, and CI workflow
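As a rough sketch, the extended message might look like the following (the field number and the sibling fields shown here are hypothetical, not copied from LocalAI's actual proto; only the new `speaker` field is what this PR describes):

```proto
message TranscriptSegment {
  int32 id = 1;
  int64 start = 2;    // hypothetical existing fields
  int64 end = 3;
  string text = 4;
  string speaker = 5; // new: empty unless diarization ran
}
```

Backends that never set `speaker` leave it at the proto3 default (empty string), which is what makes the change backward-compatible.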

Speaker diarization requires a HuggingFace token (HF_TOKEN env var) with access to pyannote models, and is activated by setting diarize=true in the transcription request.
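For illustration, speaker assignment after diarization boils down to labeling each word with the diarization turn that overlaps it the most. A minimal sketch of that logic (a toy stand-in, not the backend's actual code, and independent of pyannote):

```python
def assign_speakers(words, turns):
    """Label each (start, end, text) word with the speaker whose
    diarization turn overlaps it the most; None if no turn overlaps."""
    labeled = []
    for w_start, w_end, text in words:
        best, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            # overlap of [w_start, w_end] and [t_start, t_end]
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labeled.append((text, best))
    return labeled

turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 4.0, "SPEAKER_01")]
words = [(0.1, 0.5, "hello"), (2.2, 2.6, "hi")]
print(assign_speakers(words, turns))
# → [('hello', 'SPEAKER_00'), ('hi', 'SPEAKER_01')]
```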

Notes for Reviewers

  • The alignment model is cached per language to avoid reloading on every transcription call
  • The diarization pipeline is lazily initialized and reused across calls
  • Timestamp handling matches the existing faster-whisper convention
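The per-language caching and lazy initialization described in the notes can be sketched like this (a toy stand-in: `load_model` here is a dummy callable, not WhisperX's real loader):

```python
class AlignmentCache:
    """Load an alignment model at most once per language and reuse it
    across calls; nothing is loaded until the first request."""

    def __init__(self, load_model):
        self._load = load_model
        self._models = {}

    def get(self, language):
        if language not in self._models:
            self._models[language] = self._load(language)
        return self._models[language]

calls = []
cache = AlignmentCache(lambda lang: calls.append(lang) or f"model-{lang}")
cache.get("en"); cache.get("en"); cache.get("de")
print(calls)
# → ['en', 'de']  (each language loaded exactly once)
```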

Signed commits

  • Yes, I signed my commits.

netlify bot commented Jan 30, 2026

Deploy Preview for localai ready!

Latest commit: d6c7cf7
Latest deploy log: https://app.netlify.com/projects/localai/deploys/6980b5e6ae7e54000858670a
Deploy Preview: https://deploy-preview-8299--localai.netlify.app

@eureka928
Contributor Author

@mudler @neurocis nice to meet you, and I'm glad to submit my first PR.

Could you review it when you have a chance?

Thank you for your time.

@eureka928
Contributor Author

eureka928 commented Jan 30, 2026

Hi @mudler I have updated the code based on your feedback.
Please let me know if you have any further feedback after your review.

@eureka928 eureka928 force-pushed the feat/whisperx-backend branch from 3e5133d to 0d10ffb Compare January 30, 2026 21:08
@eureka928
Contributor Author

Hi @mudler, I hope you're having a good weekend.
Would you give me more feedback after your review?
Thank you, and have a nice weekend.

mudler
mudler previously approved these changes Jan 31, 2026
Owner

@mudler mudler left a comment


Looking good. I will test it on my setup as well once it's on master. Thanks!

@mudler mudler enabled auto-merge (squash) January 31, 2026 15:33
@mudler
Owner

mudler commented Jan 31, 2026

CI seems to fail (the error looks genuine).

auto-merge was automatically disabled February 1, 2026 02:39

Head branch was pushed to by a user without write access

@eureka928 eureka928 force-pushed the feat/whisperx-backend branch 2 times, most recently from 17d5edb to 3f93840 Compare February 1, 2026 02:41
@eureka928
Contributor Author

CI seems to fail (the error looks genuine).

Can you run the CI again?
Thank you

@mudler mudler added enhancement New feature or request and removed dependencies labels Feb 1, 2026
@eureka928
Contributor Author

Hi @mudler, this failure isn't from my code update; it's a pre-existing issue unrelated to this PR.

@mudler
Owner

mudler commented Feb 2, 2026

Hi @mudler, this failure isn't from my code update; it's a pre-existing issue unrelated to this PR.

The error seems to be coming from whisperx, here are the CI logs:


#23 1.719 Using CPython 3.10.18 interpreter at: python/bin/python3
#23 1.719 Creating virtual environment at: venv
#23 1.720 Activate with: source venv/bin/activate
#23 1.746 starting requirements install for /whisperx/requirements.txt
#23 1.950 Using Python 3.10.18 environment at: venv
#23 2.102 Resolved 4 packages in 150ms
#23 2.104 Downloading setuptools (1.0MiB)
#23 2.107 Downloading grpcio-tools (2.4MiB)
#23 2.107 Downloading grpcio (5.7MiB)
#23 2.137  Downloaded grpcio-tools
#23 2.198  Downloaded grpcio
#23 2.238  Downloaded setuptools
#23 2.239 Prepared 4 packages in 136ms
#23 2.255 Installed 4 packages in 15ms
#23 2.255  + grpcio==1.71.0
#23 2.255  + grpcio-tools==1.71.0
#23 2.255  + protobuf==5.29.5
#23 2.255  + setuptools==80.10.2
#23 2.259 finished requirements install for /whisperx/requirements.txt
#23 2.259 starting requirements install for /whisperx/requirements-cpu.txt
#23 2.266 Using Python 3.10.18 environment at: venv
#23 2.630    Updating https://github.com/m-bain/whisperX.git (HEAD)
#23 4.377     Updated https://github.com/m-bain/whisperX.git (6ec4a020489d904c4f2cd1ed097674232d2692d4)
#23 4.968   × No solution found when resolving dependencies:
#23 4.968   ╰─▶ Because only whisperx==3.7.6 is available and whisperx==3.7.6
#23 4.968       depends on torch{platform_machine == 'x86_64' and sys_platform !=
#23 4.968       'darwin'}>=2.8.0,<2.9.dev0, we can conclude that all versions of
#23 4.968       whisperx depend on torch{platform_machine == 'x86_64' and sys_platform
#23 4.968       != 'darwin'}>=2.8.0,<2.9.dev0.
#23 4.968       And because only the following versions of torch are available:
#23 4.968           torch<2.8.0
#23 4.968           torch==2.8.0+cu128
#23 4.968           torch>2.9.dev0
#23 4.968       we can conclude that all versions of whisperx depend on
#23 4.968       torch==2.8.0+cu128. (1)
#23 4.968 
#23 4.968       Because nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and
#23 4.968       sys_platform == 'linux'}==12.8.93 has no wheels with a matching platform
#23 4.968       tag (e.g., `manylinux_2_39_x86_64`) and torch==2.8.0+cu128 depends on
#23 4.968       nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform
#23 4.968       == 'linux'}==12.8.93, we can conclude that torch==2.8.0+cu128 cannot
#23 4.968       be used.
#23 4.968       And because we know from (1) that all versions of whisperx depend on
#23 4.968       torch==2.8.0+cu128, we can conclude that all versions of whisperx cannot
#23 4.968       be used.
#23 4.968       And because you require whisperx, we can conclude that your requirements
#23 4.968       are unsatisfiable.
#23 4.968 
#23 4.968       hint: `torch` was requested with a pre-release marker (e.g., all of:
#23 4.968           torch>=2.8.0,<2.8.0+cu128
#23 4.968           torch>2.8.0+cu128,<2.9.dev0
#23 4.968       ), but pre-releases weren't enabled (try: `--prerelease=allow`)
#23 4.968 
#23 4.968       hint: `nvidia-cuda-nvrtc-cu12` was found on
#23 4.968       https://download.pytorch.org/whl/cpu, but not at the requested version
#23 4.968       (nvidia-cuda-nvrtc-cu12==12.8.93). A compatible version may be available
#23 4.968       on a subsequent index (e.g., https://pypi.org/simple). By default, uv
#23 4.968       will only consider versions that are published on the first index that
#23 4.968       contains a given package, to avoid dependency confusion attacks. If all
#23 4.968       indexes are equally trusted, use `--index-strategy unsafe-best-match` to
#23 4.968       consider all versions from all indexes, regardless of the order in which
#23 4.968       they were defined.
#23 4.968 
#23 4.968       hint: Wheels are available for `nvidia-cuda-nvrtc-cu12`
#23 4.968       (v12.8.93) on the following platforms: `manylinux_2_17_aarch64`,
#23 4.968       `manylinux2014_aarch64`, `win_amd64`
#23 4.974 make: *** [Makefile:5: install] Error 1
#23 ERROR: process "/bin/sh -c cd /${BACKEND} && PORTABLE_PYTHON=true make" did not complete successfully: exit code: 2
------
 > [builder 17/18] RUN cd /whisperx && PORTABLE_PYTHON=true make:
4.968       will only consider versions that are published on the first index that
4.968       contains a given package, to avoid dependency confusion attacks. If all
4.968       indexes are equally trusted, use `--index-strategy unsafe-best-match` to
4.968       consider all versions from all indexes, regardless of the order in which
4.968       they were defined.
4.968 
4.968       hint: Wheels are available for `nvidia-cuda-nvrtc-cu12`
4.968       (v12.8.93) on the following platforms: `manylinux_2_17_aarch64`,
4.968       `manylinux2014_aarch64`, `win_amd64`
4.974 make: *** [Makefile:5: install] Error 1
------
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
Dockerfile.python:198

@eureka928
Contributor Author


Oh okay, give me some time to check again

@eureka928 eureka928 force-pushed the feat/whisperx-backend branch 2 times, most recently from 606ddf0 to df8946a Compare February 2, 2026 08:07
@eureka928
Contributor Author

Hi, can you re-run the CI to confirm?

Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.

Signed-off-by: eureka928 <meobius123@gmail.com>
Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.

Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
…ments

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <meobius123@gmail.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.

Signed-off-by: eureka928 <meobius123@gmail.com>
…ion failure

uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.

Signed-off-by: eureka928 <meobius123@gmail.com>
…ilure

PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.

Signed-off-by: eureka928 <meobius123@gmail.com>
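Per that reasoning, the CPU requirements file would follow this shape (a sketch assembled from the commit messages and the CI log above, not the exact file contents of the PR):

```text
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.8.0
whisperx @ git+https://github.com/m-bain/whisperX.git
```

Because `==2.8.0` carries no local version label, PEP 440 lets it match `2.8.0+cpu` from the extra index.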
@eureka928 eureka928 force-pushed the feat/whisperx-backend branch from 8373bec to 7f5d72e Compare February 2, 2026 10:36
@eureka928
Contributor Author

Hi, can you re-run the CI to confirm?

I tried it on my own repo and it passed all tests.

@mudler
Owner

mudler commented Feb 2, 2026

hipblas jobs are now failing:

#22 0.888 Extracting cpython-3.10.18+20250818-x86_64-unknown-linux-gnu-install_only.tar.gz -> /whisperx/python
#22 1.744 Python 3.10.18
#22 2.031 Using CPython 3.10.18 interpreter at: python/bin/python3
#22 2.031 Creating virtual environment at: venv
#22 2.033 Activate with: source venv/bin/activate
#22 2.070 starting requirements install for /whisperx/requirements.txt
#22 2.270 Using Python 3.10.18 environment at: venv
#22 2.588 Resolved 4 packages in 317ms
#22 2.596 Downloading setuptools (1.0MiB)
#22 2.610 Downloading grpcio-tools (2.4MiB)
#22 2.610 Downloading grpcio (5.7MiB)
#22 2.731  Downloaded grpcio-tools
#22 2.747  Downloaded grpcio
#22 2.773  Downloaded setuptools
#22 2.774 Prepared 4 packages in 185ms
#22 2.789 Installed 4 packages in 14ms
#22 2.789  + grpcio==1.71.0
#22 2.789  + grpcio-tools==1.71.0
#22 2.789  + protobuf==5.29.5
#22 2.789  + setuptools==80.10.2
#22 2.797 finished requirements install for /whisperx/requirements.txt
#22 2.797 starting requirements install for /whisperx/requirements-hipblas.txt
#22 2.804 Using Python 3.10.18 environment at: venv
#22 3.211    Updating https://github.com/m-bain/whisperX.git (HEAD)
#22 5.642     Updated https://github.com/m-bain/whisperX.git (6ec4a020489d904c4f2cd1ed097674232d2692d4)
#22 6.768   × No solution found when resolving dependencies:
#22 6.768   ╰─▶ Because there is no version of torch==2.8.0+rocm6.4 and you require
#22 6.768       torch==2.8.0+rocm6.4, we can conclude that your requirements are
#22 6.768       unsatisfiable.
#22 6.778 make: *** [Makefile:5: install] Error 1
#22 ERROR: process "/bin/sh -c cd /${BACKEND} && PORTABLE_PYTHON=true make" did not complete successfully: exit code: 2
------
 > [builder 17/18] RUN cd /whisperx && PORTABLE_PYTHON=true make:
2.797 finished requirements install for /whisperx/requirements.txt
2.797 starting requirements install for /whisperx/requirements-hipblas.txt
2.804 Using Python 3.10.18 environment at: venv
3.211    Updating https://github.com/m-bain/whisperX.git (HEAD)
5.642     Updated https://github.com/m-bain/whisperX.git (6ec4a020489d904c4f2cd1ed097674232d2692d4)
6.768   × No solution found when resolving dependencies:
6.768   ╰─▶ Because there is no version of torch==2.8.0+rocm6.4 and you require
6.768       torch==2.8.0+rocm6.4, we can conclude that your requirements are
6.768       unsatisfiable.
6.778 make: *** [Makefile:5: install] Error 1

…ments to fix uv resolution

uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.

Applies the same fix as 7f5d72e (which resolved this for +cpu) across
all 14 hipblas requirements files.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
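Scoped to whisperX only, the hipblas requirements would take the same shape (a sketch; the ROCm version comes from the failing pin in the log above, and the index URL is an assumption):

```text
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0
whisperx @ git+https://github.com/m-bain/whisperX.git
```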
@@ -1,5 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.3
torch==2.7.1+rocm6.3
Owner

@mudler mudler Feb 2, 2026


This change (and the same on other backends) is unnecessary, as the other backends are building fine. Please keep it scoped to whisperX.

Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
@eureka928
Contributor Author

Can you run the tests again? I've updated the PR.

Owner

@mudler mudler left a comment


Looking good!

@mudler mudler merged commit 10a1e6c into mudler:master Feb 2, 2026
39 checks passed

Labels

dependencies, enhancement (New feature or request)


Development

Successfully merging this pull request may close these issues.

integrate whisperX

2 participants