
dropbox/dML


dML - Machine Learning Models


Machine learning (ML) and neural network (NN) training and inference, optimized for Apple Silicon via MLX. The current focus is voice.

All d* projects are entirely AI-generated.

Structure

| Directory | Description | Status |
| --- | --- | --- |
| model_mlx_migration | MLX models for Apple Silicon | Usable |
| voice | Streaming voice I/O, 14 languages | Preview |
| metal_mps_parallel | Metal/MPS GPU threading | Preview |

Note: model_mlx_migration will be split into separate TTS, STT, and tooling repos.

Models

Speech-to-Text (STT)

| Model | Description |
| --- | --- |
| ZipFormer | Streaming ASR encoder (k2/icefall), 2.85% WER |
| Whisper large-v3-turbo | Non-streaming fallback, 1.8% WER |
| Silero VAD | Voice activity detection |
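The WER figures above are word error rates; the evaluation set and text normalization used to obtain them are not stated here. For reference, the following is a minimal sketch of the standard word-level WER (word edit distance divided by the reference length). It is not the repo's own scoring script, and it deliberately skips normalization such as lowercasing or punctuation stripping, which real evaluations usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, dropping one word from a six-word reference gives a WER of 1/6 (about 16.7%). A 2.85% WER corresponds to roughly 3 word errors per 100 reference words.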

Text-to-Speech (TTS)

| Model | Description |
| --- | --- |
| CosyVoice3 | Alibaba TTS with voice cloning, DiT flow-matching |
| Kokoro | Lightweight TTS, 38x realtime |
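The "38x realtime" figure is read here as a speedup: 38 seconds of audio synthesized per second of wall-clock time. The hardware and measurement setup are not stated, so treat that interpretation as an assumption. The sketch below shows the arithmetic and the related real-time factor (RTF), which is the inverse convention (processing time divided by audio duration, where lower is faster). Both helper names are hypothetical.

```python
def realtime_speedup(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of audio produced per second of processing (e.g. 38.0 for "38x realtime")."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


def real_time_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return wall_clock_seconds / audio_seconds
```

Under this reading, a 10-second utterance at 38x takes about 10 / 38 ≈ 0.26 s to synthesize, which is an RTF of about 0.026.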

Rich Audio

Nine classification heads, including emotion (92% accuracy), paralinguistics (97% accuracy), language ID (99% accuracy), pitch, singing detection, and punctuation.
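The README does not describe how the heads are wired. A common design is for several lightweight heads to read the same shared audio embedding, each producing its own softmax over its own label set. The sketch below illustrates that pattern in plain Python under that assumption. The head names, label sets, and weights are hypothetical, and in the actual model some outputs (pitch, for instance) may be regressions rather than classifications.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearHead:
    """One hypothetical classification head: logits = W @ x + b, then softmax."""
    labels: list[str]
    weights: list[list[float]]  # one row per label; each row has the embedding's length
    bias: list[float]

    def predict(self, embedding: list[float]) -> dict[str, float]:
        logits = [
            sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        # Numerically stable softmax: subtract the largest logit before exponentiating.
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(self.labels, exps)}


def run_heads(embedding: list[float], heads: dict[str, LinearHead]) -> dict[str, dict[str, float]]:
    """Apply every head to the same shared embedding and collect per-head probabilities."""
    return {name: head.predict(embedding) for name, head in heads.items()}


# Toy usage with a 2-dimensional embedding and two hypothetical heads.
heads = {
    "singing": LinearHead(["speech", "singing"], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "emotion": LinearHead(["neutral", "happy", "angry"],
                          [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0]),
}
out = run_heads([2.0, 0.0], heads)
```

Because the encoder runs once and every head reuses its output, adding a head in this design costs only one small linear layer.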

License

Apache 2.0

About

Machine learning inference engines optimized for Apple Silicon via MLX
