A powerful subtitle file converter that ensures proper UTF-8 encoding. Supports multiple subtitle formats with a simple command-line interface and flexible configuration.

onyxdevs/subzilla

SubZilla 🦎

A powerful subtitle file converter that ensures proper UTF-8 encoding with robust support for Arabic and other languages. SubZilla automatically detects the input file encoding and converts it to UTF-8, making it perfect for fixing subtitle encoding issues. Built with SOLID, YAGNI, KISS, and DRY principles in mind.

Features ✨

  • Automatic encoding detection.
  • Converts subtitle files to UTF-8.
  • Supports multiple subtitle formats (.srt, .sub, .txt).
  • Strong support for Arabic and other non-Latin scripts.
  • Simple command-line interface.
  • Native macOS desktop application with drag-and-drop.
  • Batch processing with glob pattern support.
  • Parallel processing for better performance.
  • Preserves original file formatting.
  • Creates backup of original files.
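
As a rough illustration of the first step in encoding detection, the sketch below checks a file's leading bytes for a Unicode byte order mark (BOM). The `detectBom` helper and the `BomEncoding` type are hypothetical names used for illustration only; SubZilla's actual detection logic is not shown in this README and may work differently.

```typescript
// Hypothetical helper, not part of SubZilla's API: inspects the first bytes
// of a file and reports the encoding implied by a byte order mark, if any.
type BomEncoding = "utf8" | "utf16le" | "utf16be" | null;

function detectBom(bytes: Uint8Array): BomEncoding {
    // UTF-8 BOM: EF BB BF
    if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
        return "utf8";
    }
    // UTF-16 little-endian BOM: FF FE
    if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
        return "utf16le";
    }
    // UTF-16 big-endian BOM: FE FF
    if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
        return "utf16be";
    }
    // No BOM found. Legacy encodings such as Windows-1256 carry no BOM, so a
    // real detector would need to fall back to analysing the byte content.
    return null;
}
```

A `null` result only means that no BOM was found; it says nothing about which legacy encoding the file uses.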

Installation 🚀

Prerequisites

  • Node.js (v14 or higher)
  • Yarn package manager

Global Installation

# Install globally using yarn
yarn global add subzilla

# Or using npm
npm install -g subzilla

Local Development Setup

# Clone the repository
git clone https://github.com/onyxdevs/subzilla.git
cd subzilla

# Install dependencies (installs all workspace packages)
yarn install

# Build all packages
yarn build

# Run the CLI
yarn start

# Development mode (watch for changes)
yarn dev

Usage 💻

Basic Usage

# Convert a single subtitle file
subzilla convert path/to/subtitle.srt

# The converted file will be saved as path/to/subtitle.utf8.srt

# Strip HTML formatting
subzilla convert input.srt --strip-html

# Strip color codes
subzilla convert input.srt --strip-colors

# Strip style tags
subzilla convert input.srt --strip-styles

# Replace URLs with [URL]
subzilla convert input.srt --strip-urls

# Strip all formatting
subzilla convert input.srt --strip-all

# Create backup and strip formatting
subzilla convert input.srt -b --strip-all

# Create numbered backups instead of overwriting existing backup
subzilla convert input.srt -b --no-overwrite-backup

# Combine multiple strip options
subzilla convert input.srt --strip-html --strip-colors
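
The default output name shown above (`subtitle.srt` becoming `subtitle.utf8.srt`) can be sketched as a small path transformation. The `utf8OutputPath` function is a hypothetical helper written for this example, and its handling of files without an extension is an assumption rather than documented SubZilla behavior.

```typescript
// Hypothetical helper: inserts ".utf8" before the file extension.
function utf8OutputPath(inputPath: string): string {
    // Position of the last path separator (POSIX or Windows style).
    const slash = Math.max(inputPath.lastIndexOf("/"), inputPath.lastIndexOf("\\"));
    const dot = inputPath.lastIndexOf(".");

    // No usable extension: the dot is missing, belongs to a directory name,
    // or starts a dotfile. Assumption: append the suffix in that case.
    if (dot <= slash + 1) {
        return `${inputPath}.utf8`;
    }
    return `${inputPath.slice(0, dot)}.utf8${inputPath.slice(dot)}`;
}

// Matches the example in the usage section above.
const converted = utf8OutputPath("path/to/subtitle.srt"); // "path/to/subtitle.utf8.srt"
```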

Batch Processing

Convert multiple subtitle files at once using glob patterns:

# Convert all .srt files in current directory
subzilla batch "*.srt"

# Convert files recursively in all subdirectories
subzilla batch "**/*.srt" -r

# Convert multiple formats
subzilla batch "**/*.{srt,sub,txt}" -r

# Specify output directory
subzilla batch "**/*.srt" -o converted/

# Process files in parallel for better performance
subzilla batch "**/*.srt" -p

# Skip existing UTF-8 files
subzilla batch "**/*.srt" -s

# Combine basic options for maximum efficiency
subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/

# Advanced Directory Processing

# Limit recursive depth to 2 levels
subzilla batch "**/*.srt" -r -d 2

# Only process files in specific directories
subzilla batch "**/*.srt" -r -i "movies" "series"

# Exclude specific directories
subzilla batch "**/*.srt" -r -x "temp" "backup"

# Preserve directory structure in output
subzilla batch "**/*.srt" -r -o converted/ --preserve-structure

# Complex example combining all features
subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/ \
  -d 3 -i "movies" "series" -x "temp" "backup" --preserve-structure

# Strip formatting in batch mode
subzilla batch "**/*.srt" -r --strip-all

# Strip specific formatting in batch mode
subzilla batch "**/*.srt" -r --strip-html --strip-colors

# Create backups and strip formatting
subzilla batch "**/*.srt" -r -b --strip-all

# Create numbered backups instead of overwriting existing ones
subzilla batch "**/*.srt" -r -b --no-overwrite-backup --strip-all

# Complex example with formatting options
subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/ \
  -d 3 -i "movies" "series" -x "temp" "backup" \
  --preserve-structure --strip-all -b

Options:

  • -o, --output-dir <dir>: Save converted files to specified directory.
  • -r, --recursive: Search for files in subdirectories.
  • -p, --parallel: Process files in parallel (faster for many files).
  • -s, --skip-existing: Skip files that already have a UTF-8 version.
  • -d, --max-depth <depth>: Maximum directory depth for recursive search.
  • -i, --include-dirs <dirs...>: Only process files in these directories.
  • -x, --exclude-dirs <dirs...>: Exclude files in these directories.
  • --preserve-structure: Preserve directory structure in output.
  • -b, --backup: Create backup of original files.
  • --no-overwrite-backup: Create numbered backups instead of overwriting existing backup.
  • --strip-html: Strip HTML tags.
  • --strip-colors: Strip color codes.
  • --strip-styles: Strip style tags.
  • --strip-urls: Replace URLs with [URL].
  • --strip-all: Strip all formatting (equivalent to all strip options).
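
To make the interaction between `--max-depth`, `--include-dirs`, and `--exclude-dirs` more concrete, here is a minimal sketch of a path filter. The `DirectoryFilter` interface and `shouldProcess` function are hypothetical, and the exact semantics are assumptions: depth is counted as the number of directory segments in the relative path, and a directory name matches if it equals any segment.

```typescript
// Hypothetical filter options mirroring the CLI flags.
interface DirectoryFilter {
    maxDepth?: number;      // mirrors -d, --max-depth
    includeDirs?: string[]; // mirrors -i, --include-dirs
    excludeDirs?: string[]; // mirrors -x, --exclude-dirs
}

function shouldProcess(relativePath: string, filter: DirectoryFilter): boolean {
    // Keep directory segments only; the last segment is the file name.
    const dirs = relativePath.split("/").slice(0, -1);
    const include = filter.includeDirs ?? [];
    const exclude = filter.excludeDirs ?? [];

    // Assumption: a file directly in the search root has depth 0.
    if (filter.maxDepth !== undefined && dirs.length > filter.maxDepth) {
        return false;
    }
    // Exclusions are checked first, so they win over inclusions.
    if (dirs.some((dir) => exclude.indexOf(dir) !== -1)) {
        return false;
    }
    // With an include list, at least one segment has to match it.
    if (include.length > 0) {
        return dirs.some((dir) => include.indexOf(dir) !== -1);
    }
    return true;
}
```

In this sketch exclusions take precedence, so `movies/temp/a.srt` is skipped even when `movies` is on the include list.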

Features:

  • Progress bar showing conversion status.
  • Per-directory progress tracking.
  • Detailed statistics after completion.
  • Error tracking and reporting.
  • Parallel processing support.
  • Skip existing files option.
  • Time tracking and performance metrics.
  • Directory structure preservation.
  • Directory filtering and depth control.
  • HTML tag stripping.
  • Color code removal.
  • Style tag removal.
  • URL replacement.
  • Whitespace normalization.
  • Original file backup.

Example Output:

🔍 Found 25 files in 5 directories...

Converting |==========| 100% | 25/25 | Total Progress
Converting |==========| 100% | 8/8   | Processing movies
Converting |==========| 100% | 7/7   | Processing series/season1
Converting |==========| 100% | 5/5   | Processing series/season2
Converting |==========| 100% | 3/3   | Processing series/specials
Converting |==========| 100% | 2/2   | Processing extras

📊 Batch Processing Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Total files processed: 25
Directories processed: 5
✅ Successfully converted: 23
❌ Failed: 1
⏭️  Skipped: 1
⏱️  Total time: 5.32s
⚡ Average time per file: 0.22s

📂 Directory Statistics:
━━━━━━━━━━━━━━━━━━━━
movies:
  Total: 8
  ✅ Success: 8
  ❌ Failed: 0
  ⏭️  Skipped: 0

series/season1:
  Total: 7
  ✅ Success: 6
  ❌ Failed: 1
  ⏭️  Skipped: 0

series/season2:
  Total: 5
  ✅ Success: 5
  ❌ Failed: 0
  ⏭️  Skipped: 0

series/specials:
  Total: 3
  ✅ Success: 2
  ❌ Failed: 0
  ⏭️  Skipped: 1

extras:
  Total: 2
  ✅ Success: 2
  ❌ Failed: 0
  ⏭️  Skipped: 0

❌ Errors:
━━━━━━━━━
series/season1/broken.srt: Failed to detect encoding
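
The totals in the batch summary are the sums of the per-directory statistics. The sketch below reproduces that aggregation using the numbers from the example output; the `DirStats` interface and `summarize` function are hypothetical names and do not reflect SubZilla's internal types.

```typescript
// Hypothetical shape for one directory's counters.
interface DirStats {
    total: number;
    success: number;
    failed: number;
    skipped: number;
}

// Per-directory numbers copied from the example output above.
const exampleRun: Record<string, DirStats> = {
    "movies": { total: 8, success: 8, failed: 0, skipped: 0 },
    "series/season1": { total: 7, success: 6, failed: 1, skipped: 0 },
    "series/season2": { total: 5, success: 5, failed: 0, skipped: 0 },
    "series/specials": { total: 3, success: 2, failed: 0, skipped: 1 },
    "extras": { total: 2, success: 2, failed: 0, skipped: 0 },
};

// Sums every counter and counts the directories that were visited.
function summarize(perDirectory: Record<string, DirStats>): DirStats & { directories: number } {
    const summary = { total: 0, success: 0, failed: 0, skipped: 0, directories: 0 };
    for (const name of Object.keys(perDirectory)) {
        const stats = perDirectory[name];
        summary.total += stats.total;
        summary.success += stats.success;
        summary.failed += stats.failed;
        summary.skipped += stats.skipped;
        summary.directories += 1;
    }
    return summary;
}

// { total: 25, success: 23, failed: 1, skipped: 1, directories: 5 }
const batchSummary = summarize(exampleRun);
```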

Backup Management

SubZilla provides flexible backup options to protect your original files:

# Basic backup creation
subzilla convert input.srt -b

# By default, subsequent runs overwrite the existing backup
# First run: creates input.srt.bak
# Second run: overwrites input.srt.bak (clean, no accumulation)

# Create numbered backups instead (legacy behavior)
subzilla convert input.srt -b --no-overwrite-backup
# First run: creates input.srt.bak
# Second run: creates input.srt.bak.1
# Third run: creates input.srt.bak.2

# Configure backup behavior in config file
# .subzillarc:
# output:
#   createBackup: true
#   overwriteBackup: false  # Creates numbered backups

Backup Behavior Summary:

  • overwriteBackup: true (default): Clean backup management - always overwrites existing backup
  • overwriteBackup: false: Legacy behavior - creates numbered backups (.bak.1, .bak.2, etc.)
  • CLI override: Use --no-overwrite-backup to temporarily disable backup overwriting
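
The two backup modes differ only in how the backup file name is chosen. A minimal sketch of that choice follows, assuming the naming scheme described above (`.bak`, then `.bak.1`, `.bak.2`, and so on). The `backupPath` function and its `exists` callback are hypothetical and are not SubZilla's actual implementation.

```typescript
// Hypothetical helper: picks the backup file name for a given input file.
// `exists` reports whether a path is already taken on disk.
function backupPath(
    inputPath: string,
    overwriteBackup: boolean,
    exists: (path: string) => boolean,
): string {
    const base = `${inputPath}.bak`;

    // Default mode, or no backup yet: always use the plain ".bak" name.
    if (overwriteBackup || !exists(base)) {
        return base;
    }
    // Numbered mode: find the first free ".bak.N" suffix, starting at 1.
    let n = 1;
    while (exists(`${base}.${n}`)) {
        n += 1;
    }
    return `${base}.${n}`;
}

// Simulated directory that already holds two backups.
const onDisk = ["input.srt.bak", "input.srt.bak.1"];
const isTaken = (path: string): boolean => onDisk.indexOf(path) !== -1;

const overwritten = backupPath("input.srt", true, isTaken);  // "input.srt.bak"
const numbered = backupPath("input.srt", false, isTaken);    // "input.srt.bak.2"
```

In a Node.js program, `fs.existsSync` could be passed as the `exists` callback.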

Advanced Options

# Specify output file (single file conversion)
subzilla convert input.srt -o output.srt

# Get help
subzilla --help

# Get version
subzilla --version

# Get help for specific command
subzilla convert --help
subzilla batch --help

Mac App 🖥️

SubZilla includes a native macOS desktop application built with Electron, featuring a drag-and-drop interface for easy subtitle conversion.

Running the Mac App

Development Mode:

# Build all packages first
yarn build

# Run the Mac app in development mode
yarn workspace @subzilla/mac dev

Building for Distribution:

# Build the Mac app (creates DMG and ZIP)
yarn workspace @subzilla/mac build

# Output files are in packages/mac/dist-electron/
# - Subzilla-<version>-arm64.dmg (Apple Silicon)
# - Subzilla-<version>-arm64-mac.zip (Portable)

Features

  • Drag and Drop: Simply drag subtitle files onto the app window
  • File Selection Dialog: Click to browse and select files
  • Preferences Window: Configure conversion settings
  • Auto-Updates: Automatic updates via GitHub releases
  • Native macOS Integration: Menu bar, dock icon, and system notifications

App Structure

The Mac app is located in packages/mac/ and includes:

  • src/main/ - Electron main process (window management, IPC handlers)
  • src/preload/ - Secure context bridge
  • src/renderer/ - User interface (HTML/CSS/JS)

Configuration 🔧

SubZilla supports flexible configuration through YAML files and environment variables. All settings are optional with sensible defaults.

Configuration Files

SubZilla looks for configuration files in the following order:

  1. Path specified via --config option
  2. .subzillarc in the current directory
  3. .subzilla.yml or .subzilla.yaml
  4. subzilla.config.yml or subzilla.config.yaml
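
The lookup order above amounts to "first match wins". The sketch below shows one way to express it. `CONFIG_CANDIDATES`, `resolveConfigPath`, and the `exists` callback are hypothetical names; the relative order of `.yml` and `.yaml` within a step, and returning an explicit `--config` path without checking it, are assumptions.

```typescript
// Candidate file names in the documented order. Assumption: within a step,
// the ".yml" variant is tried before the ".yaml" variant.
const CONFIG_CANDIDATES: string[] = [
    ".subzillarc",
    ".subzilla.yml",
    ".subzilla.yaml",
    "subzilla.config.yml",
    "subzilla.config.yaml",
];

// Hypothetical resolver: returns the config file to load, or null if none.
function resolveConfigPath(
    explicitPath: string | undefined,
    exists: (path: string) => boolean,
): string | null {
    // A path given via --config takes precedence over any discovered file.
    if (explicitPath !== undefined) {
        return explicitPath;
    }
    for (const candidate of CONFIG_CANDIDATES) {
        if (exists(candidate)) {
            return candidate;
        }
    }
    // Nothing found: the caller would fall back to built-in defaults.
    return null;
}
```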

Example Configurations

Several example configurations are provided in the examples/config directory:

  1. Full Configuration (.subzillarc):

    input:
        encoding: auto # auto, utf8, utf16le, utf16be, ascii, windows1256
        format: auto # auto, srt, sub, ass, ssa, txt
    
    output:
        directory: ./converted # Output directory path
        createBackup: true # Create backup of original files
        overwriteBackup: true # Overwrite existing backup files (default: true)
        format: srt # Output format
        encoding: utf8 # Always UTF-8
        bom: false # Add BOM to output files
        lineEndings: lf # lf, crlf, or auto
    
    # ... and more settings

  2. Minimal Configuration (minimal.subzillarc):

    input:
        encoding: auto
        format: auto
    
    output:
        directory: ./converted
        createBackup: true
        overwriteBackup: true # Overwrite existing backup files
        format: srt
    
    strip:
        html: true
        colors: true
        styles: true
    
    batch:
        recursive: true
        parallel: true
        skipExisting: true
        preserveStructure: true # Maintain directory structure
        chunkSize: 5

  3. Performance-Optimized (performance.subzillarc):

    output:
        createBackup: false # Skip backups
        overwriteBackup: true # When backups are created, overwrite existing ones
        overwriteInput: true # Overwrite input files
        overwriteExisting: true # Don't check existing files
    
    batch:
        parallel: true
        preserveStructure: false # Flat output structure
        chunkSize: 20 # Larger chunks
        retryCount: 0 # No retries
        failFast: true # Stop on first error

  4. Arabic-Optimized (arabic.subzillarc):

    input:
        encoding: windows1256 # Common Arabic encoding
    
    output:
        bom: true # Add BOM for compatibility
        lineEndings: crlf # Windows line endings
    
    batch:
        includeDirectories:
            - arabic
            - مسلسلات
            - أفلام

Environment Variables

You can also configure SubZilla using environment variables. Copy .env.example to .env and modify as needed:

# Input Settings
SUBZILLA_INPUT_ENCODING=utf8
SUBZILLA_INPUT_FORMAT=srt
SUBZILLA_INPUT_DEFAULT_LANGUAGE=ar

# Output Settings
SUBZILLA_OUTPUT_DIRECTORY=./output
SUBZILLA_OUTPUT_CREATE_BACKUP=true

# Complex settings use JSON
SUBZILLA_STRIP='{"html":true,"colors":true,"styles":true}'
SUBZILLA_BATCH_INCLUDE_DIRECTORIES='["movies","series"]'
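
One plausible way to turn these variables into a nested configuration object is sketched below. The mapping rule is an assumption inferred from the names above: the first segment after `SUBZILLA_` selects the section, the remaining segments form a camelCase key, and values are parsed as JSON when possible. `envToConfig` and `parseEnvValue` are hypothetical helpers, not SubZilla's actual loader.

```typescript
const ENV_PREFIX = "SUBZILLA_";

// Parses booleans, numbers, arrays, and objects as JSON; anything that is
// not valid JSON (for example "utf8" or "./output") stays a plain string.
function parseEnvValue(raw: string): unknown {
    try {
        return JSON.parse(raw);
    } catch {
        return raw;
    }
}

// Hypothetical mapper: SUBZILLA_OUTPUT_CREATE_BACKUP -> output.createBackup
function envToConfig(env: Record<string, string | undefined>): Record<string, unknown> {
    const config: Record<string, unknown> = {};
    for (const key of Object.keys(env)) {
        const raw = env[key];
        if (key.indexOf(ENV_PREFIX) !== 0 || raw === undefined) continue;

        const [section, ...rest] = key.slice(ENV_PREFIX.length).toLowerCase().split("_");
        if (!section) continue;
        const value = parseEnvValue(raw);

        // Variables like SUBZILLA_STRIP set a whole section at once.
        if (rest.length === 0) {
            config[section] = value;
            continue;
        }
        // Remaining segments become a camelCase field name.
        const field = rest
            .map((word, i) => (i === 0 ? word : word.charAt(0).toUpperCase() + word.slice(1)))
            .join("");
        const existing = config[section];
        const target = (typeof existing === "object" && existing !== null ? existing : {}) as Record<string, unknown>;
        target[field] = value;
        config[section] = target;
    }
    return config;
}
```

With the variables listed above, this sketch would yield `input.encoding`, `input.defaultLanguage`, `output.createBackup`, a `strip` object, and `batch.includeDirectories`.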

Configuration Priority

Settings are merged in the following order (later ones override earlier ones):

  1. Default values.
  2. Configuration file.
  3. Environment variables.
  4. Command-line arguments.
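
The priority order can be pictured as a layered deep merge in which each later layer overrides only the keys it sets. The sketch below is a generic illustration under that assumption; `mergeConfig` and `isPlainObject` are hypothetical helpers, and SubZilla's own merge logic may treat arrays or missing values differently.

```typescript
type PlainObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is PlainObject {
    return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merges layers from lowest to highest priority. Nested objects are merged
// key by key; arrays and scalar values are replaced as a whole.
function mergeConfig(...layers: PlainObject[]): PlainObject {
    const result: PlainObject = {};
    for (const layer of layers) {
        for (const key of Object.keys(layer)) {
            const incoming = layer[key];
            const current = result[key];
            if (incoming === undefined) continue; // unset keys never override
            result[key] =
                isPlainObject(current) && isPlainObject(incoming)
                    ? mergeConfig(current, incoming)
                    : incoming;
        }
    }
    return result;
}

// Illustrative layers in priority order: defaults, file, environment, CLI.
const effective = mergeConfig(
    { output: { createBackup: false, overwriteBackup: true, lineEndings: "lf" } },
    { output: { createBackup: true } },
    { output: { lineEndings: "crlf" } },
    { output: { overwriteBackup: false } },
);
// effective.output: { createBackup: true, overwriteBackup: false, lineEndings: "crlf" }
```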

Available Options

Input Options

  • encoding: Input file encoding (auto, utf8, utf16le, utf16be, ascii, windows1256).
  • format: Input format (auto, srt, sub, ass, ssa, txt).

Output Options

  • directory: Output directory path.
  • createBackup: Create backup of original files.
  • overwriteBackup: Overwrite existing backup files (default: true).
  • format: Output format.
  • encoding: Output encoding (always utf8).
  • bom: Add BOM to output files.
  • lineEndings: Line ending style (lf, crlf, auto).
  • overwriteInput: Overwrite input files.
  • overwriteExisting: Overwrite existing files.
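
The `bom` and `lineEndings` options affect only the final text that is written out. A minimal sketch of that last step is shown below. The `finalizeOutput` function is hypothetical, and treating `auto` as "leave line endings untouched" is an assumption, since this README does not define it.

```typescript
interface OutputTextOptions {
    bom: boolean;
    lineEndings: "lf" | "crlf" | "auto";
}

// Hypothetical final step before writing the converted subtitle text.
function finalizeOutput(text: string, options: OutputTextOptions): string {
    // Drop a BOM that survived decoding so it is never written twice.
    let body = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;

    if (options.lineEndings !== "auto") {
        // Normalize CRLF and lone CR to LF first, then expand if requested.
        const normalized = body.replace(/\r\n?/g, "\n");
        body = options.lineEndings === "crlf" ? normalized.replace(/\n/g, "\r\n") : normalized;
    }
    // U+FEFF becomes the bytes EF BB BF once the text is encoded as UTF-8.
    return options.bom ? "\uFEFF" + body : body;
}
```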

Strip Options

  • html: Remove HTML tags.
  • colors: Remove color codes.
  • styles: Remove style tags.
  • urls: Replace URLs with [URL].
  • timestamps: Replace timestamps with [TIMESTAMP].
  • numbers: Replace numbers with #.
  • punctuation: Remove punctuation.
  • emojis: Replace emojis with [EMOJI].
  • brackets: Remove brackets.
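
Two of the strip options can be approximated with regular expressions, as in the sketch below. The `stripHtml` and `stripUrls` functions and their patterns are simplified illustrations written for this example; they are not SubZilla's actual patterns and will not cover every edge case.

```typescript
// Removes anything that looks like an opening or closing HTML tag, such as
// <i>, </b>, or <font color="#ff0000">. A "<" that is not followed by a
// letter (for example in "1 < 2") is left alone.
function stripHtml(text: string): string {
    return text.replace(/<\/?[a-zA-Z][^>]*>/g, "");
}

// Replaces http and https URLs with the literal placeholder "[URL]".
function stripUrls(text: string): string {
    return text.replace(/https?:\/\/[^\s<>"]+/g, "[URL]");
}

const cue = '<i>Subtitles by</i> <font color="#ff0000">https://example.com/subs</font>';
const cleaned = stripHtml(stripUrls(cue)); // "Subtitles by [URL]"
```

Replacing URLs before removing tags keeps the `[URL]` placeholder visible for links that appear as text between tags; a URL that only appears inside a tag attribute disappears together with the tag.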

Batch Options

  • recursive: Process subdirectories.
  • parallel: Process files in parallel.
  • skipExisting: Skip existing UTF-8 files.
  • maxDepth: Maximum directory depth.
  • includeDirectories: Only process these directories.
  • excludeDirectories: Skip these directories.
  • preserveStructure: Maintain directory structure.
  • chunkSize: Files per batch.
  • retryCount: Number of retry attempts.
  • retryDelay: Delay between retries (ms).
  • failFast: Stop on first error.
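
The `parallel` and `chunkSize` options suggest a pattern in which files are processed concurrently in fixed-size groups. The sketch below illustrates that pattern under this assumption. `chunk` and `processInChunks` are hypothetical helpers; retries and retry delays are omitted for brevity.

```typescript
// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
    if (size < 1 || Math.floor(size) !== size) {
        throw new RangeError("size must be a positive integer");
    }
    const groups: T[][] = [];
    for (let i = 0; i < items.length; i += size) {
        groups.push(items.slice(i, i + size));
    }
    return groups;
}

// Runs `worker` on every item. Items inside one group run concurrently,
// while groups run one after another, which caps the number of files that
// are in flight at the same time.
async function processInChunks<T, R>(
    items: T[],
    chunkSize: number,
    worker: (item: T) => Promise<R>,
): Promise<R[]> {
    const results: R[] = [];
    for (const group of chunk(items, chunkSize)) {
        // Promise.all rejects on the first failure, which resembles
        // failFast: true. Collecting errors instead would need a per-item catch.
        const groupResults = await Promise.all(group.map((item) => worker(item)));
        results.push(...groupResults);
    }
    return results;
}
```

A larger `chunkSize` raises concurrency at the cost of more open files and memory at once, which matches the trade-off implied by the performance-optimized example configuration.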

Architecture 🏗️

SubZilla follows a modular monorepo architecture with clear separation of concerns:

Package Dependencies

@subzilla/cli              @subzilla/mac
    ├── @subzilla/core         ├── @subzilla/core
    │   └── @subzilla/types    │   └── @subzilla/types
    └── @subzilla/types        └── @subzilla/types

  • @subzilla/types: Foundation package with no dependencies
  • @subzilla/core: Depends on types, provides core functionality
  • @subzilla/cli: Depends on both core and types, provides command-line interface
  • @subzilla/mac: Depends on both core and types, provides macOS desktop application
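
Building "in dependency order" means that a package is built only after everything it depends on. For the graph above, that order can be derived with a depth-first topological sort, as sketched here. The `buildOrder` function is a generic illustration; in the real project the ordering would presumably come from the workspace tooling and TypeScript project references rather than from hand-written code.

```typescript
// The dependency graph described above: package name -> direct dependencies.
const packageGraph: Record<string, string[]> = {
    "@subzilla/types": [],
    "@subzilla/core": ["@subzilla/types"],
    "@subzilla/cli": ["@subzilla/core", "@subzilla/types"],
    "@subzilla/mac": ["@subzilla/core", "@subzilla/types"],
};

// Depth-first topological sort: every package appears after its dependencies.
function buildOrder(graph: Record<string, string[]>): string[] {
    const order: string[] = [];
    const state: Record<string, "visiting" | "done"> = {};

    const visit = (name: string): void => {
        if (state[name] === "done") return;
        if (state[name] === "visiting") {
            throw new Error(`Dependency cycle detected at ${name}`);
        }
        state[name] = "visiting";
        for (const dependency of graph[name] ?? []) {
            visit(dependency);
        }
        state[name] = "done";
        order.push(name);
    };

    for (const name of Object.keys(graph)) {
        visit(name);
    }
    return order;
}

// ["@subzilla/types", "@subzilla/core", "@subzilla/cli", "@subzilla/mac"]
const order = buildOrder(packageGraph);
```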

Key Design Principles

  • SOLID Principles: Single responsibility, open/closed, Liskov substitution, interface segregation, dependency inversion
  • YAGNI: You Aren't Gonna Need It - avoid over-engineering
  • KISS: Keep It Simple, Stupid - prioritize simplicity and clarity
  • DRY: Don't Repeat Yourself - shared code in appropriate packages

TypeScript Project References

The monorepo uses TypeScript project references for:

  • Faster incremental builds
  • Better IDE support
  • Proper dependency tracking
  • Type-safe cross-package imports

Development 🛠️

Project Structure

SubZilla is organized as a Yarn Workspaces monorepo with four main packages:

subzilla/
├── packages/
│   ├── cli/              # @subzilla/cli - Command-line interface
│   │   ├── src/
│   │   │   ├── commands/ # CLI command implementations
│   │   │   ├── constants/# Shared CLI options
│   │   │   ├── registry/ # Command registration system
│   │   │   ├── utils/    # CLI utilities
│   │   │   └── main.ts   # CLI entry point
│   │   └── package.json
│   ├── core/             # @subzilla/core - Core processing logic
│   │   ├── src/
│   │   │   ├── utils/    # Output strategies
│   │   │   ├── *.ts      # Core services and processors
│   │   │   └── index.ts  # Package exports
│   │   └── package.json
│   ├── mac/              # @subzilla/mac - macOS desktop application
│   │   ├── src/
│   │   │   ├── main/     # Electron main process
│   │   │   ├── preload/  # Context bridge
│   │   │   └── renderer/ # UI (HTML/CSS/JS)
│   │   └── package.json
│   └── types/            # @subzilla/types - TypeScript definitions
│       ├── src/
│       │   ├── cli/      # CLI-related types
│       │   ├── core/     # Core functionality types
│       │   ├── index.ts  # Main exports
│       │   └── validation.ts # Zod schemas
│       └── package.json
├── examples/             # Configuration examples
├── package.json          # Workspace root configuration
└── tsconfig.json         # TypeScript project references

Package Documentation

Each package has comprehensive documentation:

Testing 🧪

SubZilla includes a Jest test suite with 83 passing tests across all packages:

# Run all tests
yarn test

# Test specific package
yarn workspace @subzilla/core test
yarn workspace @subzilla/cli test
yarn workspace @subzilla/types test

Test Coverage:

  • @subzilla/types (13 tests): Zod schema validation, configuration validation
  • @subzilla/core (57 tests): Encoding detection/conversion, formatting stripping, end-to-end processing
  • @subzilla/cli (13 tests): Command registration, CLI parsing, error handling

Key Features:

  • Multi-project Jest setup with TypeScript support
  • Real file system testing with temporary directories
  • CLI integration tests using execSync
  • Proper TypeScript mocking with generic type annotations
  • Arabic text encoding tests for Windows-1256 support
  • CI/CD integration with GitHub Actions

Available Scripts

Workspace-level scripts:

  • yarn build: Build all packages in dependency order
  • yarn start: Run the SubZilla CLI
  • yarn dev: Development mode with watch for all packages
  • yarn test: Run tests across all packages
  • yarn type-check: TypeScript type checking for all packages
  • yarn lint: Run linter across all packages
  • yarn lint:fix: Fix linting issues across all packages
  • yarn format: Format code using Prettier across all packages
  • yarn format:check: Check code formatting across all packages
  • yarn clean: Clean all build artifacts

Package-specific scripts:

# Build specific package
yarn workspace @subzilla/core build

# Run CLI directly
yarn workspace @subzilla/cli start

# Develop specific package
yarn workspace @subzilla/types dev

Monorepo Benefits

The workspace structure provides several advantages:

  • Shared Dependencies: Common dependencies are hoisted to the root, reducing duplication
  • Type Safety: Cross-package imports are fully type-checked at compile time
  • Atomic Changes: Related changes across packages can be made in a single commit
  • Consistent Tooling: Shared linting, formatting, and build configurations
  • Simplified Development: Single yarn install and yarn build for the entire project

Contributing

  1. Fork the repository

  2. Clone your fork and install dependencies

    git clone https://github.com/your-username/subzilla.git
    cd subzilla
    yarn install

  3. Create your feature branch

    git checkout -b feature/amazing-feature

  4. Make your changes

    • Follow the existing code style and patterns
    • Add tests for new functionality
    • Update documentation as needed
    • Ensure all packages build successfully: yarn build

  5. Test your changes

    yarn build
    yarn test
    yarn lint
    yarn type-check

  6. Commit your changes

    git commit -m 'Add some amazing feature'

  7. Push to your branch

    git push origin feature/amazing-feature

  8. Open a Pull Request

Development Workflow

# Start development mode (watches all packages)
yarn dev

# Build specific package
yarn workspace @subzilla/core build

# Test specific package
yarn workspace @subzilla/cli test

# Run CLI during development
yarn start --help

# Clean and rebuild everything
yarn clean
yarn build

License 📝

This project is licensed under the ISC License - see the LICENSE file for details.

Support 💪

If you encounter any issues or have questions, please:

  1. Check the issues page
  2. Create a new issue if your problem isn't already listed
  3. Provide as much detail as possible, including:
    • SubZilla version
    • Node.js version
    • Operating system
    • Sample subtitle file (if possible)

Acknowledgments 🙏

  • Thanks to all contributors.
  • Inspired by the need for better subtitle encoding support.
  • Built with TypeScript and Node.js.

Further Enhancements 🚀

Planned improvements and feature additions:

  1. Enhanced Format Support

    • Add support for .ass and .ssa subtitle formats
    • Handle multiple subtitle files in batch
    • Support subtitle format conversion (SRT ↔ ASS ↔ SSA)
    • Add WebVTT format support
    • Support subtitle timing synchronization
  2. User Interface & Experience

    • Interactive CLI mode with comprehensive commands
    • Progress bars for batch operations
    • Create a web interface for browser-based conversion
    • Build a native macOS app using Electron
    • Add drag-and-drop GUI interface
    • Implement real-time encoding preview
  3. Performance & Reliability

    • Parallel processing for batch operations
    • Configurable chunk size for parallel processing
    • Retry mechanism for failed conversions
    • Batch processing progress tracking and statistics
    • Memory usage optimization for large files
    • Streaming processing for very large subtitle files
    • Performance benchmarking and profiling tools
    • Caching mechanism for repeated operations
  4. Advanced Features

    • Comprehensive subtitle validation with Zod schemas
    • Extensive formatting stripping (HTML, colors, styles, emojis)
    • Subtitle timing adjustment and synchronization
    • Subtitle merging and splitting
    • Character encoding preview and detection confidence
    • JSON/CSV export for batch processing results
    • AI-powered subtitle translation integration
    • Subtitle quality analysis and scoring
  5. Developer Experience & Infrastructure

    • Comprehensive test suite (83 tests across all packages)
    • TypeScript monorepo with project references
    • Detailed API documentation for all packages
    • Configuration examples and templates
    • GitHub Actions CI/CD workflow
    • Automated release management
    • Performance regression testing
    • Docker containerization
    • Plugin system for custom processors
    • Webhook integration for automated workflows
  6. Integration & Ecosystem

    • VS Code extension for subtitle editing
    • API server mode for remote processing
    • Integration with popular media players
    • Cloud storage integration (S3, Google Drive, Dropbox)
    • Batch processing via file watching
    • Integration with subtitle databases (OpenSubtitles, etc.)

Want to contribute to these enhancements? Check our Contributing section!
