Closed
Description
Currently, PyAV decodes subtitle streams into SubtitleSet objects. While this low-level access is very useful, there is no built-in high-level API for converting the decoded subtitles into a standard, human-readable text format (e.g., SRT or ASS). Such a feature would greatly simplify workflows for users who need to extract and process subtitle text from container files.
Use Case:
- I work with container formats like MKV that include embedded subtitles.
- I need to extract these subtitle streams and convert them into a standard text format (such as SRT) for further processing (e.g., for transcription, translation, or overlaying onto videos).
- Currently, I must manually parse the raw subtitle data from the SubtitleSet objects, which is error-prone and cumbersome.
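To illustrate the kind of manual parsing involved, here is a minimal sketch that pulls the plain text out of a single ASS event line. It assumes the line follows the layout FFmpeg uses for decoded ASS rects (ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text), and that you have already obtained it as a str (for example by decoding the bytes exposed on PyAV's AssSubtitle rects; check the attribute name against your PyAV version). The function name ass_dialogue_to_text is hypothetical and not part of PyAV.

```python
import re

# Matches ASS override blocks such as {\i1} or {\pos(10,20)}.
_OVERRIDE_TAGS = re.compile(r"\{[^}]*\}")


def ass_dialogue_to_text(ass_line: str) -> str:
    """Return the plain text of one ASS event line (hypothetical helper).

    Assumes the FFmpeg decoded-rect layout:
    ReadOrder,Layer,Style,Name,MarginL,MarginR,MarginV,Effect,Text
    so the text is everything after the eighth comma; the text itself
    may contain commas, hence maxsplit=8.
    """
    fields = ass_line.split(",", 8)
    # If the line does not have the expected nine fields, keep it as-is.
    text = fields[8] if len(fields) == 9 else ass_line
    # Drop styling/positioning override blocks.
    text = _OVERRIDE_TAGS.sub("", text)
    # \N is a hard line break; \n (soft break) and \h (hard space)
    # are approximated here as ordinary spaces.
    text = text.replace("\\N", "\n").replace("\\n", " ").replace("\\h", " ")
    return text.strip()
```

For example, the line r"0,0,Default,,0,0,0,,{\i1}Hello,{\i0} world\Nsecond line" yields "Hello, world\nsecond line". Even this small case has to handle commas inside the text, override tags, and escape sequences, which is the boilerplate a built-in API could absorb.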
Rationale:
- Ease of Use: Automating the conversion of raw subtitle data into a readable format would help reduce boilerplate code and simplify many common subtitle processing tasks.
- Wider Adoption: Many users coming from multimedia processing backgrounds expect a higher-level API for subtitle handling, similar to what the FFmpeg CLI offers (e.g., ffmpeg -i input.mkv -map 0:s:0 output.srt).
- Incremental Implementation: Even if full support for all subtitle formats isn’t feasible immediately (image-based formats such as PGS or VobSub would require OCR), a partial implementation covering the most common text-based formats (such as SRT and ASS) would be very beneficial.
Potential Implementation Ideas:
- Introduce a method (e.g., SubtitleSet.decode_text()) that converts the decoded subtitle data into text and returns it.
- Allow the method to either return the text as a string or write it directly to a file.
- Optionally accept parameters for choosing the output format and for handling details such as timing, formatting, and styling where applicable.
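The SRT side of the ideas above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not PyAV API: Cue, srt_timestamp, and cues_to_srt are hypothetical names, and each cue's start and end are assumed to be already converted to seconds by the caller (converting PyAV's subtitle timing fields is not shown). The optional out argument mirrors the "return a string or write to a file" idea.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, TextIO


@dataclass
class Cue:
    """One subtitle cue (hypothetical container for illustration)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def cues_to_srt(cues: Iterable[Cue], out: Optional[TextIO] = None) -> str:
    """Render cues as SRT text; if `out` is given, also write it there."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        # Each SRT block: sequence number, time range, text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    # Blocks are separated by a blank line.
    srt = "\n".join(blocks)
    if out is not None:
        out.write(srt)
    return srt
```

Usage: cues_to_srt([Cue(1.5, 3.25, "Hello")]) returns "1\n00:00:01,500 --> 00:00:03,250\nHello\n", and passing an open text file as out writes the same content to disk. Keeping the formatter independent of the container layer is one possible design: the same cues could feed an ASS writer instead.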
Questions & Discussion:
- Is automatic subtitle decoding considered within the intended scope of PyAV?
- Are there known technical challenges or design philosophies that would advise against adding such a feature?