Skip to content

Conversation

@cfsmp3
Copy link
Contributor

@cfsmp3 cfsmp3 commented Dec 28, 2025

Summary

  • Adds full VOBSUB (S_VOBSUB) subtitle extraction support for Matroska files
  • Previously CCExtractor would just print "Error: VOBSUB not supported" and produce empty files
  • Now generates proper .idx and .sub file pairs that work with VLC, FFmpeg, and other players
  • Includes documentation and Docker tooling for OCR conversion to SRT

Changes

Core VOBSUB extraction (src/lib_ccx/matroska.c):

  • Added PS Pack header generation with SCR derived from timestamps
  • Added PES header generation with PTS for subtitle timing
  • Added save_vobsub_track() function to write both .idx and .sub files
  • Added hex format specifier (LLX_M) for cross-platform file position formatting
  • Standard 2048-byte block alignment for proper VOBSUB format

Documentation and tooling:

  • Added docs/VOBSUB.md - comprehensive guide for VOBSUB extraction and OCR conversion
  • Added tools/vobsubocr/Dockerfile - Docker image for subtile-ocr (VOBSUB to SRT conversion)

Usage

# Extract VOBSUB from MKV
ccextractor movie.mkv
# Output: movie_eng.idx + movie_eng.sub

# Convert to SRT (using Docker)
docker build -t subtile-ocr tools/vobsubocr/
docker run --rm -v $(pwd):/data subtile-ocr -l eng -o /data/movie.srt /data/movie_eng.idx

Test plan

  • Tested with sample CEA-708 support files #178 from issue (bugs.mkv with VOBSUB tracks)
  • Output validates with FFprobe as dvd_subtitle codec
  • Timestamps match reference extraction (mkvextract)
  • File positions match reference extraction
  • SPU data is identical to reference extraction
  • OCR with subtile-ocr produces identical SRT to mkvextract reference

Fixes #1371

🤖 Generated with Claude Code

cfsmp3 and others added 3 commits December 28, 2025 10:02
Previously, CCExtractor would only print "Error: VOBSUB not supported"
when encountering VOBSUB (S_VOBSUB) subtitle tracks in Matroska files.
This left users without any usable output.

This commit adds full VOBSUB extraction support:
- Generate proper .idx index files with timestamps and file positions
- Generate proper .sub files with PS-wrapped SPU data
- Correct PS Pack header with SCR derived from timestamps
- Correct PES header with PTS for each subtitle
- 2048-byte block alignment (standard VOBSUB format)

The output is compatible with VLC, FFmpeg, and other players that
support VobSub subtitle format.

Tested with sample from issue #1371 - output validates correctly
with FFprobe and produces identical subtitle data to mkvextract.

Fixes #1371

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add docs/VOBSUB.md explaining the VOBSUB extraction workflow
- Add tools/vobsubocr/Dockerfile for building subtile-ocr OCR tool
- Document how to convert VOBSUB (.idx/.sub) to SRT using OCR

The Dockerfile uses subtile-ocr (https://github.com/gwen-lg/subtile-ocr),
an actively maintained fork of vobsubocr with better accuracy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MSVC doesn't support variable-length arrays (VLAs). The const int
declaration wasn't being treated as a compile-time constant,
causing Windows build failure with errors C2057, C2466, C2133.

Changed to #define which is a true compile-time constant.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ccextractor-bot
Copy link
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 5beb438...:
Report Name Tests Passed
Broken 13/13
CEA-708 14/14
DVB 7/7
DVD 3/3
DVR-MS 2/2
General 27/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 86/86
Teletext 21/21
WTV 13/13
XDS 34/34

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 1974a299f0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

All tests passed completely.

Check the result page for more info.

@ccextractor-bot
Copy link
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit ec30a79...:
Report Name Tests Passed
Broken 13/13
CEA-708 14/14
DVB 6/7
DVD 3/3
DVR-MS 2/2
General 25/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 80/86
Teletext 21/21
WTV 13/13
XDS 34/34

Your PR breaks these cases:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --out=spupng c83f765c66...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@cfsmp3 cfsmp3 merged commit 173db88 into master Dec 28, 2025
41 of 42 checks passed
@cfsmp3 cfsmp3 deleted the fix/issue-1371-mkv-vobsub-support branch December 28, 2025 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Odd output when inputting an MKV file

3 participants