@cfsmp3 cfsmp3 commented Dec 29, 2025

Summary

  • Fixes OCR only working for the first DVB subtitle stream when multiple streams exist in a program
  • Removes the initialized_ocr flag that incorrectly prevented OCR initialization for subsequent streams
  • Each DVB decoder now gets its own OCR context, matching DVD and VOBSUB decoder behavior

Root Cause

The initialized_ocr flag was stored at the program level (pinfo->initialized_ocr), shared across all DVB subtitle streams within a program. When the first DVB stream was initialized, it set this flag to 1. Subsequent streams saw the flag was set and skipped OCR initialization, leaving them with ocr_ctx = NULL.
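The failure mode described above can be reproduced in a few lines. The following is a minimal sketch, not the actual CCExtractor code: the struct layouts and the helpers `ocr_create`, `dvb_init_old`, and `dvb_init_new` are hypothetical stand-ins, and only the names `initialized_ocr` and `ocr_ctx` come from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real CCExtractor types. */
struct ocr_ctx { const char *lang; };
struct dvb_decoder { struct ocr_ctx *ocr_ctx; };
struct program_info { int initialized_ocr; /* program-wide flag removed by this PR */ };

/* Hypothetical helper: allocate one OCR context for one decoder. */
struct ocr_ctx *ocr_create(const char *lang)
{
    struct ocr_ctx *ctx = malloc(sizeof(*ctx));
    if (ctx != NULL)
        ctx->lang = lang;
    return ctx;
}

/* Old behaviour: a flag shared by every stream in the program gates OCR setup,
 * so only the first decoder to be initialized receives a context. */
void dvb_init_old(struct program_info *pinfo, struct dvb_decoder *dec, const char *lang)
{
    assert(pinfo != NULL && dec != NULL);
    dec->ocr_ctx = NULL;
    if (!pinfo->initialized_ocr) {
        dec->ocr_ctx = ocr_create(lang);
        pinfo->initialized_ocr = 1;
    }
}

/* New behaviour: no shared flag; every decoder owns its own OCR context. */
void dvb_init_new(struct dvb_decoder *dec, const char *lang)
{
    assert(dec != NULL);
    dec->ocr_ctx = ocr_create(lang);
}
```

Calling `dvb_init_old` twice with the same `program_info` leaves the second decoder with `ocr_ctx == NULL`, which corresponds to the "No captions were found" symptom; `dvb_init_new` gives each decoder a context regardless of initialization order.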

Test Plan

Tested with a multi-language DVB sample containing Finnish (PID 0xCDF) and Dutch (PID 0xCE0) subtitle streams:

Before fix:

$ ccextractor test.ts -datapid 0xCE0 -ocrlang dut -out=srt
No captions were found in input.

After fix:

$ ccextractor test.ts -datapid 0xCE0 -ocrlang dut -out=srt
# Successfully extracts 5 subtitles

Both streams now extract correctly:

  • First stream (0xCDF): 2 subtitles ✅
  • Second stream (0xCE0): 5 subtitles ✅ (was "No captions found" before)

Fixes #1067

🤖 Generated with Claude Code

cfsmp3 and others added 3 commits December 29, 2025 21:10
This commit fixes two issues:

1. ATSC CC data in private MPEG-2 streams (stream type 0x06) was not
   being processed. The code returned CCX_PRIVATE_MPEG2_CC buffer type
   which was never properly implemented - it just dumped debug output
   and returned placeholder bytes.

   Fix: Treat ATSC CC in private MPEG-2 streams the same as in
   user-private streams (0x80-0x8F) by returning CCX_PES buffer type.
   Both contain the same CC data format and should use the same
   processing path.

2. Several dump() calls were using CCX_DMT_GENERIC_NOTICES which is
   enabled by default, causing binary output to flood the terminal
   when processing certain files.

   Fix: Changed to appropriate debug-only masks (CCX_DMT_VERBOSE,
   CCX_DMT_PARSE) so binary dumps only appear when debug mode is
   explicitly enabled.
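Both fixes in this commit message can be sketched in isolation. The code below is an illustration under stated assumptions, not the CCExtractor implementation: the `SKETCH_*` enums mirror the `CCX_*` names mentioned above but use made-up values, and `atsc_cc_buffer_type` and `sketch_dump` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative values only; the real CCX_* constants live in CCExtractor's headers. */
enum sketch_buffer_type { SKETCH_UNKNOWN = 0, SKETCH_PES, SKETCH_PRIVATE_MPEG2_CC };

enum sketch_debug_mask {
    SKETCH_DMT_GENERIC_NOTICES = 1 << 0, /* assumed to be enabled by default */
    SKETCH_DMT_VERBOSE         = 1 << 1, /* debug only */
    SKETCH_DMT_PARSE           = 1 << 2  /* debug only */
};

/* Fix 1: buffer type for a stream already known to carry ATSC CC data.
 * Stream type 0x06 now shares the path used for 0x80-0x8F. */
enum sketch_buffer_type atsc_cc_buffer_type(unsigned stream_type)
{
    if (stream_type == 0x06)
        return SKETCH_PES; /* before the fix: SKETCH_PRIVATE_MPEG2_CC */
    if (stream_type >= 0x80 && stream_type <= 0x8F)
        return SKETCH_PES;
    return SKETCH_UNKNOWN;
}

/* Fix 2: hex-dump a buffer to stderr only when the requested mask is enabled.
 * Returns the number of bytes written, 0 if the mask is disabled. */
size_t sketch_dump(unsigned enabled_masks, unsigned mask,
                   const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if ((enabled_masks & mask) == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        int end_of_row = ((i + 1) % 16 == 0) || (i + 1 == len);
        fprintf(stderr, "%02X%s", data[i], end_of_row ? "\n" : " ");
    }
    return len;
}
```

With only the default notice mask enabled, a dump requested under `SKETCH_DMT_VERBOSE` or `SKETCH_DMT_PARSE` writes nothing, which is the intended effect of moving the `dump()` calls off the default mask.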

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add build_*/ pattern and linux/build_scan/ to ignore various build
output directories (build_ocr/, build_ocr_asan/, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Previously, the `initialized_ocr` flag was stored at the program level
and shared across all DVB subtitle streams within a program. This caused
OCR to only initialize for the first DVB stream, leaving subsequent
streams without an OCR context and unable to extract subtitles.

The fix removes the `initialized_ocr` flag entirely. Each DVB subtitle
decoder now gets its own OCR context, matching the behavior of DVD and
VOBSUB decoders which already worked correctly with multiple streams.

Test results with multi-language DVB sample:
- Before: Second stream (0xCE0) → "No captions were found"
- After: Second stream (0xCE0) → 5 subtitles extracted correctly

Fixes CCExtractor#1067

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit fd15528...:
Report Name    Tests Passed
Broken         13/13
CEA-708        14/14
DVB            7/7
DVD            3/3
DVR-MS         2/2
General        27/27
Hardsubx       1/1
Hauppage       3/3
MP4            3/3
NoCC           10/10
Options        86/86
Teletext       21/21
WTV            13/13
XDS            34/34

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

All tests passed completely.

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 26e0f64...:
Report Name    Tests Passed
Broken         13/13
CEA-708        14/14
DVB            6/7
DVD            3/3
DVR-MS         2/2
General        25/27
Hardsubx       1/1
Hauppage       3/3
MP4            3/3
NoCC           10/10
Options        80/86
Teletext       21/21
WTV            13/13
XDS            34/34

Your PR breaks these cases:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --out=spupng c83f765c66...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@cfsmp3 cfsmp3 merged commit a9413a2 into CCExtractor:master Dec 29, 2025
24 of 25 checks passed
@cfsmp3 cfsmp3 deleted the fix/dvb-ocr-multi-stream-issue-1067 branch December 29, 2025 22:09
Development

Successfully merging this pull request may close these issues.

[BUG] OCR works only for first DVB subtitle stream (OCR context is not shared)
