@cfsmp3 cfsmp3 commented Dec 23, 2025

Summary

This PR implements multi-page teletext extraction as requested in issue #665. Users can now extract multiple teletext pages simultaneously, with each page output to a separate file.

New Features

  • Multiple --tpage arguments: Specify multiple teletext pages using repeated --tpage flags
    ccextractor input.mpg --tpage 397 --tpage 398 -o output.srt
  • Separate output files per page: Each page is extracted to its own file with a _pNNN suffix
    • output_p397.srt - Page 397 subtitles
    • output_p398.srt - Page 398 subtitles
  • Backward compatibility: Single-page extraction (one --tpage argument) works exactly as before, without any suffix
  • --tpages-all support: Auto-detect and extract all available teletext pages (up to 8)
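The naming rule above can be sketched in C as follows. This is a minimal illustration, not the code from this PR: the function name `page_output_name` and its parameters are hypothetical, and the three-digit zero padding is an assumption based on the `_pNNN` pattern.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the PR): writes the output name for one
 * teletext page into dst. When multi_page is 0 the base name is copied
 * unchanged, which mirrors the single-page, no-suffix behavior. */
static int page_output_name(char *dst, size_t cap, const char *base,
                            unsigned page, int multi_page)
{
    if (!multi_page)
        return snprintf(dst, cap, "%s", base);

    const char *slash = strrchr(base, '/');
    const char *dot = strrchr(base, '.');

    /* A dot inside a directory component is not a file extension. */
    if (dot != NULL && slash != NULL && dot < slash)
        dot = NULL;

    size_t stem = dot ? (size_t)(dot - base) : strlen(base);

    /* Insert _pNNN between the stem and the extension (if any). */
    return snprintf(dst, cap, "%.*s_p%03u%s",
                    (int)stem, base, page, dot ? dot : "");
}
```

With `base` set to `output.srt`, page 397 and `multi_page` non-zero, the sketch yields `output_p397.srt`; with `multi_page` set to 0 it yields `output.srt`.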

Implementation Details

  • Added user_pages vector to teletext config to store multiple requested pages
  • Created per-page output file management in encoder_ctx with on-demand file creation
  • Each page maintains its own SRT counter for correct subtitle numbering
  • Fixed BCD to decimal page number conversion in telxcc.c for correct file naming
  • Maximum of 8 simultaneous page extractions (configurable via MAX_TLT_PAGES_EXTRACT)
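To show why the BCD conversion matters for file naming, here is a minimal sketch, assuming the internal page number packs the magazine digit and the two page digits as one hex nibble each (so page 398 is held as 0x398). The function names are hypothetical and are not taken from telxcc.c.

```c
#include <stdint.h>

/* Hypothetical helper: 0x398 (BCD, one decimal digit per nibble) -> 398. */
static uint16_t page_bcd_to_dec(uint16_t bcd)
{
    return (uint16_t)(((bcd >> 8) & 0x0F) * 100 +
                      ((bcd >> 4) & 0x0F) * 10 +
                      (bcd & 0x0F));
}

/* Hypothetical inverse: 398 -> 0x398. */
static uint16_t page_dec_to_bcd(uint16_t dec)
{
    return (uint16_t)(((dec / 100) << 8) |
                      (((dec / 10) % 10) << 4) |
                      (dec % 10));
}
```

Without the conversion, formatting the raw value 0x398 as a decimal integer gives 920, so a file for page 398 would be named after the wrong number.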

Files Changed

| File | Changes |
|------|---------|
| src/rust/src/args.rs | Changed --tpage to accept multiple values via clap::ArgAction::Append |
| src/rust/src/parser.rs | Handle Vec<u16> for multiple pages, populate user_pages |
| src/rust/lib_ccxr/src/teletext.rs | Added user_pages field to TeletextConfig |
| src/rust/src/common.rs | Added user_pages to FFI teletext config struct |
| src/lib_ccx/teletext.h | Added user_pages array and count to C teletext config |
| src/lib_ccx/ccx_encoders_common.h | Added teletext output arrays and helper function declarations |
| src/lib_ccx/ccx_encoders_common.c | Implemented get_teletext_output(), get_teletext_srt_counter(), dinit_teletext_outputs() |
| src/lib_ccx/ccx_encoders_srt.c | Route teletext subtitles to per-page output files |
| src/lib_ccx/telxcc.c | Store decimal page numbers in subtitle metadata |
| src/lib_ccx/ccx_common_structs.h | Added teletext_page field to cc_subtitle struct |
| src/lib_ccx/lib_ccx.h | Added MAX_TLT_PAGES_EXTRACT constant |
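The per-page bookkeeping described in this PR (on-demand creation, one SRT counter per page, a cap of 8 pages with a fallback) can be sketched as below. This is an illustrative model only: the types and functions (`page_table`, `page_table_get`, `page_table_next_index`) and the `MAX_PAGES` macro are hypothetical and do not reproduce the signatures of the helpers in ccx_encoders_common.c. File handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES 8 /* illustrative stand-in for the limit of 8 pages */

struct page_slot {
    uint16_t page;        /* decimal page number, e.g. 398 */
    unsigned srt_counter; /* number of the last subtitle written for this page */
};

struct page_table {
    struct page_slot slots[MAX_PAGES];
    size_t used;
};

/* Returns the slot for a page, creating it on first use. Returns NULL when
 * all slots are taken, so the caller can fall back to the default output. */
static struct page_slot *page_table_get(struct page_table *t, uint16_t page)
{
    for (size_t i = 0; i < t->used; i++)
        if (t->slots[i].page == page)
            return &t->slots[i];

    if (t->used == MAX_PAGES)
        return NULL;

    struct page_slot *s = &t->slots[t->used++];
    s->page = page;
    s->srt_counter = 0;
    return s;
}

/* Each page numbers its own subtitles, starting at 1. */
static unsigned page_table_next_index(struct page_slot *s)
{
    return ++s->srt_counter;
}
```

Keeping the counter inside the slot is what lets every output file start its numbering at 1, independent of how subtitles from different pages interleave in the stream.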

Testing

Test Samples Used

All samples are from the teletext section of the CCExtractor Sample Platform.

Multi-Page Samples (Danish TV - DR1)

| Sample | Teletext Pages Available |
|--------|--------------------------|
| 5d5838bde97e2a8706890f19a1722fae97610c754518ac509acb7ff3776a29aa.mpg | 365, 369, 397, 398, 437, 565, 765, 965+ |
| 3b276ad8bf85741a65d8a36add8fbe990f8d11bfb2a908f2093174edced9baa0.mpg | 365, 369, 397, 398, 437, 465, 565, 665+ |
| b236a0590b02f8acffa00f18f3e5d62e68e0f5f461dc5e5e428cca3ebc0be7c5.mpg | 397, 398 |

Single-Page Samples

| Sample | Test Page |
|--------|-----------|
| 44c45593fb32e475fe5295046d97796b32ac00289ec4369bb216d1c705804f1a.mpg | 299 |
| b8c55aa2e9d6882b3d5f1fa57f3bb63fc8bf39a45bad42c0d5e5de1f2fbdf2e7.mpg | 299 |
| e639e5455049e9b94c89d7f2917d91c349b9eb8e4ced3ea5f0d5efa7fb56426e.ts | Auto-detect with --datapid 2310 |

How to Test

Test 1: Single Page Extraction (Backward Compatibility)

# Should create output.srt (no suffix) - existing behavior preserved
ccextractor 5d5838bde97e...mpg --autoprogram --tpage 398 -o output.srt

# Verify: Single file created
ls output.srt

Test 2: Multiple Explicit Pages

# Should create output_p397.srt and output_p398.srt
ccextractor 5d5838bde97e...mpg --autoprogram --tpage 397 --tpage 398 -o output.srt

# Verify: Two separate files with correct content
ls output_p*.srt
head -20 output_p397.srt
head -20 output_p398.srt

Test 3: Auto-Detect All Pages

# Should create separate files for each detected page (up to 8)
ccextractor 5d5838bde97e...mpg --autoprogram --tpages-all -o output.srt

# Verify: Multiple files created
ls output_p*.srt
# Expected: output_p365.srt, output_p369.srt, output_p397.srt, output_p398.srt, etc.

Test 4: Specific Page with --tpage (regression test)

# Test with sample that has page 299
ccextractor 44c45593fb32...mpg --autoprogram --tpage 299 -o output.srt

# Verify: Single file, correct subtitles
ls output.srt

Test Results

All tests passed successfully:

| Test | Status | Notes |
|------|--------|-------|
| Single page extraction | ✅ PASS | No suffix added, backward compatible |
| Multiple explicit pages | ✅ PASS | Correct _pNNN suffixes, separate content |
| Auto-detect all pages | ✅ PASS | Up to 8 pages extracted, overflow handled gracefully |
| Per-page SRT counters | ✅ PASS | Each file has correct sequential numbering starting at 1 |
| Page number display | ✅ PASS | Decimal page numbers (e.g., 398) not BCD |
| Existing teletext tests | ✅ PASS | All 21 sample platform teletext samples work |

Sample Output

Multiple pages extracted from 5d5838bde97e...mpg:

$ ccextractor sample.mpg --autoprogram --tpage 397 --tpage 398 -o /tmp/test.srt
$ ls /tmp/test_p*.srt
/tmp/test_p397.srt
/tmp/test_p398.srt

$ head -15 /tmp/test_p397.srt
1
02:19:03,085 --> 02:19:07,344
Vi kan sætte tre byggerier i gang.
Det betyder masser af arbejdspladser.

2
02:19:13,785 --> 02:19:18,644
- der skal øge dansk produktivitet.
Men det koster mindre handlende livet

$ head -15 /tmp/test_p398.srt
1
02:19:25,385 --> 02:19:29,424
- der skal øge dansk produktivitet.
Men det koster mindre handlende livet.

2
02:19:35,665 --> 02:19:40,444
I Kolding er der store planer
om et udvalgsvareudvalg -

Checklist

  • Code compiles without warnings
  • Backward compatibility maintained (single --tpage works as before)
  • All existing teletext regression tests pass
  • New feature tested with multiple samples
  • Memory properly freed in dinit_teletext_outputs()
  • Graceful handling when >8 pages detected (warning + fallback to default output)

Closes #665

🤖 Generated with Claude Code

cfsmp3 and others added 4 commits December 23, 2025 14:28
…665)

Implement support for extracting multiple teletext pages simultaneously,
with each page output to a separate file.

Changes:
- Support multiple --tpage arguments (e.g., --tpage 397 --tpage 398)
- Create separate output files per page with _pNNN suffix
  (e.g., output_p397.srt, output_p398.srt)
- Maintain backward compatibility for single-page extraction (no suffix)
- Add per-page SRT counters for correct subtitle numbering
- Fix BCD to decimal page number conversion in telxcc.c
- Add --tpages-all mode support for auto-detecting all pages

Tested with 21 teletext samples from the sample platform, all passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…heck

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The page update logic at lines 1029-1035 was incorrectly updating
tlt_config.page for all accepted pages, even in single-page auto-detect
mode. This caused the auto-detect logic at line 979 to be bypassed
because the first packet (even with an invalid page number like 0xFF)
would set tlt_config.page, preventing proper auto-detection.

The fix restricts the page update to multi-page mode only. In single-page
mode, tlt_config.page is set exclusively by:
1. User specification (--tpage option)
2. Auto-detect logic (first valid subtitle page found)

This fixes a regression in SP Test 76, which uses sample
8c1615c1a84d4b9b34134bde8085214bb93305407e935edcdfd4c2fc522c215f.mpg
with --autoprogram --out=ttxt --latin1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit d573548...:
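| Report Name | Tests Passed |
|-------------|--------------|
| Broken | 13/13 |
| CEA-708 | 14/14 |
| DVB | 7/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 27/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 86/86 |
| Teletext | 21/21 |
| WTV | 13/13 |
| XDS | 34/34 |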

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 1974a299f0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --xdsdebug --out=srt c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

All tests passing on the master branch were passed completely.

Check the result page for more info.

@cfsmp3 cfsmp3 merged commit fc230fc into master Dec 23, 2025
41 of 43 checks passed
@cfsmp3 cfsmp3 deleted the feat/issue-665-teletext-multi-page branch December 23, 2025 18:37
Development

Successfully merging this pull request may close these issues.

Request: Allow to extract several teletext pages in one pass