feat(teletext): Add multi-page extraction with separate output files (#665) #1886

cfsmp3 · 2025-12-23T13:36:39Z

Summary

This PR implements multi-page teletext extraction as requested in issue #665. Users can now extract multiple teletext pages simultaneously, with each page output to a separate file.

New Features

Multiple --tpage arguments: Specify multiple teletext pages using repeated --tpage flags
```
ccextractor input.mpg --tpage 397 --tpage 398 -o output.srt
```
Separate output files per page: Each page is extracted to its own file with a _pNNN suffix
- output_p397.srt - Page 397 subtitles
- output_p398.srt - Page 398 subtitles
Backward compatibility: Single-page extraction (one --tpage argument) works exactly as before, without any suffix
--tpages-all support: Auto-detect and extract all available teletext pages (up to 8)

Implementation Details

Added user_pages vector to teletext config to store multiple requested pages
Created per-page output file management in encoder_ctx with on-demand file creation
Each page maintains its own SRT counter for correct subtitle numbering
Fixed BCD to decimal page number conversion in telxcc.c for correct file naming
Maximum of 8 simultaneous page extractions (configurable via MAX_TLT_PAGES_EXTRACT)

Files Changed

File	Changes
`src/rust/src/args.rs`	Changed `--tpage` to accept multiple values via `clap::ArgAction::Append`
`src/rust/src/parser.rs`	Handle `Vec<u16>` for multiple pages, populate `user_pages`
`src/rust/lib_ccxr/src/teletext.rs`	Added `user_pages` field to `TeletextConfig`
`src/rust/src/common.rs`	Added `user_pages` to FFI teletext config struct
`src/lib_ccx/teletext.h`	Added `user_pages` array and count to C teletext config
`src/lib_ccx/ccx_encoders_common.h`	Added teletext output arrays and helper function declarations
`src/lib_ccx/ccx_encoders_common.c`	Implemented `get_teletext_output()`, `get_teletext_srt_counter()`, `dinit_teletext_outputs()`
`src/lib_ccx/ccx_encoders_srt.c`	Route teletext subtitles to per-page output files
`src/lib_ccx/telxcc.c`	Store decimal page numbers in subtitle metadata
`src/lib_ccx/ccx_common_structs.h`	Added `teletext_page` field to `cc_subtitle` struct
`src/lib_ccx/lib_ccx.h`	Added `MAX_TLT_PAGES_EXTRACT` constant

Testing

Test Samples Used

All samples are from the CCExtractor Sample Platform teletext section (view samples).

Multi-Page Samples (Danish TV - DR1)

Sample	Teletext Pages Available
`5d5838bde97e2a8706890f19a1722fae97610c754518ac509acb7ff3776a29aa.mpg`	365, 369, 397, 398, 437, 565, 765, 965+
`3b276ad8bf85741a65d8a36add8fbe990f8d11bfb2a908f2093174edced9baa0.mpg`	365, 369, 397, 398, 437, 465, 565, 665+
`b236a0590b02f8acffa00f18f3e5d62e68e0f5f461dc5e5e428cca3ebc0be7c5.mpg`	397, 398

Single-Page Samples

Sample	Test Page
`44c45593fb32e475fe5295046d97796b32ac00289ec4369bb216d1c705804f1a.mpg`	299
`b8c55aa2e9d6882b3d5f1fa57f3bb63fc8bf39a45bad42c0d5e5de1f2fbdf2e7.mpg`	299
`e639e5455049e9b94c89d7f2917d91c349b9eb8e4ced3ea5f0d5efa7fb56426e.ts`	Auto-detect with `--datapid 2310`

How to Test

Test 1: Single Page Extraction (Backward Compatibility)

# Should create output.srt (no suffix) - existing behavior preserved
ccextractor 5d5838bde97e...mpg --autoprogram --tpage 398 -o output.srt

# Verify: Single file created
ls output.srt

Test 2: Multiple Explicit Pages

# Should create output_p397.srt and output_p398.srt
ccextractor 5d5838bde97e...mpg --autoprogram --tpage 397 --tpage 398 -o output.srt

# Verify: Two separate files with correct content
ls output_p*.srt
head -20 output_p397.srt
head -20 output_p398.srt

Test 3: Auto-Detect All Pages

# Should create separate files for each detected page (up to 8)
ccextractor 5d5838bde97e...mpg --autoprogram --tpages-all -o output.srt

# Verify: Multiple files created
ls output_p*.srt
# Expected: output_p365.srt, output_p369.srt, output_p397.srt, output_p398.srt, etc.

Test 4: Specific Page with --tpage (regression test)

# Test with sample that has page 299
ccextractor 44c45593fb32...mpg --autoprogram --tpage 299 -o output.srt

# Verify: Single file, correct subtitles
ls output.srt

Test Results

All tests passed successfully:

Test	Status	Notes
Single page extraction	✅ PASS	No suffix added, backward compatible
Multiple explicit pages	✅ PASS	Correct `_pNNN` suffixes, separate content
Auto-detect all pages	✅ PASS	Up to 8 pages extracted, overflow handled gracefully
Per-page SRT counters	✅ PASS	Each file has correct sequential numbering starting at 1
Page number display	✅ PASS	Decimal page numbers (e.g., 398) not BCD
Existing teletext tests	✅ PASS	All 21 sample platform teletext samples work

Sample Output

Multiple pages extracted from 5d5838bde97e...mpg:

$ ccextractor sample.mpg --autoprogram --tpage 397 --tpage 398 -o /tmp/test.srt
$ ls /tmp/test_p*.srt
/tmp/test_p397.srt
/tmp/test_p398.srt

$ head -15 /tmp/test_p397.srt
1
02:19:03,085 --> 02:19:07,344
Vi kan sætte tre byggerier i gang.
Det betyder masser af arbejdspladser.

2
02:19:13,785 --> 02:19:18,644
- der skal øge dansk produktivitet.
Men det koster mindre handlende livet

$ head -15 /tmp/test_p398.srt
1
02:19:25,385 --> 02:19:29,424
- der skal øge dansk produktivitet.
Men det koster mindre handlende livet.

2
02:19:35,665 --> 02:19:40,444
I Kolding er der store planer
om et udvalgsvareudvalg -

Checklist

Code compiles without warnings
Backward compatibility maintained (single --tpage works as before)
All existing teletext regression tests pass
New feature tested with multiple samples
Memory properly freed in dinit_teletext_outputs()
Graceful handling when >8 pages detected (warning + fallback to default output)

Closes #665

🤖 Generated with Claude Code

…665) Implement support for extracting multiple teletext pages simultaneously, with each page output to a separate file. Changes: - Support multiple --tpage arguments (e.g., --tpage 397 --tpage 398) - Create separate output files per page with _pNNN suffix (e.g., output_p397.srt, output_p398.srt) - Maintain backward compatibility for single-page extraction (no suffix) - Add per-page SRT counters for correct subtitle numbering - Fix BCD to decimal page number conversion in telxcc.c - Add --tpages-all mode support for auto-detecting all pages Tested with 21 teletext samples from the sample platform, all passing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…heck 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The page update logic at line 1029-1035 was incorrectly updating tlt_config.page for all accepted pages, even in single-page auto-detect mode. This caused the auto-detect logic at line 979 to be bypassed because the first packet (even with an invalid page number like 0xFF) would set tlt_config.page, preventing proper auto-detection. The fix restricts the page update to multi-page mode only. In single-page mode, tlt_config.page is set exclusively by: 1. User specification (--tpage option) 2. Auto-detect logic (first valid subtitle page found) This fixes regression in SP Test 76 which uses sample 8c1615c1a84d4b9b34134bde8085214bb93305407e935edcdfd4c2fc522c215f.mpg with --autoprogram --out=ttxt --latin1. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ccextractor-bot · 2025-12-23T15:48:50Z

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit d573548...:

Report Name	Tests Passed
Broken	13/13
CEA-708	14/14
DVB	7/7
DVD	3/3
DVR-MS	2/2
General	27/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	86/86
Teletext	21/21
WTV	13/13
XDS	34/34

Congratulations: Merging this PR would fix the following tests:

ccextractor --autoprogram --out=ttxt --latin1 1974a299f0..., Last passed: Never
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
ccextractor --xdsdebug --out=srt c83f765c66..., Last passed: Never
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

All tests passing on the master branch were passed completely.

Check the result page for more info.

cfsmp3 and others added 4 commits December 23, 2025 14:28

fix(clippy): Use RangeInclusive::contains() instead of manual range c…

cbb5f0b

…heck 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs: Add doxygen comments to should_accept_page function

1d9f322

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

cfsmp3 mentioned this pull request Dec 23, 2025

docs: Add Upcoming section to changelog #1887

Merged

cfsmp3 merged commit fc230fc into master Dec 23, 2025
41 of 43 checks passed

cfsmp3 deleted the feat/issue-665-teletext-multi-page branch December 23, 2025 18:37

cfsmp3 mentioned this pull request Dec 24, 2025

[BUG] Differences between GitHub comment and actual results CCExtractor/sample-platform#535

Closed

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(teletext): Add multi-page extraction with separate output files (#665) #1886

feat(teletext): Add multi-page extraction with separate output files (#665) #1886

Uh oh!

cfsmp3 commented Dec 23, 2025

Uh oh!

ccextractor-bot commented Dec 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat(teletext): Add multi-page extraction with separate output files (#665) #1886

feat(teletext): Add multi-page extraction with separate output files (#665) #1886

Uh oh!

Conversation

cfsmp3 commented Dec 23, 2025

Summary

New Features

Implementation Details

Files Changed

Testing

Test Samples Used

Multi-Page Samples (Danish TV - DR1)

Single-Page Samples

How to Test

Test 1: Single Page Extraction (Backward Compatibility)

Test 2: Multiple Explicit Pages

Test 3: Auto-Detect All Pages

Test 4: Specific Page with --tpage (regression test)

Test Results

Sample Output

Checklist

Uh oh!

ccextractor-bot commented Dec 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants