How It Works: Binary Data to Video

This project implements a system for storing arbitrary binary data within a video stream. Here is the technical breakdown of the process.

1. Encoding Process

The encoder treats a video frame as a grid of data "blocks".

The Grid

Given a resolution (e.g., 1280x720) and a block_size (e.g., 16 pixels):

  • The frame is divided into (1280/16) * (720/16) blocks.
  • With a block size of 16, this creates an 80x45 grid, giving 3,600 bits (450 bytes) of raw capacity per frame, including the frame header.
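
The arithmetic above can be checked with a short sketch. The function name `grid_capacity` is hypothetical, and the use of floor division for resolutions that are not a multiple of the block size is an assumption. The payload figure subtracts only the 16-byte frame header and ignores any error-correction overhead.

```python
def grid_capacity(width, height, block_size, header_bytes=16):
    """Return (columns, rows, bits_per_frame, payload_bytes_per_frame)."""
    # Assumption: partial blocks at the right/bottom edge are simply unused.
    cols = width // block_size
    rows = height // block_size
    bits = cols * rows  # one bit per block
    payload_bytes = bits // 8 - header_bytes
    return cols, rows, bits, payload_bytes


print(grid_capacity(1280, 720, 16))  # (80, 45, 3600, 434)
```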

Data Mapping

  • Each bit of binary data is mapped to a block.
  • Bit 1: The entire block is filled with white pixels (value 255).
  • Bit 0: The entire block is filled with black pixels (value 0).
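
As an illustration of this mapping, the sketch below turns a byte string into one grayscale value per block. The helper name `bytes_to_block_values`, the most-significant-bit-first ordering, the row-major layout, and the zero padding of unused blocks are all assumptions, not details confirmed by the project.

```python
import numpy as np


def bytes_to_block_values(data, cols, rows):
    """Map each bit of `data` to one block value: 255 for a 1, 0 for a 0."""
    # np.unpackbits yields bits most-significant first (assumed bit order).
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    capacity = cols * rows
    if bits.size > capacity:
        raise ValueError("data does not fit in one frame")
    padded = np.zeros(capacity, dtype=np.uint8)  # unused blocks stay black
    padded[:bits.size] = bits
    # Row-major layout: the first `cols` bits fill the top row of blocks.
    return (padded * 255).reshape(rows, cols)


print(bytes_to_block_values(b"\xa5", cols=8, rows=1))
```

The byte 0xA5 is 10100101 in binary, so the single row alternates white, black, white, black, black, white, black, white.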

Frame Header

To ensure the data can be reassembled correctly and verified, each frame includes a 16-byte header:

  1. Frame Index (4B): The sequence number of the frame.
  2. Total Frames (4B): The total number of frames in the sequence.
  3. Data Length (4B): The number of actual data bytes in this specific frame.
  4. CRC32 Checksum (4B): A checksum of the payload data for integrity verification.
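
A header with these four fields could be packed as follows. This is a minimal sketch: the big-endian byte order, the unsigned 32-bit field types, and the name `build_frame_bytes` are assumptions, since the text only gives the field sizes and their order.

```python
import struct
import zlib

# Assumption: four big-endian unsigned 32-bit integers, in the listed order.
HEADER_FORMAT = ">IIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def build_frame_bytes(frame_index, total_frames, payload):
    """Prefix `payload` with index, total, length, and CRC32 fields."""
    checksum = zlib.crc32(payload)  # CRC covers the payload only
    header = struct.pack(
        HEADER_FORMAT, frame_index, total_frames, len(payload), checksum
    )
    return header + payload


frame = build_frame_bytes(0, 1, b"hello")
print(len(frame))  # 21: 16 header bytes + 5 payload bytes
```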

2. Dealing with Compression

The primary challenge of storing data on YouTube is lossy compression. Video platforms re-encode uploads with aggressive codecs (H.264, VP9, AV1) that discard high-frequency detail to save space.

Why 16px Blocks?

If we used 1x1 pixel blocks, video compression would blur neighboring pixels together, making it impossible to reliably distinguish a 0 from a 1. Larger blocks (16x16) add redundancy: each bit is carried by 256 pixels, so even if the edges of a block are blurred or show compression artifacts, its center remains clearly black or white.
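
The effect can be demonstrated with a toy experiment. The sketch below uses a 5x5 box blur as a crude stand-in for compression loss. This is an assumption made for illustration, since real codecs behave very differently. The function names are hypothetical.

```python
import numpy as np


def box_blur(frame, radius):
    """Average each pixel with its neighbors (edges wrap around)."""
    acc = np.zeros(frame.shape, dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            acc += np.roll(frame, (dy, dx), axis=(0, 1))
    return acc / (2 * radius + 1) ** 2


def bit_errors_after_blur(bits, block_size, radius=2):
    """Render bits as blocks, blur, sample block centers, count wrong bits."""
    frame = np.repeat(np.repeat(bits * 255, block_size, axis=0), block_size, axis=1)
    blurred = box_blur(frame, radius)
    half = block_size // 2
    recovered = blurred[half::block_size, half::block_size] > 128
    return int(np.count_nonzero(recovered != bits.astype(bool)))


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(18, 32), dtype=np.uint8)
print("errors with 16px blocks:", bit_errors_after_blur(bits, 16))
print("errors with 1px blocks: ", bit_errors_after_blur(bits, 1))
```

With 16px blocks the blur window around each center pixel stays inside the block, so every bit is recovered. With 1px blocks each sample is an average of 25 unrelated bits.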

Adaptive Block Sizing

The system can automatically choose an optimal block size based on the input file size when --block-size 0 is specified:

  • Small files (<100KB): Uses 32px blocks for extreme robustness.
  • Medium files: Uses 16px blocks (default).
  • Large files (>10MB): Uses 8px blocks to maximize density.
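
The selection rule above amounts to two threshold checks. In this sketch the function name `choose_block_size` is hypothetical, and treating KB and MB as powers of 1024 is an assumption. The project may use decimal units instead.

```python
KB = 1024        # assumption: binary units
MB = 1024 * KB


def choose_block_size(file_size_bytes):
    """Pick a block size from the input file size, per the tiers above."""
    if file_size_bytes < 100 * KB:
        return 32  # small file: favor robustness
    if file_size_bytes > 10 * MB:
        return 8   # large file: favor density
    return 16      # default


print(choose_block_size(50_000), choose_block_size(5 * MB), choose_block_size(20 * MB))
# 32 16 8
```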

During decoding, if --block-size 0 is passed, the decoder attempts to "probe" the first frame with various common block sizes (8, 16, 32, etc.) until it finds one that yields a valid header and passing CRC.
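
A probing loop of this kind might look like the sketch below. It assumes a grayscale frame as a NumPy array, a big-endian 16-byte header of four unsigned 32-bit fields, and a CRC32 over the payload only. The names `frame_to_bytes` and `probe_block_size` and the extra plausibility check on the frame index are illustrative additions, not the project's code.

```python
import struct
import zlib

import numpy as np

HEADER = struct.Struct(">IIII")  # assumed layout: index, total, length, crc32


def frame_to_bytes(gray, block_size):
    """Sample the center of every block and pack the bits into bytes."""
    rows = gray.shape[0] // block_size
    cols = gray.shape[1] // block_size
    half = block_size // 2
    samples = gray[half:rows * block_size:block_size,
                   half:cols * block_size:block_size]
    bits = (samples > 128).astype(np.uint8).ravel()
    return np.packbits(bits).tobytes()


def probe_block_size(gray, candidates=(8, 16, 32)):
    """Return the first candidate that yields a valid header and CRC."""
    for block_size in candidates:
        raw = frame_to_bytes(gray, block_size)
        if len(raw) < HEADER.size:
            continue
        index, total, length, crc = HEADER.unpack_from(raw)
        payload = raw[HEADER.size:HEADER.size + length]
        # Reject headers that cannot be real before trusting the CRC.
        plausible = index < total and len(payload) == length
        if plausible and zlib.crc32(payload) == crc:
            return block_size
    return None
```

The plausibility check matters because an all-zero header with an empty payload would otherwise pass: the CRC32 of zero bytes is 0.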

Lossless vs Lossy

  • When encoding locally, we use either the FFV1 codec (lossless) or MP4V (lossy).
  • When YouTube processes the video, it re-encodes it. The large block size ensures that the "signal" (the black/white blocks) survives this transformation.

3. Decoding Process

The decoder reverses the mapping:

  1. It reads the video frame by frame.
  2. It samples the center pixel of each block using optimized NumPy grid sampling.
  3. If the pixel value is > 128, it registers a 1; otherwise, a 0.
  4. It parses the 16-byte header to identify the frame's position and payload size.
  5. Once all frames are collected, it reassembles them into the original binary file.
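
Steps 4 and 5 can be sketched as follows, starting from the raw bytes already recovered from each frame. The big-endian header layout and the names `parse_frame` and `reassemble` are assumptions. The sketch also assumes every frame reports the same total.

```python
import struct
import zlib

HEADER = struct.Struct(">IIII")  # assumed layout: index, total, length, crc32


def parse_frame(raw):
    """Split one frame's bytes into (index, total, payload), verifying the CRC."""
    index, total, length, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]  # drops trailing padding
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError(f"frame {index} failed its length or CRC check")
    return index, total, payload


def reassemble(raw_frames):
    """Order payloads by frame index and join them into the original bytes."""
    chunks = {}
    expected = 0
    for raw in raw_frames:
        index, total, payload = parse_frame(raw)
        expected = total
        chunks[index] = payload
    missing = [i for i in range(expected) if i not in chunks]
    if missing:
        raise ValueError(f"missing frames: {missing}")
    return b"".join(chunks[i] for i in range(expected))
```

Keying the payloads by frame index means frames may arrive in any order, and a dropped frame is reported instead of silently producing a truncated file.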

4. Performance Optimization

Multi-threaded Processing

Encoding and decoding are CPU-intensive due to Reed-Solomon calculations and image processing. The system utilizes Python's ProcessPoolExecutor to distribute frame processing across all available CPU cores.

  • Encoder: Generates multiple frames in parallel and writes them to the video stream in sequence.
  • Decoder: Reads frames sequentially (as required by OpenCV) but offloads the grayscale conversion, sampling, and Reed-Solomon decoding to a pool of worker processes.
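
The encoder-side pattern can be sketched with `Executor.map`, which returns results in input order even when workers finish out of order. The job tuple layout and the function names are hypothetical, and the worker only renders blocks, leaving out Reed-Solomon coding and headers. Each payload is assumed to fit in its frame.

```python
import concurrent.futures as cf

import numpy as np


def render_frame(job):
    """Worker: turn one payload into a block image. Must be picklable."""
    index, payload, cols, rows, block_size = job
    bits = np.zeros(cols * rows, dtype=np.uint8)
    data_bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    bits[:data_bits.size] = data_bits  # assumes the payload fits in the grid
    grid = bits.reshape(rows, cols) * 255
    frame = np.repeat(np.repeat(grid, block_size, axis=0), block_size, axis=1)
    return index, frame


def render_in_order(jobs, executor):
    """Run jobs on any executor; map() preserves the input order."""
    return list(executor.map(render_frame, jobs))


def render_with_processes(jobs, max_workers=None):
    """Use one worker process per CPU core by default."""
    with cf.ProcessPoolExecutor(max_workers=max_workers) as pool:
        return render_in_order(jobs, pool)
```

When calling `render_with_processes` from a script, place the call under an `if __name__ == "__main__":` guard so that platforms that spawn worker processes do not re-run it on import.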

NumPy Vectorization

Instead of iterating through pixels with nested loops, the system uses NumPy's repeat and reshape functions for frame generation and advanced indexing for grid sampling, drastically reducing the overhead of processing high-resolution video frames.
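
A minimal sketch of both vectorized steps is shown below. The helper names are hypothetical, and `np.ix_` is one way to express the advanced indexing, not necessarily the one the project uses.

```python
import numpy as np


def expand_blocks(grid, block_size):
    """Grow each grid cell into a block_size x block_size square of pixels."""
    return np.repeat(np.repeat(grid, block_size, axis=0), block_size, axis=1)


def sample_centers(frame, block_size):
    """Read one pixel per block using advanced (integer-array) indexing."""
    rows = frame.shape[0] // block_size
    cols = frame.shape[1] // block_size
    ys = np.arange(rows) * block_size + block_size // 2
    xs = np.arange(cols) * block_size + block_size // 2
    return frame[np.ix_(ys, xs)]  # result has shape (rows, cols)


bits = np.array([1, 0, 0, 1, 1, 1], dtype=np.uint8)
grid = bits.reshape(2, 3) * 255          # flat bits -> 2x3 grid of block values
frame = expand_blocks(grid, 16)          # 32x48 pixel frame
print(np.array_equal(sample_centers(frame, 16), grid))  # True
```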

5. Restoration

Since the decoded file is just a stream of bytes, the restore_format.py script uses magic bytes (via the file utility) to identify what the original file extension was (e.g., .zip, .pdf, .mp4) and renames it accordingly.
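
To show the idea behind magic bytes, the sketch below checks a small hard-coded signature table. This is a swapped-in simplification, not how restore_format.py works: the script relies on the file utility, which knows far more formats. The table and the name `guess_extension` are illustrative.

```python
# (offset, signature, extension): a tiny illustrative subset of known formats.
MAGIC_SIGNATURES = [
    (0, b"%PDF-", ".pdf"),
    (0, b"PK\x03\x04", ".zip"),  # also matches ZIP-based formats such as .docx
    (4, b"ftyp", ".mp4"),        # ISO base media; also matches e.g. .mov
]


def guess_extension(data, default=".bin"):
    """Return an extension based on the leading bytes of `data`."""
    for offset, signature, extension in MAGIC_SIGNATURES:
        if data[offset:offset + len(signature)] == signature:
            return extension
    return default


print(guess_extension(b"%PDF-1.7\n..."))  # .pdf
```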