@@ -91,6 +91,12 @@ with a single seek to `file_size - 32`, without first reading the header.
 +--------+------+---------+----------------------------------------+
 ```
 
+The magic number `0x54414348` ("TACH" for Tachyon) identifies the file format
+and also serves as an **endianness marker**. When read on a system with a
+different byte order than the writer, it appears as `0x48434154`. The reader
+uses this to detect cross-endian files and automatically byte-swap all
+multi-byte integer fields.
+
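The detection described above can be sketched in a few lines. This is only an illustration, not the reader's actual code: the function name `detect_needs_swap` is hypothetical, and the sketch assumes the magic number occupies the first four bytes of the file.

```python
import struct

MAGIC = 0x54414348          # "TACH", as given in the text
MAGIC_SWAPPED = 0x48434154  # the same four bytes read in the opposite order


def detect_needs_swap(first4: bytes) -> bool:
    """Return True if the file was written with the opposite byte order.

    Hypothetical helper: `first4` is assumed to be the first four bytes
    of the file, interpreted in the reader's native byte order.
    """
    (value,) = struct.unpack("=I", first4)  # "=" selects native byte order
    if value == MAGIC:
        return False
    if value == MAGIC_SWAPPED:
        return True
    raise ValueError(f"unrecognised magic number: {value:#010x}")
```

A single comparison against both constants is enough because reversing four bytes is its own inverse: any value that is neither constant cannot be a valid file in either byte order.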
 The Python version field records the major, minor, and micro version numbers
 of the Python interpreter that generated the file. This allows analysis tools
 to detect version mismatches when replaying data collected on a different
@@ -399,14 +405,17 @@ enable O(1) lookup during interning.
 
 ### Reading
 
-1. Read the header and validate magic/version
-2. Seek to end − 32 and read the footer
-3. Allocate string array of `string_count` elements
-4. Parse the string table, populating the array
-5. Allocate frame array of `frame_count * 3` uint32 elements
-6. Parse the frame table, populating the array
-7. If compressed, decompress the sample data region
-8. Iterate through samples, resolving indices to strings/frames
+1. Read the header magic number to detect endianness (set `needs_swap` flag
+   if the magic appears byte-swapped)
+2. Validate version and read remaining header fields (byte-swapping if needed)
+3. Seek to end − 32 and read the footer (byte-swapping counts if needed)
+4. Allocate string array of `string_count` elements
+5. Parse the string table, populating the array
+6. Allocate frame array of `frame_count * 3` uint32 elements
+7. Parse the frame table, populating the array
+8. If compressed, decompress the sample data region
+9. Iterate through samples, resolving indices to strings/frames
+   (byte-swapping thread_id and interpreter_id if needed)
 
 The reader builds lookup arrays rather than dictionaries since it only needs
 index-to-value mapping, not value-to-index.
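
A rough sketch of the lookup arrays follows, assuming the string and frame tables have already been parsed into Python values. The names `build_lookup_tables` and `frame_at` are hypothetical, and the meaning of the three uint32 values per frame is not stated in this excerpt, so the sketch treats them as opaque.

```python
from array import array


def build_lookup_tables(string_count, frame_count, strings, frames):
    """Fill preallocated index-to-value tables (hypothetical helper).

    `strings` yields str objects in file order; `frames` yields 3-tuples
    of unsigned 32-bit integers in file order.
    """
    string_table = [None] * string_count  # index -> string
    for i, text in enumerate(strings):
        string_table[i] = text  # IndexError if the table overruns its count

    # Flat array with 3 slots per frame. "I" is a C unsigned int, which is
    # 4 bytes on common platforms (an assumption of this sketch).
    frame_table = array("I", [0]) * (frame_count * 3)
    for i, triple in enumerate(frames):
        if len(triple) != 3:
            raise ValueError("each frame needs exactly three values")
        for k, value in enumerate(triple):
            frame_table[i * 3 + k] = value
    return string_table, frame_table


def frame_at(frame_table, index):
    """Resolve a frame index to its three stored values."""
    base = index * 3
    return tuple(frame_table[base:base + 3])
```

Because samples refer to strings and frames only by index, plain positional lookup suffices on the read side; a value-to-index dictionary is only needed when interning during writing.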
@@ -435,7 +444,8 @@ copy the wrong bytes (high-order zeros instead of the actual value).
 **Reader Implementation**: The reader tracks whether byte-swapping is needed
 via a `needs_swap` flag set during header parsing. All fixed-width fields
 in the header, footer, and sample data are conditionally byte-swapped using
-inline swap functions (`bswap32`, `bswap64`).
+Python's internal byte-swap functions (`_Py_bswap32`, `_Py_bswap64` from
+`pycore_bitutils.h`).
 
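To illustrate the conditional swap, here is a pure-Python stand-in for the 32-bit and 64-bit byte-swap operations. The reader itself uses C functions; the helpers `bswap32`, `bswap64`, `fix_u32`, and `fix_u64` below are hypothetical names chosen for this sketch.

```python
def bswap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def bswap64(value: int) -> int:
    """Reverse the byte order of an unsigned 64-bit integer."""
    return int.from_bytes(value.to_bytes(8, "little"), "big")


def fix_u32(value: int, needs_swap: bool) -> int:
    """Swap a fixed-width 32-bit field only when the file is cross-endian."""
    return bswap32(value) if needs_swap else value


def fix_u64(value: int, needs_swap: bool) -> int:
    """Swap a fixed-width 64-bit field only when the file is cross-endian."""
    return bswap64(value) if needs_swap else value
```

For example, `fix_u32(0x48434154, needs_swap=True)` returns `0x54414348`, while the same call with `needs_swap=False` returns the value unchanged.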
 Variable-length integers (varints) are byte-order independent since they
 encode values one byte at a time using the LEB128 scheme, so they require