Recover PG-vendored C types collapsed to int during header parsing #15
The host-symbol-collision build prefix-renames PG types, and the parse runs without pg_config.h, so opaque PG-vendored types reach libclang already macro-collapsed and are spelled int / int * / int ** in the parsed IDL. This post-parse pass recovers each one from the header declaration text, preserving const and pointer levels, and only when the function's parsed type actually collapsed to int. Recovered base types: bool, int64, Timestamp(Tz), H3Index, text, GSERIALIZED, Interval, DateADT, Datum, size_t, GBOX, BOX3D, AFFINE. Audited against a correct-typed reference IDL: no mismatch remains where an int * stands in for a named pointer type, so every binding that codegens from the catalog gets the real types (e.g. tcbuffer_convex_hull -> GSERIALIZED *, temporal_tprecision(..., const Interval *, ...)).
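The recovery rule described above (read the type from the header text, keep const and pointer levels, act only if the parsed type collapsed to int) can be sketched in Python, the parser's language. This is a minimal, hypothetical sketch, not the code in this PR: `header_return_type`, `split_type` and `recover` are illustrative names, the regex only handles a simple one-line `extern` prototype, the argument list in the example prototype is illustrative, and refusing to recover when the parsed and header pointer depths disagree is a conservative assumption of the sketch rather than something the PR states.

```python
import re
from typing import Optional, Tuple

# Base types listed as recoverable in this PR's description.
RECOVERABLE = {
    "bool", "int64", "Timestamp", "TimestampTz", "H3Index", "text",
    "GSERIALIZED", "Interval", "DateADT", "Datum", "size_t",
    "GBOX", "BOX3D", "AFFINE",
}

# Parsed spellings treated as "collapsed to int": int, int *, int **,
# optionally with a leading const.
COLLAPSED = re.compile(r"^(const\s+)?int(\s*\*)*$")


def header_return_type(decl: str, fname: str) -> Optional[str]:
    """Return the return-type text written before `fname(` in a one-line prototype."""
    m = re.search(r"^\s*(?:extern\s+)?(.*?)\s*\b" + re.escape(fname) + r"\s*\(", decl)
    return m.group(1).strip() if m else None


def split_type(spelling: str) -> Tuple[bool, str, int]:
    """Split a C type spelling into (leading const?, base name, pointer depth)."""
    s = spelling.strip()
    is_const = s.startswith("const ")
    if is_const:
        s = s[len("const "):]
    return is_const, s.replace("*", "").strip(), s.count("*")


def recover(parsed: str, header_spelling: str) -> str:
    """Return the recovered type spelling, or `parsed` unchanged."""
    if not COLLAPSED.match(parsed.strip()):
        return parsed                  # parse did not collapse: keep it
    is_const, base, depth = split_type(header_spelling)
    if base not in RECOVERABLE:
        return parsed                  # genuine int, or not a recoverable type
    if depth != split_type(parsed)[2]:
        return parsed                  # pointer levels disagree: do not guess
    stars = " " + "*" * depth if depth else ""
    return ("const " if is_const else "") + base + stars


# Illustrative prototype; only the return type is taken from the PR text.
decl = "extern GSERIALIZED *tcbuffer_convex_hull(const Temporal *temp);"
print(recover("int *", header_return_type(decl, "tcbuffer_convex_hull")))  # GSERIALIZED *
```

Gating on the parsed spelling first is what keeps such a pass a no-op on headers that libclang already parsed correctly.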
Approving and merging

Empirical verification

I cross-checked this end-to-end via MobilityDB/MEOS.js#2 (which carries an IDL regenerated against this branch):

For context, without this fix, the same regen produces 234 TypeScript errors (123

Relationship to #1 (stdbool-stub)

This PR functionally supersedes #1: it handles the same scalar collapses (

Suggesting we close #1 as superseded once this lands: running both would mean two layers fighting over the same slots, and the preprocessor stub approach is the more brittle of the two (it drifts any time MEOS adds a typedef). Approve and merge from my side.
Problem
Some MEOS header sets reach libclang with `bool` / `int64` / `Timestamp` / `TimestampTz` / `H3Index` already collapsed to `int` (or `int *`) at the preprocessor level: the real type name is gone before parsing, so it cannot be recovered from the AST. The extracted IDL then carries `int` where the source says one of those types, and downstream binding generators mis-map them:

- `bool` → `int` (should be a boolean)
- `int64` / `H3Index` → `int` (should be 64-bit / `long`)
- `TimestampTz *` out-param → `int *`: generators size the result buffer at 4 bytes for an 8-byte native write (a buffer under-allocation, observed as `IndexOutOfBounds` / native-heap corruption in a JMEOS consumer).

Fix
A post-parse pass (`parser/typerecover.py`) recovers these types from the raw header declaration text, which still spells the real type, and rewrites the corresponding IDL entry. It is wired into `run.py` right after `parse_all_headers`.

It is idempotent and a no-op on correctly parsed headers: it only rewrites a type that is currently `"int"` / `"int *"` and whose header declaration spells a recoverable type. Genuinely-int functions (e.g. `intspan_width`) are left untouched.
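To make the out-param bullet in the Problem section concrete: a generator that sizes an out-param buffer from the IDL's C type reserves `sizeof(int)` for the collapsed `int *`, while the native side writes a `TimestampTz`, which PostgreSQL (and therefore MEOS) defines as a 64-bit integer. The sketch below uses Python's `ctypes` only to obtain native sizes; `NATIVE_SIZE` and `out_param_bytes` are hypothetical stand-ins for whatever a binding generator actually does, not part of this PR.

```python
import ctypes

# Hypothetical per-base-type size table a binding generator might consult.
# TimestampTz and int64 are 64-bit signed integers; H3Index is a uint64_t.
NATIVE_SIZE = {
    "int": ctypes.sizeof(ctypes.c_int),
    "int64": ctypes.sizeof(ctypes.c_int64),
    "TimestampTz": ctypes.sizeof(ctypes.c_int64),
    "H3Index": ctypes.sizeof(ctypes.c_uint64),
}


def out_param_bytes(idl_type: str) -> int:
    """Bytes to reserve for an out-param declared as `<base> *` in the IDL."""
    base = idl_type.replace("const", "").replace("*", "").strip()
    return NATIVE_SIZE[base]


collapsed = out_param_bytes("int *")          # what the collapsed IDL implies
recovered = out_param_bytes("TimestampTz *")  # what the native call writes
print(f"reserved {collapsed} bytes, native write is {recovered} bytes")
if collapsed < recovered:
    print(f"under-allocated by {recovered - collapsed} bytes")
```

In this sketch only the IDL spelling differs between the two calls; the sizing logic is identical, which is why fixing the type at the IDL level fixes every generator that reads it.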
Validation
plus a genuine-int control: all recovered correctly, control unchanged,
re-run is a 0/0 no-op (idempotent).
`extern` decls).

This makes the IDL, and any binding regenerated from it, reproducible without external post-processing scripts.
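The validation properties (recovered slots rewritten, a genuine-int control left unchanged, and a re-run that changes nothing) can be checked on a toy catalog. This is a hypothetical sketch: the real IDL layout and the internals of `typerecover.py` are not shown here, so the entry shape (`name` / `return` / `params` as C spellings), the `header_types` mapping, and the `hyp_*` function names are all illustrative assumptions.

```python
import re

# Hypothetical, simplified IDL shape (the real catalog layout is not shown in
# this PR): {"name": str, "return": str, "params": [str, ...]} as C spellings.
RECOVERABLE = {"bool", "int64", "Timestamp", "TimestampTz", "H3Index", "text",
               "GSERIALIZED", "Interval", "DateADT", "Datum", "size_t",
               "GBOX", "BOX3D", "AFFINE"}
COLLAPSED = re.compile(r"^(const\s+)?int(\s*\*)*$")


def _recover(parsed: str, spelled: str) -> str:
    """Use the header spelling only if the parsed type collapsed to int."""
    base = re.sub(r"\bconst\b", "", spelled).replace("*", "").strip()
    if COLLAPSED.match(parsed.strip()) and base in RECOVERABLE:
        return spelled
    return parsed


def recover_pass(idl: list, header_types: dict) -> int:
    """Rewrite collapsed slots in place and return how many slots changed."""
    changed = 0
    for entry in idl:
        spelled = header_types.get(entry["name"])
        if spelled is None:
            continue                    # no header declaration: leave as-is
        new_ret = _recover(entry["return"], spelled["return"])
        changed += new_ret != entry["return"]
        entry["return"] = new_ret
        for i, (p, s) in enumerate(zip(entry["params"], spelled["params"])):
            new_p = _recover(p, s)
            changed += new_p != p
            entry["params"][i] = new_p
    return changed


# Toy catalog: all names are hypothetical; hyp_int_fn is the genuine-int control.
idl = [
    {"name": "hyp_geom_fn", "return": "int *", "params": ["const int *"]},
    {"name": "hyp_int_fn", "return": "int", "params": ["int"]},
    {"name": "hyp_ts_out_fn", "return": "int", "params": ["int *"]},
]
header_types = {
    "hyp_geom_fn": {"return": "GSERIALIZED *", "params": ["const Interval *"]},
    "hyp_int_fn": {"return": "int", "params": ["int"]},
    "hyp_ts_out_fn": {"return": "bool", "params": ["TimestampTz *"]},
}

first = recover_pass(idl, header_types)   # rewrites the four collapsed slots
second = recover_pass(idl, header_types)  # idempotent: nothing left to rewrite
print(first, second)  # 4 0
```

The second pass is a no-op because a recovered spelling no longer matches the collapsed-int pattern, and the control survives both passes because plain `int` is not in the recoverable set.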