
[FLINK-39622] [postgres] Fix O(N²) JDBC metadata lookups in CustomPostgresSchema d…#4403

Open
ThorneANN wants to merge 1 commit into apache:master from ThorneANN:fix/postgres-schema-cache-n-squared


@ThorneANN
Contributor

CustomPostgresSchema#readTableSchema invokes jdbcConnection.readSchema with
the full captured-table filter, so a single call already loads metadata for
every captured table. However, the cache-population loop iterates only the
requested subset and discards the rest. As a result, snapshot startup performs
one full pg_catalog scan per split. Because the number of splits grows with the
number of captured tables N and each scan covers all N tables, total metadata
work scales as O(N²). This causes severe latency on multi-tenant Postgres
deployments that capture hundreds of tables across schemas.

This change caches every table discovered by readSchema in schemasByTableId,
while the returned tableChanges still contains only the originally requested
subset. Subsequent splits are then served entirely from the cache.
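
For reviewers who want the caching behaviour in isolation, here is a minimal sketch. It is not the actual connector code: the class `SchemaCacheSketch`, the string table IDs, the placeholder schema values, and the `fullScans` counter are all hypothetical stand-ins, and `readSchemaForAllCapturedTables` only simulates what `jdbcConnection.readSchema` is described as doing above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the caching change; not the connector's real classes.
class SchemaCacheSketch {
    // Stand-in for schemasByTableId; values are placeholder schema strings.
    private final Map<String, String> schemasByTableId = new HashMap<>();
    private final List<String> capturedTables;
    // Counts simulated full metadata scans so the effect of caching is observable.
    int fullScans = 0;

    SchemaCacheSketch(List<String> capturedTables) {
        this.capturedTables = capturedTables;
    }

    // Simulates a readSchema call that uses the full captured-table filter.
    private Map<String, String> readSchemaForAllCapturedTables() {
        fullScans++;
        Map<String, String> discovered = new HashMap<>();
        for (String tableId : capturedTables) {
            discovered.put(tableId, "schema-of-" + tableId);
        }
        return discovered;
    }

    // Caches every discovered table, but returns only the requested subset.
    Map<String, String> readTableSchema(List<String> requested) {
        Map<String, String> discovered = readSchemaForAllCapturedTables();
        schemasByTableId.putAll(discovered);

        Map<String, String> tableChanges = new HashMap<>();
        for (String tableId : requested) {
            String schema = discovered.get(tableId);
            if (schema != null) {
                tableChanges.put(tableId, schema);
            }
        }
        return tableChanges;
    }

    // Serves from the cache when possible; falls back to a scan otherwise.
    String getTableSchema(String tableId) {
        String cached = schemasByTableId.get(tableId);
        if (cached != null) {
            return cached;
        }
        return readTableSchema(List.of(tableId)).get(tableId);
    }

    public static void main(String[] args) {
        SchemaCacheSketch cache = new SchemaCacheSketch(List.of("s1.a", "s1.b", "s2.c"));
        cache.readTableSchema(List.of("s1.a")); // first split triggers the only scan
        cache.getTableSchema("s1.b");           // later splits hit the cache
        cache.getTableSchema("s2.c");
        System.out.println("full scans: " + cache.fullScans);
    }
}
```

In this model the first split pays for one scan and every later lookup for a captured table is a map hit, which is the behaviour the change aims for.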

Also fixes a related issue where getTableSchema(List) re-fetched
already-cached tables by passing the full tableIds list to readTableSchema
instead of the unmatched subset.
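
The second fix can be sketched the same way. This is an illustrative model only: `UnmatchedSubsetSketch`, `preload`, and `loaderCalls` are hypothetical names, and the loader here fabricates placeholder schemas rather than querying Postgres. The point is that only IDs missing from the cache reach `readTableSchema`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the getTableSchema(List) fix; not the connector's real classes.
class UnmatchedSubsetSketch {
    private final Map<String, String> schemasByTableId = new LinkedHashMap<>();
    // Records the arguments of each loader call so re-fetching is observable.
    final List<List<String>> loaderCalls = new ArrayList<>();

    // Test helper: pretend this table was cached by an earlier split.
    void preload(String tableId) {
        schemasByTableId.put(tableId, "schema-of-" + tableId);
    }

    // Simulated loader; caches and returns a placeholder schema per requested ID.
    private Map<String, String> readTableSchema(List<String> tableIds) {
        loaderCalls.add(new ArrayList<>(tableIds));
        Map<String, String> loaded = new LinkedHashMap<>();
        for (String tableId : tableIds) {
            loaded.put(tableId, "schema-of-" + tableId);
        }
        schemasByTableId.putAll(loaded);
        return loaded;
    }

    // Splits the request into cached and unmatched IDs, then loads only the unmatched ones.
    Map<String, String> getTableSchema(List<String> tableIds) {
        Map<String, String> result = new LinkedHashMap<>();
        List<String> unmatched = new ArrayList<>();
        for (String tableId : tableIds) {
            String cached = schemasByTableId.get(tableId);
            if (cached != null) {
                result.put(tableId, cached);
            } else {
                unmatched.add(tableId);
            }
        }
        if (!unmatched.isEmpty()) {
            // Passing tableIds here instead of unmatched would re-fetch cached tables.
            result.putAll(readTableSchema(unmatched));
        }
        return result;
    }

    public static void main(String[] args) {
        UnmatchedSubsetSketch schema = new UnmatchedSubsetSketch();
        schema.preload("s1.a");
        schema.getTableSchema(List.of("s1.a", "s1.b"));
        System.out.println("loader calls: " + schema.loaderCalls);
    }
}
```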
