
Added backport for support of Google BigLake Metastore #1528

Merged: zvonand merged 4 commits into antalya-26.1 from google_big_lake_catalog on Mar 27, 2026

Conversation

subkanthi (Collaborator) commented Mar 14, 2026

Changelog category (leave one):

  • Improvement
  • Bugfix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Google BigLake catalog integration ClickHouse#97104
Included bugfix for REST Catalog ClickHouse#97561

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

github-actions bot commented Mar 14, 2026

Workflow [PR], commit [497edb7]

ianton-ru commented:

Build failed:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1528&sha=bd7e111acc0aabf4456d971b135c4cb612c385e0&name_0=PR&name_1=Fast%20test
getAbsolutePathFromObjectInfo and getResolvedStorageFromObjectInfo do not exist, and createArchiveReader has a different signature.
subkanthi force-pushed the google_big_lake_catalog branch from bd7e111 to bda5bcc on March 26, 2026 15:09
subkanthi force-pushed the google_big_lake_catalog branch from bda5bcc to 90890ff on March 26, 2026 15:14
subkanthi (Collaborator, Author) commented Mar 26, 2026

Testing

Using Google Cloud ADC (Application Default Credentials):

gcloud auth application-default login

cat ~/.config/gcloud/application_default_credentials.json 
ClickHouse local version 26.1.3.20001.altinityantalya.

:) SET allow_database_iceberg = 1


SET allow_database_iceberg = 1

Query id: d6689315-fad0-4759-9a88-88d4be181e18

Ok.

0 rows in set. Elapsed: 0.001 sec. 

:) CREATE DATABASE biglake_db2
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS
    catalog_type = 'biglake',
    google_adc_client_id = '',
    google_adc_client_secret = '',
    google_adc_refresh_token = '',
    google_adc_quota_project_id = 'kanthi-test',
    warehouse = 'gs://biglake-public-nyc-taxi-iceberg';

SHOW TABLES FROM biglake_db2

Query id: d120cf9d-4ec2-45c5-9061-8ae22991a2d0

   ┌─name─────────────────────────┐
1. │ public_data.nyc_taxicab      │
2. │ public_data.nyc_taxicab_2021 │
   └──────────────────────────────┘

2 rows in set. Elapsed: 3.887 sec. 
:)  select * from biglake_db2.`public_data.nyc_taxicab` limit 1;

SELECT *
FROM biglake_db2.`public_data.nyc_taxicab`
LIMIT 1

Query id: e858217e-b0aa-4e0f-b408-a6956f0ca900

Row 1:
──────
vendor_id:           1
pickup_datetime:     2016-11-17 15:43:00.000000
dropoff_datetime:    2016-11-17 15:43:05.000000
passenger_count:     1
trip_distance:       0
rate_code:           1
store_and_fwd_flag:  N
payment_type:        1
fare_amount:         0
extra:               0
mta_tax:             0
tip_amount:          0
tolls_amount:        0
imp_surcharge:       0
airport_fee:         ᴺᵁᴸᴸ
total_amount:        0
pickup_location_id:  236
dropoff_location_id: 236
data_file_year:      2016
data_file_month:     11

1 row in set. Elapsed: 11.280 sec. 

:) 


subkanthi marked this pull request as ready for review on March 26, 2026 16:45
subkanthi (Collaborator, Author) commented:
AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1528 (Added backport for support of Google BigLake Metastore):

Confirmed defects:

Medium: Silent fallback from ADC refresh-token auth to GCE metadata credentials
Impact: On a GCP VM, a misconfigured or rejected ADC refresh-token flow can still succeed using the instance’s metadata token, so the catalog may authenticate as the wrong principal and errors that should fail closed are downgraded to debug logs.
Anchor: src/Databases/DataLake/RestCatalog.cpp — BigLakeCatalog::retrieveGoogleCloudAccessToken
Trigger: google_adc_client_id, google_adc_client_secret, and google_adc_refresh_token are all non-empty, but the refresh-token request throws DB::Exception (e.g. invalid secret); process runs on GCE with metadata reachable.
Why defect: User explicitly chose ADC parameters; catching and falling through to metadata is fail-open relative to that choice and hides the real configuration error.
Fix direction (short): Remove metadata fallback when all three ADC fields are set; only use metadata when ADC fields are empty (or gate fallback behind an explicit setting).
Regression test direction (short): Unit/integration test: ADC refresh fails → expect exception, no metadata request / no successful auth.

RestCatalog.cpp, lines 602-614:

```cpp
AccessToken BigLakeCatalog::retrieveGoogleCloudAccessToken() const
{
    if (!google_adc_client_id.empty() && !google_adc_client_secret.empty() && !google_adc_refresh_token.empty())
    {
        try
        {
            return retrieveGoogleCloudAccessTokenFromRefreshToken();
        }
        catch (const DB::Exception & e)
        {
            LOG_DEBUG(log, "Failed to use ADC credentials, falling back to metadata service: {}", e.what());
        }
    }
```
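The fail-closed fix direction can be sketched with stand-in types (`FakeCatalog`, `fromRefreshToken`, `fromMetadata` are illustrative names for this sketch, not the PR's code): when all three ADC fields are set, a refresh failure propagates as an exception instead of falling back to the metadata service.

```cpp
#include <stdexcept>
#include <string>

// Illustrative stand-ins for the catalog types (hypothetical names).
struct AccessToken { std::string value; };

struct FakeCatalog
{
    std::string client_id;
    std::string client_secret;
    std::string refresh_token;
    bool metadata_used = false;

    // Simulates a rejected refresh-token request (e.g. invalid secret).
    AccessToken fromRefreshToken() const
    {
        throw std::runtime_error("invalid_client");
    }

    AccessToken fromMetadata()
    {
        metadata_used = true;
        return {"metadata-token"};
    }

    AccessToken retrieveToken()
    {
        const bool adc_configured =
            !client_id.empty() && !client_secret.empty() && !refresh_token.empty();
        if (adc_configured)
            return fromRefreshToken(); // fail closed: no metadata fallback
        return fromMetadata();
    }
};
```

This mirrors the suggested regression test: with ADC fields configured, a refresh failure surfaces as an exception and the metadata path is never reached.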
Low: OAuth token lifetime buffer can go negative when expires_in is small
Impact: For small expires_in (e.g. < 300), expires_in - 300 makes the computed expiry immediate, causing constant refresh and extra load or rate-limit risk.
Anchor: src/Databases/DataLake/RestCatalog.cpp — RestCatalog::retrieveAccessTokenOAuth (same pattern in other token parsers in this file).
Trigger: Token endpoint returns expires_in smaller than the hard-coded 300s buffer.
Why defect: Buffer should be applied with a floor (e.g. max(0, expires_in - buffer) or max(1, expires_in - min(buffer, expires_in-1))), not raw subtraction.
Fix direction (short): Clamp: std::chrono::seconds(std::max(0, expires_in - 300)) or cap buffer to expires_in - 1.
Regression test direction (short): Parse mock JSON with expires_in=60 and assert isExpired() / refresh behavior is sane (single refresh, not immediate permanent expiry loop).

RestCatalog.cpp, lines 313-317:

```cpp
if (object->has("expires_in"))
{
    Int64 expires_in = object->getValue<Int64>("expires_in");
    token.expires_at = std::chrono::system_clock::now() + std::chrono::seconds(expires_in - 300);
}
```
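A minimal sketch of the proposed clamp (`clampedLifetime` is an illustrative helper, not the PR's code): the 300-second refresh buffer is applied with a floor so the computed lifetime never drops to zero or below for short-lived tokens.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

// Apply the refresh buffer without letting the effective token
// lifetime go non-positive when expires_in is small.
std::chrono::seconds clampedLifetime(int64_t expires_in)
{
    constexpr int64_t buffer = 300;
    if (expires_in <= 0)
        return std::chrono::seconds(0); // already expired; do not extend
    return std::chrono::seconds(std::max<int64_t>(1, expires_in - buffer));
}
```

With this floor, a mock `expires_in=60` yields a one-second lifetime (one refresh, not an immediate permanent-expiry loop), while a normal `expires_in=3600` still refreshes 300 seconds early.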
Coverage summary:

Scope reviewed: PR diff files — DatabaseDataLake.cpp, DatabaseDataLakeSettings.cpp, RestCatalog.cpp / RestCatalog.h, SettingsEnums.{h,cpp} (file list from PR).
Categories failed: Auth / fail-open fallback; token lifetime arithmetic edge case.
Categories passed: Catalog wiring (ICEBERG_BIGLAKE, OneLakeCatalog split), ADC file parsing validation, namespace workaround for BigLake, enum registration, static concurrency review (no new lock introduced for token cache in diff — not elevated to confirmed race without call-site concurrency proof).
Assumptions/limits: Static review only; no execution of Altinity CI or integration tests for this PR in this pass.

CarlosFelipeOR (Collaborator) commented Mar 27, 2026

QA Verification

BigLake Catalog Integration

Tested BigLake catalog integration using a real GCP environment (project with billing enabled and BigLake API active) against public Iceberg dataset gs://biglake-public-nyc-taxi-iceberg.

Verified:

  • Successful catalog creation using DataLakeCatalog(..., catalog_type='biglake')
  • Table discovery via SHOW TABLES
  • Schema resolution via DESCRIBE TABLE
  • Data access via SELECT ... LIMIT
  • Full scan (count()), filters, projections, and aggregations
  • Behavior of external tables (not present in system.tables) is correct
  • Namespace deduplication fix works correctly (BigLake returning base_namespace as child of itself is handled — SHOW TABLES returns clean results)
  • DROP DATABASE works cleanly for BigLake catalog databases

End-to-end flow ClickHouse → BigLake → Iceberg → GCS works as expected.


Authentication

Two authentication methods were tested:

  • Explicit OAuth settings (google_adc_client_id, google_adc_client_secret, google_adc_refresh_token, google_adc_quota_project_id) — works correctly
  • ADC credentials file (google_adc_credentials_file pointing to application_default_credentials.json) — works correctly

Note: There is no automatic ADC discovery (e.g. auto-reading ~/.config/gcloud/application_default_credentials.json). The credentials file path must be explicitly provided via the google_adc_credentials_file setting. This is consistent with the PR implementation — the file is only read when google_adc_credentials_file.changed is true.


Negative Scenarios

  • Non-existent table → correct UNKNOWN_TABLE error
  • Invalid warehouse (gs://bucket-that-does-not-exist-anywhere) → returned DNS_ERROR. This is not introduced by this PR — it comes from the generic DataLakeCatalog/object storage layer attempting to resolve the bucket as a hostname. Not ideal UX but out of scope for this change.

REST Catalog Refactoring

This PR refactors RestCatalog from a final class to an inheritable base, extracts OneLakeCatalog into its own subclass, and changes access_token from std::string to a struct with expiration tracking.

  • The existing regression suite (iceberg/tests/iceberg_engine/ under Feature("rest catalog")) covers the RestCatalog base class extensively (sanity, alter, RBAC, schema evolution, predicate pushdown, etc.), providing confidence that the refactoring did not break REST catalog functionality.
  • OneLake — not tested. No Azure environment was available for manual testing, and no automated tests (integration, stateless, or regression) exist in either repository.
  • Vended credentials — the PR changes if (!lightweight && with_vended_credentials) to if (with_vended_credentials), removing the lightweight check, and adds ICEBERG_BIGLAKE to the vended credentials catalog list. The BigLake manual tests implicitly exercised the vended_credentials=true path (default setting) including SHOW TABLES (which uses the lightweight=true code path affected by this change) — no issues observed. No automated tests currently cover the vended_credentials=true path in either repository.

CI Status

Integration tests

The only failure in CI is test_storage_s3_queue/test_0.py::test_move_after_processing[another_bucket-AzureQueue].

  • Occurs only on arm_binary, distributed plan
  • ~14–22% fail rate, 0% on all non-ARM builds
  • Reproduced across many unrelated PRs and branch runs

Assessment: Known flaky ARM-specific issue (pre-existing)
Recommendation: Safe to ignore for PR #1528 — not related to BigLake changes


Regression Tests

Run on arm64: https://github.com/Altinity/clickhouse-regression/actions/runs/23628648659
Run on x86: https://github.com/Altinity/clickhouse-regression/actions/runs/23628611059/job/68847967121

Only two suites failed:

All other tests are green — no regressions detected in existing test suites (integration, stateless, regression).


Summary

| Area | Status |
| --- | --- |
| BigLake catalog (create, query, drop) | ✅ Verified |
| Explicit OAuth authentication | ✅ Verified |
| ADC credentials file authentication | ✅ Verified |
| Namespace dedup fix | ✅ Verified (implicitly) |
| DROP DATABASE cleanup | ✅ Verified |
| REST catalog refactoring (base class) | ✅ Covered by existing regression suite |
| Vended credentials behavior change | ✅ Implicitly verified via BigLake tests |
| OneLake refactoring | ⚠️ No test coverage available |
| Negative scenarios | ✅ Verified |
| CI | ✅ Green — no regressions |

Conclusion: BigLake integration works correctly in a real GCP environment with both authentication methods. The REST catalog refactoring is covered by existing regression tests, and the vended credentials change was implicitly validated through BigLake manual testing. OneLake refactoring remains untested due to lack of environment and automated test coverage. CI is green with no regressions.

CarlosFelipeOR added the "verified (Verified by QA)" label on Mar 27, 2026
zvonand merged commit 84f4a41 into antalya-26.1 on Mar 27, 2026
333 of 340 checks passed