
Feat: Add BigQuery lineage support for PowerBI .pbit files#25511

Merged
ulixius9 merged 13 commits into open-metadata:main from AntoineGlacet:feat/powerbi-bigquery-lineage
Feb 4, 2026

Conversation

@AntoineGlacet
Contributor

Summary

Adds BigQuery lineage extraction support for PowerBI .pbit template files.

Relates to #25509

Motivation

Lineage is not currently extracted from PowerBI .pbit files that use BigQuery data sources. The parser already supports Snowflake, Redshift, and Databricks; BigQuery was the missing piece. This change adds parity for multi-cloud environments.

Implementation

1. BigQuery Expression Parser (metadata.py)

Added _parse_bigquery_source() method following the same pattern as _parse_snowflake_source() and _parse_redshift_source().

Key features:

  • Pattern matching: Extracts [Name="...", Kind="..."] patterns from GoogleBigQuery.Database() expressions
  • Expression reference resolution: Recursively resolves indirect references like Source = S_PJ_CODE
  • Database/Schema/Table extraction:
    • Project: [Name="project"] (no Kind attribute)
    • Dataset: [Name="dataset", Kind="Schema"]
    • Table: [Name="table", Kind="Table"] or Kind="View"

Example expression parsed:

GoogleBigQuery.Database(
  [Name="kap-nami-prod"],
  [Name="iruca_aligned", Kind="Schema"],  
  [Name="dbo_S_PJ_CODE", Kind="Table"]
)
# Returns: [{"database": "kap-nami-prod", "schema": "iruca_aligned", "table": "dbo_S_PJ_CODE"}]
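The extraction above can be sketched with a small stdlib-only helper. The function name, signature, and regex here are illustrative assumptions, not the PR's actual `_parse_bigquery_source()` implementation (which is a method on the PowerBI source class):

```python
import re
from typing import Dict, List, Optional

# Illustrative pattern: [Name="..."] with an optional Kind="..." attribute.
NAME_KIND_PATTERN = re.compile(r'\[Name="([^"]+)"(?:,\s*Kind="([^"]+)")?\]')


def parse_bigquery_source(expression: str) -> Optional[List[Dict[str, str]]]:
    """Extract database/schema/table triples from a GoogleBigQuery.Database() M expression."""
    if "GoogleBigQuery.Database" not in expression:
        return None
    database = schema = table = None
    for name, kind in NAME_KIND_PATTERN.findall(expression):
        if not kind:
            # Project: [Name="..."] with no Kind attribute; keep the first one seen
            database = database or name
        elif kind == "Schema":
            schema = name  # Dataset
        elif kind in ("Table", "View"):
            table = name
    if database and schema and table:
        return [{"database": database, "schema": schema, "table": table}]
    return None
```

Running it on the example expression above yields the single-triple list shown in the returns comment.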

2. Partitions Support (models.py)

Many BigQuery connections in .pbit files use partitions instead of direct source references.

Added:

  • PowerBIPartition Pydantic model with name/mode/source fields
  • partitions field to PowerBiTable model
  • @model_validator to automatically extract source from partitions[0].source when main source is None

Why this matters:
Without partition support, tables with partition-based sources would have source=None, preventing lineage extraction.
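The fallback can be sketched as a plain function over the raw dict that a `mode='before'` validator receives. This is a stdlib-only sketch of the validator body, not the PR's exact code; in the PR the logic is attached to `PowerBiTable` via `@model_validator(mode='before')`:

```python
from typing import Any, Dict


def extract_source_from_partitions(values: Dict[str, Any]) -> Dict[str, Any]:
    """If the table has no direct source, fall back to partitions[0].source,
    which is where .pbit files typically store the connection expression."""
    if isinstance(values, dict) and values.get("source") is None:
        partitions = values.get("partitions") or []
        if partitions:
            first = partitions[0]
            # Handle both raw dicts and already-parsed model instances
            if isinstance(first, dict):
                partition_source = first.get("source")
            else:
                partition_source = getattr(first, "source", None)
            if partition_source:
                values["source"] = [partition_source]
    return values
```

Without this step, a partition-backed table would reach the lineage parser with `source=None` and be skipped.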

Changes

Commit 1: BigQuery Parser

  • ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py (+97)
    • Added _parse_bigquery_source() method with docstring and examples
    • Integrated into parse_table_name_from_source() flow
    • Proper error handling and debug logging

Commit 2: Partitions Support

  • ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py (+25, -1)
    • Added PowerBIPartition model
    • Added partitions: Optional[List[PowerBIPartition]] to PowerBiTable
    • Added @model_validator(mode='before') for automatic source extraction

Testing

Docker Integration Testing

Tested with production .pbit file (Monthly Financial_trusted.pbit):

BigQuery lineage detected:

✅ Source: kap-nami-prod.iruca_aligned.dbo_S_PJ_CODE
   Table: Map PJ Code Master
   Method: Expression reference resolution (Source = S_PJ_CODE → GoogleBigQuery.Database())

Partitions extraction:

✅ Table: Map PJ Code Master
   - Source extracted from partitions[0].source
   - Expression normalized from list to string (via PR #25510)
   - Lineage parsed successfully

Data source summary:

  • 1 BigQuery connection - ✅ 100% lineage coverage
  • 11 SharePoint/Excel sources - No DB lineage (expected)
  • 29 embedded/calculated tables - No DB lineage (expected)

Code Quality

  • ✅ Follows existing parser patterns (_parse_snowflake_source, _parse_redshift_source)
  • ✅ Pydantic v2 best practices (mode='before', @classmethod)
  • ✅ Comprehensive error handling with try/except
  • ✅ Debug logging at key decision points
  • ✅ Recursive reference resolution for indirect sources
  • ✅ Non-breaking change (only activates for BigQuery expressions)

Backward Compatibility

100% backward compatible - Only affects .pbit files with BigQuery sources. All existing Snowflake/Redshift/Databricks parsing unchanged.

Checklist

  • Follows established code patterns
  • Integration tested with real .pbit files
  • Proper error handling and logging
  • Pydantic v2 compliant
  • Non-breaking addition
  • Documented with docstrings and examples

@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Fixes #25483

Problem:
- PowerBI .pbit files store multiline DAX expressions as JSON arrays (one string per line)
- Pydantic validation failed with 'Input should be a valid string [type=string_type, input_value=[...], input_type=list]'
- Parser could not ingest .pbit files with multiline DAX measures, source expressions, or dataset expressions

Solution:
- Added Pydantic field_validator decorators to normalize list expressions to multiline strings
- Updated PowerBiMeasures.expression to accept Union[str, List[str]]
- Updated PowerBITableSource.expression to accept Union[str, List[str]]
- Updated DatasetExpression.expression to accept Union[str, List[str]]
- Made expression optional for PowerBiMeasures to handle measures without expressions
- Updated _get_child_measures() to handle None expression values
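The normalization step can be sketched as a standalone function; in the PR it is the body of a Pydantic `field_validator` on the `expression` fields, and this free-function form is only illustrative:

```python
from typing import List, Optional, Union


def normalize_expression(value: Optional[Union[str, List[str]]]) -> Optional[str]:
    """.pbit files emit multiline DAX as a JSON array with one string per line;
    join such lists back into a single multiline string, pass strings through."""
    if isinstance(value, list):
        return "\n".join(value)
    return value
```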

Testing:
- Successfully parsed .pbit file with 41 tables, 93 measures (72 multiline), 32 multiline sources
- Added 9 new unit tests for validators
- Added integration test case for multiline DAX expressions
- All existing tests pass (backward compatible)

Files changed:
- ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
- ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py
- ingestion/tests/unit/test_powerbi_table_measures.py

Adds support for parsing BigQuery connections in PowerBI .pbit files to
enable lineage tracking from BigQuery tables/views to PowerBI tables.

Features:
- Parses GoogleBigQuery.Database() Power Query M expressions
- Resolves dataset expression references (e.g., Source = S_PJ_CODE)
- Extracts BigQuery project, dataset, and table information
- Handles both direct connections and indirect references through expressions
- Follows existing pattern for Snowflake, Redshift, and Databricks

Implementation:
- Added _parse_bigquery_source() method to parse BigQuery M expressions
- Integrated into parse_table_name_from_source() lineage flow
- Recursively resolves expression references to find BigQuery connections
- Uses regex patterns to extract: [Name="project"], [Name="dataset",Kind="Schema"], [Name="table",Kind="Table"]
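The reference resolution described above can be sketched as a small recursive lookup over the dataset's expression table. The `SOURCE_REF` regex and the depth guard are assumptions for illustration, not the PR's actual implementation:

```python
import re
from typing import Dict, Optional

# Illustrative pattern for "Source = <ref>," lines in Power Query M; the real
# parser's pattern may differ.
SOURCE_REF = re.compile(r'Source\s*=\s*#?"?([A-Za-z0-9_ ]+?)"?\s*[,\n]')


def resolve_expression(
    name_or_expr: str, expressions: Dict[str, str], depth: int = 0
) -> Optional[str]:
    """Follow Source = <ref> indirections until a GoogleBigQuery.Database()
    expression is reached, or give up if the chain is broken or cyclic."""
    if depth > 10:  # guard against cyclic references
        return None
    if "GoogleBigQuery.Database" in name_or_expr:
        return name_or_expr
    match = SOURCE_REF.search(name_or_expr)
    ref = match.group(1).strip() if match else name_or_expr.strip()
    target = expressions.get(ref)
    if target is None:
        return None
    return resolve_expression(target, expressions, depth + 1)
```

This mirrors the documented chain: a table whose M script says `Source = S_PJ_CODE` resolves through the `S_PJ_CODE` dataset expression to the underlying `GoogleBigQuery.Database()` call.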

Testing:
- Verified with Monthly Financial_trusted.pbit file containing BigQuery connections
- Successfully parsed lineage: kap-nami-prod.iruca_aligned.dbo_S_PJ_CODE → Map PJ Code Master
- Expression resolution tested and working

Example lineage chain:
  BigQuery: project.dataset.table
     ↓ (via dataset expression)
  Expression: S_PJ_CODE
     ↓
  PowerBI Table: Map PJ Code Master

Files changed:
- ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py

.pbit files store table source information in partitions[0].source instead of
directly in the table.source field. This commit adds partition support and
automatically extracts source from partitions when the source field is empty.

Changes:
- Added PowerBIPartition model to represent table partitions in .pbit files
- Added partitions field to PowerBiTable model
- Added model_validator to extract source from partitions[0] when table.source is None
- This enables lineage parsing for .pbit files where source is in partitions

This fixes the issue where BigQuery lineage was not being detected even though
the parsing logic was correct: the source field was simply not being populated.

Testing:
- Verified Map PJ Code Master table now has source populated from partitions
- Confirmed BigQuery lineage detection works: kap-nami-prod.iruca_aligned.dbo_S_PJ_CODE

@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Contributor

github-actions Bot commented Jan 27, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (13)

Package Vulnerability ID Severity Installed Version Fixed Version
imagemagick CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
imagemagick-6-common CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
imagemagick-6.q16 CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6-arch-config CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6-headers CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6.q16-6 CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6.q16-6-extra CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6.q16-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-6-headers CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-6.q16-6 CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-6.q16-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (10)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Contributor

github-actions Bot commented Jan 27, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (20)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

ulixius9 previously approved these changes Feb 3, 2026
@github-actions
Contributor

github-actions Bot commented Feb 3, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@gitar-bot

gitar-bot Bot commented Feb 3, 2026

🔍 CI failure analysis for 92fe40a: Four CI failures total: SonarCloud ingestion quality gate failed (external check, requires dashboard review) plus 3 Playwright shards (50% failure rate) with infrastructure issues. Python tests passed successfully.

Issue

Four CI failures detected after the "models file fix" commit:

  1. [open-metadata-ingestion] SonarCloud Code Analysis: FAILURE (external quality gate)
  2. Playwright Shard 3/6: 2 test failures + 8 flaky tests
  3. Playwright Shard 4/6: 1 test failure + 14 flaky tests
  4. Playwright Shard 6/6: 1 test failure + 11 flaky tests (recurring issue)

Good News: Python tests (3.10, 3.11) passed successfully.


Failure #1: SonarCloud Code Analysis

Status

  • Check Name: [open-metadata-ingestion] SonarCloud Code Analysis
  • Conclusion: FAILURE
  • Type: External third-party quality gate (not a GitHub Actions job)
  • Details URL: https://sonarcloud.io

Analysis Limitation

This is an external SonarCloud quality gate check that runs independently from GitHub Actions. The failure details are not accessible through GitHub Actions API. To view the specific code quality issues:

  1. Visit the SonarCloud dashboard at https://sonarcloud.io
  2. Navigate to the open-metadata-ingestion project
  3. Review the quality gate failures for this PR/commit

SonarCloud typically fails for:

  • Code coverage below threshold
  • Code smells above threshold
  • Security hotspots
  • Bugs detected by static analysis
  • Technical debt ratio
  • Duplicate code blocks

Note: The main "SonarCloud" check passed successfully. Only the specific "[open-metadata-ingestion]" package check failed, suggesting the issue is isolated to the ingestion package quality metrics.

PowerBI Changes Context

This PR modifies files in the ingestion package:

ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py
ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
ingestion/tests/unit/test_powerbi_table_measures.py

The SonarCloud failure for the ingestion package may be related to these PowerBI changes, unlike the Playwright failures which are completely unrelated. However, without access to the specific SonarCloud findings, I cannot determine if the quality gate failure is legitimate or a threshold issue.


Failure #2: Playwright Shard 3/6

Failed Tests

  1. QueryEntity.spec.ts:61 - "Query Entity"

    • Error: expect(locator).toBeAttached() failed
    • Element attachment validation issue
  2. LineageSettings.spec.ts:94 - "Verify global lineage config"

    • Lineage configuration verification failure

Flaky Tests (8 total)

  1. ImpactAnalysis.spec.ts:585 - Entity popover card on hover
  2. DataProductPermissions.spec.ts:84 - Data Product operations
  3. GlossaryPermissions.spec.ts:365 - Team-based permissions
  4. ServiceEntityPermissions.spec.ts:72 - SearchIndex Service permissions
  5. SettingsNavigationPage.spec.ts:254 - Drag and drop reordering
  6. ExploreDiscovery.spec.ts:142 - Display deleted assets
  7. Metric.spec.ts:107 - Metric expression update
  8. RightEntityPanelFlow.spec.ts:1176 - Data Quality Tab

Test Results

  • ✅ 567 passed
  • ❌ 2 failed
  • 🔄 8 flaky
  • Pass rate: 98.6% (567/577)
  • Duration: 1.0 hour

Failure #3: Playwright Shard 4/6

Failed Test

  1. Customproperties-part1.spec.ts:62 - "Add Integer custom property for glossaryTerm"

Flaky Tests (14 total)

13 of the 14 are concentrated in the custom properties UI tests (Integer, String, Duration, Email, Number, SqlQuery, Timestamp, Hyperlink, Entity Reference, Date, DateTime)

Test Results

  • ✅ 592 passed
  • ❌ 1 failed
  • 🔄 14 flaky
  • Pass rate: 97.7% (592/607)

Failure #4: Playwright Shard 6/6 (Recurring)

Failed Test

  1. SearchIndexApplication.spec.ts:73 - Recurring across multiple CI runs

Flaky Tests (11 total)

Glossary, Lineage, ODCS, Tags, Users, EntityVersionPages

Test Results

  • ✅ 526 passed
  • ❌ 1 failed
  • 🔄 11 flaky
  • Pass rate: 98.1% (526/537)

Current CI Status Summary

All Completed Jobs:

Successful (13 jobs):

  • java-checkstyle
  • maven-mysql-ci
  • maven-postgresql-ci
  • py-checkstyle
  • py-run-build-tests
  • py-run-tests (3.10) ⭐ Passed
  • py-run-tests (3.11) ⭐ Passed
  • build-and-scan
  • Team Label
  • SonarCloud (main check)
  • playwright-ci-postgresql (shards 1, 2, 5)

Failed (4 jobs):

  • [open-metadata-ingestion] SonarCloud Code Analysis (external)
  • playwright-ci-postgresql (3/6): 2 failures + 8 flaky (98.6% pass rate)
  • playwright-ci-postgresql (4/6): 1 failure + 14 flaky (97.7% pass rate)
  • playwright-ci-postgresql (6/6): 1 failure + 11 flaky (98.1% pass rate)

Playwright Results:

  • ❌ 3/6 shards failed (50% failure rate)
  • Total: 5 test failures + 33 flaky tests
  • Combined pass rate: 98.1% (1685/1721)

Zero Code Path Analysis (Playwright Failures)

PowerBI Python ingestion has NO interaction with:

  • Frontend React UI (Query, Lineage, Permissions, Custom Properties, Search, etc.)
  • Complete module separation: Python backend vs React frontend
  • No shared code, dependencies, or data models

Conclusion

SonarCloud Failure

  • Status: External quality gate for ingestion package
  • Actionability: Requires SonarCloud dashboard review
  • Potential relevance: May be related to PowerBI code changes
  • Recommendation: Review at https://sonarcloud.io

Playwright Failures

  • Status: 50% shard failure rate - significant CI infrastructure problem
  • Evidence: Zero code path, 98.1% pass rate, recurring patterns, escalating failure rate
  • Root cause: Infrastructure resource constraints on shards 3, 4, 6
  • Historical: Shard 6 has failed repeatedly across multiple runs
  • Recommendation: Retry failed shards; investigate CI infrastructure

Python Tests

  • Status: ✅ Both Python 3.10 and 3.11 passed successfully

Critical: The 50% Playwright shard failure rate indicates serious CI infrastructure issues requiring operational intervention, not code fixes.

Code Review 👍 Approved with suggestions 1 resolved / 4 findings

Well-implemented BigQuery lineage support following established patterns. Three minor previous findings remain but are low-risk edge cases with existing defensive coding.

💡 Edge Case: BigQuery parser extracts first Name without Kind as project

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:1022-1033

The logic assumes the first [Name="..."] pattern without a Kind attribute is the project. This works for the documented pattern but may incorrectly identify the project if the expression contains other Name patterns without Kind before the actual project identifier.

For example, consider an edge case where the expression contains metadata or comments with [Name="something"] patterns before the actual BigQuery connection:

/* Config [Name="metadata"] */ GoogleBigQuery.Database()[Name="actual-project"]...

The current implementation would incorrectly identify "metadata" as the project.

Suggested improvement:
Consider parsing only after detecting GoogleBigQuery.Database by splitting/finding that substring first:

# Find the BigQuery portion of the expression
bq_start = source_expression.find("GoogleBigQuery.Database")
if bq_start >= 0:
    bq_expression = source_expression[bq_start:]
    name_matches = re.findall(r'\[Name="([^"]+)"(?:,Kind="([^"]+)")?\]', bq_expression)

This is minor risk as real-world .pbit files likely don't have this pattern, but it would make the parser more robust.

💡 Quality: Regex pattern may capture trailing spaces or quotes

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:980-985

The regex pattern r'Source\s*=\s*([A-Za-z0-9_#"&\s]+?)\s*,' uses a character class that includes whitespace (\s) and quotes ("). The subsequent cleanup with .strip().strip('"').strip('#').strip('"') handles some cases, but the order of operations may not fully clean all edge cases.

For example, if the matched string is " MyRef ", the current cleanup chain:

  1. .strip() → "MyRef" (whitespace removed, quotes remain)
  2. .strip('"') → MyRef (outer quotes removed)
  3. .strip('#') → MyRef (no change)
  4. .strip('"') → MyRef (no change)

However, a pattern like #"My Ref" would come out as My Ref" after processing, because .strip('#') only removes leading/trailing # characters, not the quote that follows.

Suggested improvement:
Use a more targeted regex or refine the cleanup:

ref_name = source_ref_match.group(1).strip()
# Remove surrounding quotes and hash symbols commonly found in M expressions
ref_name = re.sub(r'^[#"]+|[#"]+$', '', ref_name).strip()

This is minor since the current implementation likely works for common cases, but it could cause issues with certain M expression naming conventions.

💡 Edge Case: Partition source extraction assumes dict structure

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:196-200

The extract_source_from_partitions validator accesses partitions[0].get("source") assuming the partition is a dict. However, when Pydantic processes nested models in mode='before' validators, the inner objects might already be parsed into Pydantic models (e.g., PowerBIPartition instances) rather than dicts, depending on how the data is constructed.

If partitions[0] is already a PowerBIPartition instance (not a dict), calling .get("source") will raise an AttributeError.

Suggested fix:
Handle both dict and model instance cases:

@model_validator(mode='before')
@classmethod
def extract_source_from_partitions(cls, values):
    if isinstance(values, dict):
        if values.get("source") is None and values.get("partitions"):
            partitions = values.get("partitions", [])
            if partitions and len(partitions) > 0:
                first_partition = partitions[0]
                if isinstance(first_partition, dict):
                    partition_source = first_partition.get("source")
                elif hasattr(first_partition, "source"):
                    partition_source = first_partition.source
                else:
                    partition_source = None
                if partition_source:
                    values["source"] = [partition_source]
    return values

This is likely low-impact since mode='before' typically receives raw data, but defensive coding would prevent future regressions.

✅ 1 resolved
Bug: Duplicate imports, field declarations, and validators in models.py

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:120-122
📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:131-132
📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:140-149
📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:192-199
The models.py file contains several duplications that will cause issues:

  1. Duplicate import (line 121): from typing import List, Optional, Union appears twice
  2. Duplicate field in PowerBiMeasures (line 132): expression: Optional[Union[str, List[str]]] = None is declared twice - this will cause Pydantic validation issues
  3. Duplicate validator in PowerBiMeasures (lines 140-149): The normalize_expression validator is defined twice
  4. Duplicate field and validator in DatasetExpression (lines 192-199): The expression field and normalize_expression validator are duplicated

These appear to be merge artifacts. The duplicate field declarations will cause Pydantic to behave unexpectedly, and the duplicate validators may cause double processing.

Fix: Remove the duplicate import, field declarations, and validators, keeping only one instance of each.


@sonarqubecloud

sonarqubecloud Bot commented Feb 3, 2026

Quality Gate failed for 'open-metadata-ingestion'

Failed conditions
Security Review Rating on New Code: E (required ≥ A)

See analysis details on SonarQube Cloud


Labels

safe to test Add this label to run secure Github workflows on PRs
