
Feat: Add BigQuery lineage support for PowerBI .pbit files#25511

Merged
ulixius9 merged 13 commits into open-metadata:main from AntoineGlacet:feat/powerbi-bigquery-lineage
Feb 4, 2026

Conversation

@AntoineGlacet
Contributor

Summary

Adds BigQuery lineage extraction support for PowerBI .pbit template files.

Relates to #25509

Motivation

Lineage is not currently extracted from PowerBI .pbit files that use BigQuery data sources. The parser already supports Snowflake, Redshift, and Databricks; BigQuery was the missing piece. This change adds parity for multi-cloud environments.

Implementation

1. BigQuery Expression Parser (metadata.py)

Added _parse_bigquery_source() method following the same pattern as _parse_snowflake_source() and _parse_redshift_source().

Key features:

  • Pattern matching: Extracts [Name="...", Kind="..."] patterns from GoogleBigQuery.Database() expressions
  • Expression reference resolution: Recursively resolves indirect references like Source = S_PJ_CODE
  • Database/Schema/Table extraction:
    • Project: [Name="project"] (no Kind attribute)
    • Dataset: [Name="dataset", Kind="Schema"]
    • Table: [Name="table", Kind="Table"] or Kind="View"

Example expression parsed:

GoogleBigQuery.Database(
  [Name="kap-nami-prod"],
  [Name="iruca_aligned", Kind="Schema"],  
  [Name="dbo_S_PJ_CODE", Kind="Table"]
)
# Returns: [{"database": "kap-nami-prod", "schema": "iruca_aligned", "table": "dbo_S_PJ_CODE"}]
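The extraction above can be sketched with a small stdlib-only helper. The function name, signature, and regex here are illustrative assumptions, not the PR's actual `_parse_bigquery_source()` implementation (which is a method on the PowerBI source class):

```python
import re
from typing import Dict, List, Optional

# Illustrative pattern: [Name="..."] with an optional Kind="..." attribute.
NAME_KIND_PATTERN = re.compile(r'\[Name="([^"]+)"(?:,\s*Kind="([^"]+)")?\]')


def parse_bigquery_source(expression: str) -> Optional[List[Dict[str, str]]]:
    """Extract database/schema/table triples from a GoogleBigQuery.Database() M expression."""
    if "GoogleBigQuery.Database" not in expression:
        return None
    database = schema = table = None
    for name, kind in NAME_KIND_PATTERN.findall(expression):
        if not kind:
            # Project: [Name="..."] with no Kind attribute; keep the first one seen
            database = database or name
        elif kind == "Schema":
            schema = name  # Dataset
        elif kind in ("Table", "View"):
            table = name
    if database and schema and table:
        return [{"database": database, "schema": schema, "table": table}]
    return None
```

Running it on the example expression above yields the single-triple list shown in the returns comment.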

2. Partitions Support (models.py)

Many BigQuery connections in .pbit files use partitions instead of direct source references.

Added:

  • PowerBIPartition Pydantic model with name/mode/source fields
  • partitions field to PowerBiTable model
  • @model_validator to automatically extract source from partitions[0].source when main source is None

Why this matters:
Without partition support, tables with partition-based sources would have source=None, preventing lineage extraction.
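The fallback can be sketched as a plain function over the raw dict that a `mode='before'` validator receives. This is a stdlib-only sketch of the validator body, not the PR's exact code; in the PR the logic is attached to `PowerBiTable` via `@model_validator(mode='before')`:

```python
from typing import Any, Dict


def extract_source_from_partitions(values: Dict[str, Any]) -> Dict[str, Any]:
    """If the table has no direct source, fall back to partitions[0].source,
    which is where .pbit files typically store the connection expression."""
    if isinstance(values, dict) and values.get("source") is None:
        partitions = values.get("partitions") or []
        if partitions:
            first = partitions[0]
            # Handle both raw dicts and already-parsed model instances
            if isinstance(first, dict):
                partition_source = first.get("source")
            else:
                partition_source = getattr(first, "source", None)
            if partition_source:
                values["source"] = [partition_source]
    return values
```

Without this step, a partition-backed table would reach the lineage parser with `source=None` and be skipped.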

Changes

Commit 1: BigQuery Parser

  • ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py (+97)
    • Added _parse_bigquery_source() method with docstring and examples
    • Integrated into parse_table_name_from_source() flow
    • Proper error handling and debug logging

Commit 2: Partitions Support

  • ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py (+25, -1)
    • Added PowerBIPartition model
    • Added partitions: Optional[List[PowerBIPartition]] to PowerBiTable
    • Added @model_validator(mode='before') for automatic source extraction

Testing

Docker Integration Testing

Tested with production .pbit file (Monthly Financial_trusted.pbit):

BigQuery lineage detected:

✅ Source: kap-nami-prod.iruca_aligned.dbo_S_PJ_CODE
   Table: Map PJ Code Master
   Method: Expression reference resolution (Source = S_PJ_CODE → GoogleBigQuery.Database())

Partitions extraction:

✅ Table: Map PJ Code Master
   - Source extracted from partitions[0].source
   - Expression normalized from list to string (via PR #25510)
   - Lineage parsed successfully

Data source summary:

  • 1 BigQuery connection - ✅ 100% lineage coverage
  • 11 SharePoint/Excel sources - No DB lineage (expected)
  • 29 embedded/calculated tables - No DB lineage (expected)

Code Quality

  • ✅ Follows existing parser patterns (_parse_snowflake_source, _parse_redshift_source)
  • ✅ Pydantic v2 best practices (mode='before', @classmethod)
  • ✅ Comprehensive error handling with try/except
  • ✅ Debug logging at key decision points
  • ✅ Recursive reference resolution for indirect sources
  • ✅ Non-breaking change (only activates for BigQuery expressions)

Backward Compatibility

100% backward compatible - Only affects .pbit files with BigQuery sources. All existing Snowflake/Redshift/Databricks parsing unchanged.

Checklist

  • Follows established code patterns
  • Integration tested with real .pbit files
  • Proper error handling and logging
  • Pydantic v2 compliant
  • Non-breaking addition
  • Documented with docstrings and examples

@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Fixes #25483

Problem:
- PowerBI .pbit files store multiline DAX expressions as JSON arrays (one string per line)
- Pydantic validation failed with 'Input should be a valid string [type=string_type, input_value=[...], input_type=list]'
- Parser could not ingest .pbit files with multiline DAX measures, source expressions, or dataset expressions

Solution:
- Added Pydantic field_validator decorators to normalize list expressions to multiline strings
- Updated PowerBiMeasures.expression to accept Union[str, List[str]]
- Updated PowerBITableSource.expression to accept Union[str, List[str]]
- Updated DatasetExpression.expression to accept Union[str, List[str]]
- Made expression optional for PowerBiMeasures to handle measures without expressions
- Updated _get_child_measures() to handle None expression values
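The normalization step can be sketched as a standalone function; in the PR it is the body of a Pydantic `field_validator` on the `expression` fields, and this free-function form is only illustrative:

```python
from typing import List, Optional, Union


def normalize_expression(value: Optional[Union[str, List[str]]]) -> Optional[str]:
    """.pbit files emit multiline DAX as a JSON array with one string per line;
    join such lists back into a single multiline string, pass strings through."""
    if isinstance(value, list):
        return "\n".join(value)
    return value
```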

Testing:
- Successfully parsed .pbit file with 41 tables, 93 measures (72 multiline), 32 multiline sources
- Added 9 new unit tests for validators
- Added integration test case for multiline DAX expressions
- All existing tests pass (backward compatible)

Files changed:
- ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
- ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py
- ingestion/tests/unit/test_powerbi_table_measures.py

Adds support for parsing BigQuery connections in PowerBI .pbit files to
enable lineage tracking from BigQuery tables/views to PowerBI tables.

Features:
- Parses GoogleBigQuery.Database() Power Query M expressions
- Resolves dataset expression references (e.g., Source = S_PJ_CODE)
- Extracts BigQuery project, dataset, and table information
- Handles both direct connections and indirect references through expressions
- Follows existing pattern for Snowflake, Redshift, and Databricks

Implementation:
- Added _parse_bigquery_source() method to parse BigQuery M expressions
- Integrated into parse_table_name_from_source() lineage flow
- Recursively resolves expression references to find BigQuery connections
- Uses regex patterns to extract: [Name="project"], [Name="dataset",Kind="Schema"], [Name="table",Kind="Table"]
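The reference resolution described above can be sketched as a small recursive lookup over the dataset's expression table. The `SOURCE_REF` regex and the depth guard are assumptions for illustration, not the PR's actual implementation:

```python
import re
from typing import Dict, Optional

# Illustrative pattern for "Source = <ref>," lines in Power Query M; the real
# parser's pattern may differ.
SOURCE_REF = re.compile(r'Source\s*=\s*#?"?([A-Za-z0-9_ ]+?)"?\s*[,\n]')


def resolve_expression(
    name_or_expr: str, expressions: Dict[str, str], depth: int = 0
) -> Optional[str]:
    """Follow Source = <ref> indirections until a GoogleBigQuery.Database()
    expression is reached, or give up if the chain is broken or cyclic."""
    if depth > 10:  # guard against cyclic references
        return None
    if "GoogleBigQuery.Database" in name_or_expr:
        return name_or_expr
    match = SOURCE_REF.search(name_or_expr)
    ref = match.group(1).strip() if match else name_or_expr.strip()
    target = expressions.get(ref)
    if target is None:
        return None
    return resolve_expression(target, expressions, depth + 1)
```

This mirrors the documented chain: a table whose M script says `Source = S_PJ_CODE` resolves through the `S_PJ_CODE` dataset expression to the underlying `GoogleBigQuery.Database()` call.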

Testing:
- Verified with Monthly Financial_trusted.pbit file containing BigQuery connections
- Successfully parsed lineage: kap-nami-prod.iruca_aligned.dbo_S_PJ_CODE → Map PJ Code Master
- Expression resolution tested and working

Example lineage chain:
  BigQuery: project.dataset.table
     ↓ (via dataset expression)
  Expression: S_PJ_CODE
     ↓
  PowerBI Table: Map PJ Code Master

Files changed:
- ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py

.pbit files store table source information in partitions[0].source instead of
directly in the table.source field. This commit adds partition support and
automatically extracts source from partitions when the source field is empty.

Changes:
- Added PowerBIPartition model to represent table partitions in .pbit files
- Added partitions field to PowerBiTable model
- Added model_validator to extract source from partitions[0] when table.source is None
- This enables lineage parsing for .pbit files where source is in partitions

This fixes the issue where BigQuery lineage was not being detected even though
the parsing logic was correct: the source field was simply not being populated.

Testing:
- Verified Map PJ Code Master table now has source populated from partitions
- Confirmed BigQuery lineage detection works: kap-nami-prod.iruca_aligned.dbo_S_PJ_CODE

@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Contributor

github-actions Bot commented Jan 27, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (13)

Package Vulnerability ID Severity Installed Version Fixed Version
imagemagick CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
imagemagick-6-common CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
imagemagick-6.q16 CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6-arch-config CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6-headers CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6.q16-6 CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6.q16-6-extra CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-6.q16-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickcore-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-6-headers CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-6.q16-6 CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-6.q16-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6
libmagickwand-dev CVE-2026-23876 🔥 CRITICAL 8:6.9.11.60+dfsg-1.6+deb12u5 8:6.9.11.60+dfsg-1.6+deb12u6

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (10)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Contributor

github-actions Bot commented Jan 27, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (20)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

ulixius9 previously approved these changes Feb 3, 2026
@github-actions
Contributor

github-actions Bot commented Feb 3, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@gitar-bot

gitar-bot Bot commented Feb 3, 2026

🔍 CI failure analysis for 92fe40a: Four CI failures total: SonarCloud ingestion quality gate failed (external check, requires dashboard review) plus 3 Playwright shards (50% failure rate) with infrastructure issues. Python tests passed successfully.

Issue

Four CI failures detected after the "models file fix" commit:

  1. [open-metadata-ingestion] SonarCloud Code Analysis: FAILURE (external quality gate)
  2. Playwright Shard 3/6: 2 test failures + 8 flaky tests
  3. Playwright Shard 4/6: 1 test failure + 14 flaky tests
  4. Playwright Shard 6/6: 1 test failure + 11 flaky tests (recurring issue)

Good News: Python tests (3.10, 3.11) passed successfully.


Failure #1: SonarCloud Code Analysis

Status

  • Check Name: [open-metadata-ingestion] SonarCloud Code Analysis
  • Conclusion: FAILURE
  • Type: External third-party quality gate (not a GitHub Actions job)
  • Details URL: https://sonarcloud.io

Analysis Limitation

This is an external SonarCloud quality gate check that runs independently from GitHub Actions. The failure details are not accessible through GitHub Actions API. To view the specific code quality issues:

  1. Visit the SonarCloud dashboard at https://sonarcloud.io
  2. Navigate to the open-metadata-ingestion project
  3. Review the quality gate failures for this PR/commit

SonarCloud typically fails for:

  • Code coverage below threshold
  • Code smells above threshold
  • Security hotspots
  • Bugs detected by static analysis
  • Technical debt ratio
  • Duplicate code blocks

Note: The main "SonarCloud" check passed successfully. Only the specific "[open-metadata-ingestion]" package check failed, suggesting the issue is isolated to the ingestion package quality metrics.

PowerBI Changes Context

This PR modifies files in the ingestion package:

ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py
ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
ingestion/tests/unit/test_powerbi_table_measures.py

The SonarCloud failure for the ingestion package may be related to these PowerBI changes, unlike the Playwright failures which are completely unrelated. However, without access to the specific SonarCloud findings, I cannot determine if the quality gate failure is legitimate or a threshold issue.


Failure #2: Playwright Shard 3/6

Failed Tests

  1. QueryEntity.spec.ts:61 - "Query Entity"

    • Error: expect(locator).toBeAttached() failed
    • Element attachment validation issue
  2. LineageSettings.spec.ts:94 - "Verify global lineage config"

    • Lineage configuration verification failure

Flaky Tests (8 total)

  1. ImpactAnalysis.spec.ts:585 - Entity popover card on hover
  2. DataProductPermissions.spec.ts:84 - Data Product operations
  3. GlossaryPermissions.spec.ts:365 - Team-based permissions
  4. ServiceEntityPermissions.spec.ts:72 - SearchIndex Service permissions
  5. SettingsNavigationPage.spec.ts:254 - Drag and drop reordering
  6. ExploreDiscovery.spec.ts:142 - Display deleted assets
  7. Metric.spec.ts:107 - Metric expression update
  8. RightEntityPanelFlow.spec.ts:1176 - Data Quality Tab

Test Results

  • ✅ 567 passed
  • ❌ 2 failed
  • 🔄 8 flaky
  • Pass rate: 98.6% (567/577)
  • Duration: 1.0 hour

Failure #3: Playwright Shard 4/6

Failed Test

  1. Customproperties-part1.spec.ts:62 - "Add Integer custom property for glossaryTerm"

Flaky Tests (14 total)

13 of the 14 are concentrated in the custom properties UI tests (Integer, String, Duration, Email, Number, SqlQuery, Timestamp, Hyperlink, Entity Reference, Date, DateTime)

Test Results

  • ✅ 592 passed
  • ❌ 1 failed
  • 🔄 14 flaky
  • Pass rate: 97.7% (592/607)

Failure #4: Playwright Shard 6/6 (Recurring)

Failed Test

  1. SearchIndexApplication.spec.ts:73 - Recurring across multiple CI runs

Flaky Tests (11 total)

Glossary, Lineage, ODCS, Tags, Users, EntityVersionPages

Test Results

  • ✅ 526 passed
  • ❌ 1 failed
  • 🔄 11 flaky
  • Pass rate: 98.1% (526/537)

Current CI Status Summary

All Completed Jobs:

Successful (13 jobs):

  • java-checkstyle
  • maven-mysql-ci
  • maven-postgresql-ci
  • py-checkstyle
  • py-run-build-tests
  • py-run-tests (3.10) ⭐ Passed
  • py-run-tests (3.11) ⭐ Passed
  • build-and-scan
  • Team Label
  • SonarCloud (main check)
  • playwright-ci-postgresql (shards 1, 2, 5)

Failed (4 jobs):

  • [open-metadata-ingestion] SonarCloud Code Analysis (external)
  • playwright-ci-postgresql (3/6): 2 failures + 8 flaky (98.6% pass rate)
  • playwright-ci-postgresql (4/6): 1 failure + 14 flaky (97.7% pass rate)
  • playwright-ci-postgresql (6/6): 1 failure + 11 flaky (98.1% pass rate)

Playwright Results:

  • ❌ 3/6 shards failed (50% failure rate)
  • Total: 5 test failures + 33 flaky tests
  • Combined pass rate: 98.1% (1685/1721)

Zero Code Path Analysis (Playwright Failures)

PowerBI Python ingestion has NO interaction with:

  • Frontend React UI (Query, Lineage, Permissions, Custom Properties, Search, etc.)
  • Complete module separation: Python backend vs React frontend
  • No shared code, dependencies, or data models

Conclusion

SonarCloud Failure

  • Status: External quality gate for ingestion package
  • Actionability: Requires SonarCloud dashboard review
  • Potential relevance: May be related to PowerBI code changes
  • Recommendation: Review at https://sonarcloud.io

Playwright Failures

  • Status: 50% shard failure rate - significant CI infrastructure problem
  • Evidence: Zero code path, 98.1% pass rate, recurring patterns, escalating failure rate
  • Root cause: Infrastructure resource constraints on shards 3, 4, 6
  • Historical: Shard 6 has failed repeatedly across multiple runs
  • Recommendation: Retry failed shards; investigate CI infrastructure

Python Tests

  • Status: ✅ Both Python 3.10 and 3.11 passed successfully

Critical: The 50% Playwright shard failure rate indicates serious CI infrastructure issues requiring operational intervention, not code fixes.

Code Review 👍 Approved with suggestions 1 resolved / 4 findings

Well-implemented BigQuery lineage support following established patterns. Three minor previous findings remain but are low-risk edge cases with existing defensive coding.

💡 Edge Case: BigQuery parser extracts first Name without Kind as project

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:1022-1033

The logic assumes the first [Name="..."] pattern without a Kind attribute is the project. This works for the documented pattern but may incorrectly identify the project if the expression contains other Name patterns without Kind before the actual project identifier.

For example, consider an edge case where the expression contains metadata or comments with [Name="something"] patterns before the actual BigQuery connection:

/* Config [Name="metadata"] */ GoogleBigQuery.Database()[Name="actual-project"]...

The current implementation would incorrectly identify "metadata" as the project.

Suggested improvement:
Consider parsing only after detecting GoogleBigQuery.Database by splitting/finding that substring first:

# Find the BigQuery portion of the expression
bq_start = source_expression.find("GoogleBigQuery.Database")
if bq_start >= 0:
    bq_expression = source_expression[bq_start:]
    name_matches = re.findall(r'\[Name="([^"]+)"(?:,Kind="([^"]+)")?\]', bq_expression)

This is minor risk as real-world .pbit files likely don't have this pattern, but it would make the parser more robust.

💡 Quality: Regex pattern may capture trailing spaces or quotes

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:980-985

The regex pattern r'Source\s*=\s*([A-Za-z0-9_#"&\s]+?)\s*,' uses a character class that includes whitespace (\s) and quotes ("). The subsequent cleanup with .strip().strip('"').strip('#').strip('"') handles some cases, but the order of operations may not fully clean all edge cases.

For example, if the matched string is " MyRef ", the current cleanup chain:

  1. .strip() → "MyRef" (whitespace removed, quotes remain)
  2. .strip('"') → MyRef (outer quotes removed)
  3. .strip('#') → MyRef (no change)
  4. .strip('"') → MyRef (no change)

However, a pattern like #"My Ref" would come out as My Ref" after processing, because .strip('#') only removes leading/trailing # characters, not the quote that follows.

Suggested improvement:
Use a more targeted regex or refine the cleanup:

ref_name = source_ref_match.group(1).strip()
# Remove surrounding quotes and hash symbols commonly found in M expressions
ref_name = re.sub(r'^[#"]+|[#"]+$', '', ref_name).strip()

This is minor since the current implementation likely works for common cases, but it could cause issues with certain M expression naming conventions.

💡 Edge Case: Partition source extraction assumes dict structure

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:196-200

The extract_source_from_partitions validator accesses partitions[0].get("source") assuming the partition is a dict. However, when Pydantic processes nested models in mode='before' validators, the inner objects might already be parsed into Pydantic models (e.g., PowerBIPartition instances) rather than dicts, depending on how the data is constructed.

If partitions[0] is already a PowerBIPartition instance (not a dict), calling .get("source") will raise an AttributeError.

Suggested fix:
Handle both dict and model instance cases:

@model_validator(mode='before')
@classmethod
def extract_source_from_partitions(cls, values):
    if isinstance(values, dict):
        if values.get("source") is None and values.get("partitions"):
            partitions = values.get("partitions", [])
            if partitions and len(partitions) > 0:
                first_partition = partitions[0]
                if isinstance(first_partition, dict):
                    partition_source = first_partition.get("source")
                elif hasattr(first_partition, "source"):
                    partition_source = first_partition.source
                else:
                    partition_source = None
                if partition_source:
                    values["source"] = [partition_source]
    return values

This is likely low-impact since mode='before' typically receives raw data, but defensive coding would prevent future regressions.

✅ 1 resolved
Bug: Duplicate imports, field declarations, and validators in models.py

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:120-122
📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:131-132
📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:140-149
📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py:192-199
The models.py file contains several duplications that will cause issues:

  1. Duplicate import (line 121): from typing import List, Optional, Union appears twice
  2. Duplicate field in PowerBiMeasures (line 132): expression: Optional[Union[str, List[str]]] = None is declared twice - this will cause Pydantic validation issues
  3. Duplicate validator in PowerBiMeasures (lines 140-149): The normalize_expression validator is defined twice
  4. Duplicate field and validator in DatasetExpression (lines 192-199): The expression field and normalize_expression validator are duplicated

These appear to be merge artifacts. The duplicate field declarations will cause Pydantic to behave unexpectedly, and the duplicate validators may cause double processing.

Fix: Remove the duplicate import, field declarations, and validators, keeping only one instance of each.


@sonarqubecloud

sonarqubecloud Bot commented Feb 3, 2026

Quality Gate failed for 'open-metadata-ingestion'

Failed conditions
Security Review Rating on New Code: E (required ≥ A)

See analysis details on SonarQube Cloud


Labels

safe to test Add this label to run secure Github workflows on PRs
