Merged
2 changes: 1 addition & 1 deletion .bumpversion.toml
@@ -12,7 +12,7 @@
# limitations under the License.

[tool.bumpversion]
-current_version = "2.0.0"
+current_version = "2.1.0-rc.1"
commit = true
message = "Update version {current_version} -> {new_version}"
ignore_missing_version = true
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
# [2.1.0-rc.1](https://github.com/IBM/data-intelligence-sdk/compare/v2.0.0...v2.1.0-rc.1) (2026-05-20)


### Features

* **dq:** Sync from enterprise cf57469 on 2026-05-20 ([8f0474a](https://github.com/IBM/data-intelligence-sdk/commit/8f0474ac4c8890bc1171367f616a32f7c900fdbb))

# [2.0.0](https://github.com/IBM/data-intelligence-sdk/compare/v1.0.0...v2.0.0) (2026-04-23)


6 changes: 3 additions & 3 deletions README.md
@@ -13,7 +13,7 @@
See the License for the specific language governing permissions and
limitations under the License.
-->
-# IBM watsonx.data intelligence SDK Version 2.0.0
+# IBM watsonx.data intelligence SDK Version 2.1.0-rc.1

A comprehensive Python SDK for data intelligence operations including:
- **Data Quality Validation**: Validate streaming data records, Pandas DataFrames, and PySpark DataFrames
@@ -295,7 +295,7 @@ container_response = dph_service.initialize(
# Create a data product
data_product = dph_service.create_data_product(
drafts=[{
-'version': '2.0.0',
+'version': '2.1.0-rc.1',
'name': 'My Data Product',
'description': 'A sample data product',
'asset': {
@@ -1186,5 +1186,5 @@ For issues, questions, or contributions, please open an issue on GitHub.
- pytest-cov >= 4.0.0
- pytest-mock >= 3.7.0
- black >= 26.3.1
-- mypy >= 2.0.0
+- mypy >= 1.0.0

11 changes: 9 additions & 2 deletions docs/README.md
@@ -26,6 +26,7 @@ docs/
├── requirements.txt # Documentation dependencies
├── build_docs.py # Build script
├── README.md # This file
+├── CONTRIBUTING_TO_DOCS.md # Documentation contribution guide
├── _static/ # Static assets
│ ├── css/
@@ -39,11 +40,17 @@ docs/
│ ├── 02_overview/ # Features and release notes
│ ├── 03_common_modules/ # Shared authentication
│ ├── 04_dq_validator/ # DQ Validator module
-│ └── 05_future_modules/ # Future module guidelines
+│ ├── 05_dph_services/ # Data Product Hub Services
+│ ├── 06_odcs_generator/ # ODCS Generator
+│ ├── 07_data_product_recommender/ # Data Product Recommender
+│ └── 08_future_modules/ # Future module guidelines
└── api/ # API reference
├── common/ # Common modules API
-└── dq_validator/ # DQ Validator API
+├── dq_validator/ # DQ Validator API
+├── dph_services/ # DPH Services API
+├── odcs_generator/ # ODCS Generator Class Reference
+└── data_product_recommender/ # Data Product Recommender Class Reference
```

## Building Documentation Locally
22 changes: 21 additions & 1 deletion docs/chapters/01_welcome/index.rst
@@ -24,7 +24,27 @@ If you're new to the SDK or installing it for the first time, be sure to check o

You can find details on the latest releases, FAQs, known issues, and more in the :ref:`Overview<overview>` section.

-If you already have the SDK installed and are looking to get started using it, please refer to the :ref:`Common Modules<common_modules>` section for authentication setup, and the :ref:`DQ Validator<dq_validator>` section for data quality validation.
SDK Modules
-----------

The SDK provides several powerful modules for data intelligence and governance:

**Data Quality Validator**
Comprehensive data quality validation framework with support for multiple check types, integration with Pandas and PySpark DataFrames, and CEL (Common Expression Language) for complex validation rules. See :ref:`DQ Validator<dq_validator>` for details.

**Data Product Hub Services**
Python client library for IBM Data Product Hub API, enabling programmatic management of data products, containers, contract terms, and the complete data product lifecycle from creation to retirement. See :ref:`Data Product Hub Services<dph_services>` for details.

**ODCS Generator**
Automated generation of Open Data Contract Standard (ODCS) v3.1.0 compliant YAML files from enterprise data catalogs including Collibra and Informatica CDGC. Streamlines data contract creation by extracting and transforming catalog metadata. See :ref:`ODCS Generator<odcs_generator>` for details.

**Data Product Recommender**
Intelligent analysis of database query logs to identify high-value tables and logical groupings for data product prioritization. Supports multiple platforms (Snowflake, Databricks, BigQuery, watsonx.data) and provides actionable recommendations based on usage patterns. See :ref:`Data Product Recommender<data_product_recommender>` for details.

Getting Started
---------------

+If you already have the SDK installed and are looking to get started using it, please refer to the :ref:`Common Modules<common_modules>` section for authentication setup, and explore the individual module documentation for specific use cases.

Looking for documentation on the SDK's interfaces and abstractions? Please check out our :ref:`API Reference Documentation<api_ref>` for an in-depth breakdown of all the SDK's classes, properties, and methods - including detailed descriptions of any required or optional parameters.

6 changes: 3 additions & 3 deletions docs/chapters/01_welcome/installation.rst
@@ -100,7 +100,7 @@ To verify that the SDK is installed correctly:
>>> import wxdi.dq_validator
>>> from wxdi.common.auth import AuthProvider
>>> print(wxdi.dq_validator.__version__)
-2.0.0
+2.1.0-rc.1

Versioning
----------
@@ -116,7 +116,7 @@ Version numbers follow the format ``MAJOR.MINOR.PATCH``:
Current Version
~~~~~~~~~~~~~~~

-The current version of the SDK is **2.0.0**.
+The current version of the SDK is **2.1.0-rc.1**.
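
The new version string carries a pre-release suffix (``-rc.1``) on top of the ``MAJOR.MINOR.PATCH`` format described in this section. As a minimal sketch of how such a string could be split into its parts, assuming a semantic-versioning style layout: the names ``parse_version``, ``is_prerelease`` and ``Version`` are hypothetical and not part of the SDK.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper for illustration only; the SDK does not ship this.
# Accepts MAJOR.MINOR.PATCH with an optional "-<pre-release>" suffix.
_VERSION_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:-(?P<pre>[0-9A-Za-z.-]+))?$"
)


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    pre: Optional[str]  # e.g. "rc.1", or None for a final release


def parse_version(text: str) -> Version:
    """Split a version string into numeric parts and a pre-release tag."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(
        int(match["major"]),
        int(match["minor"]),
        int(match["patch"]),
        match["pre"],
    )


def is_prerelease(text: str) -> bool:
    """Return True when the version carries a pre-release suffix."""
    return parse_version(text).pre is not None


if __name__ == "__main__":
    print(parse_version("2.1.0-rc.1"))
    print(is_prerelease("2.0.0"))
```

A check like ``is_prerelease`` can be useful for scripts that should only act on final releases.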

Checking Your Version
~~~~~~~~~~~~~~~~~~~~~
@@ -133,7 +133,7 @@ Or programmatically:

>>> import wxdi.dq_validator
>>> print(wxdi.dq_validator.__version__)
-2.0.0
+2.1.0-rc.1

Upgrading
---------
130 changes: 130 additions & 0 deletions docs/chapters/02_overview/features.rst
@@ -157,6 +157,136 @@ Type Safety
* IDE autocomplete and type checking support
* Runtime type validation

DPH Services Module
-------------------

Python client library for IBM Data Product Hub API, providing programmatic access to data product management.

Container Management
~~~~~~~~~~~~~~~~~~~~

* Initialize and configure data product containers
* Manage delivery methods and domain structures
* Service credential management
* API key operations

Data Product Lifecycle
~~~~~~~~~~~~~~~~~~~~~~

* Create, update, and delete data products
* Draft management with version control
* Publish drafts to releases
* Retire releases when needed
* Pagination support for large datasets

Contract Terms
~~~~~~~~~~~~~~

* Manage contract terms and documents
* Create reusable contract templates
* Attach terms and conditions to data products
* Service level agreement management

Domain Organization
~~~~~~~~~~~~~~~~~~~

* Create and manage domains and subdomains
* Organize data products by business area
* Multi-industry domain support
* Hierarchical domain structures

Asset Visualization
~~~~~~~~~~~~~~~~~~~

* Create data asset visualizations
* Reinitiate visualizations with updated assets
* Support for multiple assets per visualization

ODCS Generator Module
---------------------

Automated generation of Open Data Contract Standard (ODCS) v3.1.0 compliant YAML files from data catalog metadata.

Multi-Catalog Support
~~~~~~~~~~~~~~~~~~~~~

* **Collibra Integration**: Extract metadata from Collibra data catalog
* **Informatica CDGC**: Extract metadata from Informatica Cloud Data Governance and Catalog
* Extensible architecture for additional catalog sources

Metadata Extraction
~~~~~~~~~~~~~~~~~~~

* Automatic asset metadata extraction via REST APIs
* Column discovery through catalog relations
* Data type mapping (logical and physical)
* Classification support via GraphQL (Collibra)
* Tag integration at asset and column levels
* Custom attribute preservation

ODCS Generation
~~~~~~~~~~~~~~~

* ODCS v3.1.0 compliant YAML output
* Complete schema definition with column metadata
* Data quality rules integration
* Service level agreement specifications
* Governance and ownership information

Data Type Mapping
~~~~~~~~~~~~~~~~~

* Intelligent mapping of catalog types to ODCS types
* Support for logical types (string, integer, number, timestamp, boolean)
* Physical type preservation with precision and scale
* Custom type mapping support
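
The mapping idea in the list above can be illustrated with a small lookup table. This is only a sketch under stated assumptions: the physical type names, the fallback to ``string`` for unknown types, and the ``map_type`` function are hypothetical and do not describe the generator's actual implementation. The ``logicalType`` and ``physicalType`` keys are assumed to follow ODCS property naming.

```python
from typing import Dict, Optional

# Hypothetical mapping for illustration; the real generator may differ.
_PHYSICAL_TO_LOGICAL: Dict[str, str] = {
    "VARCHAR": "string",
    "CHAR": "string",
    "TEXT": "string",
    "INT": "integer",
    "INTEGER": "integer",
    "BIGINT": "integer",
    "DECIMAL": "number",
    "NUMERIC": "number",
    "FLOAT": "number",
    "DOUBLE": "number",
    "TIMESTAMP": "timestamp",
    "BOOLEAN": "boolean",
}


def map_type(
    physical: str, overrides: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Map a physical column type to a logical type, keeping the original.

    The base type name is taken from the text before any "(" so that
    precision and scale, as in DECIMAL(10,2), stay in the physical type.
    Unknown types fall back to "string" (an assumption of this sketch).
    """
    base = physical.split("(", 1)[0].strip().upper()
    table = dict(_PHYSICAL_TO_LOGICAL)
    # Custom mappings take precedence over the defaults.
    table.update({k.upper(): v for k, v in (overrides or {}).items()})
    return {"logicalType": table.get(base, "string"), "physicalType": physical}


if __name__ == "__main__":
    print(map_type("DECIMAL(10,2)"))
    print(map_type("GEOGRAPHY", overrides={"geography": "string"}))
```

Keeping the untouched physical type next to the logical one is what lets precision and scale survive the mapping.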

Data Product Recommender Module
--------------------------------

Analyze database query logs to identify high-value tables and logical groupings for data product prioritization.

Multi-Platform Support
~~~~~~~~~~~~~~~~~~~~~~

* **Snowflake**: Query log analysis from ACCOUNT_USAGE.QUERY_HISTORY
* **Databricks**: Query log analysis from system.query.history
* **BigQuery**: Query log analysis from INFORMATION_SCHEMA.JOBS_BY_PROJECT
* **watsonx.data**: Query log analysis from system.runtime.queries

Intelligent Scoring
~~~~~~~~~~~~~~~~~~~

* Query frequency analysis (37.5% weight)
* User diversity metrics (37.5% weight)
* Recency scoring (15% weight)
* Consistency patterns (10% weight)
* Customizable scoring weights
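
The weights listed above sum to 100%, which suggests a weighted sum of per-table metrics. A minimal sketch of that idea follows, assuming each metric has already been normalized to the range 0 to 1. The metric keys and the ``weighted_score`` function are hypothetical names, not the recommender's API.

```python
from typing import Mapping

# Default weights taken from the feature list; the key names are assumptions.
DEFAULT_WEIGHTS = {
    "frequency": 0.375,
    "user_diversity": 0.375,
    "recency": 0.15,
    "consistency": 0.10,
}


def weighted_score(
    metrics: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Combine normalized metrics (0..1) into a single score (0..1).

    Dividing by the weight total keeps the result in range even when
    custom weights do not sum to exactly 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(weights[name] * metrics[name] for name in weights) / total


if __name__ == "__main__":
    table_metrics = {
        "frequency": 0.8,
        "user_diversity": 0.4,
        "recency": 1.0,
        "consistency": 0.5,
    }
    print(round(weighted_score(table_metrics), 3))
```

Custom weights can be passed as the second argument, for example to favor recency over frequency.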

Table Grouping
~~~~~~~~~~~~~~

* Identify tables frequently used together
* Cohesion analysis for logical groupings
* User reach metrics across groups
* Group scoring with multiple factors
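
One simple way to find tables that are frequently used together is to count how often each pair appears in the same query. The sketch below shows that technique only. It assumes the tables referenced by each query have already been extracted, and ``co_occurrence_counts`` is a hypothetical name, so this does not describe how the recommender actually groups tables.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def co_occurrence_counts(
    queries: Iterable[Set[str]],
) -> "Counter[Tuple[str, str]]":
    """Count how many queries reference each pair of tables together.

    Each element of `queries` is the set of table names used by one query.
    Pairs are stored in sorted order so (a, b) and (b, a) share one key.
    """
    counts: "Counter[Tuple[str, str]]" = Counter()
    for tables in queries:
        for pair in combinations(sorted(tables), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    query_tables = [
        {"orders", "customers"},
        {"orders", "customers", "items"},
        {"items"},
    ]
    for pair, count in co_occurrence_counts(query_tables).most_common():
        print(pair, count)
```

Pairs with high counts are candidates for a logical grouping. A fuller analysis would also weigh factors such as cohesion and user reach, as the list above notes.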

Output Formats
~~~~~~~~~~~~~~

* **Markdown**: Human-readable reports with tables and formatting
* **JSON**: Machine-readable format for automation and AI agents
* Star ratings (1-5 stars) for quick assessment
* Detailed metrics and query pattern analysis

CLI and Python API
~~~~~~~~~~~~~~~~~~

* Command-line interface for quick analysis
* Python API for programmatic integration
* File-based input (CSV and JSON)
* Configurable output directory and format

Future Modules
--------------
