Add IBM Db2 dialect support #7275
| | BigQuery | Official | | ||
| | ClickHouse | Official | | ||
| | Databricks | Official | | ||
| | Db2 | Official | |
I'm not sure if we will end up merging this PR; I'll let the core team decide, but it certainly will not be official. Please change this to community.
|
Thank you for the PR @ShubhamKapoor992. The core team does not have review bandwidth for new community-contributed dialects at the moment, so I'll go ahead and close this PR for now. I'll let you know if this changes in the future so that it can be reopened.
|
Thank you for the update.

This contribution is driven by an IBM customer requirement, and we want to support it officially. We are also working on enabling Db2 support in Ibis and SQLMesh, both of which rely on SQLGlot internally. Having the Db2 dialect available in SQLGlot would therefore help unblock the release and integration on the Ibis and SQLMesh side as well.

We completely understand the bandwidth constraints of the core team. If there is any way we can reduce the review effort (for example, by addressing specific feedback or aligning the implementation with project guidelines), we would be happy to help.

Please let us know if there is any possibility of reconsidering the PR, or if there is an alternative path we should follow to move this forward. Thank you again for your time and guidance.
|
Adding to what I stated earlier: I am from IBM, and we want to support this dialect officially, so please don't consider it a community contribution. If you require any changes from our end for an official contribution, do let us know.
|
@ShubhamKapoor992 thank you for expressing your interest to help. Despite your team taking on the burden of implementing the dialect, the SQLGlot core team still has to spend review cycles on every PR, because once it is merged we will be the ones responsible for maintaining the code and ensuring its quality adheres to our standards and existing conventions. In order for us to consider getting this in, your team first needs to demonstrate that it can make high quality contributions to SQLGlot. Thus, the suggestion is to find issues and/or improvements that can be made to the codebase and solve them. If the incoming PRs are of high quality over the course of, let's say, two weeks, then we can get this dialect in, as long as you're committed to maintaining it. Otherwise, there's also the option of developing the dialect as a plugin. From the README:
Lastly, regarding:
Please note that both official and community-developed dialects are shipped with SQLGlot. The difference is who drives their development, which, in the latter case, is the community. For example, Exasol is shipped with SQLGlot but the core team is not involved in its development (other than reviewing the incoming PRs).
## Summary

Add a custom [SQLGlot](https://github.com/tobymao/sqlglot) dialect for DataFusion and a `to_datafusion_sql` function that transpiles SQL from any supported source dialect to DataFusion SQL.

This is the first step in decoupling the `TableTransformer` API from DataFusion internals. Instead of returning a DataFusion DataFrame (leaking the execution engine to users), the `TableTransformer` will return a SQL string and its dialect. The data loader will then use SQLGlot to translate that SQL to DataFusion for execution. It will be used in #496.

We maintain the DataFusion dialect in-repo rather than contributing it upstream to SQLGlot because the SQLGlot maintainers don't have capacity to review more community dialects right now ([source](tobymao/sqlglot#7275 (comment))).

Context: #496 (comment)

## Changes

- [ ] Client-facing API Changes
- [ ] Internal API Changes
- [ ] Bug Fixes
- [x] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [x] Tests

**DataFusion dialect** (`datafusion_sql.py`): custom SQLGlot dialect with DataFusion-specific function mappings (e.g. `SIZE` → `cardinality`, `ARRAY()` → `make_array`, `CURRENT_TIMESTAMP()` → `now()`), type mappings (e.g. `CHAR`/`TEXT` → `VARCHAR`, `BINARY` → `BYTEA`), and identifier/normalization rules.

**SQL translator** (`datafusion_sql.py`): `to_datafusion_sql(sql, source_dialect)` accepts any supported source dialect (spark, postgres, mysql, etc.) and transpiles to DataFusion. When `source_dialect` is `"datafusion"` it returns the SQL unchanged. Validates the dialect with a clear error listing all supported options.

**Dependency**: added `sqlglot>=29.0.0`.

## Testing Done

- [ ] Manually Tested on local docker setup. Please include commands ran, and their output.
- [x] Added new tests for the changes made.
- [ ] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in production. Please explain.

Parametrized transpilation tests cover spark, mysql, postgres, and datafusion identity. Edge case tests for unsupported dialects and multi-statement errors. E2E test executes transpiled SQL against DataFusion and validates output data.

```
make check  # All checks passed (ruff, mypy)
make test   # 19 dialect tests pass
```

# Additional Information

- [ ] Breaking Changes
- [ ] Deprecations
- [x] Large PR broken into smaller PRs, and PR plan linked in the description.

This is the first PR. Follow-up PRs will integrate the translator into the `TableTransformer` API and data loader pipeline.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
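For readers following the cross-reference, here is a minimal sketch of the `to_datafusion_sql` contract as the commit message describes it: validate the source dialect, return DataFusion SQL unchanged, and reject multi-statement input. It is not the code from that commit. The `SUPPORTED_SOURCE_DIALECTS` tuple and the `target_dialect` parameter are assumptions made for illustration, and `"postgres"` is only a stand-in for the in-repo DataFusion dialect, which is not reproduced here. The only SQLGlot call used is `sqlglot.transpile`.

```python
from typing import Tuple

# Hypothetical list for illustration; the real module presumably derives
# its supported dialects from SQLGlot itself.
SUPPORTED_SOURCE_DIALECTS: Tuple[str, ...] = ("datafusion", "mysql", "postgres", "spark")


def to_datafusion_sql(sql: str, source_dialect: str, target_dialect: str = "postgres") -> str:
    """Transpile a single SQL statement from `source_dialect`.

    `target_dialect` is an assumption of this sketch: the commit targets its
    own custom DataFusion dialect, and "postgres" merely stands in for it.
    """
    dialect = source_dialect.lower()
    if dialect not in SUPPORTED_SOURCE_DIALECTS:
        supported = ", ".join(SUPPORTED_SOURCE_DIALECTS)
        raise ValueError(
            f"Unsupported dialect {source_dialect!r}; supported dialects: {supported}"
        )

    # DataFusion SQL needs no translation, so it is returned as-is.
    if dialect == "datafusion":
        return sql

    # Deferred import: the validation and identity paths above do not need SQLGlot.
    import sqlglot

    statements = sqlglot.transpile(sql, read=dialect, write=target_dialect)
    if len(statements) != 1:
        raise ValueError(f"Expected exactly one SQL statement, got {len(statements)}")
    return statements[0]
```

Calling `to_datafusion_sql("SELECT 1", "datafusion")` returns the input unchanged, while an unknown dialect name raises a `ValueError` whose message lists the supported options.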
|
Hi @georgesittas,

As per your suggestion, we’ve contributed and successfully merged three high-quality PRs to build community trust:

We’d really appreciate it if you could reconsider our PR for review when you have bandwidth. Looking forward to your feedback!
Summary
This PR adds support for IBM Db2 dialect to SQLGlot, enabling parsing and transpilation of Db2 SQL syntax.
Changes
- `sqlglot/dialects/db2.py` with full Db2 dialect implementation
- `tests/dialects/test_db2.py`
- `sqlglot/dialects/__init__.py` to register Db2 dialect
- `README.md` to list Db2 as officially supported dialect

Features Implemented

Data Types

SQL Syntax
- `FETCH FIRST n ROWS ONLY` syntax
- `OFFSET n ROWS` syntax
- `CURRENT DATE` and `CURRENT TIMESTAMP` (without underscores)
- `||` operator

Functions
- `POSSTR` (string position)
- `VARCHAR_FORMAT` (date/time formatting)
- `DAYOFWEEK`, `DAYOFYEAR` extracts
- `DAYS` function for date arithmetic
- `MIDNIGHT_SECONDS` function

Transformations
- `+`/`-` operators

Testing
Compatibility
Documentation
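As a reference point for reviewing the function mappings listed under Functions, the sketch below gives plain-Python equivalents of several of the Db2 scalar functions named there. It reflects my understanding of the Db2 semantics (1-based `POSSTR`, `DAYS` counting from 0001-01-01, `DAYOFWEEK` starting at Sunday) and is not taken from the PR's implementation. The Python function names are hypothetical, and `VARCHAR_FORMAT` is omitted because its format models do not map onto a one-line equivalent.

```python
from datetime import date, time


def posstr(source: str, search: str) -> int:
    """POSSTR: 1-based position of the first occurrence of `search`, 0 if absent."""
    return source.find(search) + 1


def days(d: date) -> int:
    """DAYS: one more than the number of days since 0001-01-01.

    This coincides with Python's proleptic Gregorian ordinal.
    """
    return d.toordinal()


def midnight_seconds(t: time) -> int:
    """MIDNIGHT_SECONDS: whole seconds elapsed since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second


def dayofweek(d: date) -> int:
    """DAYOFWEEK: 1 = Sunday through 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def dayofyear(d: date) -> int:
    """DAYOFYEAR: 1-based day number within the year."""
    return d.timetuple().tm_yday


# Date arithmetic with DAYS: the difference of two DAYS values is a day count.
gap = days(date(2024, 3, 1)) - days(date(2024, 2, 1))  # 29, since 2024 is a leap year
```

A target dialect without a native `DAYS` or `MIDNIGHT_SECONDS` would need an expression with these semantics, which is what a transpilation test for each function should check.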