Add IBM Db2 dialect support#7275

Closed
ShubhamKapoor992 wants to merge 8 commits intotobymao:mainfrom
ShubhamKapoor992:feature/db2-sqlglot-dialect

Conversation

@ShubhamKapoor992
Contributor

@ShubhamKapoor992 ShubhamKapoor992 commented Mar 12, 2026

Summary

This PR adds support for the IBM Db2 dialect to SQLGlot, enabling parsing and transpilation of Db2 SQL syntax.

Changes

  • ✅ Added sqlglot/dialects/db2.py with full Db2 dialect implementation
  • ✅ Added comprehensive test suite in tests/dialects/test_db2.py
  • ✅ Updated sqlglot/dialects/__init__.py to register Db2 dialect
  • ✅ Updated README.md to list Db2 as a supported dialect

Features Implemented

Data Types

  • Db2-specific types: CLOB, DBCLOB, DECFLOAT, GRAPHIC, VARGRAPHIC
  • Type mappings: BOOLEAN → SMALLINT, TEXT → CLOB, etc.

SQL Syntax

  • FETCH FIRST n ROWS ONLY syntax
  • OFFSET n ROWS syntax
  • CURRENT DATE and CURRENT TIMESTAMP (without underscores)
  • String concatenation with || operator

Functions

  • POSSTR (string position)
  • VARCHAR_FORMAT (date/time formatting)
  • DAYOFWEEK, DAYOFYEAR extracts
  • DAYS function for date arithmetic
  • MIDNIGHT_SECONDS function

Transformations

  • Boolean values → 0/1 conversion
  • Date arithmetic with +/- operators
  • CAST to CHAR special handling
  • MAX/MIN → GREATEST/LEAST conversion

Testing

  • ✅ All 194 test cases pass
  • ✅ Covers basic SQL operations (SELECT, INSERT, UPDATE, DELETE)
  • ✅ Tests Db2-specific syntax and functions
  • ✅ Tests data type conversions
  • ✅ Tests joins, aggregations, CTEs, and complex queries

Compatibility

  • Follows SQLGlot dialect patterns
  • No breaking changes to existing functionality
  • All existing tests continue to pass

Documentation

  • Added Db2 to the supported dialects table in README.md
  • Marked as "Community" support level
  • Inline code comments for Db2-specific behavior

Comment thread README.md Outdated
| BigQuery | Official |
| ClickHouse | Official |
| Databricks | Official |
| Db2 | Official |
Owner

I'm not sure if we will end up merging this PR; I'll let the core team decide, but it certainly will not be official. Please change this to community.

Contributor Author

Done

@georgesittas
Collaborator

Thank you for the PR @ShubhamKapoor992.

The core team does not have review bandwidth for new community contributed dialects at the moment, so I'll go ahead and close this PR for now. Will let you know if this changes in the future to re-open.

@ShubhamKapoor992
Contributor Author

ShubhamKapoor992 commented Mar 12, 2026

Hi @georgesittas

Thank you for the update.

This contribution is driven by an IBM customer requirement, and we want to support it officially. We are also working on enabling Db2 support in Ibis and sqlmesh, which rely on SQLGlot internally. Having the Db2 dialect available in SQLGlot would therefore help unblock the release and integration on the Ibis and sqlmesh side as well.

We completely understand the bandwidth constraints of the core team. If there is any way we can assist in reducing the review effort (for example, by addressing specific feedback or aligning the implementation with project guidelines), we would be happy to help.

Please let us know if there is any possibility of reconsidering the PR or if there is an alternative path we should follow for moving this forward.

Thank you again for your time and guidance.

@ShubhamKapoor992
Contributor Author

@georgesittas

Adding to what I stated earlier: I am from IBM, and we want to support this officially, so please don't consider it a community contribution. If you require any changes from our end for an official contribution, do let us know.

@georgesittas
Collaborator

georgesittas commented Mar 12, 2026

@ShubhamKapoor992 thank you for expressing your interest in helping.

Despite your team taking on the burden of implementing the dialect, the SQLGlot core team still has to spend review cycles on every PR, because once it is merged we will be the ones responsible for maintaining the code and ensuring its quality adheres to our standards and existing conventions.

In order for us to consider getting this in, your team first needs to demonstrate that it can make high quality contributions to SQLGlot. Thus, the suggestion is to find issues and/or improvements that can be made to the codebase and solve them. If the incoming PRs are of high quality over the course of, let's say, two weeks, then we can get this dialect in, as long as you're committed to maintaining it.

Otherwise, there's also the option of developing the dialect as a plugin. From the README:

Plugin Dialects (supported since v28.6.0) are third-party dialects developed and maintained in external repositories by independent contributors. These dialects are not part of the SQLGlot codebase and are distributed as separate packages. The SQLGlot team does not provide support or maintenance for plugin dialects — please direct any issues or feature requests to their respective repositories. See Creating a Dialect Plugin below for information on how to build your own.

Lastly, regarding:

[...] we want to support it officially, so please don't consider it as community contribution.

Please note that both official and community-developed dialects are shipped with SQLGlot. The difference is who drives their development, which, in the latter case, is the community. For example, Exasol is shipped with SQLGlot but the core team is not involved in its development (that is, other than reviewing the incoming PRs).

robreeves added a commit to linkedin/openhouse that referenced this pull request Mar 16, 2026
## Summary

Add a custom [SQLGlot](https://github.com/tobymao/sqlglot) dialect for
DataFusion and a `to_datafusion_sql` function that transpiles SQL from
any supported source dialect to DataFusion SQL.

This is the first step in decoupling the `TableTransformer` API from
DataFusion internals. Instead of returning a DataFusion DataFrame
(leaking the execution engine to users), the `TableTransformer` will
return a SQL string and its dialect. The data loader will then use
SQLGlot to translate that SQL to DataFusion for execution. It will be
used in #496.

We maintain the DataFusion dialect in-repo rather than contributing it
upstream to SQLGlot because the SQLGlot maintainers don't have capacity
to review more community dialects right now
([source](tobymao/sqlglot#7275 (comment))).

Context:
#496 (comment)

## Changes

- [ ] Client-facing API Changes
- [ ] Internal API Changes
- [ ] Bug Fixes
- [x] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [x] Tests

**DataFusion dialect** (`datafusion_sql.py`): custom SQLGlot dialect
with DataFusion-specific function mappings (e.g. `SIZE` → `cardinality`,
`ARRAY()` → `make_array`, `CURRENT_TIMESTAMP()` → `now()`), type
mappings (e.g. `CHAR`/`TEXT` → `VARCHAR`, `BINARY` → `BYTEA`), and
identifier/normalization rules.

**SQL translator** (`datafusion_sql.py`): `to_datafusion_sql(sql,
source_dialect)` accepts any supported source dialect (spark, postgres,
mysql, etc.) and transpiles to DataFusion. When source_dialect is
`"datafusion"` it returns the SQL unchanged. Validates the dialect with
a clear error listing all supported options.

**Dependency**: added `sqlglot>=29.0.0`.

## Testing Done

- [ ] Manually Tested on local docker setup. Please include commands
ran, and their output.
- [x] Added new tests for the changes made.
- [ ] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please
feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in
production. Please explain.

Parametrized transpilation tests cover spark, mysql, postgres, and
datafusion identity. Edge case tests for unsupported dialects and
multi-statement errors. E2E test executes transpiled SQL against
DataFusion and validates output data.

```
make check  # All checks passed (ruff, mypy)
make test   # 19 dialect tests pass
```

# Additional Information

- [ ] Breaking Changes
- [ ] Deprecations
- [x] Large PR broken into smaller PRs, and PR plan linked in the
description.

This is the first PR. Follow-up PRs will integrate the translator into
the `TableTransformer` API and data loader pipeline.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@ShubhamKapoor992
Contributor Author

Hi @georgesittas,

As per your suggestion, we’ve contributed and successfully merged three high-quality PRs to build community trust:

#7294
#7361
#7395

We’d really appreciate it if you could reconsider our PR for review when you have bandwidth. Looking forward to your feedback!


3 participants