dolthub/doltgresql#1648: fix VALUES clause type inference by codeaucafe · Pull Request #2187 · dolthub/doltgresql

codeaucafe · 2026-01-09T05:56:54Z

Summary

VALUES clause type inference now uses PostgreSQL's common type resolution algorithm instead of using only the first row's type.

existing issue: SELECT * FROM (VALUES(1),(2.01),(3)) v(n) failed with integer: unhandled type: decimal.Decimal

Changes

Add ResolveValuesTypes analyzer rule that:
- Collects types from all VALUES rows (not just the first)
- Uses FindCommonType() to resolve per PostgreSQL's [UNION/CASE type resolution](https://www.postgresql.org/docs/15/typeconv-union-case.html) (per the FindCommonType doc comment)
- Applies implicit casts to convert values to the common type
- Updates GetField expressions in parent nodes (handles aggregates like SUM)
Add UnknownCoercion expression for unknown-to-target type coercion
Add Go and bats tests
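
To make the difference between first-row inference and common-type resolution concrete, here is a toy sketch. The three-member ordering (unknown < int4 < numeric) and the names `colType`, `commonType`, and `resolveColumns` are assumptions made up for illustration. The actual rule relies on FindCommonType() and PostgreSQL's full UNION/CASE rules (type categories, preferred types), which this does not reproduce.

```go
package main

import "fmt"

// colType is a toy stand-in for a SQL type; it is not a doltgresql type.
type colType int

const (
	unknownType colType = iota // untyped literal such as NULL
	int4Type
	numericType
)

func (t colType) String() string {
	return [...]string{"unknown", "int4", "numeric"}[t]
}

// commonType returns the wider of two toy types. This only models the
// assumed ordering unknown < int4 < numeric, nothing more.
func commonType(a, b colType) colType {
	if b > a {
		return b
	}
	return a
}

// resolveColumns folds commonType over every row, column by column.
// It assumes every row has the same number of columns.
func resolveColumns(rows [][]colType) []colType {
	if len(rows) == 0 {
		return nil
	}
	// Copy the first row so the input is not mutated.
	out := append([]colType(nil), rows[0]...)
	for _, row := range rows[1:] {
		for i, t := range row {
			out[i] = commonType(out[i], t)
		}
	}
	return out
}

func main() {
	// Models VALUES (1), (2.01), (3).
	rows := [][]colType{{int4Type}, {numericType}, {int4Type}}
	fmt.Println("first row only:", rows[0][0])
	fmt.Println("all rows:", resolveColumns(rows)[0])
}
```

In this model, looking only at the first row yields int4, which cannot hold 2.01, while folding over all rows yields numeric.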

Questions for Reviewers

Analyzer rule approach: Is adding this as an analyzer rule (after TypeSanitizer) the right approach? I considered alternatives but this seemed cleanest for handling the two-pass transformation needed for GetField updates. Open to feedback if there's a better pattern.
PostgreSQL version: The code references PostgreSQL 15 docs. Should this stay as-is since doltgresql targets PG 15, or should it use /docs/current/?

Fixes: #1648

Add ResolveValuesTypes analyzer rule to compute common types across all VALUES rows, not just the first row. Previously, DoltgreSQL would incorrectly use only the first value to determine column types, causing errors when subsequent values had different types like VALUES(1),(2.01),(3).

Changes:
- Two-pass transformation strategy: first pass transforms VDT nodes with unified types, second pass updates GetField expressions in parent nodes
- Use FindCommonType() to resolve types per PostgreSQL rules
- Apply ImplicitCast for type conversions and UnknownCoercion for unknown-typed literals
- Handle aggregates via getSourceSchema()
- Add UnknownCoercion expression type for unknown -> target coercion without conversion

Tests:
- Add 4 bats integration tests for mixed int/decimal VALUES
- Add 3 Go test cases covering int-first, decimal-first, SUM aggregate, and multi-column scenarios

Refs: dolthub#1648

Hydrocharged · 2026-01-09T12:56:25Z

Greetings! I'll dig into this for the review.

As for the comment regarding documentation, in general the majority of the code is written to Postgres 15 specifications, however we still add features from newer versions when applicable. Chasing the newest version is a never-ending target, but we'll definitely update to a newer target relatively soon. Regarding documentation pinning though, we should never reference the current, as that changes over time. We definitely don't want critical information changing unknowingly. As long as the version is pinned for the documentation, that should be fine.

codeaucafe · 2026-01-09T18:16:29Z

Thank you. Yeah, I saw the code was mostly written toward the psql15 spec, but I wasn't sure if I had missed any updates or other docs indicating that we should write toward a later version's spec (if much even changed from the psql15 spec to later versions' specs?). Thank you for clarifying.

Also, thanks in advance for reviewing the PR.

Hydrocharged

Within my PR comments, I tend to be a bit wordy to try and help with the general direction for future PRs. For this one though, it's a good first draft! I stepped through and modified the code locally to make sure I fully understood what was happening, and the intention behind most of the changes so I could give a decent review. I ended up making some modifications to reduce the overall size of the PR as well (with a lot of the comments left here added in), which you can view here:

https://github.com/dolthub/doltgresql/pull/2193/files

Let me know once the feedback has been addressed, and I'll take another look at it.

server/analyzer/resolve_values_types.go

server/expression/implicit_cast.go

server/analyzer/resolve_values_types.go

server/functions/framework/common_type.go

server/analyzer/resolve_values_types.go

testing/go/values_statement_test.go

Hydrocharged · 2026-01-12T13:27:12Z

Also, regarding the first question:

Analyzer rule approach: Is adding this as an analyzer rule (after TypeSanitizer) the right approach? I considered alternatives but this seemed cleanest for handling the two-pass transformation needed for GetField updates. Open to feedback if there's a better pattern.

I didn't want to answer before reviewing to make sure that I understood everything correctly, but the approach of adding another analyzer step works perfectly for this case. In general, if we're using the transform or pgtransform packages, then it should probably be an analyzer rule.

codeaucafe · 2026-01-13T02:10:29Z

@Hydrocharged the detailed comments are fantastic (what I wish all reviews were like). I'll get to addressing them sometime this week.

Thank you!

codeaucafe · 2026-01-21T01:04:53Z

update: still going through these requested changes. Had other things I needed to handle first, so I'll probably finish this sometime during the next weekend. thank you.

cheers

codeaucafe · 2026-01-30T01:59:42Z

update: got sniped by work and life so I didn't finish as much as I thought I would last weekend, but I'm still slowly going through your change reqs/comments. Should get something up by end of weekend I hope.
cheers

codeaucafe · 2026-02-03T08:53:44Z

update: made it about halfway through over the weekend/today, but didn't finish since I got sick over the weekend from a concert. Should actually be finished this week though haha. Thanks for being flexible and still watching this PR (as seen per response emojis).

Thanks again!

Hydrocharged · 2026-02-03T09:15:52Z

No worries, no rush on our end!

Refactor ResolveValuesTypes analyzer rule to use simpler implementation based on PR review feedback. Changes centralize unknown type handling and eliminate fragile tree traversal logic.

Changes:
- Use TableId-based lookup instead of recursive tree traversal to update GetField types, eliminating dependency on specific node types like SubqueryAlias
- Leverage pgtransform.NodeExprsWithOpaque for expression updates instead of manual recursion through four helper functions
- Move unknown type handling into cast functions (GetExplicitCast, GetAssignmentCast, GetImplicitCast) to eliminate scattered checks across call sites
- Add requiresCasts return value to FindCommonType to optimize case where no type conversion is needed
- Simplify VALUES node transformation using sql.Expressioner interface to handle both ValueDerivedTable and Values uniformly
- Add comprehensive test coverage for VALUES with GROUP BY, DISTINCT, LIMIT/OFFSET, ORDER BY, subqueries, WHERE clause, aggregates, and combined operations

This refactoring reduces code complexity from ~300 lines to ~180 lines while improving maintainability and eliminating potential bugs from manual tree walking.

Refs: dolthub#1648

Refs: dolthub#1648

Update error messages in resolve_values_types.go to follow existing error messaging conventions in analyzer:
- Add "VALUES:" prefix to match pattern used in analyzer code files, such as in assign_update_casts.go (UPDATE:) and in assign_insert_casts.go (INSERT:)
- Also fix return value of n to nil when returning error

Refs: dolthub#1648

codeaucafe · 2026-02-04T10:58:25Z

FYI pushed some changes. Will next work on adding/fixing tests, as well as fixing shallow copy issue in resolve_values_types.go I noticed right before signing off for tonight.
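
The shallow copy issue mentioned here is a common Go pitfall with nested slices: copying only the outer slice leaves the inner slices sharing their backing arrays with the original. The sketch below illustrates the general pattern only. The function names and the use of `[][]string` are assumptions for illustration, not the code in resolve_values_types.go.

```go
package main

import "fmt"

// shallowCopy copies only the outer slice; the inner slices still share
// their backing arrays with the input.
func shallowCopy(rows [][]string) [][]string {
	out := make([][]string, len(rows))
	copy(out, rows)
	return out
}

// deepCopy copies the outer slice and every inner slice, so writes to the
// result never reach the input.
func deepCopy(rows [][]string) [][]string {
	out := make([][]string, len(rows))
	for i, row := range rows {
		out[i] = append([]string(nil), row...)
	}
	return out
}

func main() {
	orig := [][]string{{"1"}, {"2.01"}}
	s := shallowCopy(orig)
	s[0][0] = "cast(1)"
	fmt.Println("after shallow copy write:", orig[0][0]) // the original was mutated

	orig2 := [][]string{{"1"}, {"2.01"}}
	d := deepCopy(orig2)
	d[0][0] = "cast(1)"
	fmt.Println("after deep copy write:", orig2[0][0]) // the original is unchanged
}
```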

codeaucafe · 2026-02-07T23:20:38Z

I think most of the work is done, just need to go through the tests (add and/or refactor). Then I can take time to review to make sure I shouldn't refactor any of the implementations

Add more tests for VALUES clause resolution following PR review comments; also additional edge cases. Tests here verify mixed-type column inference, NULL handling, error cases, and integration with SQL operations like GROUP BY, DISTINCT, LIMIT, ORDER BY, WHERE, aggregates, CTEs, and JOINs. Refs: dolthub#1648

codeaucafe · 2026-02-09T07:39:52Z

FYI pushed my tests but just now realized my original comment didn't get entered I guess. I believe it was about 3 tests* I added that seemed to be failing from my code changes. Will verify tomorrow and provide options.

Cheers

Hydrocharged · 2026-02-10T09:55:40Z

Make sure you pull in the latest changes from main as well.

codeaucafe · 2026-02-10T23:52:39Z

Will do. Also realized I had a typo for "3 years" when it should have said "3 tests" 🤦 .

I think maybe I figured out why and have a fix. I'll try to get the new code pushed tonight.

Fix two bugs in the `ResolveValuesTypes` func that were introduced by our initial implementation. Both bugs only showed up when VALUES type inference interacted with JOINs or aggregates:
- Bug 1: JOIN GetField index. The original code used gf.Index() - 1 to look up columns in VDT schemas, but GetField indices are global across joined tables (e.g., a.n=0, b.id=1, b.label=2), not per-table offsets. This caused out-of-bounds errors in JOINs. Fixed by matching columns by name instead of calculating an index.
- Bug 2: Aggregate type propagation. The first pass updates GetFields that read directly from a VDT, but when a type change ripples through an aggregate (e.g., int4 to numeric inside MIN), the aggregate return type changes while parent nodes still have GetFields with the old type. This can cause runtime panics from type/value mismatches. Fixed by adding a second pass that syncs each GetField type with the child node's actual schema.

Test updates: SUM now returns numeric instead of float64 when operating on numeric inputs (matches PostgreSQL behavior). Unskipped 3 tests (2 JOIN, 1 MIN/MAX) that now pass.

Refs: dolthub#1648

…1648-values-clause-type-inference # Conflicts: # server/expression/explicit_cast.go

codeaucafe · 2026-02-12T03:04:24Z

FYI I think maybe this is a bit better to review now:

I fixed those 3 tests that were failing because of my implementation (2 JOIN, 1 MIN/MAX). Note that there are still 3 skipped tests, but they cover pre-existing subquery bugs that I found already existed on main prior to my changes, and I think they are not related to this PR's changes.

Given that the 3 skipped tests appear to be out of scope of the GH issue this PR attempts to fix (resolving VALUES types), I avoided working on them in this PR. Specifically, the 3 skipped tests share the same problem: ops inside a subquery over VALUES (arithmetic, LIMIT, ORDER BY) are silently ignored (no error), so I guess the nodes are not getting applied. From the existing GitHub issues, I don't think an issue has been created for this yet, so I'll make one once this PR is good and merged, and then I can reference those skipped tests for repro.

For info on the latest commit, which fixes the 3 tests I realized were failing due to my implementation after I addressed your PR comments:

The JOIN tests were failing because GetField indices are global across all tables in a query, not per-table offsets into the VDT schema (I think this is the case). The original code did gf.Index() - 1, which appeared to work for single-table queries but broke during JOINs. I switched this logic to name-based column lookups, which appears to have resolved it, but maybe I'm missing something and this is poorly implemented; thoughts?
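
A minimal sketch of why a global index cannot be used as an offset into one table's schema, and what a name-based lookup looks like. The `column` and `getField` structs and `lookupByName` are simplified stand-ins invented for illustration, not the GMS types, and the exact offset arithmetic of the original code is not reproduced here.

```go
package main

import "fmt"

// column and getField are toy stand-ins, not the GMS sql.Column / GetField.
type column struct {
	Name string
	Type string
}

type getField struct {
	Index int // position in the combined (joined) row
	Name  string
}

// lookupByName finds the column with a matching name in one table's schema.
func lookupByName(schema []column, name string) (column, bool) {
	for _, c := range schema {
		if c.Name == name {
			return c, true
		}
	}
	return column{}, false
}

func main() {
	// Models: (VALUES ...) a(n) JOIN (VALUES ...) b(id, label).
	schemaB := []column{
		{Name: "id", Type: "numeric"},
		{Name: "label", Type: "text"},
	}
	// In the joined row the layout is a.n=0, b.id=1, b.label=2.
	gf := getField{Index: 2, Name: "label"}

	// The global index does not fit b's two-column schema.
	fmt.Println("global index fits b's schema:", gf.Index < len(schemaB))

	// A name-based lookup does not depend on the join layout.
	col, ok := lookupByName(schemaB, gf.Name)
	fmt.Println("found by name:", ok, col.Type)
}
```

One trade-off of name matching is that it depends on both sides producing identical name strings, which matters for the casing discussion later in this thread.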

The MIN/MAX test was panicking because the first pass updated GetField types inside the aggregate (e.g., int4 to numeric), which changed the aggregate's return type, but the parent Project node's GetFields still had the old type. So I added a second pass that walks the tree and syncs each GetField type with the child node's actual schema.
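
The second pass described above can be sketched on a toy model: a parent holds fields with cached types, and those cached types are refreshed from the child's schema. The `column` and `field` types and `syncFields` are hypothetical names for illustration only; the real rule walks plan nodes and GetField expressions.

```go
package main

import "fmt"

// column is a toy schema column; field is a toy GetField, a reference by
// name that carries a cached type.
type column struct{ Name, Type string }
type field struct{ Name, Type string }

// syncFields returns a copy of fields whose cached types have been replaced
// with the type the child schema reports for the same name. Fields with no
// matching column are returned unchanged.
func syncFields(fields []field, childSchema []column) []field {
	byName := make(map[string]string, len(childSchema))
	for _, c := range childSchema {
		byName[c.Name] = c.Type
	}
	out := make([]field, len(fields))
	for i, f := range fields {
		if t, ok := byName[f.Name]; ok {
			f.Type = t
		}
		out[i] = f
	}
	return out
}

func main() {
	// After the first pass, the aggregate child reports numeric...
	childSchema := []column{{Name: "min(v.n)", Type: "numeric"}}
	// ...but the parent still carries the stale int4 type.
	projectFields := []field{{Name: "min(v.n)", Type: "int4"}}

	for _, f := range syncFields(projectFields, childSchema) {
		fmt.Printf("%s: %s\n", f.Name, f.Type)
	}
}
```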

Hydrocharged · 2026-02-12T08:41:10Z

FYI I think maybe this is a bit better to review now:

Would you say that it's in a complete state? I was waiting until it was fully ready before giving it the second pass.

Remove inline cast logic from ExplicitCast.Eval since it's now being handled by getRecordCast() in cast.go, and called from GetExplicitCast before returning nil. Also, remove duplicate UnknownLiteralCast fallback in GetImplicitCast and unused core import from explicit_cast.go. Last, clean up test name; don't include GH issue number. Refs: dolthub#1648

codeaucafe

@Hydrocharged I think this PR is good to review now. Cleaned up some small/basic things I missed 🤦‍♂️. I think this is complete enough to review now.

I think it would be good to add more exhaustive BATS tests, like the unit test additions, so I'm going to work on that. Feel free to wait to review until that's done, unless you think I shouldn't even write more bats tests for this PR.

testing/go/values_statement_test.go

codeaucafe · 2026-02-14T04:25:43Z

I should be able to push up more bats tests tomorrow FYI

Hydrocharged · 2026-02-14T08:59:57Z

I'll wait for the additional tests, since they may uncover extra points that should be addressed. Also, we can never have too many tests!

Refs: dolthub#1648

codeaucafe · 2026-02-15T00:09:07Z

thanks @Hydrocharged, I just pushed the BATS tests so I think it's in a state to finally be reviewed again. Thank you for your patience.

No rush on reviewing this since it's the weekend.

Hydrocharged · 2026-02-16T07:04:20Z

One final thing. I just noticed that tests weren't yet enabled on the repository (that's my mistake), so I just enabled them. I'll make sure those are all passing before the final review.

Refs: dolthub#1648

codeaucafe · 2026-02-17T02:14:10Z

no problem. I just pushed up another change to fix the format workflow failure.

thanks again for taking the time to review this and answer all my questions.

Hydrocharged

Almost there! Just a few small corrections

server/analyzer/resolve_values_types.go

testing/bats/types.bats

testing/go/values_statement_test.go

Reorg VALUES bats tests to its own values.bats from types.bats. Also, inline the getFieldWithType helper func, improve error messages in transformValuesNode, and add test cases for case-sensitive quoted column names and case-differing aggregate columns. Refs: dolthub#1648

codeaucafe · 2026-02-18T08:37:19Z

@Hydrocharged thank you for reviewing. I refactored according to all your suggestions EXCEPT the one regarding not using strings.ToLower. I did make a test to prove it causes issues as you noted, but I didn't have enough time to figure out how to correct that part of the code.

I hope to get something before end of week, so feel free to hold on reviewing again until I get that pushed up.

Thanks again for taking the time to review my PR.

Hydrocharged · 2026-02-18T11:29:11Z

Also, make sure to resolve conversations on GitHub as they're resolved in your code.

codeaucafe · 2026-02-19T04:09:14Z

@Hydrocharged I did some more digging into the strings.ToLower usage in the second pass, and I think the reason my tests originally failed without strings.ToLower is a casing asymmetry in GMS between two code paths:

- planbuilder/aggregates.go:329 lowercases the entire aggregate name when creating GetFields: aggName := strings.ToLower(plan.AliasSubqueryString(agg)) -> "sum(v.n)"
- plan/group_by.go:74 keeps the original casing from e.String() when building GroupBy's schema: name = AliasSubqueryString(e) -> "SUM(v.n)"

So, in the second pass of ResolveValuesTypes, when the owning node is a Project, for example, we collect the child schema via:

    for _, child := range n.Children() {
        childSchema = append(childSchema, child.Schema()...)
    }

The Project's child is the GroupBy node, so child.Schema() calls GroupBy.Schema(), which builds column names using AliasSubqueryString(e) and returns original casing like "SUM(v.n)". We then try to match that against the Project's GetField name, which was lowercased by the planbuilder to "sum(v.n)". Without case-insensitive matching, the comparison fails silently, aggregate type propagation doesn't happen, and runtime panics occur because the type says int32 but the actual value is decimal.Decimal.

I confirmed this with runtime logging:

GetField.Name() = "min(v.n)" vs childSchema col.Name = "MIN(v.n)"
GetField.Name() = "sum(v.val)" vs childSchema col.Name = "SUM(v.Val)"
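
The mismatch in the log lines above, and the cost of papering over it, can be sketched with plain strings. `matchExact` and `matchFold` are hypothetical helpers, and this sketch uses strings.EqualFold rather than the strings.ToLower comparison discussed in this thread; for these ASCII names the two behave the same.

```go
package main

import (
	"fmt"
	"strings"
)

// matchExact returns the index of the first schema name equal to name,
// or -1 if none matches.
func matchExact(schema []string, name string) int {
	for i, s := range schema {
		if s == name {
			return i
		}
	}
	return -1
}

// matchFold is like matchExact but ignores case.
func matchFold(schema []string, name string) int {
	for i, s := range schema {
		if strings.EqualFold(s, name) {
			return i
		}
	}
	return -1
}

func main() {
	// The child schema keeps original casing; the field name was lowercased.
	schema := []string{"MIN(v.n)"}
	fmt.Println("exact:", matchExact(schema, "min(v.n)"))
	fmt.Println("fold: ", matchFold(schema, "min(v.n)"))

	// Case-insensitive matching cannot tell apart two columns that differ
	// only by case, as with quoted identifiers in PostgreSQL.
	ambiguous := []string{"SUM(v.Val)", "SUM(v.val)"}
	fmt.Println("fold on ambiguous schema:", matchFold(ambiguous, "sum(v.val)"))
}
```

The last call always returns the first entry, which is why case-insensitive matching is only a stopgap where identifiers can legitimately differ by case.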

I also tried a normalizeFuncPrefix approach that would only uppercase the function keyword (everything before the first opening parenthesis), preserving column name casing. This was a janky shot in the dark that doesn't work because the planbuilder lowercases everything, including column refs inside the aggregate name. So "SUM(v.Val)" becomes "sum(v.val)" (note: "Val" is also lowered to "val"), not "sum(v.Val)".

Recall that the second pass can't use non-name matching because the two things we need to match are:

- The GetField on the Project, which has Index() (ColumnId at this stage, not a positional index), TableId(), and Name()
- Child schema columns from GroupBy.Schema(), which are sql.Column structs with Name, Source, Type, but no ColumnId field

sql.Column (sql/column.go) appears to have no ColumnId or any ID field. GetField (sql/expression/get_field.go) has a separate exprId (ColumnId) field, but I believe there's nothing on the child schema side to match it against. So the only shared identifier between a GetField and its corresponding child schema column is the name, and the name has the casing asymmetry from GMS. Maybe I'm missing something and there is something better to match on?

I think getting this second-pass part of the code working properly would require either:

- Adding a ColumnId field to sql.Column so we can match GetField.Id() against the child schema column's ID
- Fixing the casing asymmetry at the source so both paths produce the same string

Is this casing handling done in GMS correct?

Should I work on fixing this part on corresponding PR in GMS?

Hydrocharged · 2026-02-19T08:03:36Z

Should I work on fixing this part on corresponding PR in GMS?

So GMS (short for go-mysql-server) started as a MySQL engine that we're expanding to support Postgres functionality. As a consequence, there are case-insensitive places in MySQL that are case-sensitive in Postgres, and we need some structured way to discriminate between the two. We've had to do this to several features in several places, but this one is an overall relatively large task, so for this PR, I'd say let's just leave a detailed TODO comment surrounding the ToLower calls so we know to tackle it later.

codeaucafe · 2026-02-20T02:06:49Z

Thanks @Hydrocharged, I'll update the comment and skip the new edge case test

Added 2 TODOs for GMS case asymmetry issue which forces us to currently compare on strings.ToLower in ResolveValuesTypes()'s second pass: one in the implementation area itself and the corresponding test we had to skip due to this issue. Refs: dolthub#1648

testing/go/values_statement_test.go

server/analyzer/resolve_values_types.go

Refs: dolthub#1648

codeaucafe · 2026-02-22T01:52:35Z

refactored/shortened the TODOs. Let me know if you think it requires some more adjustment.

thanks again for reviewing.

Hydrocharged

LGTM!

coffeegoddd added the contribution label Jan 9, 2026

codeaucafe changed the title ~~feat(analyzer): fix VALUES clause type inference~~ dolthub/doltgresql#1648: fix VALUES clause type inference Jan 9, 2026

Hydrocharged self-requested a review January 9, 2026 12:52

codeaucafe marked this pull request as ready for review January 12, 2026 00:15

Hydrocharged requested changes Jan 12, 2026

codeaucafe added 3 commits February 4, 2026 02:04

refactor(expression): remove unnecessary UnknownCoercion

0157795

Refs: dolthub#1648

fix(analyzer): deep copy 2d slice to prevent shared-slice mutation

4967a3c

codeaucafe added 2 commits February 10, 2026 17:55

Merge branch 'main' of github.com:dolthub/doltgresql into codeaucafe/…

af77007

…1648-values-clause-type-inference # Conflicts: # server/expression/explicit_cast.go

codeaucafe commented Feb 14, 2026

test(types): add bats tests for VALUES type resolution

806f6ff

Refs: dolthub#1648

style(types): goimport format resolve_values_types.go

5b72cf1

Refs: dolthub#1648

Hydrocharged reviewed Feb 17, 2026

codeaucafe commented Feb 20, 2026

testing/go/values_statement_test.go (outdated)

codeaucafe commented Feb 20, 2026

server/analyzer/resolve_values_types.go (outdated)

docs(analyzer): shorten TODO comments

6ca7492

Refs: dolthub#1648

Hydrocharged approved these changes Feb 24, 2026

Hydrocharged merged commit fd9c0ce into dolthub:main Feb 24, 2026
19 of 20 checks passed
