Skip to content

Do not define total ordering for Datum#1041

Open
sgrif wants to merge 2 commits into
mainfrom
sg-yeet-total-ord-datum
Open

Do not define total ordering for Datum#1041
sgrif wants to merge 2 commits into
mainfrom
sg-yeet-total-ord-datum

Conversation

@sgrif

@sgrif sgrif commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Datum has had total ordering defined since it was originally introduced in 000b4cd. There's no context given for why, but I'm willing to assume the answer is "because it could be derived". At one point the implementation moved from derived to manually defined with identical semantics (in a commit written by claude that has no reasoning in the commit message, so I assume wasn't thought out).

The reason this is possible is because PG does define total ordering for each of its data types. The only data type for which this isn't obviously the case is floats, and PG defines all NaNs as equal and greater than non NaNs. Rust then also defines what ordering for differing enum variants means in its PartialOrd derive, which primarily orders on discriminant.

In both PartialEq and PartialOrd, we diverge from PG's behavior in that we do not ever consider cross type comparisons, while PG does have several opclasses that do not require the lhs and rhs to be the same type. This is probably fine, as I believe the only we could ever end up comparing Datums of differing types expecting a meaningful answer are:

  • The datum variant of a column changes across rows (impossible)
  • The datum variant of a column changes across shards (should be impossible)
  • We are blindly comparing datums from two columns that may have differing types or against a hard coded value and didn't consider this (unlikely, but possible)

The last case is a bit of a footgun, but one that I'm comfortable enough with to not immediately go have us try to perfectly match PG's semantics with. 1::int4 != 1.0::real is not necessarily ideal, but is at least a logical answer and one that is very hard for us to reach in our code.

Leaving PartialOrd alone, however, makes me much less comfortable. The semantics of the derived implementation would mean 1::int4 > 2.0::real, which is much more obviously wrong. And were that footgun to ever go off, it's feasible that it wouldn't be caught by a test, unless the code comparing datums considers every possible way lhs and rhs could differ in type.

Because of that, I have manually implemented PartialOrd to explicitly return None when the types differ, along with any comparison with Null returning None to reflect that PG returns NULL in that case. I opted not to have this reflect the behavior of PG's ORDER BY, which is NULLS FIRST by default, as that code handles NULLs explicitly, and I would expect the behavior of PartialOrd to match the < operator in SQL.

The PartialOrd impl was written in this more verbose way, rather than with a single _ if discriminant(self) != discriminant(other) so that any variants added to Datum in the future will fail to compile if they are not handled in the impl. (Hilariously, this meant I couldn't write (Null, _) | (_, Null) in the last arm to match the shape of the others, as the compiler correclty points out that (_, Null) is redundant as we've already handled every other possible type on the left explicitly)

I have left the Eq impl alone since returning false is a reasonable answer for differing types, and the only requirement that Rust defines for Eq is that a == a which is the case.

Datum has had total ordering defined since it was originally introduced
in 000b4cd. There's no context given for why, but I'm willing to
assume the answer is "because it could be derived". At one point the
implementation moved from derived to manually defined with identical
semantics (in a commit written by claude that has no reasoning in the
commit message, so I assume wasn't thought out).

The reason this is possible is because PG does define total ordering for
each of its data types. The only data type for which this isn't
obviously the case is floats, and PG defines all NaNs as equal and
greater than non NaNs. Rust then also defines what ordering for
differing enum variants means in its `PartialOrd` derive, which
primarily orders on discriminant.

In both `PartialEq` and `PartialOrd`, we diverge from PG's behavior in
that we do not ever consider cross type comparisons, while PG does have
several opclasses that do not require the lhs and rhs to be the same
type. This is *probably* fine, as I believe the only we could ever end
up comparing `Datum`s of differing types expecting a meaningful answer
are:

- The datum variant of a column changes across rows (impossible)
- The datum variant of a column changes across shards (should be
  impossible)
- We are blindly comparing datums from two columns that may have
  differing types or against a hard coded value and didn't consider this
  (unlikely, but possible)

The last case is a bit of a footgun, but one that I'm comfortable enough
with to not immediately go have us try to perfectly match PG's semantics
with. `1::int4 != 1.0::real` is not necessarily ideal, but is at least a
logical answer and one that is very hard for us to reach in our code.

Leaving `PartialOrd` alone, however, makes me much less comfortable. The
semantics of the derived implementation would mean `1::int4 >
2.0::real`, which is much more obviously wrong. And were that footgun to
ever go off, it's feasible that it wouldn't be caught by a test, unless
the code comparing datums considers every possible way lhs and rhs could
differ in type.

Because of that, I have manually implemented `PartialOrd` to explicitly
return `None` when the types differ, along with any comparison with
`Null` returning `None` to reflect that PG returns `NULL` in that case.
I opted not to have this reflect the behavior of PG's `ORDER BY`, which
is `NULLS FIRST` by default, as that code handles NULLs explicitly, and
I would expect the behavior of `PartialOrd` to match the `<` operator in
SQL.

The `PartialOrd` impl was written in this more verbose way, rather than
with a single `_ if discriminant(self) != discriminant(other)` so that
any variants added to `Datum` in the future will fail to compile if they
are not handled in the impl. (Hilariously, this meant I couldn't write
`(Null, _) | (_, Null)` in the last arm to match the shape of the
others, as the compiler correclty points out that `(_, Null)` is
redundant as we've already handled every other possible type on the left
explicitly)

I have left the `Eq` impl alone since returning `false` is a reasonable
answer for differing types, and the only requirement that Rust defines
for `Eq` is that `a == a` which is the case.
@blacksmith-sh

This comment has been minimized.

@levkk

levkk commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

It was probably to support sorting (ORDER BY) in cross-shard SELECT queries.

@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 34.84848% with 43 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pgdog-postgres-types/src/datum.rs 19.44% 29 Missing ⚠️
pgdog/src/backend/pool/connection/buffer.rs 53.33% 14 Missing ⚠️

📢 Thoughts on this report? Let us know!

@sgrif

sgrif commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

It was probably to support sorting (ORDER BY) in cross-shard SELECT queries.

Yup, I mention that in the commit message. (The test failure shows I do need to add some more explicit handling to the ordering code before this is ready to merge though)

@sgrif

sgrif commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Well this is fun, I cannot reproduce the python failure outside of the python test. And I can see that the full matrix of ASC, ASC NULLS FIRST, DESC, and DESC NULLS LAST work in the other test cases 🤪

@levkk

levkk commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Might be due to encoding? Python tests might be using asyncpg which uses binary. Most of our other tests use text I think

I reversed the order in that test because it seems like it cares about
how we compare NULL with certain timestamps, and not that we support
NULLS FIRST|LAST specifically
@sgrif

sgrif commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

For posterity, the python test was correct and the other test I was messing with was trying to decode timestamptz into NaiveDateTime which sqlx doesn't support, and did it in such a way that it converted all decoding errors into None so it would pass regardless of the ordering

// FIXME(sage): We don't handle ASC NULLS FIRST or
// DESC NULLS LAST we should either error or add
// support rather than silently do the wrong sorting
match (&left.value, &right.value, asc) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation change made this look like more churn than I would have hoped. Only meaningful change in behavior is this match compared to line 109 in the old code

@meskill

meskill commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

I'd like to bring the same point as for the Add impl pr - the usage of default traits like Add, PartialOrd, PartialEq is confusing for Datum. The default traits should specify generic behavior that's, well, applicable in most cases. But as @sgrif mentioned the current implementation quite specialized and could go off if used somewhere else (and because the default traits are convenient it's easy to run into it).
I'd go the following way: specialized traits for the required operations inside aggregate.rs module instead of std and implement it the way we need for aggregation, drop the std entirely until we'll need them. Or, if we stay with std, please, comment the reasoning and the use cases for the defined impls.


/// GROUP BY <columns>
#[derive(Hash, PartialEq, Eq, PartialOrd, Ord, Debug)]
#[derive(Hash, PartialEq, Eq, PartialOrd, Debug)]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the PartialOrd here. And it's not clear how it should work actually considering that we store an index and the value here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thought I deleted that

Comment on lines +6 to 11
#[derive(Debug, Clone, PartialOrd, PartialEq, Eq, Hash)]
pub struct Array {
elements: Vec<Option<Datum>>,
element_oid: i32,
dim: Dimension,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the automatic derive for PartialOrd is questionable here since it will compare all the fields of structure in order and the Option has this None < Some semantic.

oh, wait, what does Option mean here? is it another representation for Datum::Null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that option is sus. I will look into it

@sgrif

sgrif commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Or, if we stay with std, please, comment the reasoning and the use cases for the defined impls.

Is there an impl that I didn't lay out the reasoning for in this commit? Or do you mean literally a code comment? I prefer to leave "why" out of code comments as that gets out of sync with the code very quickly, while commit messages are tied to the code at a specific point in time and are a git blame away

I'd go the following way: specialized traits for the required operations inside aggregate.rs module

I strongly disagree in this case. Add is much more clearly only something aggregation cares about. Equality and comparison are much more generally meaningful and losing access to sorting functions and data structures from the broader ecosystem is a huge loss. I feel like I laid out my reasoning about why this impl makes sense at the datum level pretty thoroughly here. Is there something more specific you disagree with?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants