[SPARK-57207][SQL] Support nanosecond timestamp types in the Types Framework by MaxGekk · Pull Request #56266 · apache/spark

MaxGekk · 2026-06-02T08:54:17Z

What changes were proposed in this pull request?

This PR wires TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) through the Spark SQL Types Framework (SPARK-53504), so that all type-specific behavior for the nanosecond timestamp types is centralized behind TypeOps / TypeApiOps. The nanos types are now supported only through the framework: the scattered legacy dispatch for them is removed.

Concretely:

- Add TimestampNanosTypeOps (catalyst) with TimestampNTZNanosTypeOps / TimestampLTZNanosTypeOps, registered in TypeOps.apply(). Overrides: getPhysicalType, getJavaClass, getBoxedJavaClass, getRowWriter, getDefaultLiteral, getJavaLiteral, getMutableValue, toCatalystImpl, toScala/toScalaImpl, createSerializer, createDeserializer.
- Add a getBoxedJavaClass hook to the TypeOps base (the boxed Java class used in codegen). The createSerializer / createDeserializer hooks already exist on the base trait (used by TimeTypeOps); the nanos ops above only override them.
- Add TimestampNanosTypeApiOps (sql/api) with NTZ/LTZ subclasses, registered in TypeApiOps.apply(). getEncoder returns the SPARK-57033 leaves (LocalDateTimeNanosEncoder(p) / InstantNanosEncoder(p)), gated by DataTypeErrors.checkTimestampNanosTypesEnabled().
- Remove the nanos branches from the legacy code paths now handled by the framework: SerializerBuildHelper, DeserializerBuildHelper, CatalystTypeConverters, EncoderUtils, CodeGenerator, Literal, and InternalRow. In SerializerBuildHelper / DeserializerBuildHelper, OptionEncoder / TransformingEncoder are unwrapped before the framework leaf dispatch, since those wrapper encoders proxy dataType to the wrapped encoder.
- Add MutableTimestampNanos to SpecificInternalRow to avoid the MutableAny fallback.
- Add a checkValue on spark.sql.timestampNanosTypes.enabled requiring spark.sql.types.framework.enabled=true, so the types cannot be enabled outside the framework.
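
The registration and framework-only dispatch described in the list above can be illustrated with a toy model. This is a minimal sketch, not Spark's code: `DataTypeSketch`, `OpsSketch`, `lookup`, and `describe` are hypothetical stand-ins. The only details taken from the description are the precision range [7, 9], an `Option`-returning lookup, and the absence of a legacy fallback for the nanos types when the framework is off.

```scala
object TypesFrameworkSketch {
  // Hypothetical stand-ins for Spark's DataType hierarchy (not the real classes).
  sealed trait DataTypeSketch
  case object IntSketch extends DataTypeSketch
  final case class NtzNanos(precision: Int) extends DataTypeSketch {
    require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
  }
  final case class LtzNanos(precision: Int) extends DataTypeSketch {
    require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
  }

  // Hypothetical per-type operations; the real TypeOps has many more hooks.
  trait OpsSketch { def name: String }

  // Registry lookup: Some(ops) for framework-managed types, None otherwise.
  // Assumption: with the framework disabled, nothing is registered.
  def lookup(dt: DataTypeSketch, frameworkEnabled: Boolean): Option[OpsSketch] =
    if (!frameworkEnabled) None
    else dt match {
      case NtzNanos(p) => Some(new OpsSketch { val name = s"ntz-nanos($p)" })
      case LtzNanos(p) => Some(new OpsSketch { val name = s"ltz-nanos($p)" })
      case _ => None
    }

  // A call site with no legacy arm for the nanos types:
  // a missing ops entry for them means "unsupported", not a silent fallback.
  def describe(dt: DataTypeSketch, frameworkEnabled: Boolean): String =
    lookup(dt, frameworkEnabled).map(_.name).getOrElse(dt match {
      case IntSketch => "int (legacy path)"
      case other => throw new UnsupportedOperationException(s"unsupported type: $other")
    })
}
```

In this model, `describe(NtzNanos(9), frameworkEnabled = true)` goes through the registry, while the same call with the flag off throws instead of falling back.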

Fractional-second string formatting is not implemented yet (no TimestampFormatter for these types). Until it lands, converting a nanos value to a string (CAST to STRING, EXPLAIN/SHOW output, SQL-literal rendering) raises the new UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING error rather than silently truncating to microseconds. Both the interpreted path (TimestampNanosTypeApiOps.format) and the codegen path (ToStringBase.castToStringCode) raise the identical error, so the two eval modes stay consistent.
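
One way to keep the interpreted and codegen paths consistent is to derive both from a single helper. The sketch below is a simplified illustration under stated assumptions: `UnsupportedFeature`, `nanosToStringError`, `formatInterpreted`, and `castToStringCode` are hypothetical names for this example, and the real code raises a Spark error through DataTypeErrors rather than a plain RuntimeException.

```scala
object NanosToStringSketch {
  // Hypothetical error type standing in for Spark's error machinery.
  final case class UnsupportedFeature(condition: String) extends RuntimeException(condition)

  // Single source of truth for the error condition name, shared by both paths.
  val Condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING"
  def nanosToStringError(): UnsupportedFeature = UnsupportedFeature(Condition)

  // Interpreted path: fail as soon as a value is formatted.
  def formatInterpreted(value: Any): String = throw nanosToStringError()

  // Codegen path: emit Java source that raises the same condition when it runs.
  def castToStringCode: String = s"""throw new RuntimeException("$Condition");"""
}
```

Because both paths read the same `Condition` value, a test can assert the same error name in either eval mode.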

Out of scope (follow-ups): string formatting/CAST-to-string, Connect proto, Arrow, PySpark conversion, Parquet/ColumnVector, and physical ordering/compare/hash.

Why are the changes needed?

The logical nanosecond timestamp types (SPARK-56876) and the physical row layer (SPARK-56981) already exist, but these types were wired only through scattered legacy dispatch. Centralizing the type-specific operations behind TypeOps, consistent with TimeType, is a prerequisite for the remaining nanosecond timestamp work and avoids the framework-on/off behavior divergence that the previous per-call-site handling produced.

Does this PR introduce any user-facing change?

No. The nanosecond timestamp types are a preview feature gated by spark.sql.timestampNanosTypes.enabled (and spark.sql.types.framework.enabled), both off by default in production. When these preview flags are enabled, converting a nanos timestamp to a string raises UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING because fractional-second formatting is not implemented yet.

How was this patch tested?

Added/updated tests:

- TimestampNanosTypeOpsSuite (catalyst): TypeOps/TypeApiOps registration; PhysicalDataType, default Literal, and codegen Java class; InternalRow/SpecificInternalRow roundtrips incl. the dedicated MutableTimestampNanos holder; getEncoder returns the SPARK-57033 nanos encoders; CatalystTypeConverters java.time roundtrip; format/toSQLValue raise UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING; framework-disabled leaves the types unsupported (no legacy fallback); enabling the nanos types requires the framework flag.
- TimestampNanosTypeOpsSuite also covers Option-wrapped nanos encoder roundtrips (Some/None for NTZ and LTZ), verifying wrapper encoders are unwrapped before the framework serde dispatch.
- TimestampNanosRowSuite (catalyst): CAST nanos -> STRING raises the unsupported-feature error in both interpreted and codegen modes; unsafe/generic row roundtrips; literal validation.

build/sbt 'catalyst/testOnly *TimestampNanosTypeOpsSuite *TimestampNanosRowSuite'
build/sbt 'core/testOnly org.apache.spark.SparkThrowableSuite'

All tests pass. catalyst / sql-api scalastyle are clean.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

MaxGekk · 2026-06-02T09:02:26Z

Please review this PR, @davidm-db @stevomitric @dejankrak-db.

stevomitric

LGTM, left a few comments, please resolve.

MaxGekk · 2026-06-02T14:50:03Z

A general note for reviewers: the dual logic (framework + legacy) is only present for the Time data type, as it had been developed prior to the Types Framework. Every other new data type should go through the framework. Accordingly, this PR routes the nanosecond timestamp types (TimestampNTZNanosType / TimestampLTZNanosType) exclusively through TypeOps / TypeApiOps: the legacy duplicate paths have been removed, and the types are unsupported when spark.sql.types.framework.enabled is off (enforced by a checkValue on spark.sql.timestampNanosTypes.enabled).

…ramework via TypeOps overrides

Registers TimestampNTZNanosType and TimestampLTZNanosType in the Types Framework by adding TimestampNanosTypeOps (server-side, catalyst) and TimestampNanosTypeApiOps (client-side, sql/api) with their NTZ/LTZ concrete subclasses. These only override existing TypeOps/TypeApiOps methods (physical type, literals, row writer/accessor, mutable value, codegen class, formatting); no new framework methods are introduced. A dedicated MutableTimestampNanos holder is added to SpecificInternalRow to avoid the MutableAny fallback. Encoders remain out of scope (SPARK-57033), so getEncoder reports the type as unsupported to match the legacy RowEncoder behavior.

…tamp formatting

Replace the interim TimestampNanosVal.toString passthrough in format() with SparkException.internalError so callers see a clear message instead of a debug string on the CAST-to-STRING path. Update the test to expect the error.

Co-authored-by: Isaac

…ypeOps to reuse SPARK-57033

The Types Framework overrides for the nanosecond timestamp types short-circuit the legacy dispatch when spark.sql.types.framework.enabled is on. They were written against the pre-SPARK-57033 world, so enabling the framework regressed the encoder and java.time conversion support that PR apache#56158 already merged:

- getEncoder threw UNSUPPORTED_DATA_TYPE_FOR_ENCODER instead of returning the LocalDateTimeNanosEncoder / InstantNanosEncoder added by SPARK-57033.
- toCatalystImpl/toScala/toScalaImpl were identity passthroughs instead of converting java.time.LocalDateTime/Instant <-> TimestampNanosVal with per-precision truncation, as the legacy TimestampN(TZ|LTZ)NanosConverter does.

Wire getEncoder (gated by spark.sql.timestampNanosTypes.enabled, mirroring the legacy RowEncoder path) to the precision-aware encoders, and route the catalyst conversions through the shared DateTimeUtils helpers (raising INVALID_EXTERNAL_VALUE on bad input). Add tests asserting the encoder/converter parity between the framework-on and framework-off paths and that the feature-flag gate still holds.

Co-authored-by: Isaac
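
To make "per-precision truncation" concrete, here is a minimal sketch using only java.time. It assumes that precision p keeps p fractional-second digits and drops the rest toward zero; the commit message does not spell out the exact rule, and `truncate` is a hypothetical helper, not one of the DateTimeUtils functions.

```scala
import java.time.LocalDateTime

object NanosTruncationSketch {
  // Drop fractional-second digits beyond `precision` (7 to 9).
  // Assumption: truncation rounds toward zero within the second.
  def truncate(ldt: LocalDateTime, precision: Int): LocalDateTime = {
    require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
    // Size of the smallest representable step, in nanoseconds.
    val unit = precision match {
      case 7 => 100
      case 8 => 10
      case _ => 1
    }
    val nanos = ldt.getNano // always in 0..999999999
    ldt.withNano(nanos - nanos % unit)
  }
}
```

For a value with nanosecond field 123456789, this keeps 123456700 at p = 7, 123456780 at p = 8, and the full value at p = 9.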

…ST-to-string behavior

The Types Framework TypeApiOps.format for nanosecond timestamp types threw an internal "not yet implemented" error, which diverged from the framework-off path: ToStringBase renders the internal value via UTF8String.fromString(toString) (interpreted) / String.valueOf (codegen). That made CAST-to-string and display output differ depending on spark.sql.types.framework.enabled. Have format() mirror the legacy fallback by rendering the internal TimestampNanosVal via toString, so enabling the Types Framework does not change CAST-to-string / display output until dedicated fractional-second formatters land. Update the test to assert format/toSQLValue follow this behavior.

Co-authored-by: Isaac

… Framework

Route all nanosecond timestamp (TimestampNTZNanosType / TimestampLTZNanosType) handling through TypeOps / TypeApiOps and remove the legacy duplicate code paths, so these types are supported only when the Types Framework is enabled and are unsupported (no silent fallback) when it is disabled.

- Move createSerializer / createDeserializer into TimestampNanosTypeOps and add a getBoxedJavaClass hook on TypeOps (overridden by TimeTypeOps) so EncoderUtils, Serializer/DeserializerBuildHelper, RowEncoder, CatalystTypeConverters, PhysicalDataType, Literal, CodeGenerator and InternalRow no longer special-case nanos types.
- Make string formatting raise an internal error until a dedicated TimestampFormatter lands: TimestampNanosTypeApiOps.format / toSQLValue throw, and the codegen CAST-to-string path in ToStringBase throws the same error so interpreted and codegen behavior stay consistent.
- Update the nanos suites to assert framework-only behavior and that CAST-to-string fails the same way in both eval modes.

…s timestamp types

Validate via checkValue on spark.sql.timestampNanosTypes.enabled that it can only be set to true when spark.sql.types.framework.enabled is true, since the nanosecond timestamp types are implemented solely through the Types Framework. Switch the entry to createWithDefaultFunction so the validator runs only on explicitly-set values, not on the default during SQLConf initialization.
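
The dependency between the two flags can be modeled without Spark's config classes. The sketch below is a toy under stated assumptions: `Conf`, `get`, and `set` are hypothetical and do not mirror SQLConf's API. Only the two key names and the rule "validate explicit sets, never the default" come from the commit message.

```scala
import scala.collection.mutable

object ConfSketch {
  val FrameworkKey = "spark.sql.types.framework.enabled"
  val NanosKey = "spark.sql.timestampNanosTypes.enabled"

  // Hypothetical minimal config holder; both flags default to false here.
  final class Conf {
    private val explicit = mutable.Map.empty[String, Boolean]

    // Reading a default never runs the validator.
    def get(key: String): Boolean = explicit.getOrElse(key, false)

    // The validator runs only when a value is set explicitly.
    def set(key: String, value: Boolean): Unit = {
      if (key == NanosKey && value) {
        require(get(FrameworkKey), s"$NanosKey requires $FrameworkKey=true")
      }
      explicit(key) = value
    }
  }
}
```

In this model, setting the nanos flag first fails, while setting the framework flag and then the nanos flag succeeds. Disabling the framework flag afterwards is not modeled.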

…mestamps as strings

Converting a nanosecond timestamp value to a string (CAST to STRING, EXPLAIN/SHOW output, SQL-literal rendering) is reachable from public APIs once the preview flags are enabled, so the previous SparkException.internalError was inappropriate. Introduce the UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING error condition and raise it from both the interpreted (TimestampNanosTypeApiOps.format) and codegen (ToStringBase.castToStringCode) paths via a shared DataTypeErrors helper.

davidm-db

Minor, on the description: the "What changes were proposed" section says this PR adds three hooks to the TypeOps base - getBoxedJavaClass, createSerializer, and createDeserializer. Only getBoxedJavaClass is actually new here; createSerializer and createDeserializer already exist on the base trait (added in #54905), and this PR overrides them on the nanos ops. Might be worth rewording so reviewers don't go looking for base-trait additions that aren't in the diff.

davidm-db · 2026-06-03T11:16:25Z

-  @scala.annotation.tailrec
-  def javaBoxedType(dt: DataType): Class[_] = dt match {
+  def javaBoxedType(dt: DataType): Class[_] =
+    TypeOps(dt).map(_.getBoxedJavaClass).getOrElse(javaBoxedTypeDefault(dt))


Question (non-blocking): is getBoxedJavaClass reachable for the framework types today? javaBoxedType has two callers - ValidateExternalType.checkType and MapObjects.elementClassTag - and for non-native encoders both receive the external ObjectType(...) (e.g. ObjectType(LocalTime) / ObjectType(LocalDateTime)), not the raw TimestampNTZNanosType / TimeType; MapObjects's boxed-element branch only fires for native catalyst element types. So TypeOps(dt).map(_.getBoxedJavaClass) appears to return Some only for types that never reach javaBoxedType with their literal form. If that's right, the hook plus this wiring is effectively defensive for now - which is fine and consistent with centralizing the dispatch (the explicit nanos arms it replaces were equally unreached) - I just want to confirm it's intentional rather than closing a live gap. The hook is still the right shape if it ever does become reachable, since getJavaClass returns the primitive class for Long-backed types.

Claude flagged this, I wasn't able to disprove it - so this is more of a question for my knowledge...

MaxGekk · Jun 3, 2026

Your analysis is correct, and it's intentional/defensive rather than closing a live gap. Both callers of javaBoxedType (ValidateExternalType.checkType, MapObjects.elementClassTag) receive the external ObjectType(...) for non-native encoders, and MapObjects's boxed-element branch only fires for native catalyst element types - so TypeOps(dt).map(_.getBoxedJavaClass) returns Some only for dataTypes that don't currently reach javaBoxedType in their raw form. It directly replaces the explicit case _: TimestampNTZNanosType => classOf[TimestampNanosVal] arms that the old javaBoxedType had, which were equally unreached. I kept it because it centralizes the dispatch consistently with the rest of the framework and is the correct shape if a primitive-backed framework type ever does reach javaBoxedType (getBoxedJavaClass returns the boxed class, getJavaClass the primitive).
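
The primitive-versus-boxed distinction in this thread can be shown with a small standalone model. This is a sketch, not the framework's code: `DataTypeSketch`, `OpsSketch`, and `ops` are hypothetical, and `javaBoxedTypeDefault` here is a placeholder for the legacy match. Only the `Option`-plus-fallback shape is taken from the diff above.

```scala
object BoxedClassSketch {
  // Hypothetical types: one Long-backed framework type, one legacy type.
  sealed trait DataTypeSketch
  case object LongBacked extends DataTypeSketch
  case object StringLike extends DataTypeSketch

  trait OpsSketch {
    def getJavaClass: Class[_]      // class used for unboxed codegen variables
    def getBoxedJavaClass: Class[_] // class used where an object reference is needed
  }

  // Registry lookup: Some(ops) only for the framework-managed type.
  def ops(dt: DataTypeSketch): Option[OpsSketch] = dt match {
    case LongBacked => Some(new OpsSketch {
      def getJavaClass: Class[_] = classOf[Long]                // primitive long
      def getBoxedJavaClass: Class[_] = classOf[java.lang.Long] // boxed java.lang.Long
    })
    case _ => None
  }

  // Placeholder for the legacy match that handles everything else.
  def javaBoxedTypeDefault(dt: DataTypeSketch): Class[_] = dt match {
    case StringLike => classOf[String]
    case _ => classOf[Object]
  }

  // Same shape as the diff: framework first, legacy default otherwise.
  def javaBoxedType(dt: DataTypeSketch): Class[_] =
    ops(dt).map(_.getBoxedJavaClass).getOrElse(javaBoxedTypeDefault(dt))
}
```

For the Long-backed type, `getJavaClass` yields a primitive class while `javaBoxedType` yields `java.lang.Long`, which is why a separate boxed hook is the right shape even if no caller reaches it today.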

stevomitric

Wrapper encoders can be bypassed by the new TypeOps serializer/deserializer dispatch. In PR head, SerializerBuildHelper.createSerializer and DeserializerBuildHelper.createDeserializer call TypeOps(enc.dataType) before matching wrapper encoders. But OptionEncoder and TransformingEncoder proxy dataType to the wrapped encoder, so OptionEncoder(LocalDateTimeNanosEncoder(p)).dataType == TimestampNTZNanosType(p). That means the nanos TypeOps path can try to serialize an Option[LocalDateTime] as if it were a bare LocalDateTime, and deserialize to LocalDateTime without WrapOption. Fix by unwrapping OptionEncoder / TransformingEncoder before framework dispatch, and add tests for optional/nested nanos encoder roundtrips.

…rk serde dispatch

SerializerBuildHelper.createSerializer and DeserializerBuildHelper.createDeserializer dispatch to TypeOps(enc.dataType) before matching wrapper encoders. OptionEncoder and TransformingEncoder proxy dataType to the wrapped encoder, so e.g. OptionEncoder(LocalDateTimeNanosEncoder(p)).dataType == TimestampNTZNanosType(p), which let the framework leaf serializer/deserializer fire on the wrapper and skip UnwrapOption/WrapOption (and the transforming codec). Route OptionEncoder and TransformingEncoder through the default path first so the framework leaf dispatch only sees unwrapped leaf encoders. Add a roundtrip test for Option-wrapped nanos encoders.

MaxGekk · 2026-06-03T13:40:07Z

Thanks for the review @davidm-db @stevomitric. Addressing the two review-summary comments:

@davidm-db (description wording). You're right - only getBoxedJavaClass is newly added to the TypeOps base in this PR; createSerializer / createDeserializer already exist on the base trait and are used by TimeTypeOps, and here they're just overridden on the nanos ops. I've reworded the "What changes were proposed" section so it no longer implies base-trait additions that aren't in the diff.

@stevomitric (wrapper encoders bypassing the framework serde dispatch). Good catch. Since OptionEncoder / TransformingEncoder proxy dataType to the wrapped encoder, OptionEncoder(LocalDateTimeNanosEncoder(p)).dataType == TimestampNTZNanosType(p), so the framework createSerializer / createDeserializer dispatch (which keys off enc.dataType) fired on the wrapper and skipped UnwrapOption / WrapOption (and the codec for TransformingEncoder).

Fixed in 02db736: SerializerBuildHelper.createSerializer / DeserializerBuildHelper.createDeserializer now route OptionEncoder / TransformingEncoder through the default path first, so the framework leaf dispatch only ever sees unwrapped leaf encoders. Added a roundtrip test for Option-wrapped nanos encoders (Some / None, NTZ and LTZ).

For context: this dispatch ordering predates this PR (the helpers have routed through TypeOps(enc.dataType) since the serde hooks landed, and TimeTypeOps already overrides them), so the same fix also covers Option[LocalTime] etc. The nanos leaf encoders aren't reachable via Scala reflection today (Option[LocalDateTime] resolves to the micros LocalDateTimeEncoder), so the gap was latent - but the ordering was fragile, so better to make it robust.
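
The ordering problem can be reproduced in a few lines. The sketch below is a toy model, assuming hypothetical `Enc`, `NanosLeaf`, and `OptionEnc` types and string "plans" in place of real serializer expressions. It only illustrates why dispatching on dataType before unwrapping a proxying wrapper picks the wrong branch.

```scala
object WrapperDispatchSketch {
  // Hypothetical encoders; the wrapper proxies dataType to the wrapped encoder.
  sealed trait Enc { def dataType: String }
  case object NanosLeaf extends Enc { val dataType = "timestamp_ntz_nanos" }
  case object OtherLeaf extends Enc { val dataType = "other" }
  final case class OptionEnc(inner: Enc) extends Enc {
    def dataType: String = inner.dataType
  }

  // Fragile ordering: the dataType check fires on the wrapper itself,
  // so the Option is never unwrapped.
  def serializerFragile(enc: Enc): String =
    if (enc.dataType == "timestamp_ntz_nanos") "nanosLeaf"
    else enc match {
      case OptionEnc(inner) => s"unwrapOption(${serializerFragile(inner)})"
      case _ => "default"
    }

  // Robust ordering: unwrap wrappers first, then dispatch on the leaf.
  def serializerRobust(enc: Enc): String = enc match {
    case OptionEnc(inner) => s"unwrapOption(${serializerRobust(inner)})"
    case leaf if leaf.dataType == "timestamp_ntz_nanos" => "nanosLeaf"
    case _ => "default"
  }
}
```

With `OptionEnc(NanosLeaf)`, the fragile version returns the bare leaf plan, while the robust version wraps it in the unwrap step.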

MaxGekk mentioned this pull request Jun 2, 2026

[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side) #56199

Open

stevomitric approved these changes Jun 2, 2026

Comment thread sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala Outdated

Comment thread sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala Outdated

uros-db reviewed Jun 2, 2026

Comment thread sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala Outdated

uros-db reviewed Jun 2, 2026

Comment thread sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala

uros-db reviewed Jun 2, 2026

Comment thread sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala

MaxGekk force-pushed the nanos-types-typeops branch from 87cd8a6 to d2f70e6 on June 2, 2026 15:18

MaxGekk changed the title ~~[SPARK-57207][SQL] Register nanosecond timestamp types in the Types Framework via TypeOps overrides~~ [SPARK-57207][SQL] Support nanosecond timestamp types in the Types Framework Jun 2, 2026

MaxGekk added 7 commits June 3, 2026 10:39

MaxGekk force-pushed the nanos-types-typeops branch from d2f70e6 to 9ac8e3a on June 3, 2026 08:43

davidm-db approved these changes Jun 3, 2026

stevomitric reviewed Jun 3, 2026

MaxGekk wants to merge 8 commits into apache:master from MaxGekk:nanos-types-typeops

4 participants