[SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal by MaxGekk · Pull Request #56295 · apache/spark

MaxGekk · 2026-06-03T09:14:04Z

What changes were proposed in this pull request?

Extend the TimestampFormatter family with additive, nanosecond-aware parse and format methods that produce and consume org.apache.spark.unsafe.types.TimestampNanosVal (epochMicros: Long + nanosWithinMicro: Short in [0, 999]) at a target fractional precision p in [7, 9]:

New trait methods: parseNanos / parseNanosOptional (LTZ), parseWithoutTimeZoneNanos / parseWithoutTimeZoneNanosOptional (NTZ, plus a final allowTimeZone = true overload), and formatNanos.
Iso8601TimestampFormatter: extractNanos / extractNanosNTZ build the Instant / LocalDateTime and delegate to SparkDateTimeUtils.instantToTimestampNanos / localDateTimeToTimestampNanos; formatNanos floors sub-precision digits and renders the reconstructed instant.
DefaultTimestampFormatter: delegates to the SPARK-57032 nanos entry points.
LegacyFastTimestampFormatter / LegacySimpleTimestampFormatter: explicitly reject nanosecond precision under the LEGACY time parser policy (they cap at micro resolution).

Sub-precision fractional digits are truncated (floored), consistent with SPARK-57032. All existing microsecond methods are unchanged (additive API).

Why are the changes needed?

Today TimestampFormatter is microsecond-only and discards the 7th-9th fractional digits. The JSON and CSV datasources drive all timestamp parsing/formatting through TimestampFormatter, so they cannot round-trip 7-9 digit fractions until the formatter is nanos-aware. This is the foundational unblocker for nanosecond support in those datasources (parent: SPARK-56822).

Does this PR introduce any user-facing change?

No. The new formatter API is additive and gated for use behind spark.sql.timestampNanosTypes.enabled by its callers.

How was this patch tested?

New cases in TimestampFormatterSuite: parse/format round-trip for p in [7, 9] across ISO default and custom patterns (LTZ and NTZ); boundary values (nanosWithinMicro 0 and 999, pre-epoch instants, the 0001/1582/1970/9999 corpus); truncation rule; NTZ time-zone rejection; and LEGACY-mode rejection.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…g and formatting TimestampNanosVal ### What changes were proposed in this pull request? Extend the `TimestampFormatter` family with additive, nanosecond-aware parse and format methods that produce and consume `org.apache.spark.unsafe.types.TimestampNanosVal` (`epochMicros: Long` + `nanosWithinMicro: Short` in `[0, 999]`) at a target fractional precision `p` in `[7, 9]`: - New trait methods: `parseNanos` / `parseNanosOptional` (LTZ), `parseWithoutTimeZoneNanos` / `parseWithoutTimeZoneNanosOptional` (NTZ, plus a `final` `allowTimeZone = true` overload), and `formatNanos`. - `Iso8601TimestampFormatter`: `extractNanos` / `extractNanosNTZ` build the `Instant` / `LocalDateTime` and delegate to `SparkDateTimeUtils.instantToTimestampNanos` / `localDateTimeToTimestampNanos`; `formatNanos` floors sub-`precision` digits and renders the reconstructed instant. - `DefaultTimestampFormatter`: delegates to the SPARK-57032 nanos entry points. - `LegacyFastTimestampFormatter` / `LegacySimpleTimestampFormatter`: explicitly reject nanosecond precision under the `LEGACY` time parser policy (they cap at micro resolution). Sub-precision fractional digits are truncated (floored), consistent with SPARK-57032. All existing microsecond methods are unchanged (additive API). ### Why are the changes needed? Today `TimestampFormatter` is microsecond-only and discards the 7th-9th fractional digits. The JSON and CSV datasources drive all timestamp parsing/formatting through `TimestampFormatter`, so they cannot round-trip 7-9 digit fractions until the formatter is nanos-aware. This is the foundational unblocker for nanosecond support in those datasources (parent: SPARK-56822). ### Does this PR introduce _any_ user-facing change? No. The new formatter API is additive and gated for use behind `spark.sql.timestampNanosTypes.enabled` by its callers. ### How was this patch tested? New cases in `TimestampFormatterSuite`: parse/format round-trip for `p` in `[7, 9]` across ISO default and custom patterns (LTZ and NTZ); boundary values (`nanosWithinMicro` 0 and 999, pre-epoch instants, the 0001/1582/1970/9999 corpus); truncation rule; NTZ time-zone rejection; and LEGACY-mode rejection. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Opus 4.8)

…pFormatter Drop the escaped leading dash from @param descriptions, capitalize the first letter, and inline each description on the same `@param` line where it fits.

… the LEGACY parser Replace the INTERNAL_ERROR thrown by the legacy formatters' parseNanos/formatNanos with a user-facing UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER error. The LEGACY time parser policy is user-configurable, so a caller may legitimately combine it with nanosecond-precision timestamps; that should surface as a proper unsupported-feature error suggesting CORRECTED, not an internal error.

…tters to throw the user-facing error Co-authored-by: Max Gekk <max.gekk@gmail.com>

…its no fractional digits Co-authored-by: Max Gekk <max.gekk@gmail.com>

…tTimeZoneNanos Co-authored-by: Max Gekk <max.gekk@gmail.com>

MaxGekk · 2026-06-03T10:14:28Z

@stevomitric Could you review this PR, please.

…ormatter Reverts the @param reflow that reformatted pre-existing methods' scaladoc, which added unrelated diff noise. Restores the original dashed multi-line @param style consistent with the rest of the file.

…ampFormatter formatNanos always routes through format(Instant) and applies the formatter's zoneId, making it LTZ-only; an NTZ value (a UTC-grid wall clock) rendered under a non-UTC zone came out shifted. Add an NTZ counterpart formatWithoutTimeZoneNanos that mirrors the microsecond format(LocalDateTime) path: it rebuilds the zone-independent local date-time via timestampNanosToLocalDateTime, floors sub-precision digits, and renders with the pattern only (no zoneId). The legacy formatters reject it with the same UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER error, and formatNanos is documented as LTZ-only. Adds a +01:00 regression test that the prior all-UTC NTZ cases could not catch.

…nverters instantToTimestampNanos / localDateTimeToTimestampNanos truncated sub-micro digits via truncateNanosWithinMicroToPrecision, which silently passed through any precision outside [7, 9]. The DefaultTimestampFormatter parse path already rejects out-of-range precision (the string helpers throw), but the Iso8601 path did not, so the contract was enforced inconsistently across subclasses. Make the shared truncation helper raise SparkException.internalError for precision outside [7, 9] so both the NTZ (LocalDateTime) and LTZ (Instant) converters - and thus both formatter subclasses, in parse and format directions - enforce the bound the same way. The check is an internal invariant, not a user-facing validation: precision always originates from a validated TimestampNTZNanosType / TimestampLTZNanosType (constructible only with p in [7, 9]), so it is unreachable from public APIs.

stevomitric

LGTM, thanks for resolving comments.

MaxGekk · 2026-06-03T15:07:07Z

@uros-b Could you review this PR, please. We need this to support casting timestamps to nanos, in literals, text datasource, hive results, in some expressions and so on.

uros-b · 2026-06-03T17:22:46Z

+  /**
+   * Optional counterpart of [[parseNanos]]. The result is `None` on invalid input.
+   */
+  def parseNanosOptional(s: String, precision: Int): Option[TimestampNanosVal] =
+    try {
+      Some(parseNanos(s, precision))
+    } catch {
+      case _: Exception => None
+    }


This will swallow the user-facing error on legacy formatters. Do we really want silent nulls instead of errors in such scenarios?

Good catch - this is a real inconsistency. The legacy formatters override the strict parseNanos / parseWithoutTimeZoneNanos to throw the user-facing UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER, but the *Optional variants inherited the trait default, which catches case _: Exception => None. Since legacyNanosUnsupported() is a (non-fatal) SparkUnsupportedOperationException, the optional path swallowed it and returned None - so under the LEGACY policy the strict path errored while the optional path silently yielded null. That conflates genuinely invalid input (legitimately None) with an unsupported config combination (LEGACY + nanos), which should surface loudly so the user knows to switch to CORRECTED.

Rather than re-add a throwing override to every formatter and rely on remembering it for future ones, I removed the defaults entirely: parseNanosOptional and parseWithoutTimeZoneNanosOptional are now abstract in the (sealed) TimestampFormatter trait, so every implementer must consciously decide. The legacy formatters implement them by throwing legacyNanosUnsupported(); the Default / Iso8601 formatters already implement them with the None-on-invalid-input behavior (their parseNanos doesn't throw the unsupported error, so that's correct there). Because the trait is sealed, the compiler guarantees completeness file-locally with no external burden. Extended the LEGACY-policy test to assert both optional variants raise the error instead of returning None.

uros-b

Left one comment, otherwise LGTM!

…acy formatters surface the unsupported error The legacy formatters override the strict `parseNanos` / `parseWithoutTimeZoneNanos` to throw the user-facing UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER error, but inherited the trait default for `parseNanosOptional` / `parseWithoutTimeZoneNanosOptional`, which catches `case _: Exception => None`. Since `legacyNanosUnsupported()` is a non-fatal SparkUnsupportedOperationException, the optional path swallowed it and returned None - so under the LEGACY time parser policy the strict path errored while the optional path silently yielded null, conflating invalid input with an unsupported config combination. Remove the swallowing defaults: make `parseNanosOptional` and `parseWithoutTimeZoneNanosOptional` abstract in the sealed `TimestampFormatter` trait so every implementer must decide explicitly. The legacy formatters implement them by throwing `legacyNanosUnsupported()`; the Default/Iso8601 formatters already implement the None-on-invalid-input behavior. Extend the LEGACY-policy test to assert both optional variants raise the error instead of returning None.

MaxGekk added 3 commits June 3, 2026 11:12

[SPARK-57162][SQL] Normalize scaladoc @param descriptions in Timestam…

f272cde

…pFormatter Drop the escaped leading dash from @param descriptions, capitalize the first letter, and inline each description on the same `@param` line where it fits.

MaxGekk changed the title ~~[WIP][SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal~~ [SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal Jun 3, 2026

MaxGekk added 3 commits June 3, 2026 12:01

[SPARK-57162][SQL] Override parseWithoutTimeZoneNanos in legacy forma…

19f1b7d

…tters to throw the user-facing error Co-authored-by: Max Gekk <max.gekk@gmail.com>

[SPARK-57162][SQL] Test that DefaultTimestampFormatter.formatNanos em…

d7f9d47

…its no fractional digits Co-authored-by: Max Gekk <max.gekk@gmail.com>

[SPARK-57162][SQL] Explain why legacy formatters override parseWithou…

0496cc3

…tTimeZoneNanos Co-authored-by: Max Gekk <max.gekk@gmail.com>

stevomitric reviewed Jun 3, 2026

View reviewed changes

MaxGekk added 4 commits June 3, 2026 16:05

[SPARK-57162][SQL] Revert scaladoc @param normalization in TimestampF…

490f3ca

…ormatter Reverts the @param reflow that reformatted pre-existing methods' scaladoc, which added unrelated diff noise. Restores the original dashed multi-line @param style consistent with the rest of the file.

[SPARK-57162][SQL] Apply scalafmt formatting to TimestampFormatter

52c876e

stevomitric approved these changes Jun 3, 2026

View reviewed changes

uros-b reviewed Jun 3, 2026

View reviewed changes

uros-b approved these changes Jun 3, 2026

View reviewed changes

HyukjinKwon approved these changes Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal#56295

[SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal#56295
MaxGekk wants to merge 11 commits into
apache:masterfrom
MaxGekk:spark-57162-nanos-timestamp-formatter

MaxGekk commented Jun 3, 2026 •

edited

Loading

Uh oh!

MaxGekk commented Jun 3, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stevomitric left a comment

Uh oh!

MaxGekk commented Jun 3, 2026

Uh oh!

uros-b Jun 3, 2026

Uh oh!

MaxGekk Jun 3, 2026

Uh oh!

uros-b left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

MaxGekk commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk commented Jun 3, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stevomitric left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 3, 2026

Uh oh!

uros-b Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

uros-b left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MaxGekk commented Jun 3, 2026 •

edited

Loading