Skip to content

[SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal#56295

Open
MaxGekk wants to merge 11 commits into
apache:masterfrom
MaxGekk:spark-57162-nanos-timestamp-formatter
Open

[SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal#56295
MaxGekk wants to merge 11 commits into
apache:masterfrom
MaxGekk:spark-57162-nanos-timestamp-formatter

Conversation

@MaxGekk
Copy link
Copy Markdown
Member

@MaxGekk MaxGekk commented Jun 3, 2026

What changes were proposed in this pull request?

Extend the TimestampFormatter family with additive, nanosecond-aware parse and format methods that produce and consume org.apache.spark.unsafe.types.TimestampNanosVal (epochMicros: Long + nanosWithinMicro: Short in [0, 999]) at a target fractional precision p in [7, 9]:

  • New trait methods: parseNanos / parseNanosOptional (LTZ), parseWithoutTimeZoneNanos / parseWithoutTimeZoneNanosOptional (NTZ, plus a final allowTimeZone = true overload), and formatNanos.
  • Iso8601TimestampFormatter: extractNanos / extractNanosNTZ build the Instant / LocalDateTime and delegate to SparkDateTimeUtils.instantToTimestampNanos / localDateTimeToTimestampNanos; formatNanos floors sub-precision digits and renders the reconstructed instant.
  • DefaultTimestampFormatter: delegates to the SPARK-57032 nanos entry points.
  • LegacyFastTimestampFormatter / LegacySimpleTimestampFormatter: explicitly reject nanosecond precision under the LEGACY time parser policy (they cap at micro resolution).

Sub-precision fractional digits are truncated (floored), consistent with SPARK-57032. All existing microsecond methods are unchanged (additive API).

Why are the changes needed?

Today TimestampFormatter is microsecond-only and discards the 7th-9th fractional digits. The JSON and CSV datasources drive all timestamp parsing/formatting through TimestampFormatter, so they cannot round-trip 7-9 digit fractions until the formatter is nanos-aware. This is the foundational unblocker for nanosecond support in those datasources (parent: SPARK-56822).

Does this PR introduce any user-facing change?

No. The new formatter API is additive and gated for use behind spark.sql.timestampNanosTypes.enabled by its callers.

How was this patch tested?

New cases in TimestampFormatterSuite: parse/format round-trip for p in [7, 9] across ISO default and custom patterns (LTZ and NTZ); boundary values (nanosWithinMicro 0 and 999, pre-epoch instants, the 0001/1582/1970/9999 corpus); truncation rule; NTZ time-zone rejection; and LEGACY-mode rejection.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk added 3 commits June 3, 2026 11:12
…g and formatting TimestampNanosVal

### What changes were proposed in this pull request?

Extend the `TimestampFormatter` family with additive, nanosecond-aware parse and
format methods that produce and consume `org.apache.spark.unsafe.types.TimestampNanosVal`
(`epochMicros: Long` + `nanosWithinMicro: Short` in `[0, 999]`) at a target fractional
precision `p` in `[7, 9]`:

- New trait methods: `parseNanos` / `parseNanosOptional` (LTZ),
  `parseWithoutTimeZoneNanos` / `parseWithoutTimeZoneNanosOptional` (NTZ, plus a `final`
  `allowTimeZone = true` overload), and `formatNanos`.
- `Iso8601TimestampFormatter`: `extractNanos` / `extractNanosNTZ` build the
  `Instant` / `LocalDateTime` and delegate to `SparkDateTimeUtils.instantToTimestampNanos`
  / `localDateTimeToTimestampNanos`; `formatNanos` floors sub-`precision` digits and renders
  the reconstructed instant.
- `DefaultTimestampFormatter`: delegates to the SPARK-57032 nanos entry points.
- `LegacyFastTimestampFormatter` / `LegacySimpleTimestampFormatter`: explicitly reject
  nanosecond precision under the `LEGACY` time parser policy (they cap at micro resolution).

Sub-precision fractional digits are truncated (floored), consistent with SPARK-57032.
All existing microsecond methods are unchanged (additive API).

### Why are the changes needed?

Today `TimestampFormatter` is microsecond-only and discards the 7th-9th fractional digits.
The JSON and CSV datasources drive all timestamp parsing/formatting through
`TimestampFormatter`, so they cannot round-trip 7-9 digit fractions until the formatter is
nanos-aware. This is the foundational unblocker for nanosecond support in those datasources
(parent: SPARK-56822).

### Does this PR introduce _any_ user-facing change?

No. The new formatter API is additive and gated for use behind
`spark.sql.timestampNanosTypes.enabled` by its callers.

### How was this patch tested?

New cases in `TimestampFormatterSuite`: parse/format round-trip for `p` in `[7, 9]` across
ISO default and custom patterns (LTZ and NTZ); boundary values (`nanosWithinMicro` 0 and 999,
pre-epoch instants, the 0001/1582/1970/9999 corpus); truncation rule; NTZ time-zone rejection;
and LEGACY-mode rejection.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)
…pFormatter

Drop the escaped leading dash from @param descriptions, capitalize the first letter,
and inline each description on the same `@param` line where it fits.
… the LEGACY parser

Replace the INTERNAL_ERROR thrown by the legacy formatters' parseNanos/formatNanos
with a user-facing UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER error.
The LEGACY time parser policy is user-configurable, so a caller may legitimately
combine it with nanosecond-precision timestamps; that should surface as a proper
unsupported-feature error suggesting CORRECTED, not an internal error.
@MaxGekk MaxGekk changed the title [WIP][SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal [SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal Jun 3, 2026
MaxGekk added 3 commits June 3, 2026 12:01
…tters to throw the user-facing error

Co-authored-by: Max Gekk <max.gekk@gmail.com>
…its no fractional digits

Co-authored-by: Max Gekk <max.gekk@gmail.com>
…tTimeZoneNanos

Co-authored-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 3, 2026

@stevomitric Could you review this PR, please.

MaxGekk added 4 commits June 3, 2026 16:05
…ormatter

Reverts the @param reflow that reformatted pre-existing methods' scaladoc,
which added unrelated diff noise. Restores the original dashed multi-line
@param style consistent with the rest of the file.
…ampFormatter

formatNanos always routes through format(Instant) and applies the formatter's
zoneId, making it LTZ-only; an NTZ value (a UTC-grid wall clock) rendered under
a non-UTC zone came out shifted. Add an NTZ counterpart
formatWithoutTimeZoneNanos that mirrors the microsecond format(LocalDateTime)
path: it rebuilds the zone-independent local date-time via
timestampNanosToLocalDateTime, floors sub-precision digits, and renders with the
pattern only (no zoneId). The legacy formatters reject it with the same
UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER error, and
formatNanos is documented as LTZ-only. Adds a +01:00 regression test that the
prior all-UTC NTZ cases could not catch.
…nverters

instantToTimestampNanos / localDateTimeToTimestampNanos truncated sub-micro
digits via truncateNanosWithinMicroToPrecision, which silently passed through
any precision outside [7, 9]. The DefaultTimestampFormatter parse path already
rejects out-of-range precision (the string helpers throw), but the Iso8601 path
did not, so the contract was enforced inconsistently across subclasses.

Make the shared truncation helper raise SparkException.internalError for
precision outside [7, 9] so both the NTZ (LocalDateTime) and LTZ (Instant)
converters - and thus both formatter subclasses, in parse and format directions
- enforce the bound the same way. The check is an internal invariant, not a
user-facing validation: precision always originates from a validated
TimestampNTZNanosType / TimestampLTZNanosType (constructible only with p in
[7, 9]), so it is unreachable from public APIs.
Copy link
Copy Markdown
Contributor

@stevomitric stevomitric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for resolving comments.

@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 3, 2026

@uros-b Could you review this PR, please. We need this to support casting timestamps to nanos, in literals, text datasource, hive results, in some expressions and so on.

Comment on lines +179 to +187
/**
* Optional counterpart of [[parseNanos]]. The result is `None` on invalid input.
*/
def parseNanosOptional(s: String, precision: Int): Option[TimestampNanosVal] =
try {
Some(parseNanos(s, precision))
} catch {
case _: Exception => None
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will swallow the user-facing error on legacy formatters. Do we really want silent nulls instead of errors in such scenarios?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - this is a real inconsistency. The legacy formatters override the strict parseNanos / parseWithoutTimeZoneNanos to throw the user-facing UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER, but the *Optional variants inherited the trait default, which catches case _: Exception => None. Since legacyNanosUnsupported() is a (non-fatal) SparkUnsupportedOperationException, the optional path swallowed it and returned None - so under the LEGACY policy the strict path errored while the optional path silently yielded null. That conflates genuinely invalid input (legitimately None) with an unsupported config combination (LEGACY + nanos), which should surface loudly so the user knows to switch to CORRECTED.

Rather than re-add a throwing override to every formatter and rely on remembering it for future ones, I removed the defaults entirely: parseNanosOptional and parseWithoutTimeZoneNanosOptional are now abstract in the (sealed) TimestampFormatter trait, so every implementer must consciously decide. The legacy formatters implement them by throwing legacyNanosUnsupported(); the Default / Iso8601 formatters already implement them with the None-on-invalid-input behavior (their parseNanos doesn't throw the unsupported error, so that's correct there). Because the trait is sealed, the compiler guarantees completeness file-locally with no external burden. Extended the LEGACY-policy test to assert both optional variants raise the error instead of returning None.

Copy link
Copy Markdown
Member

@uros-b uros-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment, otherwise LGTM!

…acy formatters surface the unsupported error

The legacy formatters override the strict `parseNanos` / `parseWithoutTimeZoneNanos`
to throw the user-facing UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER
error, but inherited the trait default for `parseNanosOptional` /
`parseWithoutTimeZoneNanosOptional`, which catches `case _: Exception => None`. Since
`legacyNanosUnsupported()` is a non-fatal SparkUnsupportedOperationException, the
optional path swallowed it and returned None - so under the LEGACY time parser policy
the strict path errored while the optional path silently yielded null, conflating
invalid input with an unsupported config combination.

Remove the swallowing defaults: make `parseNanosOptional` and
`parseWithoutTimeZoneNanosOptional` abstract in the sealed `TimestampFormatter`
trait so every implementer must decide explicitly. The legacy formatters implement
them by throwing `legacyNanosUnsupported()`; the Default/Iso8601 formatters already
implement the None-on-invalid-input behavior. Extend the LEGACY-policy test to assert
both optional variants raise the error instead of returning None.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants