[SPARK-57032][SQL] Extend timestamp string parsing for nanosecond fractional precision by MaxGekk · Pull Request #56205 · apache/spark

MaxGekk · 2026-05-29T13:45:56Z

What changes were proposed in this pull request?

This PR extends Spark's existing timestamp string parser to preserve fractional-second digits beyond microsecond precision, and adds package-private parse entry points that produce the nanosecond-capable composite representation for TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) with p in [7, 9].

- SparkDateTimeUtils.parseTimestampString now retains fractional digits 7-9 in a new output-only slot segments(9) (the sub-microsecond remainder, a value in [0, 999]). segments(6) continues to hold microseconds (digits 1-6), so all existing callers are unaffected. Digits beyond the 9th are dropped. The parsing loop bound is pinned to 9 (the original number of parsed segments) so the new slot is never written by the loop, keeping acceptance behavior identical.
- New package-private APIs returning a normalized org.apache.spark.unsafe.types.TimestampNanosVal (epochMicros + nanosWithinMicro):
  - stringToTimestampLTZNanos(s, precision, timeZoneId) and stringToTimestampLTZNanosAnsi(...)
  - stringToTimestampNTZNanos(s, precision, allowTimeZone = true) and stringToTimestampNTZNanosAnsi(...)
- A private truncateNanosWithinMicro helper applies the target precision p: digits beyond p are truncated toward zero (consistent with the existing microsecond path, which already drops digits 7+). Since microseconds occupy fractional digits 1-6, p in [7, 9] only affects the sub-microsecond remainder.
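
The digit split and the per-precision truncation described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: the object `FractionSplit` and the methods `splitFraction` and `truncateToPrecision` are hypothetical names chosen for illustration, and the sketch works on the fraction digits as a plain string rather than on the parser's `segments` array.

```scala
object FractionSplit {
  /**
   * Splits the digits after the decimal point into
   * (micros from digits 1-6, sub-microsecond remainder from digits 7-9).
   * Digits beyond the 9th are dropped.
   */
  def splitFraction(fraction: String): (Int, Int) = {
    require(fraction.forall(c => c >= '0' && c <= '9'), s"not a digit string: $fraction")
    // Right-pad short fractions with zeros, then keep exactly 9 digits.
    val nine = fraction.padTo(9, '0').take(9)
    val micros = nine.substring(0, 6).toInt        // fractional digits 1-6
    val nanosWithinMicro = nine.substring(6).toInt // fractional digits 7-9, in [0, 999]
    (micros, nanosWithinMicro)
  }

  /** Truncates the remainder toward zero for a target precision p in [7, 9]. */
  def truncateToPrecision(nanosWithinMicro: Int, p: Int): Int = p match {
    case 7 => nanosWithinMicro / 100 * 100 // keep digit 7 only
    case 8 => nanosWithinMicro / 10 * 10   // keep digits 7-8
    case 9 => nanosWithinMicro             // keep digits 7-9
    case _ => throw new IllegalArgumentException(s"precision out of range: $p")
  }
}
```

Because the microsecond part comes only from digits 1-6, changing p within [7, 9] never alters it in this sketch; only the remainder changes.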

The normalization invariant (nanosWithinMicro in [0, 999]) holds for free: the remainder is parsed as exactly the 3 sub-micro digits and epochMicros comes from the independent microsecond path, so no carry is needed; TimestampNanosVal.fromParts re-validates the range.
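
To make the invariant concrete, here is a small sketch of a composite value with a range check. `NanosPair` is a hypothetical stand-in, not the real `TimestampNanosVal`, and `toEpochNanos` is an illustrative helper that the PR does not mention. It shows that a pre-epoch timestamp keeps a non-negative remainder because epochMicros is floored.

```scala
/** Hypothetical stand-in for a normalized (epochMicros, nanosWithinMicro) pair. */
final case class NanosPair(epochMicros: Long, nanosWithinMicro: Short) {
  require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
    s"nanosWithinMicro out of range: $nanosWithinMicro")

  /** Recombines the pair into nanoseconds since the epoch (overflow-checked). */
  def toEpochNanos: Long =
    Math.addExact(Math.multiplyExact(epochMicros, 1000L), nanosWithinMicro.toLong)
}

object NanosPair {
  /** Validates the remainder as an Int before narrowing it to Short. */
  def fromParts(epochMicros: Long, nanosWithinMicro: Int): NanosPair = {
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro out of range: $nanosWithinMicro")
    NanosPair(epochMicros, nanosWithinMicro.toShort)
  }
}
```

For example, one nanosecond before the epoch is represented as epochMicros = -1 with nanosWithinMicro = 999, so no carry between the two parts is required.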

Why are the changes needed?

The logical types TimestampNTZNanosType / TimestampLTZNanosType, the physical value TimestampNanosVal, and the TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) SQL syntax already exist, but string inputs with 7-9 fractional digits could not be converted to the SPIP composite representation because the parser truncated the fractional part to microseconds. This change provides the missing string-to-nanos parsing building block that downstream work (cast matrix, typed SQL literals, ingest tests) depends on.

Does this PR introduce any user-facing change?

No. Existing TimestampType / TimestampNTZType string parsing is byte-for-byte unchanged, and the new parse APIs are package-private and not yet wired to user-facing casts or literals.

How was this patch tested?

Added TimestampNanosParseSuite (in sql/catalyst) covering:

- 7/8/9-digit fractions preserved as nanosWithinMicro;
- per-precision truncation (e.g. .123456789 -> 700 at p=7, 780 at p=8, 789 at p=9), and digits beyond the 9th dropped;
- edge cases: .0, .999999999, trailing zeros, exactly 6 digits, .000000001;
- NTZ vs LTZ: explicit zone offset, region-based zone, session-zone fallback, and allowTimeZone / time-only rejection for NTZ;
- range corpus: Unix epoch, 1582 Julian/Gregorian cutover, year 9999, with sub-micro fractions;
- ANSI variants throwing on invalid input.

Verified existing DateTimeUtilsSuite (including "nanoseconds truncation") and TimestampFormatterSuite still pass unchanged.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…ctional precision

### What changes were proposed in this pull request?

Extend `SparkDateTimeUtils.parseTimestampString` to preserve fractional-second digits 7-9 in a new output-only slot `segments(9)` (sub-microsecond remainder in [0, 999]), while keeping `segments(6)` as microseconds so all existing callers are unaffected.

Add package-private parse entry points that return a normalized `TimestampNanosVal` for `TIMESTAMP_NTZ(p)`/`TIMESTAMP_LTZ(p)` with `p` in [7, 9]: `stringToTimestampNTZNanos`, `stringToTimestampLTZNanos`, and their ANSI variants. Fractional digits beyond the target precision `p` are truncated toward zero, consistent with the existing microsecond parsing behavior.

### Why are the changes needed?

This is the first sub-task of the nanosecond datetime conversion utilities under SPARK-56822 (SPIP: Timestamps with nanosecond precision). Without it, timestamp strings with 7-9 fractional digits cannot be converted to the nanosecond-capable composite representation (epochMicros + nanosWithinMicro).

### Does this PR introduce any user-facing change?

No. Existing `TimestampType`/`TimestampNTZType` string parsing is unchanged; the new parse APIs are package-private and not yet wired to user-facing casts.

### How was this patch tested?

Added `TimestampNanosParseSuite` covering 7/8/9-digit fractions, per-precision truncation, NTZ/LTZ, zone suffixes, range edge cases, and ANSI errors. Verified existing `DateTimeUtilsSuite` and `TimestampFormatterSuite` still pass.

- Fix stale `isValidDigits` comment (digits 7-9 are now retained, not truncated)
- Clarify segments(7-8) comment: values are written by loop as `i` advances but never read by any caller
- Extend format-string examples in `parseTimestampString` Scaladoc to show the optional [ns][ns][ns] digits
- Add precision guard (throws SparkException.internalError) before the try/catch in stringToTimestampLTZNanos and stringToTimestampNTZNanos, and explicit case 9 + error fallback in truncateNanosWithinMicro
- Add Scaladoc to stringToTimestampNTZNanosAnsi noting that allowTimeZone defaults to true (TZ suffix is discarded, not rejected)
- New tests: null input, time-only LTZ, pre-epoch negative timestamps, out-of-range precision (checkError / INTERNAL_ERROR), ANSI NTZ TZ-discard

Co-authored-by: Isaac

MaxGekk · 2026-05-30T05:21:45Z

@davidm-db @dejankrak-db @stevomitric Could you review this PR, please.

dejankrak-db · 2026-05-31T22:39:40Z

Architecture / Simplicity

1. (High) New parse APIs duplicate the existing instantToTimestampNanos / localDateTimeToTimestampNanos path. sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
(stringToTimestampLTZNanos/stringToTimestampNTZNanos). Master already has these helpers (from the preceding SPARK-57033 work) that take a java.time value and produce the (epochMicros, nanosWithinMicro) pair with precision
truncation; the new LTZ body is a near-verbatim copy of stringToTimestamp plus one line, manually re-deriving epochMicros. Carry nanos in the java.time value (see #2) and delegate to the existing helpers, deleting the hand-rolled
reconstruction.

2. (High) Sub-microsecond precision is threaded through an output-only segments(9) side-channel rather than carried in the parsed LocalTime/Instant. SparkDateTimeUtils.scala (array growth + slot-9 population; consumed in
the new APIs). The code builds LocalTime with only micros and smuggles digits 7-9 around the java.time value, reattaching afterward — yet LocalTime/Instant support nanos natively and instantToTimestampNanos already extracts
instant.getNano % NANOS_PER_MICROS. This side-channel is the root cause of #1 and forces the fragile "slot 9 written outside the loop, slots 7-8 written-but-never-read" invariant. Retain full fractional nanos, construct a nanosecond
LocalTime, and delegate.

3. (Medium) New private truncateNanosWithinMicro duplicates the existing truncateNanosWithinMicroToPrecision. SparkDateTimeUtils.scala. Same arithmetic; differs only by throwing on out-of-range (a dead arm — callers already
validate p ∈ [7,9]) and returning Short. Delete it and call truncateNanosWithinMicroToPrecision(...).toShort, or remove entirely via #1/#2.
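
As a rough illustration of the direction suggested in #1 and #2, the sketch below keeps the full nanosecond fraction in a java.time `Instant` and derives the pair from it. It is not the existing `instantToTimestampNanos` helper, whose exact signature is not shown in this review; `InstantToNanosPair` and `toPair` are hypothetical names, and only standard java.time calls are used.

```scala
import java.time.Instant

object InstantToNanosPair {
  private val NanosPerMicro = 1000L
  private val MicrosPerSecond = 1000000L

  /** Returns (epochMicros, nanosWithinMicro truncated to precision p in [7, 9]). */
  def toPair(instant: Instant, precision: Int): (Long, Int) = {
    val step = precision match {
      case 7 => 100
      case 8 => 10
      case 9 => 1
      case _ => throw new IllegalArgumentException(s"precision out of range: $precision")
    }
    // Instant.getNano is always in [0, 999999999], also before the epoch,
    // so the remainder is non-negative and no carry is needed.
    val nanoOfSecond = instant.getNano
    val epochMicros = Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, MicrosPerSecond),
      nanoOfSecond / NanosPerMicro)
    val remainder = (nanoOfSecond % NanosPerMicro).toInt
    (epochMicros, remainder / step * step)
  }
}
```

With this shape the sub-microsecond digits travel inside the java.time value, so a separate output slot in the parser would not be needed to carry them.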

Low-level

4. (Medium) No test pins the unchanged micro path through the edited shared parseTimestampString. sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampNanosParseSuite.scala. The highest-blast-radius change
(loop bound + segment array) affects every TimestampType/TIMESTAMP_NTZ/TIME cast, but the new suite exercises only the new nanos APIs. DateTimeUtilsSuite already covers the >6-digit micro-truncation case in a sibling suite (so
not blocking), but the central back-compat claim deserves a co-located regression assertion calling stringToTimestamp/stringToTimestampWithoutTimeZone on .123456789-style inputs.

5. (Low) ANSI NTZ variant hardcodes allowTimeZone = true (silently discards a zone), with no strict-reject ANSI path. SparkDateTimeUtils.scala (stringToTimestampNTZNanosAnsi). Asymmetric vs. existing strict NTZ casts, but
documented in scaladoc and tested, and the APIs aren't wired to user-facing casts yet — acceptable for now. Leave a // TODO(SPARK-57032 wiring) noting the ANSI cast must decide allowTimeZone explicitly when wired.
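
For #4, the expected values of such a regression assertion can be computed independently of the parser. The sketch below uses java.time only as a reference for what the microsecond path should return for a UTC wall-clock string; `MicroPathExpectation` and `expectedEpochMicros` are hypothetical names, and the actual assertion against stringToTimestamp / stringToTimestampWithoutTimeZone would live in the Spark test suite.

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

object MicroPathExpectation {
  private val Epoch = LocalDateTime.of(1970, 1, 1, 0, 0, 0)

  /**
   * Microseconds since the epoch for an ISO local date-time string,
   * with fractional digits 7+ truncated (interpreting the input as UTC).
   */
  def expectedEpochMicros(isoLocal: String): Long = {
    val truncated = LocalDateTime.parse(isoLocal).truncatedTo(ChronoUnit.MICROS)
    ChronoUnit.MICROS.between(Epoch, truncated)
  }
}
```

A co-located test could then check that a 9-digit input and its 6-digit prefix map to the same microsecond value on the unchanged path.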

MaxGekk added 2 commits May 29, 2026 15:39

Fix coding style

95f7e9e

MaxGekk changed the title ~~[SPARK-57032][SQL] Extend timestamp string parsing for nanosecond fractional precision~~ [WIP][SPARK-57032][SQL] Extend timestamp string parsing for nanosecond fractional precision May 29, 2026

MaxGekk added 2 commits May 29, 2026 16:16

Fix scalastyle violations in nanos string parsing code

a3aafc1

Co-authored-by: Isaac

MaxGekk changed the title ~~[WIP][SPARK-57032][SQL] Extend timestamp string parsing for nanosecond fractional precision~~ [SPARK-57032][SQL] Extend timestamp string parsing for nanosecond fractional precision May 30, 2026

[SPARK-57032][SQL] Extend timestamp string parsing for nanosecond fractional precision #56205
MaxGekk wants to merge 4 commits into apache:master from MaxGekk:nanos-parse-string

2 participants
