Commit d40c0bb

Add section on nullability of inner buffers
1 parent 7520684 commit d40c0bb

1 file changed

+12
-2
lines changed

docs/source/format/CanonicalExtensions.rst

Lines changed: 12 additions & 2 deletions
@@ -555,9 +555,9 @@ This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WIT
555555

556556
* The storage type of the extension is a ``Struct`` with 2 fields, in order:
557557

558-
* ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
558+
* ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
559559

560-
* ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00).
560+
* ``offset_minutes``: a preferably non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00).
561561

562562
* Extension type parameters:
563563

@@ -571,6 +571,16 @@ This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WIT
571571

572572
It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded with a preferred (*but not required*) index type of ``int8``, or run-end-encoded with a preferred (*but not required*) runs type of ``int8``.
573573

574+
.. note::
575+
576+
It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be nullable, although this is not preferred.
577+
578+
If ``timestamp`` is nullable and a value is found to be null, then the whole ``TimestampWithOffset`` value should be interpreted as null. One way of achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the top-level struct validity buffer (V2) with the result of ``V1 AND V2``.
579+
580+
If ``offset_minutes`` is nullable and a value is found to be null, then this value should be interpreted as if the offset value were zero.
581+
582+
It is *recommended* that implementations normalize this type's representation by dropping the inner validity buffers and applying the aforementioned transformations, only keeping the top-level struct validity buffer.
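The normalization described above can be sketched in plain Python. This is only an illustration under stated assumptions: lists of booleans stand in for Arrow validity bitmaps, ``None`` stands in for an absent validity buffer, and the function name ``normalize_validity`` is hypothetical rather than part of any Arrow API.

```python
from typing import List, Optional, Sequence, Tuple


def normalize_validity(
    struct_valid: Sequence[bool],
    timestamp_valid: Optional[Sequence[bool]],
    offset_minutes: Sequence[int],
    offset_valid: Optional[Sequence[bool]],
) -> Tuple[List[bool], List[int]]:
    """Hypothetical helper: fold inner validity into the struct validity.

    Returns the new top-level validity (V1 AND V2) and the offsets with
    null entries replaced by zero. After this step the inner fields no
    longer need validity buffers of their own.
    """
    n = len(struct_valid)
    # Assumption of this sketch: a missing validity buffer means "all valid".
    v1 = list(timestamp_valid) if timestamp_valid is not None else [True] * n
    off_ok = list(offset_valid) if offset_valid is not None else [True] * n

    # A null timestamp makes the whole TimestampWithOffset value null.
    new_struct_valid = [a and b for a, b in zip(v1, struct_valid)]
    # A null offset is interpreted as an offset of zero minutes.
    new_offsets = [off if ok else 0 for off, ok in zip(offset_minutes, off_ok)]
    return new_struct_valid, new_offsets
```

A real implementation would operate on the packed bitmaps directly (a bitwise AND over the two validity buffers) instead of materializing Python lists.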
583+
574584
.. note::
575585

576586
Although not required, it is *recommended* that implementations represent this type as an RFC3339 string when de/serializing to/from JSON, respecting the ``TimeUnit`` precision and time zone offset without loss of information. For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07.
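As a rough illustration of the recommended JSON representation, the following sketch formats one value as an RFC3339 string using only the Python standard library. The function name ``to_rfc3339`` and its parameters are hypothetical; it assumes the timestamp is an integer count of ``unit`` since the Unix epoch in UTC, and it uses integer arithmetic so that nanosecond precision is not lost.

```python
from datetime import datetime, timedelta

# Number of fractional-second digits for each Arrow TimeUnit.
_FRACTION_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}


def to_rfc3339(value: int, unit: str, offset_minutes: int) -> str:
    """Hypothetical helper: render a (timestamp, offset_minutes) pair."""
    digits = _FRACTION_DIGITS[unit]
    # Floor division keeps the fractional part non-negative,
    # including for timestamps before the epoch.
    seconds, fraction = divmod(value, 10 ** digits)

    # Shift the UTC instant to the local wall-clock time of the offset.
    local = datetime(1970, 1, 1) + timedelta(
        seconds=seconds, minutes=offset_minutes
    )
    text = local.isoformat(timespec="seconds")
    if digits:
        text += f".{fraction:0{digits}d}"

    if offset_minutes == 0:
        return text + "Z"
    sign = "+" if offset_minutes > 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{text}{sign}{hours:02d}:{minutes:02d}"
```

Under these assumptions, ``to_rfc3339(1735689600, "s", 0)`` yields ``2025-01-01T00:00:00Z``, and ``to_rfc3339(1735714800000000001, "ns", -420)`` yields ``2025-01-01T00:00:00.000000001-07:00``, matching the two examples in the note.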
