[protocol] Add a ColumnSchema abstraction? #253

@pitrou

Instead of having individual methods to query the DType, categorical description, null description and metadata (which I suspect might be replicated at the DataFrame level?), how about adding a first-class abstraction to tie them together? For example:

from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple, TypedDict

# DType, CategoricalDescription and ColumnNullType refer to the protocol's existing types

class ColumnSchema(TypedDict):
    # the underlying physical representation
    dtype: DType
    # if the column is categorical, describes how to interpret the contents
    categorical_encoding: Optional[CategoricalDescription]
    # if the column supports null values, describes how they are represented
    null_encoding: Optional[Tuple[ColumnNullType, Any]]
    # arbitrary metadata attached to the column, possibly empty
    metadata: Dict[str, Any]

class Column(ABC):
    ...
    @property
    @abstractmethod
    def schema(self) -> ColumnSchema: ...

(IMHO, "encoding" sounds more precise than "description")
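
To make the proposal concrete, here is a minimal runnable sketch of how a producer might expose such a schema and how a consumer might read it. The definitions of `DType`, `CategoricalDescription` and `ColumnNullType` below are simplified stand-ins written only so the example runs, and `Int64Column` and `summarize` are hypothetical names that do not exist in the spec.

```python
from abc import ABC, abstractmethod
from enum import IntEnum
from typing import Any, Dict, Optional, Tuple, TypedDict


# Stand-in for the protocol's null-kind enum (assumed here, not authoritative).
class ColumnNullType(IntEnum):
    NON_NULLABLE = 0
    USE_NAN = 1
    USE_SENTINEL = 2


# Stand-in dtype: (kind, bit width, format string, byte order).
DType = Tuple[int, int, str, str]


# Stand-in categorical description.
class CategoricalDescription(TypedDict):
    is_ordered: bool
    is_dictionary: bool
    categories: Optional[Any]


class ColumnSchema(TypedDict):
    dtype: DType
    categorical_encoding: Optional[CategoricalDescription]
    null_encoding: Optional[Tuple[ColumnNullType, Any]]
    metadata: Dict[str, Any]


class Column(ABC):
    @property
    @abstractmethod
    def schema(self) -> ColumnSchema: ...


class Int64Column(Column):
    """Hypothetical int64 column that marks nulls with the sentinel -1."""

    @property
    def schema(self) -> ColumnSchema:
        return ColumnSchema(
            dtype=(0, 64, "l", "="),  # kind 0 is an assumed code for "signed int"
            categorical_encoding=None,
            null_encoding=(ColumnNullType.USE_SENTINEL, -1),
            metadata={},
        )


def summarize(column: Column) -> str:
    """Read everything the consumer needs from a single schema object."""
    schema = column.schema
    _, bit_width, _, _ = schema["dtype"]
    nullable = schema["null_encoding"] is not None
    categorical = schema["categorical_encoding"] is not None
    return f"{bit_width}-bit, nullable={nullable}, categorical={categorical}"


print(summarize(Int64Column()))
```

The point of the sketch is that the consumer makes one call and then inspects a single value, rather than calling four separate accessors.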

I'm also not sure why the spec uses a mix of Tuples and TypedDicts. Is it an attempt at optimizing Python object footprint?
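
For reference, a rough way to check the footprint question is to compare the shallow sizes of the two container kinds. This is only a sketch: it assumes CPython, the field names `kind` and `value` are made up for illustration, and `sys.getsizeof` measures the container itself, not the objects it refers to. A `TypedDict` instance is a plain `dict` at runtime, so an ordinary dict is used for the comparison.

```python
import sys

# The same (null kind, sentinel value) pair in both representations.
null_as_tuple = (2, -1)
null_as_dict = {"kind": 2, "value": -1}  # hypothetical field names

tuple_size = sys.getsizeof(null_as_tuple)
dict_size = sys.getsizeof(null_as_dict)

print(f"tuple: {tuple_size} bytes, dict: {dict_size} bytes")
```

The exact numbers vary across interpreter versions and platforms, so this says nothing definitive about why the spec was written the way it was.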
