Feature: Add Polars DataFrame support #1133

@lmeyerov

Depends on: #1132 (the two can land in one PR)
Original request: #1124

Summary

Add support for polars.DataFrame and polars.LazyFrame as inputs to edges(), nodes(), plot(), hypergraph(), materialize_nodes(), and get_degrees(). Polars stays optional: behavior is unchanged when it is not installed.

Current behavior

import polars as pl
import graphistry

edges = pl.DataFrame({'src': ['a', 'b'], 'dst': ['b', 'c']})
graphistry.edges(edges, 'src', 'dst').plot()

Requested behavior

import polars as pl
import graphistry

edges = pl.DataFrame({'src': ['a', 'b', 'c'], 'dst': ['b', 'c', 'a'], 'weight': [1, 2, 3]})
nodes = pl.DataFrame({'id': ['a', 'b', 'c'], 'label': ['Alice', 'Bob', 'Carol']})

g = graphistry.edges(edges, 'src', 'dst').nodes(nodes, 'id')
g.plot()               # ✅
g.materialize_nodes()  # ✅
g.get_degrees()        # ✅
g.hypergraph(edges)    # ✅


# LazyFrame input is accepted as well
graphistry.edges(edges.lazy(), 'src', 'dst').plot()  # ✅

What's not in scope

Results of compute methods stay pandas-backed, not Polars. Convert back if needed:

pl.from_pandas(g.materialize_nodes()._nodes)

featurize() and umap() are out of scope and belong in a separate issue.

Workaround until fixed

graphistry.edges(pl_df.to_pandas(), 'src', 'dst').plot()
graphistry.edges(pl_lazy.collect().to_pandas(), 'src', 'dst').plot()

Design

Polars is an input format, not a compute engine. Like Arrow and Spark inputs, it is converted at path boundaries rather than staying native throughout. There are two conversion paths, and each needs a Polars branch:

Upload path (plot()): Polars → Arrow via .to_arrow(). No pandas intermediate. Efficient and lossless. LazyFrame is materialized first via .collect(), the same way dask uses .compute().

Compute/hypergraph paths (materialize_nodes, hypergraph, etc.): Polars → pandas via .to_pandas(). These paths operate on live pandas/cuDF engines. Once _table_to_pandas() handles Polars, the coerce-at-entry fixes from #1132 cover these paths automatically, so no extra Polars-specific code is needed there.

Implementation

All changes are in graphistry/PlotterBase.py unless noted. Follow the existing maybe_cudf() / cudf branch pattern exactly.

1. maybe_polars() lazy import — add after maybe_spark() (~line 142):

@lru_cache(maxsize=1)
def maybe_polars():
    try:
        import polars
        return polars
    except ImportError:
        pass
    except RuntimeError:
        logger.warning('Runtime error importing polars', exc_info=True)
    return None

2. Memoization cache — add with the other caches (~line 166):

_polars_hash_to_arrow: WeakValueDictionary = WeakValueDictionary()

Also clear it in reset_caches(), alongside the other caches.

3. _table_to_arrow() — Polars branch — add before the final raise (~line 2987):

if not (maybe_polars() is None) and isinstance(table, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
    if isinstance(table, maybe_polars().LazyFrame):
        table = table.collect()
    hashed = None
    if memoize:
        hashed = (
            hashlib.sha256(table.hash_rows().to_numpy().tobytes()).hexdigest()
            + hashlib.sha256(str(table.columns).encode('utf-8')).hexdigest()
        )
        if hashed in PlotterBase._polars_hash_to_arrow:
            return PlotterBase._polars_hash_to_arrow[hashed].v
    # strip schema metadata: Polars attaches polars-specific metadata that can cause
    # downstream issues, same reason the pandas branch calls replace_schema_metadata({})
    out = table.to_arrow().replace_schema_metadata({})
    if memoize and hashed is not None:
        w = WeakValueWrapper(out)
        cache_coercion(hashed, w)
        PlotterBase._polars_hash_to_arrow[hashed] = w
    return out
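The memoization key used in the branch above can be pulled out into a small helper to make its construction easier to see and test. The name polars_cache_key is hypothetical. The function assumes table is an already-collected polars.DataFrame and uses only the calls that appear in the snippet above.

```python
import hashlib


def polars_cache_key(table):
    """Build a content-based key: digest of row hashes plus digest of column names.

    `table` is assumed to be a collected polars.DataFrame.
    """
    # hash_rows() yields one hash per row; hash the raw bytes of that column
    row_digest = hashlib.sha256(
        table.hash_rows().to_numpy().tobytes()
    ).hexdigest()
    # column names are hashed separately so a rename changes the key
    col_digest = hashlib.sha256(
        str(table.columns).encode("utf-8")
    ).hexdigest()
    return row_digest + col_digest
```

Two frames with the same rows and column names map to the same key, so the second call can reuse the cached Arrow table. To my understanding, Polars only promises that hash_rows() is stable within a single Polars version. That is enough for an in-process cache, but the key should not be persisted.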

4. _table_to_pandas() — Polars branch — add before the final raise (~line 2834):

if not (maybe_polars() is None) and isinstance(table, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
    if isinstance(table, maybe_polars().LazyFrame):
        table = table.collect()
    return table.to_pandas()

5. _plot_dispatch() type guard — extend the isinstance chain (~line 2700):

or ( not (maybe_polars() is None) and isinstance(graph, (maybe_polars().DataFrame, maybe_polars().LazyFrame)) )

6. graphistry/Engine.py: resolve_engine() — add an explicit Polars → PANDAS mapping before the fallthrough (~line 70), matching the Arrow fix from #1132:

if not (maybe_polars() is None) and isinstance(g_or_df, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
    return Engine.PANDAS

(The coerce-at-entry in materialize_nodes and hypergraph from #1132 then handles the actual conversion.)

Testing

New file tests/test_polars.py with pytest.importorskip('polars') at the top:

  • _table_to_arrow(pl.DataFrame(...)) → returns pa.Table, schema metadata is empty
  • _table_to_arrow(pl.DataFrame(...).lazy()) → same
  • _table_to_pandas(pl.DataFrame(...)) → returns pd.DataFrame
  • edges(pl.DataFrame(...)).nodes(pl.DataFrame(...)).plot() → mock/skip upload, assert no error before upload step
  • edges(pl.DataFrame(...)).materialize_nodes() → returns result with pandas _nodes
  • hypergraph(pl.DataFrame(...)) → returns valid Hypergraph
  • Memoization: calling _table_to_arrow twice on the same frame returns the same object
  • Full suite passes with polars not installed (existing tests unaffected)
