Depends on: #1132 — can be one PR
Original request: #1124
Summary
Add support for polars.DataFrame and polars.LazyFrame as inputs to edges(), nodes(), plot(), hypergraph(), materialize_nodes(), and get_degrees(). Polars is optional — no behavior change if not installed.
Current behavior
import polars as pl
import graphistry
edges = pl.DataFrame({'src': ['a', 'b'], 'dst': ['b', 'c']})
graphistry.edges(edges, 'src', 'dst').plot()
Requested behavior
import polars as pl
import graphistry
edges = pl.DataFrame({'src': ['a', 'b', 'c'], 'dst': ['b', 'c', 'a'], 'weight': [1, 2, 3]})
nodes = pl.DataFrame({'id': ['a', 'b', 'c'], 'label': ['Alice', 'Bob', 'Carol']})
g = graphistry.edges(edges, 'src', 'dst').nodes(nodes, 'id')
g.plot() # ✅
g.materialize_nodes() # ✅
g.get_degrees() # ✅
g.hypergraph(edges) # ✅
graphistry.edges(edges.lazy(), 'src', 'dst').plot() # ✅
What's not in scope
The result of compute methods will be pandas-backed, not Polars. Convert back if needed:
pl.from_pandas(g.materialize_nodes()._nodes)
featurize() and umap() are out of scope — separate issue.
Workaround until fixed
graphistry.edges(pl_df.to_pandas(), 'src', 'dst').plot()
graphistry.edges(pl_lazy.collect().to_pandas(), 'src', 'dst').plot()
Design
Polars is an input format, not a compute engine. Like Arrow and Spark, it gets converted at path boundaries — it doesn't stay native throughout. There are two conversion paths, each needing a Polars branch:
Upload path (plot()): Polars → Arrow via .to_arrow(). No pandas intermediate. Efficient and lossless. LazyFrame is materialized first via .collect(), the same way dask uses .compute().
Compute/hypergraph paths (materialize_nodes, hypergraph, etc.): Polars → pandas via .to_pandas(). These paths operate on live pandas/cuDF engines. Once _table_to_pandas() handles Polars, the coerce-at-entry fixes from #1132 cover these paths automatically — no extra Polars-specific code needed there.
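The two paths above can be sketched outside of graphistry as a pair of small helpers. This is an illustrative sketch only: `to_upload_format` and `to_compute_format` are hypothetical names, not functions in PlotterBase, and the local `maybe_polars` is a simplified stand-in for the one proposed under Implementation. It assumes pyarrow and pandas are installed whenever a Polars frame is actually converted.

```python
def maybe_polars():
    """Simplified stand-in: return the polars module, or None if not installed."""
    try:
        import polars
        return polars
    except ImportError:
        return None


def _collect_if_lazy(table, pl):
    # LazyFrame must be materialized before any conversion.
    if isinstance(table, pl.LazyFrame):
        return table.collect()
    return table


def to_upload_format(table):
    """Upload path (hypothetical helper): Polars -> Arrow, no pandas intermediate."""
    pl = maybe_polars()
    if pl is not None and isinstance(table, (pl.DataFrame, pl.LazyFrame)):
        return _collect_if_lazy(table, pl).to_arrow()
    return table  # non-Polars inputs pass through untouched


def to_compute_format(table):
    """Compute path (hypothetical helper): Polars -> pandas."""
    pl = maybe_polars()
    if pl is not None and isinstance(table, (pl.DataFrame, pl.LazyFrame)):
        return _collect_if_lazy(table, pl).to_pandas()
    return table  # non-Polars inputs pass through untouched
```

Both helpers are no-ops for non-Polars inputs and when Polars is not installed, which mirrors the "no behavior change if not installed" requirement.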
Implementation
All changes are in graphistry/PlotterBase.py unless noted. Follow the existing maybe_cudf() / cudf branch pattern exactly.
1. maybe_polars() lazy import — add after maybe_spark() (~line 142):
@lru_cache(maxsize=1)
def maybe_polars():
    try:
        import polars
        return polars
    except ImportError:
        pass
    except RuntimeError:
        logger.warning('Runtime error importing polars', exc_info=True)
    return None
2. Memoization cache — add with the other caches (~line 166):
_polars_hash_to_arrow: WeakValueDictionary = WeakValueDictionary()
And clear it in reset_caches().
3. _table_to_arrow() — Polars branch — add before the final raise (~line 2987):
if not (maybe_polars() is None) and isinstance(table, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
    if isinstance(table, maybe_polars().LazyFrame):
        table = table.collect()
    hashed = None
    if memoize:
        hashed = (
            hashlib.sha256(table.hash_rows().to_numpy().tobytes()).hexdigest()
            + hashlib.sha256(str(table.columns).encode('utf-8')).hexdigest()
        )
        if hashed in PlotterBase._polars_hash_to_arrow:
            return PlotterBase._polars_hash_to_arrow[hashed].v
    # strip schema metadata: Polars attaches polars-specific metadata that can cause
    # downstream issues, same reason the pandas branch calls replace_schema_metadata({})
    out = table.to_arrow().replace_schema_metadata({})
    if memoize and hashed is not None:
        w = WeakValueWrapper(out)
        cache_coercion(hashed, w)
        PlotterBase._polars_hash_to_arrow[hashed] = w
    return out
4. _table_to_pandas() — Polars branch — add before the final raise (~line 2834):
if not (maybe_polars() is None) and isinstance(table, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
    if isinstance(table, maybe_polars().LazyFrame):
        table = table.collect()
    return table.to_pandas()
5. _plot_dispatch() type guard — extend the isinstance chain (~line 2700):
or ( not (maybe_polars() is None) and isinstance(graph, (maybe_polars().DataFrame, maybe_polars().LazyFrame)) )
6. graphistry/Engine.py: resolve_engine() — add explicit Polars → PANDAS before the fallthrough (~line 70), matching the Arrow fix from #1132:
if not (maybe_polars() is None) and isinstance(g_or_df, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
    return Engine.PANDAS
(The coerce-at-entry in materialize_nodes and hypergraph from #1132 then handles the actual conversion.)
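The memoization in steps 2 and 3 combines a content hash, a `WeakValueDictionary`, and a wrapper object. A minimal stdlib-only sketch of that pattern follows. All names here (`ValueWrapper`, `_keep_alive`, `convert_memoized`) are hypothetical stand-ins rather than the actual `WeakValueWrapper` and `cache_coercion` helpers, and the small strong-reference cache is an assumption about why a second cache is needed: without some strong reference, a weakly held entry would be dropped as soon the caller lets go of it.

```python
import hashlib
from collections import OrderedDict
from weakref import WeakValueDictionary


class ValueWrapper:
    """Weak-referenceable holder for a conversion result."""
    def __init__(self, v):
        self.v = v


# Weak map from content hash to wrapped result; an entry disappears once
# nothing else holds a strong reference to its wrapper.
_hash_to_result = WeakValueDictionary()

# Small strong-reference cache that keeps the most recent wrappers alive.
_recent = OrderedDict()
_MAX_RECENT = 2


def _keep_alive(key, wrapper):
    _recent[key] = wrapper
    _recent.move_to_end(key)
    while len(_recent) > _MAX_RECENT:
        _recent.popitem(last=False)  # evict the oldest entry


def convert_memoized(payload, columns):
    """Pretend conversion, keyed on both the data bytes and the column names."""
    key = (
        hashlib.sha256(payload).hexdigest()
        + hashlib.sha256(str(columns).encode("utf-8")).hexdigest()
    )
    hit = _hash_to_result.get(key)
    if hit is not None:
        return hit.v
    out = {"columns": list(columns), "n_bytes": len(payload)}  # stand-in result
    wrapper = ValueWrapper(out)
    _keep_alive(key, wrapper)
    _hash_to_result[key] = wrapper
    return out
```

Calling `convert_memoized` twice with the same bytes and column names returns the same object, while changing only the column names produces a separate entry, which is why the key hashes the columns as well as the row data.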
Testing
New file tests/test_polars.py with pytest.importorskip('polars') at the top:
- _table_to_arrow(pl.DataFrame(...)) → returns pa.Table, schema metadata is empty
- _table_to_arrow(pl.DataFrame(...).lazy()) → same
- _table_to_pandas(pl.DataFrame(...)) → returns pd.DataFrame
- edges(pl.DataFrame(...)).nodes(pl.DataFrame(...)).plot() → mock/skip upload, assert no error before upload step
- edges(pl.DataFrame(...)).materialize_nodes() → returns result with pandas _nodes
- hypergraph(pl.DataFrame(...)) → returns valid Hypergraph
- Memoization: calling _table_to_arrow twice on the same frame returns the same object
- Full suite passes with polars not installed (existing tests unaffected)