feat: add lance-c crate — C/C++ bindings for Lance (Phase 1) #6254

Open
jja725 wants to merge 3 commits into lance-format:main from jja725:feat/lance-c-phase1

@jja725 jja725 commented Mar 23, 2026

Summary

Implements Phase 1 (Core Read Path MVP) of the Lance C/C++ library RFC. This enables native integration with C++ query engines (Velox, DuckDB) and any language with C FFI capabilities.

C API surface

  • Dataset: open/close/version/count_rows/latest_version/schema/take (all sync/blocking)
  • Scanner: builder pattern with column projection, SQL filters, limit/offset, batch size
  • Sync scan: ArrowArrayStream export, blocking batch iteration (lance_scanner_next)
  • Async scan: callback-based (lance_scanner_scan_async) for Presto/Trino-style engines
  • Poll scan: waker-based (lance_scanner_poll_next) for cooperative async runtimes (Velox/folly)
  • Arrow C Data Interface for all data exchange (zero-copy)
  • Thread-local error handling (proven pattern from lance-duckdb)
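
The thread-local error pattern named in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the lance-c implementation: `demo_parse_limit` and `demo_last_error` are hypothetical names, and the status-code convention (0 for success, -1 for failure) is an assumption. A real export would also carry a `no_mangle` attribute, omitted here to keep the sketch edition-agnostic.

```rust
use std::cell::RefCell;
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

thread_local! {
    // Each thread keeps its own most recent error, so concurrent callers
    // never overwrite each other's messages.
    static LAST_ERROR: RefCell<Option<CString>> = RefCell::new(None);
}

/// Record an error message for the calling thread (hypothetical helper).
fn set_last_error(msg: &str) {
    // Interior NUL bytes cannot appear in a C string, so replace them first.
    let c = CString::new(msg.replace('\0', " ")).expect("NUL bytes were replaced");
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(c));
}

/// Hypothetical C entry point: parses a decimal limit from a C string.
/// Returns 0 on success and -1 on failure (assumed convention).
pub extern "C" fn demo_parse_limit(text: *const c_char, out: *mut i64) -> i32 {
    if text.is_null() || out.is_null() {
        set_last_error("null pointer argument");
        return -1;
    }
    let s = unsafe { CStr::from_ptr(text) }.to_string_lossy();
    match s.trim().parse::<i64>() {
        Ok(v) => {
            unsafe { *out = v };
            0
        }
        Err(e) => {
            set_last_error(&format!("invalid limit '{}': {}", s, e));
            -1
        }
    }
}

/// Hypothetical accessor: returns the calling thread's last error, or null
/// if none was recorded. The pointer stays valid until the next error is
/// recorded on the same thread.
pub extern "C" fn demo_last_error() -> *const c_char {
    LAST_ERROR.with(|slot| match &*slot.borrow() {
        Some(c) => c.as_ptr(),
        None => ptr::null(),
    })
}
```

A C caller would check the status code and, on failure, read the message from the same thread that made the failing call. In this sketch a later success does not clear the stored message, which is a design choice rather than a requirement of the pattern.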

Remaining phases

  • Phase 2: Vector search & indexing (nearest, FTS, index create/drop)
  • Phase 3: Write path & mutations (append, delete, update, merge-insert, schema evolution)
  • Phase 4: Advanced features (fragment-level access, compaction, statistics, cloud storage, packaging)

Test plan

  • cargo check -p lance-c compiles
  • cargo test -p lance-c — 13 tests pass (open/close, scan, filter, projection, limit/offset, take, error handling, async scan, ArrowArrayStream export)
  • cargo clippy -p lance-c --tests -- -D warnings — clean
  • cargo fmt -p lance-c -- --check — clean

Refs: #6035

Implements Phase 1 (Core Read Path MVP) of the Lance C/C++ library RFC.
This enables native integration with C++ query engines (Velox, DuckDB)
and any language with C FFI capabilities.

C API surface:
- Dataset: open/close/version/count_rows/latest_version/schema/take
- Scanner: builder pattern with column projection, SQL filters, limit/offset
- Sync scan: ArrowArrayStream export, blocking batch iteration
- Async scan: callback-based (for Presto/Trino), poll+waker (for Velox/folly)
- Arrow C Data Interface for all data exchange (zero-copy)
- Thread-local error handling (proven pattern from lance-duckdb)
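
The callback-based async style listed above can be illustrated with a small, self-contained sketch. It is not the signature of `lance_scanner_scan_async`, which this page does not show: `DemoScanCallback`, `demo_scan_async`, and the use of a row count in place of an Arrow batch are all assumptions made for illustration.

```rust
use std::os::raw::c_void;
use std::thread;

/// Hypothetical completion callback: receives the caller's context pointer,
/// a status code (0 = ok), and a row count standing in for a real batch.
pub type DemoScanCallback = extern "C" fn(user_data: *mut c_void, status: i32, rows: u64);

/// Wrapper that lets the opaque context pointer move into the worker thread.
/// The caller promises the pointee stays valid, and is safe to touch from
/// another thread, until the callback has fired.
struct SendPtr(*mut c_void);
unsafe impl Send for SendPtr {}

/// Starts a pretend scan on a background thread and returns immediately.
/// The JoinHandle is returned only so the sketch can be exercised
/// deterministically; a C API would more likely return a status code.
pub fn demo_scan_async(
    batch_sizes: Vec<u64>,
    callback: DemoScanCallback,
    user_data: *mut c_void,
) -> thread::JoinHandle<()> {
    let ctx = SendPtr(user_data);
    thread::spawn(move || {
        // Rebind so the closure captures the whole Send wrapper rather than
        // only its raw-pointer field.
        let ctx = ctx;
        let total: u64 = batch_sizes.iter().sum();
        // Report completion on the worker thread, not the caller's thread.
        callback(ctx.0, 0, total);
    })
}
```

The key property for engine integrations is that the callback runs on a different thread from the one that started the scan, so whatever `user_data` points to must tolerate that.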

Includes:
- lance.h: C header with full API
- lance.hpp: header-only C++ RAII wrappers (Dataset, Scanner, Batch)
- 13 integration tests covering all API paths
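
RAII wrappers such as those described for lance.hpp usually sit on top of a paired open/close C API over an opaque handle. The Rust side of that ownership handoff might look like the sketch below. All names (`DemoDataset`, `demo_dataset_open`, `demo_dataset_count_rows`, `demo_dataset_close`) are hypothetical, and the live-handle counter exists only to make the freeing visible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of live handles, tracked only so the sketch can show that
/// close really frees the object.
pub static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a real dataset; the field is illustrative only.
pub struct DemoDataset {
    rows: u64,
}

impl Drop for DemoDataset {
    fn drop(&mut self) {
        LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical open: heap-allocates the object and hands ownership to the
/// caller as an opaque pointer.
pub extern "C" fn demo_dataset_open(rows: u64) -> *mut DemoDataset {
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(DemoDataset { rows }))
}

/// Hypothetical accessor: borrows the handle and returns -1 for null.
pub extern "C" fn demo_dataset_count_rows(ds: *const DemoDataset) -> i64 {
    match unsafe { ds.as_ref() } {
        Some(d) => d.rows as i64,
        None => -1,
    }
}

/// Hypothetical close: takes ownership back and drops it. Null is a no-op.
pub extern "C" fn demo_dataset_close(ds: *mut DemoDataset) {
    if !ds.is_null() {
        drop(unsafe { Box::from_raw(ds) });
    }
}
```

A C++ wrapper would call the open function in its constructor and the close function in its destructor, which is why each handle must be closed exactly once.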

Refs: lance-format#6035
@github-actions github-actions bot added the enhancement New feature or request label Mar 23, 2026
jja725 added 2 commits March 22, 2026 23:12
Covers: schema field types, latest_version, batch_size control,
combined filter+projection+limit, take with projection, multiple
scanners on same dataset, open specific version, invalid filter/column
errors, comprehensive NULL safety, error message lifecycle, large
dataset scan (10k rows), equality filter verification, limit-only,
offset-only, take empty indices, take value verification, async scan
with filter, poll-based iteration, scan data values, reopen dataset,
large dataset schema.

Total: 35 tests (34 active + 1 ignored poll test).

Adds real C and C++ programs that compile against lance.h/lance.hpp
and run end-to-end with a Lance dataset:
- tests/cpp/test_c_api.c: C11 program testing open, scan, limit, errors
- tests/cpp/test_cpp_api.cpp: C++17 program testing RAII wrappers,
  fluent scanner, take, move semantics, exception handling

Also adds tests using checked-in historical test datasets:
- test_historical_dataset_v0_27_1: reads test_data/v0.27.1/pq_in_schema
- test_historical_dataset_open_specific_version: opens version 1 and 2

Run C/C++ tests with: cargo test -p lance-c -- --ignored

Total: 39 tests (36 active + 3 ignored).
Collaborator

@Xuanwo Xuanwo left a comment

Thank you for working on this! Before we start integrating this, have we considered creating a dedicated repo for it instead?

I think we could iterate faster that way than in the core repo.

Author

jja725 commented Mar 23, 2026

Hi @Xuanwo, that sounds great. Do you mind creating the repo? I can migrate the code there.

Collaborator

Xuanwo commented Mar 23, 2026

> Hi @Xuanwo, that sounds great. Do you mind creating the repo? I can migrate the code there.

Sure! Will create one tomorrow.

Collaborator

Xuanwo commented Mar 24, 2026

Hi @jja725, I have created https://github.com/lance-format/lance-c, let's rock!
