|
1 | | -# Repository instructions for datafusion-python |
2 | | - |
3 | | -## Style and Linting |
4 | | -- Python formatting and linting is handled by **ruff**. |
5 | | -- Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) and these key |
6 | | - ruff rules: |
7 | | - - Use double quotes for string literals |
8 | | - - Use explicit relative imports such as `from .module import Class` |
9 | | - - Limit lines to 88 characters |
10 | | - - Provide type hints for parameters and return values |
11 | | - - Avoid unused imports |
12 | | - - Prefer f-strings over other string formatting |
13 | | - - Avoid `else` blocks after `return` |
14 | | - - Use `isinstance()` instead of direct type comparison |
15 | | - - Keep docstrings concise in Google format |
16 | | - - Assign exception messages to variables before raising |
17 | | -- All modules, classes, and functions should include docstrings. |
18 | | -- Tests must use descriptive function names, pytest style assertions, and be |
19 | | - grouped in classes when appropriate. |
20 | | -- Rust code must pass `cargo fmt`, `cargo clippy`, and `cargo tomlfmt`. |
21 | | -- Install the pre-commit hooks with `pre-commit install` and run them with |
22 | | - `pre-commit run --files <file1> <file2>` or `pre-commit run --all-files`. |
23 | | - |
24 | | -Install the required Rust components before running pre-commit: |
| 1 | +# Repository Guidelines for datafusion-python |
| 2 | + |
| 3 | +Our goal is to deliver robust, maintainable, and user-friendly Python bindings over the DataFusion Rust core. Focus on high-quality solutions first—linting and formatting come later as a final check. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## 1. Solution-First Approach |
| 8 | + |
| 9 | +* **Clarify Requirements:** Understand the use case and expected behavior before coding. Ask questions if something is unclear. |
| 10 | +* **Design Thoughtfully:** Sketch function signatures, data flow, and API ergonomics. Strive for simplicity and consistency with Python conventions. |
| 11 | +* **Demonstrate Usage:** Include concise code examples in docstrings or README updates to illustrate functionality. |
| 12 | + |
| 13 | +## 2. Python Code Quality & Best Practices |
| 14 | + |
| 15 | +### Idiomatic Python |
| 16 | + |
| 17 | +* Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) for naming, spacing, and structure. |
| 18 | +* Use **double quotes** for string literals and **f-strings** for interpolation. |
| 19 | +* Favor explicit relative imports (`from .module import Class`). |
| 20 | +* Limit lines to **88 characters**. Break long expressions with parentheses. |
| 21 | + |
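A minimal sketch of these conventions: double-quoted f-strings for interpolation, and a long return value wrapped in parentheses so no line exceeds 88 characters. The function `describe_partition` and its parameters are hypothetical, chosen only for illustration. Relative imports are left out because they only resolve inside a package.

```python
def describe_partition(
    table_name: str, partition_index: int, row_count: int, byte_size: int
) -> str:
    """Return a one-line summary of one partition of a table."""
    average_row_bytes = byte_size / row_count if row_count > 0 else 0.0
    # Parentheses wrap the long f-string across lines instead of a backslash;
    # adjacent string literals are concatenated automatically.
    return (
        f"{table_name}[{partition_index}]: {row_count} rows, "
        f"{average_row_bytes:.1f} bytes/row"
    )
```

For example, `describe_partition("sales", 0, 4, 100)` returns `"sales[0]: 4 rows, 25.0 bytes/row"`.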
| 22 | +### Type Hints & Docstrings |
| 23 | + |
| 24 | +* Provide type hints for all public functions and methods. |
| 25 | +* Use Google-style docstrings. Include brief descriptions, parameter types, return types, and examples. |
| 26 | +* Keep docstrings concise and focused on what the API is for and how to use it.
| 27 | + |
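One way a fully annotated public function with a Google-style docstring could look, including `Args`, `Returns`, `Raises`, and doctest-style `Examples` sections. `normalize_column_name` is a hypothetical helper, not part of datafusion-python. Because the parameters carry type hints, the `Args` entries describe meaning rather than repeating the types.

```python
def normalize_column_name(name: str, *, lowercase: bool = True) -> str:
    """Normalize a column name for case-insensitive lookups.

    Args:
        name: Raw column name, possibly with surrounding whitespace.
        lowercase: Whether to fold the name to lower case.

    Returns:
        The stripped (and optionally lower-cased) column name.

    Raises:
        ValueError: If ``name`` is empty after stripping whitespace.

    Examples:
        >>> normalize_column_name("  Order_ID ")
        'order_id'
        >>> normalize_column_name("Order_ID", lowercase=False)
        'Order_ID'
    """
    stripped = name.strip()
    if not stripped:
        # Assign the message to a variable first, then raise.
        message = "column name must not be empty"
        raise ValueError(message)
    return stripped.lower() if lowercase else stripped
```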
| 28 | +### Readability & Maintainability |
| 29 | + |
| 30 | +* Write small, single-purpose functions (≈50 lines or fewer). |
| 31 | +* Break complex logic into well-named helper functions. |
| 32 | +* Avoid unnecessary abstractions; follow patterns established in the codebase. |
| 33 | +* Use meaningful variable and function names—no `temp`, `foo`, or one-letter names except in short comprehensions. |
| 34 | + |
| 35 | +## 3. Rust-Python Integration |
| 36 | + |
| 37 | +Classes and functions with Rust-backed implementations must stay in sync: |
| 38 | + |
| 39 | +1. **Update Rust First:** Implement changes in the Rust core before touching Python bindings.
| 40 | +2. **Test Rust Behavior:** Verify new Rust functionality with unit and integration tests. |
| 41 | +3. **Sync Python Bindings:** Adjust `pyo3` bindings or wrapper functions to reflect the Rust API. |
| 42 | +4. **Python Tests:** Add or update pytest tests to cover the new or changed behavior. |
| 43 | + |
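A minimal sketch of the thin-wrapper pattern behind step 3, under stated assumptions: `_RawRowCounter` is a hypothetical pure-Python stand-in for a class that pyo3 would export from the Rust core, and `RowCounter` is a hypothetical public wrapper; neither exists in datafusion-python. The wrapper only adds type hints and docstrings and forwards every call, so behavior changes land in Rust first and the Python layer simply mirrors the new API.

```python
class _RawRowCounter:
    """Hypothetical stand-in for an extension type compiled from Rust via pyo3."""

    def __init__(self) -> None:
        self._rows = 0

    def add_rows(self, count: int) -> None:
        self._rows += count

    def total(self) -> int:
        return self._rows


class RowCounter:
    """Python-facing wrapper that delegates all work to the Rust-backed object."""

    def __init__(self) -> None:
        # Hypothetical: real bindings would construct the Rust-exported type here.
        self._raw = _RawRowCounter()

    def add_rows(self, count: int) -> None:
        """Record ``count`` additional rows.

        Args:
            count: Number of rows to add.
        """
        self._raw.add_rows(count)

    def total(self) -> int:
        """Return the number of rows recorded so far."""
        return self._raw.total()
```

Keeping logic out of the wrapper means a change to the Rust signature only requires updating the forwarding call and its docstring.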
| 44 | +## 4. Testing & Validation |
| 45 | + |
| 46 | +* **Unit Tests:** Use pytest with descriptive test names and plain `assert` statements (pytest-style assertions). Group related tests in `Test*` classes when it improves organization.
| 47 | +* **Integration Tests:** Validate end-to-end workflows, especially for CLI commands or data-processing pipelines. |
| 48 | +* **Continuous Testing:** Ensure tests pass reliably in CI. Run `pytest -v` locally for verbose, per-test output.
| 49 | + |
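A sketch of the unit-test conventions above: descriptive `test_*` names, plain `assert` statements, and related tests grouped in a `Test*` class. The function under test, `deduplicate_preserving_order`, is hypothetical and defined inline so the example is self-contained.

```python
def deduplicate_preserving_order(values: list[str]) -> list[str]:
    """Return ``values`` with duplicates removed, keeping first occurrences."""
    # dict keys preserve insertion order, so the first occurrence wins.
    return list(dict.fromkeys(values))


class TestDeduplicatePreservingOrder:
    """Unit tests grouped by the function they exercise."""

    def test_removes_repeated_values(self) -> None:
        assert deduplicate_preserving_order(["a", "b", "a"]) == ["a", "b"]

    def test_keeps_first_occurrence_order(self) -> None:
        assert deduplicate_preserving_order(["c", "a", "c", "b"]) == ["c", "a", "b"]

    def test_returns_empty_list_for_empty_input(self) -> None:
        assert deduplicate_preserving_order([]) == []
```

With pytest installed, saving this in a file such as `test_dedup.py` (a hypothetical name) and running `pytest -v test_dedup.py` lists each test by its descriptive name.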
| 50 | +## 5. Documentation & Examples |
| 51 | + |
| 52 | +* **README:** Update usage examples and installation instructions for new features. |
| 53 | +* **Docstring Examples:** Embed minimal examples illustrating typical use cases. |
| 54 | +* **Docs Directory:** For extended tutorials or guides, add Markdown files under `docs/` and link them in the TOC. |
| 55 | + |
| 56 | +## 6. Collaboration & Review |
| 57 | + |
| 58 | +* **Pull Requests:** Summarize changes, rationale, and instructions for testing. |
| 59 | +* **Code Reviews:** Focus feedback on clarity, correctness, and design consistency. |
| 60 | +* **Discussions:** Open issues for major API decisions or design proposals. |
| 61 | + |
| 62 | +## 7. Final Checks
| 63 | + |
| 64 | +Run these after your solution is complete and tests are passing: |
25 | 65 |
|
26 | 66 | ```bash |
| 67 | +# Python formatting & linting (ruff) |
| 68 | +pre-commit install |
| 69 | +pre-commit run --all-files |
| 70 | +# or manually: |
| 71 | +./ci/scripts/python_lint.sh |
| 72 | + |
| 73 | +# Rust formatting & linting (when changing Rust code or bindings)
27 | 74 | rustup component add rustfmt clippy |
| 75 | +taplo format --check |
| 76 | +./ci/scripts/rust_fmt.sh |
| 77 | +./ci/scripts/rust_clippy.sh |
| 78 | +./ci/scripts/rust_toml_fmt.sh |
28 | 79 | ``` |
29 | 80 |
|
30 | | -If installation is blocked by a proxy, see the [offline installation guide](https://rust-lang.github.io/rustup/installation/other.html). |
31 | | - |
32 | | -- The same checks can be run manually using the scripts in `ci/scripts`: |
33 | | - - `./ci/scripts/python_lint.sh` |
34 | | - - `./ci/scripts/rust_fmt.sh` |
35 | | - - `./ci/scripts/rust_clippy.sh` |
36 | | - - `./ci/scripts/rust_toml_fmt.sh` |
37 | | - |
38 | | -## Rust Code Style |
39 | | - |
40 | | -When generating Rust code for DataFusion, follow these guidelines: |
41 | | - |
42 | | -1. Do not add unnecessary parentheses around `if` conditions: |
43 | | - - Correct: `if some_condition` |
44 | | - - Incorrect: `if (some_condition)` |
45 | | - |
46 | | -2. Do not use `expect()` or `panic!()` except in tests - use proper error handling with `Result` types |
47 | | - |
48 | | -3. Follow the standard Rust style conventions from rustfmt |
49 | | - |
50 | | -## Rust-Python Integration |
51 | | - |
52 | | -**Important:** Some classes in this repository have their core implementations in Rust with Python bindings. For any changes related to these classes in Python: |
53 | | - |
54 | | -1. **Always update the Rust implementation first** before making corresponding changes to the Python bindings |
55 | | -2. Ensure the Rust changes are properly tested in Rust before updating Python code |
56 | | -3. Update Python bindings to reflect the new Rust API |
57 | | -4. Add or update Python tests to cover the new functionality |
58 | | - |
59 | | -This ensures consistency between the Rust core and Python interface, and prevents binding mismatches. |
60 | | - |
61 | | -## Code Organization |
62 | | -- Keep functions focused and under about 50 lines. |
63 | | -- Break complex tasks into well-named helper functions and reuse existing |
64 | | - helpers when possible. |
65 | | -- Prefer adding parameters to existing functions rather than creating new |
66 | | - versions with similar behavior. |
67 | | -- Avoid unnecessary abstractions and follow established patterns in the |
68 | | - codebase. |
69 | | - |
70 | | -## Comments |
71 | | -- Add meaningful comments for complex logic and avoid obvious comments. |
72 | | -- Start inline comments with a capital letter. |
73 | | - |
74 | | -## Running Tests |
75 | | -1. Ensure submodules are initialized: |
76 | | - ```bash |
77 | | - git submodule update --init |
78 | | - ``` |
79 | | -2. Create a development environment and install dependencies: |
80 | | - ```bash |
81 | | - uv sync --dev --no-install-package datafusion |
82 | | - ``` |
83 | | -3. Build the Python extension: |
84 | | - ```bash |
85 | | - uv run --no-project maturin develop --uv |
86 | | - ``` |
87 | | -4. Execute the test suite: |
88 | | - ```bash |
89 | | - uv run --no-project pytest -v . |
90 | | - ``` |
91 | | - |
92 | | -## Building Documentation |
93 | | -- Documentation dependencies can be installed with the `docs` group: |
94 | | - ```bash |
95 | | - uv sync --dev --group docs --no-install-package datafusion |
96 | | - ``` |
97 | | -- Build the docs with: |
98 | | - ```bash |
99 | | - uv run --no-project maturin develop --uv |
100 | | - uv run --no-project docs/build.sh |
101 | | - ``` |
102 | | -- The generated HTML appears under `docs/build/html`. |
| 81 | +Ensure these checks pass before merging, but never sacrifice clarity or correctness for formatting. 🚀 |