Commit 90d7db7

feat: add GEMINI.md for Gemini Code Assistant workspace context and setup instructions
1 parent e2da3dc commit 90d7db7

1 file changed: +96 -0 lines changed

GEMINI.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# Gemini Code Assistant Workspace Context

This document gives the Gemini Code Assistant the context it needs to work with the `datafusion-python` repository.

## Project Overview

This repository contains the source code for the Python bindings for Apache DataFusion (formerly Apache Arrow DataFusion), an in-memory query engine. The bindings let users run SQL queries and use a DataFrame API against data sources such as Parquet, CSV, and JSON. The library also interoperates with other Python data ecosystem libraries such as Pandas and PyArrow.

## Key Technologies and Libraries

- **Primary Language:** Rust (for the core bindings)
- **Python Interface:** Python
- **Core Engine:** [Apache DataFusion](https://github.com/apache/datafusion)
- **Python Bindings Framework:** [PyO3](https://pyo3.rs/)
- **Build and Packaging:** [maturin](https://www.maturin.rs/)
- **Dependency Management:** [uv](https://github.com/astral-sh/uv) and pip
- **Testing:** [pytest](https://docs.pytest.org/)
- **Linting:** [ruff](https://github.com/astral-sh/ruff) for Python, `rustfmt` and `clippy` for Rust
- **CI/CD:** GitHub Actions

## Repository Structure

The repository is organized as follows:

- `src/`: Contains the Rust source code for the Python bindings.
- `python/`: Contains the Python-specific code and tests.
- `examples/`: Contains example usage scripts.
- `dev/`: Contains development scripts, including those for releases and changelogs.
- `ci/`: Contains scripts for continuous integration checks.
- `docs/`: Contains the documentation for the project.
- `pyproject.toml`: Defines the Python project metadata and dependencies.
- `Cargo.toml`: Defines the Rust project metadata and dependencies.
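
A quick way to confirm that a checkout matches the layout listed above is a small script like the one below. This is a minimal sketch and not part of the repository: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the tuple simply mirrors the entries named in this document.

```python
from pathlib import Path

# Top-level entries named in the "Repository Structure" list.
# This tuple is an illustration; it is not read from the repository itself.
EXPECTED_ENTRIES = (
    "src",
    "python",
    "examples",
    "dev",
    "ci",
    "docs",
    "pyproject.toml",
    "Cargo.toml",
)


def missing_entries(root="."):
    """Return the expected entries that do not exist under `root`."""
    root_path = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root_path / name).exists()]
```

Calling `missing_entries(".")` from the repository root returns an empty list when every listed entry is present.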

## How to Build and Test

### Development Setup

1. **Prerequisites:**
    * Rust and Cargo
    * Python 3
    * `uv` (recommended) or `pip`

2. **Clone the repository and set up the environment:**

    ```bash
    git clone git@github.com:apache/datafusion-python.git
    cd datafusion-python
    ```

3. **Initialize git submodules:**

    ```bash
    git submodule update --init
    ```

4. **Set up the Python virtual environment and install dependencies:**

    Using `uv`:
    ```bash
    uv sync --dev --no-install-package datafusion
    source .venv/bin/activate
    ```

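    Using `pip`:
    ```bash
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -U pip
    # Assumption to verify: `-r` normally expects a requirements file, so some
    # pip versions may reject pyproject.toml here. If that happens, use the
    # `uv` route instead.
    pip install -r pyproject.toml
    ```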
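
Before starting the setup steps above, it can help to check that the prerequisite tools are on `PATH`. The following is a minimal sketch, not a script shipped with the repository: the function name `check_prerequisites` and the executable names `rustc`, `cargo`, `uv`, and `pip` are assumptions about a typical installation. Running it at all already implies a working Python 3 interpreter.

```python
import shutil

# Executable names are assumptions about a typical installation.
REQUIRED_TOOLS = ("rustc", "cargo")
INSTALLER_TOOLS = ("uv", "pip")  # at least one of these is needed


def check_prerequisites(which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `which` is injectable so the check can be exercised without touching PATH.
    """
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"missing required tool: {tool}")
    if not any(which(tool) for tool in INSTALLER_TOOLS):
        problems.append("need at least one of: uv, pip")
    return problems
```

A typical use is `for problem in check_prerequisites(): print(problem)`, which prints nothing when every tool is found.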

### Building the Project

To build the Rust code and install the Python package in development mode, run:

```bash
maturin develop
```
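
Once the build finishes, a short smoke test can confirm that the package imports and answers a query. The sketch below assumes the installed `datafusion` package provides `SessionContext` with `from_pydict` and `sql`, and `DataFrame.to_pylist`, as recent releases do. The table name `t` and the helper names are hypothetical.

```python
import importlib.util


def datafusion_available():
    """Report whether the `datafusion` package can be imported here."""
    return importlib.util.find_spec("datafusion") is not None


def run_smoke_query():
    """Register a tiny in-memory table and query it with SQL."""
    # Imported lazily so this module still loads when the build is missing.
    from datafusion import SessionContext

    ctx = SessionContext()
    # Register an in-memory table under the (hypothetical) name "t".
    ctx.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]}, name="t")
    df = ctx.sql("SELECT a, b, a + b AS total FROM t WHERE a > 1")
    # Materialize the result as a list of row dictionaries.
    return df.to_pylist()
```

Calling `run_smoke_query()` when `datafusion_available()` is true should return the two rows that satisfy `a > 1`; an `ImportError` instead suggests that the build did not install into the active virtual environment.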

### Running Tests

To run the Python tests, use `pytest`:

```bash
pytest
```

### Linting

To run the linters, use the scripts in the `ci/scripts` directory:

```bash
./ci/scripts/python_lint.sh
./ci/scripts/rust_clippy.sh
./ci/scripts/rust_fmt.sh
```
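
To run all three lint scripts in one pass and see which ones failed, a small wrapper can help. This is a sketch under stated assumptions rather than a tool from the repository: `run_linters` is a hypothetical helper, it expects to be called from the repository root, and it treats any non-zero exit code as a failure.

```python
import subprocess

# Script paths as listed in the Linting section; run from the repository root.
LINT_SCRIPTS = (
    "./ci/scripts/python_lint.sh",
    "./ci/scripts/rust_clippy.sh",
    "./ci/scripts/rust_fmt.sh",
)


def run_linters(scripts=LINT_SCRIPTS, run=subprocess.run):
    """Run every script, even after a failure, and return the ones that failed.

    `run` is injectable so the loop can be tested without executing anything.
    """
    failed = []
    for script in scripts:
        # check=False: inspect the exit code ourselves instead of raising.
        result = run([script], check=False)
        if result.returncode != 0:
            failed.append(script)
    return failed
```

Unlike running the scripts back to back in a shell with `set -e`, this keeps going after the first failure, so a single run reports every linter that needs attention.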
