Commit e39bb79

Merge branch 'apache:main' into main
2 parents d8e48d8 + c3bf16c commit e39bb79

61 files changed, 6274 additions and 4707 deletions

.github/workflows/python-ci.yml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ jobs:
 runs-on: ubuntu-22.04
 strategy:
 matrix:
-python: ['3.8', '3.9', '3.10', '3.11']
+python: ['3.9', '3.10', '3.11', '3.12']
 
 steps:
 - uses: actions/checkout@v4

.github/workflows/python-integration.yml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ concurrency:
 
 jobs:
 integration-test:
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
 
 steps:
 - uses: actions/checkout@v4

.github/workflows/python-release.yml

Lines changed: 5 additions & 3 deletions
@@ -44,8 +44,10 @@ jobs:
 - uses: actions/setup-python@v5
 with:
 python-version: |
-3.8
+3.9
+3.10
 3.11
+3.12
 
 - name: Install poetry
 run: pip install poetry
@@ -61,14 +63,14 @@ jobs:
 if: startsWith(matrix.os, 'ubuntu')
 
 - name: Build wheels
-uses: pypa/cibuildwheel@v2.20.0
+uses: pypa/cibuildwheel@v2.21.3
 with:
 output-dir: wheelhouse
 config-file: "pyproject.toml"
 env:
 # Ignore 32 bit architectures
 CIBW_ARCHS: "auto64"
-CIBW_PROJECT_REQUIRES_PYTHON: ">=3.8,<3.12"
+CIBW_PROJECT_REQUIRES_PYTHON: ">=3.9,<3.13"
 CIBW_TEST_REQUIRES: "pytest==7.4.2 moto==5.0.1"
 CIBW_TEST_EXTRAS: "s3fs,glue"
 CIBW_TEST_COMMAND: "pytest {project}/tests/avro/test_decoder.py"
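
The release workflow change above moves `CIBW_PROJECT_REQUIRES_PYTHON` from `>=3.8,<3.12` to `>=3.9,<3.13`. As a rough illustration of which interpreter versions each bound admits, the sketch below compares version tuples using only the standard library. The helpers `parse` and `in_range` are hypothetical and handle only this simple `>=lower,<upper` form; this is not how cibuildwheel itself evaluates the setting.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '3.12' into a comparable tuple (3, 12)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper (hypothetical helper)."""
    return parse(lower) <= parse(version) < parse(upper)


candidates = ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]

# Old bound, from the removed line: ">=3.8,<3.12"
old_supported = [v for v in candidates if in_range(v, "3.8", "3.12")]
# New bound, from the added line: ">=3.9,<3.13"
new_supported = [v for v in candidates if in_range(v, "3.9", "3.13")]

print("old:", old_supported)
print("new:", new_supported)
```

Comparing tuples rather than strings matters here, because a plain string comparison would sort `"3.10"` before `"3.9"`.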

.markdownlint.yaml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Default state for all rules
+default: true
+
+# MD013/line-length - Line length
+MD013: false
+
+# MD007/ul-indent - Unordered list indentation
+MD007:
+  indent: 4

.pre-commit-config.yaml

Lines changed: 4 additions & 10 deletions
@@ -46,17 +46,11 @@ repos:
 hooks:
 - id: pycln
 args: [--config=pyproject.toml]
-- repo: https://github.com/executablebooks/mdformat
-rev: 0.7.17
+- repo: https://github.com/igorshubovych/markdownlint-cli
+rev: v0.41.0
 hooks:
-- id: mdformat
-additional_dependencies:
-- mdformat-black==0.1.1
-- mdformat-config==0.1.3
-- mdformat-beautysh==0.1.1
-- mdformat-admon==1.0.1
-- mdformat-mkdocs==1.0.1
-- mdformat-frontmatter==2.0.1
+- id: markdownlint
+args: ["--fix"]
 - repo: https://github.com/pycqa/pydocstyle
 rev: 6.3.0
 hooks:

Makefile

Lines changed: 2 additions & 2 deletions
@@ -59,9 +59,9 @@ test-integration-rebuild:
 docker compose -f dev/docker-compose-integration.yml rm -f
 docker compose -f dev/docker-compose-integration.yml build --no-cache
 
-test-adlfs: ## Run tests marked with adlfs, can add arguments with PYTEST_ARGS="-vv"
+test-adls: ## Run tests marked with adls, can add arguments with PYTEST_ARGS="-vv"
 sh ./dev/run-azurite.sh
-poetry run pytest tests/ -m adlfs ${PYTEST_ARGS}
+poetry run pytest tests/ -m adls ${PYTEST_ARGS}
 
 test-gcs: ## Run tests marked with gcs, can add arguments with PYTEST_ARGS="-vv"
 sh ./dev/run-gcs-server.sh

mkdocs/docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
 <!-- prettier-ignore-start -->
 
 <!-- markdown-link-check-disable -->
+# Summary
 
 - [Getting started](index.md)
 - [Configuration](configuration.md)

mkdocs/docs/api.md

Lines changed: 23 additions & 19 deletions
@@ -280,7 +280,7 @@ tbl.overwrite(df)
 
 The data is written to the table, and when the table is read using `tbl.scan().to_arrow()`:
 
-```
+```python
 pyarrow.Table
 city: string
 lat: double
@@ -303,7 +303,7 @@ tbl.append(df)
 
 When reading the table `tbl.scan().to_arrow()` you can see that `Groningen` is now also part of the table:
 
-```
+```python
 pyarrow.Table
 city: string
 lat: double
@@ -342,7 +342,7 @@ tbl.delete(delete_filter="city == 'Paris'")
 In the above example, any records where the city field value equals to `Paris` will be deleted.
 Running `tbl.scan().to_arrow()` will now yield:
 
-```
+```python
 pyarrow.Table
 city: string
 lat: double
@@ -362,7 +362,6 @@ To explore the table metadata, tables can be inspected.
 !!! tip "Time Travel"
 To inspect a tables's metadata with the time travel feature, call the inspect table method with the `snapshot_id` argument.
 Time travel is supported on all metadata tables except `snapshots` and `refs`.
-
 ```python
 table.inspect.entries(snapshot_id=805611270568163028)
 ```
@@ -377,7 +376,7 @@ Inspect the snapshots of the table:
 table.inspect.snapshots()
 ```
 
-```
+```python
 pyarrow.Table
 committed_at: timestamp[ms] not null
 snapshot_id: int64 not null
@@ -405,7 +404,7 @@ Inspect the partitions of the table:
 table.inspect.partitions()
 ```
 
-```
+```python
 pyarrow.Table
 partition: struct<dt_month: int32, dt_day: date32[day]> not null
 child 0, dt_month: int32
@@ -446,7 +445,7 @@ To show all the table's current manifest entries for both data and delete files.
 table.inspect.entries()
 ```
 
-```
+```python
 pyarrow.Table
 status: int8 not null
 snapshot_id: int64 not null
@@ -604,7 +603,7 @@ To show a table's known snapshot references:
 table.inspect.refs()
 ```
 
-```
+```python
 pyarrow.Table
 name: string not null
 type: string not null
@@ -629,7 +628,7 @@ To show a table's current file manifests:
 table.inspect.manifests()
 ```
 
-```
+```python
 pyarrow.Table
 content: int8 not null
 path: string not null
@@ -679,7 +678,7 @@ To show table metadata log entries:
 table.inspect.metadata_log_entries()
 ```
 
-```
+```python
 pyarrow.Table
 timestamp: timestamp[ms] not null
 file: string not null
@@ -702,7 +701,7 @@ To show a table's history:
 table.inspect.history()
 ```
 
-```
+```python
 pyarrow.Table
 made_current_at: timestamp[ms] not null
 snapshot_id: int64 not null
@@ -723,7 +722,7 @@ Inspect the data files in the current snapshot of the table:
 table.inspect.files()
 ```
 
-```
+```python
 pyarrow.Table
 content: int8 not null
 file_path: string not null
@@ -846,11 +845,16 @@ readable_metrics: [
 [6.0989]]
 ```
 
+!!! info
+Content refers to type of content stored by the data file: `0` - `Data`, `1` - `Position Deletes`, `2` - `Equality Deletes`
+
+To show only data files or delete files in the current snapshot, use `table.inspect.data_files()` and `table.inspect.delete_files()` respectively.
+
 ## Add Files
 
 Expert Iceberg users may choose to commit existing parquet files to the Iceberg table as data files, without rewriting them.
 
-```
+```python
 # Given that these parquet files have schema consistent with the Iceberg table
 
 file_paths = [
@@ -930,7 +934,7 @@ with table.update_schema() as update:
 
 Now the table has the union of the two schemas `print(table.schema())`:
 
-```
+```python
 table {
 1: city: optional string
 2: lat: optional double
@@ -1180,7 +1184,7 @@ table.scan(
 
 This will return a PyArrow table:
 
-```
+```python
 pyarrow.Table
 VendorID: int64
 tpep_pickup_datetime: timestamp[us, tz=+00:00]
@@ -1222,7 +1226,7 @@ table.scan(
 
 This will return a Pandas dataframe:
 
-```
+```python
 VendorID tpep_pickup_datetime tpep_dropoff_datetime
 0 2 2021-04-01 00:28:05+00:00 2021-04-01 00:47:59+00:00
 1 1 2021-04-01 00:39:01+00:00 2021-04-01 00:57:39+00:00
@@ -1295,7 +1299,7 @@ ray_dataset = table.scan(
 
 This will return a Ray dataset:
 
-```
+```python
 Dataset(
 num_blocks=1,
 num_rows=1168798,
@@ -1346,7 +1350,7 @@ df = df.select("VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime")
 
 This returns a Daft Dataframe which is lazily materialized. Printing `df` will display the schema:
 
-```
+```python
 ╭──────────┬───────────────────────────────┬───────────────────────────────╮
 │ VendorID ┆ tpep_pickup_datetime ┆ tpep_dropoff_datetime │
 ---------
@@ -1364,7 +1368,7 @@ This is correctly optimized to take advantage of Iceberg features such as hidden
 df.show(2)
 ```
 
-```
+```python
 ╭──────────┬───────────────────────────────┬───────────────────────────────╮
 │ VendorID ┆ tpep_pickup_datetime ┆ tpep_dropoff_datetime │
 ---------
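
The `!!! info` note added in the `mkdocs/docs/api.md` diff lists the `content` codes for files: `0` for data, `1` for position deletes, `2` for equality deletes. A minimal standalone sketch of that mapping follows. The enum name `FileContent`, the helper `split_by_content`, and the sample rows are illustrative assumptions, not PyIceberg API; they only mimic the data/delete split that the note attributes to `table.inspect.data_files()` and `table.inspect.delete_files()`.

```python
from enum import IntEnum


class FileContent(IntEnum):
    """Content codes as listed in the docs note (the enum name is hypothetical)."""

    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


def split_by_content(rows):
    """Split file rows into (data files, delete files) by their content code."""
    data = [row for row in rows if row["content"] == FileContent.DATA]
    deletes = [row for row in rows if row["content"] != FileContent.DATA]
    return data, deletes


# Made-up rows shaped like a small slice of a files listing.
rows = [
    {"content": 0, "file_path": "a.parquet"},
    {"content": 1, "file_path": "b.parquet"},
    {"content": 2, "file_path": "c.parquet"},
]

data_files, delete_files = split_by_content(rows)
print([row["file_path"] for row in data_files])
print([FileContent(row["content"]).name for row in delete_files])
```

Using an `IntEnum` keeps the raw integer codes comparable with the values stored in the `content` column while giving them readable names.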
