Commit bee6aca

Merge branch 'main' of github.com:apache/iceberg-python into fd-add-deletion-vectors
2 parents d565221 + 9945f83

34 files changed: +1153 -704 lines

.github/workflows/pypi-build-artifacts.yml

Lines changed: 1 addition & 1 deletion

@@ -62,7 +62,7 @@ jobs:
         if: startsWith(matrix.os, 'ubuntu')

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.22.0
+        uses: pypa/cibuildwheel@v2.23.0
         with:
           output-dir: wheelhouse
           config-file: "pyproject.toml"

.github/workflows/svn-build-artifacts.yml

Lines changed: 1 addition & 1 deletion

@@ -57,7 +57,7 @@ jobs:
         if: startsWith(matrix.os, 'ubuntu')

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.22.0
+        uses: pypa/cibuildwheel@v2.23.0
         with:
           output-dir: wheelhouse
           config-file: "pyproject.toml"

mkdocs/docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions

@@ -30,6 +30,7 @@
 - [Verify a release](verify-release.md)
 - [How to release](how-to-release.md)
 - [Release Notes](https://github.com/apache/iceberg-python/releases)
+- [Nightly Build](nightly-build.md)
 - [Code Reference](reference/)

 <!-- markdown-link-check-enable-->

mkdocs/docs/configuration.md

Lines changed: 21 additions & 24 deletions

@@ -64,7 +64,7 @@ Iceberg tables support table properties to configure table behavior.
 | `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
 | `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
 | `write.metadata.delete-after-commit.enabled` | Boolean | False | Whether to automatically delete old *tracked* metadata files after each table commit. It will retain a number of the most recent metadata files, which can be set using property `write.metadata.previous-versions-max`. |
-| `write.object-storage.enabled` | Boolean | True | Enables the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation |
+| `write.object-storage.enabled` | Boolean | False | Enables the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider) that adds a hash component to file paths. |
 | `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
 | `write.py-location-provider.impl` | String of form `module.ClassName` | null | Optional, [custom `LocationProvider`](configuration.md#loading-a-custom-location-provider) implementation |
 | `write.data.path` | String pointing to location | `{metadata.location}/data` | Sets the location under which data is written. |
@@ -108,22 +108,23 @@ For the FileIO there are several configuration options available:

 <!-- markdown-link-check-disable -->

-| Key | Example | Description |
-|----------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
-| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
-| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
-| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
-| s3.role-session-name | session | An optional identifier for the assumed role session. |
-| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
-| s3.signer | bearer | Configure the signature version of the FileIO. |
-| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
-| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
-| s3.region | us-west-2 | Configure the default region used to initialize an `S3FileSystem`. `PyArrowFileIO` attempts to automatically resolve the region for each S3 bucket, falling back to this value if resolution fails. |
-| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
-| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
-| s3.request-timeout | 60.0 | Configure socket read timeouts on Windows and macOS, in seconds. |
-| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |
+| Key | Example | Description |
+|-----------------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
+| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
+| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
+| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
+| s3.role-session-name | session | An optional identifier for the assumed role session. |
+| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
+| s3.signer | bearer | Configure the signature version of the FileIO. |
+| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
+| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
+| s3.region | us-west-2 | Configure the default region used to initialize an `S3FileSystem`. `PyArrowFileIO` attempts to automatically resolve the region if this isn't set (only supported for AWS S3 Buckets). |
+| s3.resolve-region | False | Only supported for `PyArrowFileIO`; when enabled, it will always try to resolve the location of the bucket (only supported for AWS S3 Buckets). |
+| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
+| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
+| s3.request-timeout | 60.0 | Configure socket read timeouts on Windows and macOS, in seconds. |
+| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |

 <!-- markdown-link-check-enable-->

@@ -212,8 +213,7 @@ Both data file and metadata file locations can be customized by configuring the

 For more granular control, you can override the `LocationProvider`'s `new_data_location` and `new_metadata_location` methods to define custom logic for generating file paths. See [`Loading a Custom Location Provider`](configuration.md#loading-a-custom-location-provider).

-PyIceberg defaults to the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider), which generates file paths for
-data files that are optimized for object storage.
+PyIceberg defaults to the [`SimpleLocationProvider`](configuration.md#simple-location-provider) for managing file paths.

 ### Simple Location Provider

@@ -233,9 +233,6 @@ partitioned over a string column `category` might have a data file with location
 s3://bucket/ns/table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
 ```

-The `SimpleLocationProvider` is enabled for a table by explicitly setting its `write.object-storage.enabled` table
-property to `False`.
-
 ### Object Store Location Provider

 PyIceberg offers the `ObjectStoreLocationProvider`, and an optional [partition-exclusion](configuration.md#partition-exclusion)
@@ -254,8 +251,8 @@ For example, a table partitioned over a string column `category` might have a da
 s3://bucket/ns/table/data/0101/0110/1001/10110010/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
 ```

-The `write.object-storage.enabled` table property determines whether the `ObjectStoreLocationProvider` is enabled for a
-table. It is used by default.
+The `ObjectStoreLocationProvider` is enabled for a table by explicitly setting its `write.object-storage.enabled` table
+property to `True`.

 #### Partition Exclusion
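
As an aside on the documentation changes above: with `write.object-storage.enabled` now defaulting to `False`, the `ObjectStoreLocationProvider` has to be switched on per table, and the `s3.*` keys from the FileIO table are passed as catalog properties. A minimal sketch of how that might look with PyIceberg's `load_catalog` and `create_table` APIs follows; the catalog URI, credentials, namespace, and schema are illustrative placeholders, not values taken from this commit.

```python
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType

# Hypothetical catalog and S3 settings -- substitute your own endpoint and credentials.
catalog = load_catalog(
    "default",
    **{
        "uri": "http://localhost:8181",        # example REST catalog endpoint
        "s3.endpoint": "https://10.0.19.25/",  # FileIO properties from the table above
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
        "s3.region": "us-west-2",
    },
)

schema = Schema(
    NestedField(field_id=1, name="category", field_type=StringType(), required=False),
)

# With the new default of False, the ObjectStoreLocationProvider must be
# enabled explicitly through the table property.
table = catalog.create_table(
    "ns.orders",
    schema=schema,
    properties={"write.object-storage.enabled": "true"},
)
```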

mkdocs/docs/contributing.md

Lines changed: 2 additions & 0 deletions

@@ -52,6 +52,8 @@ To get started, you can run `make install`, which installs Poetry and all the de

 If you want to install the library on the host, you can simply run `pip3 install -e .`. If you wish to use a virtual environment, you can run `poetry shell`. Poetry will open up a virtual environment with all the dependencies set.

+> **Note:** If you want to use `poetry shell`, you need to install it using `pip install poetry-plugin-shell`. Alternatively, you can run commands directly with `poetry run`.
+
 To set up IDEA with Poetry:

 - Open up the Python project in IntelliJ
mkdocs/docs/index.md

Lines changed: 1 addition & 1 deletion

@@ -64,7 +64,7 @@ You either need to install `s3fs`, `adlfs`, `gcsfs`, or `pyarrow` to be able to

 ## Connecting to a catalog

-Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/concepts/catalog/). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.
+Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/terms/#catalog). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.

 For the sake of demonstration, we'll configure the catalog to use the `SqlCatalog` implementation, which will store information in a local `sqlite` database. We'll also configure the catalog to store data files in the local filesystem instead of an object store. This should not be used in production due to the limited scalability.
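
The demo setup described above (an `SqlCatalog` backed by a local `sqlite` database, with data files on the local filesystem) might be configured roughly as follows; the warehouse path is an assumed example, not part of this diff.

```python
from pyiceberg.catalog import load_catalog

warehouse_path = "/tmp/warehouse"  # assumed local directory for the demo
catalog = load_catalog(
    "default",
    **{
        "type": "sql",
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",  # catalog metadata in a local sqlite file
        "warehouse": f"file://{warehouse_path}",                    # data files on the local filesystem
    },
)
```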

mkdocs/docs/nightly-build.md

Lines changed: 35 additions & 0 deletions

@@ -0,0 +1,35 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one
+ - or more contributor license agreements. See the NOTICE file
+ - distributed with this work for additional information
+ - regarding copyright ownership. The ASF licenses this file
+ - to you under the Apache License, Version 2.0 (the
+ - "License"); you may not use this file except in compliance
+ - with the License. You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing,
+ - software distributed under the License is distributed on an
+ - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ - KIND, either express or implied. See the License for the
+ - specific language governing permissions and limitations
+ - under the License.
+-->
+
+# Nightly Build
+
+A nightly build of PyIceberg is available on testpypi, [https://test.pypi.org/project/pyiceberg/](https://test.pypi.org/project/pyiceberg/).
+
+To install the nightly build,
+
+```shell
+pip install -i https://test.pypi.org/simple/ --pre pyiceberg
+```
+
+<!-- prettier-ignore-start -->
+
+!!! warning "For Testing Purposes Only"
+    Nightly builds are for testing purposes only and have not been validated. Please use at your own risk, as they may contain untested changes, potential bugs, or incomplete features. Additionally, ensure compliance with any applicable licenses, as these builds may include changes that have not been reviewed for legal or licensing implications.
+
+<!-- prettier-ignore-end -->
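
After installing the nightly build with the `pip` command above, a quick way to confirm which pre-release was picked up is to read the installed version from package metadata; a small standard-library sketch:

```python
from importlib.metadata import version

# Prints the installed PyIceberg version, e.g. a dev pre-release pulled from test.pypi.org.
print(version("pyiceberg"))
```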
