Commit 346cd6f

docs: add s3tables catalog
1 parent 5aa855c commit 346cd6f

1 file changed: +98 -0 lines changed

mkdocs/docs/configuration.md

Lines changed: 98 additions & 0 deletions
@@ -543,6 +543,104 @@ catalog:

<!-- prettier-ignore-end -->

### S3Tables Catalog

The S3Tables Catalog uses the catalog functionality of the Amazon S3 Tables service and requires an existing S3 table bucket.

To use Amazon S3 Tables as your catalog, configure PyIceberg using one of the methods below. To set up your AWS account credentials locally, see the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) on configuring credentials.

If you intend to use the same credentials for both the S3Tables Catalog and S3 FileIO, you can set the [`client.*` properties](configuration.md#unified-aws-credentials) once instead of configuring each client separately.

Note that the S3Tables Catalog manages the underlying table locations internally, which makes it incompatible with S3-like storage systems such as MinIO. If you specify `s3tables.endpoint`, ensure that `s3.endpoint` is configured accordingly.

```yaml
catalog:
  default:
    type: s3tables
    warehouse: arn:aws:s3tables:us-east-1:012345678901:bucket/pyiceberg-catalog
```

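The same settings can also be assembled in Python before they are handed to the catalog. The sketch below is illustrative only: `s3tables_properties` is a hypothetical helper, and the ARN pattern is an assumption inferred from the example ARN above (it only covers the `aws` partition), not an official specification.

```python
import re

# Assumed ARN shape, inferred from the example above:
#   arn:aws:s3tables:<region>:<12-digit account id>:bucket/<bucket name>
_TABLE_BUCKET_ARN = re.compile(
    r"^arn:aws:s3tables:(?P<region>[a-z0-9-]+):(?P<account>\d{12})"
    r":bucket/(?P<bucket>[a-z0-9][a-z0-9-]*)$"
)


def s3tables_properties(table_bucket_arn: str) -> dict[str, str]:
    """Build catalog properties from a table bucket ARN (hypothetical helper)."""
    match = _TABLE_BUCKET_ARN.match(table_bucket_arn)
    if match is None:
        raise ValueError(f"not a table bucket ARN: {table_bucket_arn!r}")
    return {
        "type": "s3tables",
        "s3tables.warehouse": table_bucket_arn,
        # Region is read from the ARN itself; override it if your setup differs.
        "s3tables.region": match.group("region"),
    }


props = s3tables_properties(
    "arn:aws:s3tables:us-east-1:012345678901:bucket/pyiceberg-catalog"
)
print(props["s3tables.region"])
```

Failing early on a malformed ARN keeps the error close to the configuration rather than surfacing later as an AWS client error.
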
If you prefer to pass the credentials explicitly to the client instead of relying on environment variables, set the `s3tables.*` properties:

```yaml
catalog:
  default:
    type: s3tables
    s3tables.access-key-id: <ACCESS_KEY_ID>
    s3tables.secret-access-key: <SECRET_ACCESS_KEY>
    s3tables.session-token: <SESSION_TOKEN>
    s3tables.region: <REGION_NAME>
    s3tables.endpoint: http://localhost:9000
    s3.endpoint: http://localhost:9000
```

577+
<!-- prettier-ignore-start -->
578+
579+
!!! Note "Client-specific Properties"
580+
`s3tables.*` properties are for S3TablesCatalog only. If you want to use the same credentials for both S3TablesCatalog and S3 FileIO, you can set the `client.*` properties. See the [Unified AWS Credentials](configuration.md#unified-aws-credentials) section for more details.
581+
582+
<!-- prettier-ignore-end -->
583+
584+
<!-- markdown-link-check-disable -->
585+
586+
| Key | Example | Description |
587+
| -------------------------- | ------------------- | -------------------------------------------------------------------------- |
588+
| s3tables.profile-name | default | Configure the static profile used to access the S3Tables Catalog |
589+
| s3tables.region | us-east-1 | Set the region of the S3Tables Catalog |
590+
| s3tables.access-key-id | admin | Configure the static access key id used to access the S3Tables Catalog |
591+
| s3tables.secret-access-key | password | Configure the static secret access key used to access the S3Tables Catalog |
592+
| s3tables.session-token | AQoDYXdzEJr... | Configure the static session token used to access the S3Tables Catalog |
593+
| s3tables.endpoint | <http://localhost>... | Configure the AWS endpoint |
594+
| s3tables.warehouse | arn:aws:s3tables... | Set the underlying S3 Table Bucket |
595+
596+
<!-- markdown-link-check-enable-->
597+
<!-- prettier-ignore-start -->

!!! warning "Removed Properties"
    The properties `profile_name`, `region_name`, `aws_access_key_id`, `aws_secret_access_key`, and `aws_session_token` were deprecated and removed in 0.8.0.

<!-- prettier-ignore-end -->

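Configurations that still carry the removed names can be rewritten before use. The sketch below assumes a one-to-one mapping from each removed name to the `s3tables.*` key with the matching meaning; that mapping is inferred from the table above rather than stated by it, and `migrate_properties` is a hypothetical helper.

```python
# Assumed mapping from removed property names to current keys.
_RENAMED = {
    "profile_name": "s3tables.profile-name",
    "region_name": "s3tables.region",
    "aws_access_key_id": "s3tables.access-key-id",
    "aws_secret_access_key": "s3tables.secret-access-key",
    "aws_session_token": "s3tables.session-token",
}


def migrate_properties(properties: dict[str, str]) -> dict[str, str]:
    """Return a copy of the properties with removed names replaced (hypothetical helper)."""
    migrated: dict[str, str] = {}
    for key, value in properties.items():
        new_key = _RENAMED.get(key, key)
        # Refuse to guess when the old and new key disagree.
        if new_key in migrated and migrated[new_key] != value:
            raise ValueError(f"conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated


old = {"profile_name": "default", "region_name": "us-east-1"}
print(migrate_properties(old))
```
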
The following example shows how to use the S3Tables Catalog:

```python
from pyiceberg.catalog.s3tables import S3TablesCatalog
import pyarrow as pa


# Fill in the ARN of your table bucket and its AWS region.
table_bucket_arn: str = "..."
aws_region: str = "..."

properties = {"s3tables.warehouse": table_bucket_arn, "s3tables.region": aws_region}
catalog = S3TablesCatalog(name="s3tables_catalog", **properties)

database_name = "prod"

catalog.create_namespace(namespace=database_name)

# Build a small PyArrow table with an explicit schema.
pyarrow_table = pa.Table.from_arrays(
    [
        pa.array([None, "A", "B", "C"]),
        pa.array([1, 2, 3, 4]),
        pa.array([True, None, False, True]),
        pa.array([None, "A", "B", "C"]),
    ],
    schema=pa.schema(
        [
            pa.field("foo", pa.large_string(), nullable=True),
            pa.field("bar", pa.int32(), nullable=False),
            pa.field("baz", pa.bool_(), nullable=True),
            pa.field("large", pa.large_string(), nullable=True),
        ]
    ),
)

# Create the table from the PyArrow schema, then append the rows.
identifier = (database_name, "orders")
table = catalog.create_table(identifier=identifier, schema=pyarrow_table.schema)
table.append(pyarrow_table)
```

### Custom Catalog Implementations

If you want to load any custom catalog implementation, you can set catalog configurations like the following:
