
Commit 5b0e622

docs: add s3tables catalog
1 parent cf020b3 commit 5b0e622

1 file changed: +98 -0 lines changed

mkdocs/docs/configuration.md

Lines changed: 98 additions & 0 deletions
@@ -524,6 +524,104 @@ catalog:

<!-- prettier-ignore-end -->

### S3Tables Catalog

The S3Tables Catalog uses the catalog functionality of the Amazon S3 Tables service and requires an existing S3 table bucket.

To use Amazon S3 Tables as your catalog, configure PyIceberg using one of the following methods. To set up your AWS account credentials locally, refer to the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) on configuring credentials.

If you intend to use the same credentials for both the S3Tables Catalog and S3 FileIO, you can configure the [`client.*` properties](configuration.md#unified-aws-credentials) to avoid configuring them twice.

Note that the S3Tables Catalog manages the underlying table locations internally, which makes it incompatible with S3-like storage systems such as MinIO. If you specify `s3tables.endpoint`, ensure that `s3.endpoint` is configured accordingly.

```yaml
catalog:
  default:
    type: s3tables
    warehouse: arn:aws:s3tables:us-east-1:012345678901:bucket/pyiceberg-catalog
```

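The `warehouse` value is the ARN of the table bucket. As a minimal sketch, the shape of that ARN can be checked before it is handed to the catalog. The helper name `parse_table_bucket_arn` and the regular expression are illustrative assumptions inferred from the example ARN above; they are not part of PyIceberg, they only cover the `aws` partition, and they are not a complete statement of AWS naming rules.

```python
import re
from typing import NamedTuple

# Assumed shape, inferred from the example ARN in this section:
# arn:aws:s3tables:<region>:<12-digit account id>:bucket/<bucket name>
_TABLE_BUCKET_ARN = re.compile(
    r"^arn:aws:s3tables:(?P<region>[a-z0-9-]+):(?P<account_id>\d{12}):"
    r"bucket/(?P<bucket>[a-z0-9][a-z0-9-]*)$"
)


class TableBucketArn(NamedTuple):
    region: str
    account_id: str
    bucket: str


def parse_table_bucket_arn(arn: str) -> TableBucketArn:
    """Hypothetical helper: split a table bucket ARN into its parts, or raise ValueError."""
    match = _TABLE_BUCKET_ARN.match(arn)
    if match is None:
        raise ValueError(f"Not a table bucket ARN: {arn!r}")
    return TableBucketArn(**match.groupdict())


parsed = parse_table_bucket_arn(
    "arn:aws:s3tables:us-east-1:012345678901:bucket/pyiceberg-catalog"
)
print(parsed.region)  # us-east-1
```

A check like this fails fast on a mistyped ARN instead of surfacing the problem later as a service error.
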
If you prefer to pass the credentials explicitly to the client instead of relying on environment variables:

```yaml
catalog:
  default:
    type: s3tables
    s3tables.access-key-id: <ACCESS_KEY_ID>
    s3tables.secret-access-key: <SECRET_ACCESS_KEY>
    s3tables.session-token: <SESSION_TOKEN>
    s3tables.region: <REGION_NAME>
    s3tables.endpoint: http://localhost:9000
    s3.endpoint: http://localhost:9000
```

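The same explicit settings can be assembled as a Python dictionary. The sketch below is one possible way to do that: it drops unset values and enforces the advice that a custom `s3tables.endpoint` should come with a matching `s3.endpoint`. The function `build_s3tables_properties` is a hypothetical helper, not a PyIceberg API, and the credential strings are placeholders.

```python
from typing import Dict, Optional


def build_s3tables_properties(
    warehouse: str,
    region: str,
    access_key_id: str,
    secret_access_key: str,
    session_token: Optional[str] = None,
    s3tables_endpoint: Optional[str] = None,
    s3_endpoint: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: collect the property keys used in this section."""
    if s3tables_endpoint is not None and s3_endpoint is None:
        # A custom catalog endpoint without a FileIO endpoint is treated as a mistake here.
        raise ValueError("s3.endpoint must be set when s3tables.endpoint is set")
    candidates = {
        "s3tables.warehouse": warehouse,
        "s3tables.region": region,
        "s3tables.access-key-id": access_key_id,
        "s3tables.secret-access-key": secret_access_key,
        "s3tables.session-token": session_token,
        "s3tables.endpoint": s3tables_endpoint,
        "s3.endpoint": s3_endpoint,
    }
    # Keep only the values that were actually provided.
    return {key: value for key, value in candidates.items() if value is not None}


properties = build_s3tables_properties(
    warehouse="arn:aws:s3tables:us-east-1:012345678901:bucket/pyiceberg-catalog",
    region="us-east-1",
    access_key_id="<ACCESS_KEY_ID>",
    secret_access_key="<SECRET_ACCESS_KEY>",
)
print(sorted(properties))
```

The resulting dictionary could then be unpacked into the catalog constructor, for example `S3TablesCatalog(name="s3tables_catalog", **properties)`.
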
<!-- prettier-ignore-start -->

!!! Note "Client-specific Properties"
    `s3tables.*` properties are for S3TablesCatalog only. If you want to use the same credentials for both S3TablesCatalog and S3 FileIO, you can set the `client.*` properties. See the [Unified AWS Credentials](configuration.md#unified-aws-credentials) section for more details.

<!-- prettier-ignore-end -->

<!-- markdown-link-check-disable -->

| Key                        | Example               | Description                                                                |
| -------------------------- | --------------------- | -------------------------------------------------------------------------- |
| s3tables.profile-name      | default               | Configure the static profile used to access the S3Tables Catalog           |
| s3tables.region            | us-east-1             | Set the region of the S3Tables Catalog                                     |
| s3tables.access-key-id     | admin                 | Configure the static access key id used to access the S3Tables Catalog     |
| s3tables.secret-access-key | password              | Configure the static secret access key used to access the S3Tables Catalog |
| s3tables.session-token     | AQoDYXdzEJr...        | Configure the static session token used to access the S3Tables Catalog     |
| s3tables.endpoint          | <http://localhost>... | Configure the AWS endpoint                                                 |
| s3tables.warehouse         | arn:aws:s3tables...   | Set the underlying S3 Table Bucket                                         |

<!-- markdown-link-check-enable-->

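The note above says that `client.*` properties can supply credentials shared between the catalog and S3 FileIO. One way to picture how the two sets of keys interact is a first-match lookup in which the catalog-specific key is tried before the shared one. The sketch below is illustrative only: the order of precedence is an assumption, the `client.*` key names are assumed to follow the naming used in the Unified AWS Credentials section, and `first_property_value` is a hypothetical helper that does not claim to reproduce PyIceberg's actual resolution logic.

```python
from typing import Dict, Optional


def first_property_value(properties: Dict[str, str], *keys: str) -> Optional[str]:
    """Hypothetical helper: return the value of the first key that is present."""
    for key in keys:
        value = properties.get(key)
        if value is not None:
            return value
    return None


properties = {
    "client.region": "us-west-2",
    "client.access-key-id": "shared-key",
    "s3tables.region": "us-east-1",
}

# The catalog-specific key is listed first, so it wins when both are set.
region = first_property_value(properties, "s3tables.region", "client.region")
# Only the shared key is set here, so the lookup falls back to it.
access_key_id = first_property_value(
    properties, "s3tables.access-key-id", "client.access-key-id"
)
print(region, access_key_id)  # us-east-1 shared-key
```
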
<!-- prettier-ignore-start -->

!!! warning "Removed Properties"
    The properties `profile_name`, `region_name`, `aws_access_key_id`, `aws_secret_access_key`, and `aws_session_token` were deprecated and removed in 0.8.0.

<!-- prettier-ignore-end -->

An example usage of the S3Tables Catalog is shown below:

```python
from pyiceberg.catalog.s3tables import S3TablesCatalog
import pyarrow as pa


# Fill in the ARN of your table bucket and its AWS region.
table_bucket_arn: str = "..."
aws_region: str = "..."

properties = {"s3tables.warehouse": table_bucket_arn, "s3tables.region": aws_region}
catalog = S3TablesCatalog(name="s3tables_catalog", **properties)

database_name = "prod"

catalog.create_namespace(namespace=database_name)

# Sample data: four rows, with nulls in the nullable columns.
pyarrow_table = pa.Table.from_arrays(
    [
        pa.array([None, "A", "B", "C"]),
        pa.array([1, 2, 3, 4]),
        pa.array([True, None, False, True]),
        pa.array([None, "A", "B", "C"]),
    ],
    schema=pa.schema(
        [
            pa.field("foo", pa.large_string(), nullable=True),
            pa.field("bar", pa.int32(), nullable=False),
            pa.field("baz", pa.bool_(), nullable=True),
            pa.field("large", pa.large_string(), nullable=True),
        ]
    ),
)

# Create the table from the Arrow schema, then append the rows.
identifier = (database_name, "orders")
table = catalog.create_table(identifier=identifier, schema=pyarrow_table.schema)
table.append(pyarrow_table)
```

### Custom Catalog Implementations

If you want to load any custom catalog implementation, you can set catalog configurations like the following:
