Commit ea0ce8f

Implement Kerberos authentication support for Hive Catalog
1 parent 42afc43 commit ea0ce8f

5 files changed: +102 -36 lines changed


mkdocs/docs/configuration.md

Lines changed: 10 additions & 10 deletions
@@ -227,19 +227,19 @@ catalog:
 catalog:
   default:
     uri: thrift://localhost:9083
-    s3.endpoint: http://localhost:9000
-    s3.access-key-id: admin
-    s3.secret-access-key: password
+    hive:
+      hive2-compatible: true
+      use-kerberos: true
 ```
 
-When using Hive 2.x, make sure to set the compatibility flag:
+<!-- markdown-link-check-disable -->
 
-```yaml
-catalog:
-  default:
-    ...
-    hive.hive2-compatible: true
-```
+| Key | Example | Description |
+| --------------------- | ------------------- | ------------------------------------------------ |
+| hive.hive2-compatible | true | Using Hive 2.x compatibility mode |
+| hive.use-kerberos | true | Using authentication via Kerberos |
+
+<!-- markdown-link-check-enable-->
 
 ## Glue Catalog
 
mkdocs/docs/index.md

Lines changed: 17 additions & 16 deletions
@@ -40,22 +40,23 @@ pip install "pyiceberg[s3fs,hive]"
 
 You can mix and match optional dependencies depending on your needs:
 
-| Key | Description: |
-| ------------ | -------------------------------------------------------------------- |
-| hive | Support for the Hive metastore |
-| glue | Support for AWS Glue |
-| dynamodb | Support for AWS DynamoDB |
-| sql-postgres | Support for SQL Catalog backed by Postgresql |
-| sql-sqlite | Support for SQL Catalog backed by SQLite |
-| pyarrow | PyArrow as a FileIO implementation to interact with the object store |
-| pandas | Installs both PyArrow and Pandas |
-| duckdb | Installs both PyArrow and DuckDB |
-| ray | Installs PyArrow, Pandas, and Ray |
-| daft | Installs Daft |
-| s3fs | S3FS as a FileIO implementation to interact with the object store |
-| adlfs | ADLFS as a FileIO implementation to interact with the object store |
-| snappy | Support for snappy Avro compression |
-| gcsfs | GCSFS as a FileIO implementation to interact with the object store |
+| Key | Description: |
+| ------------- | -------------------------------------------------------------------- |
+| hive | Support for the Hive metastore |
+| hive-kerberos | Support for Hive metastore in Kerberos environment |
+| glue | Support for AWS Glue |
+| dynamodb | Support for AWS DynamoDB |
+| sql-postgres | Support for SQL Catalog backed by Postgresql |
+| sql-sqlite | Support for SQL Catalog backed by SQLite |
+| pyarrow | PyArrow as a FileIO implementation to interact with the object store |
+| pandas | Installs both PyArrow and Pandas |
+| duckdb | Installs both PyArrow and DuckDB |
+| ray | Installs PyArrow, Pandas, and Ray |
+| daft | Installs Daft |
+| s3fs | S3FS as a FileIO implementation to interact with the object store |
+| adlfs | ADLFS as a FileIO implementation to interact with the object store |
+| snappy | Support for snappy Avro compression |
+| gcsfs | GCSFS as a FileIO implementation to interact with the object store |
 
 You either need to install `s3fs`, `adlfs`, `gcsfs`, or `pyarrow` to be able to fetch files from an object store.
 
poetry.lock

Lines changed: 46 additions & 6 deletions
Some generated files are not rendered by default.

pyiceberg/catalog/hive.py

Lines changed: 26 additions & 4 deletions
@@ -122,6 +122,9 @@
 HIVE2_COMPATIBLE = "hive.hive2-compatible"
 HIVE2_COMPATIBLE_DEFAULT = False
 
+HIVE_KERBEROS_AUTH = "hive.use-kerberos"
+HIVE_KERBEROS_AUTH_DEFAULT = False
+
 LOCK_CHECK_MIN_WAIT_TIME = "lock-check-min-wait-time"
 LOCK_CHECK_MAX_WAIT_TIME = "lock-check-max-wait-time"
 LOCK_CHECK_RETRIES = "lock-check-retries"
@@ -139,11 +142,26 @@ class _HiveClient:
     _client: Client
     _ugi: Optional[List[str]]
 
-    def __init__(self, uri: str, ugi: Optional[str] = None):
+    def __init__(
+        self,
+        uri: str,
+        ugi: Optional[str] = None,
+        use_kerberos: Optional[bool] = HIVE_KERBEROS_AUTH_DEFAULT
+    ):
         url_parts = urlparse(uri)
+
         transport = TSocket.TSocket(url_parts.hostname, url_parts.port)
-        self._transport = TTransport.TBufferedTransport(transport)
-        protocol = TBinaryProtocol.TBinaryProtocol(transport)
+
+        if not use_kerberos:
+            self._transport = TTransport.TBufferedTransport(transport)
+        else:
+            self._transport = TTransport.TSaslClientTransport(
+                transport,
+                host=url_parts.hostname,
+                service="hive"
+            )
+
+        protocol = TBinaryProtocol.TBinaryProtocol(self._transport)
 
         self._client = Client(protocol)
         self._ugi = ugi.split(':') if ugi else None
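
Note on this hunk: the constructor now makes one decision, namely whether to wrap the raw socket in a buffered transport or in a SASL transport that also receives the metastore host and a service name. The protocol is then built on whichever transport was selected. A minimal, dependency-free sketch of that decision follows. `plan_transport` and the dictionary it returns are hypothetical stand-ins for illustration, not part of PyIceberg or Thrift; only the values (`url_parts.hostname`, `url_parts.port`, `service="hive"`) come from the diff.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def plan_transport(uri: str, use_kerberos: bool = False) -> Dict[str, Any]:
    """Describe the transport that would be built for a metastore URI.

    Hypothetical helper: it returns a plain dict instead of real Thrift
    objects so the sketch runs without thrift installed.
    """
    url_parts = urlparse(uri)
    plan: Dict[str, Any] = {"host": url_parts.hostname, "port": url_parts.port}

    if not use_kerberos:
        # Plain path: buffered transport around the socket.
        plan["transport"] = "TBufferedTransport"
    else:
        # Kerberos path: SASL transport, keyed by host and service name.
        plan["transport"] = "TSaslClientTransport"
        plan["sasl_host"] = url_parts.hostname
        plan["sasl_service"] = "hive"
    return plan


plain = plan_transport("thrift://localhost:9083")
secure = plan_transport("thrift://localhost:9083", use_kerberos=True)
```

In the Kerberos case the SASL host is taken directly from the URI, so the hostname in `uri` presumably has to match the host part of the metastore's service principal.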
@@ -258,7 +276,11 @@ class HiveCatalog(MetastoreCatalog):
 
     def __init__(self, name: str, **properties: str):
         super().__init__(name, **properties)
-        self._client = _HiveClient(properties["uri"], properties.get("ugi"))
+        self._client = _HiveClient(
+            properties["uri"],
+            properties.get("ugi"),
+            properties.get(HIVE_KERBEROS_AUTH, HIVE_KERBEROS_AUTH_DEFAULT)
+        )
 
         self._lock_check_min_wait_time = PropertyUtil.property_as_float(
             properties, LOCK_CHECK_MIN_WAIT_TIME, DEFAULT_LOCK_CHECK_MIN_WAIT_TIME
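
Note on this hunk: `HiveCatalog.__init__` types its properties as `str`, and the value of `hive.use-kerberos` is passed to `_HiveClient` unconverted. If the value arrives as the string `"false"`, the check `if not use_kerberos` would treat it as true, because any non-empty string is truthy. Whether values arrive as strings or booleans depends on how the configuration is loaded, which this diff does not show. A minimal sketch of a defensive conversion, where `as_bool` is a hypothetical helper and not an existing PyIceberg function:

```python
from typing import Mapping, Union

HIVE_KERBEROS_AUTH = "hive.use-kerberos"
HIVE_KERBEROS_AUTH_DEFAULT = False


def as_bool(properties: Mapping[str, Union[str, bool]], key: str, default: bool) -> bool:
    """Read a property as a boolean, accepting real bools or common strings.

    Hypothetical helper for illustration only.
    """
    value = properties.get(key, default)
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no", ""):
        return False
    raise ValueError(f"Cannot interpret {key}={value!r} as a boolean")


# A string "false" is truthy on its own, but converts to False here.
naive = bool("false")
converted = as_bool({HIVE_KERBEROS_AUTH: "false"}, HIVE_KERBEROS_AUTH, HIVE_KERBEROS_AUTH_DEFAULT)
```

Raising on unrecognized values is a design choice in this sketch: a typo in a security-related flag seems better surfaced than silently mapped to a default.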

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -72,6 +72,8 @@ gcsfs = { version = ">=2023.1.0,<2024.1.0", optional = true }
 psycopg2-binary = { version = ">=2.9.6", optional = true }
 sqlalchemy = { version = "^2.0.18", optional = true }
 getdaft = { version = ">=0.2.12", optional = true }
+thrift-sasl = { version = ">=0.4.3", optional = true }
+kerberos = { version = "1.3.1", optional = true }
 
 [tool.poetry.group.dev.dependencies]
 pytest = "7.4.4"
@@ -580,6 +582,7 @@ ray = ["ray", "pyarrow", "pandas"]
 daft = ["getdaft"]
 snappy = ["python-snappy"]
 hive = ["thrift"]
+hive-kerberos = ["thrift", "thrift_sasl", "kerberos"]
 s3fs = ["s3fs"]
 glue = ["boto3", "mypy-boto3-glue"]
 adlfs = ["adlfs"]
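
Note on this hunk: the new `hive-kerberos` extra pulls in packages that a plain `hive` install does not, so enabling `hive.use-kerberos` without the extra would likely fail only at connection time. A small sketch of a pre-flight check using the standard library follows. The helper `missing_modules` is hypothetical, and the import names `thrift`, `thrift_sasl`, and `kerberos` are assumed from the package names in `pyproject.toml`.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the packages listed in the hive-kerberos extra.
missing = missing_modules(["thrift", "thrift_sasl", "kerberos"])
if missing:
    print(
        f"Missing optional modules: {', '.join(missing)}. "
        "Consider installing the extra: pip install 'pyiceberg[hive-kerberos]'"
    )
```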
