Description
Apache Iceberg version
None
Please describe the bug 🐞
Bug Description
Invoking multiple methods (or the same method multiple times) on a pyiceberg.catalog.hive.HiveCatalog instance that accesses a kerberized HMS results in a failed SASL negotiation.
Steps to reproduce
- Install pyiceberg and the kerberos Python wrapper:
$ pip install "pyiceberg[hive-kerberos,pyarrow]==0.9.0rc3"
$ pip install "kerberos>=1.3.0"
- Initialize HiveCatalog:
from pyiceberg.catalog.hive import HiveCatalog
catalog = HiveCatalog(
name="hive",
**{
"uri": "thrift://hms:9083",
"hive.kerberos-authentication": "true"
},
)
- Invoke multiple methods (or the same method multiple times) that use the _HiveClient via a context manager:
Specifically, iceberg-python/pyiceberg/catalog/hive.py, lines 701 to 702 in 8bfb16c:
with self._client as open_client:
    return list(map(self.identifier_to_tuple, open_client.get_all_databases()))
catalog.list_namespaces()
catalog.load_table("db.iceberg_table")
Expected
Namespaces and tables can be loaded successfully.
Actual
Listing namespaces succeeds, but loading the table results in:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
----> 1 catalog.load_table("db.iceberg_table")
File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/pyiceberg/catalog/hive.py:573, in HiveCatalog.load_table(self, identifier)
557 """Load the table's metadata and return the table instance.
558
559 You can also use this method to check for table existence using 'try catalog.table() except TableNotFoundError'.
(...)
569 NoSuchTableError: If a table with the name does not exist, or the identifier is invalid.
570 """
571 database_name, table_name = self.identifier_to_database_and_table(identifier, NoSuchTableError)
--> 573 with self._client as open_client:
574 hive_table = self._get_hive_table(open_client, database_name, table_name)
576 return self._convert_hive_into_iceberg(hive_table)
File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/pyiceberg/catalog/hive.py:170, in _HiveClient.__enter__(self)
169 def __enter__(self) -> Client:
--> 170 self._transport.open()
171 if self._ugi:
172 self._client.set_ugi(*self._ugi)
File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/thrift/transport/TTransport.py:381, in TSaslClientTransport.open(self)
378 self.transport.open()
380 self.send_sasl_msg(self.START, bytes(self.sasl.mechanism, 'ascii'))
--> 381 self.send_sasl_msg(self.OK, self.sasl.process())
383 while True:
384 status, challenge = self.recv_sasl_msg()
File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/client.py:16, in _require_mech.<locals>.wrapped(self, *args, **kwargs)
14 if not self._chosen_mech:
15 raise SASLError("A mechanism has not been chosen yet")
---> 16 return f(self, *args, **kwargs)
File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/client.py:148, in SASLClient.process(self, challenge)
137 @_require_mech
138 def process(self, challenge=None):
139 """
140 Process a challenge from the server during SASL negotiation.
141 A response will be returned which should typically be sent to the
(...)
146 to be sent to the server.
147 """
--> 148 return self._chosen_mech.process(challenge)
File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/mechanisms.py:510, in GSSAPIMechanism.process(self, challenge)
507 self._have_negotiated_details = True
508 return base64.b64decode(_negotiated_details)
--> 510 challenge = base64.b64encode(challenge).decode('ascii') # kerberos methods expect strings, not bytes
511 if self.user is None:
512 ret = kerberos.authGSSClientStep(self.context, challenge)
File ~/.conda/envs/iceberg-env/lib/python3.10/base64.py:58, in b64encode(s, altchars)
51 def b64encode(s, altchars=None):
52 """Encode the bytes-like object s using Base64 and return a bytes object.
53
54 Optional altchars should be a byte string of length 2 which specifies an
55 alternative alphabet for the '+' and '/' characters. This allows an
56 application to e.g. generate url or filesystem safe Base64 strings.
57 """
---> 58 encoded = binascii.b2a_base64(s, newline=False)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'NoneType'
Additional comments
This seems to happen because the transport is closed every time we exit the _HiveClient context manager, and thrift.transport.TTransport.TSaslClientTransport does not appear to support being re-opened. The same error can be reproduced outside of pyiceberg with:
from thrift.transport import TSocket, TTransport
from urllib.parse import urlparse
uri = "thrift://hms:9083"
url_parts = urlparse(uri)
socket = TSocket.TSocket(url_parts.hostname, url_parts.port)
transport = TTransport.TSaslClientTransport(
socket, host=url_parts.hostname, service="hive"
)
transport.open()   # first open succeeds
transport.close()
transport.open()   # second open fails with the same TypeError
So it looks like the transport needs to be re-created instead of re-opened in _HiveClient.__enter__?
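A minimal, self-contained sketch of that suggestion, assuming the only problem is that a closed transport cannot be opened again. `OneShotTransport`, `ReusingClient` and `RecreatingClient` are hypothetical stand-ins, not pyiceberg or thrift classes. `ReusingClient` mirrors the current pattern (one transport, re-opened on every `with`), while `RecreatingClient` builds a fresh transport in each `__enter__`. In a real fix, the factory would presumably construct a new TSocket/TSaslClientTransport pair from the catalog URI.

```python
from typing import Callable, Optional


class OneShotTransport:
    """Hypothetical transport that refuses to re-open after close(),
    mimicking the behaviour reported for TSaslClientTransport."""

    def __init__(self) -> None:
        self.is_open = False
        self._used = False  # True once open() has been called

    def open(self) -> None:
        if self._used:
            raise RuntimeError("transport cannot be re-opened after close()")
        self._used = True
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


class ReusingClient:
    """Current pattern: a single transport, re-opened on every `with`."""

    def __init__(self, transport_factory: Callable[[], OneShotTransport]) -> None:
        self._transport = transport_factory()

    def __enter__(self) -> OneShotTransport:
        self._transport.open()  # fails the second time
        return self._transport

    def __exit__(self, *exc_info: object) -> None:
        self._transport.close()


class RecreatingClient:
    """Suggested pattern: a fresh transport for every `with`."""

    def __init__(self, transport_factory: Callable[[], OneShotTransport]) -> None:
        self._factory = transport_factory
        self._transport: Optional[OneShotTransport] = None

    def __enter__(self) -> OneShotTransport:
        self._transport = self._factory()  # new object, so no stale negotiation state
        self._transport.open()
        return self._transport

    def __exit__(self, *exc_info: object) -> None:
        if self._transport is not None:
            self._transport.close()
            self._transport = None


if __name__ == "__main__":
    reusing = ReusingClient(OneShotTransport)
    with reusing:
        pass  # e.g. list_namespaces(): succeeds
    try:
        with reusing:
            pass  # e.g. load_table(): fails on re-open
    except RuntimeError as exc:
        print(f"reusing client failed: {exc}")

    recreating = RecreatingClient(OneShotTransport)
    for _ in range(2):
        with recreating as transport:
            assert transport.is_open
    print("recreating client succeeded twice")
```

Running the script prints `reusing client failed: transport cannot be re-opened after close()` followed by `recreating client succeeded twice`. Re-creating the transport per call costs a new connection and SASL handshake each time. That trade-off seems acceptable here, since the current code already opens and closes the transport around every call.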
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time