SASL negotiation fails when invoking multiple methods on pyiceberg.catalog.hive.HiveCatalog #1744

@mnzpk

Description

Apache Iceberg version

None

Please describe the bug 🐞

Bug Description

Invoking multiple methods (or the same method multiple times) on an object of pyiceberg.catalog.hive.HiveCatalog when accessing a kerberized HMS results in failed SASL negotiation.

Steps to reproduce

  1. Install pyiceberg and kerberos python wrapper:
$ pip install "pyiceberg[hive-kerberos,pyarrow]==0.9.0rc3"
$ pip install "kerberos>=1.3.0"
  2. Initialize HiveCatalog:
from pyiceberg.catalog.hive import HiveCatalog

catalog = HiveCatalog(
    name="hive",
    **{
        "uri": "thrift://hms:9083",
        "hive.kerberos-authentication": "true"
    },
)
  3. Invoke multiple methods (or the same method multiple times) that use the _HiveClient via a context manager.
    Specifically, methods that do something like:
    with self._client as open_client:
        return list(map(self.identifier_to_tuple, open_client.get_all_databases()))
catalog.list_namespaces()
catalog.load_table("db.iceberg_table")

Expected

Namespaces and tables can be loaded successfully.

Actual

Listing namespaces succeeds but loading the table results in:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 catalog.load_table("db.iceberg_table")

File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/pyiceberg/catalog/hive.py:573, in HiveCatalog.load_table(self, identifier)
    557 """Load the table's metadata and return the table instance.
    558 
    559 You can also use this method to check for table existence using 'try catalog.table() except TableNotFoundError'.
   (...)
    569     NoSuchTableError: If a table with the name does not exist, or the identifier is invalid.
    570 """
    571 database_name, table_name = self.identifier_to_database_and_table(identifier, NoSuchTableError)
--> 573 with self._client as open_client:
    574     hive_table = self._get_hive_table(open_client, database_name, table_name)
    576 return self._convert_hive_into_iceberg(hive_table)

File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/pyiceberg/catalog/hive.py:170, in _HiveClient.__enter__(self)
    169 def __enter__(self) -> Client:
--> 170     self._transport.open()
    171     if self._ugi:
    172         self._client.set_ugi(*self._ugi)

File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/thrift/transport/TTransport.py:381, in TSaslClientTransport.open(self)
    378     self.transport.open()
    380 self.send_sasl_msg(self.START, bytes(self.sasl.mechanism, 'ascii'))
--> 381 self.send_sasl_msg(self.OK, self.sasl.process())
    383 while True:
    384     status, challenge = self.recv_sasl_msg()

File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/client.py:16, in _require_mech.<locals>.wrapped(self, *args, **kwargs)
     14 if not self._chosen_mech:
     15     raise SASLError("A mechanism has not been chosen yet")
---> 16 return f(self, *args, **kwargs)

File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/client.py:148, in SASLClient.process(self, challenge)
    137 @_require_mech
    138 def process(self, challenge=None):
    139     """
    140     Process a challenge from the server during SASL negotiation.
    141     A response will be returned which should typically be sent to the
   (...)
    146     to be sent to the server.
    147     """
--> 148     return self._chosen_mech.process(challenge)

File ~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/mechanisms.py:510, in GSSAPIMechanism.process(self, challenge)
    507     self._have_negotiated_details = True
    508     return base64.b64decode(_negotiated_details)
--> 510 challenge = base64.b64encode(challenge).decode('ascii')  # kerberos methods expect strings, not bytes
    511 if self.user is None:
    512     ret = kerberos.authGSSClientStep(self.context, challenge)

File ~/.conda/envs/iceberg-env/lib/python3.10/base64.py:58, in b64encode(s, altchars)
     51 def b64encode(s, altchars=None):
     52     """Encode the bytes-like object s using Base64 and return a bytes object.
     53 
     54     Optional altchars should be a byte string of length 2 which specifies an
     55     alternative alphabet for the '+' and '/' characters.  This allows an
     56     application to e.g. generate url or filesystem safe Base64 strings.
     57     """
---> 58     encoded = binascii.b2a_base64(s, newline=False)
     59     if altchars is not None:
     60         assert len(altchars) == 2, repr(altchars)

TypeError: a bytes-like object is required, not 'NoneType'

Additional comments

This seems to happen because the transport is closed every time we exit the _HiveClient context manager, and thrift.transport.TTransport.TSaslClientTransport does not appear to support being re-opened. The same error can be reproduced outside of pyiceberg with:

from thrift.transport import TSocket, TTransport
from urllib.parse import urlparse

uri = "thrift://hms:9083"
url_parts = urlparse(uri)
socket = TSocket.TSocket(url_parts.hostname, url_parts.port)
transport = TTransport.TSaslClientTransport(
    socket, host=url_parts.hostname, service="hive"
)

transport.open()
transport.close()
transport.open()

So it looks like the transport needs to be re-created instead of re-opened in _HiveClient.__enter__?
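
To illustrate the idea, here is a minimal, self-contained sketch that does not use thrift or pyiceberg. `OneShotTransport` is a hypothetical stand-in that mimics what the traceback suggests: a "negotiated" flag (compare `_have_negotiated_details` in puresasl) survives `close()`, so a second `open()` skips the initial step and fails on the missing challenge. `RecreatingClient` is likewise a hypothetical name showing the proposed approach of building a fresh transport on every `__enter__`. It is not the actual `_HiveClient` code.

```python
from typing import Callable, Optional


class OneShotTransport:
    """Hypothetical stand-in for a SASL transport whose handshake state survives close()."""

    def __init__(self) -> None:
        self._negotiated = False  # assumption: mimics the flag seen in the traceback
        self.is_open = False

    def open(self) -> None:
        if self._negotiated:
            # A second handshake on the same object skips the initial step
            # and trips over the missing challenge.
            raise TypeError("a bytes-like object is required, not 'NoneType'")
        self._negotiated = True
        self.is_open = True

    def close(self) -> None:
        self.is_open = False  # note: _negotiated is deliberately not reset


class RecreatingClient:
    """Context manager that builds a fresh transport on every entry."""

    def __init__(self, transport_factory: Callable[[], OneShotTransport]) -> None:
        self._transport_factory = transport_factory
        self._transport: Optional[OneShotTransport] = None

    def __enter__(self) -> OneShotTransport:
        # Re-create instead of re-open: no handshake state is carried over.
        self._transport = self._transport_factory()
        self._transport.open()
        return self._transport

    def __exit__(self, exc_type, exc, tb) -> None:
        if self._transport is not None and self._transport.is_open:
            self._transport.close()
        self._transport = None


def reopen_fails() -> bool:
    """Return True if re-opening the same transport object raises TypeError."""
    transport = OneShotTransport()
    transport.open()
    transport.close()
    try:
        transport.open()
    except TypeError:
        return True
    return False


def use_twice() -> int:
    """Enter the re-creating client twice and count the successful opens."""
    client = RecreatingClient(OneShotTransport)
    successful = 0
    for _ in range(2):
        with client as transport:
            if transport.is_open:
                successful += 1
    return successful
```

In a real fix, the factory would presumably construct the TSocket and TSaslClientTransport the same way the reproduction above does, so that each `with self._client` block starts a fresh SASL negotiation.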

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
