|
1 | 1 | # PyCapsule Read Failure Analysis |
2 | 2 |
|
3 | 3 | ## Symptom |
4 | | -Running `python examples/pycapsule_failure.py` terminates the interpreter with a segmentation fault. |
| 4 | +Running `python examples/pycapsule_failure.py` now fails immediately with |
| 5 | +``ValueError: Table provider capsule is missing a destructor`` when |
| 6 | +``SessionContext.read_table`` attempts to coerce the fabricated capsule. |
5 | 7 |
|
6 | 8 | ## Regression Surface |
7 | | -Commit range `9b4f1442^..d629ced2` replaced the `Table` wrapper-based API with a generic constructor that accepts arbitrary Python objects and auto-discovers how to turn them into a `TableProvider`. The new `PyTable::new` implementation in Rust now attempts to coerce any object that exposes `__datafusion_table_provider__` into an FFI provider. 【F:src/table.rs†L54-L77】 |
| 9 | +Commit range `9b4f1442^..6e449da5` teaches |
| 10 | +``table_provider_from_pycapsule`` to recognize raw ``PyCapsule`` instances |
| 11 | +before checking for a ``__datafusion_table_provider__`` attribute. |
| 12 | +【F:src/utils.rs†L205-L222】 As soon as ``SessionContext.read_table`` sees the |
| 13 | +dummy capsule exposed by the example it now calls straight into the capsule |
| 14 | +conversion path without first wrapping it in ``Table``. |
8 | 15 |
|
9 | 16 | ## Root Cause |
10 | | -`table_provider_from_pycapsule` only validates the capsule name before transmuting its pointer into an `FFI_TableProvider`. 【F:src/utils.rs†L127-L141】 The helper assumes the capsule contains a valid `FFI_TableProvider` allocation created by our bindings. The regression example fabricates a capsule with the correct name but with an arbitrary pointer (`ctypes.create_string_buffer`). 【F:examples/pycapsule_failure.py†L8-L24】 Because the new constructor now reaches this path for any `read_table` call, the bogus pointer is dereferenced immediately, corrupting memory and crashing the interpreter. |
11 | | - |
12 | | -Prior to the refactor, callers could not pass arbitrary capsule-bearing objects to `SessionContext.read_table`; they first had to wrap them in `Table`/`RawTable`, which were only constructible through safe helpers that produced trusted capsules. The new auto-coercion path therefore widened the attack surface to unvalidated capsules, exposing the latent unsafety. |
| 17 | +The new fast-path reuses ``table_provider_from_capsule``, which enforces that |
| 18 | +the capsule has a destructor originating from ``datafusion_ffi``’s helper. |
| 19 | +【F:src/utils.rs†L193-L205】 The regression example still fabricates its capsule |
| 20 | +with ``PyCapsule_New`` and passes ``NULL`` for the destructor. |
| 21 | +【F:examples/pycapsule_failure.py†L10-L24】 Because no release callback is |
| 22 | +registered, the validation now fails with the ``missing a destructor`` error. |
| 23 | +Prior to the change, the example first wrapped the capsule in ``Table`` which |
| 24 | +never triggered the destructor check, masking the issue. |
13 | 25 |
|
14 | 26 | ## Runtime failure after 91b90f44 |
15 | | -Commit 91b90f44 changed :meth:`SessionContext.read_table` so that any object |
16 | | -exposing ``__datafusion_table_provider__`` is normalized through |
17 | | -``Table.from_table_provider_capsule`` before delegating to the Rust context. |
18 | | -【F:python/datafusion/context.py†L1189-L1198】 That helper now calls into the |
19 | | -private binding ``df_internal.catalog.RawTable.from_table_provider_capsule`` to |
20 | | -wrap the capsule, but the ``RawTable`` type exported from |
21 | | -``datafusion._internal`` does not currently expose such a constructor. |
22 | | -【F:python/datafusion/catalog.py†L176-L187】 At runtime the lookup therefore |
23 | | -raises ``AttributeError`` and prevents `examples/pycapsule_failure.py` from |
24 | | -running, regressing the original reproducer from a segfault into a hard failure. |
25 | | - |
26 | 27 | ## Suggested Tasks |
27 | | -1. Export a ``RawTable.from_table_provider_capsule`` constructor from the Rust |
28 | | - bindings and ensure it becomes available through |
29 | | - ``datafusion._internal.catalog`` during the wheel build so that the Python |
30 | | - shim can locate it. |
31 | | -2. Add an integration test that imports ``datafusion._internal`` and asserts |
32 | | - ``hasattr(df_internal.catalog.RawTable, "from_table_provider_capsule")`` |
33 | | - before exercising ``SessionContext.read_table`` with a raw capsule to catch |
34 | | - regressions. |
35 | | -3. Consider extending ``table_provider_from_pycapsule`` so that |
36 | | - ``RawTable.__new__`` can directly accept capsule instances (without going |
37 | | - through the static helper) to reduce the surface area for Python/Rust API |
38 | | - skew in the future. |
| 28 | +1. Provide a public helper (Python or Rust) that fabricates a minimal, but |
| 29 | + valid, table-provider capsule so that examples/tests no longer rely on |
| 30 | + ``PyCapsule_New`` with a ``NULL`` destructor. |
| 31 | +2. Document the requirement that all externally supplied table-provider |
| 32 | + capsules must originate from ``datafusion_ffi`` helpers so the release hook |
| 33 | + is present; include guidance in ``examples/pycapsule_failure.py``. |
| 34 | +3. Add a regression test that round-trips a capsule produced by the new helper |
| 35 | + through ``SessionContext.read_table`` to ensure the destructor validation and |
| 36 | + success path both work as intended. |
0 commit comments