@gabeiglio

No description provided.

dependabot bot and others added 30 commits August 13, 2025 11:42
Potential fix for apache#1804

I might want to write a test, but I'm not sure yet how to reproduce this
without using Glue.

Closes apache#1804
# Rationale for this change

# Are these changes tested?

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
Bumps [duckdb](https://github.com/duckdb/duckdb) from 1.2.1 to 1.2.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/duckdb/duckdb/releases">duckdb's
releases</a>.</em></p>
<blockquote>
<h2>v1.2.2 Bugfix Release</h2>
<p>This is a bug fix release for various issues discovered after we
released 1.2.1. There are no new major features, just bug fixes.
Database files created by DuckDB versions all the way back to v0.9.* can
be read by this version.</p>
<h2>What's Changed</h2>
<ul>
<li>[Python] Fix deadlock in <code>from_parquet</code> with
<code>file_globs</code> caused by not releasing the GIL by <a
href="https://github.com/Tishj"><code>@​Tishj</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16522">duckdb/duckdb#16522</a></li>
<li>[Python] Fix some stubs issues by <a
href="https://github.com/Tishj"><code>@​Tishj</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16523">duckdb/duckdb#16523</a></li>
<li>[Dev] Fix issues in the benchmark runner related to the serialized
<code>storage_version</code> by <a
href="https://github.com/Tishj"><code>@​Tishj</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16533">duckdb/duckdb#16533</a></li>
<li>Set estimated cardinality to newly created logical operators by <a
href="https://github.com/jeewonhh"><code>@​jeewonhh</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16528">duckdb/duckdb#16528</a></li>
<li>custom_extension_repository to also be the default
autoinstall_extension_repository by <a
href="https://github.com/carlopi"><code>@​carlopi</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16459">duckdb/duckdb#16459</a></li>
<li>[Python Dev] No longer trigger a DeprecationWarning when using a UDF
by <a href="https://github.com/Tishj"><code>@​Tishj</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16433">duckdb/duckdb#16433</a></li>
<li>Fixup problems connected to prep to 1.2.1 by <a
href="https://github.com/carlopi"><code>@​carlopi</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16578">duckdb/duckdb#16578</a></li>
<li>Give preference to quote=escape if we can't do better by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16584">duckdb/duckdb#16584</a></li>
<li>Max ART key length by <a
href="https://github.com/taniabogatsch"><code>@​taniabogatsch</code></a>
in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16588">duckdb/duckdb#16588</a></li>
<li>Issue <a
href="https://redirect.github.com/duckdb/duckdb/issues/16617">#16617</a>:
Window Constant Finalize by <a
href="https://github.com/hawkfish"><code>@​hawkfish</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16628">duckdb/duckdb#16628</a></li>
<li>[Fix] Index scan - Move the table scan state into the local state by
<a
href="https://github.com/taniabogatsch"><code>@​taniabogatsch</code></a>
in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16650">duckdb/duckdb#16650</a></li>
<li>[Fix] Perform eager constraint checking during physical batch insert
by <a
href="https://github.com/taniabogatsch"><code>@​taniabogatsch</code></a>
in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16651">duckdb/duckdb#16651</a></li>
<li>[Python] Don't push down <code>OPTIONAL_FILTER</code> into
<code>pyarrow</code> for the arrow scan. by <a
href="https://github.com/Tishj"><code>@​Tishj</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16657">duckdb/duckdb#16657</a></li>
<li>move OrderPreservationRecursive to physical_plan_generator.hpp by <a
href="https://github.com/jeewonhh"><code>@​jeewonhh</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16656">duckdb/duckdb#16656</a></li>
<li>Add libs folder to bundled static libs by <a
href="https://github.com/taniabogatsch"><code>@​taniabogatsch</code></a>
in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16655">duckdb/duckdb#16655</a></li>
<li>Avoid UMA by <a
href="https://github.com/krlmlr"><code>@​krlmlr</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16641">duckdb/duckdb#16641</a></li>
<li>Avoid UMA by <a
href="https://github.com/krlmlr"><code>@​krlmlr</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16643">duckdb/duckdb#16643</a></li>
<li>Avoid UMA by <a
href="https://github.com/krlmlr"><code>@​krlmlr</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16642">duckdb/duckdb#16642</a></li>
<li>Several fixes for CSV fuzzer tests by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16636">duckdb/duckdb#16636</a></li>
<li>FSST Fix: Correctly detect the situation where we have 3 bytes
remaining by <a
href="https://github.com/Mytherin"><code>@​Mytherin</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16688">duckdb/duckdb#16688</a></li>
<li>Fix <a
href="https://redirect.github.com/duckdb/duckdb/issues/16627">#16627</a>:
handle DISTINCT FROM and NOT DISTINCT FROM in zone-map pushdown by <a
href="https://github.com/Mytherin"><code>@​Mytherin</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16691">duckdb/duckdb#16691</a></li>
<li>Fix <a
href="https://redirect.github.com/duckdb/duckdb/issues/16554">#16554</a>:
emit blobs as part of .dump by <a
href="https://github.com/Mytherin"><code>@​Mytherin</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16693">duckdb/duckdb#16693</a></li>
<li>add avro by <a
href="https://github.com/samansmink"><code>@​samansmink</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16708">duckdb/duckdb#16708</a></li>
<li>Issue <a
href="https://redirect.github.com/duckdb/duckdb/issues/16652">#16652</a>:
Implicit Ordered Aggregation by <a
href="https://github.com/hawkfish"><code>@​hawkfish</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16718">duckdb/duckdb#16718</a></li>
<li>Issue <a
href="https://redirect.github.com/duckdb/duckdb/issues/16649">#16649</a>:
SelectNth Remainders by <a
href="https://github.com/hawkfish"><code>@​hawkfish</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16705">duckdb/duckdb#16705</a></li>
<li>[Dev] Fix <code>EXPORT DATABASE</code> missing semicolons in
produced <code>schema.sql</code> by <a
href="https://github.com/Tishj"><code>@​Tishj</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16723">duckdb/duckdb#16723</a></li>
<li>Always throw the error that happens the earliest in the CSV Reader
by <a href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16728">duckdb/duckdb#16728</a></li>
<li>Fix <a
href="https://redirect.github.com/duckdb/duckdb/issues/16662">#16662</a>
by <a href="https://github.com/lnkuiper"><code>@​lnkuiper</code></a> in
<a
href="https://redirect.github.com/duckdb/duckdb/pull/16732">duckdb/duckdb#16732</a></li>
<li>[Test] Add missing test for eager-constraint checking fix by <a
href="https://github.com/taniabogatsch"><code>@​taniabogatsch</code></a>
in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16745">duckdb/duckdb#16745</a></li>
<li>Clamp reported memory in duckdb_memory by <a
href="https://github.com/Mytherin"><code>@​Mytherin</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16753">duckdb/duckdb#16753</a></li>
<li>CLI <code>-help</code>: Fix default value for -nullvalue by <a
href="https://github.com/carlopi"><code>@​carlopi</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16755">duckdb/duckdb#16755</a></li>
<li>Apply RLE fix to v1.2 by <a
href="https://github.com/Mytherin"><code>@​Mytherin</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16756">duckdb/duckdb#16756</a></li>
<li>[Arrow] Setting schema of the keys to not null for maps and properly
write Null columns to arrow by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16711">duckdb/duckdb#16711</a></li>
<li>Fix min/max values in numeric cast exception message by <a
href="https://github.com/abramk"><code>@​abramk</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16777">duckdb/duckdb#16777</a></li>
<li>[ADBC] Add wrapper to connection new, set options at connection
init, if any. by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16748">duckdb/duckdb#16748</a></li>
<li>Remove delta from extensions built on a nightly basis (vs
1.2-histrionicus) by <a
href="https://github.com/carlopi"><code>@​carlopi</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16794">duckdb/duckdb#16794</a></li>
<li><code>PhysicalTopN</code>: Buffer-allocated <code>StringHeap</code>
by <a href="https://github.com/lnkuiper"><code>@​lnkuiper</code></a> in
<a
href="https://redirect.github.com/duckdb/duckdb/pull/16770">duckdb/duckdb#16770</a></li>
<li>[chore] Add v1.2.2 to storage versions, preparation for upcoming
patch release by <a
href="https://github.com/carlopi"><code>@​carlopi</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16799">duckdb/duckdb#16799</a></li>
<li>Fix issue when line is empty by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16823">duckdb/duckdb#16823</a></li>
<li>Add extra check in normalize for commit, rollback and abort by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16802">duckdb/duckdb#16802</a></li>
<li>Fix <a
href="https://redirect.github.com/duckdb/duckdb/issues/16783">#16783</a>:
Fix DistributivityRule by <a
href="https://github.com/flashmouse"><code>@​flashmouse</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16804">duckdb/duckdb#16804</a></li>
<li>Internal <a
href="https://redirect.github.com/duckdb/duckdb/issues/4492">#4492</a>:
Ignore Nulls Orrification by <a
href="https://github.com/hawkfish"><code>@​hawkfish</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16837">duckdb/duckdb#16837</a></li>
<li>Fix <a
href="https://redirect.github.com/duckdb/duckdb/issues/16836">#16836</a>:
rewrite main column data in case of an update that only modifies the
validity by <a
href="https://github.com/Mytherin"><code>@​Mytherin</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16851">duckdb/duckdb#16851</a></li>
<li>Initialize type by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16848">duckdb/duckdb#16848</a></li>
<li>[CSV Reader] Add check on ever quoted for candidate selection by <a
href="https://github.com/pdet"><code>@​pdet</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16868">duckdb/duckdb#16868</a></li>
<li>Clean up partial deletes when encountering a transaction conflict in
a vector by <a
href="https://github.com/Mytherin"><code>@​Mytherin</code></a> in <a
href="https://redirect.github.com/duckdb/duckdb/pull/16905">duckdb/duckdb#16905</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/duckdb/duckdb/commit/d3970ae863320b456e21de93a692dedc02824565"><code>d3970ae</code></a>
add special handling for comparison against nans to ensure compatibility
with...</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/91ac9db4366d8911038d639ec5e9d89a54c1382c"><code>91ac9db</code></a>
Fix default value for -nullvalue</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/5f6fc8f3be9ed223404a10c5a0f7afbdd06b4b48"><code>5f6fc8f</code></a>
Fix <a
href="https://redirect.github.com/duckdb/duckdb/issues/16554">#16554</a>:
fix blobs as part of .dump</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/01f6a909ba6fe771629180196aed39570883d598"><code>01f6a90</code></a>
looks nicer</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/e6da38a97be8d441e984b19629f128269e6eebac"><code>e6da38a</code></a>
slight cleanup</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/2ba742302a0e23d7ac9437c159b6a095fc94efe8"><code>2ba7423</code></a>
add the adjusted version of the submitted reproduction as a test</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/20445e790a3d1066d01d8f03fd2216050b78ccec"><code>20445e7</code></a>
py::none() is produced for an OPTIONAL_FILTER, skip these inside OR /
AND fil...</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/d9673ba133b6b97cb81ea34dfe66420ae995e7e7"><code>d9673ba</code></a>
[Python Dev] No longer trigger a DeprecationWarning when using a UDF (<a
href="https://redirect.github.com/duckdb/duckdb/issues/16433">#16433</a>)</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/3d1a1a9c2a737f24ff0b81d3acc7e1ced075c46c"><code>3d1a1a9</code></a>
[Python] Fix some stubs issues (<a
href="https://redirect.github.com/duckdb/duckdb/issues/16523">#16523</a>)</li>
<li><a
href="https://github.com/duckdb/duckdb/commit/e5eca77cb7f99d74ced18c0f33e51533538b67cd"><code>e5eca77</code></a>
fix stubs for LambdaExpression</li>
<li>Additional commits viewable in <a
href="https://github.com/duckdb/duckdb/compare/v1.2.1...v1.2.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=duckdb&package-manager=pip&previous-version=1.2.1&new-version=1.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change
apache#1906

# Are these changes tested?
Yes, unit tested

# Are there any user-facing changes?
Not yet

<!-- In the case of user-facing changes, please add the changelog label.
-->
Bumps [typing-extensions](https://github.com/python/typing_extensions)
from 4.13.1 to 4.13.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/python/typing_extensions/releases">typing-extensions's
releases</a>.</em></p>
<blockquote>
<h2>4.13.2</h2>
<ul>
<li>Fix <code>TypeError</code> when taking the union of
<code>typing_extensions.TypeAliasType</code> and a
<code>typing.TypeAliasType</code> on Python 3.12 and 3.13.
Patch by <a href="https://github.com/jorenham">Joren
Hammudoglu</a>.</li>
<li>Backport from CPython PR <a
href="https://redirect.github.com/python/cpython/pull/132160">#132160</a>
to avoid having user arguments shadowed in generated
<code>__new__</code> by
<code>@typing_extensions.deprecated</code>.
Patch by <a href="https://github.com/Viicos">Victorien Plot</a>.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/python/typing_extensions/blob/main/CHANGELOG.md">typing-extensions's
changelog</a>.</em></p>
<blockquote>
<h1>Release 4.13.2 (April 10, 2025)</h1>
<ul>
<li>Fix <code>TypeError</code> when taking the union of
<code>typing_extensions.TypeAliasType</code> and a
<code>typing.TypeAliasType</code> on Python 3.12 and 3.13.
Patch by <a href="https://github.com/jorenham">Joren
Hammudoglu</a>.</li>
<li>Backport from CPython PR <a
href="https://redirect.github.com/python/cpython/pull/132160">#132160</a>
to avoid having user arguments shadowed in generated
<code>__new__</code> by
<code>@typing_extensions.deprecated</code>.
Patch by <a href="https://github.com/Viicos">Victorien Plot</a>.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/python/typing_extensions/commit/4525e9dbbd177b4ef8a84f55ff5fe127582a071d"><code>4525e9d</code></a>
Prepare release 4.13.2 (<a
href="https://redirect.github.com/python/typing_extensions/issues/583">#583</a>)</li>
<li><a
href="https://github.com/python/typing_extensions/commit/88a0c200ceb0ccfe4329d3db8a1a863a2381e44c"><code>88a0c20</code></a>
Do not shadow user arguments in generated <code>__new__</code> by
<code>@deprecated</code> (<a
href="https://redirect.github.com/python/typing_extensions/issues/581">#581</a>)</li>
<li><a
href="https://github.com/python/typing_extensions/commit/281d7b0ca6edad384e641d1066b759c280602919"><code>281d7b0</code></a>
Add 3rd party tests for litestar (<a
href="https://redirect.github.com/python/typing_extensions/issues/578">#578</a>)</li>
<li><a
href="https://github.com/python/typing_extensions/commit/8092c3996f4902ad9c74ac2d1d8dd19371ecbaa3"><code>8092c39</code></a>
fix <code>TypeAliasType</code> union with
<code>typing.TypeAliasType</code> (<a
href="https://redirect.github.com/python/typing_extensions/issues/575">#575</a>)</li>
<li>See full diff in <a
href="https://github.com/python/typing_extensions/compare/4.13.1...4.13.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=typing-extensions&package-manager=pip&previous-version=4.13.1&new-version=4.13.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Rationale for this change

Intermediate fix while the test is broken, reported in
duckdb/duckdb-iceberg#185

# Are these changes tested?

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change

Found out I broke this myself after doing a `git bisect`:

```
36d383d is the first bad commit
commit 36d383d
Author: Fokko Driesprong <fokko@apache.org>
Date:   Thu Jan 23 07:50:54 2025 +0100

    PyArrow: Avoid buffer-overflow by avoid doing a sort (apache#1555)
    
    Second attempt of apache#1539
    
    This was already being discussed back here:
    apache#208 (comment)
    
    This PR changes from doing a sort, and then a single pass over the table
    to the approach where we determine the unique partition tuples filter on
    them individually.
    
    Fixes apache#1491
    
    Because the sort caused buffers to be joined where it would overflow in
    Arrow. I think this is an issue on the Arrow side, and it should
    automatically break up into smaller buffers. The `combine_chunks` method
    does this correctly.
    
    Now:
    
    ```
    0.42877754200890195
    Run 1 took: 0.2507691659993725
    Run 2 took: 0.24833179199777078
    Run 3 took: 0.24401691700040828
    Run 4 took: 0.2419595829996979
    Average runtime of 0.28 seconds
    ```
    
    Before:
    
    ```
    Run 0 took: 1.0768639159941813
    Run 1 took: 0.8784021250030492
    Run 2 took: 0.8486490420036716
    Run 3 took: 0.8614017910003895
    Run 4 took: 0.8497851670108503
    Average runtime of 0.9 seconds
    ```
    
    So it comes with a nice speedup as well :)
    
    ---------
    
    Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>

 pyiceberg/io/pyarrow.py                    |  129 ++-
 pyiceberg/partitioning.py                  |   39 +-
 pyiceberg/table/__init__.py                |    6 +-
 pyproject.toml                             |    1 +
 tests/benchmark/test_benchmark.py          |   72 ++
 tests/integration/test_partitioning_key.py | 1299 ++++++++++++++--------------
 tests/table/test_locations.py              |    2 +-
 7 files changed, 805 insertions(+), 743 deletions(-)
 create mode 100644 tests/benchmark/test_benchmark.py
```

Closes apache#1917


# Are these changes tested?

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
Closes apache#1744

`TSaslClientTransport` cannot be reopened. This PR changes the behavior to
recreate a `TSaslClientTransport` when it's already closed.

Note: `_HiveClient` should be used as a context manager, but it can also be
used without one.
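Here is a minimal sketch of the "recreate instead of reopen" pattern. It assumes a transport that exposes `open()`, `close()` and `isOpen()` (the latter mirrors the camelCase naming Thrift transports use) and that fails if reopened after `close()`. `FakeTransport` and `ReconnectingClient` are hypothetical stand-ins. They are not PyIceberg's `_HiveClient` or the real Thrift classes.

```python
from typing import Callable, Optional


class FakeTransport:
    """Hypothetical transport that, like TSaslClientTransport, cannot be reopened."""

    def __init__(self) -> None:
        self._open = False
        self._closed_once = False

    def open(self) -> None:
        if self._closed_once:
            raise RuntimeError("transport cannot be reopened")
        self._open = True

    def close(self) -> None:
        self._open = False
        self._closed_once = True

    def isOpen(self) -> bool:
        return self._open


class ReconnectingClient:
    """Builds a fresh transport whenever the previous one is no longer open."""

    def __init__(self, transport_factory: Callable[[], FakeTransport]) -> None:
        self._factory = transport_factory
        self._transport: Optional[FakeTransport] = None

    def __enter__(self) -> "ReconnectingClient":
        # A closed transport cannot be reopened, so replace it with a new one.
        if self._transport is None or not self._transport.isOpen():
            self._transport = self._factory()
            self._transport.open()
        return self

    def __exit__(self, *exc: object) -> None:
        if self._transport is not None:
            self._transport.close()
```

The `with` form keeps `open()` and `close()` paired. A caller that skips the context manager would have to perform the same "closed, so rebuild" check itself.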
# Rationale for this change

@kevinjqliu PTAL. I took the liberty of providing a fix for this since I
was curious where this was coming from, hope you don't mind! I've
cherry-picked your commit with the test.


![image](https://github.com/user-attachments/assets/14227da9-1f4a-4411-88f0-309907d3d332)


Java produces:

```json
{
    "added-data-files": "1",
    "added-files-size": "707",
    "added-records": "2",
    "app-id": "local-1743678304626",
    "changed-partition-count": "1",
    "deleted-data-files": "1",
    "deleted-records": "3",
    "engine-name": "spark",
    "engine-version": "3.5.5",
    "iceberg-version": "Apache Iceberg 1.8.1 (commit 9ce0fcf0af7becf25ad9fc996c3bad2afdcfd33d)",
    "removed-files-size": "693",
    "spark.app.id": "local-1743678304626",
    "total-data-files": "3",
    "total-delete-files": "0",
    "total-equality-deletes": "0",
    "total-files-size": "1993",
    "total-position-deletes": "0",
    "total-records": "4"
}
```

# Are these changes tested?

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->

---------

Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
apache#1902)

<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
Closes apache#1884 

# Rationale for this change
`table.inspect.entries()` fails when the table is a merge-on-read (MOR) table
that has delete files. The MOR table is created with Apache Spark 3.5.0
and Iceberg 1.5.0, and it is read with PyIceberg 0.9.0 using
`StaticTable.from_metadata()`.


# Are these changes tested?
Yes

# Are there any user-facing changes?
No

<!-- In the case of user-facing changes, please add the changelog label.
-->
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change
The issue is fixed upstream in
duckdb/duckdb-iceberg#185.

This reverts commit eb8756a. (apache#1918)

# Are these changes tested?

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
This change allows making use of the `version-hint.text` file when a
static table is instantiated with a `metadata_location` that does not end with
`.metadata.json`.
Users can point to the table location, and the metadata file path will
be read from `version-hint.text`.
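The following is a minimal sketch of how a table location could be resolved to a metadata file. It assumes the Hadoop-table layout, where `<table>/metadata/version-hint.text` holds a version number `N` and the metadata file is `<table>/metadata/vN.metadata.json`. `resolve_metadata_location` is a hypothetical helper that reads from the local filesystem, not through a PyIceberg `FileIO`, and it is not the code in this PR.

```python
import os


def resolve_metadata_location(table_location: str) -> str:
    """Hypothetical helper: map a table location to a metadata file path.

    Assumes the Hadoop-table layout: <table>/metadata/version-hint.text holds
    a version number N, and the file is <table>/metadata/vN.metadata.json.
    """
    if table_location.endswith(".metadata.json"):
        # Already points at a concrete metadata file; use it unchanged.
        return table_location
    metadata_dir = os.path.join(table_location, "metadata")
    with open(os.path.join(metadata_dir, "version-hint.text"), encoding="utf-8") as f:
        hint = f.read().strip()
    if hint.endswith(".metadata.json"):
        # Assumption: a writer may store a file name instead of a bare number.
        return os.path.join(metadata_dir, hint)
    return os.path.join(metadata_dir, f"v{hint}.metadata.json")
```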


Closes apache#763

# Rationale for this change

`version-hint.text` is useful in contexts where you do not want or need
a full-fledged catalog.
Our use case is sharing datasets publicly as Iceberg tables on S3.

# Are these changes tested?

Not yet.

# Are there any user-facing changes?

Yes. Users can now point `StaticTable` to the table location rather than
a specific version file.
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change

This fix addresses apache#1922 and changes both `_oss_fs` and
`_s3_fs` so that they parse `s3.force-virtual-addressing`
correctly.
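Below is a sketch of strict boolean parsing for a string-valued property. It assumes the failure mode is the usual one, where any non-empty string, including `"false"`, is treated as truthy. That assumption is not confirmed by this commit message. `parse_bool_property` is a hypothetical helper, not PyIceberg's code.

```python
from typing import Mapping


def parse_bool_property(props: Mapping[str, str], key: str, default: bool = False) -> bool:
    """Hypothetical helper: parse a string property as a bool instead of using truthiness."""
    raw = props.get(key)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"Invalid boolean for {key!r}: {raw!r}")


props = {"s3.force-virtual-addressing": "false"}
# bool("false") is True; explicit parsing avoids that pitfall.
force_virtual = parse_bool_property(props, "s3.force-virtual-addressing")
```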

# Are these changes tested?

No unit tests have been added for this change yet.

# Are there any user-facing changes?

No

<!-- In the case of user-facing changes, please add the changelog label.
-->
Closes apache#1919

# Rationale for this change
This is a nice-to-have feature that lets users convert valid float and
double strings to Iceberg `FloatType` and `DoubleType` literals.

# Are these changes tested?
Yes. I have also added tests for this.

# Are there any user-facing changes?
Yes. Users can cast valid float and double strings to `FloatType` and
`DoubleType`.
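In the Iceberg spec, `float` is a 32-bit IEEE 754 value and `double` is a 64-bit one, so casting a string to `FloatType` means rounding to single precision. The sketch below shows that rounding with the standard library only. It is not PyIceberg's literal-conversion code, and the function names are hypothetical.

```python
import struct


def string_to_float32(text: str) -> float:
    """Hypothetical helper: parse a string and round it to IEEE 754 single precision."""
    value = float(text)  # raises ValueError for strings that are not valid floats
    # Packing as a 32-bit "f" rounds to the nearest single-precision value;
    # struct raises OverflowError for finite values outside the float32 range.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def string_to_float64(text: str) -> float:
    """Python floats are already IEEE 754 double precision."""
    return float(text)
```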
# Rationale for this change

Today, we keep a copy of the `TableMetadata` on both the `Table` and the
`Transaction`. This PR changes that logic to reuse the one on the
`Table` and apply the changes to the one on the `Transaction`.

This also allows us to stack changes, for example, to first change a
schema, and then write data with the new schema right away.
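The sketch below shows what stacking means in practice. The transaction starts from the table's immutable metadata and applies each update immediately, so a write sees the schema change made just before it. `Metadata`, `Table`, `Transaction`, `add_column` and `append_file` are hypothetical, heavily simplified stand-ins, not PyIceberg's classes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple

DataFile = Tuple[str, Tuple[str, ...]]  # (path, schema the file was written with)


@dataclass(frozen=True)
class Metadata:
    """Hypothetical, heavily simplified stand-in for TableMetadata."""

    schema: Tuple[str, ...]
    data_files: Tuple[DataFile, ...] = ()


class Table:
    def __init__(self, metadata: Metadata) -> None:
        self.metadata = metadata


Update = Callable[[Metadata], Metadata]


class Transaction:
    def __init__(self, table: Table) -> None:
        self._table = table
        # Start from the table's metadata; Metadata is immutable, so each
        # update produces a new object and the table is left untouched.
        self.metadata = table.metadata

    def apply(self, update: Update) -> "Transaction":
        # Apply right away so later updates see the effect of earlier ones.
        self.metadata = update(self.metadata)
        return self

    def commit(self) -> None:
        self._table.metadata = self.metadata


def add_column(name: str) -> Update:
    return lambda m: replace(m, schema=m.schema + (name,))


def append_file(path: str) -> Update:
    # Records the schema the transaction currently sees, i.e. the evolved one.
    return lambda m: replace(m, data_files=m.data_files + ((path, m.schema),))
```

Until `commit()`, the table still exposes the old metadata, and only the transaction sees the stacked changes.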

Also a prerequisite for
apache#1772

# Are these changes tested?

Includes a new test :)

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
Closes apache#1882

# Rationale for this change
Feature request: the ability to pass the transform name as a string to
`add_field`.

# Are these changes tested?
Yes

# Are there any user-facing changes?
Yes. Users will be able to pass transform names as strings when calling
the `add_field` method of `update_spec()`.
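The Iceberg spec writes transforms as strings such as `identity`, `bucket[16]`, `truncate[4]`, `year`, `month`, `day`, `hour` and `void`. The following is a minimal sketch of turning such a string into a structured value. `parse_transform_name` and `ParsedTransform` are hypothetical and are not PyIceberg's own parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Parameterized transforms carry an integer argument in square brackets.
_PARAMETERIZED = re.compile(r"^(bucket|truncate)\[(\d+)\]$")
_SIMPLE = {"identity", "year", "month", "day", "hour", "void"}


@dataclass(frozen=True)
class ParsedTransform:
    name: str
    param: Optional[int] = None


def parse_transform_name(text: str) -> ParsedTransform:
    """Hypothetical parser for transform names as written in the Iceberg spec."""
    value = text.strip().lower()
    if value in _SIMPLE:
        return ParsedTransform(value)
    match = _PARAMETERIZED.match(value)
    if match:
        return ParsedTransform(match.group(1), int(match.group(2)))
    raise ValueError(f"Unknown transform: {text!r}")
```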
Closes apache#1910 

# Rationale for this change

When working with the `GlueCatalog`, I may already have a Glue client that
I've instantiated elsewhere and wish to keep. This allows
passing that client to the `GlueCatalog` constructor so that we aren't
forced to create a new one.

This is slightly interesting because it's now the only catalog with
a different constructor signature. It may also be surprising that when users
pass a client, none of their properties (which may include retry
configuration) are applied.

An alternative to consider is a `from_client` or `with_client`
staticmethod, but I did not see a precedent elsewhere. I will leave it to
the maintainers to decide which they prefer and will update accordingly.
Similarly, I can do the same for DynamoDB 🙂

I've also skipped the event_handler for a user-provided client because I
wouldn't want to interfere with their existing events; also, the param is
optional. Something to consider is using the [unique_id
arg](https://github.com/boto/botocore/blob/aaa6690e45c8dabcde3a8d2d1aa34b5fd399fba7/botocore/hooks.py#L89)
when registering an event.

> If a ``unique_id`` is given, the handler will not be registered
> if a handler with the ``unique_id`` has already been registered.
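The sketch below shows the "use the caller's client, otherwise build one" constructor. `SketchCatalog`, `client` and `client_factory` are hypothetical names, and the real `GlueCatalog` signature may differ. The injected branch mirrors the trade-offs described above: properties are not applied to it and no event handlers are registered.

```python
from typing import Any, Callable, Dict, Optional


class SketchCatalog:
    """Hypothetical catalog that accepts an existing client or builds one."""

    def __init__(
        self,
        name: str,
        client: Optional[Any] = None,
        client_factory: Optional[Callable[[Dict[str, str]], Any]] = None,
        **properties: str,
    ) -> None:
        self.name = name
        self.properties = properties
        if client is not None:
            # Reuse the caller's client unchanged: properties such as retry
            # settings are NOT applied, and no event handlers are registered.
            self.client = client
            self.owns_client = False
        else:
            if client_factory is None:
                raise ValueError("either client or client_factory is required")
            # Build a new client from the catalog properties.
            self.client = client_factory(properties)
            self.owns_client = True
```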


# Are these changes tested?

Basic unit test to assert the client passed is the client used.

# Are there any user-facing changes?

I believe so, since this is an addition to the public API.
This aligns the implementation with Java.

We had the keywords there mostly for the tests, but they should not be
used, and it seems like that's already the case :'( I was undecided whether
the cost of this PR (all the changes) is worth it, but I see more PRs
using the `Record` in a bad way (for example,
apache#1743) that might lead to
very subtle bugs where the position might sometimes change based on the
ordering of the dict.
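Here is a small illustration of the dict-ordering pitfall with a purely positional record. If a record is built from a dict's values, the positions follow insertion order rather than the schema. `PositionalRecord` and both helper functions are hypothetical. They are not PyIceberg's `Record`.

```python
from typing import Any, Dict, Tuple


class PositionalRecord:
    """Hypothetical record that stores values by position only, as in Java."""

    def __init__(self, *values: Any) -> None:
        self._data = list(values)

    def __getitem__(self, pos: int) -> Any:
        return self._data[pos]

    def __len__(self) -> int:
        return len(self._data)


def record_from_dict_naively(values: Dict[str, Any]) -> PositionalRecord:
    # Pitfall: positions follow dict insertion order, not the schema.
    return PositionalRecord(*values.values())


def record_from_dict(values: Dict[str, Any], field_order: Tuple[str, ...]) -> PositionalRecord:
    # Safer: positions come from the schema's field order.
    return PositionalRecord(*(values[name] for name in field_order))
```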

Blocked by Eventual-Inc/Daft#3917
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

Closes apache#1744 (second try) 

# Rationale for this change
First try (apache#1747) did not fully resolve the issue. See
apache#1747 (review)

# Are these changes tested?
yes

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->

---------

Co-authored-by: mnzpk <84433140+mnzpk@users.noreply.github.com>
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change
The [`poetry.lock`
file](https://github.com/apache/iceberg-python/blob/59742e01e8d190aa759e0154b03300bd81a28d02/poetry.lock#L1)
is generated with Poetry 2.1.1.
I suspect this is from Dependabot. I couldn't find a way to control the
version of Poetry that Dependabot uses, so let's just match the version it's
using.
This way, updating the lock file locally won't create a bunch of unrelated
changes.


# Are these changes tested?

# Are there any user-facing changes?

Fokko and others added 24 commits August 13, 2025 11:42

# Rationale for this change

# Are these changes tested?

# Are there any user-facing changes?

# Rationale for this change

# Are these changes tested?

# Are there any user-facing changes?

---------

Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
# Rationale for this change

Missed this in another PR

# Are these changes tested?

# Are there any user-facing changes?

Closes apache#1853 

This adds a new `__repr__` function that ensures that `initial-default` and
`write-default` are omitted when they are `None`. Unfortunately, Pydantic
doesn't provide this functionality out of the box.
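A minimal sketch of the idea, using a plain dataclass instead of Pydantic: the custom `__repr__` only mentions the two defaults when they are set. `ToyField` is hypothetical and is not pyiceberg's `NestedField`; `repr=False` stops the dataclass decorator from generating its own `__repr__`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(repr=False)
class ToyField:
    """Hypothetical stand-in for a schema field (not pyiceberg's NestedField)."""

    field_id: int
    name: str
    required: bool = False
    initial_default: Optional[Any] = None
    write_default: Optional[Any] = None

    def __repr__(self) -> str:
        parts = [
            f"field_id={self.field_id!r}",
            f"name={self.name!r}",
            f"required={self.required!r}",
        ]
        # Only mention the defaults when they are actually set.
        if self.initial_default is not None:
            parts.append(f"initial_default={self.initial_default!r}")
        if self.write_default is not None:
            parts.append(f"write_default={self.write_default!r}")
        return f"ToyField({', '.join(parts)})"
```

For example, `repr(ToyField(1, "id"))` gives `ToyField(field_id=1, name='id', required=False)`, with no trailing `None` defaults.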

# Rationale for this change
Changes to `__repr__` may be breaking.

# Are these changes tested?
Tests included.

# Are there any user-facing changes?

Closes apache#2270 
Related to apache#1045 

# Rationale for this change
This allows us to read nanosecond-precision timestamps from PyArrow. Right now,
we always downcast to microseconds or throw an error. By passing through
the format-version, we can keep nanosecond precision *just for v3
tables*.
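A minimal sketch of letting the format version decide the timestamp unit. `to_table_precision` is a hypothetical helper, not pyiceberg's code; it assumes only that v3 tables can store nanosecond timestamps while v1/v2 tables store microseconds. This sketch always truncates for older versions, whereas raising an error is the other behavior the description mentions.

```python
from __future__ import annotations


def to_table_precision(epoch_ns: int, format_version: int) -> tuple[int, str]:
    """Hypothetical helper: pick the timestamp unit a table can store.

    v3 tables can keep nanoseconds; older format versions only store
    microseconds, so sub-microsecond digits are dropped.
    """
    if format_version >= 3:
        return epoch_ns, "ns"
    # Floor division by 1_000 converts nanoseconds to microseconds.
    return epoch_ns // 1_000, "us"
```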

# Are these changes tested?
Included a test. I can't add a test involving writes since we don't
support writing v3 tables yet (there's a PR out for that).

# Are there any user-facing changes?

---------

Co-authored-by: Fokko Driesprong <fokko@apache.org>

# Rationale for this change
Similar to apache#2299
This PR adds the rest of the parameters to
[`pyarrow.fs.AzureFileSystem`](https://arrow.apache.org/docs/python/generated/pyarrow.fs.AzureFileSystem.html)

Note that the [Azure Data Lake configuration
page](https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md#azure-data-lake)
already documents these three parameters.

# Are these changes tested?

# Are there any user-facing changes?

…he#2143)

<!-- Closes apache#2150 -->

# Rationale for this change

- Consolidates snapshot expiration functionality from the standalone
`ExpireSnapshots` class into the `MaintenanceTable` class for a unified
maintenance API.
- Resolves planned work left over from apache#1880, and closes
apache#2142
- Achieves feature and API parity with the Java implementation for
snapshot retention and table maintenance.

# Features & Enhancements

- Introduces `table.maintenance.expire_snapshots()` as the unified entry
point for snapshot expiration and future maintenance operations.
- Retains the existing `ExpireSnapshots` implementation internally. The
`expire_snapshots()` method on `MaintenanceTable` now returns an
`ExpireSnapshots` object, preserving transaction semantics and
supporting context manager usage:
  ```python
  with table.maintenance.expire_snapshots() as expire_snapshots:
      expire_snapshots.by_id(1)
      expire_snapshots.by_id(2)
  ```
- Focuses this PR on refactoring and documentation improvements, while
maintaining compatibility with the prior `ExpireSnapshots` interface.
- Sets a foundation for future expansion of the `MaintenanceTable`
abstraction to encapsulate additional maintenance operations.
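The context-manager behavior described above can be sketched with toy classes. `ToyTable`, `ToyMaintenance`, and `ToyExpireSnapshots` are hypothetical and only mirror the method names shown in the PR (`maintenance`, `expire_snapshots()`, `by_id()`); they are not pyiceberg's implementation. Raising `ValueError` to protect branch/tag snapshots is an assumption made for illustration.

```python
from __future__ import annotations


class ToyTable:
    """Hypothetical in-memory table: snapshot ids plus branch/tag refs."""

    def __init__(self, snapshot_ids: list[int], refs: dict[str, int]) -> None:
        self.snapshot_ids = list(snapshot_ids)
        self.refs = dict(refs)  # branch/tag name -> snapshot id

    @property
    def maintenance(self) -> ToyMaintenance:
        # Single entry point for maintenance operations.
        return ToyMaintenance(self)


class ToyMaintenance:
    def __init__(self, table: ToyTable) -> None:
        self._table = table

    def expire_snapshots(self) -> ToyExpireSnapshots:
        return ToyExpireSnapshots(self._table)


class ToyExpireSnapshots:
    """Collects snapshot ids and applies them all at once on commit."""

    def __init__(self, table: ToyTable) -> None:
        self._table = table
        self._to_expire: set[int] = set()

    def by_id(self, snapshot_id: int) -> ToyExpireSnapshots:
        # Refuse to expire a snapshot that a branch or tag still points to.
        if snapshot_id in self._table.refs.values():
            raise ValueError(f"snapshot {snapshot_id} is referenced by a branch or tag")
        self._to_expire.add(snapshot_id)
        return self

    def commit(self) -> None:
        self._table.snapshot_ids = [
            s for s in self._table.snapshot_ids if s not in self._to_expire
        ]

    def __enter__(self) -> ToyExpireSnapshots:
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Commit only if the block finished without an exception.
        if exc_type is None:
            self.commit()


table = ToyTable(snapshot_ids=[1, 2, 3], refs={"main": 3})
with table.maintenance.expire_snapshots() as expire_snapshots:
    expire_snapshots.by_id(1)
    expire_snapshots.by_id(2)
```

Deferring the change until `__exit__` means a failure partway through the block leaves the table untouched, which is one way to keep the all-or-nothing transaction semantics the PR describes.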


# Bug Fixes & Cleanups

- **ManageSnapshots Cleanup** (apache#2151)
  - Removes an unrelated instance variable from the `ManageSnapshots` class, aligning with the Java reference implementation.

# Testing & Documentation

- **Testing:**
  - Tested the new API interface including:
    - Expiration by ID 
    - Protection of branch/tag snapshots
- **Documentation:**
  - Added and updated documentation to describe:
    - API usage examples

Preview:
<img width="1686" height="1015" alt="Screenshot 2025-08-11 at 1 37
04 PM"
src="https://github.com/user-attachments/assets/f469f3fc-b4b1-4ec9-b1ca-b9185e22643e"
/>


# Are these changes tested?

Yes. All changes are tested. This work builds on the framework introduced by
@jayceslesar in apache#1200 for the `MaintenanceTable`.

# Are there any user-facing changes?

---

**Closes:**  
- Closes apache#2151
- Closes apache#2142

---------

Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
# Rationale for this change

I was looking into this and took the liberty of changing the API to accept a
`datetime` rather than milliseconds, so that no one accidentally passes in
seconds or microseconds.
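A minimal sketch of why a `datetime` parameter removes the unit ambiguity: the caller never picks a unit, and the conversion to epoch milliseconds happens once, inside the library. `datetime_to_millis` is a hypothetical helper, not necessarily how this PR converts the value, and rejecting naive datetimes is an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_millis(ts: datetime) -> int:
    """Hypothetical helper: convert a timezone-aware datetime to epoch milliseconds."""
    if ts.tzinfo is None:
        # Refuse naive datetimes instead of guessing a timezone.
        raise ValueError("expected a timezone-aware datetime")
    # Floor-dividing two timedeltas yields an int and avoids float rounding.
    return (ts - EPOCH) // timedelta(milliseconds=1)
```

A caller would then write something like `datetime_to_millis(datetime.now(timezone.utc) - timedelta(days=7))` and never handle raw millisecond values.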


# Are these changes tested?

# Are there any user-facing changes?

@gabeiglio gabeiglio merged commit 8711533 into main Aug 13, 2025
11 checks passed
@gabeiglio gabeiglio mentioned this pull request Aug 13, 2025