Conversation

@ding-young
Owner

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

brayanjuls and others added 30 commits May 21, 2025 17:28
…pache#16019)

* draft commit to roll back changes on function naming and include prepare clause in the infer types tests

* include data types in plan when it is not included in the prepare statement

* fix: prepare statement error

* Update datafusion/sql/src/statement.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* remove infer types from prepare statement

the infer data type changes in statement will be introduced in a new PR

* fix to show correct output message

* include data types on logical plans of prepare statements without explicit type declaration

* fix using clippy suggestions

* explicitly get the data types using the placeholder id to avoid sorting

* Restore the original tests too

* update set data type routine to be more rust idiomatic

Co-authored-by: Tommy shu <qstommyshu@gmail.com>

* update set datatype routine

* fix formatting in sql_integration

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Tommy shu <qstommyshu@gmail.com>
…ache#16119)

* minor fixes to arch docs


Co-authored-by: Oleks V <comphead@users.noreply.github.com>

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
add snapshot tests for memory exhaustion
…s & nested Column expressions in maybe_fix_physical_column_name (apache#16064)

* Fix union schema name coercion

* Address renaming for columns that are not in the top level as well

* Add unit test

* Format

* Use insta tests properly

* Address review - comment + minor simplification change

---------

Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
…6071)

* initial Iteration

* add Sql Logic tests

* tweak comments

* unify data, structure tests

* Deleted by mistake
* Move prepare/parameter handling tests into `params.rs`

* Resolve conflicts
…pache#16029)

* Support filtering specific sqllogictests identified by line number

* Add license header

* Try parsing in different dialects

* Add test filtering example to README.md

* Improve Filter doc comment

* Factor out statement_is_skippable into its own function

* Add example about how filters work in the doc comments
…hausted errors (apache#16152)

* Enrich GroupedHashAggregateStream name to ease debugging Resources exhausted errors

* Use human_display

* clippy
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.16.0 to 1.17.0.
- [Release notes](https://github.com/uuid-rs/uuid/releases)
- [Commits](uuid-rs/uuid@v1.16.0...v1.17.0)

---
updated-dependencies:
- dependency-name: uuid
  dependency-version: 1.17.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Both WHERE clause and HAVING clause translate to a Filter plan node.
They differ in how the references and aggregates are handled.
HAVING goes after aggregation and may reference aggregate expressions
and therefore HAVING's filter will be placed after Aggregation plan
node.

Once a plan has been built, however, filters created from HAVING carry no
special additional semantics. Remove the unnecessary field.

For reference, the field was added along with usage in
a50aeef commit and the usage was later
removed in eb62e28 commit.
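
To illustrate the point about WHERE and HAVING, here is a minimal sketch of a toy plan builder. It is not DataFusion code: `Plan`, `build_plan`, and all field names are hypothetical, and predicates are kept as plain strings. It only shows that both clauses become the same kind of `Filter` node and differ in where the node sits relative to the aggregation.

```rust
/// Toy logical plan (hypothetical; not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Aggregate { group_by: Vec<String>, aggregates: Vec<String>, input: Box<Plan> },
}

/// Builds Scan -> [Filter from WHERE] -> Aggregate -> [Filter from HAVING].
pub fn build_plan(
    table: &str,
    where_clause: Option<&str>,
    group_by: &[&str],
    aggregates: &[&str],
    having: Option<&str>,
) -> Plan {
    let mut plan = Plan::Scan { table: table.to_string() };

    // WHERE is applied to the input rows, so its Filter goes below the Aggregate.
    if let Some(predicate) = where_clause {
        plan = Plan::Filter { predicate: predicate.to_string(), input: Box::new(plan) };
    }

    plan = Plan::Aggregate {
        group_by: group_by.iter().map(|s| s.to_string()).collect(),
        aggregates: aggregates.iter().map(|s| s.to_string()).collect(),
        input: Box::new(plan),
    };

    // HAVING may reference aggregate results, so its Filter goes above the Aggregate.
    // The node itself is an ordinary Filter with no extra marker field.
    if let Some(predicate) = having {
        plan = Plan::Filter { predicate: predicate.to_string(), input: Box::new(plan) };
    }

    plan
}
```

In this sketch nothing on the finished `Filter` node records which clause it came from, which matches the commit's claim that the extra field is unnecessary once the plan is built.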
)

* Clarify docs and names in parquet predicate pushdown tests

* Update datafusion/datasource/src/file_scan_config.rs

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>

* clippy

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
…16175)

* Fix name() for FilterPushdown physical optimizer rule

Typo that wasn't caught during review...

* fix
fix according to review

fix to_string error

fix test by stripping backtrace
…che#16138)

Added `tables: HashMap<String, Arc<dyn TableSource>>` and `MyContextProvider::with_schema` method for dynamically defining tables for optimizer integration tests.
* Speedup tpch run with memtable

* Clippy

* Clippy
* Specialize unique join

* handle splitting

* rename a bit

* fix

* fix

* fix

* fix

* Fix the test, add explanation

* Simplify

* Update datafusion/physical-plan/src/joins/join_hash_map.rs

Co-authored-by: Christian <9384305+ctsk@users.noreply.github.com>

* Update datafusion/physical-plan/src/joins/join_hash_map.rs

Co-authored-by: Christian <9384305+ctsk@users.noreply.github.com>

* Simplify

* Simplify

* Simplify

---------

Co-authored-by: Christian <9384305+ctsk@users.noreply.github.com>
…e#16079)

* added test

* added parameterTest

* cargo fmt

* Update sql_integration.rs

* allow needless_lifetimes

* remove needless lifetime

* update some tests

* move to params.rs
* feat: array_length for fixed size list

* remove list view
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.45.0 to 1.45.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.45.0...tokio-1.45.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.45.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#16127)

* Add failing test to demonstrate problem

* Improve `unproject_sort_expr` to handle arbitrary expressions (apache#83)

* Remove redundant return
Bumps [rustyline](https://github.com/kkawakam/rustyline) from 15.0.0 to 16.0.0.
- [Release notes](https://github.com/kkawakam/rustyline/releases)
- [Changelog](https://github.com/kkawakam/rustyline/blob/master/History.md)
- [Commits](kkawakam/rustyline@v15.0.0...v16.0.0)

---
updated-dependencies:
- dependency-name: rustyline
  dependency-version: 16.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add macro for creating DataFrame (apache#16090)


---------

Co-authored-by: Sergey Zhukov <szhukov@aligntech.com>
* migrate `logical_plan` tests to insta

* fix assert error

* fix according to review

* strip backtrace from internal error

* format

* format

* fix `format("outer_query")`

* fix `Internal` error
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.38 to 4.5.39.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](clap-rs/clap@clap_complete-v4.5.38...clap_complete-v4.5.39)

---
updated-dependencies:
- dependency-name: clap
  dependency-version: 4.5.39
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
crepererum and others added 27 commits July 1, 2025 12:38
* Add support for Arrow Time types in Substrait

This commit adds support for Arrow Time types Time32 and Time64 in
Substrait plans.

Resolves apache#16296
Resolves apache#16275

* Clean up test
…16610)

* fix: support scalar function nested in get_field

* update

* update test

* fix bug

* update
Bumps [substrait](https://github.com/substrait-io/substrait-rs) from 0.57.0 to 0.58.0.
- [Release notes](https://github.com/substrait-io/substrait-rs/releases)
- [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md)
- [Commits](substrait-io/substrait-rs@v0.57.0...v0.58.0)

---
updated-dependencies:
- dependency-name: substrait
  dependency-version: 0.58.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Support explain tree format debug for benchmark debug

* fmt

* format

* Address comments

* doc fix
* Add microbenchmark for spilling with compression

* add wide batch

* make num_rows configurable

* calculate write/read throughput
…n scan (apache#16646)

* respect parquet filter pushdown config in scan

* Add test
Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/smithy-lang/smithy-rs/releases)
- [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/smithy-lang/smithy-rs/commits)

---
updated-dependencies:
- dependency-name: aws-config
  dependency-version: 1.8.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: replace snapshot tests for enforce_sorting

* feat: modify assert_optimized macro to test one snapshot with a combined physical plan

* feat: update assert_optimized to support snapshot testing

* Revert "feat: replace snapshot tests for enforce_sorting"

This reverts commit 8c921fa.

* feat: migrate core test to insta

* fix format

* fix format

* fix typo

* refactor: rename function

* fix: remove trimming

* refactor: replace get_plan_string with displayable in projection_pushdown

---------

Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
Co-authored-by: Ian Lai <Ian.Lai@senao.com>
Run `cargo test --test sqllogictests -- --complete` and commit the
results.
* Add PhysicalExpr optimizer and cast unwrapping

* address pr feedback

* Update datafusion/pruning/src/pruning_predicate.rs

* more lit(Xi64)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.45.1 to 1.46.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.45.1...tokio-1.46.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.46.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…pt limit pushdown (apache#16641)

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
…16615)

* Convert Option<Vec<sort expression>> to Vec<sort expression>

* clippy

* fix comment

* fix doc

* change back to Expr

* remove redundant check
)

* Improve error message when ScalarValue fails to cast array

The `as_*_array` functions and the `downcast_value!` have the benefit of
reporting the array type when there is a mismatch. This makes the error
message more actionable.

* test
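
The idea of reporting the array type on a failed downcast can be sketched without Arrow. The following is a simplified stand-in, not the real `as_*_array` helpers or the `downcast_value!` macro: the `Array` trait, the two array structs, and `downcast_array` are all hypothetical names used only for illustration.

```rust
use std::any::{type_name, Any};

/// Minimal stand-in for an array trait (hypothetical; not `arrow::array::Array`).
pub trait Array: Any {
    /// Human-readable name of the data type held by this array.
    fn data_type(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
}

#[derive(Debug)]
pub struct Int64Array(pub Vec<i64>);

#[derive(Debug)]
pub struct StringArray(pub Vec<String>);

impl Array for Int64Array {
    fn data_type(&self) -> &'static str {
        "Int64"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for StringArray {
    fn data_type(&self) -> &'static str {
        "Utf8"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Downcasts to a concrete array type. On mismatch, the error names both the
/// actual data type and the requested Rust type, so the failure is actionable.
pub fn downcast_array<T: Array>(array: &dyn Array) -> Result<&T, String> {
    array.as_any().downcast_ref::<T>().ok_or_else(|| {
        format!(
            "could not cast array of type {} to {}",
            array.data_type(),
            type_name::<T>()
        )
    })
}
```

A caller that passes a `StringArray` while asking for `Int64Array` gets an error that mentions `Utf8`, rather than a bare "downcast failed".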
* Add an example of embedding indexes inside a parquet file

* Add page image

* Add prune file example

* Fix clippy

* polish code

* Fmt

* address comments

* Add debug

* Add new example, but it will fail with page index

* add debug

* add debug

* polish

* debug

* Using low level API to support

* polish

* fix

* merge

* fix

* complete solution

* polish comments

* adjust image

* add comments part 1

* pin to new arrow-rs

* pin to new arrow-rs

* add comments part 2

* merge upstream

* merge upstream

* polish code

* Rename example and add it to the list

* Work on comments

* More documentation

* Documentation obsession, encapsulate example

* Update datafusion-examples/examples/parquet_embedded_index.rs

Co-authored-by: Sherin Jacob <jacob@protoship.io>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Sherin Jacob <jacob@protoship.io>
* Implementation for regex_instr

* linting and typo addressed in bench

* prettier formatting

* scalar_functions_formatting

* linting format macros

* formatting

* address comments to PR

* formatting

* clippy

* fmt

* address docs typo

* remove unnecessary struct and comment

* delete redundant lines
add tests for subexp
correct function signature for benches

* refactor get_index

* comments addressed

* update doc

* clippy upgrade

---------

Co-authored-by: Nirnay Roy <nirnayroy1012@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
…nts (apache#16672)

- Refactored the `DataFusionError` enum to use `Box<T>` for:
  - `ArrowError`
  - `ParquetError`
  - `AvroError`
  - `object_store::Error`
  - `ParserError`
  - `SchemaError`
  - `JoinError`
- Updated all relevant match arms and constructors to handle boxed errors.
- Refactored error-related macros (`arrow_datafusion_err!`, `sql_datafusion_err!`, etc.) to use `Box<T>`.
- Adjusted test cases and error assertions for boxed variants.
- Documentation update to the upgrade guide to explain the required changes and rationale.
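
The effect of boxing enum variants can be sketched in isolation. The types below are hypothetical stand-ins, not the real `DataFusionError` or `ArrowError`. The sketch assumes the usual motivation for this kind of refactor, namely that an enum is as large as its largest variant, so moving big payloads behind a `Box` shrinks the enum; the commit message itself does not state the rationale.

```rust
use std::mem::size_of;

/// Stand-in for a large wrapped error type (hypothetical).
#[derive(Debug)]
pub struct BigSourceError {
    pub message: String,
    pub context: [u64; 16],
}

/// Error enum that stores the large payload inline.
#[derive(Debug)]
pub enum InlineError {
    Source(BigSourceError),
    Plan(String),
}

/// Same enum with the large payload boxed.
#[derive(Debug)]
pub enum BoxedError {
    Source(Box<BigSourceError>),
    Plan(String),
}

/// Constructors have to box the payload; a `From` impl keeps `?` working.
impl From<BigSourceError> for BoxedError {
    fn from(e: BigSourceError) -> Self {
        BoxedError::Source(Box::new(e))
    }
}

/// Match arms bind the `Box`, and field access goes through auto-deref.
pub fn describe(err: &BoxedError) -> String {
    match err {
        BoxedError::Source(inner) => format!("source error: {}", inner.message),
        BoxedError::Plan(msg) => format!("plan error: {msg}"),
    }
}

/// Returns (inline size, boxed size) in bytes for comparison.
pub fn sizes() -> (usize, usize) {
    (size_of::<InlineError>(), size_of::<BoxedError>())
}
```

Code that constructs a variant directly must now wrap the payload in `Box::new(...)`, and code that moves the payload out of a match arm has to dereference the box, which is the kind of change the upgrade guide entry would need to cover.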
…on and Mapping (apache#16583)

- Introduced a new `schema_adapter_factory` field in `ListingTableConfig` and `ListingTable`
- Added `with_schema_adapter_factory` and `schema_adapter_factory()` methods to both structs
- Modified execution planning logic to apply schema adapters during scanning
- Updated statistics collection to use mapped schemas
- Implemented detailed documentation and example usage in doc comments
- Added new unit and integration tests validating schema adapter behavior and error cases
* Reuse Rows in RowCursorStream

* WIP

* Fmt

* Add comment, make it backwards compatible

* Add comment, make it backwards compatible

* Add comment, make it backwards compatible

* Clippy

* Clippy

* Return error on non-unique reference

* Comment

* Update datafusion/physical-plan/src/sorts/stream.rs

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* Fix

* Extract logic

* Doc fix

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
apache#16630)

* Perf: fast CursorValues compare for StringViewArray using inline_key_fast

* fix

* polish

* polish

* add test

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
One step towards apache#16652.

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Daniël Heres <danielheres@gmail.com>
@github-actions

github-actions bot commented Sep 4, 2025

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 4, 2025
@github-actions github-actions bot closed this Sep 11, 2025