test 2 #2

ding-young · 2025-07-05T12:59:31Z

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

…pache#16058) * Add test generated from schema in Comet. * Checkpoint DFS. * Checkpoint with working transformation. * fmt, clippy fixes. * Remove maximum stack depth. * More testing. * Improve tests. * Improve docs. * Use a smaller HashSet instead of HashMap with every field in it. More docs. * Use a smaller HashSet instead of HashMap with every field in it. More docs. * More docs. * More docs. * Fix typo. * Refactor match with nested if lets to make it more readable. * Address some PR feedback. * Rename variables in struct processing to address PR feedback. Do List next. * Rename variables in list processing to address PR feedback. * Update docs. * Simplify list parquet path generation. * Map support. * Remove old TODO. * Reduce redundant docs by referring to docs above. * Reduce redundant docs by referring to docs above. * Add parquet file generated from CometFuzzTestSuite ParquetGenerator (similar to schema in file_format tests) to exercise end-to-end support. * Fix clippy.

…pache#16100) * Update documentation for `datafusion.execution.collect_statistics` setting * Update test * Update datafusion/common/src/config.rs Co-authored-by: Leonardo Yvens <leoyvens@gmail.com> * update docs * Update doc --------- Co-authored-by: Leonardo Yvens <leoyvens@gmail.com>

* chore(deps): bump testcontainers from 0.23.3 to 0.24.0 Bumps [testcontainers](https://github.com/testcontainers/testcontainers-rs) from 0.23.3 to 0.24.0. - [Release notes](https://github.com/testcontainers/testcontainers-rs/releases) - [Changelog](https://github.com/testcontainers/testcontainers-rs/blob/main/CHANGELOG.md) - [Commits](testcontainers/testcontainers-rs@0.23.3...0.24.0) --- updated-dependencies: - dependency-name: testcontainers dependency-version: 0.24.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Update test_containers_modules too --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

…fig (apache#16080) * fix * add a test * fmt * add to upgrade guide * fix tests * fix test * fix test * fix ci * Fix example in upgrade guide (apache#29) --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update documentation about DDL and DML * Improve the DML Documentation * Apply suggestions from code review Co-authored-by: Oleks V <comphead@users.noreply.github.com> * Fix docs * Fix docs --------- Co-authored-by: Oleks V <comphead@users.noreply.github.com>

…pache#16019) * draft commit to roll back changes on function naming and include prepare clause on the infer types tests * include data types in plan when it is not included in the prepare statement * fix: prepare statement error * Update datafusion/sql/src/statement.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * remove infer types from prepare statement the infer data type changes in statement will be introduced in a new PR * fix to show correct output message * include data types on logical plans of prepare statements without explicit type declaration * fix using clippy suggestions * explicitly get the data types using the placeholder id to avoid sorting * Restore the original tests too * update set data type routine to be more rust idiomatic Co-authored-by: Tommy shu <qstommyshu@gmail.com> * update set datatype routine * fix formatting in sql_integration --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Tommy shu <qstommyshu@gmail.com>

…s & nested Column expressions in maybe_fix_physical_column_name (apache#16064) * Fix union schema name coercion * Address renaming for columns that are not in the top level as well * Add unit test * Format * Use insta tests properly * Address review - comment + minor simplification change --------- Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>

…pache#16029) * Support filtering specific sqllogictests identified by line number * Add license header * Try parsing in different dialects * Add test filtering example to README.md * Improve Filter doc comment * Factor out statement_is_skippable into its own function * Add example about how filters work in the doc comments

Bumps [indexmap](https://github.com/indexmap-rs/indexmap) from 2.9.0 to 2.10.0. - [Changelog](https://github.com/indexmap-rs/indexmap/blob/main/RELEASES.md) - [Commits](indexmap-rs/indexmap@2.9.0...2.10.0) --- updated-dependencies: - dependency-name: indexmap dependency-version: 2.10.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix Column management when parsing USING joins.

  In SqlToRel::parse_join(), when handling JoinConstraint::Using, the identifiers are normalized using IdentNormalizer::normalize(). That normalization lower-cases unquoted identifiers and preserves the case of quoted ones (dropping the quotes).

  Until this commit, the normalized column names were passed to LogicalPlanBuilder::join_using() as strings. When each goes through LogicalPlanBuilder::normalize(), Column::From<String>() is called, leading to Column::from_qualified_name(). As it gets an unqualified column, it lower-cases it. This means that if a join is USING("SOME_COLUMN_NAME"), we end up with a Column { name: "some_column_name", ..}. In the end, the join fails, as that lower-case column does not exist.

  With this commit, SqlToRel::parse_join() calls Column::from_name() on each normalized column and passes those to LogicalPlanBuilder::join_using(). Downstream, in LogicalPlanBuilder::normalize(), there is no need to create the Column objects from strings, and the bug does not happen.

  This fixes apache#16120.

* Remove genericity from LogicalPlanBuilder::join_using().

  Until this commit, LogicalPlanBuilder::join_using() accepted using_keys: Vec<impl Into<Column> + Clone>. This commit removes this, only allowing Vec<Column>.

  Motivation: passing e.g. Vec<String> for using_keys is bug-prone, as the case of the Strings can be modified when they are converted into Column. That logic is admissible for a common column name that can be qualified, but some column names cannot be qualified (e.g. USING keys).

  This commit changes the API. However, potential users can trivially fix their code by calling Column::from/from_qualified_name on their using_keys. This forces them to think about what their identifiers represent, which removes a class of potential bugs.

  Additional bonus: shorter compilation time and smaller binary size.

---------

Co-authored-by: Bruno Cauet <bruno.cauet@qube-rt.com>
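The double case-folding described in this commit message can be illustrated with a small stand-alone Rust model. This is only a sketch under stated assumptions: the `Column` struct and the `normalize_ident`, `column_from_string`, and `column_from_name` functions are hypothetical stand-ins written for this example, not DataFusion's actual types or implementation. They mimic only the behavior the message describes.

```rust
/// Hypothetical stand-in for a column reference (not DataFusion's `Column`).
#[derive(Debug, PartialEq)]
struct Column {
    name: String,
}

/// Model of SQL identifier normalization: unquoted identifiers are
/// lower-cased; quoted identifiers keep their case but lose the quotes.
fn normalize_ident(ident: &str) -> String {
    if ident.len() >= 2 && ident.starts_with('"') && ident.ends_with('"') {
        ident[1..ident.len() - 1].to_string()
    } else {
        ident.to_lowercase()
    }
}

/// Model of the buggy path: an already-normalized name is treated as raw,
/// unquoted SQL text again, so its case is folded a second time.
fn column_from_string(s: &str) -> Column {
    Column { name: normalize_ident(s) }
}

/// Model of the fixed path: the already-normalized name is taken verbatim.
fn column_from_name(name: &str) -> Column {
    Column { name: name.to_string() }
}
```

In this model, normalizing `"SOME_COLUMN_NAME"` (with quotes) yields `SOME_COLUMN_NAME`. Passing that string through `column_from_string` folds it to `some_column_name`, which no longer matches the real column, while `column_from_name` keeps the case intact. Accepting only ready-made `Column` values, as the second commit does, makes the second, lossy conversion impossible by construction.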

* Initial commit to form PR for datafusion encryption support * Add tests for encryption configuration * Apply cargo fmt * Add a roundtrip encryption test to the parquet tests. * cargo fmt * Update test to add decryption parameter to called functions. * Try to get DataFrame.write_parquet to work with encryption. Doesn't quite, column encryption is broken. * Update datafusion/datasource-parquet/src/opener.rs Co-authored-by: Adam Reeve <adreeve@gmail.com> * Update datafusion/datasource-parquet/src/source.rs Co-authored-by: Adam Reeve <adreeve@gmail.com> * Fix write test in parquet.rs * Simplify encryption test. Remove unused imports. * Run cargo fmt. * Further streamline roundtrip test. * Change From methods for FileEncryptionProperties and FileDecryptionProperties to use references. * Change encryption config to directly hold column keys using custom config fields. * Fix generated field names in visit for encryptor and decryptor to use "." instead of "::" * 1. Disable parallel writes with encryption. 2. Fixed unused header warning in config.rs. 3. Fix test case in encryption.rs to call conversion to ConfigFileDecryption properties correctly. * cargo fmt * Update datafusion/common/src/file_options/parquet_writer.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix variables shown in information schema test. * Back out bad suggestion from copilot * Remove unused serde reference Add an example to read and write encrypted parquet files. * cargo fmt * change file_format.rs to use global encryption options in struct. * Turn off page_index for encrypted example. Get encrypted example working with filter. * Tidy up example output. * Add missing license. Run taplo format * Update configs.md by running dev/update_config_docs.sh * Cargo fmt + clippy changes. * Add filter test for encrypted files. * Cargo clippy changes. * Fix link in README.md * Add issue tag for parallel writes.
* Move file encryption and decryption properties out of global options * Use config_namespace_with_hashmap for column encryption/decryption props * Remove outdated docs on crypto settings. Signed-off-by: Corwin Joy <corwin.joy@gmail.com> * 1. Add docs for using encryption configuration. 2. Add example SQL for using encryption from CLI. 3. Fix removed variables in test for configuration information. 4. Clippy and cargo fmt. Signed-off-by: Corwin Joy <corwin.joy@gmail.com> * Update code to add missing ParquetOpener parameter due to merge from main Signed-off-by: Corwin Joy <corwin.joy@gmail.com> * Add CLI documentation for Parquet options and provide an encryption example Signed-off-by: Corwin Joy <corwin.joy@gmail.com> * Use ConfigFileDecryptionProperties in ParquetReadOptions Signed-off-by: Adam Reeve <adam.reeve@gr-oss.io> * Implement default for ConfigFileEncryptionProperties Signed-off-by: Corwin Joy <corwin.joy@gmail.com> * Add sqllogictest for parquet with encryption Signed-off-by: Corwin Joy <corwin.joy@gmail.com> * Apply prettier changes from CI Signed-off-by: Corwin Joy <corwin.joy@gmail.com> * logical conflict * fix another logical conflict --------- Signed-off-by: Corwin Joy <corwin.joy@gmail.com> Signed-off-by: Adam Reeve <adam.reeve@gr-oss.io> Co-authored-by: Adam Reeve <adreeve@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Adam Reeve <adam.reeve@gr-oss.io> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

github-actions · 2025-09-04T02:41:00Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

chenkovsky and others added 30 commits May 19, 2025 17:05

doc: fix indent format explain (apache#16085)

4417d5c

* doc: fix indent format explain * update

fix: Add coercion rules for Float16 types (apache#15816)

3fa111e

* handle coercion for Float16 types * Add some basic slt tests --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Use qualified names on DELETE selections (apache#16033)

8c2264c

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

feat: make error handling in indent explain consistent with that in t…

963b649

…ree (apache#16097) * feat: make error handling in indent consistent with that in tree * update test * return all plans instead of throwing err * update test

Clean up ExternalSorter and use upstream converter (apache#16109)

9ec679b

Support GroupsAccumulator for Avg duration (apache#15748)

52f340b

* Support GroupsAccumulator for avg duration * update test --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Test Duration in fuzz tests (apache#16111)

febc77e

Move PruningStatistics into datafusion::common (apache#16069)

46d3f52

* Move PruningStatistics into datafusion::common * fix doc * remove new code * fmt

Revert use file schema in parquet pruning (apache#16086)

ca55f1c

* wip * comment * Update datafusion/core/src/datasource/physical_plan/parquet.rs * remove prints * better test * fmt

fix: describe escaped quoted identifiers (apache#16082)

0589dbb

* feat: escape quote wrap identifiers in describe rm: dev files fmt: final formatting sed: s/<comment>// * fix: use ident instead of col + format

Minor: Add ScalarFunctionArgs::return_type method (apache#16113)

4597d3b

feat: coerce from fixed size binary to binary view (apache#16110)

ce3e387

Fix contains function expression (apache#16046)

40fca47

Optimize performance of string::ascii function (apache#16087)

37c266a

* Optimize performance of string::ascii function d * Add benchmark with NULL_DENSITY=0 d --------- Co-authored-by: Tai Le Manh <tailm2@vingroup.net> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

chore: Use materialized data for filter pushdown tests (apache#16123)

0b6678b

* chore: Use pre created data for filter pushdown tests * chore: Use pre created data for filter pushdown tests

chore: Upgrade rand crate and some other minor crates (apache#16062)

5669500

* chore: Upgrade `rand` crate and some other minor crates --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

docs: Fix typos and minor grammatical issues in Architecture docs (ap…

39063f6

…ache#16119) * minor fixes to arch docs Co-authored-by: Oleks V <comphead@users.noreply.github.com> --------- Co-authored-by: Oleks V <comphead@users.noreply.github.com>

add top-memory-consumers option in cli (apache#16081)

cb45f1f

add snapshot tests for memory exhaustion

fix ci extended test (apache#16144)

67a2173

adding support for Min/Max over LargeList and FixedSizeList (apache#1…

5293b70

…6071) * initial Iteration * add Sql Logic tests * tweak comments * unify data, structure tests * Deleted by mistake

Move prepare/parameter handling tests into params.rs (apache#16141)

dc8161e

* Move prepare/parameter handling tests into `params.rs` * Resolve conflicts

Add StateFieldsArgs::return_field (apache#16112)

ce835da

dharanad and others added 27 commits June 27, 2025 11:31

feat: add array_min scalar function and associated tests (apache#16574

3839736

) * feat: add `array_min` scalar function and associated tests * update docs * nit * refactor: merge `array_min` and `array_max` into `min_max` module for better code reuse

refactor: move PruningPredicate into its own module (apache#16587)

586a88c

feat: Finalize support for RightMark join + Mark join swap (apach…

d73f0e8

…e#16488) * feat: Finalize support for `RightMark` join * Update utils.rs * add `join_selection` tests * fmt * Update join_selection.rs --------- Co-authored-by: Oleks V <comphead@users.noreply.github.com>

Skip re-pruning based on partition values and file level stats if the…

b4ba1c6

…re are no dynamic filters (apache#16424)

fix: column indices in FFI partition evaluator (apache#16480)

cce3f3f

* Column indices were not computed correctly, causing a panic * Add unit tests

Support timestamp and date arguments for range and `generate_series…

2999e41

…` table functions (apache#16552) * init * trait based * also support date * explain

fix: support within_group (apache#16538)

1de4d0e

* fix: reject within_group for non ordered aggregate function * update error * support within

Revert "feat: Finalize support for RightMark join + Mark join swap (

8d772e5

apache#16488)" (apache#16597) This reverts commit d73f0e8.

move min_batch/max_batch to functions-aggregate-common (apache#16593)

9f3cc7b

fix: disallow specify both order_by and within_group (apache#16606)

1c0dcad

fix: format within_group error message (apache#16613)

ebf49b4

Allow usage of table functions in relations (apache#16571)

649a36f

* Allow usage of table functions in relations * Rebase

Update to arrow/parquet 55.2.0 (apache#16575)

f65da24

* Update to arrow/parquet 55.2.0 Update to released version * Update plans

docs: Minor grammatical fixes for the scalar UDF docs (apache#16618)

f4d1990

feat: add multi level merge sort that will always fit in memory

4814deb

test: add fuzz test for aggregate

ac7b7a1

update

e8358e1

add more tests

6d082ee

fix test

12fb83c

update tests

5b05822

added more aggregate fuzz

c02d8e6

align with add fuzz tests

ab8536c

add sort fuzz

c9b5a79

fix lints and formatting

1d28679

github-actions bot added the Stale label Sep 4, 2025

github-actions bot closed this Sep 11, 2025

20 participants
