Releases: apache/datafusion-comet

0.12.0

01 Dec 16:23
6086438

0.12.0 Pre-release

DataFusion Comet 0.12.0 Changelog

This release consists of 105 commits from 13 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Fix None.get in stringDecode when bin child cannot be converted #2606 (cfmcgrady)
  • fix: Update FuzzDataGenerator to produce dictionary-encoded string arrays & fix bugs that this exposes #2635 (andygrove)
  • fix: Fallback to Spark for lpad/rpad for unsupported arguments & fix negative length handling #2630 (andygrove)
  • fix: Mark SortOrder with floating-point as incompatible #2650 (andygrove)
  • fix: Fall back to Spark for trunc / date_trunc functions when format string is unsupported, or is not a literal value #2634 (andygrove)
  • fix: [native_datafusion] only pass single partition of PartitionedFiles into DataSourceExec #2675 (mbutrovich)
  • fix: Fix subcommands options in fuzz-testing #2684 (manuzhang)
  • fix: Do not replace SMJ with HJ for LeftSemi #2687 (comphead)
  • fix: Apply spotless on Iceberg 1.8.1 diff [iceberg] #2700 (hsiang-c)
  • fix: Fix generate-user-guide-reference-docs failure when mvn command is not executed at root #2691 (manuzhang)
  • fix: Fix missing SortOrder fallback reason in range partitioning #2716 (andygrove)
  • fix: CometLiteral class cast exception with arrays #2718 (andygrove)
  • fix: NormalizeNaNAndZero::children() returns child's child #2732 (mbutrovich)
  • fix: checkSparkMaybeThrows should compare Spark and Comet results in success case #2728 (andygrove)
  • fix: Mark WindowsExec as incompatible #2748 (andygrove)
  • fix: Add strict floating point mode and fallback to Spark for min/max/sort on floating point inputs when enabled #2747 (andygrove)
  • fix: Implement producedAttributes for CometWindowExec #2789 (rahulbabarwal89)
  • fix: Pass all Comet configs to native plan #2801 (andygrove)

Implemented enhancements:

  • feat: Add option to write benchmark results to file #2640 (andygrove)
  • feat: Implement metrics for iceberg compat #2615 (EmilyMatt)
  • feat: Define function signatures in CometFuzz #2614 (andygrove)
  • feat: cherry-pick UUID conversion logic from #2528 #2648 (mbutrovich)
  • feat: support concat for strings #2604 (comphead)
  • feat: Add support for abs #2689 (andygrove)
  • feat: Support variadic function in CometFuzz #2682 (manuzhang)
  • feat: CometExecRule refactor: Unify CometNativeExec creation with Serde in CometOperatorSerde trait #2768 (andygrove)
  • feat: support cot #2755 (psvri)
  • feat: Add bash script to build and run fuzz testing #2686 (manuzhang)
  • feat: Add getSupportLevel to CometAggregateExpressionSerde trait #2777 (andygrove)
  • feat: Add CI check to ensure generated docs are in sync with code #2779 (andygrove)
  • feat: Add prettier enforcement #2783 (andygrove)
  • feat: hyperbolic trig functions #2784 (psvri)
  • feat: [iceberg] Native scan by serializing FileScanTasks to iceberg-rust #2528 (mbutrovich)

Documentation updates:

  • docs: Add changelog for 0.11.0 release #2585 (mbutrovich)
  • docs: Improve documentation layout #2587 (andygrove)
  • docs: Publish 0.11.0 user guide #2589 (andygrove)
  • docs: Put Comet logo in top nav bar, respect light/dark mode #2591 (andygrove)
  • docs: Improve main landing page #2593 (andygrove)
  • docs: Improve site navigation #2597 (andygrove)
  • docs: Update benchmark results #2596 (andygrove)
  • docs: Upgrade pydata-sphinx-theme to 0.16.1 #2602 (andygrove)
  • docs: Fix redirect #2603 (andygrove)
  • docs: Fix broken image link #2613 (andygrove)
  • docs: Add FFI docs to contributor guide #2668 (andygrove)
  • docs: Various documentation updates #2674 (andygrove)
  • docs: Add supported SortOrder expressions and fix a typo #2694 (andygrove)
  • docs: Minor docs update for running Spark SQL tests #2712 (andygrove)
  • docs: Update contributor guide for adding a new expression #2704 (andygrove)
  • docs: Documentation updates for LocalTableScan and WindowExec #2742 (andygrove)
  • docs: Typo fix #2752 (wForget)
  • docs: Categorize some configs as testing and add notes about known time zone issues #2740 (andygrove)
  • docs: Run prettier on all markdown files #2782 (andygrove)
  • docs: Ignore prettier formatting for generated tables #2790 (andygrove)
  • docs: Add new section to contributor guide, explaining how to add a new operator #2758 (andygrove)

Other:

  • chore: Start 0.12.0 development #2584 (mbutrovich)
  • chore: Bump Spark from 3.5.6 to 3.5.7 #2574 (cfmcgrady)
  • chore(deps): bump parquet from 56.0.0 to 56.2.0 in /native #2608 (dependabot[bot])
  • chore(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 in /native #2609 (dependabot[bot])
  • chore(deps): bump tikv-jemalloc-ctl from 0.6.0 to 0.6.1 in /native #2610 (dependabot[bot])
  • tests: FuzzDataGenerator instead of Parquet-specific generator #2616 (mbutrovich)
  • chore: Simplify on-heap memory configuration #2599 (andygrove)
  • Feat: Add sha1 function impl #2471 (kazantsev-maksim)
  • chore: Refactor Parquet/DataFrame fuzz data generators #2629 (andygrove)
  • chore: Remove needless from_raw calls #2638 (EmilyMatt)
  • chore: support DataFusion 50.3.0 #2605 (comphead)
  • chore(deps): bump actions/upload-artifact from 4 to 5 #2654 (dependabot[bot])
  • chore(deps): bump cc from 1.2.42 to 1.2.43 in /native #2653 (dependabot[bot])
  • chore(deps): bump actions/download-artifact from 5 to 6 #2652 (dependabot[bot])
  • chore: extract c...

0.11.0

19 Oct 18:00

0.11.0 Pre-release

DataFusion Comet 0.11.0 Changelog

This release consists of 131 commits from 15 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: temporarily ignore test for hdfs file systems #2359 (parthchandra)
  • fix: Check reused broadcast plan in non-AQE and make setNumPartitions thread safe #2398 (wForget)
  • fix: correct missingInput for CometHashAggregateExec #2409 (comphead)
  • fix: clippy errors rust 1.9.0 update #2419 (coderfender)
  • fix: Avoid spark plan execution cache preventing CometBatchRDD numPartitions change #2420 (wForget)
  • fix: regressions in CometToPrettyStringSuite #2384 (hsiang-c)
  • fix: Byte array Literals failed on cast #2432 (comphead)
  • fix: Do not push down subquery filters on native_datafusion scan #2438 (wForget)
  • fix: Improve error handling when resolving S3 bucket region #2440 (andygrove)
  • fix: [iceberg] additional parquet independent api for iceberg integration #2442 (parthchandra)
  • fix: Specify reqwest crate features #2446 (andygrove)
  • fix: distributed RangePartitioning bounds calculation with native shuffle #2258 (mbutrovich)
  • fix: fix regression in tpcbench.py #2512 (andygrove)
  • fix: [iceberg] Close reader instance in ReadConf #2510 (hsiang-c)
  • fix: Enable plan stability tests for auto scan #2516 (andygrove)
  • fix: Capture unexpected output when retrieving JVM 17 args in Makefile #2566 (zuston)

Performance related:

  • perf: New Configuration from shared conf to avoid high costs #2402 (wForget)
  • perf: Use DataFusion's count_udaf instead of SUM(IF(expr IS NOT NULL, 1, 0)) #2407 (andygrove)
  • perf: Improve BroadcastExchangeExec conversion #2417 (wForget)

Implemented enhancements:

  • feat: Add dynamic enabled and allowIncompat configs for all supported expressions #2329 (andygrove)
  • feat: feature specific tests #2372 (parthchandra)
  • feat: Support more date part expressions #2316 (wForget)
  • feat: rpad support column for second arg instead of just literal #2099 (coderfender)
  • feat: Support comet native log level conf #2379 (wForget)
  • feat: Enable WeekDay function #2411 (wForget)
  • feat: Add nested Array literal support #2181 (comphead)
  • feat:add_additional_char_support_rpad #2436 (coderfender)
  • feat: do not fallback to Spark for COUNT(distinct) #2429 (comphead)
  • feat: implement_ansi_eval_mode_arithmetic #2136 (coderfender)
  • feat: Add plan conversion statistics to extended explain info #2412 (andygrove)
  • feat: implement_comet_native_lpad_expr #2102 (coderfender)
  • feat: Add backtrace feature to simplify enabling native backtraces in CometNativeException #2515 (andygrove)
  • feat: Support reverse function with ArrayType input #2481 (cfmcgrady)
  • feat: Change default off-heap memory pool from greedy_unified to fair_unified #2526 (andygrove)
  • feat: Make DiskManager max_temp_directory_size configurable #2479 (manuzhang)
  • feat: Parquet Modular Encryption with Spark KMS for native readers #2447 (mbutrovich)
  • feat: Add support for Spark-compatible cast from integral to decimal #2472 (coderfender)
  • feat:Support ANSI mode integral divide #2421 (coderfender)
  • feat: Add config to enable running Comet in onheap mode #2554 (andygrove)
  • feat:support ansi mode rounding function #2542 (coderfender)
  • feat:support ansi mode remainder function #2556 (coderfender)
  • feat: Implement array-to-string cast support #2425 (cfmcgrady)
  • feat: Various improvements to memory pool configuration, logging, and documentation #2538 (andygrove)
  • feat: Enable complex types for columnar shuffle #2573 (mbutrovich)
  • feat: support_decimal_types_bool_cast_native_impl #2490 (coderfender)
  • feat: Use buf write to reduce system call on index write #2579 (zuston)

Documentation updates:

  • doc: Document usage IcebergCometBatchReader.java #2347 (comphead)
  • docs: Add changelog for 0.10.0 release #2361 (andygrove)
  • docs: Fix error in docs #2373 (andygrove)
  • docs: Fix more comet versions in docs #2374 (andygrove)
  • docs: Publish 0.10.0 user guide #2394 (andygrove)
  • doc: macos benches doc clarifications #2418 (comphead)
  • docs: update configs.md after #2422 #2428 (mbutrovich)
  • docs: update docs and tuning guide related to native shuffle #2487 (mbutrovich)
  • docs: Improve EC2 benchmarking guide #2474 (andygrove)
  • docs: docs_update_ansi_support #2496 (coderfender)
  • docs:support lpad expression documentation update #2517 (coderfender)
  • docs: doc changes to support ANSI mode integral divide #2570 (coderfender)
  • docs: Split configuration guide into different sections (scan, exec, shuffle, etc) #2568 (andygrove)
  • docs: doc update to support ANSI mode remainder function #2576 (coderfender)
  • docs: Documentation updates #2581 (andygrove)

Other:

  • chore(deps): bump uuid from 1.18.0 to 1.18.1 in /native #2336 (dependabot[bot])
  • build: Check that all Scala test suites run in PR builds #2304 (andygrove)
  • chore: Start 0.11.0 development #2365 (andygrove)
  • chore: Split expression serde hash map into separate categories #2322 (andygrove)
  • chore: exclude Iceberg diffs from rat checks #2376 (hsiang-c)
  • chore: Refactor UnaryMinus serde #2378 (andygrove)
  • chore: Revert "chore: [1941-Part1]: Introduce map_sort scalar function (#2…" #2381 (comphead)
  • chore: Refactor Literal serde [#2377](https://github.com/apache/datafusion-comet/pull/...

0.10.1

06 Oct 18:44

0.10.1 Pre-release

DataFusion Comet 0.10.1 Changelog

This release consists of 7 commits from 1 contributor. See credits at the end of this changelog for more information.

Documentation updates:

  • docs: [branch-0.10] Update version number in branch-0.10 user guide #2395 (andygrove)

Other:

  • chore: [branch-0.10] Support Spark 4.0.1 instead of 4.0.0 (#2414) #2497 (andygrove)
  • build: [branch-0.10] Stop caching libcomet in CI (#2498) #2502 (andygrove)
  • chore: [branch-0.10] perf: Improve BroadcastExchangeExec conversion #2501 (andygrove)
  • chore: [branch-0.10] [iceberg] additional parquet independent api for iceberg integration (#2442) #2499 (andygrove)
  • fix: [branch-0.10] Avoid spark plan execution cache preventing CometBatchRDD numPartitions change (#2420) #2503 (andygrove)
  • build: [branch-0.10] Bump version to 0.10.1 #2508 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     7	Andy Grove

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

0.10.0

16 Sep 17:21
9cb0cc4

0.10.0 Pre-release

DataFusion Comet 0.10.0 Changelog

This release consists of 183 commits from 26 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [Iceberg] Fix decimal corruption #1985 (andygrove)
  • fix: broken link in development.md #2024 (petern48)
  • fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec #2000 (huaxingao)
  • fix: hdfs read into buffer fully #2031 (parthchandra)
  • fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY #2018 (andygrove)
  • fix: clean up [iceberg] integration APIs #2032 (huaxingao)
  • fix: zero Arrow Array offset before sending across FFI #2052 (mbutrovich)
  • fix: [iceberg] more fixes for Iceberg integration APIs. #2078 (parthchandra)
  • fix: Add support for StringDecode in Spark 4.0.0 #2075 (peter-toth)
  • fix: Avoid double free in CometUnifiedShuffleMemoryAllocator #2122 (andygrove)
  • fix: Remove duplicate serde code #2098 (andygrove)
  • fix: Improve logic for determining when an UnpackOrDeepCopy is needed #2142 (andygrove)
  • fix: Add CopyExec to inputs to SortMergeJoinExec #2155 (andygrove)
  • fix: Fix repeatedly url-decode path when reading parquet from s3 using native parquet reader #2138 (Kontinuation)
  • fix: [iceberg] Switch to OSS Spark and run Iceberg Spark tests in parallel #1987 (hsiang-c)
  • fix: [iceberg] Fall back to spark for schemas with empty structs #2204 (andygrove)
  • fix: Fix failing TPC-DS workflow in PR CI runs #2207 (andygrove)
  • fix: [iceberg] order query result deterministically #2208 (hsiang-c)
  • fix: use spark.comet.batchSize instead of conf.arrowMaxRecordsPerBatch for data that is coming from Java #2196 (rluvaton)
  • fix: if expr nullable #2217 (Asura7969)
  • fix: Support auto scan mode with Spark 4.0.0 #1975 (andygrove)
  • fix: Make Sha2 fallback message more user-friendly #2213 (rishvin)
  • fix: separate type checking for CometExchange and CometColumnarExchange #2241 (mbutrovich)
  • fix: Fix potential resource leak in native shuffle block reader #2247 (andygrove)
  • fix: Remove unreachable code in CometScanRule #2252 (andygrove)
  • fix: Fall back to native_comet for encrypted Parquet scans #2250 (andygrove)
  • fix: Fall back to native_comet when object store not supported by native_iceberg_compat #2251 (andygrove)
  • fix: split expr.proto file (new) #2267 (kination)
  • fix: handle cast to dictionary vector introduced by case when #2044 (parthchandra)
  • fix: Remove check for custom S3 endpoints #2288 (andygrove)
  • fix: implement lazy evaluation in Coalesce function #2270 (coderfender)
  • fix: Update benchmarking scripts #2293 (andygrove)
  • fix: Fix regression in NativeConfigSuite #2299 (andygrove)
  • fix: Validating object store configs should not throw exception #2308 (andygrove)
  • fix: TakeOrderedAndProjectExec is not reporting all fallback reasons #2323 (kazuyukitanimura)
  • fix: Fallback length function with binary input #2349 (wForget)

Performance related:

  • perf: Optimize AvgDecimalGroupsAccumulator #1893 (leung-ming)
  • perf: Optimize SumDecimalGroupsAccumulator::update_single #2069 (leung-ming)
  • perf: Avoid FFI copy in ScanExec when reading data from exchanges #2268 (andygrove)

Implemented enhancements:

  • feat: Add from_unixtime support #1943 (kazuyukitanimura)
  • feat: randn expression support #2010 (akupchinskiy)
  • feat: monotonically_increasing_id and spark_partition_id implementation #2037 (akupchinskiy)
  • feat: support map_entries #2059 (comphead)
  • feat: Support Array Literal #2057 (comphead)
  • feat: Add new trait for operator serde #2115 (andygrove)
  • feat: limit with offset support #2070 (akupchinskiy)
  • feat: Include scan implementation name in CometScan nodeName #2141 (andygrove)
  • feat: Add config option to log fallback reasons #2154 (andygrove)
  • feat: [iceberg] Enable Comet shuffle in Iceberg diff #2205 (andygrove)
  • feat: Improve shuffle fallback reporting #2194 (andygrove)
  • feat: Reset data buf of NativeBatchDecoderIterator on close #2235 (wForget)
  • feat: Improve fallback mechanism for ANSI mode #2211 (andygrove)
  • feat: Support hdfs with OpenDAL #2244 (wForget)
  • feat: Ignore fallback info for command execs #2297 (wForget)
  • feat: Improve some confusing fallback reasons #2301 (wForget)
  • feat: Make supported hadoop filesystem schemes configurable #2272 (wForget)
  • feat: [1941-Part1]: Introduce map-sort scalar function #2262 (rishvin)
  • feat: [iceberg] delete rows support using selection vectors #2346 (parthchandra)

Documentation updates:

  • docs: Update benchmark results for 0.9.0 #1959 (andygrove)
  • doc: Add comment about local clippy run before submitting a pull request #1961 (akupchinskiy)
  • docs: Minor improvements to Spark SQL test docs #1980 (andygrove)
  • docs: Update Maven links for 0.9.0 release #1988 (andygrove)
  • docs: Documentation updates for 0.9.0 release #1981 (andygrove)
  • docs: Add guide showing comparison between Comet and Gluten #2012 (andygrove)
  • docs: Remove legacy comment in docs #2022 (andygrove)
  • docs: Update Gluten comparison to clarify that Velox is open-source #2043 (andygrove)
  • docs: Improve Gluten comparison based on feedback from the community #2048 (andygrove)
  • docs: added a missing export into the plan stability section #2071 (akupchinskiy)
  • doc: Added documentation for supported map functions #2074 (codetyri0n)
  • doc: Alternative way to start Spark Master to run benchmarks #2072 (comphead)
  • docs: Update to support try arithmetic functions #2143 (coderfender)
  • doc: update macos standalone spark start instructions #2103 (comphead)
  • docs: Update confs to bypass Iceberg Spark issues #2166 (hsiang-c)
  • docs: Add Roadmap #2191 (andygrove)
  • docs: Update installation guide for 0.9.1 #2230 (andygrov...

0.9.1

25 Aug 16:49
a168c9a

0.9.1 Pre-release

DataFusion Comet 0.9.1 Changelog

This release consists of 2 commits from 1 contributor. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [branch-0.9] Backport FFI fix #2164 (andygrove)
  • fix: [branch-0.9] Avoid double free in CometUnifiedShuffleMemoryAllocator #2201 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     2	Andy Grove

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

0.9.0

04 Jul 17:01
1c462bc

0.9.0 Pre-release

DataFusion Comet 0.9.0 Changelog

This release consists of 139 commits from 24 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: typo for instr in fuzz testing #1686 (mbutrovich)
  • fix: Bucketed scan fallback for native_datafusion Parquet scan #1720 (mbutrovich)
  • fix: Skip row index Spark SQL tests for native_datafusion Parquet scan #1724 (mbutrovich)
  • fix: Check acquired memory when CometMemoryPool grows #1732 (wForget)
  • fix: Fix data race in memory profiling #1727 (andygrove)
  • fix: Enable some DPP Spark SQL tests #1734 (andygrove)
  • fix: support literal null list and map #1742 (kazuyukitanimura)
  • fix: get_struct field is incorrect when struct in array #1687 (comphead)
  • fix: cast map types correctly in schema adapter #1771 (parthchandra)
  • fix: correct schema type checking in native_iceberg_compat #1755 (parthchandra)
  • fix: default values for native_datafusion scan #1756 (mbutrovich)
  • fix: [native_scans] Support CASE_SENSITIVE when reading Parquet #1782 (andygrove)
  • fix: cargo install tpchgen-cli in benchmark doc #1797 (zhuqi-lucas)
  • fix: support map_keys #1788 (comphead)
  • fix: fall back on nested types for default values #1799 (mbutrovich)
  • fix: Re-enable Spark 4 tests on Linux #1806 (andygrove)
  • fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) #1785 (parthchandra)
  • fix: native_iceberg_compat: move checking parquet types above fetching batch #1809 (mbutrovich)
  • fix: translate missing or corrupt file exceptions, fall back if asked to ignore #1765 (mbutrovich)
  • fix: Fix Spark SQL AQE exchange reuse test failures #1811 (coderfender)
  • fix: Enable more Spark SQL tests #1834 (andygrove)
  • fix: support map_values #1835 (comphead)
  • fix: Handle case where num_cols == 0 in native execution #1840 (andygrove)
  • fix: Fix shuffle writing rows containing null struct fields #1845 (Kontinuation)
  • fix: Fall back to Spark for RANGE BETWEEN window expressions #1848 (andygrove)
  • fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack #1865 (andygrove)
  • fix: support read Struct by user schema #1860 (comphead)
  • fix: map parquet field_id correctly (native_iceberg_compat) #1815 (parthchandra)
  • fix: cast_struct_to_struct aligns to Spark behavior #1879 (mbutrovich)
  • fix: correctly handle schemas with nested array of struct (native_iceberg_compat) #1883 (parthchandra)
  • fix: set RangePartitioning for native shuffle default to false #1907 (mbutrovich)
  • fix: conflict between #1905 and #1892. #1919 (mbutrovich)
  • fix: Add overflow check to evaluate of sum decimal accumulator #1922 (leung-ming)
  • fix: Fix overflow handling when casting float to decimal #1914 (leung-ming)
  • fix: Ignore a test case fails on Miri #1951 (leung-ming)

Performance related:

  • perf: Add memory profiling #1702 (andygrove)
  • perf: Add performance tracing capability #1706 (andygrove)
  • perf: Add COMET_RESPECT_PARQUET_FILTER_PUSHDOWN config #1936 (andygrove)

Implemented enhancements:

  • feat: add jemalloc as optional custom allocator #1679 (mbutrovich)
  • feat: support array_repeat #1680 (comphead)
  • feat: More warning info for users #1667 (hsiang-c)
  • feat: decode() expression when using 'utf-8' encoding #1697 (mbutrovich)
  • feat: regexp_replace() expression with no starting offset #1700 (mbutrovich)
  • feat: Improve performance tracing feature #1730 (andygrove)
  • feat: Set/cancel with job tag and make max broadcast table size configurable #1693 (wForget)
  • feat: Add support for expm1 expression from datafusion-spark crate #1711 (andygrove)
  • feat: Add config option for showing all Comet plan transformations #1780 (andygrove)
  • feat: Support Type widening: byte → short/int/long, short → int/long #1770 (huaxingao)
  • feat: Translate Hadoop S3A configurations to object_store configurations #1817 (Kontinuation)
  • feat: Upgrade to official DataFusion 48.0.0 release #1877 (andygrove)
  • feat: Add experimental auto mode for COMET_PARQUET_SCAN_IMPL #1747 (andygrove)
  • feat: support RangePartitioning with native shuffle #1862 (mbutrovich)
  • feat: Add support for signum expression #1889 (andygrove)
  • feat: Add support to lookup map by key #1898 (comphead)
  • feat: support array_max #1892 (drexler-sky)
  • feat: pass ignore_nulls flag to first and last #1866 (rluvaton)
  • feat: Implement ToPrettyString #1921 (andygrove)
  • feat: Support hadoop s3a config in native_iceberg_compat #1925 (parthchandra)
  • feat: rand expression support #1199 (akupchinskiy)
  • feat: supports array_distinct #1923 (drexler-sky)
  • feat: auto scan mode should check for supported file location #1930 (andygrove)
  • feat: Encapsulate Parquet objects #1920 (huaxingao)
  • feat: Change default value of COMET_NATIVE_SCAN_IMPL to auto #1933 (andygrove)
  • feat: Supports array_union #1945 (drexler-sky)

Documentation updates:

  • docs: Add changelog for 0.8.0 #1675 (andygrove)
  • docs: Add instructions on running TPC-H on macOS #1647 (andygrove)
  • docs: Add documentation for accelerating Iceberg Parquet scans with Comet #1683 (andygrove)
  • docs: Add note on setting core.abbrev when generating diffs #1735 (andygrove)
  • docs: Remove outdated param in macos bench guide #1748 (ding-young)
  • docs: Add instructions for running i...

0.8.0

30 Apr 16:27

0.8.0 Pre-release

DataFusion Comet 0.8.0 Changelog

This release consists of 81 commits from 11 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: remove code duplication in native_datafusion and native_iceberg_compat implementations #1443 (parthchandra)
  • fix: Refactor CometScanRule and fix bugs #1483 (andygrove)
  • fix: check if handle has been initialized before closing #1554 (wForget)
  • fix: Taking slicing into account when writing BooleanBuffers as fast-encoding format #1522 (Kontinuation)
  • fix: isCometEnabled name conflict #1569 (kazuyukitanimura)
  • fix: make register_object_store use same session_env as file scan #1555 (wForget)
  • fix: adjust CometNativeScan's doCanonicalize and hashCode for AQE, use DataSourceScanExec trait #1578 (mbutrovich)
  • fix: corrected the logic of eliminating CometSparkToColumnarExec #1597 (wForget)
  • fix: avoid panic caused by close null handle of parquet reader #1604 (wForget)
  • fix: Make AQE capable of converting Comet shuffled joins to Comet broadcast hash joins #1605 (Kontinuation)
  • fix: Making shuffle files generated in native shuffle mode reclaimable #1568 (Kontinuation)
  • fix: Support per-task shuffle write rows and shuffle write time metrics #1617 (Kontinuation)
  • fix: Modify Spark SQL core 2 tests for native_datafusion reader, change 3.5.5 diff hash length to 11 #1641 (mbutrovich)
  • fix: fix spark/sql test failures in native_iceberg_compat #1593 (parthchandra)
  • fix: handle missing field correctly in native_iceberg_compat #1656 (parthchandra)
  • fix: better int96 support for experimental native scans #1652 (mbutrovich)
  • fix: respect ignoreNulls flag in first_value and last_value #1626 (andygrove)
  • fix: update row groups count in internal metrics accumulator #1658 (parthchandra)
  • fix: Shuffle should maintain insertion order #1660 (EmilyMatt)

Performance related:

  • perf: Use a global tokio runtime #1614 (andygrove)
  • perf: Respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config #1619 (andygrove)
  • perf: Experimental fix to avoid join strategy regression #1674 (andygrove)

Implemented enhancements:

  • feat: add read array support #1456 (comphead)
  • feat: introduce hadoop mini cluster to test native scan on hdfs #1556 (wForget)
  • feat: make parquet native scan schema case insensitive #1575 (wForget)
  • feat: enable iceberg compat tests, more tests for complex types #1550 (comphead)
  • feat: pushdown filter for native_iceberg_compat #1566 (wForget)
  • feat: Fix struct of arrays schema issue #1592 (comphead)
  • feat: adding more struct/arrays tests #1594 (comphead)
  • feat: respect batchSize/workerThreads/blockingThreads configurations for native_iceberg_compat scan #1587 (wForget)
  • feat: add MAP type support for first level #1603 (comphead)
  • feat: Add more tests for nested types combinations for native_datafusion #1632 (comphead)
  • feat: Override MapBuilder values field with expected schema #1643 (comphead)
  • feat: track unified memory pool #1651 (wForget)
  • feat: Add support for complex types in native shuffle #1655 (andygrove)

Documentation updates:

  • docs: Update configuration guide to show optional configs #1524 (andygrove)
  • docs: Add changelog for 0.7.0 release #1527 (andygrove)
  • docs: Use a shallow clone for Spark SQL test instructions #1547 (mbutrovich)
  • docs: Update benchmark results for 0.7.0 release #1548 (andygrove)
  • doc: Renew kubernetes.md #1549 (comphead)
  • docs: various improvements to tuning guide #1525 (andygrove)
  • docs: Update supported Spark versions #1580 (andygrove)
  • docs: change OSX/OS X to macOS #1584 (mbutrovich)
  • docs: docs for benchmarking in aws ec2 #1601 (andygrove)
  • docs: Update compatibility docs for new native scans #1657 (andygrove)
  • doc: Document local HDFS setup #1673 (comphead)

Other:

  • chore: fix issue in release process #1528 (andygrove)
  • chore: Remove all subdependencies #1514 (EmilyMatt)
  • chore: Drop support for Spark 3.3 (EOL) #1529 (andygrove)
  • chore: Prepare for 0.8.0 development #1530 (andygrove)
  • chore: Re-enable GitHub discussions #1535 (andygrove)
  • chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) #1534 (kazuyukitanimura)
  • build: Use unique name for surefire artifacts #1544 (andygrove)
  • chore: Update links for released version #1540 (andygrove)
  • chore: Enable Comet explicitly in CometTPCDSQueryTestSuite #1559 (andygrove)
  • chore: Fix some inconsistencies in memory pool configuration #1561 (andygrove)
  • upgraded spark 3.5.4 to 3.5.5 #1565 (YanivKunda)
  • minor: fix typo #1570 (wForget)
  • Chore: simplify array related functions impl #1490 (kazantsev-maksim)
  • added fallback using reflection for backward-compatibility #1573 (YanivKunda)
  • chore: Override node name for CometSparkToColumnar #1577 (l0kr)
  • chore: Reimplement ShuffleWriterExec using interleave_record_batch #1511 (Kontinuation)
  • chore: Run Comet tests for more Spark versions #1582 (andygrove)
  • Feat: support array_except function #1343 (kazantsev-maksim)
  • minor: Fix clippy warnings #1606 (Kontinuation)
  • chore: Remove some unwraps in hashing code #1600 (andygrove)
  • chore: Remove redundant shims for getFailOnError #1608 (andygrove)
  • chore: Making comet native operators write spill files to spark local dir #1581 (Kontinuation)
  • chore: Refactor QueryPlanSerde to use idiomatic Scala and red...

0.7.0

30 Apr 16:26

0.7.0 Pre-release

DataFusion Comet 0.7.0 Changelog

This release consists of 46 commits from 11 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation #1398 (andygrove)
  • fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans #1387 (mbutrovich)
  • fix: Remove more cast.rs logic from parquet_support.rs for experimental native scans #1413 (mbutrovich)
  • fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers #1415 (parthchandra)
  • fix: metrics tests for native_datafusion experimental native scan #1445 (mbutrovich)
  • fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests #1440 (andygrove)
  • fix: Executor memory overhead overriding #1462 (LukMRVC)
  • fix: Stop copying rust-toolchain to docker file #1475 (andygrove)
  • fix: PartitionBuffers should not have their own MemoryConsumer #1496 (EmilyMatt)
  • fix: enable full decimal to decimal support #1385 (himadripal)
  • fix: use common implementation of handling object store and hdfs urls for native_datafusion and native_iceberg_compat #1494 (parthchandra)
  • fix: Simplify CometShuffleMemoryAllocator logic, rename classes, remove config #1485 (mbutrovich)
  • fix: check overflow for decimal integral division #1512 (wForget)

Performance related:

  • perf: Update RewriteJoin logic to choose optimal build side #1424 (andygrove)
  • perf: Reduce native shuffle memory overhead by 50% #1452 (andygrove)

Implemented enhancements:

  • feat: CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics #1172 (mbutrovich)
  • feat: add experimental remote HDFS support for native DataFusion reader #1359 (comphead)
  • feat: add Win-amd64 profile #1410 (wForget)
  • feat: Support IntegralDivide function #1428 (wForget)
  • feat: Add div operator for fuzz testing and update expression doc #1464 (wForget)
  • feat: Upgrade to DataFusion 46.0.0-rc2 #1423 (andygrove)
  • feat: Add support for rpad #1470 (andygrove)
  • feat: Use official DataFusion 46.0.0 release #1484 (andygrove)

Documentation updates:

  • docs: Add changelog for 0.6.0 release #1402 (andygrove)
  • docs: Improve documentation for running stability plan tests #1469 (andygrove)

Other:

  • test: Add experimental native scans to CometReadBenchmark #1150 (mbutrovich)
  • chore: Prepare for 0.7.0 development #1404 (andygrove)
  • chore: Update released version in documentation #1418 (andygrove)
  • chore: Update protobuf to 3.25.5 #1434 (kazuyukitanimura)
  • chore: Update guava to 33.2.1-jre #1435 (kazuyukitanimura)
  • test: Register Spark-compatible expressions with a DataFusion context #1432 (viczsaurav)
  • chore: fixes for kube build #1421 (comphead)
  • build: pin machete to version 0.7.0 #1444 (andygrove)
  • chore: Re-organize shuffle writer code #1439 (andygrove)
  • chore: faster maven mirror #1447 (comphead)
  • build: Use stable channel in rust-toolchain #1465 (andygrove)
  • Feat: support array_compact function #1321 (kazantsev-maksim)
  • chore: Upgrade to Spark 3.5.4 #1471 (andygrove)
  • chore: Enable CI checks for native_datafusion scan #1479 (andygrove)
  • chore: Add native_iceberg_compat CI checks #1487 (andygrove)
  • chore: Stop disabling readside padding in TPC stability suite #1491 (andygrove)
  • chore: Remove num partitions from repartitioner #1498 (EmilyMatt)
  • test: fix Spark 3.5 tests #1482 (kazuyukitanimura)
  • minor: Remove hard-coded config default #1503 (andygrove)
  • chore: Use DataFusion's existing empty stream #1517 (EmilyMatt)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    20	Andy Grove
     6	Matt Butrovich
     4	Zhen Wang
     3	Emily Matheys
     3	KAZUYUKI TANIMURA
     3	Oleks V
     2	Himadri Pal
     2	Parth Chandra
     1	Kazantsev Maksim
     1	Lukas Moravec
     1	Saurav Verma

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

0.6.0

17 Feb 19:25

0.6.0 Pre-release

Full Changelog: 0.5.0...0.6.0

0.5.0

17 Jan 19:06

0.5.0 Pre-release

DataFusion Comet 0.5.0 Changelog

This release consists of 69 commits from 15 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Unsigned type related bugs #1095 (kazuyukitanimura)
  • fix: Use RDD partition index #1112 (viirya)
  • fix: Various metrics bug fixes and improvements #1111 (andygrove)
  • fix: Don't create CometScanExec for subclasses of ParquetFileFormat #1129 (Kimahriman)
  • fix: Fix metrics regressions #1132 (andygrove)
  • fix: Enable scenarios accidentally commented out in CometExecBenchmark #1151 (mbutrovich)
  • fix: Spark 4.0-preview1 SPARK-47120 #1156 (kazuyukitanimura)
  • fix: Document enabling comet explain plan usage in Spark (4.0) #1176 (parthchandra)
  • fix: stddev_pop should not directly return 0.0 when count is 1.0 #1184 (viirya)
  • fix: fix missing explanation for then branch in case when #1200 (rluvaton)
  • fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates #1253 (andygrove)
  • fix: Fall back to Spark for distinct aggregates #1262 (andygrove)
  • fix: disable initCap by default #1276 (kazuyukitanimura)

Performance related:

  • perf: Stop passing Java config map into native createPlan #1101 (andygrove)
  • feat: Make native shuffle compression configurable and respect spark.shuffle.compress #1185 (andygrove)
  • perf: Improve query planning to more reliably fall back to columnar shuffle when native shuffle is not supported #1209 (andygrove)
  • feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support #1192 (andygrove)
  • feat: Implement custom RecordBatch serde for shuffle for improved performance #1190 (andygrove)

Implemented enhancements:

  • feat: support array_insert #1073 (SemyonSinchenko)
  • feat: enable decimal to decimal cast of different precision and scale #1086 (himadripal)
  • feat: Improve ScanExec native metrics #1133 (andygrove)
  • feat: Add Spark-compatible implementation of SchemaAdapterFactory #1169 (andygrove)
  • feat: Improve shuffle metrics (second attempt) #1175 (andygrove)
  • feat: Add a spark.comet.exec.memoryPool configuration for experimenting with various datafusion memory pool setups. #1021 (Kontinuation)
  • feat: Reenable tests for filtered SMJ anti join #1211 (comphead)
  • feat: add support for array_remove expression #1179 (jatin510)

Documentation updates:

  • docs: Update documentation for 0.4.0 release #1096 (andygrove)
  • docs: Fix readme typo FGPA -> FPGA #1117 (gstvg)
  • docs: Add more technical detail and new diagram to Comet plugin overview #1119 (andygrove)
  • docs: Add some documentation explaining how shuffle works #1148 (andygrove)
  • docs: Update TPC-H benchmark results #1257 (andygrove)

Other:

  • chore: Add changelog for 0.4.0 #1089 (andygrove)
  • chore: Prepare for 0.5.0 development #1090 (andygrove)
  • build: Skip installation of spark-integration and fuzz testing modules #1091 (parthchandra)
  • minor: Add hint for finding the GPG key to use when publishing to maven #1093 (andygrove)
  • chore: Include first ScanExec batch in metrics #1105 (andygrove)
  • chore: Improve CometScan metrics #1100 (andygrove)
  • chore: Add custom metric for native shuffle fetching batches from JVM #1108 (andygrove)
  • chore: Remove unused StringView struct #1143 (andygrove)
  • test: enable more Spark 4.0 tests #1145 (kazuyukitanimura)
  • chore: Refactor cast to use SparkCastOptions param #1146 (andygrove)
  • chore: Move more expressions from core crate to spark-expr crate #1152 (andygrove)
  • chore: Remove dead code #1155 (andygrove)
  • chore: Move string kernels and expressions to spark-expr crate #1164 (andygrove)
  • chore: Move remaining expressions to spark-expr crate + some minor refactoring #1165 (andygrove)
  • chore: Add ignored tests for reading complex types from Parquet #1167 (andygrove)
  • test: enabling Spark tests with offHeap requirement #1177 (kazuyukitanimura)
  • minor: move shuffle classes from common to spark #1193 (andygrove)
  • minor: refactor to move decodeBatches to broadcast exchange code as private function #1195 (andygrove)
  • minor: refactor prepare_output so that it does not require an ExecutionContext #1194 (andygrove)
  • minor: remove unused source files #1202 (andygrove)
  • chore: Upgrade to DataFusion 44.0.0-rc2 #1154 (andygrove)
  • chore: Add safety check to CometBuffer #1050 (viirya)
  • chore: Remove unreachable code #1213 (andygrove)
  • test: Enable Comet by default except some tests in SparkSessionExtensionSuite #1201 (kazuyukitanimura)
  • chore: extract struct expressions to folders based on spark grouping #1216 (rluvaton)
  • chore: extract static invoke expressions to folders based on spark grouping #1217 (rluvaton)
  • chore: Follow-on PR to fully enable onheap memory usage #1210 (andygrove)
  • chore: extract agg_funcs expressions to folders based on spark grouping #1224 (rluvaton)
  • chore: extract datetime_funcs expressions to folders based on spark grouping #1222 (rluvaton)
  • chore: Upgrade to DataFusion 44.0.0 from 44.0.0 RC2 #1232 (rluvaton)
  • chore: extract strings file to strings_func like in spark grouping #1215 (rluvaton)
  • chore: extract predicate_functions expressions to folders based on spark grouping #1218 (rluvaton)
  • build(deps): bump protobuf version to 3.21.12 #1234 (wForget)
  • chore: extract json_funcs expressions to folders based on spark grouping #1220 (rluvaton)
  • test: Enable shuffle by default in Spark tests #1240 (kazuyukitanimura)
  • chore: extract hash_funcs expressions to folders based on spark grouping #1221 (rluvaton)
  • build: Fix test failure caused by merging conflicting PRs #1259 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    37	Andy Grove
    10	Raz Luvaton
     7	KAZUYUKI TANIMURA
     3	Liang-Chi Hsieh
     2	Parth Chandra
     1	Adam Binford
     1	Dharan Aditya
     1	Himadri Pal
     1	Jagdish Parihar
     1	Kristin Cowalcijk
     1	Matt Butrovich
     1	Oleks V
     1	Sem
     1	Zhen Wang
     1	gstvg

Thank you also to everyone who contributed ...
