From ea198d939d5cc5937f0909aad089e974def43f28 Mon Sep 17 00:00:00 2001
From: hsiang-c
Date: Fri, 15 Aug 2025 22:20:20 -0700
Subject: [PATCH 1/4] Update confs to bypass Iceberg Spark issues

- Document current limitation
---
 docs/source/user-guide/iceberg.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/docs/source/user-guide/iceberg.md b/docs/source/user-guide/iceberg.md
index bc507e3a65..badd3ff451 100644
--- a/docs/source/user-guide/iceberg.md
+++ b/docs/source/user-guide/iceberg.md
@@ -80,11 +80,14 @@ $SPARK_HOME/bin/spark-shell \
     --conf spark.sql.catalog.spark_catalog.type=hadoop \
     --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/warehouse \
     --conf spark.plugins=org.apache.spark.CometPlugin \
-    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
+    --conf spark.comet.exec.shuffle.enabled=false \
     --conf spark.sql.iceberg.parquet.reader-type=COMET \
     --conf spark.comet.explainFallback.enabled=true \
     --conf spark.memory.offHeap.enabled=true \
-    --conf spark.memory.offHeap.size=2g
+    --conf spark.memory.offHeap.size=2g \
+    --conf spark.comet.use.lazyMaterialization=false \
+    --conf spark.comet.schemaEvolution.enabled=true \
+    --conf spark.comet.exec.broadcastExchange.enabled=false
 ```
 
 Create an Iceberg table. Note that Comet will not accelerate this part.
@@ -138,4 +141,10 @@ scala> spark.sql(s"SELECT * from t1").explain()
 == Physical Plan ==
 *(1) CometColumnarToRow
 +- CometBatchScan spark_catalog.default.t1[c0#26, c1#27] spark_catalog.default.t1 (branch=null) [filters=, groupedBy=] RuntimeFilters: []
-```
\ No newline at end of file
+```
+
+## Known issues
+ - We temporarily disable Comet when there are delete files in Iceberg scan, see Iceberg [1.8.1 diff](../../../dev/diffs/iceberg/1.8.1.diff)
+ - Iceberg scan w/ delete files lead to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2117) and [incorrect results](https://github.com/apache/datafusion-comet/issues/2118)
+ - Enabling `CometShuffleManager` leads to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2086)
+ - Enabling `CometBroadcastExchangeExec` leads to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2116)

From 4cf0b143ca61ae6e8fc7f22516b6af70953069aa Mon Sep 17 00:00:00 2001
From: hsiang-c <137842490+hsiang-c@users.noreply.github.com>
Date: Mon, 18 Aug 2025 09:24:52 -0700
Subject: [PATCH 2/4] Update iceberg.md

---
 docs/source/user-guide/iceberg.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/user-guide/iceberg.md b/docs/source/user-guide/iceberg.md
index badd3ff451..e21cf26138 100644
--- a/docs/source/user-guide/iceberg.md
+++ b/docs/source/user-guide/iceberg.md
@@ -144,7 +144,7 @@ scala> spark.sql(s"SELECT * from t1").explain()
 ```
 
 ## Known issues
- - We temporarily disable Comet when there are delete files in Iceberg scan, see Iceberg [1.8.1 diff](../../../dev/diffs/iceberg/1.8.1.diff)
+ - We temporarily disable Comet when there are delete files in Iceberg scan, see Iceberg [1.8.1 diff](../../../dev/diffs/iceberg/1.8.1.diff) and this [PR](https://github.com/apache/iceberg/pull/13793)
  - Iceberg scan w/ delete files lead to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2117) and [incorrect results](https://github.com/apache/datafusion-comet/issues/2118)
  - Enabling `CometShuffleManager` leads to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2086)
  - Enabling `CometBroadcastExchangeExec` leads to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2116)

From 47f85a33bf4e1e8c273afbd15378269adc88294a Mon Sep 17 00:00:00 2001
From: hsiang-c
Date: Mon, 18 Aug 2025 15:00:25 -0700
Subject: [PATCH 3/4] Users can disable Spark's AQE as well

---
 docs/source/user-guide/iceberg.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/source/user-guide/iceberg.md b/docs/source/user-guide/iceberg.md
index e21cf26138..977432480a 100644
--- a/docs/source/user-guide/iceberg.md
+++ b/docs/source/user-guide/iceberg.md
@@ -87,7 +87,7 @@ $SPARK_HOME/bin/spark-shell \
     --conf spark.memory.offHeap.size=2g \
     --conf spark.comet.use.lazyMaterialization=false \
     --conf spark.comet.schemaEvolution.enabled=true \
-    --conf spark.comet.exec.broadcastExchange.enabled=false
+    --conf spark.sql.adaptive.enabled=false # or spark.comet.exec.broadcastExchange.enabled=false, see `Known issues`
 ```
 
 Create an Iceberg table. Note that Comet will not accelerate this part.
@@ -147,4 +147,5 @@ scala> spark.sql(s"SELECT * from t1").explain()
  - We temporarily disable Comet when there are delete files in Iceberg scan, see Iceberg [1.8.1 diff](../../../dev/diffs/iceberg/1.8.1.diff) and this [PR](https://github.com/apache/iceberg/pull/13793)
  - Iceberg scan w/ delete files lead to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2117) and [incorrect results](https://github.com/apache/datafusion-comet/issues/2118)
  - Enabling `CometShuffleManager` leads to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2086)
- - Enabling `CometBroadcastExchangeExec` leads to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2116)
+ - Spark Runtime Filtering isn't [working](https://github.com/apache/datafusion-comet/issues/2116)
+ - You can bypass the issue by either setting `spark.sql.adaptive.enabled=false` or `spark.comet.exec.broadcastExchange.enabled=false`

From 63d9b7139c16448e7e89e5a683038de234665644 Mon Sep 17 00:00:00 2001
From: hsiang-c
Date: Mon, 18 Aug 2025 15:37:55 -0700
Subject: [PATCH 4/4] Let users turn off AQE or Comet's broadcastExchange

---
 docs/source/user-guide/iceberg.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/source/user-guide/iceberg.md b/docs/source/user-guide/iceberg.md
index 977432480a..af6efc7f7b 100644
--- a/docs/source/user-guide/iceberg.md
+++ b/docs/source/user-guide/iceberg.md
@@ -86,8 +86,7 @@ $SPARK_HOME/bin/spark-shell \
     --conf spark.memory.offHeap.enabled=true \
     --conf spark.memory.offHeap.size=2g \
     --conf spark.comet.use.lazyMaterialization=false \
-    --conf spark.comet.schemaEvolution.enabled=true \
-    --conf spark.sql.adaptive.enabled=false # or spark.comet.exec.broadcastExchange.enabled=false, see `Known issues`
+    --conf spark.comet.schemaEvolution.enabled=true
 ```
 
 Create an Iceberg table. Note that Comet will not accelerate this part.
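
Taken together, the four patches reduce to a short list of `--conf` flags on the `spark-shell` example. The snippet below is a minimal POSIX `sh` sketch of that net effect and is not part of the series itself. It assumes the other options of the iceberg.md example (jars, catalog settings and so on) are passed unchanged, and it only collects and prints the flags rather than launching Spark. The optional line mirrors the Known issues entry, which offers either `spark.sql.adaptive.enabled=false` or `spark.comet.exec.broadcastExchange.enabled=false` as a workaround.

```shell
#!/bin/sh
# Sketch (not part of the patch series): net --conf flags added by PATCH 1/4..4/4.
# Assumption: the remaining spark-shell options from iceberg.md are passed as before.
# PATCH 1/4 drops spark.shuffle.manager=...CometShuffleManager and adds these flags.
set -- \
  --conf spark.comet.exec.shuffle.enabled=false \
  --conf spark.comet.use.lazyMaterialization=false \
  --conf spark.comet.schemaEvolution.enabled=true

# Optional runtime-filtering workaround from "Known issues": keep only ONE of these.
set -- "$@" --conf spark.sql.adaptive.enabled=false
# set -- "$@" --conf spark.comet.exec.broadcastExchange.enabled=false

# Print one argument per line; a real launch would pass "$@" on to spark-shell.
printf '%s\n' "$@"
```

In a real launch script the collected flags would be appended as `"$SPARK_HOME/bin/spark-shell" <unchanged options> "$@"`. Building the list with `set --` keeps each flag a separate, correctly quoted argument without relying on bash-only arrays.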