
Conversation

Collaborator

@haritamar haritamar commented Oct 8, 2025

Summary by CodeRabbit

  • New Features

    • Added a replace-table-data operation for Spark and Dremio that truncates then inserts in chunks (non-atomic).
    • Default configuration now includes a cache_artifacts flag (disabled by default).
  • Performance

    • Improved reliability for large data inserts via chunked execution.
  • Tests

    • Tests updated to exercise artifact caching and verify artifact updates across consecutive runs.


linear bot commented Oct 8, 2025

Contributor

github-actions bot commented Oct 8, 2025

👋 @haritamar
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.


coderabbitai bot commented Oct 8, 2025

Walkthrough

Enables artifact caching in two integration tests by setting cache_artifacts = True; adds a cache_artifacts: false default to the system config; and introduces engine-specific replace_table_data macros for Spark and Dremio that truncate target tables and insert rows in chunked, non-atomic batches.
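The truncate-then-chunked-insert flow described above can be sketched in plain Python. This is illustrative only — the actual implementation is a dbt Jinja macro, and the `execute` callback and value formatting here are assumptions, not the macro's real API:

```python
# Illustrative sketch of the non-atomic replace-table-data flow:
# truncate first, then insert rows in fixed-size batches.

def chunked(rows, chunk_size):
    """Split rows into consecutive batches of at most chunk_size."""
    for i in range(0, len(rows), chunk_size):
        yield rows[i:i + chunk_size]

def replace_table_data(execute, relation, rows, chunk_size=100):
    # Step 1: clear the target table. This is NOT rolled back if a
    # later insert fails, which is why the operation is non-atomic.
    execute(f"TRUNCATE TABLE {relation}")
    # Step 2: insert rows chunk by chunk; each INSERT is its own statement.
    for chunk in chunked(rows, chunk_size):
        values = ", ".join(
            "(" + ", ".join(repr(v) for v in row) + ")" for row in chunk
        )
        execute(f"INSERT INTO {relation} VALUES {values}")
```

For example, replacing a three-row table with `chunk_size=2` issues one `TRUNCATE` followed by two `INSERT` statements; with an empty row list, only the `TRUNCATE` runs and the insert loop is skipped, matching the "no rows" branch in the diagram below.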

Changes

Tests: enable artifact caching
  • integration_tests/tests/test_dbt_artifacts/test_artifacts.py — Set dbt_project.dbt_runner.vars["cache_artifacts"] = True in two tests to exercise artifact caching across consecutive runs.

Default config update
  • macros/edr/system/system_utils/get_config_var.sql — dremio__get_default_config() now includes 'cache_artifacts': false alongside 'dbt_artifacts_chunk_size': 100.

Engine-specific table replace macros
  • macros/utils/table_operations/replace_table_data.sql — Added spark__replace_table_data(relation, rows) and dremio__replace_table_data(relation, rows), which TRUNCATE the target relation and then INSERT rows in chunked, non-atomic batches; Spark macro formatting aligned.
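The cache_artifacts: false default added to get_config_var.sql follows the usual precedence rule in which project vars override built-in defaults — which is how the updated tests flip caching on. A minimal Python sketch of that lookup order (the real get_config_var is a Jinja macro; this only mirrors the precedence, and the dict shapes are assumptions):

```python
# Defaults mirroring the dremio__get_default_config() entries named above.
DEFAULTS = {"dbt_artifacts_chunk_size": 100, "cache_artifacts": False}

def get_config_var(name, project_vars):
    # A var set on the project (as the updated tests do with
    # vars["cache_artifacts"] = True) takes precedence over the default.
    return project_vars.get(name, DEFAULTS.get(name))
```

With no project override, `get_config_var("cache_artifacts", {})` yields the new `False` default; a test that sets `{"cache_artifacts": True}` gets caching enabled.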

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant TestRunner
  participant Macro as replace_table_data (spark/dremio)
  participant DB as Target Database

  TestRunner->>Macro: call replace_table_data(relation, rows)
  Note over Macro: Engine-specific path (Spark or Dremio)
  Macro->>DB: TRUNCATE relation
  alt rows exist
    loop per chunk
      Macro->>DB: INSERT chunk (non-atomic)
    end
  else no rows
    Note over Macro: skip inserts
  end
  Macro-->>TestRunner: return success

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • GuyEshdat

Poem

I hop through tables, trim and renew,
Cache in my pocket, artifacts too.
Chunk by chunk, I nibble the load,
Spark on the path, Dremio road.
Truncate, insert — a rabbit's small cheer. 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately highlights the Dremio replace_table_data fix, a significant part of the changeset, and is directly related to the PR's main objective.
  • Docstring Coverage — ✅ Passed: No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c9d2a47 and 75fa968.

📒 Files selected for processing (1)
  • macros/edr/system/system_utils/get_config_var.sql (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • macros/edr/system/system_utils/get_config_var.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (1.8.0, postgres) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (fusion, redshift) / test


@haritamar haritamar merged commit 57a8101 into master Oct 8, 2025
29 of 34 checks passed
@haritamar haritamar deleted the ele-5121-dremio-fix-for-replace_table_data branch October 8, 2025 10:49

3 participants