-
Notifications
You must be signed in to change notification settings - Fork 121
Make Elementary work with ClickHouse cluster mode #878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a dispatchable macro for per-model query settings and integrates it into ClickHouse-specific SQL generation for delete, insert, and insert-rows; also copies two boolean properties into ClickHouse temporary relation creation. Changes
Sequence Diagram(s)sequenceDiagram
participant Macro as caller macro
participant GetQS as get_query_settings()
participant Adapter as adapter.dispatch
participant ClickHouse as clickhouse__get_query_settings
participant Default as default__get_query_settings
participant SQLGen as SQL generation macros
Macro->>GetQS: call get_query_settings(model)
GetQS->>Adapter: adapter.dispatch("get_query_settings", adapter_name)(model)
alt adapter = clickhouse
Adapter->>ClickHouse: invoke clickhouse__get_query_settings(model)
ClickHouse-->>GetQS: return model-specific settings
else other adapters
Adapter->>Default: invoke default__get_query_settings(model)
Default-->>GetQS: return ""
end
GetQS-->>Macro: settings string
note right of SQLGen: Integration points
Macro->>SQLGen: build DELETE/INSERT/INSERT_ROWS SQL
SQLGen->>GetQS: retrieve settings
SQLGen-->>Macro: emit SQL with settings injected
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 @StevenReitsma |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
macros/utils/cross_db_utils/get_query_settings.sql(1 hunks)macros/utils/table_operations/delete_and_insert.sql(1 hunks)macros/utils/table_operations/insert_rows.sql(1 hunks)macros/utils/table_operations/make_temp_relation.sql(1 hunks)
🔇 Additional comments (3)
macros/utils/table_operations/make_temp_relation.sql (1)
65-67: Correctly propagates ClickHouse cluster metadata to temporary relations.The addition of
can_on_clusterandcan_exchangeproperties ensures temporary tables preserve cluster-related attributes needed for ClickHouse distributed operations. This directly addresses the first issue mentioned in the PR description.macros/utils/table_operations/insert_rows.sql (1)
66-68: Injecting query settings for adapter-specific behavior.The placement of
{{ elementary.get_query_settings() }}between the column list and VALUES clause is correct. For non-ClickHouse adapters,default__get_query_settings()returns an empty string, so there's no impact. The injection happens inbase_insert_query, which is reused consistently across chunked inserts.Note: This relies on the global
modelcontext variable being available during macro execution, which is standard in dbt model runs. Verify this pattern holds if these macros are called outside typical model execution contexts.macros/utils/cross_db_utils/get_query_settings.sql (1)
1-11: Establishes dispatch pattern for adapter-specific query settings.The macro structure is sound: dispatches to adapter-specific implementations with a sensible default (empty string). The ClickHouse variant delegates to
adapter.get_model_query_settings(model).Verify that
adapter.get_model_query_settings()is a documented and real method available on all adapters used by Elementary. If this method doesn't exist universally, you may need a defensive fallback or adapter-specific guards.
| or {{ delete_column_key }} in (select {{ delete_column_key }} from {{ delete_relation }}) | ||
| {{ adapter.get_model_query_settings(model) }}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Two issues here:
-
Inconsistent pattern: This file directly calls
adapter.get_model_query_settings(model)whileinsert_rows.sqluses the dispatch macroelementary.get_query_settings(). Both target the same goal (injecting ClickHouse SETTINGS) but through different mechanisms. Standardize on one approach—the dispatch pattern is preferable for extensibility. -
SQL syntax concern (line 72): The insert query places the SETTINGS clause between
INSERT INTOandSELECT:insert into {{ relation }} {{ adapter.get_model_query_settings(model) }} select * from {{ insert_relation }};ClickHouse typically expects
SETTINGSat the very end of the query. Verify this syntax is valid for ClickHouseINSERT...SELECT...SETTINGSstatements. If not, move the SETTINGS clause to the end of the query.
Also applies to: 72-72
I'll verify the concerns in this review comment by examining the actual files and checking ClickHouse syntax.
🏁 Script executed:
#!/bin/bash
# Locate and inspect the SQL files mentioned
echo "=== Locating delete_and_insert.sql ==="
fd -type f -name "delete_and_insert.sql"
echo -e "\n=== Locating insert_rows.sql ==="
fd -type f -name "insert_rows.sql"Length of output: 414
🌐 Web query:
ClickHouse INSERT SELECT SETTINGS clause syntax location
💡 Result:
Put the SETTINGS clause immediately after the table name and optional column list (i.e., before VALUES, FORMAT, or SELECT).
You can also set query-level options inside the SELECT itself using its own SETTINGS clause. [1][2]
Sources:
[1] ClickHouse INSERT INTO statement (syntax and note about SETTINGS placement).
[2] ClickHouse SELECT — SETTINGS in SELECT queries.
Let me correct the shell script and inspect both files:
🏁 Script executed:
#!/bin/bash
# Find and inspect both SQL files
echo "=== Finding delete_and_insert.sql ==="
find . -name "delete_and_insert.sql" -type f 2>/dev/null | head -5
echo -e "\n=== Finding insert_rows.sql ==="
find . -name "insert_rows.sql" -type f 2>/dev/null | head -5
echo -e "\n=== Content of delete_and_insert.sql (around lines 64-72) ==="
delete_file=$(find . -name "delete_and_insert.sql" -type f 2>/dev/null | head -1)
if [ -n "$delete_file" ]; then
sed -n '60,75p' "$delete_file" | cat -n
fi
echo -e "\n=== Content of insert_rows.sql (full file for inspection) ==="
insert_file=$(find . -name "insert_rows.sql" -type f 2>/dev/null | head -1)
if [ -n "$insert_file" ]; then
cat "$insert_file" | cat -n
fi
echo -e "\n=== Search for get_query_settings macro definition ==="
rg -n "get_query_settings" --type sqlLength of output: 11002
Let me examine the macro implementations to verify if the inconsistency claim is valid and check the dispatch patterns:
🏁 Script executed:
#!/bin/bash
# Inspect get_query_settings macro definition and implementations
echo "=== get_query_settings macro definitions ==="
cat ./macros/utils/cross_db_utils/get_query_settings.sql
echo -e "\n=== Search for get_model_query_settings usage and definition ==="
rg -n "get_model_query_settings" -C 3
echo -e "\n=== Check if get_query_settings macro is used elsewhere ==="
rg -n "elementary.get_query_settings\|get_query_settings" --type sql -C 2 | head -50Length of output: 2290
Standardize on dispatch macro pattern for query settings.
This file directly calls adapter.get_model_query_settings(model) (lines 65, 72) while insert_rows.sql uses the dispatch macro elementary.get_query_settings(). Both target the same goal for ClickHouse, but the dispatch pattern (used in insert_rows.sql) is preferable—it provides adapter-agnostic abstraction and returns an empty string for non-ClickHouse adapters, while direct calls may fail on unsupported adapters.
Replace adapter.get_model_query_settings(model) with elementary.get_query_settings() to match the pattern used elsewhere in the codebase.
🤖 Prompt for AI Agents
In macros/utils/table_operations/delete_and_insert.sql around lines 64-65 (and
also update the similar call at line 72), replace the direct call to
adapter.get_model_query_settings(model) with the dispatch macro
elementary.get_query_settings() so the code uses the adapter-agnostic pattern
used in insert_rows.sql; update both occurrences accordingly to call
elementary.get_query_settings() with the same model argument so ClickHouse gets
its settings and other adapters safely receive an empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see this as an issue since this code is already in ClickHouse-adapter specific code. But can change if wanted from a human reviewer :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StevenReitsma You're absolutely right! I apologize for the overly prescriptive suggestion.
The key distinction is that your code is in the clickhouse__get_delete_and_insert_queries macro, which is already adapter-specific. In adapter-specific macros, directly calling adapter methods like adapter.get_model_query_settings(model) is perfectly reasonable and arguably more direct.
The dispatch pattern (elementary.get_query_settings()) is most valuable in generic/cross-adapter code where you need a single implementation to work across multiple database types. In insert_rows.sql, the dispatch is used in the base get_insert_rows_queries macro (not an adapter override), which is why the pattern makes sense there.
Your current approach is clean and appropriate for adapter-specific code. No changes needed here! 👍
|
Hey @StevenReitsma ! Thanks a lot for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
macros/utils/cross_db_utils/get_query_settings.sql (1)
1-11: Add documentation explaining the purpose and ClickHouse-specific context.The macro lacks inline comments explaining its purpose. Given that this is adapter-specific functionality addressing a ClickHouse cluster requirement (per PR description), adding a brief docstring would clarify intent for future maintainers.
+{# + Return adapter-specific query settings for model execution. + ClickHouse adapter uses this to inject SETTINGS clause required for cluster-mode compatibility. + Other adapters return empty string (no-op). +#} {% macro get_query_settings() %}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
macros/utils/cross_db_utils/get_query_settings.sql(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-05T16:25:01.719Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/cross_db_utils/incremental_strategy.sql:13-15
Timestamp: 2025-10-05T16:25:01.719Z
Learning: In macros/utils/cross_db_utils/incremental_strategy.sql, the get_default_incremental_strategy() macro and its adapter-specific variants are used exclusively for Elementary's internal dbt artifact tables, all of which have unique keys defined. Therefore, defaulting to "merge" strategy for Redshift (and other adapters) is safe in this context.
Applied to files:
macros/utils/cross_db_utils/get_query_settings.sql
🔇 Additional comments (1)
macros/utils/cross_db_utils/get_query_settings.sql (1)
1-11: Address incomplete macro integration and variable scoping inget_query_settingsmacro.Two issues were identified during verification:
Incomplete Integration:
delete_and_insert.sqlstill uses directadapter.get_model_query_settings(model)calls instead of the newelementary.get_query_settings()macro. This creates inconsistency—insert_rows.sqlcorrectly uses the macro, butdelete_and_insert.sqldoes not. Both should use the macro for consistency and to centralize adapter-specific logic.Variable Scoping Concern: The
get_query_settingsmacro references themodelvariable without explicit parameter passing. While this works in dbt's execution context, it relies on implicit global scope. The macro should either acceptmodelas an explicit parameter or include a safeguard for cases wheremodelmay be undefined.Recommended Actions:
- Update
delete_and_insert.sqlto call{{ elementary.get_query_settings() }}instead of direct adapter calls (matching the pattern ininsert_rows.sql)- Consider making
modelan explicit parameter toget_query_settings()for robustness, or add defensive checks⛔ Skipped due to learnings
Learnt from: haritamar Repo: elementary-data/dbt-data-reliability PR: 825 File: macros/utils/cross_db_utils/incremental_strategy.sql:13-15 Timestamp: 2025-10-05T16:25:01.719Z Learning: In macros/utils/cross_db_utils/incremental_strategy.sql, the get_default_incremental_strategy() macro and its adapter-specific variants are used exclusively for Elementary's internal dbt artifact tables, all of which have unique keys defined. Therefore, defaulting to "merge" strategy for Redshift (and other adapters) is safe in this context.Learnt from: GuyEshdat Repo: elementary-data/dbt-data-reliability PR: 838 File: macros/utils/table_operations/insert_rows.sql:156-0 Timestamp: 2025-08-10T11:28:43.632Z Learning: In the Elementary dbt-data-reliability codebase, the `dremio__escape_special_chars` macro in `macros/utils/table_operations/insert_rows.sql` receives strings where single quotes are already escaped as `\'` from upstream processing. The macro correctly converts these to Dremio's SQL escaping format by replacing `\'` with `''`. This is different from other database implementations that handle bare single quotes directly.
|
@NoyaOffer any news? |
|
I merged the PR. It should be included in the next release. |
|
Awesome, thanks! |
These changes make sure Elementary works successfully on ClickHouse clusters (i.e. non-single node setups that are self-hosted, so not on ClickHouse Cloud).
It fixes two separate issues:
can_on_clusterandcan_exchangebooleans were not passed through to temp tables.SETTINGSclause was not inserted in some queries. There might be more queries where this clause is missing, but the ones I added are the most important for clustered setups.I'm not too familiar with adapter-specific dbt macro development, so I'm not sure if what I built is the way to go. However, I can confirm that Elementary now works successfully on a clustered ClickHouse setup.
Summary by CodeRabbit
Release Notes