
Conversation

Contributor

@arbiv arbiv commented Oct 20, 2025

  • Add optional exclude_detection_period_from_training parameter to test_volume_anomalies.sql
  • Pass parameter through test_table_anomalies.sql to get_anomalies_test_configuration.sql
  • Add parameter handling and configuration in get_anomalies_test_configuration.sql
  • Implement core exclusion logic in get_anomaly_scores_query.sql (see the sketch after this list):
    • Calculate detection_period_start boundary based on backfill_days
    • Add is_detection_period flag to grouped_metrics CTE
    • Exclude detection period metrics from time_window_aggregation training
  • Flag defaults to false to preserve backward compatibility
  • Enables clean separation between training and detection data for improved anomaly detection accuracy
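
A minimal Jinja-SQL sketch of the exclusion boundary and per-bucket flag described above, assembled from the snippets quoted later in this thread; detection_period_start_expr stands for the boundary rendered as a SQL timestamp literal, and the surrounding columns are elided:

    {# Sketch only: the boundary is the last backfill_days before detection_end #}
    {%- if test_configuration.exclude_detection_period_from_training %}
      {%- set detection_period_start = detection_end - modules.datetime.timedelta(days=test_configuration.backfill_days) %}
    {%- endif %}

    -- inside the grouped_metrics CTE: flag buckets that fall within the detection window
    {%- if test_configuration.exclude_detection_period_from_training %}
      bucket_end > {{ detection_period_start_expr }}
    {%- else %}
      FALSE
    {%- endif %} as is_detection_period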

Summary by CodeRabbit

  • New Features

    • Added an option to exclude the detection period from model training; anomaly scoring and training statistics now honor this setting.
    • Added Redshift support for multi-column membership checks in cross-database queries.
  • Tests

    • Added integration tests covering behavior with and without detection-period exclusion and updated test configuration to include the new option.


linear bot commented Oct 20, 2025

@github-actions
Contributor

👋 @arbiv
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.


coderabbitai bot commented Oct 20, 2025

Walkthrough

Adds an exclude_detection_period_from_training flag and threads it through the tests, the test macros, the anomaly scoring SQL, and a Redshift multi-value IN utility. When enabled, the flag excludes the detection period from the training aggregates; the exclusion flag and its derived field(s) are exposed in the outputs.

Changes

  • Test expectation update (integration_tests/tests/test_anomaly_test_configuration.py): Adds exclude_detection_period_from_training: None to the expected anomaly configuration mapping.
  • New/updated tests (integration_tests/tests/test_volume_anomalies.py): Adds test_exclude_detection_from_training to exercise include/exclude scenarios (skipped on ClickHouse); toggles exclude_detection_period_from_training and asserts differing outcomes.
  • Core anomaly scoring query (macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql): Adds has_seasonality and dynamic partition_by_keys; computes detection_period_start[_expr] when exclusion is enabled; introduces should_exclude_from_training; conditions the training aggregates (training_avg, training_stddev, training_set_size, training_start, training_end) to ignore flagged rows; updates the GROUP BY and partitioning wiring and propagates the flag.
  • Test configuration macro (macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql): Extends the macro signature to accept exclude_detection_period_from_training, reads it via get_test_argument, and includes it in the returned test configuration dictionary.
  • Test macros propagation (macros/edr/tests/test_table_anomalies.sql, macros/edr/tests/test_volume_anomalies.sql): Updates the macros to accept and forward exclude_detection_period_from_training (default false) into downstream calls (elementary.test_table_anomalies / get_anomalies_test_configuration).
  • Adapter utility macro addition (macros/utils/cross_db_utils/multi_value_in.sql): Adds redshift__edr_multi_value_in(source_cols, target_cols, target_table) to emulate multi-column IN on Redshift by concatenating casted column values and comparing them against a matching concatenated subquery.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Test Runner
    participant VolumeMacro as test_volume_anomalies
    participant TableMacro as test_table_anomalies
    participant ConfigMacro as get_anomalies_test_configuration
    participant ScoreQuery as get_anomaly_scores_query

    Runner->>VolumeMacro: invoke with exclude_detection_period_from_training=(true|false)
    VolumeMacro->>TableMacro: forward exclude_detection_period_from_training
    TableMacro->>ConfigMacro: request config (includes exclusion flag)
    ConfigMacro-->>TableMacro: return config with exclude_detection_period_from_training
    TableMacro->>ScoreQuery: execute scoring with config

    rect rgb(235,245,235)
      Note over ScoreQuery: when exclude_detection_period_from_training = true
      ScoreQuery->>ScoreQuery: compute detection_period_start and expr
      ScoreQuery->>ScoreQuery: mark rows as should_exclude_from_training
      ScoreQuery->>ScoreQuery: aggregate training stats excluding flagged rows
    end

    ScoreQuery-->>TableMacro: return anomaly scores (includes should_exclude_from_training)
    TableMacro-->>VolumeMacro: return test results
    VolumeMacro-->>Runner: report pass/fail

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect conditional aggregation and NULL handling in macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (stddev, counts, min/max).
  • Verify GROUP BY and dynamic partitioning changes align with downstream consumers and do not break engines (especially Redshift/BigQuery).
  • Check test data construction and ClickHouse skip logic in integration_tests/tests/test_volume_anomalies.py.
  • Validate macro signature propagation and defaulting across macros/edr/tests/*.
  • Review redshift__edr_multi_value_in for correctness, SQL injection safety, and type/casting behavior.

Suggested reviewers

  • GuyEshdat

Poem

I nibble configs, ears alert and bright,
A flag to hide the noisy detection light.
Training skips the ruckus, quiet seeds take hold,
Scores hum softer, stories told.
Hop, patch, and test — a rabbit's grin of gold. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The pull request title "Add exclude_detection_period_from_training flag to volume_anomalies test" is directly related to the main change in the changeset. It clearly identifies the primary feature being introduced: a new optional boolean parameter added to the volume_anomalies test. While the implementation spans multiple files (macros and test files), the title appropriately focuses on the user-facing feature being exposed rather than internal implementation details. The title is concise, specific, and sufficiently clear that a teammate reviewing git history would understand the primary change without ambiguity.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch core-12-oss-anomalies-add-flag-to-exclude-test-data-from-training

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 842a255 and 961ad2a.

📒 Files selected for processing (1)
  • macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-23T11:01:31.528Z
Learnt from: arbiv
PR: elementary-data/dbt-data-reliability#877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.

Applied to files:

  • macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (fusion, bigquery) / test
🔇 Additional comments (7)
macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (7)

20-20: ✓ Correct has_seasonality flag initialization.

The flag correctly captures whether the seasonality configuration produces a computed expression or a constant, which properly gates the dynamic partition key construction.

Also applies to: 24-24, 28-28, 32-32


35-47: ✓ Well-documented dynamic partition key construction.

The comment clearly explains the Redshift limitation and the solution. By building partition keys dynamically and omitting the constant bucket_seasonality when has_seasonality = false, the code avoids the "constant expressions are not supported in partition by clauses" error while preserving intended behavior (partitioning by a constant has no effect anyway).
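
A hedged sketch of the dynamic partition-key construction being described; the column list and window frame here are assumptions, only the has_seasonality gating mirrors the comment above:

    {# Sketch: append bucket_seasonality to the partition keys only when it is a real expression #}
    {%- set partition_by_keys = ['metric_name', 'full_table_name', 'column_name', 'dimension', 'dimension_value'] %}
    {%- if has_seasonality %}
      {%- do partition_by_keys.append('bucket_seasonality') %}
    {%- endif %}

    avg(metric_value) over (
        partition by {{ partition_by_keys | join(', ') }}
        order by bucket_end asc
        rows between unbounded preceding and current row
    ) as training_avg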


53-62: ✓ Detection period calculation with documented assumptions.

The calculation correctly derives detection_period_start using backfill_days to define the window of recent data tested for anomalies. Based on learnings, backfill_days is a mandatory parameter validated by validate_mandatory_configuration, so no additional validation is needed here.


175-180: ✓ Correct detection period exclusion flag logic.

The should_exclude_from_training flag correctly identifies metrics within the detection period (where bucket_end > detection_period_start_expr). These metrics are then excluded from training aggregates to prevent contamination from potentially anomalous data.


203-208: ✓ Window functions correctly filter detection period from training aggregates.

The window function calculations use case when not should_exclude_from_training then metric_value end to exclude detection-period metrics from training_avg, training_stddev, training_set_size, training_start, and training_end. The dynamic partition_by_keys ensures compatibility with the Redshift constraint addressed in lines 35–47.
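
For reference, a sketch of the CASE-guarded window aggregates this comment describes; the partition and frame clauses are assumptions carried over from the sketch above, and the same pattern applies to the first_value/last_value training bounds:

    -- Detection-period rows yield NULL inside the aggregate, so they do not feed the
    -- training statistics but are still emitted as rows that can be scored
    avg(case when not should_exclude_from_training then metric_value end) over (
        partition by {{ partition_by_keys | join(', ') }}
        order by bucket_end asc
        rows between unbounded preceding and current row
    ) as training_avg,
    stddev(case when not should_exclude_from_training then metric_value end) over (
        partition by {{ partition_by_keys | join(', ') }}
        order by bucket_end asc
        rows between unbounded preceding and current row
    ) as training_stddev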


211-211: ✓ Group-by count correctly updated to 14.

The change from group_by(13) to group_by(14) accounts for the addition of the should_exclude_from_training field in the time_window_aggregation CTE.


216-257: Consider exposing should_exclude_from_training in the final output.

The PR objectives mention "exposing the exclusion flag and derived field(s) in outputs," but the should_exclude_from_training flag is calculated and used in grouped_metrics and time_window_aggregation yet is not included in the final anomaly_scores select statement (lines 216–257). Verify whether this flag should be propagated to the output for debugging or downstream consumption, or if its absence is intentional.
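
If the flag were to be surfaced, the change would plausibly be a small addition to the final select, along these lines (column list abbreviated; not part of this PR):

    select
        metric_id,
        metric_value,
        training_avg,
        training_stddev,
        should_exclude_from_training  -- exposed for debugging / downstream consumers
    from time_window_aggregation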


Comment @coderabbitai help to get the list of available commands and usage tips.

devin-ai-integration bot and others added 9 commits October 21, 2025 05:54
- Change anomalous data from 500 to 110 rows/day (10% increase instead of 5x)
- Change sensitivity from 3 to 10 to demonstrate masking effect
- Fix parameter name from anomaly_sensitivity to sensitivity
- This ensures test passes when anomaly is included in training and fails when excluded

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Add parameter to test_freshness_anomalies.sql and test_event_freshness_anomalies.sql
- Add test case for freshness anomalies with exclude_detection_period_from_training flag
- Add test case for event freshness anomalies with exclude_detection_period_from_training flag
- Tests validate that anomalies are missed when included in training and detected when excluded

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Change volume anomalies from 110 to 150 rows/day (50% increase)
- Change freshness anomalies from 11 to 15 minutes (50% slower)
- Change event freshness anomalies from 1.1 to 1.5 hours lag (50% slower)
- Reduce sensitivity from 10 to 3 for all tests
- This ensures anomalies are detected when excluded from training

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…cted config and fix date overlap

- Add exclude_detection_period_from_training: None to test_anomaly_test_configuration expected config
- Fix date overlap in test_exclude_detection_period_from_training_use_case (days 37-8 for training, days 7-1 for detection)

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
The previous implementation incorrectly filtered out detection period rows
completely, preventing them from being scored. This fix uses CASE statements
in the window functions to exclude detection period rows from training
calculations while still including them in the final output for scoring.

Changes:
- Modified time_window_aggregation CTE to use CASE statements in AVG, STDDEV,
  COUNT, LAST_VALUE, and FIRST_VALUE window functions
- Removed the 'and not is_detection_period' filter that was excluding rows
- Added is_detection_period to SELECT and GROUP BY clauses

This allows the flag to work as intended:
- WITHOUT flag: detection period included in training, anomaly absorbed
- WITH flag: detection period excluded from training, anomaly detected

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Keep only volume anomalies support for exclude_detection_period_from_training flag.
Freshness anomalies will be addressed in a separate PR.

- Revert changes to test_freshness_anomalies.sql
- Revert changes to test_event_freshness_anomalies.sql
- Remove test_exclude_detection_freshness test
- Remove test_exclude_detection_event_freshness test

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Remove test_exclude_detection_freshness and test_exclude_detection_event_freshness
test functions that were added for freshness anomalies support.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@arbiv arbiv marked this pull request as ready for review October 22, 2025 09:34

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 20b7637 and 8b5791d.

📒 Files selected for processing (6)
  • integration_tests/tests/test_anomaly_test_configuration.py (1 hunks)
  • integration_tests/tests/test_volume_anomalies.py (1 hunks)
  • macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (3 hunks)
  • macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql (3 hunks)
  • macros/edr/tests/test_table_anomalies.sql (2 hunks)
  • macros/edr/tests/test_volume_anomalies.sql (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_volume_anomalies.py (2)
integration_tests/tests/conftest.py (2)
  • test_id (180-183)
  • dbt_project (144-147)
integration_tests/tests/dbt_project.py (4)
  • DbtProject (46-334)
  • test (110-128)
  • test (131-149)
  • test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (10)
macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (3)

151-156: LGTM! is_detection_period flag correctly implements the exclusion logic.

The flag properly compares bucket_end against detection_period_start_expr when the feature is enabled and defaults to FALSE for backward compatibility.


180-184: The edge case is already properly handled.

The verification confirms that NULL training metrics are explicitly checked in the anomaly score calculations (lines 205-208, 219-225 in get_anomaly_scores_query.sql). When all rows in a partition are detection period rows, training_stddev becomes NULL and the downstream logic returns NULL for anomaly_score, min_metric_value, and max_metric_value—preventing any division errors or invalid calculations. No fixes required.
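
A minimal sketch of the NULL-guarding pattern described here; the z-score form and the nullif guard are illustrative, not a quote of the actual expression:

    case
        -- no usable training data in the partition: emit NULL instead of a score
        when training_stddev is null then null
        else (metric_value - training_avg) / nullif(training_stddev, 0)
    end as anomaly_score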


187-187: No issues found—the group_by(14) column count is correct.

The SELECT statement contains exactly 14 regular (non-window) columns: metric_id, full_table_name, column_name, dimension, dimension_value, metric_name, metric_value, source_value, bucket_start, bucket_end, bucket_seasonality, bucket_duration_hours, updated_at, and is_detection_period. The window function expressions (training_avg, training_stddev, training_set_size, training_end, training_start) are computed separately over the grouped data, which is standard SQL. The update from group_by(13) to group_by(14) correctly accounts for the new is_detection_period column.

integration_tests/tests/test_anomaly_test_configuration.py (1)

91-91: LGTM! New configuration field correctly added.

The exclude_detection_period_from_training: None field is properly added to the expected adapted configuration, maintaining consistency with the new parameter's default value.

macros/edr/tests/test_table_anomalies.sql (1)

1-1: LGTM! Parameter correctly propagated through the macro.

The exclude_detection_period_from_training parameter is properly added to the macro signature and correctly passed through to get_anomalies_test_configuration.

Also applies to: 40-41

integration_tests/tests/test_volume_anomalies.py (3)

545-581: Well-structured test data with clear separation between normal and anomalous periods.

The test correctly creates 30 days of normal data (days 37-8 with a 98/100/102 pattern) and 7 days of anomalous data (days 7-1 with 114 rows/day). This setup properly demonstrates the exclusion feature's impact on anomaly detection.


583-603: LGTM! Test correctly verifies behavior without exclusion.

The test properly validates that when exclude_detection_period_from_training is not set (defaults to False/None), the anomalous detection period data is included in the training baseline, causing the test to pass (anomaly not detected).


605-621: LGTM! Test correctly verifies behavior with exclusion enabled.

The test properly validates that when exclude_detection_period_from_training=True, the anomalous detection period data is excluded from the training baseline, causing the anomaly to be detected (test fails as expected).

macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql (1)

26-27: LGTM! Parameter correctly integrated into test configuration.

The exclude_detection_period_from_training parameter is properly:

  • Added to the macro signature
  • Retrieved using the standard get_test_argument pattern
  • Included in the test_configuration dictionary

This maintains consistency with how other configuration parameters are handled.

Also applies to: 57-57, 76-77

macros/edr/tests/test_volume_anomalies.sql (1)

1-1: LGTM! Parameter correctly propagated to the underlying test.

The exclude_detection_period_from_training parameter is properly added to the macro signature and correctly passed through to test_table_anomalies.

Also applies to: 23-24
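
A condensed sketch of the propagation pattern, with the argument list heavily abbreviated and table_anomalies=['row_count'] assumed as what the volume test passes downstream:

    {% test volume_anomalies(model, timestamp_column, backfill_days,
                             exclude_detection_period_from_training=false) %}
        {{ elementary.test_table_anomalies(
            model,
            table_anomalies=['row_count'],
            timestamp_column=timestamp_column,
            backfill_days=backfill_days,
            exclude_detection_period_from_training=exclude_detection_period_from_training
        ) }}
    {% endtest %}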

bucket_end > {{ detection_period_start_expr }}
{% else %}
FALSE
{% endif %} as is_detection_period,
Contributor


improve column naming, maybe should_exclude_from_training?


{# Calculate detection period start for exclusion logic #}
{%- if test_configuration.exclude_detection_period_from_training %}
{%- set detection_period_start = (detection_end - modules.datetime.timedelta(days=test_configuration.backfill_days)) %}
Contributor


validate this calculation
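
As a quick sanity check with concrete (illustrative) values, the Jinja arithmetic behaves as expected:

    {# detection_end = 2025-10-27 00:00, backfill_days = 2 #}
    {%- set detection_end = modules.datetime.datetime(2025, 10, 27) %}
    {%- set detection_period_start = detection_end - modules.datetime.timedelta(days=2) %}
    {# detection_period_start -> 2025-10-25 00:00; buckets ending after this point
       are flagged as the detection period and kept out of training #}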

@@ -1,4 +1,4 @@
{% test table_anomalies(model, table_anomalies, timestamp_column, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, time_bucket, days_back, backfill_days, seasonality, mandatory_params=none, event_timestamp_column=none, freshness_column=none, sensitivity=none, ignore_small_changes={"spike_failure_percent_threshold": none, "drop_failure_percent_threshold": none}, fail_on_zero=false, detection_delay=none, anomaly_exclude_metrics=none, detection_period=none, training_period=none) %}
{% test table_anomalies(model, table_anomalies, timestamp_column, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, time_bucket, days_back, backfill_days, seasonality, mandatory_params=none, event_timestamp_column=none, freshness_column=none, sensitivity=none, ignore_small_changes={"spike_failure_percent_threshold": none, "drop_failure_percent_threshold": none}, fail_on_zero=false, detection_delay=none, anomaly_exclude_metrics=none, detection_period=none, training_period=none, exclude_detection_period_from_training=none) %}
Contributor


exclude_detection_period_from_training should be a boolean with default value false

devin-ai-integration bot and others added 7 commits October 26, 2025 11:16
…t default to false

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…lumn IN compatibility issue

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…ntax error

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…uery error

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…ity expression

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
macros/utils/cross_db_utils/multi_value_in.sql (1)

45-56: Consider adding an explicit delimiter to mitigate concatenation collisions.

The current concatenation approach without a delimiter could theoretically produce collisions (e.g., "1" || "23" vs "12" || "3" both yield "123"). While this mirrors the BigQuery implementation and is likely acceptable for your use case, adding a delimiter like '|' between concatenated values would improve robustness:

    concat(
        {%- for val in source_cols -%}
            {{ elementary.edr_cast_as_string(val) -}}
-            {%- if not loop.last %}, {% endif %}
+            {%- if not loop.last %}, '|', {% endif %}
        {%- endfor %}
    ) in (
        select concat({%- for val in target_cols -%}
            {{ elementary.edr_cast_as_string(val) -}}
-            {%- if not loop.last %}, {% endif %}
+            {%- if not loop.last %}, '|', {% endif %}
         {%- endfor %})
        from {{ target_table }}
    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between e4f5c73 and 842a255.

📒 Files selected for processing (1)
  • macros/utils/cross_db_utils/multi_value_in.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, snowflake) / test
🔇 Additional comments (2)
macros/utils/cross_db_utils/multi_value_in.sql (2)

36-57: Implementation looks good; follows established patterns and correctly addresses Redshift limitation.

The Redshift macro properly implements a tuple IN workaround using concatenation, mirroring the BigQuery approach and avoiding Redshift's "tuple IN" limitation.


36-57: No integration issues found; implementation is sound.

The Redshift edr_multi_value_in macro is invoked in get_anomaly_scores_query.sql:98 to validate that (bucket_start, bucket_end) tuples from the metrics table exist in the buckets table. The CONCAT approach correctly produces the same semantics as native tuple IN syntax.

The detection period exclusion logic is handled independently via a separate WHERE clause condition (bucket_end > detection_period_start_expr at line 171), which filters rows after bucket validation occurs. The two concerns do not interfere—the macro validates historical data availability while the exclusion clause handles temporal filtering. The Redshift CONCAT implementation matches the BigQuery approach exactly and correctly casts both sides with edr_cast_as_string for consistent scalar comparison.
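
For orientation, a sketch of what such a call site might look like, assuming the dispatching wrapper is invoked as elementary.edr_multi_value_in; the target column and table names here are placeholders, only the macro name and argument order follow the summary above:

    where {{ elementary.edr_multi_value_in(
        source_cols=['bucket_start', 'bucket_end'],
        target_cols=['edr_bucket_start', 'edr_bucket_end'],
        target_table='daily_buckets'
    ) }}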

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@arbiv arbiv merged commit 1b034ca into master Oct 27, 2025
30 of 32 checks passed
@arbiv arbiv deleted the core-12-oss-anomalies-add-flag-to-exclude-test-data-from-training branch October 27, 2025 12:39