Add exclude_detection_period_from_training flag to volume_anomalies test #877
Conversation
Add optional `exclude_detection_period_from_training` parameter to `test_volume_anomalies.sql`:

- Pass parameter through `test_table_anomalies.sql` to `get_anomalies_test_configuration.sql`
- Add parameter handling and configuration in `get_anomalies_test_configuration.sql`
- Implement core exclusion logic in `get_anomaly_scores_query.sql`:
  - Calculate `detection_period_start` boundary based on `backfill_days`
  - Add `is_detection_period` flag to `grouped_metrics` CTE
  - Exclude detection period metrics from `time_window_aggregation` training
- Flag defaults to false to preserve backward compatibility
- Enables clean separation between training and detection data for improved anomaly detection accuracy
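The training/detection split described above can be sketched in plain Python (a hypothetical illustration, not the macro code; the function name and data shape are invented here):

```python
from datetime import date, timedelta

def split_training_detection(buckets, detection_end, backfill_days,
                             exclude_detection_period_from_training=False):
    """Split (bucket_end, metric_value) pairs into training and detection sets.

    The detection period is the trailing `backfill_days` window before
    `detection_end`. With the flag off (the default), detection-period
    rows also feed the training baseline -- the pre-existing behavior.
    """
    boundary = detection_end - timedelta(days=backfill_days)
    detection = [b for b in buckets if b[0] > boundary]
    if exclude_detection_period_from_training:
        training = [b for b in buckets if b[0] <= boundary]
    else:
        training = list(buckets)
    return training, detection
```

With ten daily buckets and `backfill_days=3`, the last three buckets form the detection set, and enabling the flag shrinks training to the first seven.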
👋 @arbiv
Walkthrough

Adds an `exclude_detection_period_from_training` flag to the volume anomalies test, propagated through the test configuration macros to the anomaly scores query.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Test Runner
    participant VolumeMacro as test_volume_anomalies
    participant TableMacro as test_table_anomalies
    participant ConfigMacro as get_anomalies_test_configuration
    participant ScoreQuery as get_anomaly_scores_query
    Runner->>VolumeMacro: invoke with exclude_detection_period_from_training=(true|false)
    VolumeMacro->>TableMacro: forward exclude_detection_period_from_training
    TableMacro->>ConfigMacro: request config (includes exclusion flag)
    ConfigMacro-->>TableMacro: return config with exclude_detection_period_from_training
    TableMacro->>ScoreQuery: execute scoring with config
    rect rgb(235,245,235)
        Note over ScoreQuery: when exclude_detection_period_from_training = true
        ScoreQuery->>ScoreQuery: compute detection_period_start and expr
        ScoreQuery->>ScoreQuery: mark rows as should_exclude_from_training
        ScoreQuery->>ScoreQuery: aggregate training stats excluding flagged rows
    end
    ScoreQuery-->>TableMacro: return anomaly scores (includes should_exclude_from_training)
    TableMacro-->>VolumeMacro: return test results
    VolumeMacro-->>Runner: report pass/fail
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
- Change anomalous data from 500 to 110 rows/day (10% increase instead of 5x)
- Change sensitivity from 3 to 10 to demonstrate masking effect
- Fix parameter name from anomaly_sensitivity to sensitivity
- This ensures test passes when anomaly is included in training and fails when excluded

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Add parameter to test_freshness_anomalies.sql and test_event_freshness_anomalies.sql
- Add test case for freshness anomalies with exclude_detection_period_from_training flag
- Add test case for event freshness anomalies with exclude_detection_period_from_training flag
- Tests validate that anomalies are missed when included in training and detected when excluded

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Change volume anomalies from 110 to 150 rows/day (50% increase)
- Change freshness anomalies from 11 to 15 minutes (50% slower)
- Change event freshness anomalies from 1.1 to 1.5 hours lag (50% slower)
- Reduce sensitivity from 10 to 3 for all tests
- This ensures anomalies are detected when excluded from training

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…cted config and fix date overlap

- Add exclude_detection_period_from_training: None to test_anomaly_test_configuration expected config
- Fix date overlap in test_exclude_detection_period_from_training_use_case (days 37-8 for training, days 7-1 for detection)

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
The previous implementation incorrectly filtered out detection period rows completely, preventing them from being scored. This fix uses CASE statements in the window functions to exclude detection period rows from training calculations while still including them in the final output for scoring.

Changes:
- Modified time_window_aggregation CTE to use CASE statements in AVG, STDDEV, COUNT, LAST_VALUE, and FIRST_VALUE window functions
- Removed the 'and not is_detection_period' filter that was excluding rows
- Added is_detection_period to SELECT and GROUP BY clauses

This allows the flag to work as intended:
- WITHOUT flag: detection period included in training, anomaly absorbed
- WITH flag: detection period excluded from training, anomaly detected

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
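The CASE-in-window idea from this commit can be imitated in plain Python (an illustrative re-implementation, not the actual macro): training statistics skip the flagged rows, but no row is filtered from the result set, so detection-period buckets still receive a score.

```python
import statistics

def score_buckets(rows):
    """Score every bucket while training only on non-flagged rows.

    Mimics AVG(CASE WHEN NOT is_detection_period THEN metric_value END)
    OVER (...): the CASE keeps flagged rows out of the aggregates, but
    every input row appears in the output.
    """
    training = [r["value"] for r in rows if not r["is_detection_period"]]
    avg = statistics.mean(training) if training else None
    std = statistics.stdev(training) if len(training) > 1 else None
    scored = []
    for r in rows:
        # Guard against missing/zero training stddev (e.g. an
        # all-detection partition) instead of dividing.
        score = None if not std else (r["value"] - avg) / std
        scored.append({**r, "anomaly_score": score})
    return scored
```

Contrast this with the buggy first attempt, which would have dropped the flagged rows entirely and returned fewer rows than it was given.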
Keep only volume anomalies support for exclude_detection_period_from_training flag. Freshness anomalies will be addressed in a separate PR. - Revert changes to test_freshness_anomalies.sql - Revert changes to test_event_freshness_anomalies.sql - Remove test_exclude_detection_freshness test - Remove test_exclude_detection_event_freshness test Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Remove test_exclude_detection_freshness and test_exclude_detection_event_freshness test functions that were added for freshness anomalies support. Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)

- integration_tests/tests/test_anomaly_test_configuration.py (1 hunks)
- integration_tests/tests/test_volume_anomalies.py (1 hunks)
- macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (3 hunks)
- macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql (3 hunks)
- macros/edr/tests/test_table_anomalies.sql (2 hunks)
- macros/edr/tests/test_volume_anomalies.sql (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_volume_anomalies.py (2)

- integration_tests/tests/conftest.py (2): test_id (180-183), dbt_project (144-147)
- integration_tests/tests/dbt_project.py (4): DbtProject (46-334), test (110-128), test (131-149), test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (10)
macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (3)
151-156: LGTM! is_detection_period flag correctly implements the exclusion logic.

The flag properly compares `bucket_end` against `detection_period_start_expr` when the feature is enabled and defaults to FALSE for backward compatibility.
180-184: The edge case is already properly handled.

The verification confirms that NULL training metrics are explicitly checked in the anomaly score calculations (lines 205-208, 219-225 in get_anomaly_scores_query.sql). When all rows in a partition are detection period rows, `training_stddev` becomes NULL and the downstream logic returns NULL for `anomaly_score`, `min_metric_value`, and `max_metric_value`, preventing any division errors or invalid calculations. No fixes required.
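The NULL guard this comment describes amounts to the following (an illustrative sketch, not the macro's actual SQL):

```python
def anomaly_score(value, training_avg, training_stddev):
    """Return None instead of dividing when training stats are missing,
    e.g. when every row in the partition fell in the detection period."""
    if training_stddev is None or training_stddev == 0:
        return None
    return (value - training_avg) / training_stddev
```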
187-187: No issues found; the group_by(14) column count is correct.

The SELECT statement contains exactly 14 regular (non-window) columns: metric_id, full_table_name, column_name, dimension, dimension_value, metric_name, metric_value, source_value, bucket_start, bucket_end, bucket_seasonality, bucket_duration_hours, updated_at, and is_detection_period. The window function expressions (training_avg, training_stddev, training_set_size, training_end, training_start) are computed separately over the grouped data, which is standard SQL. The update from `group_by(13)` to `group_by(14)` correctly accounts for the new is_detection_period column.

integration_tests/tests/test_anomaly_test_configuration.py (1)
91-91: LGTM! New configuration field correctly added.

The `exclude_detection_period_from_training: None` field is properly added to the expected adapted configuration, maintaining consistency with the new parameter's default value.

macros/edr/tests/test_table_anomalies.sql (1)
1-1: LGTM! Parameter correctly propagated through the macro.

The `exclude_detection_period_from_training` parameter is properly added to the macro signature and correctly passed through to `get_anomalies_test_configuration`.

Also applies to: 40-41
integration_tests/tests/test_volume_anomalies.py (3)
545-581: Well-structured test data with clear separation between normal and anomalous periods.

The test correctly creates 30 days of normal data (days 37-8 with a 98/100/102 pattern) and 7 days of anomalous data (days 7-1 with 114 rows/day). This setup properly demonstrates the exclusion feature's impact on anomaly detection.
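A back-of-the-envelope check of why this data shape works, assuming a simple z-score model of the package's scoring (the exact formula the package uses may differ):

```python
import statistics

normal = [98, 100, 102] * 10   # 30 days of training data, mean 100
anomalous_value = 114          # each of the 7 detection-period days

# Flag off: the anomalous days leak into training, inflating both the
# mean and the standard deviation, so the spike looks unremarkable.
leaky = normal + [anomalous_value] * 7
z_leaky = (anomalous_value - statistics.mean(leaky)) / statistics.stdev(leaky)

# Flag on: training sees only the normal days.
z_clean = (anomalous_value - statistics.mean(normal)) / statistics.stdev(normal)
```

With a sensitivity of 3, the leaky baseline keeps the spike's z-score below the threshold while the clean baseline pushes it well above, which is exactly the pass/fail pair the two tests assert.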
583-603: LGTM! Test correctly verifies behavior without exclusion.

The test properly validates that when `exclude_detection_period_from_training` is not set (defaults to False/None), the anomalous detection period data is included in the training baseline, causing the test to pass (anomaly not detected).
605-621: LGTM! Test correctly verifies behavior with exclusion enabled.

The test properly validates that when `exclude_detection_period_from_training=True`, the anomalous detection period data is excluded from the training baseline, causing the anomaly to be detected (test fails as expected).

macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql (1)
26-27: LGTM! Parameter correctly integrated into test configuration.

The `exclude_detection_period_from_training` parameter is properly:

- Added to the macro signature
- Retrieved using the standard `get_test_argument` pattern
- Included in the `test_configuration` dictionary

This maintains consistency with how other configuration parameters are handled.

Also applies to: 57-57, 76-77

macros/edr/tests/test_volume_anomalies.sql (1)
macros/edr/tests/test_volume_anomalies.sql (1)
1-1: LGTM! Parameter correctly propagated to the underlying test.The
exclude_detection_period_from_trainingparameter is properly added to the macro signature and correctly passed through totest_table_anomalies.Also applies to: 23-24
```sql
bucket_end > {{ detection_period_start_expr }}
{% else %}
FALSE
{% endif %} as is_detection_period,
```
improve column naming, maybe should_exclude_from_training?
```sql
{# Calculate detection period start for exclusion logic #}
{%- if test_configuration.exclude_detection_period_from_training %}
{%- set detection_period_start = (detection_end - modules.datetime.timedelta(days=test_configuration.backfill_days)) %}
```
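In plain Python the boundary calculation in this hunk is simply (sketch; `backfill_days` is the number of days being detected on, per the reviewer discussion):

```python
from datetime import datetime, timedelta

def detection_period_start(detection_end, backfill_days):
    # The detection period is the trailing backfill_days window before
    # detection_end; buckets ending after this boundary are detection rows.
    return detection_end - timedelta(days=backfill_days)
```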
validate this calculation
backfill days is the days to detect on, so the calculation is correct
see https://github.com/elementary-data/dbt-data-reliability/blob/79d8d7fd64bc0b8a0e329c69a968ba0a22dda9e0/macros/edr/tests/test_configuration/get_period_vars.sql
and elementary-data/elementary#1548 (comment)
```diff
@@ -1,4 +1,4 @@
-{% test table_anomalies(model, table_anomalies, timestamp_column, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, time_bucket, days_back, backfill_days, seasonality, mandatory_params=none, event_timestamp_column=none, freshness_column=none, sensitivity=none, ignore_small_changes={"spike_failure_percent_threshold": none, "drop_failure_percent_threshold": none}, fail_on_zero=false, detection_delay=none, anomaly_exclude_metrics=none, detection_period=none, training_period=none) %}
+{% test table_anomalies(model, table_anomalies, timestamp_column, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, time_bucket, days_back, backfill_days, seasonality, mandatory_params=none, event_timestamp_column=none, freshness_column=none, sensitivity=none, ignore_small_changes={"spike_failure_percent_threshold": none, "drop_failure_percent_threshold": none}, fail_on_zero=false, detection_delay=none, anomaly_exclude_metrics=none, detection_period=none, training_period=none, exclude_detection_period_from_training=none) %}
```
`exclude_detection_period_from_training` should be a boolean with default value false
…t default to false Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…lumn IN compatibility issue Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…ntax error Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…uery error Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…ity expression Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
macros/utils/cross_db_utils/multi_value_in.sql (1)
45-56: Consider adding an explicit delimiter to mitigate concatenation collisions.

The current concatenation approach without a delimiter could theoretically produce collisions (e.g., `"1" || "23"` vs `"12" || "3"` both yield `"123"`). While this mirrors the BigQuery implementation and is likely acceptable for your use case, adding a delimiter like `'|'` between concatenated values would improve robustness:

```diff
 concat(
     {%- for val in source_cols -%}
         {{ elementary.edr_cast_as_string(val) -}}
-        {%- if not loop.last %}, {% endif %}
+        {%- if not loop.last %}, '|', {% endif %}
     {%- endfor -%}
 ) in (
     select concat(
         {%- for val in target_cols -%}
             {{ elementary.edr_cast_as_string(val) -}}
-            {%- if not loop.last %}, {% endif %}
+            {%- if not loop.last %}, '|', {% endif %}
         {%- endfor -%}
     )
     from {{ target_table }}
 )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- macros/utils/cross_db_utils/multi_value_in.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_official, snowflake) / test
🔇 Additional comments (2)
macros/utils/cross_db_utils/multi_value_in.sql (2)
36-57: Implementation looks good; follows established patterns and correctly addresses Redshift limitation.

The Redshift macro properly implements a tuple IN workaround using concatenation, mirroring the BigQuery approach and avoiding Redshift's "tuple IN" limitation.
36-57: No integration issues found; implementation is sound.

The Redshift `edr_multi_value_in` macro is invoked in `get_anomaly_scores_query.sql:98` to validate that (bucket_start, bucket_end) tuples from the metrics table exist in the buckets table. The CONCAT approach correctly produces the same semantics as native tuple IN syntax.

The detection period exclusion logic is handled independently via a separate WHERE clause condition (`bucket_end > detection_period_start_expr` at line 171), which filters rows after bucket validation occurs. The two concerns do not interfere: the macro validates historical data availability while the exclusion clause handles temporal filtering. The Redshift CONCAT implementation matches the BigQuery approach exactly and correctly casts both sides with `edr_cast_as_string` for consistent scalar comparison.
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>