Add exclude_detection_period_from_training flag to volume_anomalies test #877
Conversation
Add optional `exclude_detection_period_from_training` parameter to `test_volume_anomalies.sql`:

- Pass parameter through `test_table_anomalies.sql` to `get_anomalies_test_configuration.sql`
- Add parameter handling and configuration in `get_anomalies_test_configuration.sql`
- Implement core exclusion logic in `get_anomaly_scores_query.sql`:
  - Calculate `detection_period_start` boundary based on `backfill_days`
  - Add `is_detection_period` flag to `grouped_metrics` CTE
  - Exclude detection period metrics from `time_window_aggregation` training
- Flag defaults to false to preserve backward compatibility
- Enables clean separation between training and detection data for improved anomaly detection accuracy
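The training/detection split described above can be sketched in plain Python (a hypothetical illustration, not the macro code; the function name and data shape are invented here):

```python
from datetime import date, timedelta

def split_training_detection(buckets, detection_end, backfill_days,
                             exclude_detection_period_from_training=False):
    """Split (bucket_end, metric_value) pairs into training and detection sets.

    The detection period is the trailing `backfill_days` window before
    `detection_end`. With the flag off (the default), detection-period
    rows also feed the training baseline -- the pre-existing behavior.
    """
    boundary = detection_end - timedelta(days=backfill_days)
    detection = [b for b in buckets if b[0] > boundary]
    if exclude_detection_period_from_training:
        training = [b for b in buckets if b[0] <= boundary]
    else:
        training = list(buckets)
    return training, detection
```

With ten daily buckets and `backfill_days=3`, the last three buckets form the detection set, and enabling the flag shrinks training to the first seven.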
👋 @arbiv
Walkthrough

Adds an `exclude_detection_period_from_training` flag to the volume anomalies test, propagated through the test configuration macros to the anomaly scores query.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Test Runner
    participant VolumeMacro as test_volume_anomalies
    participant TableMacro as test_table_anomalies
    participant ConfigMacro as get_anomalies_test_configuration
    participant ScoreQuery as get_anomaly_scores_query
    Runner->>VolumeMacro: invoke with exclude_detection_period_from_training=(true|false)
    VolumeMacro->>TableMacro: forward exclude_detection_period_from_training
    TableMacro->>ConfigMacro: request config (includes exclusion flag)
    ConfigMacro-->>TableMacro: return config with exclude_detection_period_from_training
    TableMacro->>ScoreQuery: execute scoring with config
    rect rgb(235,245,235)
        Note over ScoreQuery: when exclude_detection_period_from_training = true
        ScoreQuery->>ScoreQuery: compute detection_period_start and expr
        ScoreQuery->>ScoreQuery: mark rows as should_exclude_from_training
        ScoreQuery->>ScoreQuery: aggregate training stats excluding flagged rows
    end
    ScoreQuery-->>TableMacro: return anomaly scores (includes should_exclude_from_training)
    TableMacro-->>VolumeMacro: return test results
    VolumeMacro-->>Runner: report pass/fail
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
- Change anomalous data from 500 to 110 rows/day (10% increase instead of 5x)
- Change sensitivity from 3 to 10 to demonstrate masking effect
- Fix parameter name from anomaly_sensitivity to sensitivity
- This ensures test passes when anomaly is included in training and fails when excluded

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Add parameter to test_freshness_anomalies.sql and test_event_freshness_anomalies.sql
- Add test case for freshness anomalies with exclude_detection_period_from_training flag
- Add test case for event freshness anomalies with exclude_detection_period_from_training flag
- Tests validate that anomalies are missed when included in training and detected when excluded

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Change volume anomalies from 110 to 150 rows/day (50% increase)
- Change freshness anomalies from 11 to 15 minutes (50% slower)
- Change event freshness anomalies from 1.1 to 1.5 hours lag (50% slower)
- Reduce sensitivity from 10 to 3 for all tests
- This ensures anomalies are detected when excluded from training

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…cted config and fix date overlap

- Add exclude_detection_period_from_training: None to test_anomaly_test_configuration expected config
- Fix date overlap in test_exclude_detection_period_from_training_use_case (days 37-8 for training, days 7-1 for detection)

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
The previous implementation incorrectly filtered out detection period rows completely, preventing them from being scored. This fix uses CASE statements in the window functions to exclude detection period rows from training calculations while still including them in the final output for scoring.

Changes:
- Modified time_window_aggregation CTE to use CASE statements in AVG, STDDEV, COUNT, LAST_VALUE, and FIRST_VALUE window functions
- Removed the 'and not is_detection_period' filter that was excluding rows
- Added is_detection_period to SELECT and GROUP BY clauses

This allows the flag to work as intended:
- WITHOUT flag: detection period included in training, anomaly absorbed
- WITH flag: detection period excluded from training, anomaly detected

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
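The CASE-in-window idea from this commit can be imitated in plain Python (an illustrative re-implementation, not the actual macro): training statistics skip the flagged rows, but no row is filtered from the result set, so detection-period buckets still receive a score.

```python
import statistics

def score_buckets(rows):
    """Score every bucket while training only on non-flagged rows.

    Mimics AVG(CASE WHEN NOT is_detection_period THEN metric_value END)
    OVER (...): the CASE keeps flagged rows out of the aggregates, but
    every input row appears in the output.
    """
    training = [r["value"] for r in rows if not r["is_detection_period"]]
    avg = statistics.mean(training) if training else None
    std = statistics.stdev(training) if len(training) > 1 else None
    scored = []
    for r in rows:
        # Guard against missing/zero training stddev (e.g. an
        # all-detection partition) instead of dividing.
        score = None if not std else (r["value"] - avg) / std
        scored.append({**r, "anomaly_score": score})
    return scored
```

Contrast this with the buggy first attempt, which would have dropped the flagged rows entirely and returned fewer rows than it was given.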
Keep only volume anomalies support for exclude_detection_period_from_training flag. Freshness anomalies will be addressed in a separate PR. - Revert changes to test_freshness_anomalies.sql - Revert changes to test_event_freshness_anomalies.sql - Remove test_exclude_detection_freshness test - Remove test_exclude_detection_event_freshness test Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Remove test_exclude_detection_freshness and test_exclude_detection_event_freshness test functions that were added for freshness anomalies support. Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)

- integration_tests/tests/test_anomaly_test_configuration.py (1 hunks)
- integration_tests/tests/test_volume_anomalies.py (1 hunks)
- macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (3 hunks)
- macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql (3 hunks)
- macros/edr/tests/test_table_anomalies.sql (2 hunks)
- macros/edr/tests/test_volume_anomalies.sql (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_volume_anomalies.py (2)

- integration_tests/tests/conftest.py (2): test_id (180-183), dbt_project (144-147)
- integration_tests/tests/dbt_project.py (4): DbtProject (46-334), test (110-128), test (131-149), test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (10)
macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql (3)
151-156: LGTM! is_detection_period flag correctly implements the exclusion logic.

The flag properly compares `bucket_end` against `detection_period_start_expr` when the feature is enabled and defaults to FALSE for backward compatibility.
180-184: The edge case is already properly handled.

The verification confirms that NULL training metrics are explicitly checked in the anomaly score calculations (lines 205-208, 219-225 in get_anomaly_scores_query.sql). When all rows in a partition are detection period rows, `training_stddev` becomes NULL and the downstream logic returns NULL for `anomaly_score`, `min_metric_value`, and `max_metric_value`, preventing any division errors or invalid calculations. No fixes required.
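The NULL guard this comment describes amounts to the following (an illustrative sketch, not the macro's actual SQL):

```python
def anomaly_score(value, training_avg, training_stddev):
    """Return None instead of dividing when training stats are missing,
    e.g. when every row in the partition fell in the detection period."""
    if training_stddev is None or training_stddev == 0:
        return None
    return (value - training_avg) / training_stddev
```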
187-187: No issues found; the group_by(14) column count is correct.

The SELECT statement contains exactly 14 regular (non-window) columns: metric_id, full_table_name, column_name, dimension, dimension_value, metric_name, metric_value, source_value, bucket_start, bucket_end, bucket_seasonality, bucket_duration_hours, updated_at, and is_detection_period. The window function expressions (training_avg, training_stddev, training_set_size, training_end, training_start) are computed separately over the grouped data, which is standard SQL. The update from `group_by(13)` to `group_by(14)` correctly accounts for the new is_detection_period column.

integration_tests/tests/test_anomaly_test_configuration.py (1)
91-91: LGTM! New configuration field correctly added.

The `exclude_detection_period_from_training: None` field is properly added to the expected adapted configuration, maintaining consistency with the new parameter's default value.

macros/edr/tests/test_table_anomalies.sql (1)
1-1: LGTM! Parameter correctly propagated through the macro.

The `exclude_detection_period_from_training` parameter is properly added to the macro signature and correctly passed through to `get_anomalies_test_configuration`.

Also applies to: 40-41
integration_tests/tests/test_volume_anomalies.py (3)
545-581: Well-structured test data with clear separation between normal and anomalous periods.

The test correctly creates 30 days of normal data (days 37-8 with a 98/100/102 pattern) and 7 days of anomalous data (days 7-1 with 114 rows/day). This setup properly demonstrates the exclusion feature's impact on anomaly detection.
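A back-of-the-envelope check of why this data shape works, assuming a simple z-score model of the package's scoring (the exact formula the package uses may differ):

```python
import statistics

normal = [98, 100, 102] * 10   # 30 days of training data, mean 100
anomalous_value = 114          # each of the 7 detection-period days

# Flag off: the anomalous days leak into training, inflating both the
# mean and the standard deviation, so the spike looks unremarkable.
leaky = normal + [anomalous_value] * 7
z_leaky = (anomalous_value - statistics.mean(leaky)) / statistics.stdev(leaky)

# Flag on: training sees only the normal days.
z_clean = (anomalous_value - statistics.mean(normal)) / statistics.stdev(normal)
```

With a sensitivity of 3, the leaky baseline keeps the spike's z-score below the threshold while the clean baseline pushes it well above, which is exactly the pass/fail pair the two tests assert.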
583-603: LGTM! Test correctly verifies behavior without exclusion.

The test properly validates that when `exclude_detection_period_from_training` is not set (defaults to False/None), the anomalous detection period data is included in the training baseline, causing the test to pass (anomaly not detected).
605-621: LGTM! Test correctly verifies behavior with exclusion enabled.

The test properly validates that when `exclude_detection_period_from_training=True`, the anomalous detection period data is excluded from the training baseline, causing the anomaly to be detected (test fails as expected).

macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql (1)
26-27: LGTM! Parameter correctly integrated into test configuration.

The `exclude_detection_period_from_training` parameter is properly:

- Added to the macro signature
- Retrieved using the standard `get_test_argument` pattern
- Included in the `test_configuration` dictionary

This maintains consistency with how other configuration parameters are handled.

Also applies to: 57-57, 76-77

macros/edr/tests/test_volume_anomalies.sql (1)
macros/edr/tests/test_volume_anomalies.sql (1)
1-1: LGTM! Parameter correctly propagated to the underlying test.The
exclude_detection_period_from_trainingparameter is properly added to the macro signature and correctly passed through totest_table_anomalies.Also applies to: 23-24
```sql
bucket_end > {{ detection_period_start_expr }}
{% else %}
FALSE
{% endif %} as is_detection_period,
```
improve column naming, maybe should_exclude_from_training?
```sql
{# Calculate detection period start for exclusion logic #}
{%- if test_configuration.exclude_detection_period_from_training %}
{%- set detection_period_start = (detection_end - modules.datetime.timedelta(days=test_configuration.backfill_days)) %}
```
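In plain Python the boundary calculation in this hunk is simply (sketch; `backfill_days` is the number of days being detected on, per the reviewer discussion):

```python
from datetime import datetime, timedelta

def detection_period_start(detection_end, backfill_days):
    # The detection period is the trailing backfill_days window before
    # detection_end; buckets ending after this boundary are detection rows.
    return detection_end - timedelta(days=backfill_days)
```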
validate this calculation
backfill days is the days to detect on, so the calculation is correct
see https://github.com/elementary-data/dbt-data-reliability/blob/79d8d7fd64bc0b8a0e329c69a968ba0a22dda9e0/macros/edr/tests/test_configuration/get_period_vars.sql
and elementary-data/elementary#1548 (comment)
```diff
@@ -1,4 +1,4 @@
-{% test table_anomalies(model, table_anomalies, timestamp_column, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, time_bucket, days_back, backfill_days, seasonality, mandatory_params=none, event_timestamp_column=none, freshness_column=none, sensitivity=none, ignore_small_changes={"spike_failure_percent_threshold": none, "drop_failure_percent_threshold": none}, fail_on_zero=false, detection_delay=none, anomaly_exclude_metrics=none, detection_period=none, training_period=none) %}
+{% test table_anomalies(model, table_anomalies, timestamp_column, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, time_bucket, days_back, backfill_days, seasonality, mandatory_params=none, event_timestamp_column=none, freshness_column=none, sensitivity=none, ignore_small_changes={"spike_failure_percent_threshold": none, "drop_failure_percent_threshold": none}, fail_on_zero=false, detection_delay=none, anomaly_exclude_metrics=none, detection_period=none, training_period=none, exclude_detection_period_from_training=none) %}
```
`exclude_detection_period_from_training` should be a boolean with default value false
…t default to false Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…lumn IN compatibility issue Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…ntax error Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…uery error Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…ity expression Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
macros/utils/cross_db_utils/multi_value_in.sql (1)
45-56: Consider adding an explicit delimiter to mitigate concatenation collisions.

The current concatenation approach without a delimiter could theoretically produce collisions (e.g., `"1" || "23"` vs `"12" || "3"` both yield `"123"`). While this mirrors the BigQuery implementation and is likely acceptable for your use case, adding a delimiter like `'|'` between concatenated values would improve robustness:

```diff
 concat(
     {%- for val in source_cols -%}
         {{ elementary.edr_cast_as_string(val) -}}
-        {%- if not loop.last %}, {% endif %}
+        {%- if not loop.last %}, '|', {% endif %}
     {%- endfor -%}
 ) in (
     select concat(
         {%- for val in target_cols -%}
             {{ elementary.edr_cast_as_string(val) -}}
-            {%- if not loop.last %}, {% endif %}
+            {%- if not loop.last %}, '|', {% endif %}
         {%- endfor -%}
     )
     from {{ target_table }}
 )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- macros/utils/cross_db_utils/multi_value_in.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_official, snowflake) / test
🔇 Additional comments (2)
macros/utils/cross_db_utils/multi_value_in.sql (2)
36-57: Implementation looks good; follows established patterns and correctly addresses Redshift limitation.

The Redshift macro properly implements a tuple IN workaround using concatenation, mirroring the BigQuery approach and avoiding Redshift's "tuple IN" limitation.
36-57: No integration issues found; implementation is sound.

The Redshift `edr_multi_value_in` macro is invoked in `get_anomaly_scores_query.sql:98` to validate that (bucket_start, bucket_end) tuples from the metrics table exist in the buckets table. The CONCAT approach correctly produces the same semantics as native tuple IN syntax.

The detection period exclusion logic is handled independently via a separate WHERE clause condition (`bucket_end > detection_period_start_expr` at line 171), which filters rows after bucket validation occurs. The two concerns do not interfere: the macro validates historical data availability while the exclusion clause handles temporal filtering. The Redshift CONCAT implementation matches the BigQuery approach exactly and correctly casts both sides with `edr_cast_as_string` for consistent scalar comparison.
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>