Add exclude_detection_period_from_training flag to freshness anomaly tests #885
Conversation
Add exclude_detection_period_from_training flag to freshness anomaly tests

- Added exclude_detection_period_from_training parameter to test_freshness_anomalies
- Added exclude_detection_period_from_training parameter to test_event_freshness_anomalies
- Both parameters default to false for backward compatibility
- This allows users to exclude the detection period from training data when detecting freshness anomalies

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Walkthrough
Two EDR freshness test macros add an optional `exclude_detection_period_from_training` parameter.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Tester
    participant PyTest
    participant EDR_Macro as "EDR Macro\n(freshness/event_freshness_anomalies)"
    participant Elementary as "elementary.test_table_anomalies"
    participant Trainer as "Training Logic"
    Tester->>PyTest: run test_exclude_detection_from_training
    PyTest->>EDR_Macro: invoke macro(..., exclude_detection_period_from_training=false)
    EDR_Macro->>Elementary: test_table_anomalies(..., exclude_detection_period_from_training=false)
    Elementary->>Trainer: compute training_period (detection included)
    Trainer-->>Elementary: return model/thresholds
    Elementary-->>EDR_Macro: return test result (expected pass)
    EDR_Macro-->>PyTest: assert pass
    PyTest->>EDR_Macro: invoke macro(..., exclude_detection_period_from_training=true)
    EDR_Macro->>Elementary: test_table_anomalies(..., exclude_detection_period_from_training=true)
    Elementary->>Trainer: compute training_period (detection excluded)
    Trainer-->>Elementary: return model/thresholds
    Elementary-->>EDR_Macro: return test result (expected fail)
    EDR_Macro-->>PyTest: assert fail
    PyTest-->>Tester: report outcomes
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning) | ✅ Passed checks (4 passed)
…freshness anomalies

- Test validates that without the flag, anomalies in detection period are included in training (test passes, anomaly missed)
- Test validates that with the flag, anomalies in detection period are excluded from training (test fails, anomaly detected)
- Follows same pattern as test_volume_anomalies.py::test_exclude_detection_from_training

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- integration_tests/tests/test_freshness_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
Applied to files:
integration_tests/tests/test_freshness_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_freshness_anomalies.py (3)
- integration_tests/tests/test_volume_anomalies.py (1): test_exclude_detection_from_training (545-621)
- integration_tests/tests/dbt_project.py (1): DbtProject (46-334)
- integration_tests/tests/data_generator.py (1): generate_dates (6-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, athena) / test
🔇 Additional comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
238-311: Verify the test behaves as intended after fixing the data ranges.

Once the data generation is corrected, confirm:
- The anomalous update frequency (8-hour intervals vs. 2-hour baseline) is detected when exclusion is enabled
- The `sensitivity: 5` value is appropriate for this scenario
- The test produces the expected pass/fail outcomes

After applying the data-range fix, run this test locally or in CI to ensure it passes and correctly exercises the `exclude_detection_period_from_training` flag.
- Reduced training period from 30 to 14 days to amplify effect
- Changed anomalous data from every 8 hours to every 24 hours (only 1 update per day at midnight)
- This creates freshness of ~24 hours vs ~2 hours in normal data
- Reduced sensitivity from 5 to 3 (less strict)
- Aligned time to midnight and added custom_run_started_at for consistent bucket boundaries across warehouses
- This should make the anomaly detectable when excluded from training

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
♻️ Duplicate comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
251-269: Critical: Data generation still produces wrong time ranges.

This issue was already flagged in a previous review but hasn't been fixed. The `generate_dates` function generates dates backward from `base_date` for `days_back` days. With the current parameters:
- Normal data (lines 252-259): generates from day -21 back to day -35 (`utc_now - 21 days` minus 14 days)
- Anomalous data (lines 262-269): generates from day -7 back to day -14 (`utc_now - 7 days` minus 7 days)

These ranges don't align with the test's intent. For a 14-day training period and 7-day detection period, you need:
- Detection period (most recent 7 days): days 0 to -7
- Training period (14 days before detection): days -7 to -21

Apply this diff to fix the date ranges:

```diff
-    # Generate 14 days of normal data with frequent updates (every 2 hours)
+    # Generate 14 days of normal data (training period: days -21 to -7)
     normal_data = [
         {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
         for date in generate_dates(
-            base_date=utc_now - timedelta(days=21),
+            base_date=utc_now - timedelta(days=7),
             step=timedelta(hours=2),
             days_back=14,
         )
     ]

-    # Generate 7 days of anomalous data (only 1 update per day at midnight)
+    # Generate 7 days of anomalous data (detection period: days 0 to -7)
     anomalous_data = [
         {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
         for date in generate_dates(
-            base_date=utc_now - timedelta(days=7),
+            base_date=utc_now,
             step=timedelta(hours=24),
             days_back=7,
         )
     ]
```
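For reference, here's a minimal sketch of the behavior described above, assuming `generate_dates` (defined in `integration_tests/tests/data_generator.py`, lines 6-17, not shown in this thread) simply walks backward from `base_date` in `step` increments until it has covered `days_back` days:

```python
from datetime import datetime, timedelta

def generate_dates(base_date: datetime, step: timedelta, days_back: int):
    # Assumed reimplementation: the newest point is base_date, the oldest
    # point sits just above base_date - days_back.
    min_date = base_date - timedelta(days=days_back)
    dates = []
    current = base_date
    while current > min_date:
        dates.append(current)
        current -= step
    return dates

# With base_date = utc_now - 21 days and days_back = 14, the newest point
# lands at day -21 and the oldest near day -35 -- hence the misalignment
# called out in this review.
```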
🧹 Nitpick comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
249-249: Use `datetime.now(timezone.utc)` instead of deprecated `utcnow()`.

The `datetime.utcnow()` method is deprecated in Python 3.12+ and will be removed in future versions. Apply this diff:

```diff
-    utc_now = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
+    from datetime import timezone
+    utc_now = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
```
🔇 Additional comments (2)
integration_tests/tests/test_freshness_anomalies.py (2)
238-248: LGTM: Clear test structure and documentation.

The test function is properly decorated to skip ClickHouse targets, and the docstring clearly explains the test scenario and expected outcomes.

273-313: Parameter verification successful — test configuration is correct.

The `training_period` and `detection_period` parameters are confirmed as valid parameters in the `freshness_anomalies` test macro signature. The test correctly uses these parameters and the test logic is sound:
- Without exclusion: Anomalous pattern (1 update/day) included in training becomes baseline normal → test passes ✓
- With exclusion: Anomalous pattern excluded from training, baseline expects frequent updates (12/day) → test fails when observing 1 update/day ✓

The test properly isolates scenarios using distinct test IDs and assertions.
The integration test harness wasn't passing exclude_detection_period_from_training through to the test YAML for freshness tests. This fix passes the parameter via test_vars instead, which works with the config precedence system (test > model > vars). This ensures the flag is properly applied during the 'with_exclusion' test run. Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
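A toy model of that precedence chain (illustrative names only; the actual resolution happens inside elementary's configuration macros, which are not shown on this page):

```python
# Test-level config wins over model-level config, which wins over dbt vars.
def resolve_config(key, test_args, model_config, dbt_vars, default=None):
    for scope in (test_args, model_config, dbt_vars):
        if key in scope:
            return scope[key]
    return default

# Flag supplied only as a dbt var: still resolved, via the lowest rung.
print(resolve_config(
    "exclude_detection_period_from_training",
    test_args={}, model_config={},
    dbt_vars={"exclude_detection_period_from_training": True},
    default=False,
))  # True

# Flag supplied as a test argument: wins even if a var says otherwise.
print(resolve_config(
    "exclude_detection_period_from_training",
    test_args={"exclude_detection_period_from_training": True},
    model_config={},
    dbt_vars={"exclude_detection_period_from_training": False},
    default=False,
))  # True
```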
Actionable comments posted: 1
🔇 Additional comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
238-311: Good test structure, but verify after fixing date ranges.

The test logic and structure are sound:
- Clear two-scenario approach (with/without exclusion)
- Proper use of `test_vars` to toggle the flag
- Descriptive assertions with failure messages
- Appropriate difference in update frequencies (2h vs 24h) to create detectable anomaly

Once the date range issue is fixed, ensure the test produces the expected behavior (pass without exclusion, fail with exclusion).
Fixed the integration test to pass exclude_detection_period_from_training via test_args (matching the volume anomalies pattern) instead of test_vars. The parameter needs to be in test_args so it gets included in the test YAML and can be retrieved by get_test_argument() in the macro. Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Lower sensitivity from 3 to 2 to ensure detection
- Add min_training_set_size: 5 to ensure enough training data after exclusion
- Add anomaly_direction: spike to explicitly detect higher freshness as anomalous
- Remove custom_run_started_at to avoid cross-adapter boundary issues

Investigation confirmed that the exclude_detection_period_from_training parameter is reaching the macros correctly (verified in CI logs). The issue was that the test scenario needed proper configuration to detect the anomaly.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Add days_back: 30 to cover entire training + detection span
- Add ignore_small_changes thresholds set to 0 to remove percent-based gating
- Re-add custom_run_started_at with proper alignment to midnight (detection_end)
- Lower sensitivity to 1 for with_exclusion test to force detection and validate wiring

These changes address potential issues with:
- Missing training buckets due to insufficient days_back
- Percent-based thresholds preventing anomaly detection
- Cross-adapter time boundary alignment issues
- Sensitivity threshold preventing detection of the anomaly

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Key fixes based on investigation:
- Add backfill_days: 7 to align exclusion window with detection period (critical for exclude logic)
- Weaken anomaly from 24h to 12h freshness (noon updates instead of midnight)
- Keep same sensitivity (2) for both runs to satisfy 'same values' requirement
- Remove different sensitivity for with_exclusion run

This should result in:
- Without exclusion: PASS (anomaly diluted into training; z ≈ 1.4 < 2)
- With exclusion: FAIL (anomaly stands out; z >> 2)

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
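The dilution claim can be sanity-checked with a toy computation (the numbers below are illustrative assumptions, not values from an actual test run):

```python
import statistics

# Freshness per bucket in hours: a jittered ~2h baseline vs. ~12h gaps
# in the 7-day detection window.
normal = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0] * 2
anomalous = [12.0] * 7

def z_score(point, training):
    return (point - statistics.mean(training)) / statistics.stdev(training)

# Without exclusion: detection buckets contaminate the training set,
# inflating both the mean and the stdev, so the z-score stays small.
print(round(z_score(12.0, normal + anomalous), 2))  # ~1.38 < sensitivity 2

# With exclusion: against the clean baseline the same point stands out.
print(round(z_score(12.0, normal), 2))              # ~80 >> sensitivity 2
```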
Actionable comments posted: 1
♻️ Duplicate comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
252-269: Critical: Date ranges are misaligned with test configuration.

The past review comment correctly identifies that these date ranges produce data in the wrong time windows:
- Normal data (lines 252-259): generates days -21 to -35
- Anomalous data (lines 262-269): generates days -7 to -14

With `detection_end = utc_now + timedelta(days=1)` and the configured training/detection periods, the test will look for data in days +1 to -6 (detection) and days -6 to -20 (training), but the actual data is in days -7 to -35. This misalignment means the detection period contains no data, and the test cannot work as intended.

Apply the fix suggested in the past review:

```diff
     # Generate 14 days of normal data with frequent updates (every 2 hours)
     normal_data = [
         {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
         for date in generate_dates(
-            base_date=utc_now - timedelta(days=21),
+            base_date=utc_now - timedelta(days=7),
             step=timedelta(hours=2),
             days_back=14,
         )
     ]

-    # Generate 7 days of anomalous data (only 1 update per day at noon)
+    # Generate 7 days of anomalous data (only 1 update per day)
     anomalous_data = [
         {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
         for date in generate_dates(
-            base_date=(utc_now - timedelta(days=7)).replace(hour=12, minute=0),
+            base_date=utc_now.replace(hour=12, minute=0),
             step=timedelta(hours=24),
             days_back=7,
         )
     ]
```

This positions:
- Normal data (training): days -7 to -21
- Anomalous data (detection): days 0 to -7
🔇 Additional comments (2)
integration_tests/tests/test_freshness_anomalies.py (2)
238-248: Test structure and documentation look good.

The test function is properly decorated and documented. The scenario description clearly explains the intended behavior: anomalous data should be missed when included in training (pass) but detected when excluded from training (fail).

292-322: Test invocations and assertions are well-structured.

Both test invocations properly:
- Use unique test IDs to avoid collisions
- Pass test configuration and data correctly
- Set `custom_run_started_at` via `test_vars`
- Assert expected outcomes with explanatory messages

The logic is sound: without exclusion, the anomaly is included in training and should be normalized (pass); with exclusion, only clean training data is used, so the anomaly should be detected (fail).

After applying the date range fixes, please verify:
- With `exclude_detection_period_from_training=False`, the test passes (anomaly missed because included in baseline)
- With `exclude_detection_period_from_training=True`, the test fails (anomaly detected against clean baseline)
- The sensitivity setting (2) appropriately distinguishes between the two scenarios

The test's success depends on the anomalous data being significant enough to dilute the baseline when included, but not so strong that it's detected even when diluted. If the test is flaky, consider adjusting the sensitivity or the contrast between normal (2-hour intervals) and anomalous (24-hour intervals) data.
- Changed training period from 14 to 30 days to overlap with 7-day detection period
- Changed normal data from 14 to 30 days (days 37-7 ago)
- Changed days_back from 30 to 40 to cover entire data range
- Changed sensitivity from 2 to 3 to allow anomaly absorption when included in training
- Set backfill_days to 0 to avoid independent exclusion of recent data
- This matches the pattern used in test_volume_anomalies.py::test_exclude_detection_from_training

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
- Change detection_end from utc_now + 1 day to utc_now
- Adjust normal data generation from days -67 to -37 → days -37 to -7
- Adjust anomalous data generation from days -14 to -7 → days -7 to -1
- Update test documentation to reflect the new timeline

This makes the test more intuitive by ensuring anomalous events fall directly within the detection period (days -7 to -1) rather than relying on the propagation effect of freshness metrics. The detection period now clearly covers days -7 to -1, and anomalous events occur at noon on those same days, making the test behavior more transparent and maintainable.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
238-324: Well-structured test for validating the exclusion flag.

The test correctly validates the `exclude_detection_period_from_training` flag by demonstrating that:
- Without exclusion: anomalous data becomes part of the training baseline → test passes (anomaly missed)
- With exclusion: anomalous data excluded from training → test fails (anomaly detected)

The data generation creates a clear 12× difference in update frequency (2-hour vs 24-hour intervals), which should be reliably detectable.

Minor suggestion: Consider clarifying the comments (lines 244-249) to be more precise:
- Line 245: Note that anomalous data points occur at noon (12:00) each day
- Line 246: Detection period is days -7 to 0 (though data exists only to day -1 at 12:00)
The test was failing because with a 30-day training period, the first detection bucket was evaluated against a baseline dominated by normal data, causing it to fail even without exclusion.

Changes:
- Reduce training_period from 30 days to 7 days
- Move normal data to days -14 to -8 (before detection period)
- Keep anomalous data at days -7 to -1 (in detection period)
- Reduce days_back from 40 to 20

This makes the pass/fail behavior hinge purely on the exclude flag:
- Without exclusion: training = detection window (anomalous) → PASS
- With exclusion: training = days -14 to -8 (normal) → FAIL

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
240-250: Clarify docstring to distinguish time windows from data coverage.

The docstring describes date ranges for data coverage but could be misinterpreted as the exact period boundaries. The detection period technically extends to day 0 (since `custom_run_started_at = utc_now`), and the training period (when excluded) is days -14 to -7, not days -14 to -8.

Consider updating the docstring to be more precise:

```diff
     """
     Test the exclude_detection_period_from_training flag functionality for freshness anomalies.

     Scenario:
-    - 7 days of normal data with frequent updates (every 2 hours) from day -14 to day -8
-    - 7 days of anomalous data (only 1 update per day at noon) from day -7 to day -1
-    - Detection period: last 7 days (days -7 to -1)
+    - 7 days of normal data with frequent updates (every 2 hours) covering days -15 to -8
+    - 7 days of anomalous data (only 1 update per day at noon) covering days -7 to -1
+    - Detection period: 7 days ending at day 0 [day -7, day 0)
     - Training period: 7 days
-    - Without exclusion: training = detection window (anomalous pattern) → test passes
-    - With exclusion: training = days -14 to -8 (normal pattern) → test fails (detects anomaly)
+    - Without exclusion: training window = detection window (includes anomalous pattern) → test passes
+    - With exclusion: training window = [day -14, day -7) (uses normal pattern) → test fails (detects anomaly)
     """
```
🔇 Additional comments (3)
integration_tests/tests/test_freshness_anomalies.py (3)
253-273: Data generation logic looks correct.

The data ranges are well-designed for testing the exclusion flag:
- Normal data (days -15 to -8, 12 updates/day) provides a baseline when exclusion is enabled
- Anomalous data (days -7 to -1, 1 update/day) falls within the detection period
- The two datasets don't overlap, ensuring clean separation between training and detection windows

276-292: Test configuration parameters are appropriate.

The configuration correctly sets up the scenario:
- `days_back=20` adequately covers the generated data (~15 days)
- 7-day training and detection periods align with the data structure
- `sensitivity=3` provides a reasonable threshold for detecting the ~12x drop in update frequency
- `anomaly_direction="spike"` correctly identifies increases in time between updates (i.e., less frequent updates)

294-324: Test assertions and logic are well-structured.

The test effectively validates the `exclude_detection_period_from_training` flag through a clear two-phase approach:
- Without exclusion (lines 294-305): Training includes the anomalous pattern, so the baseline is "1 update/day" → detection sees "1 update/day" → no anomaly detected (PASS) ✓
- With exclusion (lines 307-324): Training uses only normal data, so the baseline is "~12 updates/day" → detection sees "1 update/day" → significant drop detected (FAIL) ✓

The test IDs are properly isolated, and the assertions match the expected behavior described in the PR objectives.
This reverts commit 38b64db.
Changes:
- Change detection_end from utc_now + 1 day to utc_now
- Move normal data to days -14 to -8 (7 days with 2-hour updates)
- Move anomalous data to days -7 to -1 (7 days with 1 update/day at noon)
- Change training_period from 30 days to 14 days
- Reduce min_training_set_size from 5 to 3
- Update days_back from 40 to 20

Why this works:
- training_period = 14 days ensures there are training buckets available when exclusion is enabled (days -14 to -8)
- Without exclusion: Training includes both normal and anomalous data, so the anomalous pattern becomes part of the baseline → test PASSES
- With exclusion: Training includes only normal data (days -14 to -8), so the anomalous pattern in detection (days -7 to -1) stands out → test FAILS

This makes the test more intuitive by aligning anomalous data directly with the detection period while ensuring both test cases work correctly.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
267-274: Documentation inconsistency: normal data range doesn't match comment.

The comment states normal data spans "from day -14 to day -8" (lines 243-244, 252), but with `base_date=utc_now - timedelta(days=8)` and `days_back=7`, the `generate_dates` function produces data from approximately day -15 to day -8. This is because `min_date = day -8 - 7 days = day -15`. While this doesn't break the test logic (the training period with exclusion still captures 7 days of normal data from days -14 to -8), it creates a maintenance hazard.

Apply this diff to align the code with the documentation:

```diff
     # Generate 7 days of normal data with frequent updates (every 2 hours) from day -14 to day -8
     normal_data = [
         {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
         for date in generate_dates(
             base_date=utc_now - timedelta(days=8),
             step=timedelta(hours=2),
-            days_back=7,
+            days_back=6,
         )
     ]
```

Alternatively, update the comments to reflect the actual range: "from day -15 to day -8 (7 days duration)".
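The off-by-one can be confirmed with plain date arithmetic (a standalone check; `utc_now` below is an arbitrary stand-in for the test's midnight-aligned timestamp):

```python
from datetime import datetime, timedelta

utc_now = datetime(2025, 11, 1)                 # any midnight-aligned date
base_date = utc_now - timedelta(days=8)         # newest normal point: day -8
oldest_with_7 = base_date - timedelta(days=7)   # days_back=7 -> day -15
oldest_with_6 = base_date - timedelta(days=6)   # days_back=6 -> day -14
print((utc_now - oldest_with_7).days, (utc_now - oldest_with_6).days)  # 15 14
```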
🔇 Additional comments (1)
integration_tests/tests/test_freshness_anomalies.py (1)
238-338: Well-structured integration test.

The test effectively validates the `exclude_detection_period_from_training` flag by constructing a scenario where:
- Training without exclusion normalizes the anomalous pattern (sparse updates) into the baseline → test passes
- Training with exclusion excludes the anomalous pattern from the baseline → anomaly is detected → test fails

The configuration is sound, assertions are clear, and the test is appropriately skipped on ClickHouse.
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Add exclude_detection_period_from_training flag to freshness anomaly tests
Summary
Added the `exclude_detection_period_from_training` parameter to the `test_freshness_anomalies` and `test_event_freshness_anomalies` macros, matching the functionality already available in volume anomaly tests. This flag allows users to exclude metrics from the detection period when calculating the training baseline, preventing contamination from potentially anomalous recent data.
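For intuition, here's a minimal sketch of the window arithmetic this flag controls. The real implementation is Jinja/SQL inside elementary's anomaly-detection macros (not shown on this page), so the Python below is an illustrative assumption, ignoring details like `backfill_days` and bucket granularity:

```python
from datetime import datetime, timedelta

def training_window(detection_end: datetime, training_days: int,
                    detection_days: int, exclude_detection: bool):
    """Return (start, end) of the training window."""
    if exclude_detection:
        # Training ends where the detection period begins.
        training_end = detection_end - timedelta(days=detection_days)
    else:
        # Default (backward compatible): training overlaps detection.
        training_end = detection_end
    return training_end - timedelta(days=training_days), training_end

now = datetime(2025, 11, 1)
print(training_window(now, 14, 7, exclude_detection=False))
# days -14 to 0: recent, possibly anomalous buckets feed the baseline
print(training_window(now, 14, 7, exclude_detection=True))
# days -21 to -7: the baseline is built only from pre-detection data
```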
Changes:
- Added an `exclude_detection_period_from_training=false` parameter to both freshness test macros
- Forwarded the parameter to the underlying `test_table_anomalies` macro
- Added an integration test, `test_exclude_detection_from_training`, that validates the flag works correctly

Integration test approach:
Review & Testing Checklist for Human
- Verify test scenario validity: The test assumes that with `training_period=30 days` and `detection_period=7 days`, the detection data is included in training by default when the flag is off. Confirm this matches the actual behavior of freshness anomaly detection (I based this on the volume anomalies test pattern, but freshness metrics may behave differently).
- Test with real data: Run a freshness anomaly test on actual data with and without the flag to verify it changes behavior as expected. The integration test uses synthetic data with a specific pattern that may not represent all real-world scenarios.
- Review test parameters: The test uses `sensitivity=3`, `backfill_days=0`, and specific time windows. Consider whether these are appropriate defaults or if the test should validate other parameter combinations.