Add exclude_detection_period_from_training flag to event freshness anomaly tests #888
Conversation
…tests Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
🤖 Devin AI Engineer
I'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
Walkthrough
Adds an optional `exclude_detection_period_from_training` flag to the event freshness anomaly test and forwards it to the underlying table anomalies test.

Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant IT as Integration Test
    participant Macro as event_freshness macro
    participant Engine as elementary.test_table_anomalies
    Note over IT: prepare training + detection data
    IT->>Macro: call test_event_freshness_anomalies(..., exclude_detection_period_from_training=FLAG)
    Note right of Macro: forwards flag and args
    Macro->>Engine: elementary.test_table_anomalies(..., exclude_detection_period_from_training=FLAG)
    Engine-->>Macro: result (pass/fail)
    Macro-->>IT: return result
```
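For intuition, here is a minimal Python sketch of what the flag changes about baseline computation. It is illustrative only: the function name, bucket representation, and metric values are assumptions, not the package's actual SQL/Jinja implementation.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def score_detection_buckets(buckets, detection_start, exclude_detection_period_from_training):
    """buckets: list of (bucket_end, metric_value) pairs covering training + detection."""
    if exclude_detection_period_from_training:
        # Baseline is built only from buckets that end before the detection period.
        training_values = [v for ts, v in buckets if ts < detection_start]
    else:
        # Default behavior: detection-period buckets also feed the baseline.
        training_values = [v for ts, v in buckets]

    baseline_mean, baseline_std = mean(training_values), stdev(training_values)
    # Only buckets inside the detection period are scored.
    return {
        ts: (v - baseline_mean) / baseline_std if baseline_std else 0.0
        for ts, v in buckets
        if ts >= detection_start
    }


now = datetime.now(timezone.utc)
buckets = [(now - timedelta(days=d), 2.0 + 0.1 * (d % 3) if d > 3 else 8.0) for d in range(14, 0, -1)]
scores = score_detection_buckets(buckets, now - timedelta(days=3), True)
```

Without the flag, the 8.0 values from the detection period would also be part of the baseline, inflating the mean and standard deviation so the same detection buckets score much lower.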
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
…eshness anomalies Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
integration_tests/tests/test_freshness_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
Applied to files:
integration_tests/tests/test_freshness_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_freshness_anomalies.py (3)
  integration_tests/tests/conftest.py (2)
    - test_id (180-183)
    - dbt_project (144-147)
  integration_tests/tests/dbt_project.py (1)
    - DbtProject (46-334)
  integration_tests/tests/data_generator.py (1)
    - generate_dates (6-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, bigquery) / test
```python
# Generate 30 days of normal data with consistent freshness (every 2 hours)
normal_data = [
    {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
    for date in generate_dates(
        utc_now - timedelta(days=33), step=timedelta(hours=2), days_back=30
    )
]

anomalous_data = [
    {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
    for date in generate_dates(utc_now, step=timedelta(hours=8), days_back=3)
]

all_data = normal_data + anomalous_data
```
Normal data window misses the training period
Line 260 anchors the “30 days of normal data” at utc_now - timedelta(days=33), so every row lands 33–63 days in the past. Because the training period in Line 275 spans only the last 30 days, none of this “normal” history is actually available to the model once the detection window is excluded; the baseline is effectively empty, so the failing assertion is driven by missing data rather than the new flag’s behavior. Anchor the normal data immediately before the detection window so the test exercises the intended comparison.
```diff
-    normal_data = [
+    detection_start = utc_now - timedelta(days=3)
+    normal_data = [
         {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
         for date in generate_dates(
-            utc_now - timedelta(days=33), step=timedelta(hours=2), days_back=30
+            detection_start, step=timedelta(hours=2), days_back=30
         )
     ]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Generate 30 days of normal data with consistent freshness (every 2 hours)
detection_start = utc_now - timedelta(days=3)
normal_data = [
    {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
    for date in generate_dates(
        detection_start, step=timedelta(hours=2), days_back=30
    )
]
anomalous_data = [
    {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
    for date in generate_dates(utc_now, step=timedelta(hours=8), days_back=3)
]
all_data = normal_data + anomalous_data
```
🤖 Prompt for AI Agents
in integration_tests/tests/test_freshness_anomalies.py around lines 256 to 270,
the "normal" data is anchored 33 days ago so it falls outside the test's
training period; adjust the generate_dates start so the normal window ends
immediately before the detection window by changing the start argument from
utc_now - timedelta(days=33) to utc_now - timedelta(days=3) (keep
step=timedelta(hours=2) and days_back=30) so the normal 30-day history overlaps
the model's training period.
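To make the mismatch concrete, here is a small sketch of the window arithmetic, assuming (as the review comment describes) that generate_dates(start, step, days_back) walks days_back days backwards from start:

```python
from datetime import datetime, timedelta, timezone

utc_now = datetime.now(timezone.utc)

# Window produced by the original anchor: rows land 33 to 63 days in the past.
normal_window = (utc_now - timedelta(days=33 + 30), utc_now - timedelta(days=33))

# Training window configured by the test: the last 30 days.
training_window = (utc_now - timedelta(days=30), utc_now)

overlap = min(normal_window[1], training_window[1]) - max(normal_window[0], training_window[0])
print(overlap > timedelta(0))  # False: none of the "normal" rows fall inside the training window
```

Re-anchoring the normal data at utc_now - timedelta(days=3), as suggested, makes the normal history fill the training window right up to the start of the detection period.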
```python
test_args_without_exclusion = {
    "timestamp_column": TIMESTAMP_COLUMN,
    "training_period": {"period": "day", "count": 30},
    "detection_period": {"period": "day", "count": 3},
    "time_bucket": {"period": "day", "count": 1},
    "sensitivity": 5,  # Higher sensitivity to allow anomaly to be absorbed
    # exclude_detection_period_from_training is not set (defaults to False/None)
}
```
Add mandatory backfill_days to test config
elementary.freshness_anomalies validates that backfill_days is provided (via validate_mandatory_configuration). Without it, this test fails at compilation instead of exercising the new flag. Please include the parameter in both runs.
"time_bucket": {"period": "day", "count": 1},
"sensitivity": 5, # Higher sensitivity to allow anomaly to be absorbed
+ "backfill_days": 30,Based on learnings
🤖 Prompt for AI Agents
in integration_tests/tests/test_freshness_anomalies.py around lines 272 to 280,
the test config dict test_args_without_exclusion is missing the mandatory
backfill_days parameter causing validation to fail; add "backfill_days": 30 (or
another appropriate integer) to this dict and also add the same backfill_days
key/value to the other test run/config in this test so both executions include
the required parameter.
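If the suggestion is applied, both configs could end up looking roughly like the sketch below. Whether the real test builds the second dict this way is an assumption, and TIMESTAMP_COLUMN is stubbed here so the snippet stands alone:

```python
TIMESTAMP_COLUMN = "updated_at"  # stand-in; the test module defines the real value

test_args_without_exclusion = {
    "timestamp_column": TIMESTAMP_COLUMN,
    "training_period": {"period": "day", "count": 30},
    "detection_period": {"period": "day", "count": 3},
    "time_bucket": {"period": "day", "count": 1},
    "sensitivity": 5,
    "backfill_days": 30,  # mandatory; validated by validate_mandatory_configuration
}

# Second run reuses the same args, so backfill_days is present in both executions.
test_args_with_exclusion = {
    **test_args_without_exclusion,
    "exclude_detection_period_from_training": True,
}
```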
…s test with proper window alignment Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
integration_tests/tests/test_event_freshness_anomalies.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_event_freshness_anomalies.py (2)
  integration_tests/tests/dbt_project.py (5)
    - DbtProject (46-334)
    - seed (259-263)
    - test (110-128)
    - test (131-149)
    - test (151-257)
  integration_tests/tests/data_generator.py (1)
    - generate_dates (6-17)
🪛 Ruff (0.14.4)
integration_tests/tests/test_event_freshness_anomalies.py
111-111: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, postgres) / test
🔇 Additional comments (1)
integration_tests/tests/test_event_freshness_anomalies.py (1)
1-1: LGTM! Random import is appropriate for test data generation.
The `random` module is correctly used to generate deterministic jitter for test data. The static analysis warning about cryptographic usage (S311) is a false positive; this is test data generation, not security-sensitive code.
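A tiny sketch of the pattern the comment is pointing at; the seed value is an assumption, and the 2-8 minute range mirrors the lag described later in the review rather than the file's exact code:

```python
import random

random.seed(1234)  # fixed seed: the jitter is reproducible, which is why S311 is a false positive here

# A few minutes of lag per event, so "normal" freshness is realistic but not perfectly uniform.
lag_minutes = [random.randint(2, 8) for _ in range(10)]
print(lag_minutes)  # identical output on every run thanks to the fixed seed
```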
…ll, and explicit parameters Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…days correctly

The test was failing because the detection_period parameter was overriding backfill_days, causing the detection period to span all 14 days of data instead of just the last 7 days. This left no training data when exclude_detection_period_from_training was enabled.

Fixed by:
- Removing the training_period and detection_period parameters
- Setting days_back: 14 (scoring window includes both periods)
- Setting backfill_days: 7 (detection period is the last 7 days)

This properly splits the data into:
- Training period: days 14-8 (7 days of normal data)
- Detection period: days 7-1 (7 days of anomalous data)

Test now passes locally with postgres.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
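A minimal sketch of the split described above, assuming the engine treats the last backfill_days of the days_back window as the detection period (the parameter names match the commit message; everything else is illustrative):

```python
from datetime import datetime, timedelta, timezone

utc_now = datetime.now(timezone.utc)

test_args = {
    "days_back": 14,      # full scoring window: training + detection
    "backfill_days": 7,   # detection period: the last 7 days of that window
}

window_start = utc_now - timedelta(days=test_args["days_back"])
detection_start = utc_now - timedelta(days=test_args["backfill_days"])

def period_of(ts):
    """Classify a timestamp the way the commit message describes the split."""
    return "training" if window_start <= ts < detection_start else "detection"

print(period_of(utc_now - timedelta(days=10)))  # training (days 14-8 hold the normal data)
print(period_of(utc_now - timedelta(days=3)))   # detection (days 7-1 hold the anomalous data)
```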
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
integration_tests/tests/test_event_freshness_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
Applied to files:
integration_tests/tests/test_event_freshness_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_event_freshness_anomalies.py (3)
  integration_tests/tests/test_volume_anomalies.py (1)
    - test_exclude_detection_from_training (545-621)
  integration_tests/tests/dbt_project.py (4)
    - DbtProject (46-334)
    - test (110-128)
    - test (131-149)
    - test (151-257)
  integration_tests/tests/test_freshness_anomalies.py (1)
    - test_exclude_detection_from_training (239-320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (3)
integration_tests/tests/test_event_freshness_anomalies.py (3)
116-141: LGTM! Date ranges are now correctly aligned.
The data generation correctly places:
- Normal data (2-8 min lag) in the training period: days -14 to -8
- Anomalous data (5-hour lag) in the detection period: days -7 to -1
This alignment ensures that with `exclude_detection_period_from_training=True`, only normal data trains the model and the anomaly is detected.
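Sketched out, the data layout the comment describes looks roughly like this; the column names and the hourly cadence are assumptions, and the real test uses the project's generate_dates helper rather than this hand-rolled loop:

```python
import random
from datetime import datetime, timedelta, timezone

random.seed(0)
utc_now = datetime.now(timezone.utc)

def event_rows(start_days_ago, end_days_ago, lag_minutes):
    """Hourly events between the two offsets, each updated lag_minutes after it occurs."""
    rows = []
    for hour in range(int((start_days_ago - end_days_ago) * 24)):
        occurred = utc_now - timedelta(days=start_days_ago) + timedelta(hours=hour)
        lag = timedelta(minutes=random.randint(*lag_minutes))
        rows.append({"event_timestamp": occurred, "update_timestamp": occurred + lag})
    return rows

# Training period (days -14 to -8): healthy freshness, rows updated 2-8 minutes after the event.
normal_rows = event_rows(14, 8, (2, 8))
# Detection period (days -7 to -1): degraded freshness, rows updated ~5 hours after the event.
anomalous_rows = event_rows(7, 1, (300, 300))
```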
145-197: Test structure and assertions look good.
The test correctly implements the two-scenario pattern:
- Without exclusion: detection data included in training → test passes (anomaly missed)
- With exclusion: detection data excluded from training → test fails (anomaly detected)
Both scenarios use the same data and test configuration, differing only in the `exclude_detection_period_from_training` flag. The assertions properly validate the expected behavior in each case.
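In outline, the two-scenario pattern reads like the sketch below. dbt_project.test stands in for the integration-test helper in dbt_project.py, and its exact signature, return shape, test name, and argument keys are assumptions rather than the file's actual code:

```python
def test_exclude_detection_from_training(test_id, dbt_project):
    # data: normal rows (training window) + anomalous rows (detection window),
    # built as in the data-generation sketch above.
    data = normal_rows + anomalous_rows

    base_args = {
        "event_timestamp_column": "event_timestamp",
        "update_timestamp_column": "update_timestamp",
        "days_back": 14,
        "backfill_days": 7,
    }

    # Scenario 1: detection rows feed the baseline, the anomaly is absorbed, the test passes.
    result = dbt_project.test(
        test_id, "elementary.event_freshness_anomalies", base_args, data=data
    )
    assert result["status"] == "pass"

    # Scenario 2: detection rows are excluded from training, the anomaly stands out, the test fails.
    result = dbt_project.test(
        test_id,
        "elementary.event_freshness_anomalies",
        {**base_args, "exclude_detection_period_from_training": True},
        data=data,
    )
    assert result["status"] == "fail"
```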
146-159: Investigated whether these parameters are required, optional, or have defaults, and how they are handled when not provided. Key findings:
- The macro `test_event_freshness_anomalies.sql` shows that `detection_period` and `training_period` are explicit parameters in the macro signature (line 1)
- These parameters are passed through to `test_table_anomalies` (lines 34-35)
- The test at lines 146-159 (`test_args_without_exclusion`) does NOT include `training_period` or `detection_period`
- The Python test documentation at line 108 claims "Explicit training_period and detection_period parameters" but the actual test config doesn't provide them
Given the learnings about mandatory parameter validation and the fact that these parameters are explicitly in the macro signature without defaults, this is a legitimate concern. The test should either specify these parameters or the code comment should be corrected.
Omit explicit `training_period` and `detection_period` parameters if they default correctly; otherwise, add them.
The test compares metrics within the detection_period to training_period data, then fails if anomalies are detected in the detection period. The current test (lines 146-159) omits both `training_period` and `detection_period`, relying on defaults. However, the macro signature shows these are explicit parameters. Verify they're optional with sensible defaults, or add them explicitly. The docstring at line 108 mentions "Explicit training_period and detection_period parameters," but `test_args_without_exclusion` doesn't include them.
removed comment
Add exclude_detection_period_from_training flag to event freshness anomaly tests
Summary
This PR implements CORE-91 by adding the `exclude_detection_period_from_training` parameter to freshness anomaly detection tests. This parameter was already available for volume anomalies and is now exposed to:
- `test_event_freshness_anomalies`
- `test_freshness_anomalies`

The parameter defaults to `false` to maintain backward compatibility. When set to `true`, it excludes metrics from the detection period when calculating training statistics, preventing potentially anomalous recent data from contaminating the baseline.

The underlying infrastructure in `get_anomaly_scores_query.sql` and `get_anomalies_test_configuration.sql` already fully supports this parameter; this PR simply exposes it to the freshness test macros.

Review & Testing Checklist for Human
- Confirm both `test_event_freshness_anomalies` AND `test_freshness_anomalies` should receive this parameter (I added it to both based on symmetry with volume tests, but couldn't access the Linear issue to verify)
- Run with `exclude_detection_period_from_training: true` to verify it works as expected and properly excludes detection period data from training
- Compare with `test_exclude_detection_period_from_training` in `test_volume_anomalies.py` for freshness tests

Notes
- The parameter defaults to `false` to maintain existing behavior

Summary by CodeRabbit
New Features
Tests