Add exclude_detection_period_from_training flag to dimension anomaly test #890
Conversation
…test

- Added `exclude_detection_period_from_training` parameter to `test_dimension_anomalies` macro signature with default value `false`
- Passed the parameter through to `get_anomalies_test_configuration`
- This brings dimension anomalies in line with table/volume anomalies, which already support this flag
- The underlying logic in `get_anomaly_scores_query.sql` already handles this parameter for all anomaly types

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Walkthrough

Adds an optional `exclude_detection_period_from_training` flag to the `test_dimension_anomalies` macro and passes it through to the test configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor IT as Integration Test
    participant MAC as test_dimension_anomalies macro
    participant CFG as Generated Config
    participant EXEC as Test Executor
    Note over IT,MAC: Two scenarios: FLAG=false (include), FLAG=true (exclude)
    IT->>MAC: call test_dimension_anomalies(..., exclude_detection_period_from_training=FLAG)
    MAC->>CFG: construct config { ..., exclude_detection_period_from_training: FLAG }
    CFG->>EXEC: run anomaly detection test with config
    alt FLAG = false
        EXEC->>EXEC: include detection period in training (anomaly may be masked)
        EXEC-->>IT: test result (expected pass)
    else FLAG = true
        EXEC->>EXEC: exclude detection period from training (anomaly remains detectable)
        EXEC-->>IT: test result (expected fail)
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
…mension anomalies

- Added `test_dimension_exclude_detection_from_training` to demonstrate the flag's behavior
- Test shows that without exclusion, the anomaly is missed (test passes) because training includes the detection period
- Test shows that with exclusion, the anomaly is detected (test fails) because training excludes the detection period
- Uses 30 days of normal data with variance (45/50/55 pattern) and 7 days of anomalous data (72/28 distribution)
- Follows the same pattern as `test_exclude_detection_from_training` in `test_volume_anomalies.py`

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
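The data pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the helper name and the per-day aggregated row shape are assumptions made for the sketch.

```python
from datetime import datetime, timedelta

def generate_test_data():
    """Build 30 days of normal rows plus 7 days of anomalous rows.

    Normal days cycle Superman's count through a 45/50/55 pattern
    (Spiderman gets the complement of 100); anomalous days use a fixed
    72/28 Superman/Spiderman split.
    """
    utc_today = datetime.utcnow().date()
    rows = []
    normal_pattern = [45, 50, 55]
    # 30 normal days: days -37 .. -8 relative to today (before detection period)
    for i in range(30):
        day = utc_today - timedelta(days=37 - i)
        superman = normal_pattern[i % len(normal_pattern)]
        rows.append({"created_at": day.isoformat(), "superhero": "Superman", "count": superman})
        rows.append({"created_at": day.isoformat(), "superhero": "Spiderman", "count": 100 - superman})
    # 7 anomalous days: days -7 .. -1 (the detection period)
    for i in range(7):
        day = utc_today - timedelta(days=7 - i)
        rows.append({"created_at": day.isoformat(), "superhero": "Superman", "count": 72})
        rows.append({"created_at": day.isoformat(), "superhero": "Spiderman", "count": 28})
    return rows
```

The 45/50/55 variance matters: a baseline with zero variance would flag any deviation, so the test needs normal fluctuation to show that only the 72/28 shift is anomalous.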
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_dimension_anomalies.py (1)
229-326: Excellent test implementation with clear documentation.

The test correctly validates the `exclude_detection_period_from_training` flag behavior with well-structured data and clear expectations. The logic is sound: when exclusion is disabled, the anomalous detection-period data contaminates the training baseline (causing the test to pass and miss the anomaly), and when enabled, the clean baseline correctly detects the anomaly (causing the test to fail).

Optional: Consider minor consistency improvements:

- Line 241 uses `datetime.utcnow()` while other tests in this file use `datetime.utcnow().date()` (e.g., lines 37, 59, 79). Although the code works correctly since the datetime is formatted to a date string, using `.date()` would improve consistency.
- The test only validates pass/fail status but doesn't verify which dimension values were flagged as anomalous (unlike `test_anomalous_dimension_anomalies` at lines 105-114). Adding verification of anomaly details would make the test more thorough:

```python
# After line 321, add:
anomaly_test_points = get_latest_anomaly_test_points(dbt_project, test_id + "_with_exclusion")
dimension_values = set(x["dimension_value"] for x in anomaly_test_points)
assert "Superman" in dimension_values, "Superman should be flagged as anomalous"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
`integration_tests/tests/test_dimension_anomalies.py` (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
Applied to files:
integration_tests/tests/test_dimension_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_dimension_anomalies.py (2)
integration_tests/tests/conftest.py (2)
`test_id` (180-183), `dbt_project` (144-147)
integration_tests/tests/dbt_project.py (4)
`DbtProject` (46-334), `test` (110-128), `test` (131-149), `test` (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, dremio) / test
…uffixes to avoid Postgres 63-char limit

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_dimension_anomalies.py (1)
241-269: Clarify training/detection date arithmetic to avoid magic numbers.

The `37 - i` and `7 - i` offsets work, but they're non-obvious to readers and couple the loop math tightly to the period lengths. You could make this easier to reason about by deriving explicit training and detection start dates, e.g.:

```diff
-    utc_now = datetime.utcnow()
-
-    # Generate 30 days of normal data with variance (45/50/55 pattern for Superman)
-    normal_pattern = [45, 50, 55]
-    normal_data = []
-    for i in range(30):
-        date = utc_now - timedelta(days=37 - i)
+    utc_today = datetime.utcnow().date()
+    detection_start = utc_today - timedelta(days=7)
+    training_start = detection_start - timedelta(days=30)
+
+    # Generate 30 days of normal data with variance (45/50/55 pattern for Superman)
+    normal_pattern = [45, 50, 55]
+    normal_data = []
+    for i in range(30):
+        date = training_start + timedelta(days=i)
@@
-    # Generate 7 days of anomalous data (72 Superman, 28 Spiderman per day) - this will be in detection period
-    anomalous_data = []
-    for i in range(7):
-        date = utc_now - timedelta(days=7 - i)
+    # Generate 7 days of anomalous data (72 Superman, 28 Spiderman per day) - this will be in detection period
+    anomalous_data = []
+    for i in range(7):
+        date = detection_start + timedelta(days=i)
```

This keeps the intent ("30 days before detection", "7 days of detection") explicit and makes future adjustments to window sizes less error-prone, while preserving the same date ranges.
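The equivalence of the two formulations can be checked directly. This sketch compares the magic-offset arithmetic against the explicit-window version suggested in the review and confirms they produce the same dates:

```python
from datetime import date, timedelta

def old_dates(today: date):
    """Original arithmetic: magic offsets 37 - i and 7 - i."""
    normal = [today - timedelta(days=37 - i) for i in range(30)]
    anomalous = [today - timedelta(days=7 - i) for i in range(7)]
    return normal, anomalous

def new_dates(today: date):
    """Refactored arithmetic: explicit window start dates."""
    detection_start = today - timedelta(days=7)
    training_start = detection_start - timedelta(days=30)
    normal = [training_start + timedelta(days=i) for i in range(30)]
    anomalous = [detection_start + timedelta(days=i) for i in range(7)]
    return normal, anomalous
```

Both versions yield a contiguous 37-day range ending yesterday, with the last 7 days forming the detection period.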
🔇 Additional comments (1)
integration_tests/tests/test_dimension_anomalies.py (1)
223-326: Solid end-to-end coverage of `exclude_detection_period_from_training` behavior.

This test cleanly exercises both paths (default include vs. explicit exclude), uses distinct `test_id`s to isolate runs, and constructs training vs. detection windows in a way that aligns with the configured `training_period`, `detection_period`, and `time_bucket`. Assertions on overall test status (`pass` vs. `fail`) match the intended semantics of the new flag. I don't see any functional or flakiness concerns here.
…tion_period

- Rename `test_dimension_exclude_detection_from_training` to `test_anomaly_in_detection_period`
- Add `@pytest.mark.parametrize` decorator with `exclude_detection` and `expected_status` parameters
- Use descriptive IDs: `include_detection_in_training` and `exclude_detection_from_training`
- Consolidate two test cases into one parametrized test for better maintainability
- Addresses reviewer feedback on PR #890

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…move redundant suffix

- Change parametrize IDs from `include_detection_in_training`/`exclude_detection_from_training` to `exclude_false`/`exclude_true`
- Remove redundant suffix (`_incl`/`_excl`) since pytest parametrize IDs already differentiate test cases
- New table names: `test_anomaly_in_detection_period_exclude_false` (44 chars) and `test_anomaly_in_detection_period_exclude_true` (43 chars)
- Both are well under the Postgres 63-character limit
- Fixes CI failures on Postgres (latest_official and latest_pre)

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
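The identifier-length constraint behind this rename can be checked mechanically. Postgres truncates identifiers longer than 63 bytes, which can silently collide long test-generated table names; the names below are taken from the commit message, and the check itself is a sketch:

```python
# Postgres's NAMEDATALEN default gives a 63-byte identifier limit;
# anything longer is silently truncated, risking table-name collisions.
POSTGRES_MAX_IDENTIFIER_BYTES = 63

table_names = [
    "test_anomaly_in_detection_period_exclude_false",
    "test_anomaly_in_detection_period_exclude_true",
]

oversized = [n for n in table_names if len(n.encode("utf-8")) > POSTGRES_MAX_IDENTIFIER_BYTES]
assert not oversized, f"identifiers over the Postgres limit: {oversized}"
```

A check like this could run in CI to catch over-long generated names before they reach a Postgres target.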
Actionable comments posted: 1
🔇 Additional comments (5)
integration_tests/tests/test_dimension_anomalies.py (5)
223-239: LGTM! Well-structured parametrized test.

The test follows the suggestion from previous review comments to use `@pytest.mark.parametrize` with both flag states and expected outcomes. The documentation clearly explains the behavior, and the shortened IDs appropriately address the Postgres identifier length constraint.
240-256: LGTM! Clear function signature and comprehensive docstring.

The function signature correctly matches the parametrized inputs, and the docstring thoroughly documents the test scenario and expected behavior for both flag states.
282-300: LGTM! Anomalous data pattern is appropriate.

The anomalous data generation (72% Superman vs. ~50% baseline) provides a clear deviation that should be detectable when the detection period is excluded from training. The 7-day range correctly aligns with the `detection_period` parameter.
304-313: Rewritten review comment is incorrect: the test configuration is valid.

The script output confirms that `training_period` and `detection_period` are not bypassing the `backfill_days` requirement. Instead, they are converted to `backfill_days` and `days_back` through dedicated macros in `get_anomalies_test_configuration.sql` (lines 40-48):

- `detection_period: {"period": "day", "count": 7}` → `backfill_days=7`
- `training_period: {"period": "day", "count": 30}` → `days_back=30`

The configuration processor then validates that `backfill_days` is set (line 90), which it is after conversion. Multiple tests across the codebase (`test_volume_anomalies.py`, `test_anomaly_test_configuration.py`) use this exact pattern successfully.

This is intentional design: the new API accepts user-friendly period specifications instead of raw day counts.
302-321: Based on verification through the shell script output showing how `exclude_detection_period_from_training` works in the Elementary codebase, I can now provide an accurate assessment.

The review comment is based on an incorrect understanding of the `exclude_detection_period_from_training` logic.

The original concern assumed that `exclude_detection_period_from_training=True` would exclude the full `detection_period` (7 days) from training. However, the actual implementation in `get_anomaly_scores_query.sql` shows that exclusion uses `backfill_days` (which defaults to 2 days), not `detection_period`. For incremental models, this is also the period for re-calculating metrics.

With the actual logic:

- Exclusion window: last 2 days (`backfill_days` default)
- Training data available: ~28 days out of the 37 days generated
- This is sufficient for the 30-day `training_period` requested

The data generation is appropriate, and the concerns about insufficient training data do not apply.

Likely an incorrect or invalid review comment.
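The window arithmetic above can be sketched as a simplified model, assuming the flag trims the training window by `backfill_days` (the helper name and this reduction of the SQL logic are assumptions for illustration):

```python
def training_days_available(days_back: int, backfill_days: int, exclude_detection: bool) -> int:
    """Simplified model of how many days remain for training.

    When the exclusion flag is set, the last `backfill_days` days of the
    lookback window are withheld from the training baseline.
    """
    return days_back - backfill_days if exclude_detection else days_back
```

With the review's numbers, a 30-day lookback minus the default 2-day backfill leaves 28 training days, which explains the "~28 out of 37" figure above.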
…tetime

Change `utc_now` from `datetime.utcnow()` to `datetime.utcnow().date()` to match the pattern used in other tests. Date arithmetic already works correctly with date objects.
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_dimension_anomalies.py (1)
257-280: Normal data generation looks correct.

The datetime issue from the previous review has been properly resolved. The data generation creates 30 days of normal data with appropriate variance (45/50/55 pattern) placed before the detection period.

Optional suggestion: Consider adding a comment clarifying the date range calculation for readability:

```python
# Generate 30 days of normal data (days -37 to -8, before detection period)
for i in range(30):
    date = utc_now - timedelta(days=37 - i)
```
🔇 Additional comments (4)
integration_tests/tests/test_dimension_anomalies.py (4)
223-239: LGTM! Well-structured parametrization.

The test parametrization is clear and follows best practices:

- Descriptive parameter names that convey the test scenarios
- Shortened IDs documented to avoid Postgres identifier limits
- Clear comments explaining the expected behavior for each case
240-256: Excellent documentation!

The test function signature and docstring are well-crafted:

- Clear, descriptive function name that reflects the test purpose
- Comprehensive docstring explaining the scenario and expected outcomes
- Helpful note about the Postgres identifier limit
282-300: Anomalous data generation is well-designed.

The 7 days of anomalous data (72% Superman vs. the normal ~50%) create a clear distributional shift that should be detectable when the detection period is excluded from training. The date placement (days -7 to -1) correctly aligns with the intended detection period.
302-320: Test configuration and exclusion flag implementation are correct.

The test properly parametrizes the `exclude_detection_period_from_training` behavior:

- When `exclude_detection=False`: flag not added → detection period included in training → anomaly absorbed → test passes
- When `exclude_detection=True`: flag set to `True` → detection period excluded from training → anomaly detected → test fails

The macro implementations correctly support this flag through the entire call stack: `test_dimension_anomalies.sql` → `get_anomalies_test_configuration.sql` → `get_anomaly_scores_query.sql`, where metrics are properly excluded from training statistics when the flag is enabled.

The test data generation (30 days of the 45/50/55 Superman pattern, then 7 days of a 72/28 anomalous split) correctly establishes the scenario needed to verify the flag's functionality.

Please manually verify this test passes across all supported warehouse targets (Postgres, Snowflake, BigQuery) to confirm the end-to-end behavior matches the expected semantics.
Add exclude_detection_period_from_training flag to dimension anomaly test
Summary
This PR adds the `exclude_detection_period_from_training` parameter to the `test_dimension_anomalies` macro and includes a parametrized integration test demonstrating its functionality. The parameter allows users to exclude recent detection-period data from the training baseline, which helps detect anomalies that would otherwise be absorbed into the baseline.

Changes:

- Added the `exclude_detection_period_from_training` parameter to the `test_dimension_anomalies` macro signature (defaults to `false`)
- Passed the parameter through to `get_anomalies_test_configuration`
- Added the `test_anomaly_in_detection_period` integration test using `@pytest.mark.parametrize` to test both flag states

Test Behavior:
The integration test creates a scenario with:

- 30 days of normal data with variance (a 45/50/55 pattern for Superman)
- 7 days of anomalous data (a 72/28 Superman/Spiderman distribution) in the detection period

Expected outcomes:

- `exclude_detection=False` → test PASSES (anomaly absorbed into the training baseline)
- `exclude_detection=True` → test FAILS (anomaly detected because it is excluded from training)
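The two cases can be sketched as follows. The real test uses `@pytest.mark.parametrize` with ids `exclude_false`/`exclude_true`; here the cases are a plain list, and `run_anomaly_test` is a placeholder that only models the flag's expected semantics rather than invoking dbt:

```python
# Simplified stand-in for the parametrized integration test.
CASES = [
    ("exclude_false", False, "pass"),  # anomaly absorbed into training baseline
    ("exclude_true", True, "fail"),    # anomaly detected against clean baseline
]

def run_anomaly_test(exclude_detection: bool) -> str:
    """Placeholder for the dbt invocation; models the flag's expected outcome."""
    return "fail" if exclude_detection else "pass"

for case_id, exclude_detection, expected_status in CASES:
    assert run_anomaly_test(exclude_detection) == expected_status, case_id
```

The inverted expectation (the dbt test "failing" is the desired result when the flag is on) is the same convention used by the volume anomaly tests.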
This is a YELLOW-risk PR: the core logic was implemented in a previous session, but the integration test needs verification:

- The test expects `exclude_detection=False` to PASS (anomaly missed) and `exclude_detection=True` to FAIL (anomaly detected). This mirrors the volume anomaly test behavior but may seem counterintuitive; verify this is the intended semantic
- Verify that `get_anomalies_test_configuration` and `get_anomaly_scores_query` actually respect the `exclude_detection_period_from_training` flag (this was implemented in a previous session but not verified in this PR)

Test Plan

- Run `pytest integration_tests/tests/test_dimension_anomalies.py::test_anomaly_in_detection_period -v`
- Covers both parametrized cases: `include_detection_in_training` and `exclude_detection_from_training`