Add exclude_detection_period_from_training flag to column/dimension anomaly tests #891
Conversation
…on anomaly tests Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
🤖 Devin AI Engineer: I'll be helping with this pull request! Here's what you should know:
Note: I can only respond to comments from users who have write access to this repository.
Walkthrough: A new integration test function, `test_anomaly_in_detection_period`, validates the `exclude_detection_period_from_training` flag for column anomaly tests.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
👋 @devin-ai-integration[bot]
…omaly tests Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (3)
integration_tests/tests/test_all_columns_anomalies.py (3)
204-212: Misleading comment and empty loop. Line 210 uses `range(0)`, which produces zero iterations, so no non-null superhero values are added to anomalous days. The comment "No non-null values to keep total similar" is misleading because:
- Normal days have 10 total rows (2 nulls + 8 non-nulls)
- Anomalous days have 10 total rows (10 nulls + 0 non-nulls)
While row counts are equal, the comment doesn't clarify this intent.
Consider clarifying the comment:
```diff
 anomalous_data.extend(
     [
         {
             TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT),
             "superhero": "Superman" if i % 2 == 0 else "Batman",
         }
-        for _ in range(0)  # No non-null values to keep total similar
+        for _ in range(0)  # No non-null values in anomalous period (all rows are null)
     ]
 )
```
248-273: Improve assertion robustness (same issue as Test 1). The assertion logic has the same potential issue as the first test: if `superhero_result` is None, the error message won't clearly indicate that the result is missing. Apply the same improvement as suggested for Test 1:
```diff
-    assert (
-        superhero_result and superhero_result["status"] == "fail"
-    ), "Test should fail when anomaly is excluded from training"
+    assert superhero_result is not None, "No test result found for 'superhero' column"
+    assert (
+        superhero_result["status"] == "fail"
+    ), f"Test should fail when anomaly is excluded from training, got: {superhero_result['status']}"
```
217-246: Improve assertion clarity. The assertion can be more explicit. Consider splitting it to clearly indicate whether the result is missing or whether the status check failed:
```diff
-    assert (
-        superhero_result and superhero_result["status"] == "pass"
-    ), "Test should pass when anomaly is included in training"
+    assert superhero_result is not None, "No test result found for 'superhero' column"
+    assert (
+        superhero_result["status"] == "pass"
+    ), f"Test should pass when anomaly is included in training, got status: {superhero_result['status']}"
```
The sensitivity parameter value is correct: in dbt Elementary, higher sensitivity values make tests less likely to detect anomalies (they expand the expected range), so the comment on line 223 accurately reflects the intent to absorb the anomaly during training.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
integration_tests/tests/test_all_columns_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
🧬 Code graph analysis (1)
integration_tests/tests/test_all_columns_anomalies.py (2)
integration_tests/tests/conftest.py (2): test_id (180-183), dbt_project (144-147)
integration_tests/tests/dbt_project.py (4): DbtProject (46-334), test (110-128), test (131-149), test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (fusion, bigquery) / test
🔇 Additional comments (2)
integration_tests/tests/test_all_columns_anomalies.py (2)
158-171: LGTM! Clear test documentation. The test function is well-structured with a clear docstring explaining the scenario and expected behavior for both flag states.
172-214: Clarify misleading comment on line 210. The comment "No non-null values to keep total similar" on line 210 is unclear. The `range(0)` doesn't "keep" anything; it simply produces zero iterations. This should be clarified to explain that the anomalous period intentionally contains only nulls (10 per day) with no non-null superhero values, in contrast to the normal period (2 nulls + 8 non-nulls).
Suggested revision: "range(0) to omit non-null superhero values in anomalous period"
The date range (data from 1 day ago, not today) is consistent with other tests in this file and reflects an intentional design pattern for test data generation.
… suffixes and adjust test data for proper anomaly detection Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
integration_tests/tests/test_all_columns_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
🧬 Code graph analysis (1)
integration_tests/tests/test_all_columns_anomalies.py (2)
integration_tests/tests/conftest.py (2): test_id (180-183), dbt_project (144-147)
integration_tests/tests/dbt_project.py (4): DbtProject (46-334), test (110-128), test (131-149), test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (fusion, redshift) / test
🔇 Additional comments (3)
integration_tests/tests/test_all_columns_anomalies.py (3)
158-171: LGTM! Clear test structure and documentation. The test function follows the established pattern in this file and provides a clear docstring explaining the expected behavior for both scenarios (with and without the flag).
229-275: LGTM! Test scenarios correctly validate the flag behavior. The two test scenarios properly validate the `exclude_detection_period_from_training` flag:
- Scenario 1: Without the flag, the detection period is included in training, shifting the baseline and causing the test to pass (anomaly absorbed).
- Scenario 2: With the flag enabled, the detection period is excluded from training, making the anomaly detectable and causing the test to fail.
The use of distinct test IDs (`test_id + "_f"` and `test_id + "_t"`) prevents conflicts, and the assertion messages are clear and helpful for debugging.
219-227: I verified the sensitivity parameter behavior against the documentation to ensure accuracy. Based on my verification:
- `backfill_days`: The test is correct. While `backfill_days` is mandatory in validation, it is automatically derived from `detection_period` via the `detection_period_to_backfill_days()` function before validation occurs. The test provides `detection_period: {"period": "day", "count": 7}`, which converts to `backfill_days: 7`, satisfying the mandatory requirement.
- Sensitivity comment: The comment on line 225 is definitively incorrect. According to Elementary's documentation, higher sensitivity values actually reduce the number of anomalies (allowing a wider acceptable range), not "allow anomaly to be absorbed." The relationship is: smaller sensitivity → more anomalies flagged; larger sensitivity → fewer anomalies flagged. This is the opposite of what the comment states.
Fix the misleading sensitivity comment on line 225.
The comment states "Higher sensitivity to allow anomaly to be absorbed," but this is backwards. In Elementary, higher sensitivity values reduce anomaly detection (wider acceptable range), while lower values increase it. Either remove the comment or correct it to reflect that the sensitivity value of 5 is being set for testing purposes.
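For context, a minimal sketch of how such a test configuration might be expressed, assuming the parameter names used in this thread (the exact keys accepted by the test helper are an assumption here):
```python
# Hypothetical test arguments, assuming the options discussed in this thread;
# key names and values are illustrative, not the exact helper signature.
test_args = {
    "training_period": {"period": "day", "count": 30},
    "detection_period": {"period": "day", "count": 7},  # derived into backfill_days: 7 before validation
    "sensitivity": 5,  # higher value widens the accepted range, so fewer anomalies are flagged
    "exclude_detection_period_from_training": True,
}
```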
```python
    - Without exclusion: anomaly gets included in training baseline, test passes (misses anomaly)
    - With exclusion: anomaly excluded from training, test fails (detects anomaly)
    """
    utc_now = datetime.utcnow()
```
Fix datetime handling for consistency and reliability.
This test uses datetime.utcnow() while existing tests in this file use datetime.utcnow().date() (see line 19). This inconsistency could cause time-of-day variance issues in date calculations and make the test behavior unpredictable.
Additionally, datetime.utcnow() is deprecated in Python 3.12+.
Apply this diff:
```diff
- utc_now = datetime.utcnow()
+ utc_today = datetime.utcnow().date()
```
Then update the line 178 and 199 references:
```diff
- date = utc_now - timedelta(days=37 - i)
+ date = utc_today - timedelta(days=37 - i)
- date = utc_now - timedelta(days=7 - i)
+ date = utc_today - timedelta(days=7 - i)
```
🤖 Prompt for AI Agents
In integration_tests/tests/test_all_columns_anomalies.py around line 172,
replace the use of datetime.utcnow() with datetime.now(timezone.utc).date() to
match the file's existing pattern (uses datetime.utcnow().date() at line 19) and
avoid the deprecated datetime.utcnow() in Python 3.12+; also update the
subsequent references at lines 178 and 199 to operate on the date value (or to
use datetime.now(timezone.utc) if a timezone-aware datetime is required) so all
comparisons and arithmetic use the same date type and timezone-aware semantics
for consistent, predictable tests.
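A minimal sketch of the timezone-aware replacement described in this prompt; the variable names mirror the test code under review and are assumptions:
```python
from datetime import datetime, timedelta, timezone

# Timezone-aware "today" in UTC, avoiding the deprecated datetime.utcnow()
utc_today = datetime.now(timezone.utc).date()

# Date arithmetic then stays on date objects, matching the file's existing pattern
oldest_training_day = utc_today - timedelta(days=37)
newest_detection_day = utc_today - timedelta(days=1)
```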
…tead Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…est with parametrization Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_all_columns_anomalies.py (1)
1-1: Timezone import and usage look good; consider optional consistency cleanup. The new `timezone` import to support `datetime.now(timezone.utc)` is appropriate for the new test. Optionally, you could later migrate the existing `datetime.utcnow().date()` usages in this file to the same pattern for consistency, but that's not required for this PR.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
integration_tests/tests/test_all_columns_anomalies.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
🧬 Code graph analysis (1)
integration_tests/tests/test_all_columns_anomalies.py (2)
integration_tests/tests/conftest.py (2): test_id (180-183), dbt_project (144-147)
integration_tests/tests/dbt_project.py (4): DbtProject (46-334), test (110-128), test (131-149), test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (latest_official, bigquery) / test
- GitHub Check: test (fusion, snowflake) / test
🔇 Additional comments (1)
integration_tests/tests/test_all_columns_anomalies.py (1)
156-256: Detection-period exclusion test scenario is well-structured and exercises both flag paths. The parametrized `test_anomaly_in_detection_period` looks solid:
- Training vs detection windows are correctly separated: 30 normal days (`utc_now - 37` through `utc_now - 8`) and 7 anomalous days (`utc_now - 7` through `utc_now - 1`), matching the `training_period`/`detection_period` of 30/7 with 1-day buckets.
- The null-count setup (mean 10 in training vs 20 in detection) gives a clear contrast, so the `expected_status` flip between `(False, "pass")` and `(True, "fail")` directly validates `exclude_detection_period_from_training`.
- Focusing the assertion on the `superhero` column via `next(...)` is consistent with the other tests and avoids depending on unrelated columns.
No changes needed here from my side.
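To make the windows concrete, here is a rough sketch of how data matching the described setup could be generated; the constants (`TIMESTAMP_COLUMN`, `DATE_FORMAT`) and the exact null-count cycle are assumptions based on this conversation, not the merged test code:
```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_COLUMN = "updated_at"  # assumed column name
DATE_FORMAT = "%Y-%m-%d"         # assumed format

utc_today = datetime.now(timezone.utc).date()
data = []

# Training window: 30 normal days (37..8 days ago), null counts cycling 8, 10, 12 (mean 10)
for i in range(30):
    day = utc_today - timedelta(days=37 - i)
    for _ in range([8, 10, 12][i % 3]):
        data.append({TIMESTAMP_COLUMN: day.strftime(DATE_FORMAT), "superhero": None})

# Detection window: 7 anomalous days (7..1 days ago), 20 nulls per day (~100% above the training mean)
for i in range(7):
    day = utc_today - timedelta(days=7 - i)
    for _ in range(20):
        data.append({TIMESTAMP_COLUMN: day.strftime(DATE_FORMAT), "superhero": None})
```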
Add integration test for exclude_detection_period_from_training flag
Summary
Adds a parametrized integration test `test_anomaly_in_detection_period` to validate the `exclude_detection_period_from_training` flag for column anomaly tests. The test verifies that:
- `exclude_detection=False` (default): anomalous data in the detection period is included in training, causing the test to pass (anomaly is absorbed into baseline)
- `exclude_detection=True`: anomalous data in the detection period is excluded from training, causing the test to fail (anomaly is detected)
The test uses a 30-day training period with variance in null counts (8, 10, 12 nulls per day pattern) and a 7-day detection period with anomalous data (20 nulls per day, representing a 100% increase from the mean). A sketch of the parametrization appears below.
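As referenced above, a sketch of the parametrization; the fixture names (`test_id`, `dbt_project`) and the test shell are assumptions based on this conversation, with the body elided:
```python
import pytest

@pytest.mark.parametrize(
    "exclude_detection, expected_status",
    [(False, "pass"), (True, "fail")],
    ids=["without_exclusion", "with_exclusion"],
)
def test_anomaly_in_detection_period(test_id, dbt_project, exclude_detection, expected_status):
    # Seed 30 normal training days and 7 anomalous detection days, run the column
    # anomaly test with exclude_detection_period_from_training=exclude_detection,
    # and assert that the 'superhero' result status equals expected_status.
    ...
```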
Note: This PR originally included macro changes to add the flag to `test_column_anomalies.sql`, `test_all_columns_anomalies.sql`, and `test_dimension_anomalies.sql`, but those were already merged in PR #889 and PR #890. After merging master and addressing reviewer feedback, this PR now only contains the integration test.
Review & Testing Checklist for Human
- `exclude_detection=True` should cause test status "fail" (detects anomaly); `exclude_detection=False` should cause test status "pass" (misses anomaly)
- `all_columns_anomalies` has this integration test. Should `column_anomalies` and `dimension_anomalies` also have dedicated tests for this flag?
Notes
- Replaced `datetime.utcnow()` with `datetime.now(timezone.utc)`
- Parametrize IDs `["without_exclusion", "with_exclusion"]` for clear test output
Summary by CodeRabbit