
Conversation

Contributor

@devin-ai-integration devin-ai-integration bot commented Nov 13, 2025

Add integration test for exclude_detection_period_from_training flag

Summary

Adds a parametrized integration test test_anomaly_in_detection_period to validate the exclude_detection_period_from_training flag for column anomaly tests. The test verifies that:

  • When exclude_detection=False (default): anomalous data in the detection period is included in training, causing the test to pass (anomaly is absorbed into baseline)
  • When exclude_detection=True: anomalous data in the detection period is excluded from training, causing the test to fail (anomaly is detected)

The test uses a 30-day training period with varying null counts (a repeating 8, 10, 12 nulls-per-day pattern, mean of 10) and a 7-day detection period with anomalous data (20 nulls per day, a 100% increase over the training mean).
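
As an illustration, here is a minimal sketch of how the training and detection rows could be generated (TIMESTAMP_COLUMN, DATE_FORMAT, and the row shape are assumptions modeled on the test file's conventions; the actual test may mix null and non-null rows differently):

```python
from datetime import datetime, timedelta, timezone

DATE_FORMAT = "%Y-%m-%d"         # assumed to match the module-level constant
TIMESTAMP_COLUMN = "updated_at"  # hypothetical column name

utc_now = datetime.now(timezone.utc)

# 30 training days (utc_now-37 through utc_now-8) with a repeating
# 8/10/12 null-count pattern, giving a training mean of 10 nulls per day.
training_data = []
for i in range(30):
    day = (utc_now - timedelta(days=37 - i)).strftime(DATE_FORMAT)
    for _ in range([8, 10, 12][i % 3]):
        training_data.append({TIMESTAMP_COLUMN: day, "superhero": None})

# 7 detection days (utc_now-7 through utc_now-1) with 20 nulls per day,
# roughly a 100% increase over the training mean.
anomalous_data = [
    {TIMESTAMP_COLUMN: (utc_now - timedelta(days=7 - i)).strftime(DATE_FORMAT),
     "superhero": None}
    for i in range(7)
    for _ in range(20)
]
```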

Note: This PR originally included macro changes to add the flag to test_column_anomalies.sql, test_all_columns_anomalies.sql, and test_dimension_anomalies.sql, but those were already merged in PR #889 and PR #890. After merging master and addressing reviewer feedback, this PR now only contains the integration test.

Review & Testing Checklist for Human

  • Verify CI passes across all warehouse types - The test data patterns (8,10,12 vs 20 nulls with sensitivity=5) should be robust, but confirm no flakiness across warehouses
  • Validate test semantics - Confirm the parametrization logic is correct (see the sketch after this list): exclude_detection=True should cause test status "fail" (the anomaly is detected), and exclude_detection=False should cause test status "pass" (the anomaly is missed)
  • Check merge conflict resolution - I resolved conflicts between my parametrization refactoring and upstream test data improvements. Verify the final test logic is correct
  • Consider test coverage - Only all_columns_anomalies has this integration test. Should column_anomalies and dimension_anomalies also have dedicated tests for this flag?
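
A minimal sketch of the parametrization being validated (the test_id and dbt_project fixtures come from conftest.py; the skip_targets marker name is an assumption based on the repo's integration-test conventions):

```python
import pytest

@pytest.mark.skip_targets(["clickhouse"])  # assumed marker; the test skips ClickHouse
@pytest.mark.parametrize(
    "exclude_detection, expected_status",
    [
        (False, "pass"),  # anomaly absorbed into the training baseline
        (True, "fail"),   # detection period excluded, anomaly detected
    ],
)
def test_anomaly_in_detection_period(
    test_id, dbt_project, exclude_detection, expected_status
):
    ...  # build the data, run the dbt test, assert on the "superhero" column status
```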


Summary by CodeRabbit

  • Tests
    • Added integration test to validate anomaly detection with optional exclusion of detection period from training. The test constructs 30 days of normal data and 7 days of anomalies to verify detection accuracy under different configuration scenarios.


…on anomaly tests

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


coderabbitai bot commented Nov 13, 2025

Walkthrough

A new integration test function test_anomaly_in_detection_period is added to validate the exclude_detection_period_from_training flag behavior in the all-columns anomalies test, verifying anomaly detection with and without excluding the detection period from training data. The test imports timezone support and is parameterized to run with different configurations, skipping ClickHouse.

Changes

  • All Column Anomalies Integration Test (integration_tests/tests/test_all_columns_anomalies.py): Added a timezone import to the datetime module. Added the parameterized integration test function test_anomaly_in_detection_period, which constructs 30 days of normal data followed by 7 days of anomalous data, runs with and without the exclude_detection_period_from_training flag, and validates the detection status of the "superhero" column against expected outcomes. The test is skipped for ClickHouse.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify test parameterization logic and expected outcomes are correct
  • Confirm test data construction (30 days normal + 7 days anomalous) aligns with intended behavior
  • Check that ClickHouse skip condition is appropriate


Suggested reviewers

  • GuyEshdat

Poem

🐰 A test hops forth with data new,
Anomalies revealed, oh what a view!
Exclude the detection, see what's true—
With flags and parameters, the magic shines through!
thump thump 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The title accurately describes the main change: adding the exclude_detection_period_from_training flag to anomaly tests. It is specific and highlights the primary objective of the PR.
  • Linked Issues check ✅ Passed: The PR implements all coding requirements from CORE-93: it adds the exclude_detection_period_from_training flag to anomaly test macros and includes an integration test (test_anomaly_in_detection_period) demonstrating the flag's behavior with exclusion enabled and disabled.
  • Out of Scope Changes check ✅ Passed: All changes are within scope: the new test function is directly related to the CORE-93 requirements, and the timezone fix (datetime.utcnow() to datetime.now(timezone.utc)) is a necessary correction for Python 3.12+ compatibility mentioned in the PR objectives.
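
The timezone fix called out above is the standard replacement for the deprecated call; a minimal illustration:

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # deprecated since Python 3.12; returns a naive datetime
aware = datetime.now(timezone.utc)  # timezone-aware replacement used in the test
```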

@github-actions
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

…omaly tests

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
integration_tests/tests/test_all_columns_anomalies.py (3)

204-212: Misleading comment and empty loop.

Line 210 uses range(0) which produces zero iterations, so no non-null superhero values are added to anomalous days. The comment "No non-null values to keep total similar" is misleading because:

  • Normal days have 10 total rows (2 nulls + 8 non-nulls)
  • Anomalous days have 10 total rows (10 nulls + 0 non-nulls)

While row counts are equal, the comment doesn't clarify this intent.

Consider clarifying the comment:

             anomalous_data.extend(
                 [
                     {
                         TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT),
                         "superhero": "Superman" if i % 2 == 0 else "Batman",
                     }
-                    for _ in range(0)  # No non-null values to keep total similar
+                    for _ in range(0)  # No non-null values in anomalous period (all rows are null)
                 ]
             )

248-273: Improve assertion robustness (same issue as Test 1).

The assertion logic has the same potential issue as the first test - if superhero_result is None, the error message won't clearly indicate the result is missing.

Apply the same improvement as suggested for Test 1:

-    assert (
-        superhero_result and superhero_result["status"] == "fail"
-    ), "Test should fail when anomaly is excluded from training"
+    assert superhero_result is not None, "No test result found for 'superhero' column"
+    assert (
+        superhero_result["status"] == "fail"
+    ), f"Test should fail when anomaly is excluded from training, got: {superhero_result['status']}"

217-246: Improve assertion clarity.

The assertion can be more explicit. Consider splitting it to clearly indicate whether the result is missing or the status failed:

+    assert superhero_result is not None, "No test result found for 'superhero' column"
-    assert (
-        superhero_result and superhero_result["status"] == "pass"
-    ), "Test should pass when anomaly is included in training"
+    assert (
+        superhero_result["status"] == "pass"
+    ), f"Test should pass when anomaly is included in training, got status: {superhero_result['status']}"

The sensitivity parameter value is correct: in dbt Elementary, higher sensitivity values make tests less likely to detect anomalies (they expand the expected range), so the comment on line 223 accurately reflects the intent to absorb the anomaly during training.
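
A back-of-the-envelope sketch of why the flag flips the outcome, assuming a simple z-score model with population statistics (Elementary's actual anomaly scoring may differ):

```python
from statistics import fmean, pstdev

training = [8, 10, 12] * 10  # 30 training days, mean = 10
detection = [20] * 7         # 7 anomalous detection days
SENSITIVITY = 5

def z_score(value, baseline):
    return abs(value - fmean(baseline)) / pstdev(baseline)

# Flag off: detection days leak into the baseline, inflating mean and stddev.
z_included = z_score(20, training + detection)  # ≈ 1.9 < 5, test passes

# Flag on: the baseline is training data only, so the jump to 20 stands out.
z_excluded = z_score(20, training)              # ≈ 6.1 > 5, test fails

assert z_included < SENSITIVITY < z_excluded
```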

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 02779f6 and 36878be.

📒 Files selected for processing (1)
  • integration_tests/tests/test_all_columns_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
🧬 Code graph analysis (1)
integration_tests/tests/test_all_columns_anomalies.py (2)
integration_tests/tests/conftest.py (2)
  • test_id (180-183)
  • dbt_project (144-147)
integration_tests/tests/dbt_project.py (4)
  • DbtProject (46-334)
  • test (110-128)
  • test (131-149)
  • test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (fusion, bigquery) / test
🔇 Additional comments (2)
integration_tests/tests/test_all_columns_anomalies.py (2)

158-171: LGTM! Clear test documentation.

The test function is well-structured with a clear docstring explaining the scenario and expected behavior for both flag states.


172-214: Clarify misleading comment on line 210.

The comment "No non-null values to keep total similar" on line 210 is unclear. The range(0) doesn't "keep" anything—it simply produces zero iterations. This should be clarified to explain that the anomalous period intentionally contains only nulls (10 per day) with no non-null superhero values, in contrast to the normal period (2 nulls + 8 non-nulls).

Suggested revision: "range(0) to omit non-null superhero values in anomalous period"

The date range (data from 1 day ago, not today) is consistent with other tests in this file and reflects an intentional design pattern for test data generation.

… suffixes and adjust test data for proper anomaly detection

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36878be and 80380a8.

📒 Files selected for processing (1)
  • integration_tests/tests/test_all_columns_anomalies.py (1 hunks)
🔇 Additional comments (3)
integration_tests/tests/test_all_columns_anomalies.py (3)

158-171: LGTM! Clear test structure and documentation.

The test function follows the established pattern in this file and provides a clear docstring explaining the expected behavior for both scenarios (with and without the flag).


229-275: LGTM! Test scenarios correctly validate the flag behavior.

The two test scenarios properly validate the exclude_detection_period_from_training flag:

  • Scenario 1: Without the flag, the detection period is included in training, shifting the baseline and causing the test to pass (anomaly absorbed).
  • Scenario 2: With the flag enabled, the detection period is excluded from training, making the anomaly detectable and causing the test to fail.

The use of distinct test IDs (test_id + "_f" and test_id + "_t") prevents conflicts, and the assertion messages are clear and helpful for debugging.


219-227: Verified both issues against the documentation.

Based on my verification:

  1. backfill_days: The test is correct. While backfill_days is mandatory in validation, it is automatically derived from detection_period via the detection_period_to_backfill_days() function before validation occurs. The test provides detection_period: {"period": "day", "count": 7}, which converts to backfill_days: 7, satisfying the mandatory requirement.

  2. sensitivity comment: The comment on line 225 is definitively incorrect. According to Elementary's documentation, higher sensitivity values actually reduce the number of anomalies (allowing a wider acceptable range), not "allow anomaly to be absorbed." The relationship is: smaller sensitivity → more anomalies flagged; larger sensitivity → fewer anomalies flagged. This is the opposite of what the comment states.

Fix the misleading sensitivity comment on line 225.

The comment states "Higher sensitivity to allow anomaly to be absorbed," but this is backwards. In Elementary, higher sensitivity values reduce anomaly detection (wider acceptable range), while lower values increase it. Either remove the comment or correct it to reflect that the sensitivity value of 5 is being set for testing purposes.

- Without exclusion: anomaly gets included in training baseline, test passes (misses anomaly)
- With exclusion: anomaly excluded from training, test fails (detects anomaly)
"""
utc_now = datetime.utcnow()

⚠️ Potential issue | 🔴 Critical

Fix datetime handling for consistency and reliability.

This test uses datetime.utcnow() while existing tests in this file use datetime.utcnow().date() (see line 19). This inconsistency could cause time-of-day variance issues in date calculations and make the test behavior unpredictable.

Additionally, datetime.utcnow() is deprecated in Python 3.12+.

Apply this diff:

-    utc_now = datetime.utcnow()
+    utc_today = datetime.now(timezone.utc).date()

Then update line 178 and 199 references:

-        date = utc_now - timedelta(days=37 - i)
+        date = utc_today - timedelta(days=37 - i)
-        date = utc_now - timedelta(days=7 - i)
+        date = utc_today - timedelta(days=7 - i)
🤖 Prompt for AI Agents
In integration_tests/tests/test_all_columns_anomalies.py around line 172,
replace the use of datetime.utcnow() with datetime.now(timezone.utc).date() to
match the file's existing pattern (uses datetime.utcnow().date() at line 19) and
avoid the deprecated datetime.utcnow() in Python 3.12+; also update the
subsequent references at lines 178 and 199 to operate on the date value (or to
use datetime.now(timezone.utc) if a timezone-aware datetime is required) so all
comparisons and arithmetic use the same date type and timezone-aware semantics
for consistent, predictable tests.

…tead

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@devin-ai-integration devin-ai-integration bot changed the title Add exclude_detection_period_from_training flag to column and dimension anomaly tests Add exclude_detection_period_from_training flag to column/dimension anomaly tests Nov 20, 2025
devin-ai-integration bot and others added 3 commits November 20, 2025 13:13

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
integration_tests/tests/test_all_columns_anomalies.py (1)

1-1: Timezone import and usage look good; consider optional consistency cleanup

The new timezone import to support datetime.now(timezone.utc) is appropriate for the new test. Optionally, you could later migrate the existing datetime.utcnow().date() usages in this file to the same pattern for consistency, but that’s not required for this PR.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 745bf1c and 9b6d957.

📒 Files selected for processing (1)
  • integration_tests/tests/test_all_columns_anomalies.py (2 hunks)
🔇 Additional comments (1)
integration_tests/tests/test_all_columns_anomalies.py (1)

156-256: Detection-period exclusion test scenario is well-structured and exercises both flag paths

The parametrized test_anomaly_in_detection_period looks solid:

  • Training vs detection windows are correctly separated: 30 normal days (utc_now - 37 through utc_now - 8) and 7 anomalous days (utc_now - 7 through utc_now - 1), matching the training_period/detection_period of 30/7 with 1‑day buckets.
  • The null-count setup (mean 10 in training vs 20 in detection) gives a clear contrast so the expected_status flip between (False, "pass") and (True, "fail") directly validates exclude_detection_period_from_training.
  • Focusing the assertion on the superhero column via next(...) is consistent with the other tests and avoids depending on unrelated columns.

No changes needed here from my side.
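
For context, the flag-on scenario might be invoked roughly as follows (the dbt_project.test signature, argument names, and result keys are assumptions pieced together from the helper references above, reusing the data-generation sketch from the PR summary):

```python
results = dbt_project.test(
    test_id + "_t",           # distinct ID for the flag-on run
    "all_columns_anomalies",  # test name is an assumption
    dict(
        timestamp_column=TIMESTAMP_COLUMN,
        training_period={"period": "day", "count": 30},
        detection_period={"period": "day", "count": 7},
        exclude_detection_period_from_training=True,  # the flag under test
        sensitivity=5,
    ),
    data=training_data + anomalous_data,
)
# Focus the assertion on the "superhero" column, as the review notes describe.
superhero_result = next(
    (r for r in results if r.get("column_name") == "superhero"), None
)
assert superhero_result is not None, "No test result found for 'superhero' column"
assert superhero_result["status"] == "fail", "Anomaly should be detected when excluded from training"
```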

@arbiv arbiv merged commit d6c7bb3 into master Nov 28, 2025
80 of 90 checks passed
@arbiv arbiv deleted the devin/CORE-93-1763039329 branch November 28, 2025 07:51