
Conversation

devin-ai-integration bot (Contributor) commented Nov 13, 2025

Add exclude_detection_period_from_training flag to column anomalies tests

Summary

This PR adds the exclude_detection_period_from_training parameter to column-level anomaly detection tests (column_anomalies and all_columns_anomalies), bringing them to feature parity with table anomaly tests. The flag controls whether detection period data is included in the training baseline when calculating anomaly scores.

Changes:

  • Added exclude_detection_period_from_training=false parameter to test_column_anomalies.sql macro
  • Added exclude_detection_period_from_training=false parameter to test_all_columns_anomalies.sql macro
  • Added integration test test_column_anomalies_exclude_detection_period_from_training to verify flag behavior

Behavior:

  • When False (default): Detection period data is included in the training baseline, maintaining backward compatibility
  • When True: Detection period data is excluded from the training baseline, making anomaly detection more sensitive to recent changes (see the usage sketch below)
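
For orientation, here is a minimal sketch of how the flag sits alongside the other test parameters, modeled on the integration-test argument dictionaries shown later in this PR; every value other than the new flag is an illustrative placeholder, not a prescribed default.

    # Illustrative only: test arguments in the style of this PR's integration test.
    # "updated_at" and the surrounding values are placeholders, not package defaults.
    column_anomalies_test_args = {
        "timestamp_column": "updated_at",
        "column_anomalies": ["null_count"],
        "training_period": {"period": "day", "count": 30},
        "detection_period": {"period": "day", "count": 7},
        "anomaly_sensitivity": 5,
        "anomaly_direction": "spike",
        # New flag added by this PR; the default (False) preserves existing behavior.
        "exclude_detection_period_from_training": True,
    }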

Review & Testing Checklist for Human

  • Verify the integration test logic is correct: The test assumes that, with constrained 1-day windows and exclude_detection_period_from_training=False, including detection data in training will prevent anomaly detection. Validate this assumption against the actual anomaly detection SQL logic in get_anomaly_scores_query and related macros; a small numeric sketch of the effect appears after this checklist.
  • Check CI test results across all database platforms: The integration test uses specific parameters (1-day training/detection periods, min_training_set_size=1) that may behave differently across databases. Verify the test passes consistently on Postgres, Snowflake, BigQuery, Redshift, Databricks, etc.
  • Validate backward compatibility: Confirm that existing column anomaly tests continue to work as expected with the default exclude_detection_period_from_training=false value.
  • Test the flag manually: Create a simple dbt project with column anomaly tests using both exclude_detection_period_from_training=true and false to verify the behavior matches expectations in a real-world scenario.
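
To make the assumption in the first checklist item concrete, here is a small, self-contained numeric sketch. It is not the package's actual scoring SQL; it only illustrates, with a simple z-score, how folding a spike into its own training baseline inflates the mean and standard deviation enough to mask the anomaly, while excluding it leaves the spike clearly detectable.

    import statistics

    def z_score(value: float, baseline: list) -> float:
        """Standard score of `value` against a baseline sample."""
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        return 0.0 if stdev == 0 else (value - mean) / stdev

    training = [10, 9, 11, 10, 8, 12, 10]  # illustrative normal daily null counts
    detection_value = 20                    # spiked null count on the detection day
    sensitivity = 3                         # illustrative anomaly_sensitivity threshold

    # exclude_detection_period_from_training=False: the spike is part of its own baseline.
    z_included = z_score(detection_value, training + [detection_value])  # ~2.5 -> no anomaly
    # exclude_detection_period_from_training=True: the baseline is training data only.
    z_excluded = z_score(detection_value, training)                      # ~8.4 -> anomaly

    print(f"included: z={z_included:.2f}, anomaly={z_included > sensitivity}")
    print(f"excluded: z={z_excluded:.2f}, anomaly={z_excluded > sensitivity}")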

Notes

Requested by: Yosef Arbiv (yosef@elementary-data.com), @arbiv
Devin session: https://app.devin.ai/sessions/9129463181774101aa708c1e876174e6

Summary by CodeRabbit

  • Enhancements

    • Added an exclude_detection_period_from_training option for anomaly detection tests (default: false) to control whether the detection window is excluded from model training.
  • Tests

    • Added test coverage validating behavior when the option is true vs false, confirming that excluding the detection period changes anomaly detection outcomes as expected.

…ests

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
linear bot commented Nov 13, 2025

@devin-ai-integration (Contributor, Author)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions (Contributor)

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented Nov 13, 2025

Walkthrough

Added a boolean parameter exclude_detection_period_from_training (default false) to all_columns_anomalies and column_anomalies macros and propagated it into get_anomalies_test_configuration. Added an integration test that verifies behavior when the flag is false (PASS) and true (FAIL).

Changes

  • Anomaly test macro signatures & propagation (macros/edr/tests/test_all_columns_anomalies.sql, macros/edr/tests/test_column_anomalies.sql): Added the parameter exclude_detection_period_from_training=false to the macro signatures and passed exclude_detection_period_from_training=exclude_detection_period_from_training into the get_anomalies_test_configuration calls.
  • Integration test (integration_tests/tests/test_column_anomalies.py): Added test_col_anom_excl_detect_train, which generates training (30 days) and detection (7 days with anomalies) data and asserts the outcomes for exclude_detection_period_from_training=False (expected PASS) and True (expected FAIL); the test is skipped for ClickHouse targets.

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Macro caller
  participant Macro as column/all_columns_anomalies
  participant Config as get_anomalies_test_configuration
  participant Runner as Test runner

  rect rgb(245,250,255)
  Note over Macro: new param\nexclude_detection_period_from_training (default: false)
  end

  Caller->>Macro: invoke(..., exclude_detection_period_from_training=val)
  Macro->>Config: get_anomalies_test_configuration(..., exclude_detection_period_from_training=val)
  Config-->>Macro: test_configuration
  Macro->>Runner: execute test using test_configuration
  Runner-->>Caller: test result (PASS / FAIL)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to focus on:
    • Macro signatures in macros/edr/tests/test_all_columns_anomalies.sql and macros/edr/tests/test_column_anomalies.sql for correct default and naming.
    • All invocations of get_anomalies_test_configuration to ensure the new param is propagated consistently.
    • Integration test integration_tests/tests/test_column_anomalies.py: data generation windows, expected assertions, and correct use of skip_targets(["clickhouse"]).

Possibly related PRs

Suggested reviewers

  • GuyEshdat

Poem

🐇 I hopped a flag into macros neat and small,
If false we learn, if true the anomaly will call,
Tests spin up data, thirty days then seven,
Two outcomes bloom beneath the testing heaven,
A carrot cheer — the rabbit hops, that's all! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The PR title clearly and concisely describes the main change: adding the exclude_detection_period_from_training flag to column anomalies tests, which matches the primary objective.
  • Linked Issues Check (✅ Passed): The PR fully implements the CORE-92 objectives: it adds the exclude_detection_period_from_training parameter to the column anomaly macros, forwards it to the configuration logic, maintains backward compatibility (default false), and includes an integration test validating the flag's behavior.
  • Out of Scope Changes Check (✅ Passed): All changes are directly in scope: macro signature updates to test_column_anomalies.sql and test_all_columns_anomalies.sql, parameter forwarding to get_anomalies_test_configuration, and a focused integration test validating the new feature.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8a70f2a and 7fdd2d7.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.

Applied to files:

  • integration_tests/tests/test_column_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_column_anomalies.py (1)
integration_tests/tests/dbt_project.py (4)
  • DbtProject (46-334)
  • test (110-128)
  • test (131-149)
  • test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
🔇 Additional comments (3)
integration_tests/tests/test_column_anomalies.py (3)

493-531: LGTM! Well-structured test data generation.

The data generation creates a clear distinction between normal (8-12 nulls/day over 30 days) and anomalous (20 nulls/day over 7 days) periods with no temporal overlap. The ~100% spike in null count should provide reliable detection behavior when the flag is toggled.
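
For illustration only, a self-contained sketch of the data shape described above (30 training days with 8-12 nulls per day, then 7 detection days with 20 nulls per day); the timestamp column name and per-day row counts are placeholders, while the "superhero" column mirrors the test data shown elsewhere in this review.

    import random
    from datetime import datetime, timedelta
    from typing import Any, Dict, List

    TIMESTAMP_COLUMN = "updated_at"  # placeholder; the real test uses its own constant

    def generate_null_count_data(run_date: datetime) -> List[Dict[str, Any]]:
        rows: List[Dict[str, Any]] = []
        # 30 training days: mostly non-null values plus 8-12 nulls per day.
        for day in range(37, 7, -1):
            bucket = run_date - timedelta(days=day)
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": None}] * random.randint(8, 12)
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": "Superman"}] * 50
        # 7 detection days: a clear spike of 20 nulls per day.
        for day in range(7, 0, -1):
            bucket = run_date - timedelta(days=day)
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": None}] * 20
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": "Superman"}] * 50
        return rows

    data = generate_null_count_data(datetime.utcnow())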


533-574: Verify cross-database consistency of detection behavior.

The test logic is sound: when exclude_detection_period_from_training=False, the 7-day anomalous period is included in the 30-day training baseline, potentially masking the anomaly. When True, the anomalous period is excluded, allowing detection. The use of force_metrics_backfill=True for both runs correctly prevents incremental metric caching issues.

However, the expected behavior depends on the anomaly detection algorithm's sensitivity to baseline composition. Ensure CI validates that:

  1. The 100% null-count spike with anomaly_sensitivity=5 reliably avoids detection when included in training (False → PASS).
  2. The same spike reliably triggers detection when excluded (True → FAIL).

Based on learnings and PR objectives.


555-580: LGTM! Clear assertions with descriptive messages.

The assertions correctly validate the expected behavior for both flag states, with helpful error messages that explain the reasoning. The expected outcomes align with the PR objectives and the test scenario design.


Comment @coderabbitai help to get the list of available commands and usage tips.

devin-ai-integration bot and others added 2 commits November 13, 2025 10:07
….sql

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Test demonstrates that:
- When flag=False: detection period data is included in training baseline, preventing anomaly detection
- When flag=True: detection period data is excluded from training baseline, enabling anomaly detection

The test uses constrained time windows (1 day training/detection) to make the behavior deterministic.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
integration_tests/tests/test_column_anomalies.py (2)

503-537: Consider extracting common test arguments to reduce duplication.

The test_args_false and test_args_true dictionaries are identical except for the exclude_detection_period_from_training parameter. This duplication could be reduced.

Apply this diff to extract common arguments:

+    base_test_args = {
+        "timestamp_column": TIMESTAMP_COLUMN,
+        "column_anomalies": ["null_count"],
+        "time_bucket": {"period": "day", "count": 1},
+        "training_period": {"period": "day", "count": 1},
+        "detection_period": {"period": "day", "count": 1},
+        "min_training_set_size": 1,
+        "anomaly_sensitivity": 3,
+        "anomaly_direction": "spike",
+    }
+
-    test_args_false = {
-        "timestamp_column": TIMESTAMP_COLUMN,
-        "column_anomalies": ["null_count"],
-        "time_bucket": {"period": "day", "count": 1},
-        "training_period": {"period": "day", "count": 1},
-        "detection_period": {"period": "day", "count": 1},
-        "min_training_set_size": 1,
-        "anomaly_sensitivity": 3,
-        "anomaly_direction": "spike",
-        "exclude_detection_period_from_training": False,
-    }
+    test_args_false = {
+        **base_test_args,
+        "exclude_detection_period_from_training": False,
+    }
     test_result_false = dbt_project.test(
         test_id,
         DBT_TEST_NAME,
         test_args_false,
         data=data,
         test_column="superhero",
         test_vars={"force_metrics_backfill": True},
     )
     assert test_result_false["status"] == "pass", (
         "Expected PASS when exclude_detection_period_from_training=False "
         "(detection data included in training baseline)"
     )

-    test_args_true = {
-        "timestamp_column": TIMESTAMP_COLUMN,
-        "column_anomalies": ["null_count"],
-        "time_bucket": {"period": "day", "count": 1},
-        "training_period": {"period": "day", "count": 1},
-        "detection_period": {"period": "day", "count": 1},
-        "min_training_set_size": 1,
-        "anomaly_sensitivity": 3,
-        "anomaly_direction": "spike",
-        "exclude_detection_period_from_training": True,
-    }
+    test_args_true = {
+        **base_test_args,
+        "exclude_detection_period_from_training": True,
+    }

486-501: Consider adding a comment explaining the test scenario.

The test uses training_period: 1 day but generates 30 days of training data. While this is correct (only the most recent day of training data is used based on the training_period parameter), a brief comment would clarify the test setup for future maintainers.

Consider adding a comment like:

    utc_today = datetime.utcnow().date()
    test_date, *training_dates = generate_dates(base_date=utc_today - timedelta(1))

    # Generate 30 days of training data (Superman/Batman, no nulls)
    # With training_period=1 day, only the most recent training day will be used as baseline
    data: List[Dict[str, Any]] = [
        ...
    ]

    # Add anomalous test data: 10 nulls on the detection day
    data += [
        ...
    ]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8c2588a and 6b60a66.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.

Applied to files:

  • integration_tests/tests/test_column_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_column_anomalies.py (3)
integration_tests/tests/conftest.py (2)
  • test_id (180-183)
  • dbt_project (144-147)
integration_tests/tests/dbt_project.py (4)
  • DbtProject (46-334)
  • test (110-128)
  • test (131-149)
  • test (151-257)
integration_tests/tests/data_generator.py (1)
  • generate_dates (6-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, dremio) / test
🔇 Additional comments (1)
integration_tests/tests/test_column_anomalies.py (1)

481-549: LGTM! Test correctly validates the exclude_detection_period_from_training feature.

The test logic is sound and follows the pattern established by other tests in this file:

  • When exclude_detection_period_from_training=False, the detection period data (10 nulls) is included in the training baseline, making the nulls appear normal → PASS
  • When exclude_detection_period_from_training=True, the detection period is excluded from the baseline (only 2 non-null rows from the training period), making the 10 nulls anomalous → FAIL

The use of force_metrics_backfill=True ensures proper test isolation when calling dbt_project.test twice with the same test_id.

devin-ai-integration bot and others added 2 commits November 19, 2025 10:49
… more substantial dataset

- Use 30 days of normal data with low null count (0-2 nulls/day) instead of 1 day
- Use 7 days of anomalous data with high null count (20 nulls/day) instead of 1 day
- Update training period to 30 days and detection period to 7 days
- Add more data per day to create clearer anomaly signal
- Use separate test IDs for the two test runs to avoid conflicts
- Pattern matches successful volume and freshness anomalies tests

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…data

- Rename test function to fix Postgres 63-char table name limit
- Update test data: 8-12 nulls/day normal, 20 nulls/day anomalous
- Increase sensitivity to 5 to match volume anomalies pattern
- Shorten test ID suffixes to _f/_t
- Test now passes locally with postgres

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@arbiv arbiv merged commit cccc439 into master Nov 20, 2025
32 checks passed
@arbiv arbiv deleted the devin/CORE-92-1763027970 branch November 20, 2025 10:46
