
Conversation

devin-ai-integration bot (Contributor) commented Nov 13, 2025

Add exclude_detection_period_from_training flag to column anomalies tests

Summary

This PR adds the exclude_detection_period_from_training parameter to column-level anomaly detection tests (column_anomalies and all_columns_anomalies), bringing them to feature parity with table anomaly tests. The flag controls whether detection period data is included in the training baseline when calculating anomaly scores.

Changes:

  • Added exclude_detection_period_from_training=false parameter to test_column_anomalies.sql macro
  • Added exclude_detection_period_from_training=false parameter to test_all_columns_anomalies.sql macro
  • Added integration test test_column_anomalies_exclude_detection_period_from_training to verify flag behavior

Behavior:

  • When False (default): Detection period data is included in the training baseline, maintaining backward compatibility
  • When True: Detection period data is excluded from the training baseline, making anomaly detection more sensitive to recent changes (see the usage sketch below)
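
For orientation, here is a minimal sketch of how the flag sits alongside the other test parameters, modeled on the integration-test argument dictionaries shown later in this PR; every value other than the new flag is an illustrative placeholder, not a prescribed default.

    # Illustrative only: test arguments in the style of this PR's integration test.
    # "updated_at" and the surrounding values are placeholders, not package defaults.
    column_anomalies_test_args = {
        "timestamp_column": "updated_at",
        "column_anomalies": ["null_count"],
        "training_period": {"period": "day", "count": 30},
        "detection_period": {"period": "day", "count": 7},
        "anomaly_sensitivity": 5,
        "anomaly_direction": "spike",
        # New flag added by this PR; the default (False) preserves existing behavior.
        "exclude_detection_period_from_training": True,
    }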

Review & Testing Checklist for Human

  • Verify the integration test logic is correct: The test assumes that, with constrained 1-day windows and exclude_detection_period_from_training=False, including detection data in training will prevent anomaly detection. Validate this assumption against the actual anomaly detection SQL logic in get_anomaly_scores_query and related macros; a small numeric sketch of the effect appears after this checklist.
  • Check CI test results across all database platforms: The integration test uses specific parameters (1-day training/detection periods, min_training_set_size=1) that may behave differently across databases. Verify the test passes consistently on Postgres, Snowflake, BigQuery, Redshift, Databricks, etc.
  • Validate backward compatibility: Confirm that existing column anomaly tests continue to work as expected with the default exclude_detection_period_from_training=false value.
  • Test the flag manually: Create a simple dbt project with column anomaly tests using both exclude_detection_period_from_training=true and false to verify the behavior matches expectations in a real-world scenario.
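
To make the assumption in the first checklist item concrete, here is a small, self-contained numeric sketch. It is not the package's actual scoring SQL; it only illustrates, with a simple z-score, how folding a spike into its own training baseline inflates the mean and standard deviation enough to mask the anomaly, while excluding it leaves the spike clearly detectable.

    import statistics

    def z_score(value: float, baseline: list) -> float:
        """Standard score of `value` against a baseline sample."""
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        return 0.0 if stdev == 0 else (value - mean) / stdev

    training = [10, 9, 11, 10, 8, 12, 10]  # illustrative normal daily null counts
    detection_value = 20                    # spiked null count on the detection day
    sensitivity = 3                         # illustrative anomaly_sensitivity threshold

    # exclude_detection_period_from_training=False: the spike is part of its own baseline.
    z_included = z_score(detection_value, training + [detection_value])  # ~2.5 -> no anomaly
    # exclude_detection_period_from_training=True: the baseline is training data only.
    z_excluded = z_score(detection_value, training)                      # ~8.4 -> anomaly

    print(f"included: z={z_included:.2f}, anomaly={z_included > sensitivity}")
    print(f"excluded: z={z_excluded:.2f}, anomaly={z_excluded > sensitivity}")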

Notes

Requested by: Yosef Arbiv (yosef@elementary-data.com), @arbiv
Devin session: https://app.devin.ai/sessions/9129463181774101aa708c1e876174e6

Summary by CodeRabbit

  • Enhancements

    • Added an exclude_detection_period_from_training option for anomaly detection tests (default: false) to control whether the detection window is excluded from model training.
  • Tests

    • Added test coverage validating behavior when the option is true vs false, confirming that excluding the detection period changes anomaly detection outcomes as expected.

…ests

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
linear bot commented Nov 13, 2025

@devin-ai-integration (Contributor, Author)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions (Contributor)

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented Nov 13, 2025

Walkthrough

Added a boolean parameter exclude_detection_period_from_training (default false) to all_columns_anomalies and column_anomalies macros and propagated it into get_anomalies_test_configuration. Added an integration test that verifies behavior when the flag is false (PASS) and true (FAIL).

Changes

  • Anomaly test macro signatures & propagation (macros/edr/tests/test_all_columns_anomalies.sql, macros/edr/tests/test_column_anomalies.sql): Added the parameter exclude_detection_period_from_training=false to the macro signatures and passed exclude_detection_period_from_training=exclude_detection_period_from_training into the get_anomalies_test_configuration calls.
  • Integration test (integration_tests/tests/test_column_anomalies.py): Added test_col_anom_excl_detect_train, which generates training (30 days) and detection (7 days with anomalies) data and asserts the outcomes for exclude_detection_period_from_training=False (expected PASS) and True (expected FAIL); the test is skipped for ClickHouse targets.

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Macro caller
  participant Macro as column/all_columns_anomalies
  participant Config as get_anomalies_test_configuration
  participant Runner as Test runner

  rect rgb(245,250,255)
  Note over Macro: new param\nexclude_detection_period_from_training (default: false)
  end

  Caller->>Macro: invoke(..., exclude_detection_period_from_training=val)
  Macro->>Config: get_anomalies_test_configuration(..., exclude_detection_period_from_training=val)
  Config-->>Macro: test_configuration
  Macro->>Runner: execute test using test_configuration
  Runner-->>Caller: test result (PASS / FAIL)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to focus on:
    • Macro signatures in macros/edr/tests/test_all_columns_anomalies.sql and macros/edr/tests/test_column_anomalies.sql for correct default and naming.
    • All invocations of get_anomalies_test_configuration to ensure the new param is propagated consistently.
    • Integration test integration_tests/tests/test_column_anomalies.py: data generation windows, expected assertions, and correct use of skip_targets(["clickhouse"]).

Possibly related PRs

Suggested reviewers

  • GuyEshdat

Poem

🐇 I hopped a flag into macros neat and small,
If false we learn, if true the anomaly will call,
Tests spin up data, thirty days then seven,
Two outcomes bloom beneath the testing heaven,
A carrot cheer — the rabbit hops, that's all! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The PR title clearly and concisely describes the main change: adding the exclude_detection_period_from_training flag to column anomalies tests, which matches the primary objective.
  • Linked Issues Check (✅ Passed): The PR fully implements the CORE-92 objectives: it adds the exclude_detection_period_from_training parameter to the column anomaly macros, forwards it to the configuration logic, maintains backward compatibility (default false), and includes an integration test validating the flag's behavior.
  • Out of Scope Changes Check (✅ Passed): All changes are directly in scope: macro signature updates to test_column_anomalies.sql and test_all_columns_anomalies.sql, parameter forwarding to get_anomalies_test_configuration, and a focused integration test validating the new feature.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8a70f2a and 7fdd2d7.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.

Applied to files:

  • integration_tests/tests/test_column_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_column_anomalies.py (1)
integration_tests/tests/dbt_project.py (4)
  • DbtProject (46-334)
  • test (110-128)
  • test (131-149)
  • test (151-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
🔇 Additional comments (3)
integration_tests/tests/test_column_anomalies.py (3)

493-531: LGTM! Well-structured test data generation.

The data generation creates a clear distinction between normal (8-12 nulls/day over 30 days) and anomalous (20 nulls/day over 7 days) periods with no temporal overlap. The ~100% spike in null count should provide reliable detection behavior when the flag is toggled.
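
For illustration only, a self-contained sketch of the data shape described above (30 training days with 8-12 nulls per day, then 7 detection days with 20 nulls per day); the timestamp column name and per-day row counts are placeholders, while the "superhero" column mirrors the test data shown elsewhere in this review.

    import random
    from datetime import datetime, timedelta
    from typing import Any, Dict, List

    TIMESTAMP_COLUMN = "updated_at"  # placeholder; the real test uses its own constant

    def generate_null_count_data(run_date: datetime) -> List[Dict[str, Any]]:
        rows: List[Dict[str, Any]] = []
        # 30 training days: mostly non-null values plus 8-12 nulls per day.
        for day in range(37, 7, -1):
            bucket = run_date - timedelta(days=day)
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": None}] * random.randint(8, 12)
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": "Superman"}] * 50
        # 7 detection days: a clear spike of 20 nulls per day.
        for day in range(7, 0, -1):
            bucket = run_date - timedelta(days=day)
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": None}] * 20
            rows += [{TIMESTAMP_COLUMN: bucket, "superhero": "Superman"}] * 50
        return rows

    data = generate_null_count_data(datetime.utcnow())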


533-574: Verify cross-database consistency of detection behavior.

The test logic is sound: when exclude_detection_period_from_training=False, the 7-day anomalous period is included in the 30-day training baseline, potentially masking the anomaly. When True, the anomalous period is excluded, allowing detection. The use of force_metrics_backfill=True for both runs correctly prevents incremental metric caching issues.

However, the expected behavior depends on the anomaly detection algorithm's sensitivity to baseline composition. Ensure CI validates that:

  1. The 100% null-count spike with anomaly_sensitivity=5 reliably avoids detection when included in training (False → PASS).
  2. The same spike reliably triggers detection when excluded (True → FAIL).

Based on learnings and PR objectives.


555-580: LGTM! Clear assertions with descriptive messages.

The assertions correctly validate the expected behavior for both flag states, with helpful error messages that explain the reasoning. The expected outcomes align with the PR objectives and the test scenario design.


Comment @coderabbitai help to get the list of available commands and usage tips.

devin-ai-integration bot and others added 2 commits November 13, 2025 10:07
….sql

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Test demonstrates that:
- When flag=False: detection period data is included in training baseline, preventing anomaly detection
- When flag=True: detection period data is excluded from training baseline, enabling anomaly detection

The test uses constrained time windows (1 day training/detection) to make the behavior deterministic.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
integration_tests/tests/test_column_anomalies.py (2)

503-537: Consider extracting common test arguments to reduce duplication.

The test_args_false and test_args_true dictionaries are identical except for the exclude_detection_period_from_training parameter. This duplication could be reduced.

Apply this diff to extract common arguments:

+    base_test_args = {
+        "timestamp_column": TIMESTAMP_COLUMN,
+        "column_anomalies": ["null_count"],
+        "time_bucket": {"period": "day", "count": 1},
+        "training_period": {"period": "day", "count": 1},
+        "detection_period": {"period": "day", "count": 1},
+        "min_training_set_size": 1,
+        "anomaly_sensitivity": 3,
+        "anomaly_direction": "spike",
+    }
+
-    test_args_false = {
-        "timestamp_column": TIMESTAMP_COLUMN,
-        "column_anomalies": ["null_count"],
-        "time_bucket": {"period": "day", "count": 1},
-        "training_period": {"period": "day", "count": 1},
-        "detection_period": {"period": "day", "count": 1},
-        "min_training_set_size": 1,
-        "anomaly_sensitivity": 3,
-        "anomaly_direction": "spike",
-        "exclude_detection_period_from_training": False,
-    }
+    test_args_false = {
+        **base_test_args,
+        "exclude_detection_period_from_training": False,
+    }
     test_result_false = dbt_project.test(
         test_id,
         DBT_TEST_NAME,
         test_args_false,
         data=data,
         test_column="superhero",
         test_vars={"force_metrics_backfill": True},
     )
     assert test_result_false["status"] == "pass", (
         "Expected PASS when exclude_detection_period_from_training=False "
         "(detection data included in training baseline)"
     )

-    test_args_true = {
-        "timestamp_column": TIMESTAMP_COLUMN,
-        "column_anomalies": ["null_count"],
-        "time_bucket": {"period": "day", "count": 1},
-        "training_period": {"period": "day", "count": 1},
-        "detection_period": {"period": "day", "count": 1},
-        "min_training_set_size": 1,
-        "anomaly_sensitivity": 3,
-        "anomaly_direction": "spike",
-        "exclude_detection_period_from_training": True,
-    }
+    test_args_true = {
+        **base_test_args,
+        "exclude_detection_period_from_training": True,
+    }

486-501: Consider adding a comment explaining the test scenario.

The test uses training_period: 1 day but generates 30 days of training data. While this is correct (only the most recent day of training data is used based on the training_period parameter), a brief comment would clarify the test setup for future maintainers.

Consider adding a comment like:

    utc_today = datetime.utcnow().date()
    test_date, *training_dates = generate_dates(base_date=utc_today - timedelta(1))

    # Generate 30 days of training data (Superman/Batman, no nulls)
    # With training_period=1 day, only the most recent training day will be used as baseline
    data: List[Dict[str, Any]] = [
        ...
    ]

    # Add anomalous test data: 10 nulls on the detection day
    data += [
        ...
    ]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8c2588a and 6b60a66.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.

Applied to files:

  • integration_tests/tests/test_column_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_column_anomalies.py (3)
integration_tests/tests/conftest.py (2)
  • test_id (180-183)
  • dbt_project (144-147)
integration_tests/tests/dbt_project.py (4)
  • DbtProject (46-334)
  • test (110-128)
  • test (131-149)
  • test (151-257)
integration_tests/tests/data_generator.py (1)
  • generate_dates (6-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, dremio) / test
🔇 Additional comments (1)
integration_tests/tests/test_column_anomalies.py (1)

481-549: LGTM! Test correctly validates the exclude_detection_period_from_training feature.

The test logic is sound and follows the pattern established by other tests in this file:

  • When exclude_detection_period_from_training=False, the detection period data (10 nulls) is included in the training baseline, making the nulls appear normal → PASS
  • When exclude_detection_period_from_training=True, the detection period is excluded from the baseline (only 2 non-null rows from the training period), making the 10 nulls anomalous → FAIL

The use of force_metrics_backfill=True ensures proper test isolation when calling dbt_project.test twice with the same test_id.

devin-ai-integration bot and others added 2 commits November 19, 2025 10:49
… more substantial dataset

- Use 30 days of normal data with low null count (0-2 nulls/day) instead of 1 day
- Use 7 days of anomalous data with high null count (20 nulls/day) instead of 1 day
- Update training period to 30 days and detection period to 7 days
- Add more data per day to create clearer anomaly signal
- Use separate test IDs for the two test runs to avoid conflicts
- Pattern matches successful volume and freshness anomalies tests

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…data

- Rename test function to fix Postgres 63-char table name limit
- Update test data: 8-12 nulls/day normal, 20 nulls/day anomalous
- Increase sensitivity to 5 to match volume anomalies pattern
- Shorten test ID suffixes to _f/_t
- Test now passes locally with postgres

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@arbiv arbiv merged commit cccc439 into master Nov 20, 2025
32 checks passed
@arbiv arbiv deleted the devin/CORE-92-1763027970 branch November 20, 2025 10:46
