
Conversation

@devin-ai-integration
Contributor

@devin-ai-integration devin-ai-integration bot commented Nov 13, 2025

Add exclude_detection_period_from_training flag to event freshness anomaly tests

Summary

This PR implements CORE-91 by adding the exclude_detection_period_from_training parameter to freshness anomaly detection tests. This parameter was already available for volume anomalies and is now exposed to:

  • test_event_freshness_anomalies
  • test_freshness_anomalies

The parameter defaults to false to maintain backward compatibility. When set to true, it excludes metrics from the detection period when calculating training statistics, preventing potentially anomalous recent data from contaminating the baseline.
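The effect of the flag can be sketched with a small baseline computation (a minimal illustration of the idea, not the actual macro logic; `training_stats` is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev

def training_stats(points, now, detection_days, exclude_detection_period):
    """Baseline mean/stddev over (timestamp, value) metric points.

    With exclude_detection_period=True, points inside the last
    `detection_days` are dropped, so anomalous recent data cannot
    inflate the baseline it is later scored against.
    """
    cutoff = now - timedelta(days=detection_days)
    training = [value for ts, value in points
                if ts < cutoff or not exclude_detection_period]
    return mean(training), stdev(training)

now = datetime(2025, 11, 13, tzinfo=timezone.utc)
points = [(now - timedelta(days=d), 1.0) for d in range(3, 10)]   # normal history
points += [(now - timedelta(days=d), 10.0) for d in range(0, 3)]  # recent anomaly

mu_incl, _ = training_stats(points, now, detection_days=3, exclude_detection_period=False)
mu_excl, _ = training_stats(points, now, detection_days=3, exclude_detection_period=True)
# Including the detection period drags the baseline toward the anomaly.
```

With exclusion the baseline mean stays at the normal level (1.0); without it, the anomalous recent values pull the mean upward.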

The underlying infrastructure in get_anomaly_scores_query.sql and get_anomalies_test_configuration.sql already fully supports this parameter; this PR simply exposes it to the freshness test macros.

Review & Testing Checklist for Human

  • Verify scope: Confirm that both test_event_freshness_anomalies AND test_freshness_anomalies should receive this parameter (I added it to both based on symmetry with volume tests, but couldn't access the Linear issue to verify)
  • Test the parameter: Run a freshness anomaly test with exclude_detection_period_from_training: true to verify it works as expected and properly excludes detection period data from training
  • Consider integration tests: Decide if we need to add integration tests similar to test_exclude_detection_period_from_training in test_volume_anomalies.py for freshness tests

Notes

Summary by CodeRabbit

  • New Features

    • Freshness anomaly checks now accept an optional exclude_detection_period_from_training flag (default: false) to exclude detection-period data from model training and adjust anomaly sensitivity.
  • Tests

    • Added an integration test that validates both inclusion and exclusion scenarios, asserting pass/fail outcomes with deterministic timing.

…tests

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@linear

linear bot commented Nov 13, 2025

@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai

coderabbitai bot commented Nov 13, 2025

Walkthrough

Adds an optional exclude_detection_period_from_training parameter (default false) to event-freshness macros and forwards it to elementary.test_table_anomalies. Adds an integration test that verifies behavior with the flag enabled and disabled (skips ClickHouse).

Changes

  • Macros — flag propagation (macros/edr/tests/test_event_freshness_anomalies.sql, macros/edr/tests/test_freshness_anomalies.sql): Added parameter exclude_detection_period_from_training (default: false) to the macro signatures and passed it through to elementary.test_table_anomalies(...).
  • Integration test — behavior validation (integration_tests/tests/test_event_freshness_anomalies.py): Added test_exclude_detection_from_training(...) which builds training and detection data, runs two scenarios (flag=false -> anomaly masked; flag=true -> anomaly detected), and skips ClickHouse targets.

Sequence Diagram(s)

sequenceDiagram
  participant IT as Integration Test
  participant Macro as event_freshness macro
  participant Engine as elementary.test_table_anomalies

  Note over IT: prepare training + detection data
  IT->>Macro: call test_event_freshness_anomalies(..., exclude_detection_period_from_training=FLAG)
  Note right of Macro: forwards flag and args
  Macro->>Engine: elementary.test_table_anomalies(..., exclude_detection_period_from_training=FLAG)
  Engine-->>Macro: result (pass/fail)
  Macro-->>IT: return result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify macro default value and correct argument propagation.
  • Review integration test: data construction, training/detection window boundaries, assertions, and ClickHouse skip.

Possibly related PRs

Suggested reviewers

  • GuyEshdat

Poem

🐇 I tucked a flag where freshness roams,
Training learns with or without recent homes.
Turn it on and anomalies spring,
Turn it off and lessons cling.
Hop, test, repeat — a joyful thing. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)

  • Description Check ✅: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅: The title accurately and concisely summarizes the main change: adding the exclude_detection_period_from_training flag to event freshness anomaly tests, matching the core objective of the PR.
  • Linked Issues check ✅: The PR successfully implements all requirements from CORE-91: adds the exclude_detection_period_from_training parameter to event freshness macros with default false, and includes an integration test demonstrating the flag's effect (anomaly detected with flag=true, not detected with flag=false).
  • Out of Scope Changes check ✅: All changes are directly related to the CORE-91 objective: parameter additions to two freshness anomaly macros and a corresponding integration test, with no unrelated modifications.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c809d08 and 055d881.

📒 Files selected for processing (1)
  • integration_tests/tests/test_event_freshness_anomalies.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • integration_tests/tests/test_event_freshness_anomalies.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, snowflake) / test


…eshness anomalies

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f432db7 and 23555db.

📒 Files selected for processing (1)
  • integration_tests/tests/test_freshness_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.

Applied to files:

  • integration_tests/tests/test_freshness_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_freshness_anomalies.py (3)
integration_tests/tests/conftest.py (2)
  • test_id (180-183)
  • dbt_project (144-147)
integration_tests/tests/dbt_project.py (1)
  • DbtProject (46-334)
integration_tests/tests/data_generator.py (1)
  • generate_dates (6-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, bigquery) / test

Comment on lines 256 to 270
# Generate 30 days of normal data with consistent freshness (every 2 hours)
normal_data = [
    {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
    for date in generate_dates(
        utc_now - timedelta(days=33), step=timedelta(hours=2), days_back=30
    )
]

anomalous_data = [
    {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
    for date in generate_dates(utc_now, step=timedelta(hours=8), days_back=3)
]

all_data = normal_data + anomalous_data


⚠️ Potential issue | 🟠 Major

Normal data window misses the training period

Line 260 anchors the “30 days of normal data” at utc_now - timedelta(days=33), so every row lands 33–63 days in the past. Because the training period in Line 275 spans only the last 30 days, none of this “normal” history is actually available to the model once the detection window is excluded; the baseline is effectively empty, so the failing assertion is driven by missing data rather than the new flag’s behavior. Anchor the normal data immediately before the detection window so the test exercises the intended comparison.

-    normal_data = [
+    detection_start = utc_now - timedelta(days=3)
+    normal_data = [
         {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
         for date in generate_dates(
-            utc_now - timedelta(days=33), step=timedelta(hours=2), days_back=30
+            detection_start, step=timedelta(hours=2), days_back=30
         )
     ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (before → after):

    # Generate 30 days of normal data with consistent freshness (every 2 hours)
    normal_data = [
        {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
        for date in generate_dates(
            utc_now - timedelta(days=33), step=timedelta(hours=2), days_back=30
        )
    ]
    anomalous_data = [
        {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
        for date in generate_dates(utc_now, step=timedelta(hours=8), days_back=3)
    ]
    all_data = normal_data + anomalous_data

becomes:

    # Generate 30 days of normal data with consistent freshness (every 2 hours)
    detection_start = utc_now - timedelta(days=3)
    normal_data = [
        {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
        for date in generate_dates(
            detection_start, step=timedelta(hours=2), days_back=30
        )
    ]
    anomalous_data = [
        {TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)}
        for date in generate_dates(utc_now, step=timedelta(hours=8), days_back=3)
    ]
    all_data = normal_data + anomalous_data
🤖 Prompt for AI Agents
in integration_tests/tests/test_freshness_anomalies.py around lines 256 to 270,
the "normal" data is anchored 33 days ago so it falls outside the test's
training period; adjust the generate_dates start so the normal window ends
immediately before the detection window by changing the start argument from
utc_now - timedelta(days=33) to utc_now - timedelta(days=3) (keep
step=timedelta(hours=2) and days_back=30) so the normal 30-day history overlaps
the model's training period.

Comment on lines 272 to 280
test_args_without_exclusion = {
    "timestamp_column": TIMESTAMP_COLUMN,
    "training_period": {"period": "day", "count": 30},
    "detection_period": {"period": "day", "count": 3},
    "time_bucket": {"period": "day", "count": 1},
    "sensitivity": 5,  # Higher sensitivity to allow anomaly to be absorbed
    # exclude_detection_period_from_training is not set (defaults to False/None)
}


⚠️ Potential issue | 🔴 Critical

Add mandatory backfill_days to test config

elementary.freshness_anomalies validates that backfill_days is provided (via validate_mandatory_configuration). Without it, this test fails at compilation instead of exercising the new flag. Please include the parameter in both runs.

         "time_bucket": {"period": "day", "count": 1},
         "sensitivity": 5,  # Higher sensitivity to allow anomaly to be absorbed
+        "backfill_days": 30,

Based on learnings

🤖 Prompt for AI Agents
in integration_tests/tests/test_freshness_anomalies.py around lines 272 to 280,
the test config dict test_args_without_exclusion is missing the mandatory
backfill_days parameter causing validation to fail; add "backfill_days": 30 (or
another appropriate integer) to this dict and also add the same backfill_days
key/value to the other test run/config in this test so both executions include
the required parameter.

…s test with proper window alignment

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 23555db and 098d625.

📒 Files selected for processing (1)
  • integration_tests/tests/test_event_freshness_anomalies.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_event_freshness_anomalies.py (2)
integration_tests/tests/dbt_project.py (5)
  • DbtProject (46-334)
  • seed (259-263)
  • test (110-128)
  • test (131-149)
  • test (151-257)
integration_tests/tests/data_generator.py (1)
  • generate_dates (6-17)
🪛 Ruff (0.14.4)
integration_tests/tests/test_event_freshness_anomalies.py

111-111: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, postgres) / test
🔇 Additional comments (1)
integration_tests/tests/test_event_freshness_anomalies.py (1)

1-1: LGTM! Random import is appropriate for test data generation.

The random module is correctly used to generate deterministic jitter for test data. The static analysis warning about cryptographic usage (S311) is a false positive—this is test data generation, not security-sensitive code.

devin-ai-integration bot and others added 2 commits November 13, 2025 10:36
…ll, and explicit parameters

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…days correctly

The test was failing because detection_period parameter was overriding
backfill_days, causing the detection period to span all 14 days of data
instead of just the last 7 days. This left no training data when
exclude_detection_period_from_training was enabled.

Fixed by:
- Removing training_period and detection_period parameters
- Setting days_back: 14 (scoring window includes both periods)
- Setting backfill_days: 7 (detection period is last 7 days)

This properly splits the data into:
- Training period: days 14-8 (7 days of normal data)
- Detection period: days 7-1 (7 days of anomalous data)

Test now passes locally with postgres.
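The window split described above can be checked with quick date arithmetic (an illustrative sketch; split_windows is a hypothetical helper, not part of the package):

```python
from datetime import datetime, timedelta, timezone

def split_windows(now, days_back, backfill_days):
    # days_back bounds the full scoring window; backfill_days carves the
    # detection period off its end, and training is whatever remains.
    scoring_start = now - timedelta(days=days_back)
    detection_start = now - timedelta(days=backfill_days)
    return (scoring_start, detection_start), (detection_start, now)

now = datetime(2025, 11, 13, tzinfo=timezone.utc)
training, detection = split_windows(now, days_back=14, backfill_days=7)
# Training covers days 14-8 and detection days 7-1 relative to now,
# so enabling exclude_detection_period_from_training leaves 7 full
# days of normal data for the baseline.
```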

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e3b3ad2 and c809d08.

📒 Files selected for processing (1)
  • integration_tests/tests/test_event_freshness_anomalies.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.

Applied to files:

  • integration_tests/tests/test_event_freshness_anomalies.py
🧬 Code graph analysis (1)
integration_tests/tests/test_event_freshness_anomalies.py (3)
integration_tests/tests/test_volume_anomalies.py (1)
  • test_exclude_detection_from_training (545-621)
integration_tests/tests/dbt_project.py (4)
  • DbtProject (46-334)
  • test (110-128)
  • test (131-149)
  • test (151-257)
integration_tests/tests/test_freshness_anomalies.py (1)
  • test_exclude_detection_from_training (239-320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (3)
integration_tests/tests/test_event_freshness_anomalies.py (3)

116-141: LGTM! Date ranges are now correctly aligned.

The data generation correctly places:

  • Normal data (2-8 min lag) in the training period: days -14 to -8
  • Anomalous data (5-hour lag) in the detection period: days -7 to -1

This alignment ensures that with exclude_detection_period_from_training=True, only normal data trains the model and the anomaly is detected.


145-197: Test structure and assertions look good.

The test correctly implements the two-scenario pattern:

  1. Without exclusion: detection data included in training → test passes (anomaly missed)
  2. With exclusion: detection data excluded from training → test fails (anomaly detected)

Both scenarios use the same data and test configuration, differing only in the exclude_detection_period_from_training flag. The assertions properly validate the expected behavior in each case.
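The two-scenario pattern can be illustrated with a plain z-score check (a simplified stand-in for the package's anomaly scoring; is_anomalous and the sample values are made up for illustration):

```python
from statistics import mean, stdev

def is_anomalous(training_values, observed, sensitivity=3):
    # Flag `observed` if it deviates from the baseline by more than
    # `sensitivity` standard deviations (a plain z-score check).
    mu, sigma = mean(training_values), stdev(training_values)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > sensitivity

normal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0]  # training-period lags (hours)
anomaly = 5.0                                   # detection-period lag

# Scenario 1, flag off: the anomaly leaks into the baseline and masks itself.
masked = not is_anomalous(normal + [anomaly] * 3, anomaly)
# Scenario 2, flag on: the baseline is normal data only, so the anomaly is caught.
caught = is_anomalous(normal, anomaly)
```

The same inputs yield opposite outcomes depending only on whether the detection-period values contaminate the baseline, which is exactly what the integration test asserts.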


146-159: Checking how these parameters are validated and used surfaced the following:

  1. The macro test_event_freshness_anomalies.sql declares detection_period and training_period as explicit parameters in its signature (line 1)
  2. These parameters are passed through to test_table_anomalies (lines 34-35)
  3. The test at lines 146-159 (test_args_without_exclusion) does NOT include training_period or detection_period
  4. The Python test documentation at line 108 claims "Explicit training_period and detection_period parameters" but the actual test config doesn't provide them

Given the learnings about mandatory parameter validation and the fact that these parameters appear in the macro signature without defaults, this is a legitimate concern: the test should either specify these parameters explicitly or the code comment should be corrected.
Omit explicit training_period and detection_period parameters if they default correctly; otherwise, add them.

The test compares metrics within the detection_period to training_period data, then fails if anomalies are detected in the detection period. The current test (lines 146-159) omits both training_period and detection_period, relying on defaults. However, the macro signature shows these are explicit parameters. Verify they're optional with sensible defaults, or add them explicitly. The docstring at line 108 mentions "Explicit training_period and detection_period parameters," but test_args_without_exclusion doesn't include them.

@arbiv arbiv merged commit 2cddf06 into master Nov 19, 2025
32 checks passed
@arbiv arbiv deleted the devin/CORE-91-1763026029 branch November 19, 2025 09:27