@@ -0,0 +1,96 @@
from datetime import datetime, timedelta

import pytest
from data_generator import DATE_FORMAT
from dbt_project import DbtProject

TIMESTAMP_COLUMN = "updated_at"
DBT_TEST_NAME = "elementary.volume_anomalies"
DBT_TEST_ARGS = {"timestamp_column": TIMESTAMP_COLUMN}


@pytest.mark.skip_targets(["clickhouse"])
def test_exclude_detection_period_from_training_baseline(
    test_id: str, dbt_project: DbtProject
):
    """
    Test case for CORE-19: Validates the exclude_detection_period_from_training flag functionality.

    This test demonstrates the core use case where:
    1. Detection period contains anomalous data that gets absorbed into training baseline
    2. WITHOUT exclusion: Anomaly is missed (test passes) because it's included in training
    3. WITH exclusion: Anomaly is detected (test fails) because it's excluded from training

    Test Scenario:
    - 30 days of normal data: 100 rows per day (baseline pattern)
    - 7 days of anomalous data: 110 rows per day (10% increase) in the detection period
    - Training period: 30 days
    - Detection period: 7 days
    - Time bucket: Daily aggregation
    - Sensitivity: 10 (high threshold to demonstrate masking effect)

    The 10% increase across 7 days gets absorbed into the cumulative training average,
    making the anomaly undetectable with the current implementation.

    Current Behavior (WITHOUT flag):
    - Test PASSES (no anomaly detected) because the 10% increase is absorbed into the
      cumulative training baseline when detection period data is included.

    Expected Behavior (WITH flag):
    - Test FAILS (anomaly detected) because the detection period is excluded from training,
      so the 10% increase is properly detected against the clean 30-day baseline.
    """
Comment on lines +16 to +42
⚠️ Potential issue | 🔴 Critical

Critical: Test scenario contradicts PR objectives and CORE-19 requirements.

The docstring documents a 10% increase (110 rows/day) and expects the test to PASS (anomaly missed), but the PR objectives and CORE-19 requirements clearly specify:

  • Anomalous data: 500 rows/day (5x spike, not 10% increase)
  • Expected behavior: Anomaly is detected (test status == "fail")

The current 10% increase may be too subtle to demonstrate the core use case. CORE-19 aims to validate that a clear anomaly (5x spike) is properly detected when excluded from training.

Apply this diff to align with requirements:

-    - 7 days of anomalous data: 110 rows per day (10% increase) in the detection period
+    - 7 days of anomalous data: 500 rows per day (5x spike) in the detection period
     - Training period: 30 days
     - Detection period: 7 days
     - Time bucket: Daily aggregation
-    - Sensitivity: 10 (high threshold to demonstrate masking effect)
+    - Sensitivity: 3 (standard threshold for clear anomaly detection)
 
-    The 10% increase across 7 days gets absorbed into the cumulative training average,
-    making the anomaly undetectable with the current implementation.
+    The 5x spike across 7 days gets absorbed into the cumulative training average,
+    but is still large enough to be detected with the current implementation.
 
     Current Behavior (WITHOUT flag):
-    - Test PASSES (no anomaly detected) because the 10% increase is absorbed into the
-      cumulative training baseline when detection period data is included.
+    - Test FAILS (anomaly detected) because the 5x spike is large enough to be detected
+      even when detection period data is included in the cumulative training baseline.
 
     Expected Behavior (WITH flag):
-    - Test FAILS (anomaly detected) because the detection period is excluded from training,
-      so the 10% increase is properly detected against the clean 30-day baseline.
+    - Test FAILS (anomaly detected) with more confidence because the detection period 
+      is excluded from training, ensuring the 5x spike is detected against the clean baseline.
🤖 Prompt for AI Agents
integration_tests/tests/test_exclude_detection_period_from_training.py lines
16-42: The docstring and test scenario currently describe a 10% increase (110
rows/day) and claim the test should PASS, which contradicts CORE-19 and the PR
objectives that require a clear 5x spike (500 rows/day) and an expected FAIL
when exclusion is applied. Update the docstring and any test parameters in this
range to specify anomalous data as 500 rows/day (5x spike), state the expected
behavior as "FAIL" (anomaly detected) when exclusion is used, and adjust any
referenced percentages or descriptions to reflect the 5x spike scenario.
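
A back-of-the-envelope sketch of the absorption effect the review describes (a toy model, not Elementary's implementation: it assumes a rolling 30-day training window and a plain z-score threshold, both simplifying assumptions):

# Toy illustration only -- not Elementary's actual scoring. Assumes a rolling
# 30-day training window and a z-score detector (flag when score > sensitivity).
import statistics

normal = [98, 100, 102] * 10   # 30 days of roughly-constant volume (~100 rows/day)
spike = 500                    # the 5x detection-period volume from the review

series = normal + [spike] * 7
for day in range(30, 37):              # the 7 detection-period buckets
    training = series[day - 30:day]    # rolling window; current bucket excluded
    mean = statistics.mean(training)
    std = statistics.stdev(training)
    print(f"detection day {day - 29}: z = {(spike - mean) / std:.1f}")

# Day 1 is scored against 30 clean days (z ~ 240, clearly flagged); by day 7 six
# spike days sit inside the window and z collapses to ~2.0, below a sensitivity
# of 3. Under this toy model the first bucket still catches the 5x spike, while
# the later buckets show the absorption that the exclusion flag is meant to prevent.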

    now = datetime.utcnow()

    normal_data = []
    for day_offset in range(37, 7, -1):
        date = now - timedelta(days=day_offset)
        for _ in range(100):
            normal_data.append({TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)})

    anomalous_data = []
    for day_offset in range(7, 0, -1):
        date = now - timedelta(days=day_offset)
        for _ in range(110):
            anomalous_data.append({TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)})

Comment on lines +43 to +56
⚠️ Potential issue | 🔴 Critical

Critical: Anomalous data generation contradicts PR requirements.

The code generates 110 rows per day (10% increase), but PR objectives specify 500 rows per day (5x spike). This undermines the test's ability to validate the core CORE-19 scenario.

Apply this diff to generate the 5x spike:

     anomalous_data = []
     for day_offset in range(7, 0, -1):
         date = now - timedelta(days=day_offset)
-        for _ in range(110):
+        for _ in range(500):
             anomalous_data.append({TIMESTAMP_COLUMN: date.strftime(DATE_FORMAT)})
🤖 Prompt for AI Agents
In integration_tests/tests/test_exclude_detection_period_from_training.py around
lines 43 to 56, the anomalous data loop currently appends 110 rows per day which
contradicts the PR requirement for a 5x spike; change the anomalous_data
generation to append 500 rows per day (i.e., replace the 110 count with 500) so
the detection-period days have a fivefold increase compared to the 100-row
baseline.

    data = normal_data + anomalous_data

    test_args = {
        **DBT_TEST_ARGS,
        "time_bucket": {"period": "day", "count": 1},
        "training_period": {"period": "day", "count": 30},
        "detection_period": {"period": "day", "count": 7},
        "sensitivity": 10,
    }
Comment on lines +59 to +65
🛠️ Refactor suggestion | 🟠 Major

Adjust sensitivity for clear anomaly detection.

The sensitivity is set to 10 (high threshold), which makes anomaly detection more difficult. For a clear 5x spike scenario as specified in CORE-19, a standard sensitivity (e.g., 3) would be more appropriate to ensure the anomaly is detected.

Apply this diff:

         "time_bucket": {"period": "day", "count": 1},
         "training_period": {"period": "day", "count": 30},
         "detection_period": {"period": "day", "count": 7},
-        "sensitivity": 10,
+        "sensitivity": 3,
     }
🤖 Prompt for AI Agents
In integration_tests/tests/test_exclude_detection_period_from_training.py around
lines 59 to 65, the test sets sensitivity to 10 which is too high for a clear 5x
spike scenario; change the "sensitivity" value in the test_args dict from 10 to
3 so the anomaly detection threshold is standard and the 5x spike will be
detected reliably.
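
For intuition on the two values, reading sensitivity as a z-score threshold (an assumption about the scoring model, not Elementary's documented semantics), the thresholds behave very differently against a clean baseline:

# Toy arithmetic under the same z-score assumption as the earlier sketch:
mean, std = 100.0, 1.7          # clean 30-day baseline statistics
z_10pct = (110 - 100) / std     # ~5.9: flagged at sensitivity 3, masked at 10
z_5x = (500 - 100) / std        # ~235: flagged at either sensitivity
print(z_10pct, z_5x)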


    test_result = dbt_project.test(
        test_id,
        DBT_TEST_NAME,
        test_args,
        data=data,
    )

    # Current behavior: Test PASSES (no anomaly detected)
    # The 10% increase is absorbed into the cumulative training baseline
    assert test_result["status"] == "pass", (
        "Test should PASS in current implementation (without exclusion flag). "
        "The 10% increase is absorbed into training, masking the anomaly."
    )
Comment on lines +74 to +79
⚠️ Potential issue | 🔴 Critical

Critical: Assertion contradicts PR objectives.

The assertion expects the test to PASS (no anomaly detected), but the PR description states: "The test validates that the anomaly is detected (test status == 'fail')" and "documenting that the 5x spike is large enough to be detected even when detection-period data is included in the cumulative training baseline."

With a 5x spike (500 rows/day vs 100 rows/day baseline), the anomaly should be detected in the current implementation.

Apply this diff to match the documented behavior:

-    # Current behavior: Test PASSES (no anomaly detected)
-    # The 10% increase is absorbed into the cumulative training baseline
-    assert test_result["status"] == "pass", (
-        "Test should PASS in current implementation (without exclusion flag). "
-        "The 10% increase is absorbed into training, masking the anomaly."
+    # Current behavior: Test FAILS (anomaly detected)
+    # The 5x spike is large enough to be detected even when included in training
+    assert test_result["status"] == "fail", (
+        "Test should FAIL in current implementation (without exclusion flag). "
+        "The 5x spike is large enough to be detected even when included in the cumulative training baseline."
     )
🤖 Prompt for AI Agents
In integration_tests/tests/test_exclude_detection_period_from_training.py around
lines 74 to 79, the test assertion expectation contradicts the PR objectives: it
currently asserts status == "pass" but the test should validate that the large
5x spike is detected and therefore status == "fail". Update the assertion to
expect "fail" and adjust the assertion message to state that the anomaly should
be detected (e.g., "Test should FAIL: 5x spike should be detected even when
detection-period data is included in training baseline.").


    # TODO: When the exclude_detection_period_from_training flag is implemented,
    # add a second test here that sets the flag to True and expects FAIL:
    # test_args_with_exclusion = {
    #     **test_args,
    #     "exclude_detection_period_from_training": True,
    # }
    # test_result_with_exclusion = dbt_project.test(
    #     test_id,
    #     DBT_TEST_NAME,
    #     test_args_with_exclusion,
    #     test_vars={"force_metrics_backfill": True},
    # )
    # assert test_result_with_exclusion["status"] == "fail", (
    #     "Test should FAIL with exclusion flag enabled. "
    #     "The 10% increase is detected against the clean baseline."
    # )
Comment on lines +81 to +96
🛠️ Refactor suggestion | 🟠 Major

Update TODO comment to reflect correct scenario.

The TODO comment references "10% increase" but should reference the "5x spike" per requirements. Additionally, once the main assertion is corrected to expect "fail" (anomaly detected), the TODO should clarify that the flag will improve detection confidence or catch more subtle anomalies, not just enable detection.

Apply this diff:

     # TODO: When the exclude_detection_period_from_training flag is implemented,
-    # add a second test here that sets the flag to True and expects FAIL:
+    # add a second test (or modify this one) that sets the flag to True:
     # test_args_with_exclusion = {
     #     **test_args,
     #     "exclude_detection_period_from_training": True,
     # }
     # test_result_with_exclusion = dbt_project.test(
     #     test_id,
     #     DBT_TEST_NAME,
     #     test_args_with_exclusion,
     #     test_vars={"force_metrics_backfill": True},
     # )
     # assert test_result_with_exclusion["status"] == "fail", (
     #     "Test should FAIL with exclusion flag enabled. "
-    #     "The 10% increase is detected against the clean baseline."
+    #     "The 5x spike is detected with higher confidence against the clean 30-day baseline."
     # )

Note: Consider whether this should be a separate test case with a different test_id to validate both behaviors in the same test run, or if it should conditionally test based on flag availability.

🤖 Prompt for AI Agents
In integration_tests/tests/test_exclude_detection_period_from_training.py around
lines 81 to 96, the TODO comment is incorrect and must be updated: change the
reference from "10% increase" to "5x spike" and clarify that when the
exclude_detection_period_from_training flag is implemented and set to True the
test should expect a FAIL (anomaly detected) because the flag increases
detection confidence or helps catch more subtle anomalies; also update the TODO
to recommend creating a separate test case (with a distinct test_id) or
conditionally exercising the flag so both behaviors (with and without exclusion)
are validated in the test suite.
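
If the "separate test case" route is taken, it might look like the sketch below. The flag name and test_vars are carried over from the TODO; the exclude_detection_period_from_training flag and the generate_data helper are hypothetical until the feature lands:

# Hypothetical follow-up test, sketched from the TODO above. The flag does not
# exist yet, so this is a shape for the future test, not code that runs today.
@pytest.mark.skip_targets(["clickhouse"])
def test_exclude_detection_period_from_training_enabled(
    test_id: str, dbt_project: DbtProject
):
    data = generate_data()  # hypothetical helper: same 30d baseline + 7d spike
    test_args_with_exclusion = {
        **DBT_TEST_ARGS,
        "time_bucket": {"period": "day", "count": 1},
        "training_period": {"period": "day", "count": 30},
        "detection_period": {"period": "day", "count": 7},
        "sensitivity": 3,
        "exclude_detection_period_from_training": True,  # hypothetical flag
    }
    test_result = dbt_project.test(
        test_id,
        DBT_TEST_NAME,
        test_args_with_exclusion,
        data=data,
        test_vars={"force_metrics_backfill": True},
    )
    # With the detection period excluded from training, the spike is scored
    # against the clean 30-day baseline and the dbt test is expected to fail.
    assert test_result["status"] == "fail"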
