
Conversation

Contributor

@devin-ai-integration bot commented Dec 27, 2025

Summary

Adds tracking for z-score fallback usage in anomaly detection tests. The z-score fallback occurs when training_stddev = 0 (all training values are identical), causing the z-score to return 0 instead of dividing by zero.
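The fallback can be sketched in a few lines of Python (a simplified stand-in for the SQL macro logic; anomaly_z_score is a hypothetical helper for illustration, not a function in this repo):

```python
def anomaly_z_score(value, training_mean, training_stddev):
    """Z-score of a metric value against its training window.

    Fallback case: when all training values are identical, the standard
    deviation is 0, so instead of dividing by zero the score is 0.
    """
    if training_stddev == 0:
        return 0.0  # this is the case flagged by is_zscore_fallback
    return (value - training_mean) / training_stddev
```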

Changes:

  • Added is_zscore_fallback boolean field to anomaly scores query (true when training_stddev = 0 with valid training data)
  • Propagated field through get_read_anomaly_scores_query
  • Added zscore_fallback_passed_count and zscore_fallback_failed_count counters to test results
  • Added new columns to elementary_test_results schema

Review & Testing Checklist for Human

  • Verify the fallback condition logic: The condition training_stddev is not null and training_set_size > 1 and training_stddev = 0 should match exactly the cases where anomaly_score = 0 due to the fallback (lines 234-237 in get_anomaly_scores_query.sql)
  • Test boolean handling across database adapters: The is_zscore_fallback boolean field needs to work correctly across Snowflake, BigQuery, Redshift, Postgres, and other adapters; verify that insensitive_get_dict_value handles it properly
  • Schema migration for existing deployments: Adding new columns to elementary_test_results - verify existing deployments handle the schema change gracefully with on_schema_change = 'append_new_columns'
  • Integration test: Run an anomaly detection test with stationary data (all identical values) to verify the fallback is correctly tracked

Recommended Test Plan

  1. Create a test table with identical values in a monitored column
  2. Run a column_anomalies test against it
  3. Query elementary_test_results to verify zscore_fallback_passed_count or zscore_fallback_failed_count is populated correctly
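Step 3 can be sanity-checked with a query along these lines. The snippet below uses an in-memory SQLite table as a stand-in for elementary_test_results (the real table lives in your warehouse and has many more columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Minimal stand-in for elementary_test_results with the two new columns
con.execute("""
    create table elementary_test_results (
        test_name text,
        zscore_fallback_passed_count bigint,
        zscore_fallback_failed_count bigint
    )""")
con.execute(
    "insert into elementary_test_results values ('column_anomalies', 14, 0)"
)
# For stationary data (all identical values), every scored row uses the
# fallback, so one of the two counters should be populated
row = con.execute("""
    select zscore_fallback_passed_count, zscore_fallback_failed_count
    from elementary_test_results
    where test_name = 'column_anomalies'""").fetchone()
print(row)
```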

Notes

  • No integration tests were added in this PR - consider adding test coverage
  • Linear ticket: CORE-151

Link to Devin run: https://app.devin.ai/sessions/7dad763995464ea583c77f3781353c39
Requested by: Yosef Arbiv (@arbiv)

Summary by CodeRabbit

  • New Features
    • Enhanced anomaly detection monitoring with z-score fallback detection and reporting. The system now tracks how many test rows passed and failed when the z-score calculation falls back to a score of 0 because the training standard deviation is zero. This provides greater visibility into detection behavior on stationary data.


- Add is_zscore_fallback field to anomaly scores query (when training_stddev = 0)
- Propagate is_zscore_fallback through get_read_anomaly_scores_query
- Track zscore_fallback_passed_count and zscore_fallback_failed_count in test results
- Add new fields to elementary_test_results schema

This enables tracking the number of tests that use z-score fallback
(when all training values are identical), with distinction between
passed and failed tests.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@coderabbitai

coderabbitai bot commented Dec 27, 2025

📝 Walkthrough

Walkthrough

The changes extend anomaly detection result tracking by introducing an is_zscore_fallback flag that identifies when the z-score fallback is used (training_stddev is 0 with valid training data) and by counting passed and failed test rows using this flag, threading the change through four interconnected SQL macros to expose fallback metrics in test results.

Changes

  • Z-Score Fallback Detection (macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql): Adds a computed boolean column is_zscore_fallback to the anomaly_scores CTE; TRUE when training_stddev is 0 with valid training data, otherwise FALSE.
  • Fallback Metrics Tracking (macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql): Introduces per-row counters zscore_fallback_passed and zscore_fallback_failed; increments zscore_fallback_failed when a row is anomalous with the fallback flag set, and zscore_fallback_passed when a row is not anomalous with the flag set. Extends test_result_dict with zscore_fallback_passed_count and zscore_fallback_failed_count fields.
  • Test Results Schema Extension (macros/edr/system/system_utils/empty_table.sql): Adds two new columns to the elementary test results table: zscore_fallback_passed_count (bigint) and zscore_fallback_failed_count (bigint).
  • Query Integration (macros/edr/tests/test_utils/get_anomaly_query.sql): Includes the is_zscore_fallback field in the anomaly_scores CTE selection within the get_read_anomaly_scores_query macro.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A metric born to track the fall,
When z-score stumbles, we count it all—
Passed and failed, both tallied true,
The fallback flag sees the way through! 📊✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Add z-score fallback metric tracking for anomaly detection tests' clearly and specifically describes the main change.
  • Linked Issues Check: ✅ Passed. The PR implements CORE-151's requirements: adds the is_zscore_fallback field to identify fallback usage and tracks the passed/failed distinction via the zscore_fallback_passed_count and zscore_fallback_failed_count counters.
  • Out of Scope Changes Check: ✅ Passed. All changes are scoped to implementing z-score fallback metrics (anomaly_scores query, test results storage, schema updates, and test utilities) and remain aligned with CORE-151 objectives.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files; docstring coverage check skipped.

@github-actions
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7b5fec6 and f5bea33.

📒 Files selected for processing (4)
  • macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql
  • macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
  • macros/edr/system/system_utils/empty_table.sql
  • macros/edr/tests/test_utils/get_anomaly_query.sql
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
📚 Learning: 2025-09-04T09:14:40.621Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/edr/tests/test_dimension_anomalies.sql:73-75
Timestamp: 2025-09-04T09:14:40.621Z
Learning: In the dbt-data-reliability codebase, `flatten_model` is a valid macro defined in `macros/edr/dbt_artifacts/upload_dbt_models.sql` and can be used as a parameter to `elementary.get_anomaly_query()` even though other test files use the `flattened_test` variable instead.

Applied to files:

  • macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (4)
macros/edr/tests/test_utils/get_anomaly_query.sql (1)

46-46: LGTM! Field propagation looks correct.

The addition of is_zscore_fallback to the anomaly_scores CTE properly propagates the new boolean flag from the source query, making it available for downstream test logic and counting.

macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql (2)

82-83: LGTM! Counter values correctly added to test result dictionary.

The zscore_fallback_passed_count and zscore_fallback_failed_count fields are properly extracted from the namespace counters and added to the test result output, aligning with the new schema columns.


51-65: LGTM! Counter logic correctly tracks passed and failed z-score fallback tests.

The implementation properly:

  • Initializes counters to 0
  • Increments zscore_fallback_failed when a row is anomalous AND uses the fallback
  • Increments zscore_fallback_passed when a row is not anomalous AND uses the fallback
  • Maintains existing failure counting behavior

The insensitive_get_dict_value helper correctly handles the is_zscore_fallback lookup by attempting exact, lowercase, and uppercase key matches, returning the actual database value or none if not found. Both return values work correctly in Jinja if statements (none is falsy, and database boolean/numeric values evaluate as expected).
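The lookup order described here (exact key, then lowercase, then uppercase) behaves roughly like this Python sketch; the actual macro is Jinja, so this is an illustrative stand-in only:

```python
def insensitive_get_dict_value(d, key, default=None):
    """Try the key as given, then lowercased, then uppercased;
    return default (None) if no variant is present."""
    for candidate in (key, key.lower(), key.upper()):
        if candidate in d:
            return d[candidate]
    return default
```

This matters because some adapters (e.g. Snowflake) return uppercase column names, so a row dict may hold IS_ZSCORE_FALLBACK rather than is_zscore_fallback.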

macros/edr/system/system_utils/empty_table.sql (1)

30-32: No migration concerns. The elementary_test_results model already specifies on_schema_change = 'append_new_columns', which automatically appends new columns to the incremental table with NULL values for existing rows. dbt handles this consistently across all supported database adapters (Postgres, Redshift, Snowflake, BigQuery, Databricks, ClickHouse). The type mapping uses edr_type_bigint(), which adapts to each database's native type system automatically.

Comment on lines +234 to +237
case
when training_stddev is not null and training_set_size > 1 and training_stddev = 0 then TRUE
else FALSE
end as is_zscore_fallback,

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: elementary-data/dbt-data-reliability; outputs collapsed):

# First, let's understand the project structure and find dbt configuration
find . -name "dbt_project.yml" -o -name "profiles.yml" | head -5

# Check for boolean literal patterns in SQL files
rg -t sql '\b(TRUE|FALSE|true|false)\b' --no-heading -c | sort | uniq -c | sort -rn | head -20

# Look at the specific file context around lines 234-237
fd get_anomaly_scores_query.sql -x cat -n {} | sed -n '220,250p'

# Search for how other SQL files handle boolean values
rg -t sql 'CASE.*WHEN.*THEN\s+(TRUE|FALSE|true|false|1|0)' --no-heading | head -10

# Check dbt_project.yml for database configuration
cat -n dbt_project.yml | head -50

# Search for boolean patterns more broadly in macros
rg -t sql --no-heading 'end as.*boolean|end as.*is_' macros/ | head -20

# Look for type casting patterns in the codebase
rg -t sql 'cast.*boolean|boolean.*cast' -i | head -10

# Check what database adapters or packages are used
rg -t yaml 'packages:|version:|databases:' . | head -20

# Look for similar boolean-returning CASE statements in macros
rg -t sql -A2 'then (TRUE|FALSE|true|false|1|0)\s*$' macros/ | head -30

# Find and examine the edr_type_bool() macro
fd edr_type_bool -o find . -name "*edr_type*" -o -name "*type_bool*" | xargs grep -l "edr_type_bool" 2>/dev/null | head -5

# Search for the macro definition
rg -t sql 'macro.*edr_type_bool|edr_type_bool\(' macros/ -B2 -A5

# Check the full context of get_anomaly_query.sql which also uses TRUE/FALSE
fd get_anomaly_query.sql -x cat -n {} | sed -n '1,50p'

# Look for database-specific boolean handling patterns in macros
rg -t sql 'target\.type|adapter\.|dbt\.config' macros/ | head -15

# Check if there's a data_types directory with boolean type definitions
fd -t f 'data_type' macros/ | xargs cat -n | head -50

# Check if there are database-specific implementations for boolean literals
rg -t sql 'postgres.*TRUE|snowflake.*TRUE|bigquery.*TRUE' . | head -10

# Look for more context in how is_anomalous is used (it uses TRUE/FALSE too)
fd get_anomaly_query.sql -x cat -n {} | grep -A20 "is_anomalous = true"

# Check test files to see what databases are tested
fd docker-compose -x cat | grep -i 'postgres\|bigquery\|snowflake\|redshift' | head -20

# Check if packages.yml gives us hints about supported databases
cat packages.yml

# Look for adapter-specific boolean literal handling
rg -t sql 'true|false|TRUE|FALSE' macros/edr/data_monitoring/schema_changes/get_columns_snapshot_query.sql -B2 -A2

# Check if there's a pattern of uppercase vs lowercase boolean literals in boolean-returning cases
rg -t sql -B3 'end as is_|end as has_' macros/ | grep -E '(TRUE|FALSE|true|false)' | head -30

# Let's examine the broader context of the file to see if there's a style preference
cat -n macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql | sed -n '200,250p'

# Check integration tests to see what happens with these boolean values
rg -t sql 'is_zscore_fallback' . | head -10


Use lowercase boolean literals for consistency with codebase standards.

The code uses TRUE and FALSE (uppercase), but other boolean-returning CASE statements in the codebase use lowercase true and false (e.g., get_columns_snapshot_query.sql). Change lines 235-236 to use lowercase true and false to match the established pattern and align with standard SQL conventions.

🤖 Prompt for AI Agents
In macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql
around lines 234 to 237, the CASE returns uppercase boolean literals TRUE and
FALSE; change them to lowercase true and false to match the codebase convention
and SQL style (i.e., replace TRUE with true and FALSE with false in that CASE
expression).

@arbiv arbiv closed this Dec 28, 2025