
Conversation

Contributor

@devin-ai-integration bot commented Dec 27, 2025

Summary

Adds tracking for z-score fallback usage in anomaly detection tests. The z-score fallback occurs when training_stddev = 0 (all training values are identical), causing the z-score to return 0 instead of dividing by zero.
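The fallback can be sketched in a few lines of Python (a simplified stand-in for the SQL macro logic; anomaly_z_score is a hypothetical helper for illustration, not a function in this repo):

```python
def anomaly_z_score(value, training_mean, training_stddev):
    """Z-score of a metric value against its training window.

    Fallback case: when all training values are identical, the standard
    deviation is 0, so instead of dividing by zero the score is 0.
    """
    if training_stddev == 0:
        return 0.0  # this is the case flagged by is_zscore_fallback
    return (value - training_mean) / training_stddev
```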

Changes:

  • Added is_zscore_fallback boolean field to anomaly scores query (true when training_stddev = 0 with valid training data)
  • Propagated field through get_read_anomaly_scores_query
  • Added zscore_fallback_passed_count and zscore_fallback_failed_count counters to test results
  • Added new columns to elementary_test_results schema

Review & Testing Checklist for Human

  • Verify the fallback condition logic: The condition training_stddev is not null and training_set_size > 1 and training_stddev = 0 should match exactly the cases where anomaly_score = 0 due to the fallback (lines 234-237 in get_anomaly_scores_query.sql)
  • Test boolean handling across database adapters: The is_zscore_fallback boolean field needs to work correctly across Snowflake, BigQuery, Redshift, Postgres, and other adapters; verify that insensitive_get_dict_value handles it properly
  • Schema migration for existing deployments: Adding new columns to elementary_test_results - verify existing deployments handle the schema change gracefully with on_schema_change = 'append_new_columns'
  • Integration test: Run an anomaly detection test with stationary data (all identical values) to verify the fallback is correctly tracked

Recommended Test Plan

  1. Create a test table with identical values in a monitored column
  2. Run a column_anomalies test against it
  3. Query elementary_test_results to verify zscore_fallback_passed_count or zscore_fallback_failed_count is populated correctly
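Step 3 can be sanity-checked with a query along these lines. The snippet below uses an in-memory SQLite table as a stand-in for elementary_test_results (the real table lives in your warehouse and has many more columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Minimal stand-in for elementary_test_results with the two new columns
con.execute("""
    create table elementary_test_results (
        test_name text,
        zscore_fallback_passed_count bigint,
        zscore_fallback_failed_count bigint
    )""")
con.execute(
    "insert into elementary_test_results values ('column_anomalies', 14, 0)"
)
# For stationary data (all identical values), every scored row uses the
# fallback, so one of the two counters should be populated
row = con.execute("""
    select zscore_fallback_passed_count, zscore_fallback_failed_count
    from elementary_test_results
    where test_name = 'column_anomalies'""").fetchone()
print(row)
```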

Notes

  • No integration tests were added in this PR - consider adding test coverage
  • Linear ticket: CORE-151

Link to Devin run: https://app.devin.ai/sessions/7dad763995464ea583c77f3781353c39
Requested by: Yosef Arbiv (@arbiv)

Summary by CodeRabbit

  • New Features
    • Enhanced anomaly detection monitoring with z-score fallback detection and reporting. The system now tracks how many test rows passed and failed when the z-score calculation falls back to a score of 0 because the training standard deviation is zero. This provides greater visibility into detection behavior on stationary data.


- Add is_zscore_fallback field to anomaly scores query (when training_stddev = 0)
- Propagate is_zscore_fallback through get_read_anomaly_scores_query
- Track zscore_fallback_passed_count and zscore_fallback_failed_count in test results
- Add new fields to elementary_test_results schema

This enables tracking the number of tests that use z-score fallback
(when all training values are identical), with distinction between
passed and failed tests.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@coderabbitai

coderabbitai bot commented Dec 27, 2025

📝 Walkthrough

Walkthrough

The changes extend anomaly detection result tracking by introducing an is_zscore_fallback flag that identifies when the z-score fallback is used (training_stddev is 0 with valid training data) and by counting passed and failed test rows using this flag, threading the change through four interconnected SQL macros to expose fallback metrics in test results.

Changes

  • Z-Score Fallback Detection (macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql): Adds a computed boolean column is_zscore_fallback to the anomaly_scores CTE; TRUE when training_stddev is 0 with valid training data, otherwise FALSE.
  • Fallback Metrics Tracking (macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql): Introduces per-row counters zscore_fallback_passed and zscore_fallback_failed; increments zscore_fallback_failed when a row is anomalous with the fallback flag set, and zscore_fallback_passed when a row is not anomalous with the flag set. Extends test_result_dict with zscore_fallback_passed_count and zscore_fallback_failed_count fields.
  • Test Results Schema Extension (macros/edr/system/system_utils/empty_table.sql): Adds two new columns to the elementary test results table: zscore_fallback_passed_count (bigint) and zscore_fallback_failed_count (bigint).
  • Query Integration (macros/edr/tests/test_utils/get_anomaly_query.sql): Includes the is_zscore_fallback field in the anomaly_scores CTE selection within the get_read_anomaly_scores_query macro.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A metric born to track the fall,
When z-score stumbles, we count it all—
Passed and failed, both tallied true,
The fallback flag sees the way through! 📊✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Add z-score fallback metric tracking for anomaly detection tests' clearly and specifically describes the main change.
  • Linked Issues Check: ✅ Passed. The PR implements CORE-151's requirements: adds the is_zscore_fallback field to identify fallback usage and tracks the passed/failed distinction via the zscore_fallback_passed_count and zscore_fallback_failed_count counters.
  • Out of Scope Changes Check: ✅ Passed. All changes are scoped to implementing z-score fallback metrics (anomaly_scores query, test results storage, schema updates, and test utilities) and remain aligned with CORE-151 objectives.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files; docstring coverage check skipped.

@github-actions
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7b5fec6 and f5bea33.

📒 Files selected for processing (4)
  • macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql
  • macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
  • macros/edr/system/system_utils/empty_table.sql
  • macros/edr/tests/test_utils/get_anomaly_query.sql
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
📚 Learning: 2025-09-04T09:14:40.621Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/edr/tests/test_dimension_anomalies.sql:73-75
Timestamp: 2025-09-04T09:14:40.621Z
Learning: In the dbt-data-reliability codebase, `flatten_model` is a valid macro defined in `macros/edr/dbt_artifacts/upload_dbt_models.sql` and can be used as a parameter to `elementary.get_anomaly_query()` even though other test files use the `flattened_test` variable instead.

Applied to files:

  • macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (fusion, bigquery) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (fusion, snowflake) / test
  • GitHub Check: test (fusion, redshift) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (fusion, databricks_catalog) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (4)
macros/edr/tests/test_utils/get_anomaly_query.sql (1)

46-46: LGTM! Field propagation looks correct.

The addition of is_zscore_fallback to the anomaly_scores CTE properly propagates the new boolean flag from the source query, making it available for downstream test logic and counting.

macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql (2)

82-83: LGTM! Counter values correctly added to test result dictionary.

The zscore_fallback_passed_count and zscore_fallback_failed_count fields are properly extracted from the namespace counters and added to the test result output, aligning with the new schema columns.


51-65: LGTM! Counter logic correctly tracks passed and failed z-score fallback tests.

The implementation properly:

  • Initializes counters to 0
  • Increments zscore_fallback_failed when a row is anomalous AND uses the fallback
  • Increments zscore_fallback_passed when a row is not anomalous AND uses the fallback
  • Maintains existing failure counting behavior

The insensitive_get_dict_value helper correctly handles the is_zscore_fallback lookup by attempting exact, lowercase, and uppercase key matches, returning the actual database value or none if not found. Both return values work correctly in Jinja if statements (none is falsy, and database boolean/numeric values evaluate as expected).
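The lookup order described here (exact key, then lowercase, then uppercase) behaves roughly like this Python sketch; the actual macro is Jinja, so this is an illustrative stand-in only:

```python
def insensitive_get_dict_value(d, key, default=None):
    """Try the key as given, then lowercased, then uppercased;
    return default (None) if no variant is present."""
    for candidate in (key, key.lower(), key.upper()):
        if candidate in d:
            return d[candidate]
    return default
```

This matters because some adapters (e.g. Snowflake) return uppercase column names, so a row dict may hold IS_ZSCORE_FALLBACK rather than is_zscore_fallback.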

macros/edr/system/system_utils/empty_table.sql (1)

30-32: No migration concerns. The elementary_test_results model already specifies on_schema_change = 'append_new_columns', which automatically appends new columns to the incremental table with NULL values for existing rows. dbt handles this consistently across all supported database adapters (Postgres, Redshift, Snowflake, BigQuery, Databricks, ClickHouse). The type mapping uses edr_type_bigint(), which adapts to each database's native type system automatically.

Comment on lines +234 to +237
case
when training_stddev is not null and training_set_size > 1 and training_stddev = 0 then TRUE
else FALSE
end as is_zscore_fallback,

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: elementary-data/dbt-data-reliability; outputs collapsed):

# First, let's understand the project structure and find dbt configuration
find . -name "dbt_project.yml" -o -name "profiles.yml" | head -5

# Check for boolean literal patterns in SQL files
rg -t sql '\b(TRUE|FALSE|true|false)\b' --no-heading -c | sort | uniq -c | sort -rn | head -20

# Look at the specific file context around lines 234-237
fd get_anomaly_scores_query.sql -x cat -n {} | sed -n '220,250p'

# Search for how other SQL files handle boolean values
rg -t sql 'CASE.*WHEN.*THEN\s+(TRUE|FALSE|true|false|1|0)' --no-heading | head -10

# Check dbt_project.yml for database configuration
cat -n dbt_project.yml | head -50

# Search for boolean patterns more broadly in macros
rg -t sql --no-heading 'end as.*boolean|end as.*is_' macros/ | head -20

# Look for type casting patterns in the codebase
rg -t sql 'cast.*boolean|boolean.*cast' -i | head -10

# Check what database adapters or packages are used
rg -t yaml 'packages:|version:|databases:' . | head -20

# Look for similar boolean-returning CASE statements in macros
rg -t sql -A2 'then (TRUE|FALSE|true|false|1|0)\s*$' macros/ | head -30

# Find and examine the edr_type_bool() macro
fd edr_type_bool -o find . -name "*edr_type*" -o -name "*type_bool*" | xargs grep -l "edr_type_bool" 2>/dev/null | head -5

# Search for the macro definition
rg -t sql 'macro.*edr_type_bool|edr_type_bool\(' macros/ -B2 -A5

# Check the full context of get_anomaly_query.sql which also uses TRUE/FALSE
fd get_anomaly_query.sql -x cat -n {} | sed -n '1,50p'

# Look for database-specific boolean handling patterns in macros
rg -t sql 'target\.type|adapter\.|dbt\.config' macros/ | head -15

# Check if there's a data_types directory with boolean type definitions
fd -t f 'data_type' macros/ | xargs cat -n | head -50

# Check if there are database-specific implementations for boolean literals
rg -t sql 'postgres.*TRUE|snowflake.*TRUE|bigquery.*TRUE' . | head -10

# Look for more context in how is_anomalous is used (it uses TRUE/FALSE too)
fd get_anomaly_query.sql -x cat -n {} | grep -A20 "is_anomalous = true"

# Check test files to see what databases are tested
fd docker-compose -x cat | grep -i 'postgres\|bigquery\|snowflake\|redshift' | head -20

# Check if packages.yml gives us hints about supported databases
cat packages.yml

# Look for adapter-specific boolean literal handling
rg -t sql 'true|false|TRUE|FALSE' macros/edr/data_monitoring/schema_changes/get_columns_snapshot_query.sql -B2 -A2

# Check if there's a pattern of uppercase vs lowercase boolean literals in boolean-returning cases
rg -t sql -B3 'end as is_|end as has_' macros/ | grep -E '(TRUE|FALSE|true|false)' | head -30

# Let's examine the broader context of the file to see if there's a style preference
cat -n macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql | sed -n '200,250p'

# Check integration tests to see what happens with these boolean values
rg -t sql 'is_zscore_fallback' . | head -10


Use lowercase boolean literals for consistency with codebase standards.

The code uses TRUE and FALSE (uppercase), but other boolean-returning CASE statements in the codebase use lowercase true and false (e.g., get_columns_snapshot_query.sql). Change lines 235-236 to use lowercase true and false to match the established pattern and align with standard SQL conventions.

🤖 Prompt for AI Agents
In macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql
around lines 234 to 237, the CASE returns uppercase boolean literals TRUE and
FALSE; change them to lowercase true and false to match the codebase convention
and SQL style (i.e., replace TRUE with true and FALSE with false in that CASE
expression).

@arbiv arbiv closed this Dec 28, 2025