Add z-score fallback metric tracking for anomaly detection tests #907
Conversation
- Add is_zscore_fallback field to anomaly scores query (when training_stddev = 0)
- Propagate is_zscore_fallback through get_read_anomaly_scores_query
- Track zscore_fallback_passed_count and zscore_fallback_failed_count in test results
- Add new fields to elementary_test_results schema

This enables tracking the number of tests that use the z-score fallback (when all training values are identical), with a distinction between passed and failed tests.

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
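For context, here is a minimal SQL sketch of the fallback behavior this PR tracks. It is not the actual macro code: only is_zscore_fallback, training_stddev, and training_set_size come from the PR; the other column and relation names are illustrative.

```sql
-- Minimal sketch, assuming illustrative column/relation names.
select
    metric_value,
    training_avg,
    training_stddev,
    training_set_size,
    case
        when training_stddev = 0 then 0  -- z-score fallback: avoid dividing by zero
        else (metric_value - training_avg) / training_stddev
    end as anomaly_score,
    case
        when training_stddev is not null and training_set_size > 1 and training_stddev = 0 then true
        else false
    end as is_zscore_fallback
from metrics_with_training_stats
```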
🤖 Devin AI Engineer: I'll be helping with this pull request!
Note: I can only respond to comments from users who have write access to this repository.
📝 Walkthrough: The changes extend anomaly detection result tracking by introducing an is_zscore_fallback flag in the anomaly scores query and corresponding passed/failed fallback counters in the test results.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks and finishing touches: ✅ 5 checks passed.
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql
- macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
- macros/edr/system/system_utils/empty_table.sql
- macros/edr/tests/test_utils/get_anomaly_query.sql
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 877
File: macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql:34-38
Timestamp: 2025-10-23T11:01:31.528Z
Learning: In the dbt-data-reliability project, `backfill_days` is a mandatory parameter for anomaly detection tests (volume_anomalies, freshness_anomalies, etc.). It is validated by the `validate_mandatory_configuration` macro in `get_anomalies_test_configuration.sql` and will cause a compiler error if not provided. Therefore, it's always guaranteed to be set when the anomaly detection logic runs.
📚 Learning: 2025-09-04T09:14:40.621Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/edr/tests/test_dimension_anomalies.sql:73-75
Timestamp: 2025-09-04T09:14:40.621Z
Learning: In the dbt-data-reliability codebase, `flatten_model` is a valid macro defined in `macros/edr/dbt_artifacts/upload_dbt_models.sql` and can be used as a parameter to `elementary.get_anomaly_query()` even though other test files use the `flattened_test` variable instead.
Applied to files:
macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (fusion, bigquery) / test
- GitHub Check: test (latest_pre, postgres) / test
- GitHub Check: test (fusion, snowflake) / test
- GitHub Check: test (fusion, redshift) / test
- GitHub Check: test (latest_official, clickhouse) / test
- GitHub Check: test (latest_official, postgres) / test
- GitHub Check: test (latest_official, redshift) / test
- GitHub Check: test (fusion, databricks_catalog) / test
- GitHub Check: test (latest_official, snowflake) / test
- GitHub Check: test (latest_official, trino) / test
- GitHub Check: test (latest_official, databricks_catalog) / test
- GitHub Check: test (latest_official, dremio) / test
- GitHub Check: test (latest_official, athena) / test
- GitHub Check: test (latest_official, bigquery) / test
🔇 Additional comments (4)
macros/edr/tests/test_utils/get_anomaly_query.sql (1)

46-46: LGTM! Field propagation looks correct. The addition of is_zscore_fallback to the anomaly_scores CTE properly propagates the new boolean flag from the source query, making it available for downstream test logic and counting.
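As a rough illustration of that propagation (a sketch only; the real macro assembles the query dynamically, and the other column names here are assumed):

```sql
with anomaly_scores as (
    select
        id,
        anomaly_score,
        is_anomalous,
        is_zscore_fallback  -- newly propagated flag
    from ({{ anomaly_scores_query }}) scores
)
select * from anomaly_scores
```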
macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql (2)

82-83: LGTM! Counter values correctly added to test result dictionary. The zscore_fallback_passed_count and zscore_fallback_failed_count fields are properly extracted from the namespace counters and added to the test result output, aligning with the new schema columns.
51-65: LGTM! Counter logic correctly tracks passed and failed z-score fallback tests. The implementation properly:
- Initializes counters to 0
- Increments zscore_fallback_failed when a row is anomalous AND uses the fallback
- Increments zscore_fallback_passed when a row is not anomalous AND uses the fallback
- Maintains existing failure counting behavior

The insensitive_get_dict_value helper correctly handles the is_zscore_fallback lookup by attempting exact, lowercase, and uppercase key matches, returning the actual database value or none if not found. Both return values work correctly in Jinja if statements (none is falsy, and database boolean/numeric values evaluate as expected).
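A minimal Jinja sketch of the counting pattern described above, assuming a test_rows list of result-row dicts; only the counter/field names and insensitive_get_dict_value come from the PR and review, the rest is illustrative:

```jinja
{% set counters = namespace(failed=0, zscore_fallback_passed=0, zscore_fallback_failed=0) %}
{% for row in test_rows %}
  {% set is_fallback = elementary.insensitive_get_dict_value(row, 'is_zscore_fallback') %}
  {% if elementary.insensitive_get_dict_value(row, 'is_anomalous') %}
    {% set counters.failed = counters.failed + 1 %}
    {% if is_fallback %}
      {% set counters.zscore_fallback_failed = counters.zscore_fallback_failed + 1 %}
    {% endif %}
  {% elif is_fallback %}
    {% set counters.zscore_fallback_passed = counters.zscore_fallback_passed + 1 %}
  {% endif %}
{% endfor %}
{# The counters end up in the test result output under the new field names. #}
{% set test_result_fields = {
    'zscore_fallback_passed_count': counters.zscore_fallback_passed,
    'zscore_fallback_failed_count': counters.zscore_fallback_failed
} %}
```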
macros/edr/system/system_utils/empty_table.sql (1)

30-32: No migration concerns. The elementary_test_results model already specifies on_schema_change = 'append_new_columns', which automatically appends new columns to the incremental table with NULL values for existing rows. dbt handles this consistently across all supported database adapters (Postgres, Redshift, Snowflake, BigQuery, Databricks, ClickHouse). The type mapping uses edr_type_bigint(), which adapts to each database's native type system automatically.
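For reference, the dbt mechanism this comment relies on looks roughly like this (a generic incremental-model sketch, not the actual elementary_test_results definition):

```sql
-- on_schema_change='append_new_columns' lets dbt add the two new counter
-- columns to the existing incremental table without a manual migration.
{{ config(
    materialized='incremental',
    on_schema_change='append_new_columns'
) }}

select * from {{ ref('some_upstream_model') }}  -- illustrative body
```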
case
    when training_stddev is not null and training_set_size > 1 and training_stddev = 0 then TRUE
    else FALSE
end as is_zscore_fallback,
🧩 Analysis chain
🏁 Script executed:
# First, let's understand the project structure and find dbt configuration
find . -name "dbt_project.yml" -o -name "profiles.yml" | head -5
Length of output: 198
🏁 Script executed:
# Check for boolean literal patterns in SQL files
rg -t sql '\b(TRUE|FALSE|true|false)\b' --no-heading -c | sort | uniq -c | sort -rn | head -20
Length of output: 1310
🏁 Script executed:
# Look at the specific file context around lines 234-237
fd get_anomaly_scores_query.sql -x cat -n {} | sed -n '220,250p'
Length of output: 2197
🏁 Script executed:
# Search for how other SQL files handle boolean values
rg -t sql 'CASE.*WHEN.*THEN\s+(TRUE|FALSE|true|false|1|0)' --no-heading | head -10
Length of output: 62
🏁 Script executed:
# Check dbt_project.yml for database configuration
cat -n dbt_project.yml | head -50
Length of output: 807
🏁 Script executed:
# Search for boolean patterns more broadly in macros
rg -t sql --no-heading 'end as.*boolean|end as.*is_' macros/ | head -20
Length of output: 396
🏁 Script executed:
# Look for type casting patterns in the codebase
rg -t sql 'cast.*boolean|boolean.*cast' -i | head -10
Length of output: 337
🏁 Script executed:
# Check what database adapters or packages are used
rg -t yaml 'packages:|version:|databases:' . | head -20
Length of output: 1134
🏁 Script executed:
# Look for similar boolean-returning CASE statements in macros
rg -t sql -A2 'then (TRUE|FALSE|true|false|1|0)\s*$' macros/ | head -30
Length of output: 2496
🏁 Script executed:
# Find and examine the edr_type_bool() macro
fd edr_type_bool -o find . -name "*edr_type*" -o -name "*type_bool*" | xargs grep -l "edr_type_bool" 2>/dev/null | head -5
Length of output: 207
🏁 Script executed:
# Search for the macro definition
rg -t sql 'macro.*edr_type_bool|edr_type_bool\(' macros/ -B2 -A5
Length of output: 3451
🏁 Script executed:
# Check the full context of get_anomaly_query.sql which also uses TRUE/FALSE
fd get_anomaly_query.sql -x cat -n {} | sed -n '1,50p'
Length of output: 2251
🏁 Script executed:
# Look for database-specific boolean handling patterns in macros
rg -t sql 'target\.type|adapter\.|dbt\.config' macros/ | head -15
Length of output: 2191
🏁 Script executed:
# Check if there's a data_types directory with boolean type definitions
fd -t f 'data_type' macros/ | xargs cat -n | head -50
Length of output: 1995
🏁 Script executed:
# Check if there are database-specific implementations for boolean literals
rg -t sql 'postgres.*TRUE|snowflake.*TRUE|bigquery.*TRUE' . | head -10
Length of output: 62
🏁 Script executed:
# Look for more context in how is_anomalous is used (it uses TRUE/FALSE too)
fd get_anomaly_query.sql -x cat -n {} | grep -A20 "is_anomalous = true"
Length of output: 1486
🏁 Script executed:
# Check test files to see what databases are tested
fd docker-compose -x cat | grep -i 'postgres\|bigquery\|snowflake\|redshift' | head -20
Length of output: 507
🏁 Script executed:
# Check if packages.yml gives us hints about supported databases
cat packages.yml
Length of output: 156
🏁 Script executed:
# Look for adapter-specific boolean literal handling
rg -t sql 'true|false|TRUE|FALSE' macros/edr/data_monitoring/schema_changes/get_columns_snapshot_query.sql -B2 -A2
Length of output: 345
🏁 Script executed:
# Check if there's a pattern of uppercase vs lowercase boolean literals in boolean-returning cases
rg -t sql -B3 'end as is_|end as has_' macros/ | grep -E '(TRUE|FALSE|true|false)' | head -30
Length of output: 671
🏁 Script executed:
# Let's examine the broader context of the file to see if there's a style preference
cat -n macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql | sed -n '200,250p'
Length of output: 3955
🏁 Script executed:
# Check integration tests to see what happens with these boolean values
rg -t sql 'is_zscore_fallback' . | head -10
Length of output: 602
Use lowercase boolean literals for consistency with codebase standards.
The code uses TRUE and FALSE (uppercase), but other boolean-returning CASE statements in the codebase use lowercase true and false (e.g., get_columns_snapshot_query.sql). Change lines 235-236 to use lowercase true and false to match the established pattern and align with standard SQL conventions.
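Applied to the excerpt above, the suggested change would read:

```sql
case
    when training_stddev is not null and training_set_size > 1 and training_stddev = 0 then true
    else false
end as is_zscore_fallback,
```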
🤖 Prompt for AI Agents
In macros/edr/data_monitoring/anomaly_detection/get_anomaly_scores_query.sql
around lines 234 to 237, the CASE returns uppercase boolean literals TRUE and
FALSE; change them to lowercase true and false to match the codebase convention
and SQL style (i.e., replace TRUE with true and FALSE with false in that CASE
expression).
Summary
Adds tracking for z-score fallback usage in anomaly detection tests. The z-score fallback occurs when training_stddev = 0 (all training values are identical), causing the z-score to return 0 instead of dividing by zero.

Changes:
- Add an is_zscore_fallback boolean field to the anomaly scores query (true when training_stddev = 0 with valid training data)
- Propagate is_zscore_fallback through get_read_anomaly_scores_query
- Add zscore_fallback_passed_count and zscore_fallback_failed_count counters to test results
- Add the new fields to the elementary_test_results schema

Review & Testing Checklist for Human
- The condition training_stddev is not null and training_set_size > 1 and training_stddev = 0 should match exactly when anomaly_score = 0 due to the fallback case (lines 234-237 in get_anomaly_scores_query.sql)
- The is_zscore_fallback boolean field needs to work correctly across Snowflake, BigQuery, Redshift, Postgres, etc. - verify insensitive_get_dict_value handles this properly
- New columns in elementary_test_results - verify existing deployments handle the schema change gracefully with on_schema_change = 'append_new_columns'

Recommended Test Plan
- Query elementary_test_results to verify that zscore_fallback_passed_count or zscore_fallback_failed_count is populated correctly
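A hedged example of such a verification query (the two counter columns come from this PR; the other column names and the schema placeholder are assumptions):

```sql
select
    test_unique_id,
    detected_at,
    zscore_fallback_passed_count,
    zscore_fallback_failed_count
from <your_elementary_schema>.elementary_test_results
order by detected_at desc
limit 20;
```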
Notes
Link to Devin run: https://app.devin.ai/sessions/7dad763995464ea583c77f3781353c39
Requested by: Yosef Arbiv (@arbiv)