Enable test_sample_row_count config on tests #903
Conversation
Walkthrough

Adds per-test sample row count override support in the test materialization macro and introduces integration tests that validate unlimited and limited sampling when test metadata overrides the global sample limit.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
👋 @Ryosuke839
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_override_samples_config.py (1)
1-80: LGTM! Comprehensive test coverage for per-test sample count overrides.

Both test functions correctly validate the new feature:

- test_sample_count_unlimited confirms that None overrides the global limit of 5 to allow unlimited sampling (20 samples)
- test_sample_count_small confirms that an integer value (3) overrides the global limit of 5

The test setup, assertions, and ClickHouse exclusion are all appropriate.
Minor style suggestion for SAMPLES_QUERY (line 19): The subquery uses select * from a CTE that has only one column. Consider being explicit:

```diff
 select result_row
 from {{ ref("test_result_rows") }}
-where elementary_test_results_id in (select * from latest_elementary_test_result)
+where elementary_test_results_id in (select id from latest_elementary_test_result)
```

This is purely a style preference and doesn't affect functionality.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)

- integration_tests/tests/test_override_samples_config.py (1 hunks)
- macros/edr/materializations/test/test.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: integration_tests/tests/dbt_project.py:191-201
Timestamp: 2025-08-10T11:29:19.004Z
Learning: In the Elementary dbt package integration tests, BigQuery works correctly with the default `("database", "schema")` property mapping in the `get_database_and_schema_properties` function. When using `target.database` and `target.schema` in source definitions, BigQuery's dbt adapter handles these references appropriately without requiring special mapping to `project` and `dataset`.
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 861
File: macros/edr/tests/test_utils/clean_elementary_test_tables.sql:45-50
Timestamp: 2025-09-15T06:06:05.601Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers simpler, direct approaches over creating helper macros for centralized quoting logic, even when it might provide better maintainability across multiple use cases. They value code simplicity and readability over abstraction.
📚 Learning: 2025-09-04T09:14:40.621Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/edr/tests/test_dimension_anomalies.sql:73-75
Timestamp: 2025-09-04T09:14:40.621Z
Learning: In the dbt-data-reliability codebase, `flatten_model` is a valid macro defined in `macros/edr/dbt_artifacts/upload_dbt_models.sql` and can be used as a parameter to `elementary.get_anomaly_query()` even though other test files use the `flattened_test` variable instead.
Applied to files:
macros/edr/materializations/test/test.sql
📚 Learning: 2025-07-22T15:41:57.338Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/table_operations/create_table_as.sql:0-0
Timestamp: 2025-07-22T15:41:57.338Z
Learning: The dbt.create_table_as macro breaks in dbt-fusion when not run from a materialization context, requiring custom implementations for table creation functionality in dbt-fusion compatible packages.
Applied to files:
macros/edr/materializations/test/test.sql
🧬 Code graph analysis (1)
integration_tests/tests/test_override_samples_config.py (2)
- integration_tests/tests/conftest.py (2): dbt_project (144-147), test_id (180-183)
- integration_tests/tests/dbt_project.py (2): DbtProject (46-334), run_query (62-69)
🔇 Additional comments (1)
macros/edr/materializations/test/test.sql (1)
54-56: LGTM! Test-level sample count override implemented correctly.

The precedence is correct: test-specific meta.test_sample_row_count overrides the global config (line 53 defaults to 5) but is still subject to disable_test_samples and PII protection checks on lines 61-67. When set to None, the resulting unlimited sampling works correctly because line 136 uses is not none to conditionally add the LIMIT clause.

No other parts of the codebase reference test_sample_row_count in ways that need updating. Note: the feature lacks user-facing documentation.
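For clarity, the precedence described above can be sketched in plain Python. This is a hypothetical model, not the macro itself: the real logic is Jinja in macros/edr/materializations/test/test.sql, and the function and sentinel names here are invented for illustration.

```python
# Hypothetical sketch of the override precedence; names are illustrative,
# the real implementation lives in the Jinja test materialization macro.
_UNSET = object()  # distinguishes "key absent" from an explicit None override


def resolve_sample_row_count(test_meta: dict, global_limit: int = 5):
    """Per-test meta overrides the global limit; an explicit None lifts the limit."""
    override = test_meta.get("test_sample_row_count", _UNSET)
    if override is _UNSET:
        return global_limit  # no per-test override: fall back to the global config
    return override  # an int, or None (meaning: sample without a LIMIT)


def build_limit_clause(sample_row_count):
    """Mirrors the `is not none` check the review points at on line 136."""
    return "" if sample_row_count is None else f" limit {sample_row_count}"
```

With a global limit of 5, an absent key resolves to 5, an override of 3 resolves to 3, and an explicit None produces an empty LIMIT clause, i.e. unlimited sampling.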
Hey @Ryosuke839! This is a great feature, thank you for the contribution. We'll take a deeper look.
Actionable comments posted: 0
🧹 Nitpick comments (3)
integration_tests/tests/test_override_samples_config.py (3)
23-49: Add docstrings to clarify test objectives.

Consider adding a docstring to explain that this test validates unlimited sampling when test_sample_row_count is set to None in test metadata, overriding the global limit of 5.

Apply this diff:

```diff
 @pytest.mark.skip_targets(["clickhouse"])
 def test_sample_count_unlimited(test_id: str, dbt_project: DbtProject):
+    """Test that setting test_sample_row_count to None in meta allows unlimited samples.
+
+    Sets global test_sample_row_count to 5, but overrides with None in test meta.
+    Expects all 20 failing rows to be sampled.
+    """
     null_count = 20
```
52-78: Add docstring and consider additional test case.

- Add a docstring to clarify this test validates that test-level overrides work correctly when set to a value smaller than the global limit.
- Consider adding a test case where the test-level override is larger than the global setting (e.g., meta sets 10 when global is 5) to ensure the override truly takes precedence in both directions.

Apply this diff:

```diff
 @pytest.mark.skip_targets(["clickhouse"])
 def test_sample_count_small(test_id: str, dbt_project: DbtProject):
+    """Test that test_sample_row_count in meta overrides the global setting.
+
+    Sets global test_sample_row_count to 5, but overrides with 3 in test meta.
+    Expects exactly 3 samples despite 20 failing rows.
+    """
     null_count = 20
```
23-78: Consider extracting a helper function to reduce duplication.

Both test functions share similar setup and verification logic. You could extract a helper function that accepts the test configuration and expected sample count as parameters to reduce duplication.

For example:

```python
def _run_sample_count_test(
    test_id: str,
    dbt_project: DbtProject,
    test_sample_row_count_override,
    expected_sample_count: int,
):
    """Helper to run sample count test with specified override."""
    null_count = 20
    data = [{COLUMN_NAME: None} for _ in range(null_count)]
    test_result = dbt_project.test(
        test_id,
        "not_null",
        dict(column_name=COLUMN_NAME),
        data=data,
        as_model=True,
        test_vars={
            "enable_elementary_test_materialization": True,
            "test_sample_row_count": 5,
        },
        test_config={"meta": {"test_sample_row_count": test_sample_row_count_override}},
    )
    assert test_result["status"] == "fail"
    samples = [
        json.loads(row["result_row"])
        for row in dbt_project.run_query(SAMPLES_QUERY.format(test_id=test_id))
    ]
    assert len(samples) == expected_sample_count
    for sample in samples:
        assert COLUMN_NAME in sample
        assert sample[COLUMN_NAME] is None
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- integration_tests/tests/test_override_samples_config.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_override_samples_config.py (2)
- integration_tests/tests/conftest.py (1): dbt_project (144-147)
- integration_tests/tests/dbt_project.py (2): DbtProject (46-334), run_query (62-69)
🔇 Additional comments (2)
integration_tests/tests/test_override_samples_config.py (2)
1-4: LGTM! The imports are appropriate for the test file.

6-20: LGTM! The query template is correctly structured with proper brace escaping for the .format() call.
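The brace escaping the review approves of is standard str.format() behavior: doubled braces collapse to literal braces, so the Jinja expression survives formatting while the test id is substituted. A small illustration (the column and filter names here are illustrative, not a copy of SAMPLES_QUERY):

```python
# Doubled braces survive str.format(), so the Jinja {{ ref(...) }} expression
# reaches dbt intact while {test_id} is substituted by Python first.
template = (
    "select result_row from {{{{ ref('test_result_rows') }}}} "
    "where test_id = '{test_id}'"
)
rendered = template.format(test_id="abc")
print(rendered)
# → select result_row from {{ ref('test_result_rows') }} where test_id = 'abc'
```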
Currently, test_sample_row_count can be overridden by var, which works globally but not per model or test.

This PR enables the test_sample_row_count config on tests, which takes precedence over the global setting.

Config would look like:
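(The author's original snippet did not survive extraction; the following is a plausible sketch only, assuming a standard dbt schema.yml layout and the meta key exercised by the PR's integration tests. The model and column names are hypothetical.)

```yaml
models:
  - name: my_model          # hypothetical model
    columns:
      - name: my_column     # hypothetical column
        tests:
          - not_null:
              meta:
                # Overrides the global test_sample_row_count var for this test.
                # An explicit null disables the sample limit entirely.
                test_sample_row_count: 3
```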
Summary by CodeRabbit

- New Features: Per-test sample row count override support in the test materialization macro.
- Tests: Integration tests validating unlimited and limited sampling when test metadata overrides the global sample limit.