Enable test_sample_row_count config on tests #903
Conversation
Walkthrough

Adds per-test sample row count override support in the test materialization macro and introduces integration tests that validate unlimited and limited sampling when test metadata overrides the global sample limit.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
👋 @Ryosuke839
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration_tests/tests/test_override_samples_config.py (1)
1-80: LGTM! Comprehensive test coverage for per-test sample count overrides.

Both test functions correctly validate the new feature:

- test_sample_count_unlimited confirms that None overrides the global limit of 5 to allow unlimited sampling (20 samples)
- test_sample_count_small confirms that an integer value (3) overrides the global limit of 5

The test setup, assertions, and ClickHouse exclusion are all appropriate.
Minor style suggestion for SAMPLES_QUERY (line 19): The subquery uses select * from a CTE that has only one column. Consider being explicit:

```diff
 select result_row
 from {{ ref("test_result_rows") }}
-where elementary_test_results_id in (select * from latest_elementary_test_result)
+where elementary_test_results_id in (select id from latest_elementary_test_result)
```

This is purely a style preference and doesn't affect functionality.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)

- integration_tests/tests/test_override_samples_config.py (1 hunks)
- macros/edr/materializations/test/test.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: integration_tests/tests/dbt_project.py:191-201
Timestamp: 2025-08-10T11:29:19.004Z
Learning: In the Elementary dbt package integration tests, BigQuery works correctly with the default `("database", "schema")` property mapping in the `get_database_and_schema_properties` function. When using `target.database` and `target.schema` in source definitions, BigQuery's dbt adapter handles these references appropriately without requiring special mapping to `project` and `dataset`.
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 861
File: macros/edr/tests/test_utils/clean_elementary_test_tables.sql:45-50
Timestamp: 2025-09-15T06:06:05.601Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers simpler, direct approaches over creating helper macros for centralized quoting logic, even when it might provide better maintainability across multiple use cases. They value code simplicity and readability over abstraction.
📚 Learning: 2025-09-04T09:14:40.621Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/edr/tests/test_dimension_anomalies.sql:73-75
Timestamp: 2025-09-04T09:14:40.621Z
Learning: In the dbt-data-reliability codebase, `flatten_model` is a valid macro defined in `macros/edr/dbt_artifacts/upload_dbt_models.sql` and can be used as a parameter to `elementary.get_anomaly_query()` even though other test files use the `flattened_test` variable instead.
Applied to files:
macros/edr/materializations/test/test.sql
📚 Learning: 2025-07-22T15:41:57.338Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/table_operations/create_table_as.sql:0-0
Timestamp: 2025-07-22T15:41:57.338Z
Learning: The dbt.create_table_as macro breaks in dbt-fusion when not run from a materialization context, requiring custom implementations for table creation functionality in dbt-fusion compatible packages.
Applied to files:
macros/edr/materializations/test/test.sql
🧬 Code graph analysis (1)
integration_tests/tests/test_override_samples_config.py (2)
- integration_tests/tests/conftest.py (2): dbt_project (144-147), test_id (180-183)
- integration_tests/tests/dbt_project.py (2): DbtProject (46-334), run_query (62-69)
🔇 Additional comments (1)
macros/edr/materializations/test/test.sql (1)
54-56: LGTM! Test-level sample count override implemented correctly.

The precedence is correct: test-specific meta.test_sample_row_count overrides the global config (line 53 defaults to 5) but is still subject to disable_test_samples and PII protection checks on lines 61-67. When set to None, the resulting unlimited sampling works correctly because line 136 uses is not none to conditionally add the LIMIT clause.

No other parts of the codebase reference test_sample_row_count in ways that need updating. Note: the feature lacks user-facing documentation.
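For clarity, the precedence described above can be sketched in plain Python. This is a hypothetical model, not the macro itself: the real logic is Jinja in macros/edr/materializations/test/test.sql, and the function and sentinel names here are invented for illustration.

```python
# Hypothetical sketch of the override precedence; names are illustrative,
# the real implementation lives in the Jinja test materialization macro.
_UNSET = object()  # distinguishes "key absent" from an explicit None override


def resolve_sample_row_count(test_meta: dict, global_limit: int = 5):
    """Per-test meta overrides the global limit; an explicit None lifts the limit."""
    override = test_meta.get("test_sample_row_count", _UNSET)
    if override is _UNSET:
        return global_limit  # no per-test override: fall back to the global config
    return override  # an int, or None (meaning: sample without a LIMIT)


def build_limit_clause(sample_row_count):
    """Mirrors the `is not none` check the review points at on line 136."""
    return "" if sample_row_count is None else f" limit {sample_row_count}"
```

With a global limit of 5, an absent key resolves to 5, an override of 3 resolves to 3, and an explicit None produces an empty LIMIT clause, i.e. unlimited sampling.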
Hey @Ryosuke839! This is a great feature, thank you for the contribution. We'll take a deeper look.
Actionable comments posted: 0
🧹 Nitpick comments (3)
integration_tests/tests/test_override_samples_config.py (3)
23-49: Add docstrings to clarify test objectives.

Consider adding a docstring to explain that this test validates unlimited sampling when test_sample_row_count is set to None in test metadata, overriding the global limit of 5.

Apply this diff:

```diff
 @pytest.mark.skip_targets(["clickhouse"])
 def test_sample_count_unlimited(test_id: str, dbt_project: DbtProject):
+    """Test that setting test_sample_row_count to None in meta allows unlimited samples.
+
+    Sets global test_sample_row_count to 5, but overrides with None in test meta.
+    Expects all 20 failing rows to be sampled.
+    """
     null_count = 20
```
52-78: Add docstring and consider additional test case.

- Add a docstring to clarify this test validates that test-level overrides work correctly when set to a value smaller than the global limit.
- Consider adding a test case where the test-level override is larger than the global setting (e.g., meta sets 10 when global is 5) to ensure the override truly takes precedence in both directions.

Apply this diff:

```diff
 @pytest.mark.skip_targets(["clickhouse"])
 def test_sample_count_small(test_id: str, dbt_project: DbtProject):
+    """Test that test_sample_row_count in meta overrides the global setting.
+
+    Sets global test_sample_row_count to 5, but overrides with 3 in test meta.
+    Expects exactly 3 samples despite 20 failing rows.
+    """
     null_count = 20
```
23-78: Consider extracting a helper function to reduce duplication.

Both test functions share similar setup and verification logic. You could extract a helper function that accepts the test configuration and expected sample count as parameters to reduce duplication.

For example:

```python
def _run_sample_count_test(
    test_id: str,
    dbt_project: DbtProject,
    test_sample_row_count_override,
    expected_sample_count: int,
):
    """Helper to run sample count test with specified override."""
    null_count = 20
    data = [{COLUMN_NAME: None} for _ in range(null_count)]
    test_result = dbt_project.test(
        test_id,
        "not_null",
        dict(column_name=COLUMN_NAME),
        data=data,
        as_model=True,
        test_vars={
            "enable_elementary_test_materialization": True,
            "test_sample_row_count": 5,
        },
        test_config={"meta": {"test_sample_row_count": test_sample_row_count_override}},
    )
    assert test_result["status"] == "fail"
    samples = [
        json.loads(row["result_row"])
        for row in dbt_project.run_query(SAMPLES_QUERY.format(test_id=test_id))
    ]
    assert len(samples) == expected_sample_count
    for sample in samples:
        assert COLUMN_NAME in sample
        assert sample[COLUMN_NAME] is None
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- integration_tests/tests/test_override_samples_config.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_override_samples_config.py (2)
- integration_tests/tests/conftest.py (1): dbt_project (144-147)
- integration_tests/tests/dbt_project.py (2): DbtProject (46-334), run_query (62-69)
🔇 Additional comments (2)
integration_tests/tests/test_override_samples_config.py (2)
1-4: LGTM! The imports are appropriate for the test file.

6-20: LGTM! The query template is correctly structured with proper brace escaping for the .format() call.
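The brace escaping the review approves of is standard str.format() behavior: doubled braces collapse to literal braces, so the Jinja expression survives formatting while the test id is substituted. A small illustration (the column and filter names here are illustrative, not a copy of SAMPLES_QUERY):

```python
# Doubled braces survive str.format(), so the Jinja {{ ref(...) }} expression
# reaches dbt intact while {test_id} is substituted by Python first.
template = (
    "select result_row from {{{{ ref('test_result_rows') }}}} "
    "where test_id = '{test_id}'"
)
rendered = template.format(test_id="abc")
print(rendered)
# → select result_row from {{ ref('test_result_rows') }} where test_id = 'abc'
```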
Currently, test_sample_row_count can be overridden by var, which works globally but not per model or test.

This PR enables the test_sample_row_count config on tests, which takes precedence over the global setting.

Config would look like:
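(The author's original snippet did not survive extraction; the following is a plausible sketch only, assuming a standard dbt schema.yml layout and the meta key exercised by the PR's integration tests. The model and column names are hypothetical.)

```yaml
models:
  - name: my_model          # hypothetical model
    columns:
      - name: my_column     # hypothetical column
        tests:
          - not_null:
              meta:
                # Overrides the global test_sample_row_count var for this test.
                # An explicit null disables the sample limit entirely.
                test_sample_row_count: 3
```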
Summary by CodeRabbit

- New Features: Per-test sample row count override support in the test materialization macro.
- Tests: Integration tests validating unlimited and limited sampling when test metadata overrides the global sample limit.