Skip to content

Conversation

@Ryosuke839
Copy link
Contributor

@Ryosuke839 Ryosuke839 commented Dec 12, 2025

Currently test_sample_row_count can be overridden by var which works globally but not for each model or test.

This PR enables test_sample_row_count config on tests which precedes global setting.

Config would look like:

models:
  - name: some_model
    columns:
      - name: some_column
        data_tests:
          - not_null:
              meta:
                test_sample_row_count: null

Summary by CodeRabbit

  • New Features

    • Per-test sample row count override: individual tests can now override the global sample-row-count setting to control how many test-result samples are captured.
  • Tests

    • Added integration tests validating override behavior for both unlimited and limited sample retention, ensuring samples are returned as configured and sampling limits are respected.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 12, 2025

Walkthrough

Adds per-test sample row count override support in the test materialization macro and introduces integration tests that validate unlimited and limited sampling when test metadata overrides the global sample limit.

Changes

Cohort / File(s) Summary
New test module for sample overrides
integration_tests/tests/test_override_samples_config.py
Adds an integration test module with two tests: test_sample_count_unlimited (metadata sets sample count to None → expects 20 samples returned) and test_sample_count_small (metadata sets sample count to 3 → expects 3 samples). Both use materialized elementary tests and verify sample JSON content.
Macro override logic
macros/edr/materializations/test/test.sql
In handle_dbt_test, sample_limit now checks flattened_test.meta.test_sample_row_count and will override the configured test_sample_row_count when present; this override is applied before existing sampling-disable and PII-based conditions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Inspect handle_dbt_test precedence to ensure per-test meta overrides don't bypass intended global disables (PII or explicit sampling off).
  • Verify SQL generation and binding of sample_limit remains safe across adapters.
  • Review new integration tests for robustness and correct teardown/fixture usage.

Poem

🐰 I nibbled through configs late at night,
Per-test crumbs now gleam in the light,
Twenty or three — whichever you please,
Samples served up with elegant ease,
A hop, a test, and all is right. 🌙

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: enabling test_sample_row_count configuration at the test level, which is the core objective of this pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

👋 @Ryosuke839
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@Ryosuke839 Ryosuke839 marked this pull request as ready for review December 12, 2025 07:40
@Ryosuke839 Ryosuke839 changed the title Enable test_sample_row_count on tests Enable test_sample_row_count config on tests Dec 12, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
integration_tests/tests/test_override_samples_config.py (1)

1-80: LGTM! Comprehensive test coverage for per-test sample count overrides.

Both test functions correctly validate the new feature:

  • test_sample_count_unlimited confirms that None overrides the global limit of 5 to allow unlimited sampling (20 samples)
  • test_sample_count_small confirms that an integer value (3) overrides the global limit of 5

The test setup, assertions, and ClickHouse exclusion are all appropriate.

Minor style suggestion for SAMPLES_QUERY (line 19): The subquery uses select * from a CTE that has only one column. Consider being explicit:

     select result_row
     from {{ ref("test_result_rows") }}
-    where elementary_test_results_id in (select * from latest_elementary_test_result)
+    where elementary_test_results_id in (select id from latest_elementary_test_result)

This is purely a style preference and doesn't affect functionality.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8d91cf6 and 06784c2.

📒 Files selected for processing (2)
  • integration_tests/tests/test_override_samples_config.py (1 hunks)
  • macros/edr/materializations/test/test.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: integration_tests/tests/dbt_project.py:191-201
Timestamp: 2025-08-10T11:29:19.004Z
Learning: In the Elementary dbt package integration tests, BigQuery works correctly with the default `("database", "schema")` property mapping in the `get_database_and_schema_properties` function. When using `target.database` and `target.schema` in source definitions, BigQuery's dbt adapter handles these references appropriately without requiring special mapping to `project` and `dataset`.
Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 861
File: macros/edr/tests/test_utils/clean_elementary_test_tables.sql:45-50
Timestamp: 2025-09-15T06:06:05.601Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers simpler, direct approaches over creating helper macros for centralized quoting logic, even when it might provide better maintainability across multiple use cases. They value code simplicity and readability over abstraction.
📚 Learning: 2025-09-04T09:14:40.621Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/edr/tests/test_dimension_anomalies.sql:73-75
Timestamp: 2025-09-04T09:14:40.621Z
Learning: In the dbt-data-reliability codebase, `flatten_model` is a valid macro defined in `macros/edr/dbt_artifacts/upload_dbt_models.sql` and can be used as a parameter to `elementary.get_anomaly_query()` even though other test files use the `flattened_test` variable instead.

Applied to files:

  • macros/edr/materializations/test/test.sql
📚 Learning: 2025-07-22T15:41:57.338Z
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/table_operations/create_table_as.sql:0-0
Timestamp: 2025-07-22T15:41:57.338Z
Learning: The dbt.create_table_as macro breaks in dbt-fusion when not run from a materialization context, requiring custom implementations for table creation functionality in dbt-fusion compatible packages.

Applied to files:

  • macros/edr/materializations/test/test.sql
🧬 Code graph analysis (1)
integration_tests/tests/test_override_samples_config.py (2)
integration_tests/tests/conftest.py (2)
  • dbt_project (144-147)
  • test_id (180-183)
integration_tests/tests/dbt_project.py (2)
  • DbtProject (46-334)
  • run_query (62-69)
🔇 Additional comments (1)
macros/edr/materializations/test/test.sql (1)

54-56: LGTM! Test-level sample count override implemented correctly.

The precedence is correct: test-specific meta.test_sample_row_count overrides the global config (line 53 defaults to 5) but is still subject to disable_test_samples and PII protection checks on lines 61-67. When set to None, the resulting unlimited sampling works correctly because line 136 uses is not none to conditionally add the LIMIT clause.

No other parts of the codebase reference test_sample_row_count in ways that need updating. Note: the feature lacks user-facing documentation.

@NoyaOffer
Copy link
Contributor

Hey @Ryosuke839 ! This is a great feature, thank you for the contribution. We'll take a deeper look.

@Ryosuke839 Ryosuke839 temporarily deployed to elementary_test_env December 17, 2025 13:47 — with GitHub Actions Inactive
@Ryosuke839 Ryosuke839 temporarily deployed to elementary_test_env December 17, 2025 13:47 — with GitHub Actions Inactive
@Ryosuke839 Ryosuke839 temporarily deployed to elementary_test_env December 17, 2025 13:47 — with GitHub Actions Inactive
@Ryosuke839 Ryosuke839 temporarily deployed to elementary_test_env December 17, 2025 13:47 — with GitHub Actions Inactive
@Ryosuke839 Ryosuke839 temporarily deployed to elementary_test_env December 17, 2025 13:47 — with GitHub Actions Inactive
@Ryosuke839 Ryosuke839 temporarily deployed to elementary_test_env December 17, 2025 13:47 — with GitHub Actions Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
integration_tests/tests/test_override_samples_config.py (3)

23-49: Add docstrings to clarify test objectives.

Consider adding a docstring to explain that this test validates unlimited sampling when test_sample_row_count is set to None in test metadata, overriding the global limit of 5.

Apply this diff:

 @pytest.mark.skip_targets(["clickhouse"])
 def test_sample_count_unlimited(test_id: str, dbt_project: DbtProject):
+    """Test that setting test_sample_row_count to None in meta allows unlimited samples.
+    
+    Sets global test_sample_row_count to 5, but overrides with None in test meta.
+    Expects all 20 failing rows to be sampled.
+    """
     null_count = 20

52-78: Add docstring and consider additional test case.

  1. Add a docstring to clarify this test validates that test-level overrides work correctly when set to a value smaller than the global limit.
  2. Consider adding a test case where the test-level override is larger than the global setting (e.g., meta sets 10 when global is 5) to ensure the override truly takes precedence in both directions.

Apply this diff:

 @pytest.mark.skip_targets(["clickhouse"])
 def test_sample_count_small(test_id: str, dbt_project: DbtProject):
+    """Test that test_sample_row_count in meta overrides the global setting.
+    
+    Sets global test_sample_row_count to 5, but overrides with 3 in test meta.
+    Expects exactly 3 samples despite 20 failing rows.
+    """
     null_count = 20

23-78: Consider extracting a helper function to reduce duplication.

Both test functions share similar setup and verification logic. You could extract a helper function that accepts the test configuration and expected sample count as parameters to reduce duplication.

For example:

def _run_sample_count_test(
    test_id: str,
    dbt_project: DbtProject,
    test_sample_row_count_override,
    expected_sample_count: int,
):
    """Helper to run sample count test with specified override."""
    null_count = 20
    data = [{COLUMN_NAME: None} for _ in range(null_count)]

    test_result = dbt_project.test(
        test_id,
        "not_null",
        dict(column_name=COLUMN_NAME),
        data=data,
        as_model=True,
        test_vars={
            "enable_elementary_test_materialization": True,
            "test_sample_row_count": 5,
        },
        test_config={"meta": {"test_sample_row_count": test_sample_row_count_override}},
    )
    assert test_result["status"] == "fail"

    samples = [
        json.loads(row["result_row"])
        for row in dbt_project.run_query(SAMPLES_QUERY.format(test_id=test_id))
    ]
    assert len(samples) == expected_sample_count
    for sample in samples:
        assert COLUMN_NAME in sample
        assert sample[COLUMN_NAME] is None
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 06784c2 and 7228b2c.

📒 Files selected for processing (1)
  • integration_tests/tests/test_override_samples_config.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_tests/tests/test_override_samples_config.py (2)
integration_tests/tests/conftest.py (1)
  • dbt_project (144-147)
integration_tests/tests/dbt_project.py (2)
  • DbtProject (46-334)
  • run_query (62-69)
🔇 Additional comments (2)
integration_tests/tests/test_override_samples_config.py (2)

1-4: LGTM!

The imports are appropriate for the test file.


6-20: LGTM!

The query template is correctly structured with proper brace escaping for the .format() call.

@Ryosuke839 Ryosuke839 deployed to elementary_test_env December 28, 2025 15:07 — with GitHub Actions Active
@arbiv arbiv merged commit a45c384 into elementary-data:master Dec 30, 2025
52 of 61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants