Add create_real_model flag to test infra; document fusion schema caching root cause by devin-ai-integration[bot] · Pull Request #926 · elementary-data/dbt-data-reliability

devin-ai-integration · 2026-02-10T12:45:23Z

Add `create_real_model` flag to test infrastructure; document fusion schema caching root cause

Summary

This PR implements the experiment proposed in ELE-5236: adding a create_real_model flag to DbtProject.test() that creates a real SQL model (SELECT * FROM {{ ref('<seed>') }}) instead of pointing a source YAML at the seed table. The goal was to see if this would bypass dbt-fusion's schema caching for test_schema_changes.

Result: The experiment confirmed this approach does NOT fix the fusion issue. CI logs from both fusion/redshift and fusion/snowflake show the root cause is deeper than source-level caching:

- `adapter.get_columns_in_relation()` (called in `get_columns_snapshot_query.sql:15`) returns cached column metadata across dbt invocations
- Evidence: fusion emits `Downloaded "..."."..."."test_schema_changes" (schema)` on the first invocation but not on the second, even though the warehouse table has different columns
- The real model is materialized correctly (the `dbt run` succeeds with new columns), but the subsequent `dbt test` still reads stale schema from fusion's cache
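The staleness described above can be illustrated with a toy model. This is only a sketch of the observed behavior, not dbt-fusion's actual implementation: `ToyWarehouse` and `ToyCachingAdapter` are hypothetical names, and the assumption is simply that the column cache is keyed by relation name and survives between invocations.

```python
from typing import Dict, List


class ToyWarehouse:
    """Stands in for the real warehouse: table name -> current column names."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {}


class ToyCachingAdapter:
    """Hypothetical adapter whose column cache outlives a single invocation."""

    def __init__(self, warehouse: ToyWarehouse) -> None:
        self.warehouse = warehouse
        self._column_cache: Dict[str, List[str]] = {}
        # Relations whose schema was actually fetched from the warehouse,
        # analogous to the "Downloaded ... (schema)" log line.
        self.downloads: List[str] = []

    def get_columns_in_relation(self, relation: str) -> List[str]:
        # Only hit the warehouse on a cache miss; never re-validate.
        if relation not in self._column_cache:
            self.downloads.append(relation)
            self._column_cache[relation] = list(self.warehouse.tables[relation])
        return self._column_cache[relation]


warehouse = ToyWarehouse()
adapter = ToyCachingAdapter(warehouse)

# First invocation: the table has two columns and the schema is downloaded.
warehouse.tables["test_schema_changes"] = ["id", "name"]
first = adapter.get_columns_in_relation("test_schema_changes")

# The model is rebuilt with an extra column before the second invocation.
warehouse.tables["test_schema_changes"] = ["id", "name", "email"]
second = adapter.get_columns_in_relation("test_schema_changes")
```

In this toy, `second` still lists only the original two columns and `downloads` contains a single entry, which mirrors the CI evidence: the warehouse table changed, but the second read never reached it.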

The @pytest.mark.skip_for_dbt_fusion marker is retained with an updated comment documenting this root cause.

What changed

dbt_project.py — DbtProject.test():

- New `create_real_model: bool = False` parameter on all overloads
- When `True`: seeds data as `{table_name}_seed`, creates a real SQL model that runs `SELECT * FROM {{ ref('{seed_name}') }}`, runs `dbt run` to materialize it, then runs the test against the model
- New `_seed_and_run_model()` helper keeps the seed CSV alive during `dbt run` (needed for `{{ ref() }}` resolution), then cleans up before the test phase, which only needs a dummy model SQL for dbt parsing
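The seed-then-run flow and its cleanup guarantee can be sketched with nested context managers. This is a minimal stand-in under stated assumptions, not the PR's code: `temp_file`, `seed_and_run_model`, and `fake_run` are hypothetical, plain files replace `DbtDataSeeder.seed` and `create_temp_model_for_existing_table`, and `run_model` is a callback standing in for `dbt run`.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List


@contextlib.contextmanager
def temp_file(path: Path, content: str) -> Iterator[Path]:
    """Write a file and remove it on exit, even if the body raises."""
    path.write_text(content)
    try:
        yield path
    finally:
        path.unlink(missing_ok=True)


def seed_and_run_model(
    project_dir: Path,
    seed_name: str,
    model_name: str,
    csv_text: str,
    run_model: Callable[[str], None],
) -> None:
    """Keep both the seed CSV and the model SQL on disk while the model runs."""
    seed_path = project_dir / f"{seed_name}.csv"
    model_path = project_dir / f"{model_name}.sql"
    model_sql = "SELECT * FROM {{ ref('" + seed_name + "') }}"
    # Both files exist for the duration of run_model and are removed afterwards.
    with temp_file(seed_path, csv_text), temp_file(model_path, model_sql):
        run_model(model_name)


with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp)
    seen: List[List[str]] = []

    def fake_run(model_name: str) -> None:
        # Record which files are visible while the "run" is in progress.
        seen.append(sorted(p.name for p in project_dir.iterdir()))

    seed_and_run_model(
        project_dir,
        "test_schema_changes_seed",
        "test_schema_changes",
        "id,name\n1,a\n",
        fake_run,
    )
    leftovers = list(project_dir.iterdir())
```

During `fake_run` both files are present, and `leftovers` is empty afterwards. Because each `temp_file` removes its file in a `finally` block, the same holds when `run_model` raises.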

test_schema_changes.py:

- `test_schema_changes` now uses `create_real_model=True`
- `skip_for_dbt_fusion` retained with updated comment explaining the `adapter.get_columns_in_relation()` caching root cause

Review & Testing Checklist for Human

- Verify `_seed_and_run_model` cleanup: the method uses nested context managers (`DbtDataSeeder.seed` + `create_temp_model_for_existing_table`). Confirm both the seed CSV and model SQL are properly cleaned up after the `dbt run` phase completes, and that no temp files leak on failure.
- Third test call (no data, `create_real_model=True`): uses a dummy model SQL (`SELECT 1 AS col`) for dbt parsing while the real model table persists in the warehouse from the previous call. Verify this works correctly on all non-fusion targets (the model table must survive between `test()` calls).
- Confirm non-fusion targets still pass: the `create_real_model` path changes seeding behavior (uses `_seed` suffix). Wait for CI results on postgres, bigquery, redshift, snowflake (non-fusion) to confirm no regressions.

Recommended test plan: Wait for CI to complete on all targets. The fusion targets are expected to skip test_schema_changes. Non-fusion targets should pass.

Notes

- The `create_real_model` flag defaults to `False` to avoid performance overhead for tests that don't need it.
- `test_schema_changes_from_baseline` was not modified: it doesn't have a fusion skip and uses a different comparison mechanism (baseline vs. snapshot).
- To actually fix `test_schema_changes` for fusion, the fix would need to happen at the dbt-fusion adapter level (clearing the schema cache between invocations) or by combining model run + test into a single dbt invocation.
- Link to Devin run: https://app.devin.ai/sessions/f5312c9c03094bbeb4607ce994892176
- Requested by: @haritamar
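As a rough illustration of the second option in the notes above, `dbt build` runs a model and its tests within one invocation. The helper below is hypothetical and only assembles the command line; whether a single invocation actually sidesteps fusion's schema cache is not verified by this PR.

```python
import shlex
from typing import List


def single_invocation_command(model_name: str, dbt_executable: str = "dbt") -> List[str]:
    """Build an argv that materializes a model and runs its tests in one dbt invocation."""
    # `dbt build --select <model>` runs the selected model, then its tests.
    return [dbt_executable, "build", "--select", model_name]


command = single_invocation_command("test_schema_changes")
printable = shlex.join(command)
```

The resulting list could be handed to `subprocess.run` by a test harness. It is kept as data here so the sketch stays runnable without dbt installed.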

Summary by CodeRabbit

New Features
- Added a toggle to create real seed-backed models during integration tests.
- Added support for collecting multiple test results in a single call.
Tests
- Updated integration tests to exercise the new real-model pathway and multi-result behavior.
- Updated skip comment to document the root cause of fusion schema caching.

- Add create_real_model parameter to DbtProject.test() that creates a real SQL model (SELECT * FROM seed) instead of a source YAML pointing to the seed table. This avoids dbt-fusion's schema caching issue.
- When create_real_model=True, seeds use a '_seed' suffix name to avoid conflicts with the model, and the model is run via dbt run before testing.
- Update test_schema_changes to use create_real_model=True and remove the skip_for_dbt_fusion marker.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

linear · 2026-02-10T12:45:26Z

ELE-5236 Fix test_schema_changes in the integration tests for dbt fusion

devin-ai-integration · 2026-02-10T12:45:27Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


github-actions · 2026-02-10T12:45:34Z

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai · 2026-02-10T12:45:42Z

📝 Walkthrough

Walkthrough

Added a boolean flag create_real_model to DbtProject.test and overloads. When true, seeds are named "{table_name}_seed", a temporary SQL model selecting from that seed is created and run before testing, and test execution uses that real model path instead of YAML-only model definitions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core API & Helpers `integration_tests/tests/dbt_project.py` | Added `create_real_model: bool = False` to `DbtProject.test` overloads and main signature. Introduced `_seed_and_run_model(data, seed_name, model_name)` and logic to derive `seed_name` as `{table_name}_seed` when `create_real_model` is True. Branches test flow to create+run a temporary SQL model (materialization="table") referencing the seed instead of the previous `as_model` YAML path. |
| Integration Tests `integration_tests/tests/test_schema_changes.py` | Updated three `DbtProject.test()` calls to pass `create_real_model=True` (and `multiple_results=True` for one call). Removed `skip_for_dbt_fusion` decorator and adjusted call formatting to explicit keyword args to accommodate the new parameters. |

Sequence Diagram

sequenceDiagram
    participant Test as Test Caller
    participant DbtProject as DbtProject
    participant SeedMgr as Seed Manager
    participant ModelCtx as Temp Model Context
    participant DbtRunner as Dbt Runner

    Test->>DbtProject: test(..., create_real_model=True)
    DbtProject->>SeedMgr: seed data as `table_name_seed`
    SeedMgr-->>DbtProject: seed persisted
    DbtProject->>ModelCtx: create temporary SQL model (select * from ref(seed))
    ModelCtx-->>DbtProject: model file/context ready
    DbtProject->>DbtRunner: run model (dbt run)
    DbtRunner-->>DbtProject: model materialized
    DbtProject->>DbtRunner: execute test (dbt test)
    DbtRunner-->>DbtProject: test results
    DbtProject-->>Test: return results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I seeded the ground with a careful name,
Then stitched a real model to play the game,
No phantom cache can hide what’s true,
The runner hops in and checks it too,
Hooray — tests see the schema anew!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | All coding requirements from ELE-5236 are met: create_real_model flag added with False default, real SQL model creation implemented, model run before test execution implemented, and seed naming behavior with '_seed' suffix correctly applied. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: modifications to DbtProject.test() and its helper methods, application of the new flag in test_schema_changes, and removal of the skip_for_dbt_fusion decorator are all directly aligned with the stated objectives. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a create_real_model flag to test infrastructure with documentation of the underlying issue (dbt-fusion schema caching). |


Separate seed+model-run phase (where seed CSV must persist for ref resolution) from test phase (where only a dummy model SQL is needed). Add _seed_and_run_model helper that keeps the seed context open while creating and running the real model.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

CI confirmed that dbt-fusion caches column metadata from adapter.get_columns_in_relation() across invocations. The real model approach fixes the warehouse table schema but the test's schema introspection still returns stale cached results. Re-add the skip marker with an updated comment documenting the root cause.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

devin-ai-integration bot and others added 2 commits February 10, 2026 12:50

devin-ai-integration bot changed the title ~~Fix test_schema_changes for dbt-fusion by adding create_real_model flag~~ Add create_real_model flag to test infra; document fusion schema caching root cause Feb 10, 2026

devin-ai-integration bot assigned haritamar Feb 12, 2026

devin-ai-integration[bot] wants to merge 3 commits into master from devin/ELE-5236-1770727334
