Add create_real_model flag to test infra; document fusion schema caching root cause #926

Open
devin-ai-integration[bot] wants to merge 3 commits into master from devin/ELE-5236-1770727334

Conversation

devin-ai-integration bot (Contributor) commented Feb 10, 2026

Add create_real_model flag to test infrastructure; document fusion schema caching root cause

Summary

This PR implements the experiment proposed in ELE-5236: adding a create_real_model flag to DbtProject.test() that creates a real SQL model (SELECT * FROM {{ ref('<seed>') }}) instead of pointing a source YAML at the seed table. The goal was to see if this would bypass dbt-fusion's schema caching for test_schema_changes.

Result: The experiment confirmed this approach does NOT fix the fusion issue. CI logs from both fusion/redshift and fusion/snowflake show the root cause is deeper than source-level caching:

  • adapter.get_columns_in_relation() (called in get_columns_snapshot_query.sql:15) returns cached column metadata across dbt invocations
  • Evidence: fusion emits Downloaded "..."."..."."test_schema_changes" (schema) on the first invocation but not on the second, even though the warehouse table has different columns
  • The real model is materialized correctly (the dbt run succeeds with new columns), but the subsequent dbt test still reads stale schema from fusion's cache

The @pytest.mark.skip_for_dbt_fusion marker is retained with an updated comment documenting this root cause.
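
The symptom described above can be pictured with a toy model. The sketch below is not dbt-fusion code: `ToyWarehouse`, `ToyCachingAdapter`, the cache layout, and the column names are all hypothetical, invented only to illustrate how a column-metadata cache that outlives a single invocation would keep returning the old columns after the table is rebuilt.

```python
from typing import Dict, List


class ToyWarehouse:
    """Hypothetical stand-in for the warehouse: table name -> column names."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {}

    def create_or_replace(self, table: str, columns: List[str]) -> None:
        self.tables[table] = list(columns)


class ToyCachingAdapter:
    """Hypothetical adapter whose schema cache is never invalidated."""

    def __init__(self, warehouse: ToyWarehouse) -> None:
        self.warehouse = warehouse
        self._schema_cache: Dict[str, List[str]] = {}
        self.downloads: List[str] = []  # one entry per real warehouse lookup

    def get_columns_in_relation(self, table: str) -> List[str]:
        if table not in self._schema_cache:
            # Only a cache miss reaches the warehouse (the "Downloaded" log line).
            self.downloads.append(table)
            self._schema_cache[table] = list(self.warehouse.tables[table])
        return self._schema_cache[table]


warehouse = ToyWarehouse()
adapter = ToyCachingAdapter(warehouse)

# First invocation: the table is built, its schema is fetched and cached.
warehouse.create_or_replace("test_schema_changes", ["id", "name"])
first = adapter.get_columns_in_relation("test_schema_changes")

# Second invocation: the table is rebuilt with different columns,
# but the adapter answers from its cache without a new download.
warehouse.create_or_replace("test_schema_changes", ["id", "name", "email"])
second = adapter.get_columns_in_relation("test_schema_changes")

print(first, second, adapter.downloads)
```

In this toy setup the warehouse holds the new columns while the adapter still reports the old ones, which matches the shape of the CI evidence: one schema download, then stale metadata.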

What changed

dbt_project.py, DbtProject.test():

  • New create_real_model: bool = False parameter on all overloads
  • When True: seeds data as {table_name}_seed, creates a real model SQL doing SELECT * FROM {{ ref('{seed_name}') }}, runs dbt run to materialize it, then runs the test against the model
  • New _seed_and_run_model() helper keeps the seed CSV alive during dbt run (needed for {{ ref() }} resolution), then cleans up before the test phase, which uses a dummy model for dbt parsing
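
The flow in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the code in dbt_project.py: `temp_file`, `rows_to_csv`, `seed_and_run_model`, and the injected `run_dbt` callable are hypothetical names, and the real helper relies on `DbtDataSeeder.seed` and `create_temp_model_for_existing_table` rather than plain files. The point it shows is that both the seed CSV and the model SQL exist while `dbt seed` and `dbt run` execute, and both are gone afterwards.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, List


@contextlib.contextmanager
def temp_file(path: Path, content: str) -> Iterator[Path]:
    """Write a file for the duration of the block and delete it afterwards."""
    path.write_text(content)
    try:
        yield path
    finally:
        path.unlink()


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Render a list of row dicts as CSV text (no quoting; sketch only)."""
    header = list(rows[0].keys())
    lines = [",".join(header)]
    lines += [",".join(str(row[col]) for col in header) for row in rows]
    return "\n".join(lines) + "\n"


def seed_and_run_model(
    project_dir: Path,
    data: List[Dict[str, object]],
    table_name: str,
    run_dbt: Callable[[List[str]], None],  # hypothetical runner hook
) -> None:
    seed_name = f"{table_name}_seed"  # suffix avoids a name clash with the model
    model_sql = "SELECT * FROM {{ ref('" + seed_name + "') }}"
    seed_path = project_dir / f"{seed_name}.csv"
    model_path = project_dir / f"{table_name}.sql"
    # Both files stay on disk until the run finishes, so ref() can resolve.
    with temp_file(seed_path, rows_to_csv(data)), temp_file(model_path, model_sql):
        run_dbt(["seed", "--select", seed_name])
        run_dbt(["run", "--select", table_name])


calls = []
with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp)

    def fake_run_dbt(args: List[str]) -> None:
        # Record the command and which files were visible when it ran.
        calls.append((args, sorted(p.name for p in project_dir.iterdir())))

    seed_and_run_model(project_dir, [{"id": 1, "name": "a"}], "my_table", fake_run_dbt)
    leftover = sorted(p.name for p in project_dir.iterdir())

print(calls, leftover)
```

Injecting the runner keeps the sketch runnable without a dbt installation; in the real test infrastructure the equivalent calls go through the project's own dbt runner.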

test_schema_changes.py:

  • test_schema_changes now uses create_real_model=True
  • skip_for_dbt_fusion retained with updated comment explaining the adapter.get_columns_in_relation() caching root cause

Review & Testing Checklist for Human

  • Verify _seed_and_run_model cleanup — the method uses nested context managers (DbtDataSeeder.seed + create_temp_model_for_existing_table). Confirm both the seed CSV and model SQL are properly cleaned up after the dbt run phase completes, and that no temp files leak on failure.
  • Third test call (no data, create_real_model=True) — uses a dummy model SQL (SELECT 1 AS col) for dbt parsing while the real model table persists in the warehouse from the previous call. Verify this works correctly on all non-fusion targets (the model table must survive between test() calls).
  • Confirm non-fusion targets still pass — the create_real_model path changes seeding behavior (uses _seed suffix). Wait for CI results on postgres, bigquery, redshift, snowflake (non-fusion) to confirm no regressions.
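
For the first checklist item, one way a reviewer might convince themselves that nested context managers do not leak files on failure is a small stand-alone check like the one below. It does not exercise `_seed_and_run_model` itself; `temp_file`, `failing_dbt_run`, and the file names are hypothetical stand-ins, and the check only holds for the real helper if its context managers clean up in a `finally` block (or equivalent).

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator, List


@contextlib.contextmanager
def temp_file(path: Path, content: str) -> Iterator[Path]:
    """Create a file and remove it on exit, including when the body raises."""
    path.write_text(content)
    try:
        yield path
    finally:
        path.unlink(missing_ok=True)


def failing_dbt_run() -> None:
    """Hypothetical stand-in for a dbt run that fails midway."""
    raise RuntimeError("simulated dbt run failure")


def leaked_files_after_failure() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp)
        try:
            # Same nesting shape as the helper: seed first, model inside it.
            with temp_file(project_dir / "t_seed.csv", "id\n1\n"):
                with temp_file(project_dir / "t.sql", "SELECT * FROM {{ ref('t_seed') }}"):
                    failing_dbt_run()
        except RuntimeError:
            pass
        # Anything still present here would be a leaked temp file.
        return sorted(p.name for p in project_dir.iterdir())


leaked = leaked_files_after_failure()
print(leaked)
```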

Recommended test plan: Wait for CI to complete on all targets. The fusion targets are expected to skip test_schema_changes. Non-fusion targets should pass.

Notes

  • The create_real_model flag defaults to False to avoid performance overhead for tests that don't need it.
  • test_schema_changes_from_baseline was not modified — it doesn't have a fusion skip and uses a different comparison mechanism (baseline vs. snapshot).
  • Fixing test_schema_changes for fusion would require either a change at the dbt-fusion adapter level (clearing the schema cache between invocations) or combining the model run and the test into a single dbt invocation.
  • Link to Devin run: https://app.devin.ai/sessions/f5312c9c03094bbeb4607ce994892176
  • Requested by: @haritamar
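
As a rough illustration of the "single dbt invocation" option in the notes above, `dbt build` runs selected models and their tests in one command. The helper name below is hypothetical, and whether a single invocation actually sidesteps fusion's cached column metadata was not tested in this PR, so treat this as a starting point for a follow-up experiment only.

```python
from typing import List


def build_single_invocation_args(model_name: str) -> List[str]:
    """Return CLI args for one `dbt build` covering the model and its tests."""
    # `dbt build --select <model>` materializes the model and then runs the
    # tests attached to it, all within the same dbt process.
    return ["dbt", "build", "--select", model_name]


args = build_single_invocation_args("test_schema_changes")
print(" ".join(args))
```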

Summary by CodeRabbit

  • New Features

    • Added a toggle to create real seed-backed models during integration tests.
    • Added support for collecting multiple test results in a single call.
  • Tests

    • Updated integration tests to exercise the new real-model pathway and multi-result behavior.
    • Updated skip comment to document the root cause of fusion schema caching.

- Add create_real_model parameter to DbtProject.test() that creates a real
  SQL model (SELECT * FROM seed) instead of a source YAML pointing to the
  seed table. This avoids dbt-fusion's schema caching issue.
- When create_real_model=True, seeds use a '_seed' suffix name to avoid
  conflicts with the model, and the model is run via dbt run before testing.
- Update test_schema_changes to use create_real_model=True and remove the
  skip_for_dbt_fusion marker.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

linear bot commented Feb 10, 2026

@devin-ai-integration (Contributor, Author)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions (Contributor)

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Added a boolean flag create_real_model to DbtProject.test and overloads. When true, seeds are named "{table_name}_seed", a temporary SQL model selecting from that seed is created and run before testing, and test execution uses that real model path instead of YAML-only model definitions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core API & Helpers: integration_tests/tests/dbt_project.py | Added create_real_model: bool = False to DbtProject.test overloads and main signature. Introduced _seed_and_run_model(data, seed_name, model_name) and logic to derive seed_name as {table_name}_seed when create_real_model is True. Branches test flow to create+run a temporary SQL model (materialization="table") referencing the seed instead of the previous as_model YAML path. |
| Integration Tests: integration_tests/tests/test_schema_changes.py | Updated three DbtProject.test() calls to pass create_real_model=True (and multiple_results=True for one call). Removed skip_for_dbt_fusion decorator and adjusted call formatting to explicit keyword args to accommodate the new parameters. |

Sequence Diagram

sequenceDiagram
    participant Test as Test Caller
    participant DbtProject as DbtProject
    participant SeedMgr as Seed Manager
    participant ModelCtx as Temp Model Context
    participant DbtRunner as Dbt Runner

    Test->>DbtProject: test(..., create_real_model=True)
    DbtProject->>SeedMgr: seed data as `table_name_seed`
    SeedMgr-->>DbtProject: seed persisted
    DbtProject->>ModelCtx: create temporary SQL model (select * from ref(seed))
    ModelCtx-->>DbtProject: model file/context ready
    DbtProject->>DbtRunner: run model (dbt run)
    DbtRunner-->>DbtProject: model materialized
    DbtProject->>DbtRunner: execute test (dbt test)
    DbtRunner-->>DbtProject: test results
    DbtProject-->>Test: return results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I seeded the ground with a careful name,
Then stitched a real model to play the game,
No phantom cache can hide what’s true,
The runner hops in and checks it too,
Hooray — tests see the schema anew!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | All coding requirements from ELE-5236 are met: create_real_model flag added with False default, real SQL model creation implemented, model run before test execution implemented, and seed naming behavior with '_seed' suffix correctly applied. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: modifications to DbtProject.test() and its helper methods, application of the new flag in test_schema_changes, and removal of the skip_for_dbt_fusion decorator are all directly aligned with the stated objectives. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a create_real_model flag to test infrastructure with documentation of the underlying issue (dbt-fusion schema caching). |

devin-ai-integration bot and others added 2 commits February 10, 2026 12:50
Separate seed+model-run phase (where seed CSV must persist for ref
resolution) from test phase (where only a dummy model SQL is needed).
Add _seed_and_run_model helper that keeps the seed context open while
creating and running the real model.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

CI confirmed that dbt-fusion caches column metadata from
adapter.get_columns_in_relation() across invocations. The real model
approach fixes the warehouse table schema but the test's schema
introspection still returns stale cached results. Re-add the skip
marker with an updated comment documenting the root cause.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@devin-ai-integration devin-ai-integration bot changed the title from "Fix test_schema_changes for dbt-fusion by adding create_real_model flag" to "Add create_real_model flag to test infra; document fusion schema caching root cause" Feb 10, 2026