Add create_real_model flag to test infra; document fusion schema caching root cause #926
devin-ai-integration[bot] wants to merge 3 commits into master
- Add create_real_model parameter to DbtProject.test() that creates a real SQL model (SELECT * FROM seed) instead of a source YAML pointing to the seed table. This avoids dbt-fusion's schema caching issue.
- When create_real_model=True, seeds use a '_seed' suffix name to avoid conflicts with the model, and the model is run via dbt run before testing.
- Update test_schema_changes to use create_real_model=True and remove the skip_for_dbt_fusion marker.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
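The naming rule and model body described in this commit can be sketched as follows. This is a minimal illustration, not the code in `dbt_project.py`; the helper names `seed_name_for` and `real_model_sql` are hypothetical.

```python
def seed_name_for(table_name: str, create_real_model: bool) -> str:
    """Return the seed name (hypothetical helper mirroring the '_seed' suffix rule)."""
    # With a real model, the model takes the table name, so the seed needs a distinct one.
    return f"{table_name}_seed" if create_real_model else table_name


def real_model_sql(seed_name: str) -> str:
    """Build a pass-through model body that selects everything from the seed."""
    # Doubled braces in the f-string emit literal Jinja delimiters.
    return f"SELECT * FROM {{{{ ref('{seed_name}') }}}}"
```

The suffix exists only to keep the seed and the model from resolving to the same relation name.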
📝 Walkthrough
Added a boolean flag, `create_real_model`.
Sequence Diagram
sequenceDiagram
participant Test as Test Caller
participant DbtProject as DbtProject
participant SeedMgr as Seed Manager
participant ModelCtx as Temp Model Context
participant DbtRunner as Dbt Runner
Test->>DbtProject: test(..., create_real_model=True)
DbtProject->>SeedMgr: seed data as `table_name_seed`
SeedMgr-->>DbtProject: seed persisted
DbtProject->>ModelCtx: create temporary SQL model (select * from ref(seed))
ModelCtx-->>DbtProject: model file/context ready
DbtProject->>DbtRunner: run model (dbt run)
DbtRunner-->>DbtProject: model materialized
DbtProject->>DbtRunner: execute test (dbt test)
DbtRunner-->>DbtProject: test results
DbtProject-->>Test: return results
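The order of calls in the sequence above can be mirrored with a stand-in runner that only records what it is asked to do. Everything here is an assumption for illustration: `RecordingRunner` and `run_test_with_real_model` are hypothetical names, and no real dbt commands are invoked.

```python
from typing import List


class RecordingRunner:
    """Stand-in for a dbt runner; it records commands instead of executing them."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def seed(self, name: str) -> None:
        self.calls.append(f"seed {name}")

    def run(self, model: str) -> None:
        self.calls.append(f"run {model}")

    def test(self, model: str) -> None:
        self.calls.append(f"test {model}")


def run_test_with_real_model(runner: RecordingRunner, table_name: str) -> List[str]:
    """Seed under a suffixed name, materialize the model, then test the model."""
    runner.seed(f"{table_name}_seed")  # 1. persist the seed as <table_name>_seed
    runner.run(table_name)             # 2. materialize the model selecting from the seed
    runner.test(table_name)            # 3. run the test against the model, not the seed
    return runner.calls
```

For a table named `orders`, the recorded calls are `seed orders_seed`, `run orders`, `test orders`, in that order.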
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Separate seed+model-run phase (where seed CSV must persist for ref resolution) from test phase (where only a dummy model SQL is needed). Add _seed_and_run_model helper that keeps the seed context open while creating and running the real model.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
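One way to keep both files alive for exactly the duration of the model run is a pair of nested context managers. The sketch below is an assumption about the shape of such a helper, not the actual `_seed_and_run_model` implementation; `temp_file`, `seed_and_run_model`, and the `run_model` callback are hypothetical names.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def temp_file(path: Path, content: str) -> Iterator[Path]:
    """Write a file for the duration of the block and always delete it afterwards."""
    path.write_text(content)
    try:
        yield path
    finally:
        # Runs on success and on failure, so nothing leaks if the run raises.
        path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+


def seed_and_run_model(
    project_dir: Path, table_name: str, run_model: Callable[[Path, Path], None]
) -> None:
    """Keep the seed CSV and model SQL on disk only while run_model executes."""
    seed_csv = project_dir / f"{table_name}_seed.csv"
    model_sql = project_dir / f"{table_name}.sql"
    sql = f"SELECT * FROM {{{{ ref('{table_name}_seed') }}}}"
    # Both context managers are open together, so the seed still exists
    # when the model that refs it is run.
    with temp_file(seed_csv, "id\n1\n"), temp_file(model_sql, sql):
        run_model(seed_csv, model_sql)
```

Because cleanup lives in `finally`, both files are removed whether `run_model` returns normally or raises.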
CI confirmed that dbt-fusion caches column metadata from adapter.get_columns_in_relation() across invocations. The real model approach fixes the warehouse table schema but the test's schema introspection still returns stale cached results. Re-add the skip marker with an updated comment documenting the root cause.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
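The failure mode described in this commit can be reproduced with a toy model. The classes below are purely illustrative and are not dbt-fusion's code: `Warehouse`, `CachingAdapter`, and `clear_cache` are hypothetical, and the only assumption carried over from the PR is that column lookups are cached per relation and survive a schema change.

```python
from typing import Dict, List


class Warehouse:
    """Toy warehouse mapping a relation name to its current column list."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {}


class CachingAdapter:
    """Toy adapter whose column lookups are cached per relation (an assumption)."""

    def __init__(self, warehouse: Warehouse) -> None:
        self.warehouse = warehouse
        self._cache: Dict[str, List[str]] = {}

    def get_columns_in_relation(self, relation: str) -> List[str]:
        # Only the first lookup reaches the warehouse; later ones hit the cache.
        if relation not in self._cache:
            self._cache[relation] = list(self.warehouse.tables[relation])
        return self._cache[relation]

    def clear_cache(self) -> None:
        """Hypothetical invalidation hook, standing in for an adapter-level fix."""
        self._cache.clear()


warehouse = Warehouse()
adapter = CachingAdapter(warehouse)

warehouse.tables["test_schema_changes"] = ["id", "name"]
first = adapter.get_columns_in_relation("test_schema_changes")

# The table is rebuilt with different columns, as the real model run would do.
warehouse.tables["test_schema_changes"] = ["id", "email"]
stale = adapter.get_columns_in_relation("test_schema_changes")  # old columns again

adapter.clear_cache()
fresh = adapter.get_columns_in_relation("test_schema_changes")  # now the new columns
```

In this toy, rebuilding the table changes nothing for the reader until the cache is cleared, which matches the observation that fixing the warehouse schema alone was not enough.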
Add `create_real_model` flag to test infrastructure; document fusion schema caching root cause

Summary
This PR implements the experiment proposed in ELE-5236: adding a `create_real_model` flag to `DbtProject.test()` that creates a real SQL model (`SELECT * FROM {{ ref('<seed>') }}`) instead of pointing a source YAML at the seed table. The goal was to see whether this would bypass dbt-fusion's schema caching for `test_schema_changes`.

Result: The experiment showed that this approach does NOT fix the fusion issue. CI logs from both fusion/redshift and fusion/snowflake show the root cause is deeper than source-level caching:
- `adapter.get_columns_in_relation()` (called in `get_columns_snapshot_query.sql:15`) returns cached column metadata across dbt invocations.
- The logs show `Downloaded "..."."..."."test_schema_changes" (schema)` on the first invocation but not on the second, even though the warehouse table has different columns.
- The real model approach does update the warehouse table (`dbt run` succeeds with new columns), but the subsequent `dbt test` still reads stale schema from fusion's cache.

The `@pytest.mark.skip_for_dbt_fusion` marker is retained with an updated comment documenting this root cause.

What changed
`dbt_project.py` (`DbtProject.test()`):
- Adds a `create_real_model: bool = False` parameter on all overloads.
- When `True`: seeds data as `{table_name}_seed`, creates a real model SQL doing `SELECT * FROM {{ ref('{seed_name}') }}`, runs `dbt run` to materialize it, then runs the test against the model.
- The `_seed_and_run_model()` helper keeps the seed CSV alive during `dbt run` (needed for `{{ ref() }}` resolution), then cleans up before the test phase uses a dummy model for dbt parsing.

`test_schema_changes.py`:
- `test_schema_changes` now uses `create_real_model=True`.
- `skip_for_dbt_fusion` is retained with an updated comment explaining the `adapter.get_columns_in_relation()` caching root cause.

Review & Testing Checklist for Human
- `_seed_and_run_model` cleanup: the method uses nested context managers (`DbtDataSeeder.seed` + `create_temp_model_for_existing_table`). Confirm both the seed CSV and model SQL are properly cleaned up after the `dbt run` phase completes, and that no temp files leak on failure.
- Test phase (with `create_real_model=True`): uses a dummy model SQL (`SELECT 1 AS col`) for dbt parsing while the real model table persists in the warehouse from the previous call. Verify this works correctly on all non-fusion targets (the model table must survive between `test()` calls).
- The `create_real_model` path changes seeding behavior (uses `_seed` suffix). Wait for CI results on postgres, bigquery, redshift, snowflake (non-fusion) to confirm no regressions.

Recommended test plan: Wait for CI to complete on all targets. The fusion targets are expected to skip `test_schema_changes`. Non-fusion targets should pass.

Notes
- The `create_real_model` flag defaults to `False` to avoid performance overhead for tests that don't need it.
- `test_schema_changes_from_baseline` was not modified: it doesn't have a fusion skip and uses a different comparison mechanism (baseline vs. snapshot).
- To enable `test_schema_changes` for fusion, the fix would need to happen at the dbt-fusion adapter level (clearing the schema cache between invocations) or by combining model run + test into a single dbt invocation.

Summary by CodeRabbit
New Features
Tests