feat(experiments): add metadata storage and dbt-core decoupling experiments#1035
Closed
…iments
Add experiment framework to validate two goals:
1. Store essential dbt metadata in warehouses alongside dbt runs
2. Decouple Recce from dbt-core using sqlglot + SQLAlchemy
Experiments included:
- metadata_extractor.py: Version-agnostic extraction from manifest/catalog (v9-v12+)
- sql_adapter.py: Unified SQL adapter using sqlglot + SQLAlchemy
- warehouse_storage.py: Store/retrieve metadata from warehouse tables
- run_experiments.py: End-to-end test runner
- EXPERIMENT_PLAN.md: Detailed plan with phases and success criteria
Key findings:
- Essential fields (nodes, sources, parent_map, checksum) are identical across v9-v12
- Can extract metadata without any dbt-core dependency
- Catalog schema is stable at v1 across all dbt versions
- Only ~13% of manifest is actually used by Recce (macros are 86%)
Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Claude <noreply@anthropic.com>
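As a rough illustration of the "extract metadata without any dbt-core dependency" finding, the sketch below reads a manifest as plain JSON and keeps only the fields named above. It is not the code in metadata_extractor.py; `extract_essentials`, `load_manifest`, and the sample manifest are hypothetical names and data made up for this example.

```python
import json
from pathlib import Path

# Top-level manifest keys the commit message lists as essential.
ESSENTIAL_KEYS = ("nodes", "sources", "parent_map")


def load_manifest(path):
    """Read manifest.json with the standard library only (no dbt-core import)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def extract_essentials(manifest):
    """Return a slimmed-down copy of the manifest (hypothetical helper)."""
    slim = {key: manifest.get(key, {}) for key in ESSENTIAL_KEYS}
    # Each node carries a checksum object of the form {"name": ..., "checksum": ...}.
    slim["checksums"] = {
        unique_id: (node.get("checksum") or {}).get("checksum")
        for unique_id, node in manifest.get("nodes", {}).items()
    }
    slim["schema_version"] = manifest.get("metadata", {}).get("dbt_schema_version")
    return slim


# Tiny hand-written manifest used only for demonstration.
sample = {
    "metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v12.json"},
    "nodes": {
        "model.demo.orders": {
            "resource_type": "model",
            "checksum": {"name": "sha256", "checksum": "abc123"},
        }
    },
    "sources": {},
    "parent_map": {"model.demo.orders": []},
    "macros": {"macro.demo.unused": {}},  # dropped by the extractor
}

slim = extract_essentials(sample)
```

Because the extractor only uses dictionary lookups with defaults, a missing key yields an empty value instead of an exception, which is one simple way to stay tolerant of differences between manifest versions.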
Add a unified SQL adapter for Recce that:
- Reuses dbt profiles.yml for connection configuration
- Supports DuckDB and Snowflake (extensible architecture)
- Uses sqlglot for dialect-aware quoting when available
- Falls back to manual quoting when sqlglot not installed
- Includes TemplateRenderer for ref() and source() resolution
Key features:
- RecceAdapter.from_dbt_profile() to connect using existing dbt config
- RecceAdapter.connect_duckdb() and connect_snowflake() for direct connections
- Proper identifier quoting for each dialect
- Column introspection via get_columns()
- Template rendering for SQL with {{ ref('model') }} syntax
Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Claude <noreply@anthropic.com>
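The manual-quoting fallback and the ref()/source() resolution described in this commit could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's TemplateRenderer: `quote_identifier`, `render_sql`, and the lookup dictionaries are hypothetical, and only literal `{{ ref('...') }}` and `{{ source('...', '...') }}` calls are handled. Arbitrary Jinja macros are outside its scope.

```python
import re

# Matches {{ ref('model') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
# Matches {{ source('source_name', 'table_name') }}.
SOURCE_PATTERN = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


def quote_identifier(name):
    """Manual quoting: DuckDB and Snowflake both accept double-quoted
    identifiers, with embedded double quotes escaped by doubling."""
    return '"' + name.replace('"', '""') + '"'


def render_sql(sql, relations, sources):
    """Replace ref()/source() calls with quoted schema.table names.

    relations: model name -> (schema, table)
    sources:   (source name, table name) -> (schema, table)
    """

    def qualified(parts):
        return ".".join(quote_identifier(part) for part in parts)

    sql = REF_PATTERN.sub(lambda m: qualified(relations[m.group(1)]), sql)
    sql = SOURCE_PATTERN.sub(
        lambda m: qualified(sources[(m.group(1), m.group(2))]), sql
    )
    return sql


rendered = render_sql(
    "select * from {{ ref('orders') }} join {{ source('raw', 'customers') }} using (id)",
    relations={"orders": ("analytics", "orders")},
    sources={("raw", "customers"): ("raw", "customers")},
)
```

An unknown model or source name raises `KeyError` here, which surfaces a bad reference early instead of emitting SQL that fails later in the warehouse.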
Summarize findings from research ticket: "Research: Analyze Recce's dbt artifact consumption and dbt package feasibility"
Key conclusions:
- Recce uses only 13% of manifest.json (stable across v9-v12)
- dbt package approach is feasible via on-run-end hooks
- Warehouse storage schema designed for nodes, lineage, checksums
- Full backward compatibility maintained
Recommended approach:
1. Build dbt_recce package to store metadata in warehouse
2. Add --from-warehouse flag to Recce CLI
3. Reconstruct manifest format from warehouse tables
All PoC code available in experiments/ directory.
Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Claude <noreply@anthropic.com>
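To make the "store in warehouse, then reconstruct the manifest" idea concrete, here is a minimal round-trip sketch. It uses an in-memory SQLite database as a stand-in for a real warehouse, and the table names `recce_nodes` and `recce_lineage`, their columns, and both functions are assumptions for illustration, not the schema designed in this PR.

```python
import sqlite3


def store_metadata(conn, manifest):
    """Write nodes and lineage edges into two hypothetical tables."""
    conn.execute(
        "create table if not exists recce_nodes ("
        "unique_id text primary key, resource_type text, "
        "checksum_name text, checksum text)"
    )
    conn.execute(
        "create table if not exists recce_lineage (child_id text, parent_id text)"
    )
    conn.executemany(
        "insert or replace into recce_nodes values (?, ?, ?, ?)",
        [
            (uid, node["resource_type"],
             node["checksum"]["name"], node["checksum"]["checksum"])
            for uid, node in manifest["nodes"].items()
        ],
    )
    # Replace lineage wholesale so repeated runs do not duplicate edges.
    conn.execute("delete from recce_lineage")
    conn.executemany(
        "insert into recce_lineage values (?, ?)",
        [
            (child, parent)
            for child, parents in manifest["parent_map"].items()
            for parent in parents
        ],
    )
    conn.commit()


def rebuild_manifest(conn):
    """Reconstruct a manifest-shaped dict from the stored rows."""
    nodes = {}
    for uid, rtype, cname, csum in conn.execute(
        "select unique_id, resource_type, checksum_name, checksum from recce_nodes"
    ):
        nodes[uid] = {
            "unique_id": uid,
            "resource_type": rtype,
            "checksum": {"name": cname, "checksum": csum},
        }
    parent_map = {uid: [] for uid in nodes}
    for child, parent in conn.execute(
        "select child_id, parent_id from recce_lineage order by child_id, parent_id"
    ):
        parent_map.setdefault(child, []).append(parent)
    return {"nodes": nodes, "parent_map": parent_map}


# Demonstration data: two models, one depending on the other.
original = {
    "nodes": {
        "model.demo.stg_orders": {
            "resource_type": "model",
            "checksum": {"name": "sha256", "checksum": "aaa"},
        },
        "model.demo.orders": {
            "resource_type": "model",
            "checksum": {"name": "sha256", "checksum": "bbb"},
        },
    },
    "parent_map": {
        "model.demo.stg_orders": [],
        "model.demo.orders": ["model.demo.stg_orders"],
    },
}

conn = sqlite3.connect(":memory:")
store_metadata(conn, original)
rebuilt = rebuild_manifest(conn)
```

In the recommended approach the write side would run inside the warehouse through a dbt on-run-end hook, not from Python; the sketch only shows that nodes, lineage, and checksums are enough to rebuild the parts of the manifest listed above.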
It's a no-go due to the macro issue.