feat(experiments): add metadata storage and dbt-core decoupling experiments#1035

Closed
even-wei wants to merge 3 commits into main from
claude/explain-manifest-catalog-ELizl

Conversation

@even-wei

Add experiment framework to validate two goals:

  1. Store essential dbt metadata in warehouses alongside dbt runs
  2. Decouple Recce from dbt-core using sqlglot + SQLAlchemy

Experiments included:

  • metadata_extractor.py: Version-agnostic extraction from manifest/catalog (v9-v12+)
  • sql_adapter.py: Unified SQL adapter using sqlglot + SQLAlchemy
  • warehouse_storage.py: Store/retrieve metadata from warehouse tables
  • run_experiments.py: End-to-end test runner
  • EXPERIMENT_PLAN.md: Detailed plan with phases and success criteria

Key findings:

  • Essential fields (nodes, sources, parent_map, checksum) are identical across v9-v12
  • Can extract metadata without any dbt-core dependency
  • Catalog schema is stable at v1 across all dbt versions
  • Only ~13% of manifest is actually used by Recce (macros are 86%)
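
As a rough illustration of the "no dbt-core dependency" finding, the sketch below reads a manifest with plain `json` and keeps only the fields listed above. It is not the code from `metadata_extractor.py`: the function names `extract_essential` and `load_essential` are hypothetical, and the assumed layout (`metadata.dbt_schema_version`, and a per-node `checksum` object holding a `checksum` string) follows the published dbt manifest schema rather than anything shown in this PR.

```python
import json
from typing import Any

# Top-level manifest keys the findings above call essential.
ESSENTIAL_KEYS = ("nodes", "sources", "parent_map")


def extract_essential(manifest: dict[str, Any]) -> dict[str, Any]:
    """Keep only the essential fields, using plain dict access (no dbt-core)."""
    metadata = manifest.get("metadata", {})
    slim: dict[str, Any] = {
        "dbt_schema_version": metadata.get("dbt_schema_version"),
    }
    for key in ESSENTIAL_KEYS:
        slim[key] = manifest.get(key, {})
    # Assumption: each node carries {"checksum": {"name": ..., "checksum": ...}}.
    slim["checksums"] = {
        unique_id: (node.get("checksum") or {}).get("checksum")
        for unique_id, node in slim["nodes"].items()
    }
    return slim


def load_essential(path: str) -> dict[str, Any]:
    """Read manifest.json from disk and reduce it to the essential fields."""
    with open(path, encoding="utf-8") as fh:
        return extract_essential(json.load(fh))
```

Everything else in the manifest, including the `macros` section, is dropped by construction, which is the property the size finding depends on.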

Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Claude <noreply@anthropic.com>

Add a unified SQL adapter for Recce that:
- Reuses dbt profiles.yml for connection configuration
- Supports DuckDB and Snowflake (extensible architecture)
- Uses sqlglot for dialect-aware quoting when available
- Falls back to manual quoting when sqlglot not installed
- Includes TemplateRenderer for ref() and source() resolution
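
The optional-sqlglot behaviour described in this list could look roughly like the sketch below. It is an illustration, not the adapter's actual code: `quote_identifier` and the `_FALLBACK_QUOTES` table are hypothetical names, and the fallback assumes that DuckDB and Snowflake both quote identifiers with double quotes and escape an embedded quote by doubling it.

```python
try:
    from sqlglot import exp  # optional dependency
except ImportError:  # sqlglot not installed: use manual quoting below
    exp = None

# Hypothetical fallback table: quote character per supported dialect.
_FALLBACK_QUOTES = {"duckdb": '"', "snowflake": '"'}


def quote_identifier(name: str, dialect: str) -> str:
    """Quote an identifier for the given dialect, preferring sqlglot."""
    if exp is not None:
        # sqlglot renders a quoted Identifier node in the target dialect.
        return exp.to_identifier(name, quoted=True).sql(dialect=dialect)
    # Manual fallback: wrap in the quote character, doubling embedded quotes.
    quote = _FALLBACK_QUOTES.get(dialect, '"')
    return quote + name.replace(quote, quote * 2) + quote
```

Keeping the import inside a `try` block means the same call site works with or without sqlglot, at the cost of the fallback only covering the dialects listed in the table.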

Key features:
- RecceAdapter.from_dbt_profile() to connect using existing dbt config
- RecceAdapter.connect_duckdb() and connect_snowflake() for direct connections
- Proper identifier quoting for each dialect
- Column introspection via get_columns()
- Template rendering for SQL with {{ ref('model') }} syntax
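
For the template-rendering feature, a minimal regex-based stand-in is sketched below. It is not the PR's `TemplateRenderer`: the `render` function and its `refs` and `sources` lookup tables are hypothetical, it handles only the single-argument `ref('model')` and two-argument `source('name', 'table')` forms, and it does not evaluate any other Jinja.

```python
import re

# {{ ref('model') }} with either quote style and optional whitespace.
_REF = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
# {{ source('source_name', 'table_name') }}
_SOURCE = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


def render(sql: str, refs: dict[str, str], sources: dict[tuple[str, str], str]) -> str:
    """Replace ref() and source() calls with already-quoted relation names.

    Unknown model or source names raise KeyError instead of passing through.
    """
    sql = _REF.sub(lambda m: refs[m.group(1)], sql)
    return _SOURCE.sub(lambda m: sources[(m.group(1), m.group(2))], sql)


if __name__ == "__main__":
    print(
        render(
            "select * from {{ ref('orders') }}",
            refs={"orders": '"main"."orders"'},
            sources={},
        )
    )
```

Anything beyond these two call forms, such as macros or control flow, is left untouched by this sketch and would need a real template engine.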

Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings December 25, 2025 04:16

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Summarize findings from research ticket:
"Research: Analyze Recce's dbt artifact consumption and dbt package feasibility"

Key conclusions:
- Recce uses only 13% of manifest.json (stable across v9-v12)
- dbt package approach is feasible via on-run-end hooks
- Warehouse storage schema designed for nodes, lineage, checksums
- Full backward compatibility maintained
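
One way to check a usage-share figure like the one above is to measure how many serialized bytes each top-level manifest key contributes. The sketch below does only that; `key_shares` is a hypothetical helper, it is not how the ticket arrived at its number, and which keys count as "used by Recce" still has to be decided separately.

```python
import json
from typing import Any


def key_shares(manifest: dict[str, Any]) -> dict[str, float]:
    """Return each top-level key's share of the compact JSON size."""
    sizes = {
        key: len(json.dumps(value, separators=(",", ":")))
        for key, value in manifest.items()
    }
    total = sum(sizes.values()) or 1  # avoid division by zero on an empty manifest
    return {key: size / total for key, size in sizes.items()}


if __name__ == "__main__":
    # Path is an assumption: dbt writes artifacts to target/ by default.
    with open("target/manifest.json", encoding="utf-8") as fh:
        shares = key_shares(json.load(fh))
    for key, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{key:20s} {share:6.1%}")
```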

Recommended approach:
1. Build dbt_recce package to store metadata in warehouse
2. Add --from-warehouse flag to Recce CLI
3. Reconstruct manifest format from warehouse tables
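
The store-then-reconstruct round trip in the steps above can be sketched with an in-memory SQLite database standing in for the warehouse. Everything named here is an assumption for illustration: the table names `recce_nodes` and `recce_lineage`, their columns, and the `store` and `reconstruct` helpers are not taken from the PoC code, and a real warehouse would be reached through its own driver rather than `sqlite3`.

```python
import sqlite3
from typing import Any

# Hypothetical schema covering nodes, checksums, and lineage edges.
DDL = """
CREATE TABLE recce_nodes (
    unique_id TEXT PRIMARY KEY,
    resource_type TEXT,
    checksum TEXT
);
CREATE TABLE recce_lineage (child_id TEXT, parent_id TEXT);
"""


def store(conn: sqlite3.Connection, manifest: dict[str, Any]) -> None:
    """Write nodes and parent/child edges into the stand-in warehouse."""
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO recce_nodes VALUES (?, ?, ?)",
        [
            (uid, node.get("resource_type"), (node.get("checksum") or {}).get("checksum"))
            for uid, node in manifest["nodes"].items()
        ],
    )
    conn.executemany(
        "INSERT INTO recce_lineage VALUES (?, ?)",
        [(child, parent) for child, parents in manifest["parent_map"].items() for parent in parents],
    )
    conn.commit()


def reconstruct(conn: sqlite3.Connection) -> dict[str, Any]:
    """Rebuild a manifest-shaped dict (nodes + parent_map) from the tables."""
    nodes = {
        uid: {"unique_id": uid, "resource_type": rtype, "checksum": {"checksum": checksum}}
        for uid, rtype, checksum in conn.execute(
            "SELECT unique_id, resource_type, checksum FROM recce_nodes"
        )
    }
    parent_map: dict[str, list[str]] = {uid: [] for uid in nodes}
    for child, parent in conn.execute(
        "SELECT child_id, parent_id FROM recce_lineage ORDER BY child_id, parent_id"
    ):
        parent_map.setdefault(child, []).append(parent)
    return {"nodes": nodes, "parent_map": parent_map}
```

Calling `store(conn, manifest)` followed by `reconstruct(conn)` on a connection from `sqlite3.connect(":memory:")` returns the same node IDs, checksums, and parent map that went in; macros are never written, so they cannot be recovered from these tables.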

All PoC code available in experiments/ directory.

Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Claude <noreply@anthropic.com>
@even-wei

even-wei commented Feb 3, 2026

It's a no-go due to the macro issue.

@even-wei even-wei closed this Feb 3, 2026
@gcko gcko deleted the claude/explain-manifest-catalog-ELizl branch February 9, 2026 07:27