@danielebriggi
Member

Add extractor of md frontmatter for metadata

@danielebriggi danielebriggi self-assigned this Sep 30, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for extracting metadata from Markdown frontmatter in documents. The implementation introduces a file-specific extractor system that can parse YAML frontmatter from Markdown files and store the extracted metadata separately from the document content.

  • Implements a flexible metadata extraction system with base classes and file-specific extractors
  • Adds frontmatter extraction for Markdown files using the python-frontmatter library
  • Integrates metadata extraction into the document processing pipeline for both new additions and rebuilds
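The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class names `MetadataExtractor` and `NaiveFrontmatterExtractor` are hypothetical, and to stay dependency-free the sketch parses only flat `key: value` lines instead of calling python-frontmatter, which handles full YAML.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class MetadataExtractor(ABC):
    """Hypothetical base class: split text into (metadata, remaining content)."""

    @abstractmethod
    def extract(self, text: str) -> Tuple[Dict[str, str], str]:
        ...


class NaiveFrontmatterExtractor(MetadataExtractor):
    """Stand-in for a python-frontmatter based extractor (assumption).

    Handles only flat `key: value` lines between `---` delimiters,
    not nested or typed YAML.
    """

    DELIMITER = "---"

    def extract(self, text: str) -> Tuple[Dict[str, str], str]:
        lines = text.splitlines()
        # Frontmatter must open on the very first line.
        if not lines or lines[0].strip() != self.DELIMITER:
            return {}, text
        # Look for the closing delimiter.
        for end in range(1, len(lines)):
            if lines[end].strip() == self.DELIMITER:
                break
        else:
            # Unterminated block: treat the whole text as content.
            return {}, text
        metadata: Dict[str, str] = {}
        for line in lines[1:end]:
            key, sep, value = line.partition(":")
            if sep and key.strip():
                metadata[key.strip()] = value.strip()
        # Everything after the closing delimiter is the document body.
        content = "\n".join(lines[end + 1:]).lstrip("\n")
        return metadata, content


doc = "---\ntitle: Hello\nauthor: Jane\n---\n\n# Heading\nBody text."
metadata, content = NaiveFrontmatterExtractor().extract(doc)
```

Returning the metadata and the body as separate values mirrors the overview's point that extracted metadata is stored apart from the document content.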

Reviewed Changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/sqlite_rag/extractors/base.py` | Defines abstract base class for metadata extractors |
| `src/sqlite_rag/extractors/frontmatter.py` | Implements frontmatter extraction for Markdown files |
| `src/sqlite_rag/extractor.py` | Main extractor coordinator that selects appropriate extractors by file type |
| `src/sqlite_rag/sqliterag.py` | Integrates metadata extraction into document processing pipeline |
| `src/sqlite_rag/reader.py` | Adds docstring to parse_file method |
| `src/sqlite_rag/database.py` | Fixes import path for sqlite-vector binaries |
| `pyproject.toml` | Adds python-frontmatter dependency and removes pytest-cov version pin |
| `tests/` | Comprehensive test coverage for all new extraction functionality |
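
A coordinator that selects an extractor by file type, as the summary describes for `src/sqlite_rag/extractor.py`, might look something like the sketch below. The names `ExtractorRegistry`, `passthrough`, and `fake_markdown_extractor` are hypothetical and not taken from the PR; the Markdown extractor here is a placeholder that only tags its input.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

# An extractor takes raw text and returns (metadata, content).
Extractor = Callable[[str], Tuple[Dict[str, str], str]]


def passthrough(text: str) -> Tuple[Dict[str, str], str]:
    """Fallback for file types without a dedicated extractor."""
    return {}, text


def fake_markdown_extractor(text: str) -> Tuple[Dict[str, str], str]:
    """Placeholder for a real frontmatter extractor (assumption)."""
    return {"source_format": "markdown"}, text


class ExtractorRegistry:
    """Hypothetical coordinator: maps file suffixes to extractors."""

    def __init__(self) -> None:
        self._by_suffix: Dict[str, Extractor] = {}

    def register(self, suffixes: Iterable[str], extractor: Extractor) -> None:
        for suffix in suffixes:
            self._by_suffix[suffix.lower()] = extractor

    def extract(self, path: str, text: str) -> Tuple[Dict[str, str], str]:
        # Suffix lookup is case-insensitive; unknown types pass through.
        suffix = Path(path).suffix.lower()
        extractor = self._by_suffix.get(suffix, passthrough)
        return extractor(text)


registry = ExtractorRegistry()
registry.register([".md", ".markdown"], fake_markdown_extractor)

md_result = registry.extract("notes/README.MD", "# Title")
txt_result = registry.extract("data.txt", "plain text")
```

Falling back to a pass-through keeps unsupported file types flowing through the pipeline unchanged, with empty metadata.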

@codecov

codecov bot commented Sep 30, 2025

Codecov Report

❌ Patch coverage is 97.43590% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| `src/sqlite_rag/sqliterag.py` | 71.42% | 3 Missing and 1 partial ⚠️ |

Add extractor of md frontmatter for metadata
@danielebriggi danielebriggi merged commit 8f8bf27 into main Sep 30, 2025
6 checks passed
@danielebriggi danielebriggi deleted the extract-frontmatter branch September 30, 2025 09:45