Conversation

@danielebriggi
Member

No description provided.

Daniele Briggi added 3 commits October 17, 2025 13:38
refact(settings): extensions options are generated by a setting method
chore(settings):
- default chunk_size equals to the model context window
- increase FTS weight
@danielebriggi danielebriggi requested a review from Copilot October 20, 2025 07:33
@danielebriggi danielebriggi self-assigned this Oct 20, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements sentence-level highlighting in search results by splitting chunks into sentences, generating embeddings for them, and using semantic search to identify the most relevant sentences within matching chunks. This provides more precise result snippets and better context for users.
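The ranking step described above can be pictured with a small, self-contained sketch. It is not the code from this PR: the function names `cosine` and `top_sentences` are hypothetical, the vectors are toy values rather than real model embeddings, and the comparison runs in plain Python rather than through a database vector search.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_sentences(
    query_vec: list[float],
    sentence_vecs: list[list[float]],
    k: int = 2,
) -> list[tuple[int, float]]:
    """Return (sentence_index, score) pairs for the k sentences closest to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(sentence_vecs)]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings" for three sentences of one matching chunk (illustrative only).
sentence_vecs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranked = top_sentences([1.0, 0.0], sentence_vecs, k=2)
```

Here `ranked` holds the indices of the two sentences most similar to the query, which a formatter could then map back to offsets for highlighting.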

Key changes:

  • Added sentence splitting functionality with offset tracking for chunk text positioning
  • Extended database schema and processing pipeline to store and search sentence embeddings
  • Updated search results to include top-ranked sentences with their offsets for highlighting
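As a rough illustration of sentence splitting with offset tracking, the sketch below splits a chunk on terminal punctuation and records where each sentence starts and ends inside the chunk. It is a minimal sketch under stated assumptions, not the `sentence_splitter.py` module from this PR: the `SentenceSpan` class, the `split_with_offsets` function, and the regular expression are all hypothetical, and the naive pattern will also split on abbreviations and decimal points.

```python
import re
from dataclasses import dataclass


@dataclass
class SentenceSpan:
    """A sentence plus its character offsets within the parent chunk (hypothetical model)."""
    text: str
    start: int  # inclusive offset into the chunk
    end: int    # exclusive offset into the chunk


# Naive pattern: a run of non-terminators followed by any trailing ., ! or ? characters.
_SENTENCE_RE = re.compile(r"[^.!?]+[.!?]*")


def split_with_offsets(chunk: str) -> list[SentenceSpan]:
    """Split a chunk into sentences, keeping offsets so chunk[start:end] == text."""
    spans: list[SentenceSpan] = []
    for match in _SENTENCE_RE.finditer(chunk):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue  # skip whitespace-only matches
        # Shift the start past any leading whitespace that strip() removed.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        spans.append(SentenceSpan(stripped, start, start + len(stripped)))
    return spans


chunk = "SQLite is small. It is fast! Is it simple?"
spans = split_with_offsets(chunk)
```

Because every span satisfies `chunk[span.start:span.end] == span.text`, a result formatter can highlight a sentence using only the offsets, without storing the sentence text a second time.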

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_settings.py | Updated test assertions to use renamed other_model_options setting field |
| tests/test_sentence_splitter.py | Added comprehensive tests for new sentence splitting functionality |
| tests/test_engine.py | Updated Engine tests with sentence_splitter dependency and added sentence processing tests |
| tests/test_chunker.py | Fixed variable naming from excinfo to exc_info for consistency |
| tests/integration/test_engine.py | Moved search tests to integration suite and added sentence search test cases |
| tests/conftest.py | Updated engine fixture to include SentenceSplitter dependency |
| src/sqlite_rag/sqliterag.py | Integrated sentence splitting and search into main search workflow |
| src/sqlite_rag/settings.py | Refactored settings with renamed fields, new methods for context/vector options, and sentence configuration |
| src/sqlite_rag/sentence_splitter.py | New module implementing sentence splitting with offset tracking |
| src/sqlite_rag/repository.py | Extended to persist sentence embeddings to database |
| src/sqlite_rag/models/sentence_result.py | New model for sentence search results |
| src/sqlite_rag/models/sentence.py | New model representing a sentence with embedding and offsets |
| src/sqlite_rag/models/document_result.py | Added sentences field to include sentence results |
| src/sqlite_rag/models/document.py | Fixed type hint from string literal to direct Chunk reference |
| src/sqlite_rag/models/chunk.py | Added sentences field and improved comment clarity |
| src/sqlite_rag/formatters.py | Implemented sentence-based preview generation and display formatting |
| src/sqlite_rag/engine.py | Added sentence processing, search_sentences method, and quantization support |
| src/sqlite_rag/database.py | Added sentences table and vector initialization |
| src/sqlite_rag/cli.py | Updated search command defaults and help text |

@codecov
codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 98.75195% with 8 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/sqlite_rag/formatters.py | 88.88% | 2 Missing and 3 partials ⚠️ |
| tests/conftest.py | 77.77% | 2 Missing ⚠️ |
| src/sqlite_rag/engine.py | 97.29% | 1 Missing ⚠️ |

@danielebriggi danielebriggi requested a review from Copilot October 20, 2025 14:07
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

@danielebriggi danielebriggi force-pushed the highlight-sentence-in-results branch 2 times, most recently from aee25fd to 8dbae68 Compare October 20, 2025 16:42
Avoid to fetch the entire chunk to extract the content
@danielebriggi danielebriggi force-pushed the highlight-sentence-in-results branch 3 times, most recently from 8601e8a to 50430d7 Compare October 21, 2025 12:10
@danielebriggi danielebriggi force-pushed the highlight-sentence-in-results branch from 50430d7 to c9ee5dd Compare October 21, 2025 12:46
@danielebriggi danielebriggi merged commit 2cd0927 into main Oct 21, 2025
5 checks passed
@danielebriggi danielebriggi deleted the highlight-sentence-in-results branch October 21, 2025 13:01