LCORE-1198: RAG chunk parsing improvement for streaming query #1018
Conversation
Walkthrough

The PR refactors RAG chunk extraction in query processing by introducing a new helper function that extracts RAG chunks from file search results and threads this data through tool-call processing in both the standard and streaming response endpoints. It also performs minor code cleanup, including whitespace normalization and removal of unused test constants.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/endpoints.py (1)
797-799: Bug: `summary.rag_chunks` is not synchronized with the `rag_chunks` parameter.

`create_referenced_documents_with_metadata` uses `summary.rag_chunks`, but in `streaming_query_v2.py`, `summary.rag_chunks` is initialized as an empty list and never updated. The actual RAG chunks are collected in a separate `rag_chunks` variable and passed via the `rag_chunks` parameter. This means `referenced_documents` will always be empty for streaming responses, even when RAG chunks were retrieved.

🐛 Proposed fix: use the `rag_chunks` parameter when available.
```diff
+    # Use provided rag_chunks if available, otherwise fall back to summary.rag_chunks
+    chunks_for_documents = (
+        [RAGChunk(**chunk) for chunk in rag_chunks]
+        if rag_chunks
+        else summary.rag_chunks
+    )
     referenced_documents = create_referenced_documents_with_metadata(
-        summary, metadata_map
+        TurnSummary(
+            llm_response=summary.llm_response,
+            tool_calls=summary.tool_calls,
+            tool_results=summary.tool_results,
+            rag_chunks=chunks_for_documents,
+        ),
+        metadata_map,
     )
```

Or alternatively, add the import and create a simpler helper that takes rag_chunks directly.
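As a sketch of that alternative, a helper could build the referenced documents straight from the serialized chunk dicts, skipping the `TurnSummary` reconstruction. The `RAGChunk` dataclass below stands in for the project's model, and the helper name and metadata-lookup convention are assumptions, not the actual project API:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RAGChunk:
    """Simplified stand-in for the project's RAGChunk model."""

    content: str
    source: str
    score: float


def referenced_documents_from_rag_chunks(
    rag_chunks: list[dict[str, Any]], metadata_map: dict[str, dict[str, Any]]
) -> list[dict[str, Any]]:
    """Build referenced-document entries directly from serialized RAG chunks.

    Hypothetical helper: takes the rag_chunks dicts as passed into
    cleanup_after_streaming, so no TurnSummary copy is needed.
    """
    documents = []
    for chunk_dict in rag_chunks:
        chunk = RAGChunk(**chunk_dict)
        # Merge in any metadata recorded for this chunk's source.
        metadata = metadata_map.get(chunk.source, {})
        documents.append({"source": chunk.source, **metadata})
    return documents
```

A caller would then pass `rag_chunks` and `metadata_map` directly, keeping `summary` untouched.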
src/app/endpoints/streaming_query_v2.py (1)
311-330: Missing synchronization: `summary.rag_chunks` is never updated with the collected chunks.

The local `rag_chunks` list is populated during streaming, but `summary.rag_chunks` remains empty. Since `cleanup_after_streaming` uses `summary.rag_chunks` to build `referenced_documents` (via `create_referenced_documents_with_metadata`), the referenced documents will be incorrect.

🐛 Proposed fix: sync `rag_chunks` to `summary` before cleanup.
```diff
+    # Sync collected RAG chunks to summary for cleanup processing
+    summary.rag_chunks = rag_chunks
+
     # Perform cleanup tasks (database and cache operations)
     await cleanup_after_streaming(
         user_id=context.user_id,
         conversation_id=conv_id,
         model_id=context.model_id,
         provider_id=context.provider_id,
         llama_stack_model_id=context.llama_stack_model_id,
         query_request=context.query_request,
         summary=summary,
         metadata_map=context.metadata_map,
         started_at=context.started_at,
         client=context.client,
         config=configuration,
         skip_userid_check=context.skip_userid_check,
         get_topic_summary_func=get_topic_summary,
         is_transcripts_enabled_func=is_transcripts_enabled,
         store_transcript_func=store_transcript,
         persist_user_conversation_details_func=persist_user_conversation_details,
         rag_chunks=[rag_chunk.model_dump() for rag_chunk in rag_chunks],
     )
```
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)
484-499: Consider returning the new chunks instead of mutating the input list.

The function mutates `rag_chunks` in place. While the docstring documents this behavior, the coding guidelines recommend avoiding in-place parameter modification. Consider returning the extracted chunks and letting the caller extend the list.

♻️ Optional refactor: return a new data structure.
```diff
 def extract_rag_chunks_from_file_search_item(
     item: OpenAIResponseOutputMessageFileSearchToolCall,
-    rag_chunks: list[RAGChunk],
-) -> None:
-    """Extract RAG chunks from a file search tool call item and append to rag_chunks.
+) -> list[RAGChunk]:
+    """Extract RAG chunks from a file search tool call item.

     Args:
         item: The file search tool call item.
-        rag_chunks: List to append extracted RAG chunks to.
+
+    Returns:
+        List of extracted RAG chunks.
     """
+    rag_chunks: list[RAGChunk] = []
     if item.results is not None:
         for result in item.results:
             rag_chunk = RAGChunk(
                 content=result.text, source="file_search", score=result.score
             )
             rag_chunks.append(rag_chunk)
+    return rag_chunks
```

Then at the call sites:
```diff
-    extract_rag_chunks_from_file_search_item(item, rag_chunks)
+    rag_chunks.extend(extract_rag_chunks_from_file_search_item(item))
```
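The return-based variant can be exercised standalone with simplified stand-ins for the llama-stack types. The dataclasses below are assumptions for illustration, not the real `OpenAIResponseOutputMessageFileSearchToolCall` model:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileSearchResult:
    """Stand-in for a single file search result (assumed fields)."""

    text: str
    score: float


@dataclass
class FileSearchToolCall:
    """Stand-in for the file search tool call item; results may be absent."""

    results: Optional[list[FileSearchResult]]


@dataclass
class RAGChunk:
    content: str
    source: str
    score: float


def extract_rag_chunks_from_file_search_item(
    item: FileSearchToolCall,
) -> list[RAGChunk]:
    """Extract RAG chunks from a file search tool call item.

    Returns a new list rather than mutating a caller-supplied one.
    """
    rag_chunks: list[RAGChunk] = []
    if item.results is not None:
        for result in item.results:
            rag_chunks.append(
                RAGChunk(content=result.text, source="file_search", score=result.score)
            )
    return rag_chunks
```

With this shape, a call site simply does `rag_chunks.extend(extract_rag_chunks_from_file_search_item(item))`, and an item with no results yields an empty list.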
tisnik left a comment:
LGTM
Description
This PR adds support for storing RAG chunks in the v2 streaming query endpoint. RAG chunks are now parsed on the fly when a File Search Tool chunk is emitted.
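The on-the-fly approach can be sketched as a loop over streaming events that accumulates chunks as each file search result arrives. Event and field names below (`type`, `results`, `text`, `score`) are assumptions standing in for the actual llama-stack streaming event model:

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class RAGChunk:
    """Simplified stand-in for the project's RAGChunk model."""

    content: str
    source: str
    score: float


def collect_rag_chunks(stream_events: Iterable[dict[str, Any]]) -> list[RAGChunk]:
    """Accumulate RAG chunks as file search tool-call events are emitted.

    Rather than re-parsing the full response at the end of the stream,
    each file_search event contributes its chunks as it arrives.
    """
    rag_chunks: list[RAGChunk] = []
    for event in stream_events:
        # Only file search events with non-empty results carry RAG chunks.
        if event.get("type") == "file_search_call" and event.get("results"):
            for result in event["results"]:
                rag_chunks.append(
                    RAGChunk(
                        content=result["text"],
                        source="file_search",
                        score=result["score"],
                    )
                )
    return rag_chunks
```

The collected list can then be handed to the post-stream cleanup step for persistence and building referenced documents.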
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Refactor
Chores