
Conversation

Contributor

@asimurka asimurka commented Jan 19, 2026

Description

This PR adds support for storing RAG chunks in the v2 streaming query endpoint. RAG chunks are now parsed on the fly, as each File Search Tool chunk is emitted.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Refactor

    • Improved RAG chunk extraction during query processing for more efficient data handling.
  • Chores

    • Code cleanup and formatting refinements across multiple files for better maintainability.


Contributor

coderabbitai bot commented Jan 19, 2026

Walkthrough

The PR refactors RAG chunk extraction in query processing by introducing a new helper function to extract RAG chunks from file search results and threading this data through tool call processing in both standard and streaming response endpoints. Additionally, it performs minor code cleanup including whitespace normalization and removal of unused test constants.

Changes

  • RAG Chunk Extraction Refactoring (src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query_v2.py): Introduced an extract_rag_chunks_from_file_search_item() helper to convert file_search_call results into RAGChunk entries. Updated _build_tool_call_summary() to accept and process a rag_chunks parameter during tool call processing instead of in a separate post-processing step. Updated streaming_query_v2.py to thread rag_chunks through the Responses API integration and pass serialized chunks to cleanup_after_streaming().
  • Utility and Endpoint Updates (src/utils/endpoints.py): Updated the cleanup_after_streaming() docstring to clarify rag_chunks parameter behavior.
  • Logger Removal (src/utils/mcp_headers.py): Removed the module-level logger initialization, but the module still references logger.error(); this is a potential bug requiring verification.
  • Code Cleanup and Formatting (dev-tools/mcp-mock-server/server.py, src/client.py, src/quota/quota_limiter.py, tests/e2e/features/steps/llm_query_response.py, tests/unit/app/endpoints/test_a2a.py, tests/unit/app/endpoints/test_streaming_query.py, tests/unit/cache/test_postgres_cache.py, tests/unit/models/rlsapi/test_requests.py, tests/unit/models/rlsapi/test_responses.py, tests/unit/test_configuration.py): Removed blank lines and the unused CONVERSATION_ID_2 test constant, and reformatted YAML string literals in tests; no functional behavior changes.
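The logger-removal finding above can be illustrated with a minimal, hypothetical sketch (the parser function and header format are invented for illustration; only the logging pattern matters): if the conventional module-level logger initialization is removed while logger.error() calls remain, the error path raises NameError at runtime instead of logging.

```python
import logging

# Conventional module-level logger; the review notes this initialization was
# removed from src/utils/mcp_headers.py while logger.error() calls remained.
logger = logging.getLogger(__name__)

def parse_header_pair(raw: str) -> dict[str, str]:
    """Hypothetical parser: split 'key=value'; log and return {} if malformed."""
    key, sep, value = raw.partition("=")
    if not sep:
        # Without the module-level logger above, this call would raise
        # NameError at runtime instead of logging the problem.
        logger.error("malformed header pair: %r", raw)
        return {}
    return {key: value}
```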

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
  • are-ces
  • asamal4
🚥 Pre-merge checks: ✅ 3 of 3 passed

  • Title check: ✅ Passed. The title directly describes the main change (RAG chunk parsing improvement for streaming query), which aligns with the PR's primary objective of adding RAG chunk support to the streaming query flow.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/utils/endpoints.py (1)

797-799: Bug: summary.rag_chunks is not synchronized with the rag_chunks parameter.

create_referenced_documents_with_metadata uses summary.rag_chunks, but in streaming_query_v2.py, summary.rag_chunks is initialized as an empty list and never updated. The actual RAG chunks are collected in a separate rag_chunks variable and passed via the rag_chunks parameter.

This means referenced_documents will always be empty for streaming responses, even when RAG chunks were retrieved.

🐛 Proposed fix: Use the rag_chunks parameter when available
+    # Use provided rag_chunks if available, otherwise fall back to summary.rag_chunks
+    chunks_for_documents = (
+        [RAGChunk(**chunk) for chunk in rag_chunks]
+        if rag_chunks
+        else summary.rag_chunks
+    )
     referenced_documents = create_referenced_documents_with_metadata(
-        summary, metadata_map
+        TurnSummary(
+            llm_response=summary.llm_response,
+            tool_calls=summary.tool_calls,
+            tool_results=summary.tool_results,
+            rag_chunks=chunks_for_documents,
+        ),
+        metadata_map,
     )

Or alternatively, add the import and create a simpler helper that takes rag_chunks directly.

src/app/endpoints/streaming_query_v2.py (1)

311-330: Missing synchronization: summary.rag_chunks never updated with collected chunks.

The local rag_chunks list is populated during streaming, but summary.rag_chunks remains empty. Since cleanup_after_streaming uses summary.rag_chunks for building referenced_documents (via create_referenced_documents_with_metadata), the referenced documents will be incorrect.

🐛 Proposed fix: Sync rag_chunks to summary before cleanup
+        # Sync collected RAG chunks to summary for cleanup processing
+        summary.rag_chunks = rag_chunks
+
        # Perform cleanup tasks (database and cache operations)
         await cleanup_after_streaming(
             user_id=context.user_id,
             conversation_id=conv_id,
             model_id=context.model_id,
             provider_id=context.provider_id,
             llama_stack_model_id=context.llama_stack_model_id,
             query_request=context.query_request,
             summary=summary,
             metadata_map=context.metadata_map,
             started_at=context.started_at,
             client=context.client,
             config=configuration,
             skip_userid_check=context.skip_userid_check,
             get_topic_summary_func=get_topic_summary,
             is_transcripts_enabled_func=is_transcripts_enabled,
             store_transcript_func=store_transcript,
             persist_user_conversation_details_func=persist_user_conversation_details,
             rag_chunks=[rag_chunk.model_dump() for rag_chunk in rag_chunks],
         )
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)

484-499: Consider returning the new chunks instead of mutating the input list.

The function mutates rag_chunks in place. While the docstring documents this behavior, the coding guidelines recommend avoiding in-place parameter modification. Consider returning the extracted chunks and letting the caller extend the list.

♻️ Optional refactor to return new data structure
 def extract_rag_chunks_from_file_search_item(
     item: OpenAIResponseOutputMessageFileSearchToolCall,
-    rag_chunks: list[RAGChunk],
-) -> None:
-    """Extract RAG chunks from a file search tool call item and append to rag_chunks.
+) -> list[RAGChunk]:
+    """Extract RAG chunks from a file search tool call item.
 
     Args:
         item: The file search tool call item.
-        rag_chunks: List to append extracted RAG chunks to.
+
+    Returns:
+        List of extracted RAG chunks.
     """
+    rag_chunks: list[RAGChunk] = []
     if item.results is not None:
         for result in item.results:
             rag_chunk = RAGChunk(
                 content=result.text, source="file_search", score=result.score
             )
             rag_chunks.append(rag_chunk)
+    return rag_chunks

Then at the call sites:

-        extract_rag_chunks_from_file_search_item(item, rag_chunks)
+        rag_chunks.extend(extract_rag_chunks_from_file_search_item(item))

Contributor

are-ces commented Jan 19, 2026

LGTM

Contributor

@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit 448cb60 into lightspeed-core:main Jan 19, 2026
20 of 22 checks passed