A multi-agent retrieval-augmented generation (RAG) system with specialized agents for HR, Finance, and Tech support queries. Includes full observability with Langfuse for debugging and monitoring routing decisions.
- 🤖 Specialized Agents: Separate RAG agents for HR, Finance, and Tech domains
- 🎯 Orchestrator: Intelligent routing to the appropriate specialist agent(s)
- 🔀 Hybrid Ambiguity Handling: Multi-agent queries for cross-domain ambiguous questions; clarification requests for extremely vague queries
- 🛡️ Hallucination Prevention: Enforced tool usage means the orchestrator must query the specialist agents and cannot answer from its own knowledge (see the routing sketch after this list)
- 📦 Vector Stores: FAISS-based semantic search for each domain
- 📊 Observability: Full tracing with Langfuse to debug misrouted questions and track agent performance
- ⭐ Auto-Evaluation: Automatic quality scoring (1-10) for every response using LLM-as-a-judge, tracked in Langfuse. Evaluations run asynchronously in background threads so responses return immediately
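To make the hallucination-prevention guarantee concrete, routing can be enforced at the API level by requiring a tool call on every turn. Below is a minimal sketch assuming the OpenAI Python SDK; the tool names (`ask_hr`, `ask_finance`, `ask_tech`, `request_clarification`) and model are illustrative and may differ from the actual implementation:

```python
# Sketch: the orchestrator must pick a specialist tool; it cannot reply directly.
# Tool names and model are illustrative, not the project's exact definitions.
from openai import OpenAI

client = OpenAI()

SPECIALIST_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
    for name, description in [
        ("ask_hr", "Answer HR questions (policies, benefits, leave)."),
        ("ask_finance", "Answer finance questions (expenses, budgets, reimbursement)."),
        ("ask_tech", "Answer technical support questions."),
        ("request_clarification", "Ask the user to clarify a query too vague to route."),
    ]
]

def route(user_query: str):
    """Return the tool call(s) the orchestrator chose for this query."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Route the user's question to the right specialist tool(s)."},
            {"role": "user", "content": user_query},
        ],
        tools=SPECIALIST_TOOLS,
        tool_choice="required",  # forces a tool call; the orchestrator never answers unaided
    )
    return response.choices[0].message.tool_calls
```

Because `tool_choice="required"` forbids free-form replies, an ambiguous cross-domain question surfaces as multiple tool calls, and a genuinely vague one as a `request_clarification` call.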
- Python 3.12 or higher
- uv package manager
- OpenAI API key
- Langfuse account
- Clone the repository

  ```bash
  git clone <your-repo-url>
  cd multi-agent-rag
  ```

- Install dependencies with uv

  ```bash
  uv sync
  ```

- Set up environment variables

  ```bash
  cp .env.example .env
  ```

  Edit `.env` and add your API keys:
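  The exact variable names are defined in `.env.example`; a typical set for this stack looks like the following (the values are placeholders):

  ```bash
  OPENAI_API_KEY=sk-...
  LANGFUSE_PUBLIC_KEY=pk-lf-...
  LANGFUSE_SECRET_KEY=sk-lf-...
  LANGFUSE_HOST=https://cloud.langfuse.com
  ```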
Run the CLI in interactive mode to ask questions:

```bash
uv run python src/multi_agent_system.py
```

Example session:

```
You: What is the vacation policy?
Assistant: According to our HR policy, employees receive...

You: How do I submit an expense report?
Assistant: To submit an expense report, you need to...

You: exit
👋 Goodbye!
```
Enable detailed logging to see agent routing decisions:

```bash
uv run python src/multi_agent_system.py --verbose
```

- Navigate to https://cloud.langfuse.com
- Select your project
- Click on "Traces" to see all queries
- Click on any trace to see:
  - Orchestrator routing decision
  - Which specialist agent(s) were called
  - Document retrieval results
  - LLM calls with prompts and responses
  - Tool invocations
  - Quality scores with reasoning
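Trace nesting like this typically comes from instrumenting the entry points. Here is a minimal sketch using the Langfuse Python SDK's `@observe` decorator; the import path varies by SDK version, and the function names are illustrative rather than the ones in `src/multi_agent_system.py`:

```python
# Sketch: nest Langfuse spans so each user query becomes one trace containing
# the routing decision, retrievals, and specialist answers. Names are illustrative.
from langfuse.decorators import observe  # import path differs in newer SDK versions

@observe()
def retrieve_documents(domain: str, query: str) -> list[str]:
    # The domain's FAISS similarity search would run here.
    return [f"[{domain}] placeholder chunk for: {query}"]

@observe()
def run_specialist(domain: str, query: str) -> str:
    docs = retrieve_documents(domain, query)  # recorded as a child span
    # The specialist's LLM call (also traced) would ground its answer in these docs.
    return f"Answer grounded in {len(docs)} retrieved document(s)."

@observe()
def handle_query(query: str) -> str:
    # The orchestrator's routing decision would select the domain(s) here.
    return run_specialist("hr", query)

print(handle_query("What is the vacation policy?"))
```

Each decorated call becomes a span nested under its caller, which is why a single query shows up in Langfuse as one trace with routing, retrieval, and LLM children.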
Every response receives an automatic quality score (1-10) based on:
- Accuracy: Correctness of information
- Relevance: How well it addresses the question
- Completeness: Whether all aspects are answered
- Clarity: Readability and structure
Special handling: When the system correctly refuses out-of-scope questions, it receives a high score (8-10) because this is the intended behavior.
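A minimal sketch of how such a judge could run in a background thread and attach its verdict to the trace as a Langfuse score; the prompt, model, and client calls are assumptions rather than the project's exact implementation (Langfuse scoring method names differ between SDK versions):

```python
# Sketch: grade a response 1-10 with an LLM judge in a background thread and
# attach the result to the Langfuse trace. Prompt and method names are illustrative.
import json
import threading

from langfuse import Langfuse
from openai import OpenAI

langfuse = Langfuse()
client = OpenAI()

JUDGE_PROMPT = (
    "Rate the assistant's answer from 1 to 10 for accuracy, relevance, "
    "completeness, and clarity. A correct refusal of an out-of-scope question "
    'scores 8-10. Reply as JSON: {"score": <int>, "reasoning": "<string>"}.'
)

def evaluate_async(trace_id: str, question: str, answer: str) -> None:
    """Kick off the judge without blocking the response to the user."""
    def _run() -> None:
        result = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": JUDGE_PROMPT},
                {"role": "user", "content": f"Question: {question}\n\nAnswer: {answer}"},
            ],
        )
        verdict = json.loads(result.choices[0].message.content)
        # SDK v2 style; newer versions expose an equivalent create_score method.
        langfuse.score(
            trace_id=trace_id,
            name="response_quality",
            value=float(verdict["score"]),
            comment=verdict["reasoning"],
        )

    # Daemon thread so the CLI returns the answer immediately.
    threading.Thread(target=_run, daemon=True).start()
```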
To view scores:
- Open any trace in Langfuse
- Click the "Scores" tab
- See the `response_quality` score and its reasoning
For single-domain queries:
- Find the trace in Langfuse
- Examine the orchestrator's routing decision
- Check which tool was called
- Review the agent selection reasoning
For ambiguous queries:
- Look for traces where multiple specialist agents were called
- Verify the orchestrator correctly identified the ambiguity
- Check if all relevant specialists were consulted
- Review how responses were synthesized
For clarification requests:
- Identify traces where `request_clarification` was used
- Verify the query was genuinely too vague to route
- Check if the clarification question was helpful
- Consider if multiple specialists would have been better
- No Conversation History: The system processes each query independently without maintaining conversation context. Users cannot ask follow-up questions like "What about for managers?" or "Tell me more", reference previous answers, or build on earlier context within a session.
- No Real-Time Document Updates: The knowledge base is frozen at startup. If HR updates the vacation policy document or any other source document, the system won't reflect those changes until you manually rebuild the vector store by deleting the existing FAISS index and restarting.
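For context on what that rebuild involves, here is a minimal sketch of per-domain FAISS indexing with OpenAI embeddings; the file names, embedding model, and chunking are illustrative and may not match the project's actual code:

```python
# Sketch: build, persist, and query a per-domain FAISS index.
# File names, embedding model, and chunking are illustrative.
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data], dtype="float32")

def build_index(chunks: list[str]) -> faiss.IndexFlatL2:
    vectors = embed(chunks)
    index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 search over embedding dims
    index.add(vectors)
    return index

def search(index: faiss.IndexFlatL2, chunks: list[str], query: str, k: int = 3) -> list[str]:
    _, ids = index.search(embed([query]), k)
    return [chunks[i] for i in ids[0] if i != -1]

# Persisting the index; deleting this file forces a rebuild on the next run.
# faiss.write_index(index, "indexes/hr.faiss")
# index = faiss.read_index("indexes/hr.faiss")
```

Deleting the persisted index file (or directory) simply forces this build step to run again on the next startup, which is how updated source documents get picked up.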