@@ -0,0 +1,148 @@
# How to Design Agent Architectures with Parallel Tool Calls and Sub-Agent Handoffs

## Overview

This guide explains architectural considerations for using parallel tool calls, sub-agents, and agent handoffs in a multi-agent system. It focuses on how to share context between workflows, minimize latency, and decide which agent should generate the final user-facing response.

## Prerequisites

Before applying this guide, you should:

- Understand basic multi-agent system concepts (master/parent agent vs. sub-agents).
- Be familiar with:
  - Tool calls (external API or function calls made by agents).
  - Context windows and their limitations in large language models.
  - Agent handoff patterns (passing control and context from one agent to another).
- Have an existing or planned architecture where:
  - A “master” or orchestrator agent coordinates work.
  - Specialized sub-agents (for example, “billing” or “device” agents) handle domain-specific tasks.

## Architectural Explanation

### 1. Use Parallel Tool Calls to Reduce Latency

- Design sub-agents to be small and lean so they can:
  - Run concurrently.
  - Perform their tool calls in parallel.
- When sub-agents are lightweight and parallelized:
  - End-to-end latency approaches that of the slowest call rather than the sum of all calls.
  - The system can gather required contextual information faster.
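
As a minimal sketch of the idea above, the following uses `asyncio.gather` to start two tool calls at once. The functions `fetch_billing` and `fetch_device` are hypothetical stand-ins (they sleep instead of calling a real API), and the returned fields are made up for illustration.

```python
import asyncio

# Hypothetical tool calls; asyncio.sleep stands in for network I/O.
async def fetch_billing(user_id: str) -> dict:
    await asyncio.sleep(0.2)
    return {"source": "billing", "user_id": user_id, "balance_due": 42.50}

async def fetch_device(user_id: str) -> dict:
    await asyncio.sleep(0.2)
    return {"source": "device", "user_id": user_id, "status": "online"}

async def gather_context(user_id: str) -> list:
    # Both coroutines are scheduled together, so the total wait is roughly
    # the slowest call rather than the sum of both.
    # Results come back in the same order as the arguments.
    return await asyncio.gather(fetch_billing(user_id), fetch_device(user_id))

if __name__ == "__main__":
    print(asyncio.run(gather_context("u-123")))
```

If one call failing should not cancel the others, `asyncio.gather(..., return_exceptions=True)` returns exceptions as values so each result can be handled separately.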

### 2. Share Context Between Workflows via Agent Handoff

- When re-triaging or moving between workflows:
  - Use agent handoff to pass relevant context from one agent to another.
  - Ensure that the receiving agent has:
    - The necessary tool outputs.
    - Any user instructions or constraints needed to complete the task.
- Context sharing reduces the need for a single master agent to hold all instructions and data in its own context window.
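
One possible shape for a handoff payload is sketched below. The `Handoff` class, its field names, and `build_handoff` are assumptions made for illustration; the source discussion does not define a schema, so adapt this to whatever your orchestration layer expects.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Hypothetical payload passed from one agent to the next."""
    target_agent: str
    user_request: str
    tool_outputs: dict = field(default_factory=dict)
    constraints: list = field(default_factory=list)

def build_handoff(target_agent, user_request, all_tool_outputs,
                  constraints, relevant_keys):
    # Forward only the tool outputs the receiving agent needs,
    # instead of the sender's entire accumulated context.
    relevant = {k: v for k, v in all_tool_outputs.items() if k in relevant_keys}
    return Handoff(target_agent, user_request, relevant, list(constraints))

if __name__ == "__main__":
    handoff = build_handoff(
        target_agent="billing",
        user_request="Why is my bill higher this month?",
        all_tool_outputs={"invoice": {"total": 80}, "device_status": "online"},
        constraints=["Answer in plain language"],
        relevant_keys={"invoice"},
    )
    print(handoff)
```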

### 3. Let Sub-Agents Respond Directly When Possible

When contextual information comes from tool calls:

- Prefer having the specialized sub-agent (for example, billing or device agent) generate the user-facing answer immediately after obtaining the tool results.
- This avoids two major issues:
  1. **Context window pressure on the master agent**
     - The master agent would otherwise need:
       - All instructions on how to answer.
       - All extracted information from tools.
     - This can exceed or strain the model’s context window.
  2. **Additional latency from relaying**
     - If the master agent only serves as a relay between sub-agent and user, it:
       - Adds an extra model invocation.
       - Increases end-to-end latency without adding real value.
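
The cost of relaying can be made concrete by counting model invocations. In this sketch, `call_model` is a hypothetical stand-in for one LLM call that only records which agent was invoked; the agent names and prompts are illustrative.

```python
def call_model(agent: str, prompt: str, log: list) -> str:
    """Stand-in for one model invocation; records the invoked agent."""
    log.append(agent)
    return f"[{agent}] {prompt}"

def relay_flow(question: str):
    log = []
    call_model("master", f"triage: {question}", log)
    draft = call_model("billing", question, log)
    # The master re-emits the sub-agent's draft: one extra invocation.
    answer = call_model("master", f"relay: {draft}", log)
    return answer, log

def direct_flow(question: str):
    log = []
    call_model("master", f"triage: {question}", log)
    # The sub-agent's answer goes straight to the user.
    answer = call_model("billing", question, log)
    return answer, log

if __name__ == "__main__":
    print(len(relay_flow("Why is my bill higher?")[1]))   # 3 invocations
    print(len(direct_flow("Why is my bill higher?")[1]))  # 2 invocations
```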

### 4. Decide When a Master Agent Should Aggregate Multiple Sub-Agent Results

There are edge cases where:

- Information from multiple sub-agents or routines is required to form a single answer.
- The final answer must follow a specific, unified format.

In these cases:

- The master agent may need to:
  - Aggregate outputs from multiple sub-agents.
  - Apply formatting or composition logic.
- However, this should be reserved for situations where:
  - A single sub-agent cannot reasonably produce the complete, correctly formatted answer.
  - There is a clear, consistent need for cross-domain aggregation.
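
The decision described above could look roughly like the following. `compose_response` is a hypothetical helper, and the "Summary" layout is an invented example of a unified format, not one prescribed by the source.

```python
def compose_response(results: dict) -> str:
    """Aggregate only when more than one sub-agent contributed.

    `results` maps a domain name (e.g. "billing") to that sub-agent's answer.
    """
    if not results:
        raise ValueError("no sub-agent results to compose")
    if len(results) == 1:
        # Single domain: pass the sub-agent's answer through unchanged.
        return next(iter(results.values()))
    # Cross-domain: apply one unified format, in a stable (sorted) order.
    lines = [f"{domain.title()}: {answer}"
             for domain, answer in sorted(results.items())]
    return "Summary\n" + "\n".join(lines)

if __name__ == "__main__":
    print(compose_response({"billing": "Your balance is $42.50."}))
    print(compose_response({
        "device": "Router is online.",
        "billing": "Balance is $42.50.",
    }))
```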

### 5. Avoid Over-Enforcing Templates in Complex Compositions

- When many different combinations of routines and sub-agents can be involved:
  - It is not practical to enforce rigid templates for every possible case.
- Instead:
  - Use templates sparingly and only where they provide clear value.
  - Allow agents some flexibility in how they compose responses, especially when multiple domains are involved.
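
To illustrate why exhaustive templating does not scale, note that `n` routines can combine in `2**n - 1` non-empty ways. One hedged alternative is a small template registry for the few combinations that need a fixed layout, with a free-form fallback for everything else. The `TEMPLATES` registry, its layout, and the `render` function below are hypothetical.

```python
# Templates only for combinations where a fixed layout has clear value.
TEMPLATES = {
    frozenset({"billing", "device"}):
        "Account status\n- Billing: {billing}\n- Device: {device}",
}

def render(results: dict) -> str:
    """Use a template when one exists for this exact set of domains."""
    template = TEMPLATES.get(frozenset(results))
    if template is not None:
        return template.format(**results)
    # No template registered: fall back to a simple free-form join.
    return " ".join(results[domain] for domain in sorted(results))

def combination_count(n_routines: int) -> int:
    # Number of non-empty subsets of n routines.
    return 2 ** n_routines - 1

if __name__ == "__main__":
    print(render({"billing": "Paid.", "device": "Online."}))
    print(render({"billing": "Paid.", "plan": "Basic."}))
    print(combination_count(10))
```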

## Important Notes and Caveats

- **Context window limitations**:
  - Centralizing all instructions and tool outputs in a master agent can quickly exhaust the model’s context window.
  - Offloading response generation to specialized sub-agents helps keep context usage localized and manageable.

- **Latency trade-offs**:
  - Each additional agent hop (master → sub-agent → master → user) adds latency.
  - Design flows so that:
    - Sub-agents respond directly to the user when they have all needed information.
    - The master agent is used primarily for orchestration and cross-agent coordination, not as a passive relay.

- **Complex formatting requirements**:
  - For answers that must combine multiple data sources and follow strict formatting, a master agent may be necessary.
  - Do not attempt to define exhaustive templates for every possible multi-agent combination; this does not scale.

- **Missing implementation details**:
  - The source discussion does not specify:
    - Exact APIs or frameworks for agent orchestration.
    - How context is serialized and passed between agents.
    - Concrete examples of message schemas or protocol definitions.
  - These details must be defined based on your specific platform or orchestration layer.

## Troubleshooting Tips

### Issue: High Latency in Multi-Agent Flows

**Symptoms:**
- User responses are slow when multiple sub-agents are involved.

**Potential causes and mitigations:**
- Sub-agents running sequentially instead of in parallel
  → Enable concurrent execution of sub-agents and their tool calls.
- Master agent acting as a relay without adding value
  → Allow sub-agents to respond directly to the user when they have complete context.
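
To find out which cause applies, it can help to time each stage of a flow before changing the design. The sketch below assumes a hypothetical `timed` context manager and a `fake_sub_agent` that sleeps in place of real work; it compares sequential and concurrent execution of the same two sub-agents.

```python
import asyncio
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, timings: dict):
    """Record the wall-clock duration of a stage into `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

async def fake_sub_agent(delay: float) -> None:
    # Stand-in for a sub-agent's model and tool calls.
    await asyncio.sleep(delay)

async def profile_flow() -> dict:
    timings = {}
    with timed("sequential", timings):
        await fake_sub_agent(0.1)
        await fake_sub_agent(0.1)
    with timed("parallel", timings):
        await asyncio.gather(fake_sub_agent(0.1), fake_sub_agent(0.1))
    return timings

if __name__ == "__main__":
    for stage, seconds in asyncio.run(profile_flow()).items():
        print(f"{stage}: {seconds:.2f}s")
```

Wrapping the master's relay step in the same `timed` block would show how much of the total it accounts for.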

### Issue: Context Window Overflows or Truncation

**Symptoms:**
- Model performance degrades or important instructions are lost.
- You see truncation of earlier messages or tool outputs.

**Potential causes and mitigations:**
- Master agent accumulating too much context
  → Move domain-specific instructions and tool outputs into sub-agents.
  → Use agent handoff to pass only relevant, summarized context.
- Overly detailed global instructions
  → Keep global instructions minimal; push domain-specific guidance into specialized agents.
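
A simple guard against overflow is to trim what an agent receives to a fixed budget. This sketch makes two loud assumptions: tokens are estimated at roughly four characters each (a crude heuristic, not a real tokenizer), and the most recent messages are the most relevant. Both `estimate_tokens` and `fit_to_budget` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Use your model's
    # actual tokenizer for real budgeting.
    return max(1, len(text) // 4)

def fit_to_budget(system_prompt: str, messages: list, max_tokens: int) -> list:
    """Keep the system prompt plus the newest messages that fit the budget."""
    budget = max_tokens - estimate_tokens(system_prompt)
    kept = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    # Restore chronological order after the reverse walk.
    return [system_prompt] + kept[::-1]

if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]
    trimmed = fit_to_budget("x" * 40, history, max_tokens=30)
    print(len(trimmed))  # system prompt + the two newest messages
```

Dropping older messages loses information; summarizing them before handoff, as suggested above, is an alternative when earlier context still matters.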

### Issue: Inconsistent or Poorly Formatted Responses

**Symptoms:**
- Responses vary in structure when multiple sub-agents are involved.
- Formatting requirements are not consistently met.

**Potential causes and mitigations:**
- Over-reliance on rigid templates in highly variable scenarios
  → Relax strict templating where many combinations of routines exist.
- No clear owner for final formatting
  → Designate:
  - Either a master agent to perform final aggregation and formatting, or
  - A specific sub-agent responsible for formatting when it has all necessary data.

---

Additional information that would improve this guide includes: concrete examples of message flows, specific orchestration frameworks or libraries in use, and sample schemas for agent handoff and tool call results.

---
*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-think-tank/p1754523843336589)*