From 130e9e2136a794edfdf9ab4d2073e2c307e9f60d Mon Sep 17 00:00:00 2001
From: yenchiafeng
Date: Thu, 11 Dec 2025 09:40:47 -0800
Subject: [PATCH] docs: Add How to Design Agent Architectures with Parallel Tool Calls and Sub-Agent Handoffs

---
 ...rchitectures-with-parallel-tool-calls-a.md | 148 ++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md

diff --git a/docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md b/docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md
new file mode 100644
index 0000000..18e89ad
--- /dev/null
+++ b/docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md
@@ -0,0 +1,148 @@
+# How to Design Agent Architectures with Parallel Tool Calls and Sub-Agent Handoffs
+
+## Overview
+
+This guide explains architectural considerations for using parallel tool calls, sub-agents, and agent handoffs in a multi-agent system. It focuses on how to share context between workflows, minimize latency, and decide which agent should generate the final user-facing response.
+
+## Prerequisites
+
+Before applying this guide, you should:
+
+- Understand basic multi-agent system concepts (master/parent agent vs. sub-agents).
+- Be familiar with:
+  - Tool calls (external API or function calls made by agents).
+  - Context windows and their limitations in large language models.
+  - Agent handoff patterns (passing control and context from one agent to another).
+- Have an existing or planned architecture where:
+  - A “master” or orchestrator agent coordinates work.
+  - Specialized sub-agents (for example, “billing” or “device” agents) handle domain-specific tasks.
+
+## Architectural Explanation
+
+### 1. Use Parallel Tool Calls to Reduce Latency
+
+- Design sub-agents to be small and lean so they can:
+  - Run concurrently.
+  - Perform their tool calls in parallel.
+- When sub-agents are lightweight and parallelized:
+  - Overall latency impact can be minimized.
+  - The system can gather required contextual information faster.
+
+### 2. Share Context Between Workflows via Agent Handoff
+
+- When re-triaging or moving between workflows:
+  - Use agent handoff to pass relevant context from one agent to another.
+  - Ensure that the receiving agent has:
+    - The necessary tool outputs.
+    - Any user instructions or constraints needed to complete the task.
+- Context sharing reduces the need for a single master agent to hold all instructions and data in its own context window.
+
+### 3. Let Sub-Agents Respond Directly When Possible
+
+When contextual information comes from tool calls:
+
+- Prefer having the specialized sub-agent (for example, billing or device agent) generate the user-facing answer immediately after obtaining the tool results.
+- This avoids two major issues:
+  1. **Context window pressure on the master agent**
+     - The master agent would otherwise need:
+       - All instructions on how to answer.
+       - All extracted information from tools.
+     - This can exceed or strain the model’s context window.
+  2. **Additional latency from relaying**
+     - If the master agent only serves as a relay between sub-agent and user, it:
+       - Adds an extra model invocation.
+       - Increases end-to-end latency without adding real value.
+
+### 4. Decide When a Master Agent Should Aggregate Multiple Sub-Agent Results
+
+There are edge cases where:
+
+- Information from multiple sub-agents or routines is required to form a single answer.
+- The final answer must follow a specific, unified format.
+
+In these cases:
+
+- The master agent may need to:
+  - Aggregate outputs from multiple sub-agents.
+  - Apply formatting or composition logic.
+- However, this should be reserved for situations where:
+  - A single sub-agent cannot reasonably produce the complete, correctly formatted answer.
+  - There is a clear, consistent need for cross-domain aggregation.
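
The patterns in sections 1 through 4 can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the guide does not name an orchestration framework, so `Handoff`, `call_tool`, `run_sub_agent`, and `handle` are hypothetical names, tools are assumed to be async callables, and a short `asyncio.sleep` stands in for real API latency. The sketch shows tool calls running in parallel inside a sub-agent, a single-domain request answered directly by the sub-agent, and the master aggregating only when more than one domain is involved.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Handoff:
    """Hypothetical handoff payload: only what the receiving agent needs."""
    user_request: str
    constraints: List[str] = field(default_factory=list)


async def call_tool(name: str) -> str:
    """Stand-in for an external API call (assumption: tools are async)."""
    await asyncio.sleep(0.01)  # simulated network latency
    return f"{name}-result"


async def run_sub_agent(domain: str, tools: List[str], handoff: Handoff) -> str:
    """Lean sub-agent: runs its tool calls in parallel, then writes the answer.

    In a real system the handoff would feed the sub-agent's prompt; here it is
    only carried along to show what gets passed.
    """
    results = await asyncio.gather(*(call_tool(t) for t in tools))
    return f"[{domain}] " + ", ".join(results)


async def handle(handoff: Handoff, plan: Dict[str, List[str]]) -> str:
    """Orchestrator: `plan` maps each domain to the tools it needs."""
    if len(plan) == 1:
        # Single domain: the sub-agent's answer goes to the user as-is,
        # with no extra master-side generation step.
        domain, tools = next(iter(plan.items()))
        return await run_sub_agent(domain, tools, handoff)
    # Multiple domains: run sub-agents concurrently, then aggregate here.
    answers = await asyncio.gather(
        *(run_sub_agent(d, t, handoff) for d, t in plan.items())
    )
    return "\n".join(answers)


if __name__ == "__main__":
    request = Handoff("Why is my bill higher this month?", ["be concise"])
    print(asyncio.run(handle(request, {"billing": ["invoice_lookup", "usage_lookup"]})))
```

In this sketch the aggregation branch is only a string join; any unified formatting logic the master agent needs would live there, which keeps that cost out of the single-domain path.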
+
+### 5. Avoid Over-Enforcing Templates in Complex Compositions
+
+- When many different combinations of routines and sub-agents can be involved:
+  - It is not practical to enforce rigid templates for every possible case.
+- Instead:
+  - Use templates sparingly and only where they provide clear value.
+  - Allow agents some flexibility in how they compose responses, especially when multiple domains are involved.
+
+## Important Notes and Caveats
+
+- **Context window limitations**:
+  - Centralizing all instructions and tool outputs in a master agent can quickly exhaust the model’s context window.
+  - Offloading response generation to specialized sub-agents helps keep context usage localized and manageable.
+
+- **Latency trade-offs**:
+  - Each additional agent hop (master → sub-agent → master → user) adds latency.
+  - Design flows so that:
+    - Sub-agents respond directly to the user when they have all needed information.
+    - The master agent is used primarily for orchestration and cross-agent coordination, not as a passive relay.
+
+- **Complex formatting requirements**:
+  - For answers that must combine multiple data sources and follow strict formatting, a master agent may be necessary.
+  - Do not attempt to define exhaustive templates for every possible multi-agent combination; this does not scale.
+
+- **Missing implementation details**:
+  - The discussion does not specify:
+    - Exact APIs or frameworks for agent orchestration.
+    - How context is serialized and passed between agents.
+    - Concrete examples of message schemas or protocol definitions.
+  - These details must be defined based on your specific platform or orchestration layer.
+
+## Troubleshooting Tips
+
+### Issue: High Latency in Multi-Agent Flows
+
+**Symptoms:**
+- User responses are slow when multiple sub-agents are involved.
+
+**Potential causes and mitigations:**
+- Sub-agents running sequentially instead of in parallel
+  → Enable concurrent execution of sub-agents and their tool calls.
+- Master agent acting as a relay without adding value
+  → Allow sub-agents to respond directly to the user when they have complete context.
+
+### Issue: Context Window Overflows or Truncation
+
+**Symptoms:**
+- Model performance degrades or important instructions are lost.
+- You see truncation of earlier messages or tool outputs.
+
+**Potential causes and mitigations:**
+- Master agent accumulating too much context
+  → Move domain-specific instructions and tool outputs into sub-agents.
+  → Use agent handoff to pass only relevant, summarized context.
+- Overly detailed global instructions
+  → Keep global instructions minimal; push domain-specific guidance into specialized agents.
+
+### Issue: Inconsistent or Poorly Formatted Responses
+
+**Symptoms:**
+- Responses vary in structure when multiple sub-agents are involved.
+- Formatting requirements are not consistently met.
+
+**Potential causes and mitigations:**
+- Over-reliance on rigid templates in highly variable scenarios
+  → Relax strict templating where many combinations of routines exist.
+- No clear owner for final formatting
+  → Designate:
+    - Either a master agent to perform final aggregation and formatting, or
+    - A specific sub-agent responsible for formatting when it has all necessary data.
+
+---
+
+Additional information that would improve this guide includes: concrete examples of message flows, specific orchestration frameworks or libraries in use, and sample schemas for agent handoff and tool call results.
+
+---
+*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-think-tank/p1754523843336589)*