From 130e9e2136a794edfdf9ab4d2073e2c307e9f60d Mon Sep 17 00:00:00 2001
From: yenchiafeng <yenchia@distyl.ai>
Date: Thu, 11 Dec 2025 09:40:47 -0800
Subject: [PATCH] docs: Add How to Design Agent Architectures with Parallel
 Tool Calls and Sub-Agent Handoffs

---
 ...rchitectures-with-parallel-tool-calls-a.md | 148 ++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md

diff --git a/docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md b/docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md
new file mode 100644
index 0000000..18e89ad
--- /dev/null
+++ b/docs/access/how-to-design-agent-architectures-with-parallel-tool-calls-a.md
@@ -0,0 +1,148 @@
+# How to Design Agent Architectures with Parallel Tool Calls and Sub-Agent Handoffs
+
+## Overview
+
+This guide explains architectural considerations for using parallel tool calls, sub-agents, and agent handoffs in a multi-agent system. It focuses on how to share context between workflows, minimize latency, and decide which agent should generate the final user-facing response.
+
+## Prerequisites
+
+Before applying this guide, you should:
+
+- Understand basic multi-agent system concepts (master/parent agent vs. sub-agents).
+- Be familiar with:
+  - Tool calls (external API or function calls made by agents).
+  - Context windows and their limitations in large language models.
+  - Agent handoff patterns (passing control and context from one agent to another).
+- Have an existing or planned architecture where:
+  - A “master” or orchestrator agent coordinates work.
+  - Specialized sub-agents (for example, “billing” or “device” agents) handle domain-specific tasks.
+
+## Architectural Explanation
+
+### 1. Use Parallel Tool Calls to Reduce Latency
+
+- Design sub-agents to be small and lean so they can:
+  - Run concurrently.
+  - Perform their tool calls in parallel.
+- When sub-agents are lightweight and parallelized:
+  - End-to-end latency approaches that of the slowest sub-agent rather than the sum of all sub-agents.
+  - The system can gather required contextual information faster.
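+
+The fan-out described above can be sketched with Python's `asyncio`. This is a minimal illustration under stated assumptions, not a reference implementation: `fetch_billing` and `fetch_device` are hypothetical stand-ins for real sub-agent tool calls, and the sleeps only simulate I/O latency.
+
+```python
+import asyncio
+import time
+
+
+async def fetch_billing(account_id: str) -> dict:
+    # Hypothetical billing tool call; the sleep stands in for network latency.
+    await asyncio.sleep(0.2)
+    return {"agent": "billing", "account_id": account_id, "balance_due": 42.5}
+
+
+async def fetch_device(account_id: str) -> dict:
+    # Hypothetical device tool call; also simulated with a sleep.
+    await asyncio.sleep(0.2)
+    return {"agent": "device", "account_id": account_id, "status": "online"}
+
+
+async def gather_context(account_id: str) -> list:
+    # Both calls run concurrently; gather returns results in argument order.
+    return await asyncio.gather(fetch_billing(account_id), fetch_device(account_id))
+
+
+def run_demo() -> tuple:
+    # Returns the collected results and the wall-clock time the fan-out took.
+    start = time.perf_counter()
+    results = asyncio.run(gather_context("acct-123"))
+    return results, time.perf_counter() - start
+```
+
+Because the two simulated calls overlap, `run_demo()` should take roughly as long as one call rather than two. Real tool calls only benefit in the same way if they are asynchronous or otherwise non-blocking.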
+
+### 2. Share Context Between Workflows via Agent Handoff
+
+- When re-triaging or moving between workflows:
+  - Use agent handoff to pass relevant context from one agent to another.
+  - Ensure that the receiving agent has:
+    - The necessary tool outputs.
+    - Any user instructions or constraints needed to complete the task.
+- Context sharing reduces the need for a single master agent to hold all instructions and data in its own context window.
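+
+One possible shape for a handoff payload is sketched below. The `Handoff` class, its field names, and the tool names used later are all hypothetical; the point is only that the sender filters tool outputs down to what the receiving agent needs before passing control.
+
+```python
+import json
+from dataclasses import asdict, dataclass, field
+
+
+@dataclass
+class Handoff:
+    # Hypothetical handoff envelope; the field names are illustrative only.
+    target_agent: str
+    user_request: str
+    constraints: list = field(default_factory=list)
+    tool_outputs: dict = field(default_factory=dict)
+
+
+def build_handoff(target_agent, user_request, constraints, all_tool_outputs, relevant_tools):
+    # Forward only the tool outputs the receiving agent actually needs.
+    selected = {
+        name: output
+        for name, output in all_tool_outputs.items()
+        if name in relevant_tools
+    }
+    return Handoff(target_agent, user_request, list(constraints), selected)
+
+
+def serialize(handoff: Handoff) -> str:
+    # JSON is one option for moving the payload between agents or processes.
+    return json.dumps(asdict(handoff), sort_keys=True)
+```
+
+For example, `build_handoff("billing", "Why is my bill higher?", ["Reply in English"], outputs, {"get_invoice"})` would carry the invoice lookup forward and leave unrelated device data behind.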
+
+### 3. Let Sub-Agents Respond Directly When Possible
+
+When contextual information comes from tool calls:
+
+- Prefer having the specialized sub-agent (for example, billing or device agent) generate the user-facing answer immediately after obtaining the tool results.
+- This avoids two major issues:
+  1. **Context window pressure on the master agent**  
+     - The master agent would otherwise need:
+       - All instructions on how to answer.
+       - All extracted information from tools.
+     - This can exceed or strain the model’s context window.
+  2. **Additional latency from relaying**  
+     - If the master agent only serves as a relay between sub-agent and user, it:
+       - Adds an extra model invocation.
+       - Increases end-to-end latency without adding real value.
+
+### 4. Decide When a Master Agent Should Aggregate Multiple Sub-Agent Results
+
+There are edge cases where:
+
+- Information from multiple sub-agents or routines is required to form a single answer.
+- The final answer must follow a specific, unified format.
+
+In these cases:
+
+- The master agent may need to:
+  - Aggregate outputs from multiple sub-agents.
+  - Apply formatting or composition logic.
+- However, this should be reserved for situations where:
+  - A single sub-agent cannot reasonably produce the complete, correctly formatted answer.
+  - There is a clear, consistent need for cross-domain aggregation.
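+
+The decision between a direct sub-agent reply and master-side aggregation can be sketched as a small routing function. This is an assumption-laden illustration: `route_response` is a hypothetical helper, and the string join stands in for whatever composition step (often another model call) the master agent would really perform.
+
+```python
+def route_response(sub_agent_results: dict) -> tuple:
+    """Return (responder, text) for a mapping of agent name to drafted answer."""
+    if not sub_agent_results:
+        raise ValueError("at least one sub-agent result is required")
+    if len(sub_agent_results) == 1:
+        # Single domain: pass the sub-agent's answer through with no extra hop.
+        ((agent, text),) = sub_agent_results.items()
+        return agent, text
+    # Multiple domains: the master composes one answer in a unified format.
+    sections = [
+        f"{agent.title()}: {text}"
+        for agent, text in sorted(sub_agent_results.items())
+    ]
+    return "master", "\n".join(sections)
+```
+
+Keeping the single-domain path free of any master-side rewriting is what avoids the extra model invocation described in section 3.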
+
+### 5. Avoid Over-Enforcing Templates in Complex Compositions
+
+- When many different combinations of routines and sub-agents can be involved:
+  - It is not practical to enforce rigid templates for every possible case.
+- Instead:
+  - Use templates sparingly and only where they provide clear value.
+  - Allow agents some flexibility in how they compose responses, especially when multiple domains are involved.
+
+## Important Notes and Caveats
+
+- **Context window limitations**:  
+  - Centralizing all instructions and tool outputs in a master agent can quickly exhaust the model’s context window.
+  - Offloading response generation to specialized sub-agents helps keep context usage localized and manageable.
+
+- **Latency trade-offs**:  
+  - Each additional agent hop (master → sub-agent → master → user) adds latency.
+  - Design flows so that:
+    - Sub-agents respond directly to the user when they have all needed information.
+    - The master agent is used primarily for orchestration and cross-agent coordination, not as a passive relay.
+
+- **Complex formatting requirements**:  
+  - For answers that must combine multiple data sources and follow strict formatting, a master agent may be necessary.
+  - Do not attempt to define exhaustive templates for every possible multi-agent combination; this does not scale.
+
+- **Missing implementation details**:  
+  - The discussion this guide is drawn from does not specify:
+    - Exact APIs or frameworks for agent orchestration.
+    - How context is serialized and passed between agents.
+    - Concrete examples of message schemas or protocol definitions.
+  - These details must be defined based on your specific platform or orchestration layer.
+
+## Troubleshooting Tips
+
+### Issue: High Latency in Multi-Agent Flows
+
+**Symptoms:**
+- User responses are slow when multiple sub-agents are involved.
+
+**Potential causes and mitigations:**
+- Sub-agents running sequentially instead of in parallel  
+  → Enable concurrent execution of sub-agents and their tool calls.
+- Master agent acting as a relay without adding value  
+  → Allow sub-agents to respond directly to the user when they have complete context.
+
+### Issue: Context Window Overflows or Truncation
+
+**Symptoms:**
+- Response quality degrades, or the model stops following instructions given earlier in the conversation.
+- You see truncation of earlier messages or tool outputs.
+
+**Potential causes and mitigations:**
+- Master agent accumulating too much context  
+  → Move domain-specific instructions and tool outputs into sub-agents.
+  → Use agent handoff to pass only relevant, summarized context.
+- Overly detailed global instructions  
+  → Keep global instructions minimal; push domain-specific guidance into specialized agents.
+
+### Issue: Inconsistent or Poorly Formatted Responses
+
+**Symptoms:**
+- Responses vary in structure when multiple sub-agents are involved.
+- Formatting requirements are not consistently met.
+
+**Potential causes and mitigations:**
+- Over-reliance on rigid templates in highly variable scenarios  
+  → Relax strict templating where many combinations of routines exist.
+- No clear owner for final formatting  
+  → Designate:
+    - Either a master agent to perform final aggregation and formatting, or
+    - A specific sub-agent responsible for formatting when it has all necessary data.
+
+---
+
+Additional information that would improve this guide includes: concrete examples of message flows, specific orchestration frameworks or libraries in use, and sample schemas for agent handoff and tool call results.
+
+---
+*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-think-tank/p1754523843336589)*