# How Triage Preemptive Tool Calls Resolve Arguments for `device_lookup`

## Overview

This guide explains how the triage preemptive tool call determines the arguments for the `device_lookup` function and how it uses the function's signature and docstring. It also covers known issues and caveats, particularly around category values such as `smartphones` vs. `phones`.

## Prerequisites

Before using or modifying the triage preemptive tool call behavior, you should:

- Be familiar with:
  - Python function signatures and docstrings (`__doc__`).
  - How tools are registered and exposed to the triage/infobot system (for example, via `TOOLS_BY_ROUTINE` or a similar registry).
  - The initialization flow of:
    - `KnowledgeBase.init`
    - The FastAPI application startup
    - Swarm (or an equivalent orchestration layer)
- Have read access to the relevant code, including:
  - `libraries/nectar/nectar/util.py` (specifically around line 224).
  - The `infobot_tools` module.
  - `engine.py`, where `KnowledgeBase.init` is invoked.

## How Argument Resolution Works

### 1. Argument Extraction from Function Signature

The triage preemptive tool call derives the arguments for `device_lookup` primarily from the function's Python signature, not from its docstring.

- The tool registration logic inspects the function object for:
  - Parameter names
  - Parameter types (if annotated)
  - Default values
- These are then converted into the tool schema (for example, via a helper such as `function_to_json` or equivalent logic in `nectar/util.py`).
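
The signature-to-schema step can be sketched with Python's standard `inspect` module. This is not the code in `nectar/util.py`. The `function_to_json` below, its type mapping, and the `device_lookup(brand, category, limit)` signature are all hypothetical. They are chosen only to show that parameter names, types, and defaults come from the signature, while the description comes from the docstring.

```python
import inspect
from typing import Any, Callable

# Hypothetical mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_json(func: Callable[..., Any]) -> dict:
    """Build a tool schema from a function's signature (illustrative sketch)."""
    sig = inspect.signature(func)
    properties: dict = {}
    required: list = []
    for name, param in sig.parameters.items():
        # Unannotated parameters fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            properties[name]["default"] = param.default
    return {
        "name": func.__name__,
        # The description is the only part that depends on the docstring.
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def device_lookup(brand: str, category: str, limit: int = 5):
    ...  # Hypothetical signature; deliberately no docstring.


schema = function_to_json(device_lookup)
print(sorted(schema["parameters"]["properties"]))  # ['brand', 'category', 'limit']
print(schema["parameters"]["required"])  # ['brand', 'category']
print(repr(schema["description"]))  # ''
```

Even with no docstring, every argument name, type, and default still reaches the schema. Only the description is empty.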

**Implication:** Even if the docstring is not yet set or is empty, the argument list itself is still derived correctly from the function signature.

### 2. Use of Docstrings for Descriptions and Semantic Hints

The docstring of `device_lookup` provides:

- Descriptions of the function and its parameters.
- Semantic hints such as:
  - Valid categories (for example, `phones` rather than `smartphones`).
  - Brand or category filtering behavior.

The current behavior is:

- **Signature** → source of argument names and structure.
- **Docstring** → source of argument descriptions and domain-specific hints (such as which category labels to use).

This explains why the system can still call `device_lookup` with the correct argument names but choose the wrong category value (for example, `smartphones` instead of `phones`) when the docstring is unavailable or not yet initialized.

### 3. Initialization Order and Docstring Assignment

The tool schema depends critically on when the docstring for `device_lookup` is set relative to when the knowledge base and tools are initialized:

- In one implementation:
  - `__doc__` for `device_lookup` is set at the **top level** of the module.
  - The docstring is therefore available as soon as the module is imported.
  - By the time the function is added to `TOOLS_BY_ROUTINE` (or similar), the docstring is already populated.
- In the `infobot_tools` implementation:
  - `__doc__` is set **inside an `init` function**, not at the top level.
  - `KnowledgeBase.init` is called from `engine.py` when FastAPI starts, before Swarm is initialized.
  - If `KnowledgeBase.init` (or any tool registration logic) runs **before** `init` in `infobot_tools` is called, then:
    - `device_lookup` is registered with an **empty or default docstring**.
    - Argument descriptions and category hints from the docstring are not available at registration time.
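
The ordering problem can be reproduced in a few lines. In this sketch, `register_tool` stands in for whatever `KnowledgeBase.init` does when it adds a tool to `TOOLS_BY_ROUTINE`. The helper, the registry's shape (routine name mapped to a list of schema dicts), and the docstring text are all assumptions. The point it illustrates is that the schema copies `__doc__` once, at registration time, so assigning `__doc__` afterwards does not change what was registered.

```python
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the real registry: routine name -> list of tool schemas.
TOOLS_BY_ROUTINE: Dict[str, List[dict]] = {}


def register_tool(routine: str, func: Callable[..., Any]) -> None:
    """Copy the function's metadata into a schema at registration time (hypothetical helper)."""
    TOOLS_BY_ROUTINE.setdefault(routine, []).append(
        {
            "name": func.__name__,
            "parameters": list(inspect.signature(func).parameters),
            # The docstring is read once, here; later changes to __doc__ are not seen.
            "description": inspect.getdoc(func) or "",
        }
    )


def device_lookup(brand: str, category: str):
    pass  # No docstring at import time, as in the infobot_tools layout.


def init() -> None:
    """Mirror assigning __doc__ inside an init function (docstring text is illustrative)."""
    device_lookup.__doc__ = "Look up devices. category: use 'phones', not 'smartphones'."


# Problematic order: registration (KnowledgeBase.init) runs before init().
register_tool("triage", device_lookup)
init()

stale = TOOLS_BY_ROUTINE["triage"][0]
print(stale["parameters"])  # ['brand', 'category']
print(repr(stale["description"]))  # ''

# Registering after init() picks up the docstring.
register_tool("triage_after_init", device_lookup)
print("phones" in TOOLS_BY_ROUTINE["triage_after_init"][0]["description"])  # True
```

The stale entry has the correct argument names but no category hint. This matches the `smartphones` vs. `phones` symptom.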

This ordering issue is the likely cause of the incorrect category choice (for example, `smartphones` instead of `phones`): the tool schema is built before the docstring-based metadata is applied.

## Important Notes and Caveats

1. **Docstring Timing Matters**
   The docstring must be set **before** the function is registered as a tool. If `__doc__` is assigned inside an initialization function that runs later, the tool registry will not see the updated docstring.

2. **Function Imports Do Not Re-Apply Docstrings**
   Importing `device_lookup` from `infobot_tools` executes only the module's top-level code, and only on first import. If the docstring is set inside `init` and `init` has not been called before tool registration, the docstring is still empty when the tool is registered.

3. **Differences Between Implementations**
   There are known differences between:
   - The “core” or reference implementation (for example, in `nectar/util.py`).
   - The `infobot_tools` implementation.

   These differences include:
   - Where and when `__doc__` is set.
   - How and when the knowledge base is initialized.
   - Possibly, whether category/brand filtering is implemented in the same way.

4. **Recent Hotfixes May Not Be Final**
   A recent change tried to address this issue by ensuring `KnowledgeBase.init` runs at FastAPI startup and/or adjusting when docstrings are set. Its author expressed low confidence that this is a robust, long-term fix. A better abstraction for tools is desired.

5. **Enum-Based Categories as a Future Improvement**
   An “enum trick” was suggested as a potential solution. Using enumerations for categories would:
   - Make valid category values explicit.
   - Reduce reliance on docstring parsing for category semantics.
   - Make category selection more robust (for example, enforcing `phones` instead of `smartphones`).
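
One possible reading of the “enum trick” is to annotate the category parameter with an `Enum` and have the schema builder emit the allowed values. This sketch assumes that interpretation. `DeviceCategory`, its members (`tablets` is only a placeholder; the definitive category list is not documented here), and `param_schema` are all hypothetical.

```python
import enum
import inspect
from typing import Any


class DeviceCategory(str, enum.Enum):
    # Hypothetical category set; replace with the real, agreed-upon values.
    PHONES = "phones"
    TABLETS = "tablets"


def param_schema(annotation: Any) -> dict:
    """Map an annotation to a JSON Schema fragment; Enum types become explicit value lists."""
    if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
        return {"type": "string", "enum": [member.value for member in annotation]}
    return {"type": "string"}


def device_lookup(brand: str, category: DeviceCategory):
    pass  # Hypothetical signature; the allowed categories now live in the annotation.


params = inspect.signature(device_lookup).parameters
category_schema = param_schema(params["category"].annotation)
print(category_schema)  # {'type': 'string', 'enum': ['phones', 'tablets']}

# Converting the raw argument rejects values outside the enum.
try:
    DeviceCategory("smartphones")
except ValueError as exc:
    print(exc)
```

With the allowed values in the schema itself, the constraint no longer depends on when `__doc__` is assigned. Converting the incoming argument with `DeviceCategory(value)` also rejects `smartphones` with a `ValueError` instead of silently passing it through.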

## Troubleshooting

### Symptom: Category is `smartphones` Instead of `phones`

**Observed issue**
The triage tool call produces a category of `smartphones` instead of the expected `phones`, even though the correct category information is present in the `device_lookup` docstring.

**Likely cause**
The docstring for `device_lookup` is not yet set when the tool is registered, because of the initialization order:

- `KnowledgeBase.init` (or equivalent tool registration) runs.
- `device_lookup` is added to `TOOLS_BY_ROUTINE` while its `__doc__` is empty.
- Later, `init` in `infobot_tools` sets the docstring, but the tool schema has already been built.

**Steps to diagnose**

1. **Check where `__doc__` is set for `device_lookup`**
   - Confirm whether `__doc__` is assigned at the top level of the module or inside an `init` function.
   - If it is inside `init`, it may be set too late for tool registration.

2. **Verify initialization order**
   - Confirm when `KnowledgeBase.init` is called in `engine.py` relative to:
     - FastAPI application startup.
     - Any `init` function in `infobot_tools` that sets docstrings.
   - Ensure that the function that sets `__doc__` runs **before** `KnowledgeBase.init`.

3. **Inspect the tool registry at runtime**
   - After FastAPI starts and before any requests are served, inspect `TOOLS_BY_ROUTINE` (or equivalent) to check:
     - Whether `device_lookup` is present.
     - What description and argument metadata it has.
   - Confirm whether the category hints from the docstring appear in the tool metadata.

4. **Compare with the reference implementation**
   - Review the logic around line 224 in `libraries/nectar/nectar/util.py` to see:
     - How function signatures and docstrings are converted into tool schemas.
     - Whether your `infobot_tools` implementation diverges in a way that affects docstring usage.
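
For step 3, a small check can flag registered `device_lookup` schemas whose description is empty or lacks the expected category hint. It assumes the registry maps each routine name to a list of schema dicts with `name` and `description` keys. If `TOOLS_BY_ROUTINE` actually stores functions, build their schemas first. The helper name and the `phones` default are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def find_tools_missing_hints(
    tools_by_routine: Dict[str, Iterable[dict]],
    tool_name: str = "device_lookup",
    required_hint: str = "phones",
) -> List[Tuple[str, str]]:
    """Report schemas for `tool_name` whose description is empty or lacks `required_hint`."""
    problems: List[Tuple[str, str]] = []
    for routine, schemas in tools_by_routine.items():
        for schema in schemas:
            if schema.get("name") != tool_name:
                continue  # Only check the tool under investigation.
            description = schema.get("description") or ""
            if not description.strip():
                problems.append((routine, "empty description"))
            elif required_hint not in description:
                problems.append((routine, f"description does not mention {required_hint!r}"))
    return problems


# Example registry shaped like the assumption above (contents are illustrative).
registry = {
    "triage": [{"name": "device_lookup", "description": ""}],
    "infobot": [{"name": "device_lookup", "description": "category: use 'phones'"}],
}
print(find_tools_missing_hints(registry))  # [('triage', 'empty description')]
```

Run this right after startup, before any requests are served, and log the result. An ordering regression then shows up as a warning instead of as a wrong category in production.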

**Potential fixes**

- **Move docstring assignment to top level**
  Set `device_lookup.__doc__` at module import time, not inside `init`, so it is always available when the function is imported and registered.

- **Ensure `init` runs before tool registration**
  If you must keep docstring assignment inside `init`, call `init` before `KnowledgeBase.init` or any other tool registration logic.

- **Introduce explicit enums for categories**
  Replace or supplement docstring-based category hints with explicit enumerations in the function signature or tool schema. This reduces reliance on docstring timing and parsing.

## Additional Information Needed

To fully document and validate this behavior, the following would be helpful:

- The exact implementation of:
  - The function that converts Python functions into tool schemas (for example, `function_to_json`).
  - The `KnowledgeBase.init` method and how it registers tools.
  - The `infobot_tools` `init` function and where `device_lookup.__doc__` is set.
- A definitive list of valid categories and how they are intended to be enforced (for example, via enums, constants, or docstring conventions).
- Confirmation of the final, agreed-upon abstraction for tools (for example, a standardized way to specify argument descriptions and allowed values outside of docstrings).

---
*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-infobot/p1737151724894869)*