# How to Understand Token Usage and Planned Token Breakdown by Tool Call

## Overview

This guide explains the current state of token usage reporting (for evaluation and production) and outlines the planned enhancements to break down tokens by logical Large Language Model (LLM) call types (for example, triage, device lookup, general information, guardrails). It also clarifies current limitations and prioritization considerations.

## Prerequisites

- Access to the existing token usage metrics or dashboards for your environment.
- Familiarity with:
  - LLM model names currently in use (for example, `gpt-4o`, `gpt-4o-mini`).
  - The concept of tool calls (for example, triage, device lookup, general information, guardrails).
- Permissions to view conversation-level or model-level usage data (as configured in your organization).

> Note: No specific commands, URLs, or file paths were provided in the original discussion. Consult your internal documentation or platform owner for the exact location of your token usage dashboards or logs.

## Current Behavior and How to Interpret It

### 1. Token Breakdown by Model

At present, token usage is broken down by model name per conversation. This means you can see, for each conversation:

- The total number of tokens consumed by each model (for example, `gpt-4o`, `gpt-4o-mini`).
- A rough separation between:
  - **Guardrails/system prompts** (often associated with one model), and
  - **Other application logic or tool calls** (often associated with another model).

Because different models are used for different roles, you can infer a high-level distribution of token usage:

- **`gpt-4o`**: Typically used for primary reasoning and tool calls.
- **`gpt-4o-mini`**: Often used for lighter-weight tasks such as guardrails or system prompts (though usage may vary over time; for example, it may not have been used in the staging environment in the past month).
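
To make the model-level view concrete, here is a minimal sketch that aggregates total tokens per (conversation, model) pair. The record fields (`conversation_id`, `model_name`, `prompt_tokens`, `completion_tokens`) and the sample values are hypothetical, since the original discussion did not specify a log schema.

```python
from collections import defaultdict

# Hypothetical usage records; the field names and values are illustrative,
# not an actual log schema.
usage_log = [
    {"conversation_id": "c1", "model_name": "gpt-4o", "prompt_tokens": 1200, "completion_tokens": 300},
    {"conversation_id": "c1", "model_name": "gpt-4o-mini", "prompt_tokens": 400, "completion_tokens": 50},
    {"conversation_id": "c2", "model_name": "gpt-4o", "prompt_tokens": 900, "completion_tokens": 250},
]

# Aggregate total tokens per (conversation, model), mirroring the
# per-conversation, per-model breakdown described above.
totals = defaultdict(int)
for record in usage_log:
    key = (record["conversation_id"], record["model_name"])
    totals[key] += record["prompt_tokens"] + record["completion_tokens"]

for (conversation_id, model_name), tokens in sorted(totals.items()):
    print(f"{conversation_id}  {model_name:<12}  {tokens} tokens")
```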

### 2. No Direct Token Breakdown by Tool Call (Yet)

The system does **not yet** provide a direct token breakdown by specific tool call types such as:

- Triage
- Device lookup
- General information
- Guardrails

Instead, you currently rely on:

- **Model-level aggregation** (tokens per `model_name` per conversation).
- Internal knowledge of which model is used for which function to approximate where tokens are being spent.
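
The sketch below illustrates this approximation: it folds model-level totals into functional roles using an assumed model-to-role mapping. The mapping itself is a guess for illustration and must be confirmed against your current configuration (see the caveats below).

```python
# Assumed mapping from model to functional role, based on the typical
# usage described above; verify against your current configuration.
MODEL_TO_ROLE = {
    "gpt-4o": "primary reasoning / tool calls",
    "gpt-4o-mini": "guardrails / system prompts",
}

def approximate_role_breakdown(per_model_totals: dict[str, int]) -> dict[str, int]:
    """Fold per-model token totals into approximate functional roles."""
    breakdown: dict[str, int] = {}
    for model_name, tokens in per_model_totals.items():
        role = MODEL_TO_ROLE.get(model_name, "unknown")
        breakdown[role] = breakdown.get(role, 0) + tokens
    return breakdown

# Example with made-up totals for a single conversation.
print(approximate_role_breakdown({"gpt-4o": 1450, "gpt-4o-mini": 450}))
# {'primary reasoning / tool calls': 1450, 'guardrails / system prompts': 450}
```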

### 3. Direction and Planned Enhancements

The long-term goal is to provide a **token breakdown by tool call type**, enabling you to:

- See token usage per logical step (for example, triage vs device lookup).
- Better understand where to focus optimization efforts for token reduction.
- More accurately attribute costs to specific parts of your workflow.

The team is taking **incremental steps** toward this goal, starting with:

- Total tokens per `model_name` per conversation (already available).
- Future work to:
  - Attribute tokens to specific tool calls.
  - Provide more granular reporting for evaluation and production use cases.
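
As an illustration of the planned direction (not the team's actual design), a per-tool-call breakdown requires each usage record to carry the tool call that produced it. The sketch below shows one possible record shape and how per-step totals would then reduce to a simple group-by; all names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ToolCallUsage:
    """One possible record shape for per-tool-call attribution (hypothetical)."""
    conversation_id: str
    tool_call: str        # e.g. "triage", "device_lookup", "general_info", "guardrails"
    model_name: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

# Made-up records for a single conversation.
records = [
    ToolCallUsage("c1", "triage", "gpt-4o", 500, 80),
    ToolCallUsage("c1", "device_lookup", "gpt-4o", 700, 220),
    ToolCallUsage("c1", "guardrails", "gpt-4o-mini", 400, 50),
]

# Once tool_call is on every record, per-step totals are a simple group-by.
per_tool: dict[str, int] = {}
for r in records:
    per_tool[r.tool_call] = per_tool.get(r.tool_call, 0) + r.total_tokens
print(per_tool)  # {'triage': 580, 'device_lookup': 920, 'guardrails': 450}
```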

## Important Notes and Caveats

- **Approximation only**: Current model-based breakdown is an approximation of functional usage (for example, guardrails vs system prompts vs tools). It is not a precise per-tool-call breakdown.
- **Model usage can change over time**: For example, `gpt-4o-mini` may not always be used in the same way (or at all) in certain environments or time periods. Do not assume a fixed mapping between model and function without confirming current configuration.
- **Prioritization trade-offs**: Implementing token breakdown by tool call is a desired feature, but it must be prioritized against other work, such as:
  - Fixing bugs (for example, in the “intel-bot” or similar systems).
  - Other reliability or performance improvements.

## Troubleshooting and Open Questions

Because the original discussion did not include implementation details, some information is missing for a complete “how-to”:

1. **Where to view token metrics**
   - Missing information:
     - Exact dashboard URL or analytics tool.
     - Any required query, filter, or report name.
   - Action:
     - Contact your platform owner or analytics team to locate the current token usage dashboard.

2. **How models map to specific tool calls**
   - Missing information:
     - Configuration or code that defines which model is used for triage, device lookup, general information, and guardrails.
   - Action:
     - Review your application’s orchestration layer or configuration.
     - Document which model is used for each tool call type to better interpret current metrics (see the sketch after this list).

3. **Timeline for per-tool-call token breakdown**
   - Missing information:
     - Roadmap, milestones, or expected delivery date for this feature.
   - Action:
     - Align with the owning team on:
       - Priority relative to other work (for example, bug fixing on intel-bot).
       - Expected scope of the first version (for example, per-tool-call totals vs detailed per-step logs).

4. **If token breakdown by tool call appears incorrect or missing**
   - Confirm whether the feature is actually available in your environment (it may still be in planning or early development).
   - Verify that:
     - Tool calls are being logged with sufficient metadata (tool name, model used, timestamps); a minimal completeness check is sketched after this list.
     - Any analytics pipeline is correctly aggregating tokens by tool call.
   - Escalate with:
     - Example conversation IDs.
     - Expected vs observed token counts.
     - The models and tools involved.
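
For items 2 and 4 above, the following sketch shows one hypothetical way to document the model used per tool call type, plus a minimal completeness check for usage records. The mapping, role assignments, and field names are assumptions for illustration; the real configuration lives in your orchestration layer.

```python
# Hypothetical documentation of which model serves each tool call type.
# Confirm these assignments against your orchestration layer before use.
TOOL_CALL_MODELS = {
    "triage": "gpt-4o",
    "device_lookup": "gpt-4o",
    "general_info": "gpt-4o",
    "guardrails": "gpt-4o-mini",
}

# Fields a usage record should carry before a per-tool-call breakdown
# (or an escalation) is meaningful. Field names are assumed, not a real schema.
REQUIRED_FIELDS = {
    "conversation_id", "tool_name", "model_name",
    "prompt_tokens", "completion_tokens", "timestamp",
}

def missing_metadata(record: dict) -> set[str]:
    """Return any required fields missing from a usage record."""
    return REQUIRED_FIELDS - record.keys()

# Example: flag incomplete records before escalating.
sample = {"conversation_id": "c1", "tool_name": "triage", "model_name": "gpt-4o"}
print(missing_metadata(sample))  # {'prompt_tokens', 'completion_tokens', 'timestamp'}
```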

---

For a fully actionable guide, additional details are needed on the exact metrics interface, how models are configured per tool call, and the implementation plan for per-tool-call token breakdown. Once those are available, this document can be extended with concrete steps and screenshots.

---
*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-infobot/p1740094728583309)*