Commit c6bcdb5

refactor: use token-based instrumentation API for OpenTelemetry support
This change addresses feedback on the instrumentation interface design. The updated API now uses a token-based approach where on_request_start() returns a token that is passed to on_request_end() and on_error(). This enables instrumenters to maintain state (like OpenTelemetry spans) without external storage or side-channels.

Changes:
- Updated Instrumenter protocol to return token from on_request_start()
- Modified on_request_end() and on_error() to accept token as first parameter
- Updated server.py to capture and pass instrumentation tokens
- Updated all tests to match new API
- Added complete OpenTelemetry example implementation
- Updated documentation with token-based examples

Fixes #421
1 parent 8d79aa0 commit c6bcdb5

5 files changed: +547 -95 lines changed

docs/instrumentation.md

Lines changed: 203 additions & 35 deletions
@@ -2,13 +2,17 @@
 
 The MCP Python SDK provides a pluggable instrumentation interface for monitoring request/response lifecycle. This enables integration with OpenTelemetry, custom metrics, logging frameworks, and other observability tools.
 
+**Related issue**: [#421 - Adding OpenTelemetry to MCP SDK](https://github.com/modelcontextprotocol/python-sdk/issues/421)
+
 ## Overview
 
 The `Instrumenter` protocol defines three hooks:
 
-- `on_request_start`: Called when a request starts processing
-- `on_request_end`: Called when a request completes (successfully or not)
-- `on_error`: Called when an error occurs during request processing
+- `on_request_start`: Called when a request starts processing, **returns a token**
+- `on_request_end`: Called when a request completes, **receives the token**
+- `on_error`: Called when an error occurs, **receives the token**
+
+The token-based design allows instrumenters to maintain state (like OpenTelemetry spans) between `on_request_start` and `on_request_end` without needing external storage or side-channels.
 
 All methods are optional (no-op implementations are valid). Exceptions raised by instrumentation hooks are logged but do not affect request processing.
 
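The hook contract described in this hunk (the start hook returns a token, the other hooks receive it, and hook failures are logged rather than propagated) can be sketched as follows. This is only an illustration under stated assumptions: `run_instrumented` and `_safe_call` are hypothetical names, not the SDK's `server.py` code, and the real call site may differ.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger(__name__)


def _safe_call(hook: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call a hook; log and swallow anything it raises (hypothetical helper)."""
    try:
        return hook(*args, **kwargs)
    except Exception:
        logger.exception("Instrumentation hook %r failed", hook)
        return None


def run_instrumented(
    instrumenter: Any,
    request_id: Any,
    request_type: str,
    handler: Callable[[], Any],
) -> Any:
    """Hypothetical wrapper: start hook, then handler, then error/end hooks."""
    # The token is whatever the start hook returned (None if the hook failed).
    token = _safe_call(instrumenter.on_request_start, request_id, request_type)
    start = time.monotonic()
    success = False
    try:
        result = handler()
        success = True
        return result
    except Exception as exc:
        # The same token is handed to the error hook, then the error propagates.
        _safe_call(instrumenter.on_error, token, request_id, exc, type(exc).__name__)
        raise
    finally:
        # The end hook always runs and receives the same token.
        _safe_call(
            instrumenter.on_request_end,
            token,
            request_id,
            request_type,
            success,
            duration_seconds=time.monotonic() - start,
        )
```

Because a failing `on_request_start` yields `None` in this sketch, instrumenters written against such a wrapper would want to tolerate a `None` token in the later hooks.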
@@ -17,6 +21,7 @@ All methods are optional (no-op implementations are valid). Exceptions raised by
 ### Server-Side Instrumentation
 
 ```python
+from typing import Any
 from mcp.server.lowlevel import Server
 from mcp.shared.instrumentation import Instrumenter
 from mcp.types import RequestId
@@ -30,27 +35,35 @@ class MyInstrumenter:
         request_type: str,
         method: str | None = None,
         **metadata,
-    ) -> None:
+    ) -> Any:
+        """Return a token (any value) to track this request."""
         print(f"Request {request_id} started: {request_type}")
+        # Return a token - can be anything (dict, object, etc.)
+        return {"request_id": request_id, "start_time": time.time()}
 
     def on_request_end(
         self,
+        token: Any,  # Receives the token from on_request_start
         request_id: RequestId,
         request_type: str,
         success: bool,
         duration_seconds: float | None = None,
         **metadata,
     ) -> None:
+        """Process the completed request using the token."""
         status = "succeeded" if success else "failed"
         print(f"Request {request_id} {status} in {duration_seconds:.3f}s")
+        print(f"Token data: {token}")
 
     def on_error(
         self,
+        token: Any,  # Receives the token from on_request_start
         request_id: RequestId | None,
         error: Exception,
         error_type: str,
         **metadata,
     ) -> None:
+        """Handle errors using the token."""
         print(f"Error in request {request_id}: {error_type} - {error}")
 
 # Create server with custom instrumenter
@@ -83,6 +96,43 @@ async with ClientSession(
     # Use session...
 ```
 
+### Why Tokens?
+
+The token-based design solves a key problem: **how do you maintain state between `on_request_start` and `on_request_end`?**
+
+Without tokens, instrumenters would need to use external storage (like a dictionary keyed by `request_id`) to track state:
+
+```python
+# ❌ Old approach - requires external storage
+class OldInstrumenter:
+    def __init__(self):
+        self.spans = {}  # Need to manage this dict
+
+    def on_request_start(self, request_id, ...):
+        span = create_span(...)
+        self.spans[request_id] = span  # Store externally
+
+    def on_request_end(self, request_id, ...):
+        span = self.spans.pop(request_id)  # Retrieve from storage
+        span.end()
+```
+
+With tokens, state passes directly through the SDK:
+
+```python
+# ✅ New approach - token is returned and passed back
+class NewInstrumenter:
+    def on_request_start(self, request_id, ...):
+        span = create_span(...)
+        return span  # Return directly
+
+    def on_request_end(self, token, request_id, ...):
+        span = token  # Receive directly
+        span.end()
+```
+
+This is especially important for OpenTelemetry, where spans need to be kept alive.
+
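The token does not have to be a span. As a minimal sketch of the same idea without OpenTelemetry, the token below is a small dataclass that carries per-request state from `on_request_start` to the later hooks. `RequestToken` and `TimingInstrumenter` are hypothetical names used only for illustration; the hook signatures follow the ones shown in this document.

```python
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RequestToken:
    """Hypothetical token: state that belongs to exactly one request."""

    request_id: Any
    request_type: str
    started_at: float = field(default_factory=time.perf_counter)
    errors: list[str] = field(default_factory=list)


class TimingInstrumenter:
    """Keeps no per-request dict; in-flight state lives in the token."""

    def __init__(self) -> None:
        self.completed: list[tuple[str, bool, float, list[str]]] = []

    def on_request_start(self, request_id, request_type, method=None, **metadata):
        return RequestToken(request_id, request_type)

    def on_request_end(
        self, token, request_id, request_type, success, duration_seconds=None, **metadata
    ) -> None:
        if token is None:
            return
        # Prefer the duration supplied by the caller; otherwise derive it from the token.
        if duration_seconds is None:
            duration_seconds = time.perf_counter() - token.started_at
        self.completed.append((token.request_type, success, duration_seconds, token.errors))

    def on_error(self, token, request_id, error, error_type, **metadata) -> None:
        if token is not None:
            token.errors.append(error_type)
```

Once `on_request_end` returns, nothing in the instrumenter refers to the in-flight state any more, so there is no dictionary entry to forget to remove.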
 ## Metadata
 
 Instrumentation hooks receive metadata via `**metadata` keyword arguments:
@@ -105,42 +155,130 @@ The `request_id` parameter is consistent across all hooks for a given request, a
 
 ## OpenTelemetry Integration
 
-A full OpenTelemetry instrumenter will be provided in a future release or as a separate package. Here's a basic example to get started:
+The token-based instrumentation interface is designed specifically to work well with OpenTelemetry. Here's a complete example:
 
 ```python
+from typing import Any
 from opentelemetry import trace
 from opentelemetry.trace import Status, StatusCode
-
-tracer = trace.get_tracer(__name__)
+from mcp.types import RequestId
 
 class OpenTelemetryInstrumenter:
-    def __init__(self):
-        self.spans = {}
+    """OpenTelemetry implementation of the MCP Instrumenter protocol."""
 
-    def on_request_start(self, request_id, request_type, **metadata):
-        span = tracer.start_span(
-            f"mcp.request.{request_type}",
-            attributes={
-                "mcp.request_id": str(request_id),
-                "mcp.request_type": request_type,
-                **metadata,
-            }
-        )
-        self.spans[request_id] = span
+    def __init__(self, tracer_provider=None):
+        if tracer_provider is None:
+            tracer_provider = trace.get_tracer_provider()
+        self.tracer = tracer_provider.get_tracer("mcp.sdk", "1.0.0")
+
+    def on_request_start(
+        self,
+        request_id: RequestId,
+        request_type: str,
+        method: str | None = None,
+        **metadata: Any,
+    ) -> Any:
+        """Start a new span and return it as the token."""
+        span_name = f"mcp.{request_type}"
+        if method:
+            span_name = f"{span_name}.{method}"
+
+        # Start the span
+        span = self.tracer.start_span(span_name)
+
+        # Set attributes
+        span.set_attribute("mcp.request_id", str(request_id))
+        span.set_attribute("mcp.request_type", request_type)
+        if method:
+            span.set_attribute("mcp.method", method)
+
+        # Add metadata
+        session_type = metadata.get("session_type")
+        if session_type:
+            span.set_attribute("mcp.session_type", session_type)
+
+        # Return span as token
+        return span
 
-    def on_request_end(self, request_id, request_type, success, duration_seconds=None, **metadata):
-        if span := self.spans.pop(request_id, None):
-            if duration_seconds:
-                span.set_attribute("mcp.duration_seconds", duration_seconds)
-            span.set_status(Status(StatusCode.OK if success else StatusCode.ERROR))
-            span.end()
+    def on_request_end(
+        self,
+        token: Any,  # This is the span from on_request_start
+        request_id: RequestId,
+        request_type: str,
+        success: bool,
+        duration_seconds: float | None = None,
+        **metadata: Any,
+    ) -> None:
+        """End the span."""
+        if token is None:
+            return
+
+        span = token
+
+        # Set success attributes
+        span.set_attribute("mcp.success", success)
+        if duration_seconds is not None:
+            span.set_attribute("mcp.duration_seconds", duration_seconds)
+
+        # Set status
+        if success:
+            span.set_status(Status(StatusCode.OK))
+        else:
+            span.set_status(Status(StatusCode.ERROR))
+            error_msg = metadata.get("error")
+            if error_msg:
+                span.set_attribute("mcp.error", str(error_msg))
+
+        # End the span
+        span.end()
 
-    def on_error(self, request_id, error, error_type, **metadata):
-        if span := self.spans.get(request_id):
-            span.record_exception(error)
-            span.set_status(Status(StatusCode.ERROR, str(error)))
+    def on_error(
+        self,
+        token: Any,  # This is the span from on_request_start
+        request_id: RequestId | None,
+        error: Exception,
+        error_type: str,
+        **metadata: Any,
+    ) -> None:
+        """Record error in the span."""
+        if token is None:
+            return
+
+        span = token
+
+        # Record exception
+        span.record_exception(error)
+        span.set_attribute("mcp.error_type", error_type)
+        span.set_attribute("mcp.error_message", str(error))
+
+        # Set error status
+        span.set_status(Status(StatusCode.ERROR, str(error)))
+```
+
+### Full Working Example
+
+A complete working example with OpenTelemetry setup is available in `examples/opentelemetry_instrumentation.py`.
+
+To use it:
+
+```bash
+# Install OpenTelemetry
+pip install opentelemetry-api opentelemetry-sdk
+
+# Run the example
+python examples/opentelemetry_instrumentation.py
 ```
 
+### Key Benefits
+
+The token-based design provides several advantages for OpenTelemetry:
+
+1. **No external storage**: No need to maintain a `spans` dictionary
+2. **Automatic cleanup**: Spans are garbage collected when done
+3. **Thread-safe**: Each request gets its own token
+4. **Context propagation**: Easy to integrate with OpenTelemetry context
+5. **Distributed tracing**: Can be extended to propagate trace context in `_meta`
+
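The "each request gets its own token" point can be illustrated without OpenTelemetry. In the sketch below two simulated requests overlap and deliberately share the same `request_id`; a dictionary keyed by `request_id` would overwrite one entry with the other, whereas each token stays distinct. `CountingInstrumenter` and `fake_request` are hypothetical names, and whether request IDs can actually collide in the SDK (for example across sessions) is an assumption made only for this illustration.

```python
import asyncio
import itertools


class CountingInstrumenter:
    """Every start call returns a fresh token; no dict keyed by request_id."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)
        self.finished: list[tuple[int, int]] = []

    def on_request_start(self, request_id, request_type, method=None, **metadata):
        return {"seq": next(self._seq), "request_id": request_id}

    def on_request_end(
        self, token, request_id, request_type, success, duration_seconds=None, **metadata
    ) -> None:
        self.finished.append((token["seq"], token["request_id"]))


async def fake_request(inst: CountingInstrumenter, request_id: int, delay: float) -> None:
    """Stand-in for request handling: start hook, some awaited work, end hook."""
    token = inst.on_request_start(request_id, "PingRequest")
    await asyncio.sleep(delay)
    inst.on_request_end(token, request_id, "PingRequest", True)


async def main() -> list[tuple[int, int]]:
    inst = CountingInstrumenter()
    # Both requests use request_id 1 on purpose and overlap in time.
    await asyncio.gather(fake_request(inst, 1, 0.02), fake_request(inst, 1, 0.01))
    return inst.finished


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This shows isolation of per-request state only. Any state shared across requests inside an instrumenter (such as counters) still needs its own synchronization if hooks are called from multiple threads.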
 ## Default Behavior
 
 If no instrumenter is provided, a no-op implementation is used automatically. This has minimal overhead and doesn't affect request processing.
@@ -166,7 +304,8 @@ instrumenter = get_default_instrumenter()
 
 ```python
 from collections import defaultdict
-from typing import Dict
+from typing import Any, Dict
+from mcp.types import RequestId
 
 class MetricsInstrumenter:
     """Track request counts and durations."""
@@ -176,14 +315,39 @@ class MetricsInstrumenter:
         self.request_durations: Dict[str, list[float]] = defaultdict(list)
         self.error_counts: Dict[str, int] = defaultdict(int)
 
-    def on_request_start(self, request_id, request_type, **metadata):
+    def on_request_start(
+        self,
+        request_id: RequestId,
+        request_type: str,
+        method: str | None = None,
+        **metadata: Any,
+    ) -> Any:
+        """Track request start, return request_type as token."""
         self.request_counts[request_type] += 1
+        return request_type  # Simple token - just the request type
 
-    def on_request_end(self, request_id, request_type, success, duration_seconds=None, **metadata):
+    def on_request_end(
+        self,
+        token: Any,
+        request_id: RequestId,
+        request_type: str,
+        success: bool,
+        duration_seconds: float | None = None,
+        **metadata: Any,
+    ) -> None:
+        """Track request completion."""
         if duration_seconds is not None:
             self.request_durations[request_type].append(duration_seconds)
 
-    def on_error(self, request_id, error, error_type, **metadata):
+    def on_error(
+        self,
+        token: Any,
+        request_id: RequestId | None,
+        error: Exception,
+        error_type: str,
+        **metadata: Any,
+    ) -> None:
+        """Track errors."""
         self.error_counts[error_type] += 1
 
     def get_stats(self):
@@ -199,10 +363,14 @@ class MetricsInstrumenter:
         return stats
 ```
 
+Note: For this simple metrics case, the token isn't strictly necessary, so we just return the `request_type`. For more complex instrumenters (like OpenTelemetry), the token is essential for maintaining state.
+
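Because a token can be any value, instrumenters with simple tokens (such as the metrics case above) and stateful ones (such as a span-based tracer) could in principle be combined behind one object. The sketch below is one possible way to do that, not something the SDK is stated to provide: `CompositeInstrumenter` is a hypothetical class whose token is simply the list of its children's tokens.

```python
from typing import Any, Sequence


class CompositeInstrumenter:
    """Hypothetical fan-out: forwards each hook to several instrumenters."""

    def __init__(self, children: Sequence[Any]) -> None:
        self.children = list(children)

    def on_request_start(self, request_id, request_type, method=None, **metadata) -> list[Any]:
        # The composite token is one child token per child, in order.
        return [
            child.on_request_start(request_id, request_type, method=method, **metadata)
            for child in self.children
        ]

    def on_request_end(
        self, token, request_id, request_type, success, duration_seconds=None, **metadata
    ) -> None:
        # Hand each child back exactly the token it produced.
        for child, child_token in zip(self.children, token):
            child.on_request_end(
                child_token,
                request_id,
                request_type,
                success,
                duration_seconds=duration_seconds,
                **metadata,
            )

    def on_error(self, token, request_id, error, error_type, **metadata) -> None:
        for child, child_token in zip(self.children, token):
            child.on_error(child_token, request_id, error, error_type, **metadata)
```

This sketch does not isolate failures between children; a production version would likely wrap each child call so that one failing instrumenter does not prevent the others from running.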
 ## Future Work
 
-- Full OpenTelemetry integration as a separate module
-- Additional built-in instrumenters (Prometheus, StatsD, etc.)
+- Package OpenTelemetry instrumenter as a separate installable extra (`pip install mcp[opentelemetry]`)
+- Additional built-in instrumenters (Prometheus, StatsD, Datadog, etc.)
+- Support for distributed tracing via `params._meta.traceparent` propagation (see [modelcontextprotocol/spec#414](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/414))
+- Semantic conventions for MCP traces and metrics (see [open-telemetry/semantic-conventions#2083](https://github.com/open-telemetry/semantic-conventions/pull/2083))
 - Client-side request instrumentation
 - Async hook support for long-running instrumentation operations
 