
Conversation


@galkleinman galkleinman commented Jan 15, 2026

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

  • Breaking Changes

    • Minimum Python requirement raised to >=3.10 across packages (>=3.12 for LlamaIndex instrumentation).
  • New Features

    • Spans now include tool-invocation details.
    • More robust LLM metadata extraction (model and token usage) across formats.
  • Updates

    • Dependency ecosystem bumped (LlamaIndex, LangChain and related).
    • Expanded/modernized test cassettes; many tests converted to async and updated model defaults.



coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Raised per-package Python minimums (mostly to 3.10; one llamaindex toolchain to 3.12), updated many pyproject test/dependency ranges and added [tool.uv] constraint blocks, migrated LangChain imports, converted llamaindex tests to async/updated APIs, refreshed many VCR cassettes, and enhanced span handling and LLM response parsing.

Changes

Cohort / File(s) / Summary

• Python & tooling bumps
  Files: packages/*/.python-version, packages/*/pyproject.toml, packages/opentelemetry-semantic-conventions-ai/.python-version
  Bump .python-version pins (mostly 3.9.5 → 3.10; one llamaindex → 3.12); raise requires-python in many pyproject.toml files; add [tool.uv] constraint-dependencies; bump test tooling (notably vcrpy → >=8.0.0).

• LlamaIndex instrumentation core
  Files: packages/opentelemetry-instrumentation-llamaindex/.../dispatcher_wrapper.py, .../span_utils.py
  Added workflow tool-call handling to span creation (extracting tool.name and tool.arguments for WorkflowToolCall) and reworked set_llm_chat_response_model_attributes to robustly extract model and token usage across multiple response formats and fallback shapes.

• LlamaIndex tests & fixtures
  Files: packages/opentelemetry-instrumentation-llamaindex/tests/*.py, packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/...
  Converted tests to async (@pytest.mark.asyncio), updated ReActAgent/LLM/Cohere signatures and models, relaxed span assertions; massively expanded/refreshed VCR cassettes (tool-call permutations, upload/poll/result flows, HTTPX UA/version changes).

• LangChain tests & cassettes
  Files: packages/opentelemetry-instrumentation-langchain/tests/*.py, packages/opentelemetry-instrumentation-langchain/tests/cassettes/...
  Repointed imports to langchain_core/langchain_classic/langchain_community; updated model strings; made assertions log/field-based; updated many cassettes to Cohere v2 SSE/event-stream formats and changed request/response formats.

• Sample app updates
  Files: packages/sample-app/pyproject.toml, packages/sample-app/sample_app/...
  Upgraded llama-index/langchain ecosystem deps, adjusted the dependencies list, and fixed the import path for HuggingFaceInferenceAPI → llama_index.llms.huggingface_api.

• Widespread cassette refreshes
  Files: packages/*/tests/cassettes/...
  Large number of fixture updates: header/timing/trace metadata, gzip payload substitutions, SSE/event-stream transitions, and client UA/version changes — primarily recorded test data modifications.

• Misc imports & test adjustments
  Files: packages/*/tests/*.py, packages/sample-app/sample_app/*.py
  Multiple import-path replacements (langchain → langchain_core/langchain_classic/langchain_community), minor test renames/refocusing (e.g., llamaparse parse/aparse presence), and relaxed assertion shapes.

• Individual package pyproject additions
  Files: packages/opentelemetry-instrumentation-*/pyproject.toml (many)
  Added small per-package pyproject.toml entries (mostly [tool.uv]) and per-package minor dependency/test-version bumps reflecting the global toolchain change.
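The "robust extraction" called out for span_utils.py above can be sketched roughly as follows. The field names and fallback shapes here are assumptions drawn from common LLM client response formats, not the actual span_utils.py implementation:

```python
# Hedged sketch of extracting model name and token usage across several
# response shapes: a dict-style `raw` payload, an attribute-style raw
# object, and usage stored under either `usage` or `usage_metadata`.
# All key names are assumptions for illustration.
def extract_model_and_usage(response):
    raw = getattr(response, "raw", None) or {}
    if not isinstance(raw, dict):
        # Attribute-style raw object: fall back to its __dict__.
        raw = getattr(raw, "__dict__", None) or {}

    model = raw.get("model") or getattr(response, "model", None)

    usage = raw.get("usage") or raw.get("usage_metadata") or {}
    if not isinstance(usage, dict):
        usage = getattr(usage, "__dict__", None) or {}

    # OpenAI-style and Anthropic-style key names as fallbacks.
    prompt = usage.get("prompt_tokens") or usage.get("input_tokens")
    completion = usage.get("completion_tokens") or usage.get("output_tokens")
    total = usage.get("total_tokens")
    if total is None and prompt is not None and completion is not None:
        total = prompt + completion
    return model, prompt, completion, total
```

Each extracted value would then be set as a span attribute only when non-None, so partial responses still record whatever metadata they carry.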

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through pins of Python bright,
Tests learned to await by candlelight,
Cassettes now stream with eventful song,
Spans hum tools and tokens all day long,
I leave a carrot for CI tonight. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 20.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly describes the main change (updating dependencies to remediate CVEs); it is concise, specific, and accurately reflects the primary objective of the changeset.


✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@galkleinman galkleinman marked this pull request as ready for review January 15, 2026 16:46

@ellipsis-dev ellipsis-dev bot left a comment


Skipped PR review on f4ce84c because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_structured_llm/test_structured_llm_achat_model_attributes.yaml (1)

17-115: Filter sensitive headers and project identifiers in cassette recordings.

The cassette exposes openai-project (proj_tzz1TbPPOXaf6j9tEkVUBIAa) and openai-organization (traceloop), which are account/organization identifiers that should not be persisted in test fixtures. Additionally, volatile headers like Date, CF-RAY, Set-Cookie, and traceparent will cause unnecessary cassette diffs on re-recording.

Extend the filter_headers in packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py to include these headers. Consider also filtering other CloudFlare and tracing headers that don't affect test assertions.
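A minimal sketch of the extended fixture body, following the suggestion above. The pre-existing entries ("authorization", "api-key") and the ignore_hosts value are assumptions to be merged with the real conftest.py; note that vcrpy's filter_headers applies to request headers, so response-side identifiers like openai-project may additionally need a before_record_response hook:

```python
# Sketch of the vcr_config fixture body for
# packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py.
# Decorate with @pytest.fixture(scope="module") as in the existing file;
# shown undecorated here for clarity.
def vcr_config():
    return {
        "filter_headers": [
            # existing entries (assumed)
            "authorization",
            "api-key",
            # account/organization identifiers
            "openai-project",
            "openai-organization",
            "x-session-id",
            "x-correlation-id",
            # volatile headers that churn on every re-recording
            "Date",
            "CF-RAY",
            "Set-Cookie",
            "traceparent",
        ],
        # placeholder; keep whatever the current fixture returns here
        "ignore_hosts": ["localhost"],
    }
```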

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-llamaindex/pyproject.toml`:
- Around line 38-44: Bump validation: run the llama-parse/llama-index
compatibility tests and adjust wrapper code if runtime failures occur;
specifically run the test suite referenced in
packages/opentelemetry-instrumentation-llamaindex/tests/test_llamaparse.py with
uv run pytest to reproduce issues, check any wrapper functions around
llama-index classes/methods (e.g., your instrumentation wrappers that call Parse
API or Workflow/checkpointer-related shims) for changes in llama-parse 0.6+
(JSON config payload) and llama-index 0.14+ (removed
checkpointer/sub-workflows/deprecated Workflow methods), and update the wrapper
logic to use the new Parse JSON payload shape or remove references to removed
Workflow/checkpointer APIs so tests pass under the new dependency versions.

In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml`:
- Around line 677-900: The VCR cassette includes stale requests for models
"command-r" and "command-r-plus" that returned 404s; re-record the cassette
(packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml)
using the current test harness so it only captures the actual calls made by the
test (which instantiate Cohere(model="command-a-03-2025") in test_agents.py) —
run the test suite or the specific test that uses the cassette, delete or
replace the old cassette and commit the newly recorded YAML so the cassette
matches the current codepath and no longer contains the obsolete fallback
requests.

In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml`:
- Around line 358-359: The cassette contains a recorded failed interaction for
the deprecated model "gpt-3.5-turbo-0613" with "stream": true that returns a
404; locate the request/response block that mentions model: "gpt-3.5-turbo-0613"
and response code: 404 / message: "Not Found" and either remove that entire
failed interaction from the YAML or re-record the cassette so the interaction is
replaced by a successful call (preferably using a supported model and without
streaming if not intended); ensure the cleaned cassette no longer contains the
404 response or the deprecated model entry.

In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml`:
- Around line 797-1545: The VCR cassette contains sensitive x-session-id and
x-correlation-id response headers that must be scrubbed; update the vcr_config()
function (in conftest.py) so its returned "filter_headers" list includes
"x-session-id" and "x-correlation-id" in addition to existing entries like
"authorization" and "api-key", preserving the other keys (e.g., "ignore_hosts")
in the returned config.

In `@packages/sample-app/pyproject.toml`:
- Line 17: Remove the unused dependency "llama-index-llms-huggingface" from
pyproject.toml (it is not imported or used anywhere); keep the necessary
"llama-index-llms-huggingface-api" (used for HuggingFaceInferenceAPI) and
"llama-index-embeddings-huggingface" (used for HuggingFaceEmbedding), so delete
the "llama-index-llms-huggingface" entry to reduce dependency bloat and attack
surface.
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_aload_data_instrumentation.yaml (1)

797-1545: Confirm the async test’s polling/request payload matches the new cassette.

This cassette now reflects the updated request body and headers. If the async test still sends the old payload or polls with different headers, VCR will fail to match the recorded interactions on replay.

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml (1)

473-478: Consider scrubbing OpenAI project identifiers from cassette.

The cassette contains OpenAI project identifiers (e.g., proj_tzz1TbPPOXaf6j9tEkVUBIAa) in the response headers. While not a secret, these identifiers could be considered organizational PII. Based on learnings, VCR cassettes should avoid containing PII. Consider using VCR's filter_headers or before_record_response to scrub these values.
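Since openai-project and openai-organization are response headers, a before_record_response hook is the mechanism that reliably scrubs them. A hedged sketch, relying on vcrpy's documented hook interface (the response is a dict whose "headers" map names to lists of values):

```python
# Sketch of a before_record_response hook that redacts OpenAI account
# identifiers before the interaction is written to a cassette. The set
# of header names is an assumption; extend as needed.
SCRUBBED_RESPONSE_HEADERS = {"openai-project", "openai-organization"}

def scrub_response_headers(response):
    headers = response.get("headers", {})
    for name in list(headers):
        if name.lower() in SCRUBBED_RESPONSE_HEADERS:
            # vcrpy stores header values as lists of strings
            headers[name] = ["REDACTED"]
    return response

# Wire it up in the vcr_config fixture, e.g.:
#   return {"before_record_response": scrub_response_headers, ...}
```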

packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (1)

196-202: Consider strengthening tool span assertions.

The conditional check if "tool.name" in tool_span.attributes makes the assertion optional. If call_tool.task spans should always have a tool.name attribute, consider making this assertion unconditional. Otherwise, if this is intentionally flexible for different code paths, the current approach is acceptable.
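If the attribute is meant to be mandatory, the unconditional form could look like the sketch below. The span name and attribute keys mirror the comment above; the plain-dict span stand-in is for illustration only:

```python
# Sketch of an unconditional tool-span assertion: every call_tool.task
# span must carry a non-empty tool.name. Intended as a helper inside
# test_agents.py; the real tests would pass span.name and
# span.attributes from the exported ReadableSpans.
def assert_tool_attributes(span_name, attributes):
    if span_name != "call_tool.task":
        return  # only tool-call spans are checked
    assert "tool.name" in attributes, f"{span_name} span missing tool.name"
    assert attributes["tool.name"], "tool.name must be non-empty"
```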

Comment on lines +677 to +900
- request:
body: '{"message":"Which city has the highest population and how many years will
it take to reach 20 million inhabitants if it''s population increases by 1 million
a year?","model":"command-r","chat_history":[{"role":"System","message":"You
are designed to help with a variety of tasks, from answering questions to providing
summaries to other types of analyses.\n\n## Tools\n\nYou have access to a wide
variety of tools. You are responsible for using the tools in any sequence you
deem appropriate to complete the task at hand.\nThis may require breaking the
task into subtasks and using different tools to complete each subtask.\n\nYou
have access to the following tools:\n> Tool Name: calc_tool\nTool Description:
Useful for calculating the number of years until a city reaches a target population.\nTool
Args: {\"properties\": {\"target_population\": {\"title\": \"Target Population\",
\"type\": \"integer\"}, \"current_population\": {\"title\": \"Current Population\",
\"type\": \"integer\"}, \"yearly_increase\": {\"title\": \"Yearly Increase\",
\"type\": \"integer\"}}, \"required\": [\"target_population\", \"current_population\",
\"yearly_increase\"], \"type\": \"object\"}\n\n> Tool Name: sql_tool\nTool Description:
Useful for translating a natural language query into a SQL query over a table
which contains the names of cities, together with their population and country\nTool
Args: {\"properties\": {\"input\": {\"title\": \"Input\", \"type\": \"string\"}},
\"required\": [\"input\"], \"type\": \"object\"}\n\n\n\n## Output Format\n\nPlease
answer in the same language as the question and use the following format:\n\n```\nThought:
The current language of the user is: (user''s language). I need to use a tool
to help me answer the question.\nAction: tool name (one of calc_tool, sql_tool)
if using a tool.\nAction Input: the input to the tool, in a JSON format representing
the kwargs (e.g. {\"input\": \"hello world\", \"num_beams\": 5})\n```\n\nPlease
ALWAYS start with a Thought.\n\nNEVER surround your response with markdown code
markers. You may use code markers within your response if you need to.\n\nPlease
use a valid JSON format for the Action Input. Do NOT do this {''input'': ''hello
world'', ''num_beams'': 5}. If you include the \"Action:\" line, then you MUST
include the \"Action Input:\" line too, even if the tool does not need kwargs,
in that case you MUST use \"Action Input: {}\".\n\nIf this format is used, the
tool will respond in the following format:\n\n```\nObservation: tool response\n```\n\nYou
should keep repeating the above format till you have enough information to answer
the question without using any more tools. At that point, you MUST respond in
one of the following two formats:\n\n```\nThought: I can answer without using
any more tools. I''ll use the user''s language to answer\nAnswer: [your answer
here (In the same language as the user''s question)]\n```\n\n```\nThought: I
cannot answer the question with the provided tools.\nAnswer: [your answer here
(In the same language as the user''s question)]\n```\n\n## Current Conversation\n\nBelow
is the current conversation consisting of interleaving human and assistant messages.\n"}],"stream":false}'
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '3186'
content-type:
- application/json
host:
- api.cohere.com
user-agent:
- cohere/5.20.1
x-client-name:
- llama_index
x-fern-language:
- Python
x-fern-sdk-name:
- cohere
x-fern-sdk-version:
- 5.20.1
method: POST
uri: https://api.cohere.com/v1/chat
response:
body:
string: !!binary |
H4sIAAAAAAAA/yzMTWrDMBBH8av8mS6yifwhW5j6GjmBohk3hkgTNEpLKb17wc3y8eD3QzvTSsvk
R55ScMP74t28DMHFmWc3XceQ2PPmvdCZspjFD6GVsrLccUqacyzs6glf0VAl66cwtOAijyb5KhVj
OMMPPnS4iODW2sPWvmdN1iW9SZUuaT66P1R7e6nYtCLivluDbvif+NYnUix4mmAv1iRyR79/AQAA
//9hA1FqyAAAAA==
headers:
Alt-Svc:
- h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding:
- chunked
Via:
- 1.1 google
access-control-expose-headers:
- X-Debug-Trace-ID
cache-control:
- no-cache, no-store, no-transform, must-revalidate, private, max-age=0
content-encoding:
- gzip
content-type:
- application/json
date:
- Sat, 17 Jan 2026 11:21:18 GMT
expires:
- Thu, 01 Jan 1970 00:00:00 GMT
pragma:
- no-cache
server:
- envoy
vary:
- Origin,Accept-Encoding
x-accel-expires:
- '0'
x-debug-trace-id:
- 6ce2557b14e869b1ed6d37ed9fd80b38
x-endpoint-monthly-call-limit:
- '1000'
x-envoy-upstream-service-time:
- '15'
x-trial-endpoint-call-limit:
- '20'
x-trial-endpoint-call-remaining:
- '19'
status:
code: 404
message: Not Found
- request:
body: '{"message":"Which city has the highest population and how many years will
it take to reach 20 million inhabitants if it''s population increases by 1 million
a year?","model":"command-r-plus","chat_history":[{"role":"System","message":"You
are designed to help with a variety of tasks, from answering questions to providing
summaries to other types of analyses.\n\n## Tools\n\nYou have access to a wide
variety of tools. You are responsible for using the tools in any sequence you
deem appropriate to complete the task at hand.\nThis may require breaking the
task into subtasks and using different tools to complete each subtask.\n\nYou
have access to the following tools:\n> Tool Name: calc_tool\nTool Description:
Useful for calculating the number of years until a city reaches a target population.\nTool
Args: {\"properties\": {\"target_population\": {\"title\": \"Target Population\",
\"type\": \"integer\"}, \"current_population\": {\"title\": \"Current Population\",
\"type\": \"integer\"}, \"yearly_increase\": {\"title\": \"Yearly Increase\",
\"type\": \"integer\"}}, \"required\": [\"target_population\", \"current_population\",
\"yearly_increase\"], \"type\": \"object\"}\n\n> Tool Name: sql_tool\nTool Description:
Useful for translating a natural language query into a SQL query over a table
which contains the names of cities, together with their population and country\nTool
Args: {\"properties\": {\"input\": {\"title\": \"Input\", \"type\": \"string\"}},
\"required\": [\"input\"], \"type\": \"object\"}\n\n\n\n## Output Format\n\nPlease
answer in the same language as the question and use the following format:\n\n```\nThought:
The current language of the user is: (user''s language). I need to use a tool
to help me answer the question.\nAction: tool name (one of calc_tool, sql_tool)
if using a tool.\nAction Input: the input to the tool, in a JSON format representing
the kwargs (e.g. {\"input\": \"hello world\", \"num_beams\": 5})\n```\n\nPlease
ALWAYS start with a Thought.\n\nNEVER surround your response with markdown code
markers. You may use code markers within your response if you need to.\n\nPlease
use a valid JSON format for the Action Input. Do NOT do this {''input'': ''hello
world'', ''num_beams'': 5}. If you include the \"Action:\" line, then you MUST
include the \"Action Input:\" line too, even if the tool does not need kwargs,
in that case you MUST use \"Action Input: {}\".\n\nIf this format is used, the
tool will respond in the following format:\n\n```\nObservation: tool response\n```\n\nYou
should keep repeating the above format till you have enough information to answer
the question without using any more tools. At that point, you MUST respond in
one of the following two formats:\n\n```\nThought: I can answer without using
any more tools. I''ll use the user''s language to answer\nAnswer: [your answer
here (In the same language as the user''s question)]\n```\n\n```\nThought: I
cannot answer the question with the provided tools.\nAnswer: [your answer here
(In the same language as the user''s question)]\n```\n\n## Current Conversation\n\nBelow
is the current conversation consisting of interleaving human and assistant messages.\n"}],"stream":false}'
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '3191'
content-type:
- application/json
host:
- api.cohere.com
user-agent:
- cohere/5.20.1
x-client-name:
- llama_index
x-fern-language:
- Python
x-fern-sdk-name:
- cohere
x-fern-sdk-version:
- 5.20.1
method: POST
uri: https://api.cohere.com/v1/chat
response:
body:
string: !!binary |
H4sIAAAAAAAA/yzMQW6DMBCF4as8uYtsYig0VIFr5ASD/dwgYQZ5TKuq6t0rpVn++qXvxy3RTS6J
XLs4iu/T++gvabz4q5Be3uaefZjj0AV3dplm8kE3uayRK05Bc5Yt+uL39bATvsRQmPWTEbrhxr0y
zyzohjP6135ocCNxr3W3qW2jBmuC3lnYBM2Pbh+0vTxpJC0QrItVaML/xLceCLLhMGLZrFJi437/
AgAA///c5RiNzQAAAA==
headers:
Alt-Svc:
- h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding:
- chunked
Via:
- 1.1 google
access-control-expose-headers:
- X-Debug-Trace-ID
cache-control:
- no-cache, no-store, no-transform, must-revalidate, private, max-age=0
content-encoding:
- gzip
content-type:
- application/json
date:
- Sat, 17 Jan 2026 11:22:27 GMT
expires:
- Thu, 01 Jan 1970 00:00:00 GMT
pragma:
- no-cache
server:
- envoy
vary:
- Origin,Accept-Encoding
x-accel-expires:
- '0'
x-debug-trace-id:
- e75eb1b282207c5d4eda1fd4e27b567c
x-endpoint-monthly-call-limit:
- '1000'
x-envoy-upstream-service-time:
- '12'
x-trial-endpoint-call-limit:
- '20'
x-trial-endpoint-call-remaining:
- '19'
status:
code: 404
message: Not Found

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the test file that uses this cassette and examine its structure
find . -name "test_agent_with_multiple_tools.py" -o -name "*test_agents*" | head -20

Repository: traceloop/openllmetry

Length of output: 1083


🏁 Script executed:

#!/bin/bash
# Search for fallback logic, error handling, and model selection in the llamaindex instrumentation
rg -n "command-r-plus|command-r|command-a-03-2025" packages/opentelemetry-instrumentation-llamaindex/tests --type py -B2 -A2

Repository: traceloop/openllmetry

Length of output: 597


🏁 Script executed:

#!/bin/bash
# Check for VCR configuration and cassette recording modes
rg -n "vcr|cassette|record_mode|before_record" packages/opentelemetry-instrumentation-llamaindex --type py -B1 -A1

Repository: traceloop/openllmetry

Length of output: 5360


🏁 Script executed:

#!/bin/bash
# Read the test_agents.py file to find test_agent_with_multiple_tools and understand the test structure
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py | head -220

Repository: traceloop/openllmetry

Length of output: 9822


🏁 Script executed:

#!/bin/bash
# Get the full vcr_config from conftest.py
sed -n '122,150p' packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 704


🏁 Script executed:

#!/bin/bash
# Search for error handling or fallback logic related to 404s or model retries
rg -n "except|try|404|retry|fallback" packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py -B2 -A2

Repository: traceloop/openllmetry

Length of output: 1194


🏁 Script executed:

#!/bin/bash
# Check if there's any model fallback logic in the cohere integration or conftest
rg -n "command-r|fallback|default.*model" packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py packages/opentelemetry-instrumentation-llamaindex/src/ --type py

Repository: traceloop/openllmetry

Length of output: 159


🏁 Script executed:

#!/bin/bash
# Check the cassette file itself - look at the actual requests to confirm the models being called
head -100 packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | rg -n "command-r|command-a-03|uri:|model"

Repository: traceloop/openllmetry

Length of output: 194


🏁 Script executed:

#!/bin/bash
# Check if there's a conftest fixture that modifies the LLM or its behavior
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py | head -130

Repository: traceloop/openllmetry

Length of output: 5051


🏁 Script executed:

#!/bin/bash
# Search in the instrumentation directory structure
find packages/opentelemetry-instrumentation-llamaindex -type f -name "*.py" | head -20

Repository: traceloop/openllmetry

Length of output: 2214


🏁 Script executed:

#!/bin/bash
# Look for any cohere-related instrumentation that might have fallback logic
rg -n "command-r|model.*fallback|default.*model" packages/opentelemetry-instrumentation-llamaindex --type py -B2 -A2

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check the cassette timestamps to understand when it was recorded
head -20 packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml

Repository: traceloop/openllmetry

Length of output: 1662


🏁 Script executed:

#!/bin/bash
# Check the full cassette structure to see how many models are attempted
rg -n '"model":' packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | head -20

Repository: traceloop/openllmetry

Length of output: 1243


🏁 Script executed:

#!/bin/bash
# Look for any pytest-vcr or vcr-related configuration that might auto-record or handle retries
rg -n "vcr|cassette|record_mode|match_on" packages/opentelemetry-instrumentation-llamaindex/tests --type py

Repository: traceloop/openllmetry

Length of output: 1477


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation or comments in the cassette about why multiple models are present
head -50 packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | cat -v

Repository: traceloop/openllmetry

Length of output: 3580


Re-record the VCR cassette to match current test code.

The cassette contains requests for command-r and command-r-plus models that respond with 404 errors, followed by successful requests using command-a-03-2025. However, the current test code directly specifies only Cohere(model="command-a-03-2025") at line 161 of test_agents.py. The 404 responses appear to be artifacts from an older version of the Cohere or llama-index library that attempted model fallback. Re-record the cassette using the current code to ensure it reflects actual API behavior and prevent masking potential regressions in the success path.

🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml`
around lines 677 - 900, The VCR cassette includes stale requests for models
"command-r" and "command-r-plus" that returned 404s; re-record the cassette
(packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml)
using the current test harness so it only captures the actual calls made by the
test (which instantiate Cohere(model="command-a-03-2025") in test_agents.py) —
run the test suite or the specific test that uses the cassette, delete or
replace the old cassette and commit the newly recorded YAML so the cassette
matches the current codepath and no longer contains the obsolete fallback
requests.

Comment on lines +358 to +359
code: 404
message: Not Found

⚠️ Potential issue | 🟡 Minor

Cassette contains a 404 failure response.

The interaction at lines 244-359 records a request to the deprecated model gpt-3.5-turbo-0613 with "stream":true that returns a 404 Not Found. This failed interaction is recorded in the cassette and may cause unexpected test behavior or confusion. Consider removing this failed request block or re-recording the cassette to only include successful interactions.
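If re-recording is not practical, the failed interaction can be pruned from the parsed cassette. A throwaway sketch, assuming vcrpy's cassette layout (a top-level "interactions" list where each entry has response.status.code); loading and saving with yaml.safe_load/yaml.safe_dump is left to the caller:

```python
# Drop every recorded interaction whose response status is an error
# (4xx/5xx), keeping only successful calls. Operates on the parsed
# cassette structure, not the YAML file itself.
def drop_failed_interactions(cassette):
    cassette["interactions"] = [
        i for i in cassette.get("interactions", [])
        if i.get("response", {}).get("status", {}).get("code", 200) < 400
    ]
    return cassette
```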

🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml`
around lines 358 - 359, The cassette contains a recorded failed interaction for
the deprecated model "gpt-3.5-turbo-0613" with "stream": true that returns a
404; locate the request/response block that mentions model: "gpt-3.5-turbo-0613"
and response code: 404 / message: "Not Found" and either remove that entire
failed interaction from the YAML or re-record the cassette so the interaction is
replaced by a successful call (preferably using a supported model and without
streaming if not intended); ensure the cleaned cassette no longer contains the
404 response or the deprecated model entry.

Comment on lines +797 to +1545
- request:
body: from_python_package=true&input_url=https%3A%2F%2Farxiv.org%2Fpdf%2F1706.03762.pdf&language=en
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '93'
content-type:
- application/x-www-form-urlencoded
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: POST
uri: https://api.cloud.llamaindex.ai/api/parsing/upload
response:
body:
string: '{"id":"f63d3e7d-5e4e-4edc-8568-ef276937ad1d","status":"PENDING","error_code":null,"error_message":null}'
headers:
Connection:
- keep-alive
Content-Length:
- '103'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:23 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- e686c240-db78-402c-84f1-6fcf437f80c2
x-session-id:
- 3aa231d1-3199-439c-83cb-c129f3a03490
status:
code: 200
message: OK
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: GET
uri: https://api.cloud.llamaindex.ai/api/parsing/job/f63d3e7d-5e4e-4edc-8568-ef276937ad1d
response:
body:
string: '{"id":"f63d3e7d-5e4e-4edc-8568-ef276937ad1d","status":"PENDING","error_code":null,"error_message":null}'
headers:
Connection:
- keep-alive
Content-Length:
- '103'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:24 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- 8e70feb5-82ce-42b2-9fc4-21441ddd6870
x-session-id:
- c07122ec-6421-4b53-9174-529321dfe7e1
status:
code: 200
message: OK
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: GET
uri: https://api.cloud.llamaindex.ai/api/parsing/job/f63d3e7d-5e4e-4edc-8568-ef276937ad1d
response:
body:
string: '{"id":"f63d3e7d-5e4e-4edc-8568-ef276937ad1d","status":"SUCCESS","error_code":null,"error_message":null}'
headers:
Connection:
- keep-alive
Content-Length:
- '103'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:27 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- 514677a3-d31f-4f00-9f66-edba7cb15e52
x-session-id:
- 4ad4092c-cb2b-460c-bd8c-03c7551e0fb0
status:
code: 200
message: OK
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: GET
uri: https://api.cloud.llamaindex.ai/api/parsing/job/f63d3e7d-5e4e-4edc-8568-ef276937ad1d/result/text
response:
body:
string: "{\"text\":\" arXiv:1706.03762v7 [cs.CL] 2 Aug 2023\\n\\n Provided
proper attribution is provided, Google hereby grants permission to\\nreproduce
the tables and figures in this paper solely for use in journalistic or\\n
\ scholarly works.\\n\\n Attention Is All You Need\\n\\n Ashish Vaswani\u2217
\ Noam Shazeer\u2217 Niki Parmar\u2217 Jakob Uszkoreit\u2217\\n Google
Brain Google Brain Google Research Google Research\\navaswani@google.com
\ noam@google.com nikip@google.com usz@google.com\\n\\n Llion Jones\u2217
\ Aidan N. Gomez\u2217 \u2020 \u0141ukasz Kaiser\u2217\\nGoogle
Research University of Toronto Google Brain\\nllion@google.com
\ aidan@cs.toronto.edu lukaszkaiser@google.com\\n\\n Illia
Polosukhin\u2217 \u2021\\n illia.polosukhin@gmail.com\\n\\n
\ Abstract\\n\\n The
dominant sequence transduction models are based on complex recurrent or\\n
\ convolutional neural networks that include an encoder
and a decoder. The best\\n performing models also connect
the encoder and decoder through an attention\\n mechanism. We propose
a new simple network architecture, the Transformer,\\n based
solely on attention mechanisms, dispensing with recurrence and convolutions\\n
\ entirely. Experiments on two machine translation tasks show these models
to\\n be superior in quality while being more parallelizable
and requiring significantly\\n less time to train. Our model achieves
28.4 BLEU on the WMT 2014 English-\\n to-German translation
task, improving over the existing best results, including\\n ensembles,
by over 2 BLEU. On the WMT 2014 English-to-French translation task,\\n our
model establishes a new single-model state-of-the-art BLEU score of 41.8 after\\n
\ training for 3.5 days on eight GPUs, a small fraction
of the training costs of the\\n best models from the literature.
We show that the Transformer generalizes well to\\n other
tasks by applying it successfully to English constituency parsing both with\\n
\ large and limited training data.\\n\\n \u2217Equal contribution. Listing
order is random. Jakob proposed replacing RNNs with self-attention and started\\n
\ the effort to evaluate this idea. Ashish, with Illia, designed and implemented
the first Transformer models and\\n has been crucially involved in every
aspect of this work. Noam proposed scaled dot-product attention, multi-head\\n
\ attention and the parameter-free position representation and became
the other person involved in nearly every\\n detail. Niki designed,
implemented, tuned and evaluated countless model variants in our original
codebase and\\n tensor2tensor. Llion also experimented with novel model
variants, was responsible for our initial codebase, and\\n efficient inference
and visualizations. Lukasz and Aidan spent countless long days designing various
parts of and\\n implementing tensor2tensor, replacing our earlier codebase,
greatly improving results and massively accelerating\\n our research.\\n
\ \u2020Work performed while at Google Brain.\\n \u2021Work performed
while at Google Research.\\n\\n 31st Conference on Neural Information Processing
Systems (NIPS 2017), Long Beach, CA, USA.\\n\\n---\\n\\n1 Introduction\\n\\nRecurrent
neural networks, long short-term memory [13] and gated recurrent [7] neural
networks\\nin particular, have been firmly established as state of the art
approaches in sequence modeling and\\ntransduction problems such as language
modeling and machine translation [35, 2, 5]. Numerous\\nefforts have since
continued to push the boundaries of recurrent language models and encoder-decoder\\narchitectures
[38, 24, 15].\\nRecurrent models typically factor computation along the symbol
positions of the input and output\\nsequences. Aligning the positions to steps
in computation time, they generate a sequence of hidden\\nstates ht, as a
function of the previous hidden state ht\u22121 and the input for position
t. This inherently\\nsequential nature precludes parallelization within training
examples, which becomes critical at longer\\nsequence lengths, as memory constraints
limit batching across examples. Recent work has achieved\\nsignificant improvements
in computational efficiency through factorization tricks [21] and conditional\\ncomputation
[32], while also improving model performance in case of the latter. The fundamental\\nconstraint
of sequential computation, however, remains.\\nAttention mechanisms have become
an integral part of compelling sequence modeling and transduction\nmodels
in various tasks, allowing modeling of dependencies without regard to their
distance in\\nthe input or output sequences [2, 19]. In all but a few cases
[27], however, such attention mechanisms\\nare used in conjunction with a
recurrent network.\\nIn this work we propose the Transformer, a model architecture
eschewing recurrence and instead\\nrelying entirely on an attention mechanism
to draw global dependencies between input and output.\\nThe Transformer allows
for significantly more parallelization and can reach a new state of the art
in\\ntranslation quality after being trained for as little as twelve hours
on eight P100 GPUs.\\n\\n2 Background\\n\\nThe goal of reducing sequential
computation also forms the foundation of the Extended Neural GPU\\n[16], ByteNet
[18] and ConvS2S [9], all of which use convolutional neural networks as basic
building\\nblock, computing hidden representations in parallel for all input
and output positions. In these models,\\nthe number of operations required
to relate signals from two arbitrary input or output positions grows\\nin
the distance between positions, linearly for ConvS2S and logarithmically for
ByteNet. This makes\\nit more difficult to learn dependencies between distant
positions [12]. In the Transformer this is\\nreduced to a constant number
of operations, albeit at the cost of reduced effective resolution due\\nto
averaging attention-weighted positions, an effect we counteract with Multi-Head
Attention as\\ndescribed in section 3.2.\\nSelf-attention, sometimes called
intra-attention is an attention mechanism relating different positions\\nof
a single sequence in order to compute a representation of the sequence. Self-attention
has been\\nused successfully in a variety of tasks including reading comprehension,
abstractive summarization,\\ntextual entailment and learning task-independent
sentence representations [4, 27, 28, 22].\\nEnd-to-end memory networks are
based on a recurrent attention mechanism instead of sequence-\\naligned recurrence
and have been shown to perform well on simple-language question answering
and\\nlanguage modeling tasks [34].\\nTo the best of our knowledge, however,
the Transformer is the first transduction model relying\\nentirely on self-attention
to compute representations of its input and output without using sequence-\\naligned
RNNs or convolution. In the following sections, we will describe the Transformer,
motivate\\nself-attention and discuss its advantages over models such as [17,
18] and [9].\\n\\n3 Model Architecture\\n\\nMost competitive neural sequence
transduction models have an encoder-decoder structure [5, 2, 35].\\nHere,
the encoder maps an input sequence of symbol representations (x1, ..., xn)
to a sequence\\nof continuous representations z = (z1, ..., zn). Given z,
the decoder then generates an output\\nsequence (y1, ..., ym) of symbols one
element at a time. At each step the model is auto-regressive\\n[10], consuming
the previously generated symbols as additional input when generating the next.\\n\\n
      \ 2\\n\\n---\\n\\nFigure 1: The Transformer - model architecture.\\n\\nThe
Transformer follows this overall architecture using stacked self-attention
and point-wise, fully\\nconnected layers for both the encoder and decoder,
shown in the left and right halves of Figure 1,\\nrespectively.\\n\\n3.1 Encoder
and Decoder Stacks\\nEncoder: The encoder is composed of a stack of N
= 6 identical layers. Each layer has two\\nsub-layers. The first is a multi-head
self-attention mechanism, and the second is a simple, position-\\nwise fully
connected feed-forward network. We employ a residual connection [11] around
each of\\nthe two sub-layers, followed by layer normalization [1]. That is,
the output of each sub-layer is\\nLayerNorm(x + Sublayer(x)), where Sublayer(x)
is the function implemented by the sub-layer\\nitself. To facilitate these
residual connections, all sub-layers in the model, as well as the embedding\\nlayers,
produce outputs of dimension dmodel = 512.\\n\\nDecoder: The decoder is also
composed of a stack of N = 6 identical layers. In addition to the two\\nsub-layers
in each encoder layer, the decoder inserts a third sub-layer, which performs
multi-head\\nattention over the output of the encoder stack. Similar to the
encoder, we employ residual connections\\naround each of the sub-layers, followed
by layer normalization. We also modify the self-attention\\nsub-layer in the
decoder stack to prevent positions from attending to subsequent positions.
This\nmasking, combined with the fact that the output embeddings are offset by
one position, ensures that the\\npredictions for position i can depend only
on the known outputs at positions less than i.\\n\\n3.2 Attention\\nAn attention
function can be described as mapping a query and a set of key-value pairs
to an output,\\nwhere the query, keys, values, and output are all vectors.
The output is computed as a weighted sum\\n\\n 3\\n\\n---\\n\\n
      \ Figure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention
      consists of several attention layers running in parallel.\n\n
\ of the values, where the weight assigned to each value is computed by
a compatibility function of the\\n query with the corresponding key.\\n\\n
\ 3.2.1 Scaled Dot-Product Attention\\n We call our particular attention
\\\"Scaled Dot-Product Attention\\\" (Figure 2). The input consists of\\n
      \ queries and keys of dimension dk, and values of dimension dv. We compute the dot products of the\n
      \ query with all keys, divide each by \u221Adk, and apply a softmax function to obtain the weights on the\n
      \ values.\n In practice, we compute the attention function on a set of queries simultaneously, packed together\n
      \ into a matrix Q. The keys and values are also packed together into matrices K and V . We compute\n
      \ the matrix of outputs as:\n\n Attention(Q, K, V ) = softmax(QK^T / \u221Adk)V (1)\n\n
      \ The two most commonly used attention functions
are additive attention [2], and dot-product (multiplicative)\n attention. Dot-product attention is identical
to our algorithm, except for the scaling factor\n of 1/\u221Adk. Additive attention computes the compatibility
function using a feed-forward network with\n a single
hidden layer. While the two are similar in theoretical complexity, dot-product
attention is\\n much faster and more space-efficient in practice, since
it can be implemented using highly optimized\\n matrix multiplication code.\\n
\ While for small values of dk the two mechanisms perform similarly, additive
attention outperforms\\n dot product attention without scaling for larger
values of dk [3]. We suspect that for large values of\\n dk, the dot products
grow large in magnitude, pushing the softmax function into regions where it
has\\n extremely small gradients 4. To counteract this effect, we scale
the dot products by 1 .\\n \u221Adk\\n\\n
\ 3.2.2 Multi-Head Attention\\n Instead of performing a single attention
function with dmodel-dimensional keys, values and queries,\\n we found
it beneficial to linearly project the queries, keys and values h times with
different, learned\\n linear projections to dk, dk and dv dimensions, respectively.
On each of these projected versions of\\n queries, keys and values we then
perform the attention function in parallel, yielding dv-dimensional\\n 4To
illustrate why the dot products get large, assume that the components of q
and k are independent random\\n variables with mean 0 and variance 1. Then
      \ their dot product, q \xB7 k = \u2211 qi ki (i = 1, ..., dk), has mean 0 and variance
      \ dk.\n\n
\ 4\\n\\n---\\n\\noutput values.
These are concatenated and once again projected, resulting in the final values,
as\\ndepicted in Figure 2.\\nMulti-head attention allows the model to jointly
attend to information from different representation\\nsubspaces at different
positions. With a single attention head, averaging inhibits this.\\n\\nMultiHead(Q,
K, V ) = Concat(head1, ..., headh)W O\\nwhere headi = Attention(QW Q, KWK,
V W V )\\ni i i\\n\\nWhere the
projections are parameter matrices W Q \u2208 Rdmodel\xD7d\u2096, W K \u2208
Rdmodel\xD7d\u2096, W V \u2208 Rdmodel\xD7dv\\nand W O \u2208 Rhdv\xD7dmodel
. i i i\\nIn this work
we employ h = 8 parallel attention layers, or heads. For each of these
we use\\ndk = dv = dmodel/h = 64. Due to the reduced dimension of each head,
the total computational cost\\nis similar to that of single-head attention
with full dimensionality.\\n\\n3.2.3 Applications of Attention in our Model\\nThe
Transformer uses multi-head attention in three different ways:\\n\\n \u2022
\ In \\\"encoder-decoder attention\\\" layers, the queries come from the previous
decoder layer,\\n and the memory keys and values come from the output
of the encoder. This allows every\\n position in the decoder to attend
over all positions in the input sequence. This mimics the\\n typical
encoder-decoder attention mechanisms in sequence-to-sequence models such as\\n
\ [38, 2, 9].\\n \u2022 The encoder contains self-attention
layers. In a self-attention layer all of the keys, values\\n and
queries come from the same place, in this case, the output of the previous
layer in the\\n encoder. Each position in the encoder can attend
to all positions in the previous layer of the\\n encoder.\\n \u2022
\ Similarly, self-attention layers in the decoder allow each position in the
decoder to attend to\\n all positions in the decoder up to and including
that position. We need to prevent leftward\\n information flow in
the decoder to preserve the auto-regressive property. We implement this\\n
\ inside of scaled dot-product attention by masking out (setting to
\u2212\u221E) all values in the input\\n of the softmax which correspond
to illegal connections. See Figure 2.\\n\\n3.3 Position-wise Feed-Forward
Networks\\n\\nIn addition to attention sub-layers, each of the layers in our
encoder and decoder contains a fully\\nconnected feed-forward network, which
is applied to each position separately and identically. This\\nconsists of
two linear transformations with a ReLU activation in between.\\n\\n FFN(x)
= max(0, xW1 + b1)W2 + b2 (2)\\n\\n While the linear
transformations are the same across different positions, they use different
parameters\\nfrom layer to layer. Another way of describing this is as two
convolutions with kernel size 1.\\nThe dimensionality of input and output
is dmodel = 512, and the inner-layer has dimensionality\ndff = 2048.\n\n3.4
\ Embeddings and Softmax\\n\\nSimilarly to other sequence transduction models,
we use learned embeddings to convert the input\\ntokens and output tokens
to vectors of dimension dmodel. We also use the usual learned linear transfor-\\nmation
and softmax function to convert the decoder output to predicted next-token
probabilities. In\\nour model, we share the same weight matrix between the
two embedding layers and the pre-softmax\\nlinear transformation, similar
to [30]. In the embedding layers, we multiply those weights by \u221Admodel.\\n\\n
\ 5\\n\\n---\\n\\nTable 1: Maximum
path lengths, per-layer complexity and minimum number of sequential operations\\nfor
different layer types. n is the sequence length, d is the representation dimension,
k is the kernel\\nsize of convolutions and r the size of the neighborhood
in restricted self-attention.\\n\\n Layer Type Complexity
per Layer Sequential Maximum Path Length\\n Operations\\n
Self-Attention O(n2 \xB7 d) O(1) O(1)\\n
Recurrent O(n \xB7 d2) O(n) O(n)\\n
Convolutional O(k \xB7 n \xB7 d2) O(1) O(logk(n))\\n
Self-Attention (restricted) O(r \xB7 n \xB7 d) O(1) O(n/r)\\n\\n3.5
\ Positional Encoding\\nSince our model contains no recurrence and no convolution,
in order for the model to make use of the\\norder of the sequence, we must
inject some information about the relative or absolute position of the\\ntokens
in the sequence. To this end, we add \\\"positional encodings\\\" to the input
embeddings at the\\nbottoms of the encoder and decoder stacks. The positional
encodings have the same dimension dmodel\\nas the embeddings, so that the
two can be summed. There are many choices of positional encodings,\\nlearned
and fixed [9].\\nIn this work, we use sine and cosine functions of different
frequencies:\n\n PE(pos, 2i) = sin(pos / 10000^(2i/dmodel))\n PE(pos, 2i+1)
      = cos(pos / 10000^(2i/dmodel))\n\nwhere pos is the position and i is the dimension. That is, each dimension
of the positional encoding\\ncorresponds to a sinusoid. The wavelengths form
a geometric progression from 2\u03C0 to 10000 \xB7 2\u03C0. We\\nchose this
function because we hypothesized it would allow the model to easily learn
to attend by\\nrelative positions, since for any fixed offset k, P Epos+k
can be represented as a linear function of\\nP Epos.\\nWe also experimented
with using learned positional embeddings [9] instead, and found that the two\\nversions
produced nearly identical results (see Table 3 row (E)). We chose the sinusoidal
version\\nbecause it may allow the model to extrapolate to sequence lengths
longer than the ones encountered\\nduring training.\\n\\n4 Why Self-Attention\\n\\nIn
this section we compare various aspects of self-attention layers to the recurrent
and convolutional\nlayers commonly used for mapping one variable-length
sequence of symbol representations\\n(x1, ..., xn) to another sequence of
equal length (z1, ..., zn), with xi, zi \u2208 Rd, such as a hidden\\nlayer
in a typical sequence transduction encoder or decoder. Motivating our use
of self-attention we\\nconsider three desiderata.\\nOne is the total computational
complexity per layer. Another is the amount of computation that can\\nbe parallelized,
as measured by the minimum number of sequential operations required.\\nThe
third is the path length between long-range dependencies in the network. Learning
long-range\\ndependencies is a key challenge in many sequence transduction
tasks. One key factor affecting the\\nability to learn such dependencies is
the length of the paths forward and backward signals have to\\ntraverse in
the network. The shorter these paths between any combination of positions
in the input\\nand output sequences, the easier it is to learn long-range
dependencies [12]. Hence we also compare\\nthe maximum path length between
any two input and output positions in networks composed of the\\ndifferent
layer types.\\nAs noted in Table 1, a self-attention layer connects all positions
with a constant number of sequentially\\nexecuted operations, whereas a recurrent
layer requires O(n) sequential operations. In terms of\\ncomputational complexity,
self-attention layers are faster than recurrent layers when the sequence\\n\\n
\ 6\\n\\n---\\n\\nlength n is
smaller than the representation dimensionality d, which is most often the
case with\\nsentence representations used by state-of-the-art models in machine
translations, such as word-piece\\n[38] and byte-pair [31] representations.
To improve computational performance for tasks involving\\nvery long sequences,
self-attention could be restricted to considering only a neighborhood of size
r in\\nthe input sequence centered around the respective output position.
This would increase the maximum\\npath length to O(n/r). We plan to investigate
this approach further in future work.\\nA single convolutional layer with
kernel width k < n does not connect all pairs of input and output\\npositions.
Doing so requires a stack of O(n/k) convolutional layers in the case of contiguous
kernels,\\nor O(logk(n)) in the case of dilated convolutions [18], increasing
the length of the longest paths\\nbetween any two positions in the network.
Convolutional layers are generally more expensive than\\nrecurrent layers,
by a factor of k. Separable convolutions [6], however, decrease the complexity\\nconsiderably,
to O(k \xB7 n \xB7 d + n \xB7 d2). Even with k = n, however, the complexity
of a separable\\nconvolution is equal to the combination of a self-attention
layer and a point-wise feed-forward layer,\nthe approach we take in our model.\nAs
a side benefit, self-attention could yield more interpretable models. We inspect
attention distributions\\nfrom our models and present and discuss examples
in the appendix. Not only do individual attention\\nheads clearly learn to
perform different tasks, many appear to exhibit behavior related to the syntactic\\nand
semantic structure of the sentences.\\n\\n5 Training\\n\\nThis section
describes the training regime for our models.\\n\\n5.1 Training Data and
Batching\\n\\nWe trained on the standard WMT 2014 English-German dataset consisting
of about 4.5 million\\nsentence pairs. Sentences were encoded using byte-pair
encoding [3], which has a shared source-\\ntarget vocabulary of about 37000
tokens. For English-French, we used the significantly larger WMT\\n2014 English-French
dataset consisting of 36M sentences and split tokens into a 32000 word-piece\\nvocabulary
[38]. Sentence pairs were batched together by approximate sequence length.
Each training\\nbatch contained a set of sentence pairs containing approximately
25000 source tokens and 25000\\ntarget tokens.\\n\\n5.2 Hardware and Schedule\\n\\nWe
trained our models on one machine with 8 NVIDIA P100 GPUs. For our base models
using\\nthe hyperparameters described throughout the paper, each training
step took about 0.4 seconds. We\\ntrained the base models for a total of 100,000
steps or 12 hours. For our big models,(described on the\\nbottom line of table
3), step time was 1.0 seconds. The big models were trained for 300,000 steps\\n(3.5
days).\\n\\n5.3 Optimizer\\n\\nWe used the Adam optimizer [20] with \u03B21
= 0.9, \u03B22 = 0.98 and \u03F5 = 10\u22129. We varied the learning\\nrate
over the course of training, according to the formula:\n\n lrate
= dmodel^(\u22120.5) \xB7 min(step_num^(\u22120.5), step_num \xB7 warmup_steps^(\u22121.5))
      \ (3)\n\nThis corresponds to increasing the learning
rate linearly for the first warmup_steps training steps,\\nand decreasing
it thereafter proportionally to the inverse square root of the step number.
We used\\nwarmup_steps = 4000.\\n\\n5.4 Regularization\\n\\nWe employ three
types of regularization during training:\\n\\n 7\\n\\n---\\n\\nTable
2: The Transformer achieves better BLEU scores than previous state-of-the-art
models on the\\nEnglish-to-German and English-to-French newstest2014 tests
at a fraction of the training cost.\n\nModel | BLEU EN-DE | BLEU EN-FR | Training
Cost (FLOPs) EN-DE | Training Cost (FLOPs) EN-FR\nByteNet [18] | 23.75 | | |\nDeep-Att
+ PosUnk [39] | | 39.2 | | 1.0 \xB7 10^20\nGNMT + RL [38] | 24.6 | 39.92 | 2.3
\xB7 10^19 | 1.4 \xB7 10^20\nConvS2S [9] | 25.16 | 40.46 | 9.6 \xB7 10^18 | 1.5
\xB7 10^20\nMoE [32] | 26.03 | 40.56 | 2.0 \xB7 10^19 | 1.2 \xB7 10^20\nDeep-Att
+ PosUnk Ensemble [39] | | 40.4 | | 8.0 \xB7 10^20\nGNMT + RL Ensemble [38] |
26.30 | 41.16 | 1.8 \xB7 10^20 | 1.1 \xB7 10^21\nConvS2S Ensemble [9] | 26.36
| 41.29 | 7.7 \xB7 10^19 | 1.2 \xB7 10^21\nTransformer (base model) | 27.3 |
38.1 | 3.3 \xB7 10^18 |\nTransformer (big) | 28.4 | 41.8 | 2.3
\xB7 10^19 |\n\nResidual Dropout We apply dropout [33] to the output of
each sub-layer, before it is added to the\\nsub-layer input and normalized.
In addition, we apply dropout to the sums of the embeddings and the\\npositional
encodings in both the encoder and decoder stacks. For the base model, we use
a rate of\\nPdrop = 0.1.\\n\\nLabel Smoothing During training,
we employed label smoothing of value \u03F5ls = 0.1 [36]. This\\nhurts perplexity,
as the model learns to be more unsure, but improves accuracy and BLEU score.\\n\\n6
\ Results\\n\\n6.1 Machine Translation\\n\\nOn the WMT 2014 English-to-German
translation task, the big transformer model (Transformer (big)\\nin Table
2) outperforms the best previously reported models (including ensembles) by
more than 2.0\\nBLEU, establishing a new state-of-the-art BLEU score of 28.4.
The configuration of this model is\\nlisted in the bottom line of Table 3.
Training took 3.5 days on 8 P100 GPUs. Even our base model\\nsurpasses all
previously published models and ensembles, at a fraction of the training cost
of any of\\nthe competitive models.\\nOn the WMT 2014 English-to-French translation
task, our big model achieves a BLEU score of 41.0,\\noutperforming all of
the previously published single models, at less than 1/4 the training cost
of the\\nprevious state-of-the-art model. The Transformer (big) model trained
for English-to-French used\\ndropout rate Pdrop = 0.1, instead of 0.3.\\nFor
the base models, we used a single model obtained by averaging the last 5 checkpoints,
which\\nwere written at 10-minute intervals. For the big models, we averaged
the last 20 checkpoints. We\\nused beam search with a beam size of 4 and length
penalty \u03B1 = 0.6 [38]. These hyperparameters\\nwere chosen after experimentation
on the development set. We set the maximum output length during\\ninference
to input length + 50, but terminate early when possible [38].\\nTable 2 summarizes
our results and compares our translation quality and training costs to other
model\\narchitectures from the literature. We estimate the number of floating
point operations used to train a\\nmodel by multiplying the training time,
the number of GPUs used, and an estimate of the sustained\\nsingle-precision
floating-point capacity of each GPU 5.\\n\\n6.2 Model Variations\\n\\nTo
evaluate the importance of different components of the Transformer, we varied
our base model\\nin different ways, measuring the change in performance on
English-to-German translation on the\\n\\n 5We used values of 2.8, 3.7,
6.0 and 9.5 TFLOPS for K80, K40, M40 and P100, respectively.\\n\\n 8\\n\\n---\\n\\nTable
3: Variations on the Transformer architecture. Unlisted values are identical
to those of the base\\nmodel. All metrics are on the English-to-German translation
development set, newstest2013. Listed\\nperplexities are per-wordpiece, according
to our byte-pair encoding, and should not be compared to\\nper-word perplexities.\\n\\n
\ N dmodel dff h dk dv Pdrop \u03F5ls train
\ PPL BLEU params\\n steps
\ (dev) (dev) \xD7106\\nbase 6 512 2048 8 64
\ 64 0.1 0.1 100K 4.92 25.8 65\\n 1
\ 512 512 5.29 24.9\\n(A) 4
\ 128 128 5.00 25.5\\n 16
\ 32 32 4.91 25.8\\n 32
\ 16 16 5.01 25.4\\n(B) 16
\ 5.16 25.1 58\\n 32
\ 5.01 25.4 60\\n 2 6.11
\ 23.7 36\\n 4 5.19
\ 25.3 50\\n 8 4.88
\ 25.5 80\\n(C) 256 32 32 5.75
\ 24.5 28\\n 1024 128 128 4.66
\ 26.0 168\\n 1024 5.12
\ 25.4 53\\n 4096 4.75
\ 26.2 90\\n 0.0
\ 5.77 24.6\\n(D) 0.2
\ 0.0 4.95 25.5\\n 4.67
\ 25.3\\n 0.2
\ 5.47 25.7\\n(E) positional embedding instead
of sinusoids 4.92 25.7\\nbig 6 1024 4096
\ 16 0.3 300K 4.33 26.4 213\\n\\ndevelopment
set, newstest2013. We used beam search as described in the previous section,
but no\\ncheckpoint averaging. We present these results in Table 3.\\nIn Table
3 rows (A), we vary the number of attention heads and the attention key and
value dimensions,\\nkeeping the amount of computation constant, as described
in Section 3.2.2. While single-head\\nattention is 0.9 BLEU worse than the
best setting, quality also drops off with too many heads.\\nIn Table 3 rows
(B), we observe that reducing the attention key size dk hurts model quality.
This\\nsuggests that determining compatibility is not easy and that a more
sophisticated compatibility\\nfunction than dot product may be beneficial.
We further observe in rows (C) and (D) that, as expected,\\nbigger models
are better, and dropout is very helpful in avoiding over-fitting. In row (E)
we replace our\\nsinusoidal positional encoding with learned positional embeddings
[9], and observe nearly identical\\nresults to the base model.\\n\\n6.3 English
Constituency Parsing\\n\\nTo evaluate if the Transformer can generalize to
other tasks we performed experiments on English\\nconstituency parsing. This
task presents specific challenges: the output is subject to strong structural\\nconstraints
and is significantly longer than the input. Furthermore, RNN sequence-to-sequence\\nmodels
have not been able to attain state-of-the-art results in small-data regimes
[37].\\nWe trained a 4-layer transformer with dmodel = 1024 on the Wall Street
Journal (WSJ) portion of the\\nPenn Treebank [25], about 40K training sentences.
We also trained it in a semi-supervised setting,\\nusing the larger high-confidence
and BerkeleyParser corpora with approximately 17M sentences
used a vocabulary of 16K tokens for the WSJ only setting and a vocabulary
of 32K tokens\\nfor the semi-supervised setting.\\nWe performed only a small
number of experiments to select the dropout, both attention and residual\\n(section
5.4), learning rates and beam size on the Section 22 development set, all
other parameters\\nremained unchanged from the English-to-German base translation
model. During inference, we\\n\\n 9\\n\\n---\\n\\n
\ Table 4: The Transformer generalizes well to English constituency parsing
(Results are on Section 23\\n of WSJ)\\n Parser Training
      \ WSJ 23 F1\nVinyals & Kaiser et al. (2014) [37] WSJ only, discriminative
\ 88.3\\n Petrov et al. (2006) [29] WSJ only, discriminative
\ 90.4\\n Zhu et al. (2013) [40] WSJ only, discriminative
\ 90.4\\n Dyer et al. (2016) [8] WSJ only, discriminative
\ 91.7\\n Transformer (4 layers) WSJ only, discriminative
\ 91.3\\n Zhu et al. (2013) [40] semi-supervised 91.3\\n
\ Huang & Harper (2009) [14] semi-supervised 91.3\\n
\ McClosky et al. (2006) [26] semi-supervised 92.1\\nVinyals
& Kaiser et al. (2014) [37] semi-supervised 92.1\n Transformer
(4 layers) semi-supervised 92.7\\n Luong et al.
(2015) [23] multi-task 93.0\\n Dyer et al.
(2016) [8] generative 93.3\\n\\n increased the
maximum output length to input length + 300. We used a beam size of 21 and
\u03B1 = 0.3\\n for both WSJ only and the semi-supervised setting.\\n Our
results in Table 4 show that despite the lack of task-specific tuning our
      \ model performs surprisingly\n well, yielding better results than all
previously reported models with the exception of the\\n Recurrent Neural
Network Grammar [8].\\n In contrast to RNN sequence-to-sequence models
      \ [37], the Transformer outperforms the BerkeleyParser\n [29] even when
training only on the WSJ training set of 40K sentences.\\n\\n 7 Conclusion\\n\\n
\ In this work, we presented the Transformer, the first sequence transduction
model based entirely on\\n attention, replacing the recurrent layers most
commonly used in encoder-decoder architectures with\\n multi-headed self-attention.\\n
\ For translation tasks, the Transformer can be trained significantly faster
than architectures based\\n on recurrent or convolutional layers. On
both WMT 2014 English-to-German and WMT 2014\\n English-to-French translation
tasks, we achieve a new state of the art. In the former task our best\\n model
outperforms even all previously reported ensembles.\\n We are excited about
the future of attention-based models and plan to apply them to other tasks.
We\\n plan to extend the Transformer to problems involving input and output
modalities other than text and\\n to investigate local, restricted attention
mechanisms to efficiently handle large inputs and outputs\\n such as images,
audio and video. Making generation less sequential is another research goal
of ours.\\n The code we used to train and evaluate our models is available
at https://github.com/\\n tensorflow/tensor2tensor.\\n\\n Acknowledgements
We are grateful to Nal Kalchbrenner and Stephan Gouws for their fruitful\\n
\ comments, corrections and inspiration.\\n\\n References\\n [1]
\ Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization.
arXiv preprint\\n arXiv:1607.06450, 2016.\\n [2] Dzmitry Bahdanau,
Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly\\n
\ learning to align and translate. CoRR, abs/1409.0473, 2014.\\n [3]
\ Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc V. Le. Massive exploration
of neural\\n machine translation architectures. CoRR, abs/1703.03906,
2017.\\n [4] Jianpeng Cheng, Li Dong, and Mirella Lapata. Long short-term
memory-networks for machine\\n reading. arXiv preprint arXiv:1601.06733,
2016.\\n\\n 10\\n\\n---\\n\\n
[5] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares,
Holger Schwenk,\\n and Yoshua Bengio. Learning phrase representations
using rnn encoder-decoder for statistical\\n machine translation. CoRR,
abs/1406.1078, 2014.\\n [6] Francois Chollet. Xception: Deep learning with
depthwise separable convolutions. arXiv\\n preprint arXiv:1610.02357,
2016.\\n [7] Junyoung Chung, \xC7aglar G\xFCl\xE7ehre, Kyunghyun Cho, and
Yoshua Bengio. Empirical evaluation\\n of gated recurrent neural networks
on sequence modeling. CoRR, abs/1412.3555, 2014.\\n [8] Chris Dyer, Adhiguna
Kuncoro, Miguel Ballesteros, and Noah A. Smith. Recurrent neural\\n network
grammars. In Proc. of NAACL, 2016.\\n [9] Jonas Gehring, Michael Auli, David
Grangier, Denis Yarats, and Yann N. Dauphin. Convolu-\\n tional sequence
to sequence learning. arXiv preprint arXiv:1705.03122v2, 2017.\\n[10] Alex
Graves. Generating sequences with recurrent neural networks. arXiv
preprint\\n arXiv:1308.0850, 2013.\\n[11] Kaiming He, Xiangyu Zhang,
Shaoqing Ren, and Jian Sun. Deep residual learning for im-\\n age recognition.
\ In Proceedings of the IEEE Conference on Computer Vision and Pattern\\n
\ Recognition, pages 770\u2013778, 2016.\\n[12] Sepp Hochreiter, Yoshua
Bengio, Paolo Frasconi, and J\xFCrgen Schmidhuber. Gradient flow in\\n recurrent
nets: the difficulty of learning long-term dependencies, 2001.\\n[13] Sepp
Hochreiter and J\xFCrgen Schmidhuber. Long short-term memory. Neural
computation,\\n 9(8):1735\u20131780, 1997.\\n[14] Zhongqiang Huang and
Mary Harper. Self-training PCFG grammars with latent annotations\\n across
languages. In Proceedings of the 2009 Conference on Empirical Methods in Natural\\n
\ Language Processing, pages 832\u2013841. ACL, August 2009.\\n[15] Rafal
Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring\\n
\ the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.\\n[16]
\ \u0141ukasz Kaiser and Samy Bengio. Can active memory replace attention?
In Advances in Neural\\n Information Processing Systems, (NIPS), 2016.\\n[17]
\ \u0141ukasz Kaiser and Ilya Sutskever. Neural GPUs learn algorithms. In
International Conference\\n on Learning Representations (ICLR), 2016.\\n[18]
\ Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex
Graves, and Ko-\\n ray Kavukcuoglu. Neural machine translation in linear
time. arXiv preprint arXiv:1610.10099v2,\\n 2017.\\n[19] Yoon Kim, Carl
Denton, Luong Hoang, and Alexander M. Rush. Structured attention networks.\\n
\ In International Conference on Learning Representations, 2017.\\n[20]
\ Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization.
In ICLR, 2015.\\n[21] Oleksii Kuchaiev and Boris Ginsburg. Factorization
tricks for LSTM networks. arXiv preprint\\n arXiv:1703.10722, 2017.\\n[22]
\ Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang,
Bowen\\n Zhou, and Yoshua Bengio. A structured self-attentive sentence
embedding. arXiv preprint\\n arXiv:1703.03130, 2017.\\n[23] Minh-Thang
Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. Multi-task\\n
\ sequence to sequence learning. arXiv preprint arXiv:1511.06114, 2015.\\n[24]
\ Minh-Thang Luong, Hieu Pham, and Christopher D Manning. Effective approaches
to attention-\\n based neural machine translation. arXiv preprint arXiv:1508.04025,
2015.\\n\\n 11\\n\\n---\\n\\n[25] Mitchell
P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large
annotated\\n corpus of english: The penn treebank. Computational linguistics,
19(2):313\u2013330, 1993.\\n\\n[26] David McClosky, Eugene Charniak, and
Mark Johnson. Effective self-training for parsing. In\\n Proceedings
of the Human Language Technology Conference of the NAACL, Main Conference,\\n
\ pages 152\u2013159. ACL, June 2006.\\n\\n[27] Ankur Parikh, Oscar T\xE4ckstr\xF6m,
Dipanjan Das, and Jakob Uszkoreit. A decomposable attention\\n model.
In Empirical Methods in Natural Language Processing, 2016.\\n\\n[28] Romain
Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive\\n
\ summarization. arXiv preprint arXiv:1705.04304, 2017.\\n\\n[29] Slav
Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact,\\n
\ and interpretable tree annotation. In Proceedings of the 21st International
Conference on\\n Computational Linguistics and 44th Annual Meeting of
the ACL, pages 433\u2013440. ACL, July\\n 2006.\\n\\n[30] Ofir Press
and Lior Wolf. Using the output embedding to improve language models. arXiv\\n
\ preprint arXiv:1608.05859, 2016.\\n\\n[31] Rico Sennrich, Barry Haddow,
and Alexandra Birch. Neural machine translation of rare words\\n with
subword units. arXiv preprint arXiv:1508.07909, 2015.\\n\\n[32] Noam Shazeer,
Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton,\\n
\ and Jeff Dean. Outrageously large neural networks: The sparsely-gated
mixture-of-experts\\n layer. arXiv preprint arXiv:1701.06538, 2017.\\n\\n[33]
\ Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and
Ruslan Salakhutdi-\\n nov. Dropout: a simple way to prevent neural networks
from overfitting. Journal of Machine\\n Learning Research, 15(1):1929\u20131958,
2014.\\n\\n[34] Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob
Fergus. End-to-end memory\\n networks. In C. Cortes, N. D. Lawrence,
D. D. Lee, M. Sugiyama, and R. Garnett, editors,\\n Advances in Neural
Information Processing Systems 28, pages 2440\u20132448. Curran Associates,\\n
\ Inc., 2015.\\n\\n[35] Ilya Sutskever, Oriol Vinyals, and Quoc VV Le.
Sequence to sequence learning with neural\\n networks. In Advances in
Neural Information Processing Systems, pages 3104\u20133112, 2014.\\n\\n[36]
\ Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and
Zbigniew Wojna.\\n Rethinking the inception architecture for computer
vision. CoRR, abs/1512.00567, 2015.\\n\\n[37] Vinyals & Kaiser, Koo, Petrov,
Sutskever, and Hinton. Grammar as a foreign language. In\\n Advances
in Neural Information Processing Systems, 2015.\\n\\n[38] Yonghui Wu, Mike
Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang\\n Macherey,
Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google\u2019s neural
machine\\n translation system: Bridging the gap between human and machine
translation. arXiv preprint\\n arXiv:1609.08144, 2016.\\n\\n[39] Jie
Zhou, Ying Cao, Xuguang Wang, Peng Li, and Wei Xu. Deep recurrent models
with\\n fast-forward connections for neural machine translation. CoRR,
abs/1606.04199, 2016.\\n\\n[40] Muhua Zhu, Yue Zhang, Wenliang Chen, Min
Zhang, and Jingbo Zhu. Fast and accurate\\n shift-reduce constituent
parsing. In Proceedings of the 51st Annual Meeting of the ACL (Volume\\n 1:
Long Papers), pages 434\u2013443. ACL, August 2013.\\n\\n 12\\n\\n---\\n\\nInput-Input
Layer5\\nAttention Visualizations\\n\\ngovernments registration\\nAmerican
\ process\\nmajority passed making difficult<EOS> <pad>\\n
\ <pad> <pad>\\n <pad>
<pad>\\n since voting more <pad>\\nspirit have
\ laws 2009\\nIt is in this thata of new the or .\\n\\nIt is
\ this thata of the or . <EOS>\\n difficult\\n
\ in spirit more <pad>\\n
\ since process <pad>\\n laws
\ <pad> <pad> <pad>\\n have
new voting <pad>\\n American\\n
\ majority passed 2009\\n making\\n
\ governments registration\\n\\nFigure
3: An example of the attention mechanism following long-distance dependencies
in the\\nencoder self-attention in layer 5 of 6. Many of the attention heads
attend to a distant dependency of\\nthe verb \u2018making\u2019, completing
the phrase \u2018making...more difficult\u2019. Attentions here shown only
for\\nthe word \u2018making\u2019. Different colors represent different heads.
Best viewed in color.\\n\\n13\\n\\n---\\n\\nInput-Input Layer5\\n\\napplication\\n
\ missing <EOS>\\nLaw never perfect should what opinion
\ <pad>\\nThe will be , butits be just- thisis we are , in my
\ .\\n\\nInput-Input Layer5 . <EOS>\\nThe , its - this ,\\n
\ be perfectbut be just is what are in my\\nLaw never should
\ we missing <pad>\\n will application opinion\\n\\napplication\\n
\ missing <EOS>\\nLaw never perfect should what opinion
\ <pad>\\nThe will be , but its be just- thisis we are , in my
\ .\\n\\nThe , its - this , . <EOS>\\n be perfectbut
\ be just is what are in my\\nLaw never should we
\ missing <pad>\\n will application opinion\\n\\nFigure
4: Two attention heads, also in layer 5 of 6, apparently involved in anaphora
resolution. Top:\\nFull attentions for head 5. Bottom: Isolated attentions
from just the word \u2018its\u2019 for attention heads 5\\nand 6. Note that
the attentions are very sharp for this word.\\n\\n14\\n\\n---\\n\\nInput-Input
Layer5\\n\\napplication\\n missing <EOS>\\nLaw never perfect
\ should what opinion <pad>\\nThe will be , but its be just-
\ thisis we are , in my .\\n\\nThe , its - this , . <EOS>\\n
\ be perfectbut be just is what are in my\\nLaw never should
\ we missing <pad>\\n will application opinion\\nInput-Input
Layer5\\n\\napplication\\n missing <EOS>\\nLaw never perfect
\ should what opinion <pad>\\nThe will be , but its be just-
\ thisis we are , in my .\\n\\nThe , its - this , . <EOS>\\n
\ be perfectbut be just is what are in my\\nLaw never should
\ we missing <pad>\\n will application opinion\\n\\nFigure
5: Many of the attention heads exhibit behaviour that seems related to the
structure of the\\nsentence. We give two such examples above, from two different
heads from the encoder self-attention\\nat layer 5 of 6. The heads clearly
learned to perform different tasks.\\n\\n15\",\"job_metadata\":{\"credits_used\":0,\"job_credits_usage\":0,\"job_pages\":0,\"job_auto_mode_triggered_pages\":0,\"job_is_cache_hit\":true}}"
headers:
Connection:
- keep-alive
Content-Length:
- '47254'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:27 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- 0697e9d8-7d1f-4f67-96a7-01a19b6f5d46
x-session-id:
- bfb1df66-56af-47c2-be0f-ce3e346f3211
status:
code: 200
message: OK

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the test file that uses this cassette and VCR configuration
echo "=== Looking for test file using this cassette ==="
find . -name "*test_llamaparse*.py" -type f

echo -e "\n=== Searching for VCR matcher configuration ==="
rg -A 5 -B 5 "test_llamaparse_load_data_instrumentation" --type=py

echo -e "\n=== Checking for VCR/pytest-recording config ==="
rg "record_mode|match_on|vcr|cassette" --type=py packages/opentelemetry-instrumentation-llamaindex/tests | head -30

Repository: traceloop/openllmetry

Length of output: 2875


🏁 Script executed:

#!/bin/bash
# Check the test file structure
echo "=== Checking for test file ==="
test_file=$(find . -name "*test_llamaparse*.py" -type f)
if [ -n "$test_file" ]; then
  echo "Found: $test_file"
  echo "=== First 100 lines of test file ==="
  head -100 "$test_file"
fi

Repository: traceloop/openllmetry

Length of output: 3087


🏁 Script executed:

#!/bin/bash
# Search for API key/secret patterns in cassette file
cassette_file="packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml"

echo "=== Checking cassette for potential secrets ==="
# Look for common secret patterns
grep -i "api.key\|api_key\|authorization\|bearer\|token\|password\|secret" "$cassette_file" || echo "No obvious secrets found in headers"

echo -e "\n=== Checking response body for sensitive data ==="
# Look at the response body more carefully
grep -o "Authorization\|X-API-Key\|x-api-key" "$cassette_file" || echo "No auth headers detected"

echo -e "\n=== Checking what UUIDs/IDs are present ==="
grep -o "id\": \"[^\"]*\"" "$cassette_file" | head -10

Repository: traceloop/openllmetry

Length of output: 1632


🏁 Script executed:

#!/bin/bash
# Check the vcr_config in conftest.py for matcher settings
echo "=== VCR Configuration in conftest.py ==="
grep -A 20 "def vcr_config" packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 745


🏁 Script executed:

#!/bin/bash
# Verify if session IDs and correlation IDs should be scrubbed
echo "=== Checking if session/correlation IDs are consistent or should be scrubbed ==="
grep -o "x-session-id:[^}]*\|x-correlation-id:[^}]*" packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml | sort | uniq -c

Repository: traceloop/openllmetry

Length of output: 193


🏁 Script executed:

#!/bin/bash
# Check if the default VCR matcher settings and what it matches on
echo "=== Verifying pytest-vcr default matching behavior ==="
rg "match_on|record_mode" packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py

echo -e "\n=== Check if pytest-vcr is being used and its defaults ==="
grep -r "pytest.mark.vcr\|pytest_vcr\|vcr_config" packages/opentelemetry-instrumentation-llamaindex/tests/*.py | head -5

Repository: traceloop/openllmetry

Length of output: 617


🏁 Script executed:

#!/bin/bash
# Verify the actual request body to see if parsing_instruction presence matters
echo "=== Checking if body variations could cause matcher issues ==="
python3 << 'EOF'
# pytest-vcr with default config matches on:
# - method, scheme, host, port, path, query
# - NOT on request body or response headers
# - This means body changes (removing parsing_instruction) should NOT cause mismatches

print("Default VCR matchers (per pytest-vcr):")
print("  - method")
print("  - scheme")
print("  - host")
print("  - port")
print("  - path")
print("  - query")
print("\nDoes NOT match on: request body, response headers")
print("\nConclusion: The cassette body/header changes should NOT break VCR matching")
EOF

Repository: traceloop/openllmetry

Length of output: 352


Scrub session and correlation ID headers from the cassette using VCR filters.

The cassette contains x-session-id and x-correlation-id response headers that should be filtered to prevent potentially sensitive session data from being stored. Update vcr_config() in conftest.py to add these headers to the filter list.

Recommended change:
def vcr_config():
    return {
        "filter_headers": ["authorization", "api-key", "x-session-id", "x-correlation-id"],
        "ignore_hosts": ["raw.githubusercontent.com"],
    }
🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml`
around lines 797 - 1545, The VCR cassette contains sensitive x-session-id and
x-correlation-id response headers that must be scrubbed; update the vcr_config()
function (in conftest.py) so its returned "filter_headers" list includes
"x-session-id" and "x-correlation-id" in addition to existing entries like
"authorization" and "api-key", preserving the other keys (e.g., "ignore_hosts")
in the returned config.
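One caveat to the recommended change: in vcrpy, `filter_headers` scrubs request headers, whereas `x-session-id` and `x-correlation-id` appear here as response headers. To scrub those as well, a `before_record_response` hook can be added to the VCR config. A minimal sketch (the function name is ours, not part of the repo):

```python
def scrub_response_headers(response):
    """Remove session/correlation ID headers from recorded responses."""
    headers = response.get("headers", {})
    for header in ("x-session-id", "x-correlation-id"):
        headers.pop(header, None)
    return response
```

This would then be wired in via `"before_record_response": scrub_response_headers` in the dict returned by `vcr_config()`, alongside the existing `filter_headers` and `ignore_hosts` entries.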

@ellipsis-dev
Copy link
Contributor

ellipsis-dev bot commented Jan 17, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/dispatcher_wrapper.py`:
- Around line 232-244: The code unconditionally logs tool arguments for
call_tool spans (in the call_tool branch using ev, WorkflowToolCall,
span.set_attribute("tool.arguments", json.dumps(...), JSONEncoder)), which can
expose PII; update this block to consult the package's masking/suppression
config or utility (e.g., the existing masking flag/function used elsewhere)
before serializing ev.tool_kwargs—only call json.dumps and set the
"tool.arguments" attribute when the config permits, otherwise either omit the
attribute or set a redacted placeholder; ensure you reference the same ev,
WorkflowToolCall and span.set_attribute symbols so the gating logic is applied
exactly in this call_tool handling path.

In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py`:
- Around line 82-147: The token coercion and choices handling can raise
TypeError or misreport counts: when converting billed.input_tokens /
billed.output_tokens (variable billed) to int, guard against None (only int(...)
if value is not None) and for the dict-path use
.get("input_tokens")/.get("output_tokens") without defaulting to 0 so missing
keys remain None; only set span attributes after confirming non-None values.
Also extend the finish-reason handling around the choices variable to accept
dict-style choices (e.g., handle choices[0].get("finish_reason") when choices is
a list of dicts) before setting the span attribute via span.set_attribute with
GenAIAttributes or SpanAttributes.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (1)

65-118: Reduce brittleness of prompt-index assertions.
The test currently assumes prompt indices .0 and .1; llama-index prompt ordering can shift. Consider asserting that any prompt content matches the user question instead of fixed indices.

♻️ Suggested adjustment
-    assert f"{GenAIAttributes.GEN_AI_PROMPT}.0.content" in llm_span_1.attributes
-    assert f"{GenAIAttributes.GEN_AI_PROMPT}.1.content" in llm_span_1.attributes
-    assert llm_span_1.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.1.content"] == (
-        "What is 2 times 3?"
-    )
+    prompt_contents = [
+        v
+        for k, v in llm_span_1.attributes.items()
+        if k.startswith(f"{GenAIAttributes.GEN_AI_PROMPT}.") and k.endswith(".content")
+    ]
+    assert "What is 2 times 3?" in prompt_contents

Comment on lines +232 to +244
# Extract tool information for call_tool spans (workflow-based agents)
if method_name == "call_tool":
try:
# The 'ev' argument is a WorkflowToolCall event
ev = bound_args.arguments.get("ev")
if ev and isinstance(ev, WorkflowToolCall):
span.set_attribute("tool.name", ev.tool_name)
span.set_attribute(
"tool.arguments",
json.dumps(ev.tool_kwargs, cls=JSONEncoder)
)
except Exception:
pass

⚠️ Potential issue | 🟠 Major

Gate tool arguments behind a configuration mechanism to protect PII.

ev.tool_kwargs contains tool function arguments, which can include user input and sensitive data. Currently, tool.arguments is always recorded as JSON, regardless of any prompt suppression settings. This poses a privacy risk when the instrumentation is configured to limit sensitive data recording.

Implement conditional gating (similar to the existing masking configuration in the package) to respect suppression semantics when recording tool arguments. This could either reference a configuration flag or a utility function that determines whether to record such sensitive attributes.

Example approach
         if method_name == "call_tool":
             try:
                 # The 'ev' argument is a WorkflowToolCall event
                 ev = bound_args.arguments.get("ev")
                 if ev and isinstance(ev, WorkflowToolCall):
                     span.set_attribute("tool.name", ev.tool_name)
-                    span.set_attribute(
-                        "tool.arguments",
-                        json.dumps(ev.tool_kwargs, cls=JSONEncoder)
-                    )
+                    if should_record_sensitive_data():
+                        span.set_attribute(
+                            "tool.arguments",
+                            json.dumps(ev.tool_kwargs, cls=JSONEncoder),
+                        )
             except Exception:
                 pass
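The `should_record_sensitive_data()` helper in the example above is hypothetical. One minimal way to implement it, assuming the `TRACELOOP_TRACE_CONTENT` environment-variable convention used elsewhere in openllmetry (content recorded by default, suppressed when explicitly disabled):

```python
import os


def should_record_sensitive_data() -> bool:
    """Return True unless content tracing is explicitly disabled.

    Assumes the TRACELOOP_TRACE_CONTENT convention: any value other
    than "true" (case-insensitive) suppresses sensitive attributes.
    """
    return os.getenv("TRACELOOP_TRACE_CONTENT", "true").lower() == "true"
```

If the package already exposes an equivalent utility (e.g. a `should_send_prompts`-style check), reusing that is preferable to introducing a new flag.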
🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/dispatcher_wrapper.py`
around lines 232 - 244, The code unconditionally logs tool arguments for
call_tool spans (in the call_tool branch using ev, WorkflowToolCall,
span.set_attribute("tool.arguments", json.dumps(...), JSONEncoder)), which can
expose PII; update this block to consult the package's masking/suppression
config or utility (e.g., the existing masking flag/function used elsewhere)
before serializing ev.tool_kwargs—only call json.dumps and set the
"tool.arguments" attribute when the config permits, otherwise either omit the
attribute or set a redacted placeholder; ensure you reference the same ev,
WorkflowToolCall and span.set_attribute symbols so the gating logic is applied
exactly in this call_tool handling path.

Comment on lines +82 to +147
# Get model name - handle both dict and object formats
model = None
if hasattr(raw, "model"):
model = raw.model
elif isinstance(raw, dict) and "model" in raw:
model = raw.get("model")
if model:
span.set_attribute(GenAIAttributes.GEN_AI_RESPONSE_MODEL, model)

# Handle token usage - support multiple formats
input_tokens = None
output_tokens = None
total_tokens = None

# Try OpenAI format first: raw.usage with completion_tokens, prompt_tokens
usage = getattr(raw, "usage", None) or (raw.get("usage") if isinstance(raw, dict) else None)
if usage:
if hasattr(usage, "completion_tokens"):
output_tokens = usage.completion_tokens
input_tokens = usage.prompt_tokens
total_tokens = usage.total_tokens
elif isinstance(usage, dict):
output_tokens = usage.get("completion_tokens")
input_tokens = usage.get("prompt_tokens")
total_tokens = usage.get("total_tokens")

# Try Cohere format: raw.meta.tokens or raw.meta.billed_units
if input_tokens is None or output_tokens is None:
meta = getattr(raw, "meta", None) or (raw.get("meta") if isinstance(raw, dict) else None)
if meta:
# Try meta.tokens first (actual token counts)
tokens = getattr(meta, "tokens", None) or (meta.get("tokens") if isinstance(meta, dict) else None)
if tokens:
if hasattr(tokens, "input_tokens"):
input_tokens = tokens.input_tokens
output_tokens = tokens.output_tokens
elif isinstance(tokens, dict):
input_tokens = tokens.get("input_tokens")
output_tokens = tokens.get("output_tokens")

# Fallback to meta.billed_units if tokens not found
if input_tokens is None or output_tokens is None:
billed = getattr(meta, "billed_units", None) or (
meta.get("billed_units") if isinstance(meta, dict) else None
)
if billed:
if hasattr(billed, "input_tokens"):
input_tokens = int(billed.input_tokens)
output_tokens = int(billed.output_tokens)
elif isinstance(billed, dict):
input_tokens = int(billed.get("input_tokens", 0))
output_tokens = int(billed.get("output_tokens", 0))

# Set token attributes if found
if output_tokens is not None:
span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, int(output_tokens))
if input_tokens is not None:
span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, int(input_tokens))
if total_tokens is not None:
span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, int(total_tokens))
elif input_tokens is not None and output_tokens is not None:
# Calculate total if not provided (e.g., for Cohere)
span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, int(input_tokens) + int(output_tokens))

# Handle finish reason for OpenAI-style responses
choices = getattr(raw, "choices", None)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py

Repository: traceloop/openllmetry

Length of output: 173


🏁 Script executed:

sed -n '70,160p' packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py | cat -n

Repository: traceloop/openllmetry

Length of output: 4487


🏁 Script executed:

rg -n "def dont_throw" packages/opentelemetry-instrumentation-llamaindex/

Repository: traceloop/openllmetry

Length of output: 189


🏁 Script executed:

rg -n "@dont_throw" packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py | head -20

Repository: traceloop/openllmetry

Length of output: 204


🏁 Script executed:

sed -n '38,50p' packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py | cat -n

Repository: traceloop/openllmetry

Length of output: 555


🏁 Script executed:

sed -n '38,55p' packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py | cat -n

Repository: traceloop/openllmetry

Length of output: 782


Add None guards for token coercion and support dict-based choices for consistency.

Lines 60–61: int(billed.input_tokens) and int(billed.output_tokens) can raise TypeError if values are None, which the @dont_throw decorator suppresses. Additionally, the dict fallback (lines 63–64) defaults missing keys to 0 instead of treating them as absent, creating asymmetric behavior that can misreport token counts. Lines 78–82 only handle object-style choices[0].finish_reason without supporting dict format, inconsistent with the dict handling established elsewhere in this function.

Suggested fixes
             if input_tokens is None or output_tokens is None:
                 billed = getattr(meta, "billed_units", None) or (
                     meta.get("billed_units") if isinstance(meta, dict) else None
                 )
                 if billed:
                     if hasattr(billed, "input_tokens"):
-                        input_tokens = int(billed.input_tokens)
-                        output_tokens = int(billed.output_tokens)
+                        if billed.input_tokens is not None:
+                            input_tokens = int(billed.input_tokens)
+                        if billed.output_tokens is not None:
+                            output_tokens = int(billed.output_tokens)
                     elif isinstance(billed, dict):
-                        input_tokens = int(billed.get("input_tokens", 0))
-                        output_tokens = int(billed.get("output_tokens", 0))
+                        if "input_tokens" in billed and billed["input_tokens"] is not None:
+                            input_tokens = int(billed["input_tokens"])
+                        if "output_tokens" in billed and billed["output_tokens"] is not None:
+                            output_tokens = int(billed["output_tokens"])

-    choices = getattr(raw, "choices", None)
+    choices = getattr(raw, "choices", None) or (
+        raw.get("choices") if isinstance(raw, dict) else None
+    )
     if choices:
-        span.set_attribute(
-            SpanAttributes.LLM_RESPONSE_FINISH_REASON, choices[0].finish_reason
-        )
+        first = choices[0]
+        finish_reason = (
+            first.get("finish_reason") if isinstance(first, dict) else first.finish_reason
+        )
+        if finish_reason is not None:
+            span.set_attribute(
+                SpanAttributes.LLM_RESPONSE_FINISH_REASON, finish_reason
+            )
🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py`
around lines 82 - 147, The token coercion and choices handling can raise
TypeError or misreport counts: when converting billed.input_tokens /
billed.output_tokens (variable billed) to int, guard against None (only int(...)
if value is not None) and for the dict-path use
.get("input_tokens")/.get("output_tokens") without defaulting to 0 so missing
keys remain None; only set span attributes after confirming non-None values.
Also extend the finish-reason handling around the choices variable to accept
dict-style choices (e.g., handle choices[0].get("finish_reason") when choices is
a list of dicts) before setting the span attribute via span.set_attribute with
GenAIAttributes or SpanAttributes.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-agno/pyproject.toml`:
- Around line 75-76: The [tool.uv] constraint-dependencies entry with
"urllib3>=2.6.3" is placed in a package pyproject.toml but UV only reads
constraints from the workspace-root; move the constraint-dependencies
declaration (the [tool.uv] table and its constraint-dependencies array
containing "urllib3>=2.6.3") into the repository root pyproject.toml and remove
it from the package-level pyproject.toml so the urllib3 constraint is applied
during dependency resolution.
- Line 44: The project bumped vcrpy to "vcrpy>=8.0.0,<9" but the VCR cassettes
were not regenerated; re-record or verify all cassettes under
packages/opentelemetry-instrumentation-agno/tests/cassettes/ using vcrpy 8.x
(run the test suite or cassette regeneration command with Python ≥3.10 and
urllib3 ≥2.x), update any cassette format or host/httpx-related differences as
needed to ensure tests pass, and commit the regenerated cassette files alongside
the pyproject.toml change.

In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml`:
- Line 12: Revert the Python requirement back to supports-3.9 by changing
requires-python from ">=3.10,<4" to ">=3.9,<4" and add an explicit constraint
for the vulnerable dependency instead of dropping 3.9: add a
constraint-dependencies entry under the [tool.uv] (or the project's constraints
section) that includes "filelock>=3.20.3" (you can also include the suggested
"urllib3>=2.6.3"), so the fixes are applied via dependency pinning rather than
removing support for Python 3.9; update the pyproject.toml accordingly, touching
the requires-python field and adding the constraint-dependencies list.

In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml`:
- Line 12: The pyproject.toml change sets requires-python = ">=3.10,<4" but CI
only tests 3.11; update the CI matrix in the build-packages job of
.github/workflows/ci.yml to include Python 3.10 (add "3.10" to the
python-version/matrix entry used by the build-packages job) so the minimum
supported interpreter is validated, or alternatively run local validation and
note that in CI — reference the requires-python entry in pyproject.toml and the
build-packages job matrix when making the change.

In `@packages/opentelemetry-instrumentation-mistralai/.python-version`:
- Line 1: The PR incorrectly implies CVE-2025-68146 is fixed by moving to Python
3.10; either add an explicit dependency constraint filelock>=3.20.1 to
pyproject.toml (ensuring the lock/update and tests run) to actually remediate
the CVE, or update the commit/PR text to remove the CVE claim and state the real
reason for setting .python-version to "3.10" (clarify compatibility/maintenance
intent); reference the .python-version file, pyproject.toml dependencies, and
the filelock package/CVE-2025-68146 when making the change.

In `@packages/opentelemetry-instrumentation-ollama/pyproject.toml`:
- Around line 74-75: The per-package pyproject.toml contains a [tool.uv]
constraint-dependencies entry which is ignored by uv; remove the [tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"] block from the package
pyproject.toml and add the same setting under [tool.uv] in the repository
workspace root pyproject.toml (or other workspace-level configuration that uv
reads) so the urllib3 constraint is applied; if there is currently no root
pyproject.toml create one and place the [tool.uv] constraint-dependencies entry
there.

In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml`:
- Around line 75-76: The [tool.uv] constraint-dependencies entry currently
defined as constraint-dependencies = ["urllib3>=2.6.3"] in package
pyproject.toml must be moved to the workspace root pyproject.toml (or removed
from per-package files) because uv only reads this field at the root; to fix,
add a [tool.uv] section in the root pyproject.toml containing
constraint-dependencies = ["urllib3>=2.6.3"] (or delete the
constraint-dependencies line from each package pyproject.toml) so the urllib3
constraint is actually enforced.

In `@packages/opentelemetry-instrumentation-openai/pyproject.toml`:
- Line 42: Update test infra and VCR cassettes to be compatible with vcrpy 8.x
by running the full test suite and re-recording any failing cassettes;
specifically, after bumping the dependency string "vcrpy>=8.0.0,<9" in
pyproject.toml, run tests locally/CI, recreate VCR cassettes using the project's
test helper (the cassette recording utility used in your repo), and commit the
new cassette files, while also ensuring CI/Python matrix uses Python 3.10+ and
that any httpx/urllib3-related fixtures or adapters in your test helpers are
updated to the vcrpy 8.x API.
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-llamaindex/pyproject.toml (1)

38-44: Significant llama-index ecosystem version bumps.

Multiple test dependencies jumped major versions. The past review comment flagged breaking changes in llama-index 0.14+ (removed checkpointer, sub-workflows, deprecated Workflow methods) and llama-parse 0.6+ (JSON config payload changes), which was marked as addressed in commit 0d1309f.

Ensure the test suite passes with these new versions via uv run pytest packages/opentelemetry-instrumentation-llamaindex/tests/.

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-lancedb/pyproject.toml (1)

77-78: Confirm that urllib3>=2.6.3 is validated on PyPI; consider adding an upper bound for defensive versioning.

urllib3>=2.6.3 is available on PyPI (released Jan 7, 2026) and is compatible with transitive dependencies (e.g., vcrpy 8.0.0-8.x). Adding an upper bound <3 is recommended to prevent unexpected major version upgrades when urllib3 3.0 is released.

♻️ Suggested constraint tightening
[tool.uv]
-constraint-dependencies = ["urllib3>=2.6.3"]
+constraint-dependencies = ["urllib3>=2.6.3,<3"]

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

python3 << 'EOF'
import urllib.request
import json

# Check vcrpy versions on PyPI
url = "https://pypi.org/pypi/vcrpy/json"
try:
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode())
        
    # Get available versions
    releases = data['releases']
    versions_6x = sorted([v for v in releases.keys() if v.startswith('6.')])
    versions_8x = sorted([v for v in releases.keys() if v.startswith('8.')])
    current_version = data['info']['version']
    
    print(f"Current PyPI version: {current_version}")
    print(f"\nAvailable 6.x versions: {versions_6x[-3:] if versions_6x else 'None'}")
    print(f"Available 8.x versions: {versions_8x[-3:] if versions_8x else 'None'}")
    print(f"All 8.x versions exist: {len(versions_8x) > 0}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Find all .yaml or .json cassette files in test directories
find . -path "**/tests/**" -type f \( -name "*.yaml" -o -name "*.json" \) | head -20

Repository: traceloop/openllmetry

Length of output: 2468


🌐 Web query:

vcrpy 8.0 breaking changes migration from 6.x

💡 Result:

Summary checklist to migrate from vcrpy 6.x → 8.x

  1. Upgrade Python to a supported version (v8 drops Python 3.9). Install/use Python >=3.10. [1]
  2. Ensure urllib3 >= 2 (v8 dropped support for urllib3 < 2). Upgrade dependency before installing vcrpy. [1]
  3. Install vcrpy 8.x: pip install --upgrade "vcrpy>=8.0.0". Test your test-suite. [1]
  4. Test httpx/aiohttp behaviour: v8 rewrote httpx support (now patches httpcore) and fixed several httpx/http proxy issues — run tests that use httpx and any custom httpx transports. [1]
  5. Re-record cassettes if necessary: vcrpy 6.0 warned that some cassettes produced by earlier releases needed re-creating due to binary-format issues. If you see cassette/serialization errors or mismatched responses, delete or re-record affected cassettes. [2]
  6. Optional: take advantage of new v8 features (e.g., drop_unused_requests cassette option) and verify logging/URI/proxy changes do not affect assertions. [1]

References

  • vcrpy v8.0.0 release notes (breaking changes, httpx, urllib3, new options). [1]
  • vcrpy 6.0.0 changelog (note about possibly needing to recreate some cassettes). [2]
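The first two checklist items can be sanity-checked programmatically before bumping the dependency. This stdlib-only sketch is our own illustration (the function name is not part of vcrpy); it reports any environment blockers for the 6.x → 8.x upgrade:

```python
# Illustrative pre-flight check for the vcrpy 6.x -> 8.x upgrade.
# Standard library only; vcrpy itself is not imported here.
import sys
from importlib import metadata


def vcrpy8_blockers() -> list[str]:
    """Collect reasons this environment cannot run vcrpy 8.x yet."""
    blockers = []
    if sys.version_info < (3, 10):
        blockers.append("vcrpy 8 dropped Python 3.9; upgrade the interpreter")
    try:
        major = int(metadata.version("urllib3").split(".")[0])
        if major < 2:
            blockers.append("vcrpy 8 requires urllib3 >= 2; upgrade urllib3")
    except metadata.PackageNotFoundError:
        # urllib3 not installed yet; the installer will resolve a 2.x release
        pass
    return blockers


print(vcrpy8_blockers())
```

An empty list means the interpreter and urllib3 constraints are satisfied; cassette and httpx behavior (items 4-5) still need the actual test suite.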

🏁 Script executed:

# Check the full pyproject.toml for Python and urllib3 constraints
cat -n ./packages/opentelemetry-instrumentation-agno/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 2331


🏁 Script executed:

# Check git history/diff for cassette changes (last 5 commits)
git log --oneline -10 -- ./packages/opentelemetry-instrumentation-agno/tests/cassettes/

Repository: traceloop/openllmetry

Length of output: 107


🏁 Script executed:

# Inspect a sample cassette file to check for potential secrets/PII
head -50 ./packages/opentelemetry-instrumentation-agno/tests/cassettes/test_agent/test_agent_run_basic.yaml

Repository: traceloop/openllmetry

Length of output: 1677


🏁 Script executed:

# Check if cassettes were modified in this PR by checking git diff stat
git diff --stat -- "./packages/opentelemetry-instrumentation-agno/tests/cassettes/"

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check recent changes to cassettes more broadly
git log --oneline -20 -- "**/tests/cassettes/" | head -10

Repository: traceloop/openllmetry

Length of output: 47


Regenerate all VCR cassettes with vcrpy 8.x before merging.

The major version bump from vcrpy 6.x to 8.x includes breaking changes (httpx rewrite, urllib3 >=2 requirement, potential cassette format changes). While the Python (≥3.10) and urllib3 (≥2.6.3) constraints are properly configured, the cassettes have not been regenerated in this PR. Per best practices, all cassettes in packages/opentelemetry-instrumentation-agno/tests/cassettes/ must be re-recorded or verified to work with vcrpy 8.x.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-agno/pyproject.toml` at line 44, The
project bumped vcrpy to "vcrpy>=8.0.0,<9" but the VCR cassettes were not
regenerated; re-record or verify all cassettes under
packages/opentelemetry-instrumentation-agno/tests/cassettes/ using vcrpy 8.x
(run the test suite or cassette regeneration command with Python ≥3.10 and
urllib3 ≥2.x), update any cassette format or host/httpx-related differences as
needed to ensure tests pass, and commit the regenerated cassette files alongside
the pyproject.toml change.

Comment on lines 75 to 76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

urllib3 2.6.3 CVE security advisory

💡 Result:

Summary: urllib3 2.6.3 (released Jan 7, 2026) fixes a high‑severity decompression-bomb/DoS vulnerability tracked as CVE‑2026‑21441. Versions >=1.22 and <2.6.3 are affected; upgrade to 2.6.3 (or later) to mitigate. [1][2][3]

Sources:

  • urllib3 2.6.3 release on PyPI (upload/release metadata). [1]
  • urllib3 GitHub release notes (2.6.3) mentioning the security fix (GHSA-38jv-5279-wg99). [2]
  • CVE summary for CVE-2026-21441 describing affected versions and impact. [3]

🏁 Script executed:

# Check the pyproject.toml file to see full context and dependencies
cat packages/opentelemetry-instrumentation-agno/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 1799


🌐 Web query:

uv package manager constraint-dependencies pyproject.toml documentation

💡 Result:

Summary — what tool.uv.constraint-dependencies does

  • Purpose: global version constraints that restrict what versions uv may choose when resolving your project’s dependencies; adding a package here does NOT cause it to be installed by itself. [1][2]
  • Location: put it under [tool.uv] in the workspace-root pyproject.toml; uv lock / uv sync / uv run only read this field from the workspace root (they ignore member pyproject.toml or uv.toml declarations). [1][2]
  • Syntax: list of requirement strings (e.g. "grpcio<1.65" or "setuptools==60.0.0" for build constraints). [1][2]
  • Interaction with pip/compile: uv will append these pyproject constraints to any --constraint files when compiling/locking. [1]

Minimal example (pyproject.toml)
[tool.uv]
constraint-dependencies = ["grpcio<1.65"]

References

  1. uv reference — Settings (constraint-dependencies / build-constraint-dependencies). [docs.astral.sh]. [1]
  2. uv Workspace/pyproject (ToolUv struct) documentation (fields: constraint_dependencies, build_constraint_dependencies; root-only note). [docs.rs]. [2]

Move [tool.uv] constraint-dependencies to the workspace-root pyproject.toml.

The constraint-dependencies field for urllib3>=2.6.3 is placed in a package-specific pyproject.toml, but per uv documentation, uv only reads constraint-dependencies from the workspace-root pyproject.toml and explicitly ignores declarations in member packages. This means the constraint won't be applied during dependency resolution.

While the CVE remediation goal is valid (urllib3 2.6.3, released Jan 7, 2026, fixes CVE-2026-21441), move the [tool.uv] section to the repository's root pyproject.toml for the constraint to take effect.
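For reference, a minimal repository-root pyproject.toml that uv would actually read might look like the sketch below. The project name and workspace glob are assumptions based on the repo layout, not existing configuration:

```toml
# Hypothetical repository-root pyproject.toml; uv reads [tool.uv] only here.
[project]
name = "openllmetry-root"
version = "0.0.0"
requires-python = ">=3.10"

[tool.uv.workspace]
members = ["packages/*"]

[tool.uv]
# Applied as a resolution constraint to every workspace member.
constraint-dependencies = ["urllib3>=2.6.3"]
```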

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-agno/pyproject.toml` around lines 75 -
76, The [tool.uv] constraint-dependencies entry with "urllib3>=2.6.3" is placed
in a package pyproject.toml but UV only reads constraints from the
workspace-root; move the constraint-dependencies declaration (the [tool.uv]
table and its constraint-dependencies array containing "urllib3>=2.6.3") into
the repository root pyproject.toml and remove it from the package-level
pyproject.toml so the urllib3 constraint is applied during dependency
resolution.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

filelock CVE Python 3.9 2024 2025

💡 Result:

Short answer: Yes — the filelock Python package had TOCTOU/symlink CVEs in 2025 (CVE-2025-68146) and 2026 (CVE-2026-22701). Both are issues in filelock itself (not in CPython) and can affect projects running on Python 3.9 if they use vulnerable filelock versions. Fixes: upgrade to filelock >= 3.20.1 (CVE-2025-68146) and >= 3.20.3 (CVE-2026-22701). Sources: NVD and advisory pages. [1][2][3]

References


🏁 Script executed:

cat -n packages/opentelemetry-instrumentation-anthropic/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 2384


🏁 Script executed:

rg -i "filelock" packages/opentelemetry-instrumentation-anthropic/

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Search for filelock as a transitive dependency by checking opentelemetry-instrumentation
git ls-files | grep -E "(pyproject\.toml|requirements)" | head -20

Repository: traceloop/openllmetry

Length of output: 1319


🏁 Script executed:

# Check if there are other changes in the same PR that might explain Python 3.10
git log --oneline --all | head -20

Repository: traceloop/openllmetry

Length of output: 1169


🏁 Script executed:

# Check the parent package to see if it requires Python 3.10
rg "requires-python" packages/opentelemetry-instrumentation/pyproject.toml 2>/dev/null || echo "Not found"

Repository: traceloop/openllmetry

Length of output: 74


🌐 Web query:

CVE filelock Python 3.9 specific vulnerability

💡 Result:

Short answer: There is no CVE that targets Python 3.9 itself — the known filelock CVEs are vulnerabilities in the filelock library (not the Python interpreter) and affect certain filelock versions regardless of which Python interpreter version you run. See below.

Details:

  • CVE-2025-68146 — TOCTOU race in Unix/Windows lock-file creation; fixed in filelock 3.20.1. Affects filelock users on all platforms; not specific to Python 3.9. [1][2]
  • CVE-2026-22701 — TOCTOU race in SoftFileLock; fixed in filelock 3.20.3. Also a filelock-version issue, not a Python-3.9-specific interpreter bug. [2][3]

Note: some Linux distro package names/versions (e.g., “python-filelock 3.9.0-1”) refer to the filelock package version, not the Python interpreter version; those packaged versions may be listed as “vulnerable” if they contain an affected filelock release. [3]

Recommended action: upgrade filelock to >=3.20.3 (or >=3.20.1 to address CVE-2025-68146) or apply the vendor mitigations in the advisories. [1][2]

Sources:
[1] GitLab Advisory / filelock CVE-2025-68146.
[2] NVD / CVE-2026-22701 and AWS/Explore pages (fixes in 3.20.3 / 3.20.1).
[3] Debian security tracker showing packaged filelock versions and vulnerability status.


Dropping Python 3.9 is an unnecessary breaking change.

The filelock CVEs (CVE-2025-68146, CVE-2026-22701) are vulnerabilities in the filelock package itself, not Python-version specific. They affect all Python versions equally and can be fixed by pinning filelock >= 3.20.3. Dropping Python 3.9 support is not required to remediate these CVEs.

Instead, add an explicit filelock constraint to the project dependencies or [tool.uv] constraint-dependencies section:

constraint-dependencies = ["urllib3>=2.6.3", "filelock>=3.20.3"]

Then revert requires-python to ">=3.9,<4" to preserve backward compatibility.
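Put together, the two suggested edits would look roughly like this (keys abridged; this leaves aside the separate question of whether uv honors [tool.uv] outside the workspace root):

```toml
# Sketch of the suggested pyproject.toml changes.
[project]
requires-python = ">=3.9,<4"  # restored: the CVEs do not require dropping 3.9

[tool.uv]
# Pin the actually-vulnerable packages instead.
constraint-dependencies = ["urllib3>=2.6.3", "filelock>=3.20.3"]
```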

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml` at line 12,
Revert the Python requirement back to supports-3.9 by changing requires-python
from ">=3.10,<4" to ">=3.9,<4" and add an explicit constraint for the vulnerable
dependency instead of dropping 3.9: add a constraint-dependencies entry under
the [tool.uv] (or the project's constraints section) that includes
"filelock>=3.20.3" (you can also include the suggested "urllib3>=2.6.3"), so the
fixes are applied via dependency pinning rather than removing support for Python
3.9; update the pyproject.toml accordingly, touching the requires-python field
and adding the constraint-dependencies list.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1. Find and display the pyproject.toml file content
echo "=== Cohere instrumentation pyproject.toml ==="
cat packages/opentelemetry-instrumentation-cohere/pyproject.toml

echo -e "\n=== Searching for Python version pins across repo ==="
rg -n -C2 '(python-version|PYTHON_VERSION|3\.9|3\.10|requires-python)' --type-list | head -50

echo -e "\n=== CI workflow files ==="
fd -e yml -e yaml . .github 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 2507


🏁 Script executed:

#!/bin/bash
# Check root pyproject.toml and parent package dependencies
echo "=== Root pyproject.toml (if exists) ==="
if [ -f "pyproject.toml" ]; then
  cat pyproject.toml | grep -A5 -B5 "requires-python\|python"
fi

echo -e "\n=== Checking sibling packages Python requirements ==="
fd -e pyproject.toml "packages/.*pyproject.toml" | while read f; do
  echo "--- $f ---"
  grep "requires-python" "$f"
done

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

#!/bin/bash
# Check CI workflow for Python version matrix
echo "=== GitHub Actions CI workflow ==="
cat .github/workflows/ci.yml | head -100

echo -e "\n=== Search for 3.9 references ==="
rg "3\.9" --no-heading -n 2>/dev/null || echo "No 3.9 references found"

echo -e "\n=== Search for python-version references ==="
rg "python-version|python.version" --no-heading -n 2>/dev/null || echo "No version pins found"

Repository: traceloop/openllmetry

Length of output: 50379


🏁 Script executed:

#!/bin/bash
# Check Python requirements in other packages
echo "=== All package requires-python statements ==="
fd -e pyproject.toml "packages/" | xargs grep "requires-python" 2>/dev/null

echo -e "\n=== Check for any tooling that pins older Python ==="
rg "3\.9" --type py --type toml --max-count=10 2>/dev/null || echo "No 3.9 found in code"

Repository: traceloop/openllmetry

Length of output: 1299


Add Python 3.10 to CI test matrix to validate the minimum version floor.

The CI workflow currently only tests with Python 3.11 (see build-packages job matrix in .github/workflows/ci.yml). Setting requires-python = ">=3.10,<4" without testing against 3.10 creates a gap where the minimum version constraint is untested. Either add 3.10 to the matrix or verify that tooling and dependencies have been validated against 3.10 locally before merging.
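In the workflow file, validating the floor could look like the fragment below. Job and step names are assumed from a typical GitHub Actions setup; check .github/workflows/ci.yml for the real keys:

```yaml
# Hypothetical build-packages matrix entry with the 3.10 floor included.
jobs:
  build-packages:
    strategy:
      matrix:
        python-version: ["3.10", "3.11"]
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
```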

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml` at line 12, The
pyproject.toml change sets requires-python = ">=3.10,<4" but CI only tests 3.11;
update the CI matrix in the build-packages job of .github/workflows/ci.yml to
include Python 3.10 (add "3.10" to the python-version/matrix entry used by the
build-packages job) so the minimum supported interpreter is validated, or
alternatively run local validation and note that in CI — reference the
requires-python entry in pyproject.toml and the build-packages job matrix when
making the change.

@@ -1 +1 @@
-3.9.5
+3.10

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check pyproject.toml in mistralai instrumentation package
echo "=== pyproject.toml in mistralai instrumentation ==="
find packages/opentelemetry-instrumentation-mistralai -name "pyproject.toml" -type f | head -1 | xargs cat 2>/dev/null || echo "File not found"

echo ""
echo "=== All .python-version files in packages directory ==="
find packages -name ".python-version" -type f | sort | xargs -I {} sh -c 'echo "=== {} ==="; cat {}'

echo ""
echo "=== Check if setup.py or setup.cfg exist in mistralai package ==="
find packages/opentelemetry-instrumentation-mistralai -type f \( -name "setup.py" -o -name "setup.cfg" \) | xargs ls -la

Repository: traceloop/openllmetry

Length of output: 5669


🌐 Web query:

filelock CVE Python 3.10 security vulnerability

💡 Result:

Short answer: A TOCTOU (time-of-check/time-of-use) race condition in the filelock Python library (fixed in filelock 3.20.1) allows a local attacker who can create symlinks to cause lock-file creation to follow a symlink and truncate or corrupt arbitrary files. Affects filelock versions prior to 3.20.1 on Unix/Linux/macOS and Windows (requires ability to create symlinks). Upgrade to filelock >= 3.20.1; if you cannot upgrade immediately, restrict lock-file directories (chmod 0700), avoid predictable lock paths, or use SoftFileLock as a partial mitigation. [1][2][3]

Sources:

  • GitLab advisory (CVE-2025-68146) — filelock TOCTOU/symlink attack. [1]
  • NVD entry for CVE-2025-68146 (description, affected versions, fix 3.20.1). [2]
  • AWS/ALAS and distro trackers summarizing impact and mitigations. [3]

🏁 Script executed:

#!/bin/bash
# Search for filelock constraints in pyproject.toml and related files
echo "=== Searching for filelock in mistralai package files ==="
find packages/opentelemetry-instrumentation-mistralai -type f \( -name "*.toml" -o -name "*.txt" -o -name "*.cfg" \) | xargs grep -l filelock 2>/dev/null || echo "No direct filelock references found"

echo ""
echo "=== Check for uv.lock or requirements files with filelock ==="
find packages -maxdepth 1 -name "uv.lock" -o -name "requirements*.txt" 2>/dev/null | xargs grep filelock 2>/dev/null || echo "No filelock version found in lock files"

echo ""
echo "=== Check root level uv.lock or similar ==="
find . -maxdepth 2 -name "uv.lock" -type f | head -1 | xargs grep filelock 2>/dev/null | head -5 || echo "Checking failed or no filelock entry"

echo ""
echo "=== Check if there's a constraint on urllib3 or other packages ==="
grep -r "urllib3" packages/opentelemetry-instrumentation-mistralai/ 2>/dev/null || echo "No urllib3 constraints found in mistralai package"

Repository: traceloop/openllmetry

Length of output: 600


Clarify CVE remediation or add filelock version constraint.

The configuration is consistent (.python-version and pyproject.toml both require Python ≥3.10) and mistralai is compatible with Python 3.10. However, the commit message's claim that Python 3.10 resolves the filelock CVE is unsupported: CVE-2025-68146 is fixed only by upgrading filelock to ≥3.20.1, not by the Python version itself. No filelock version constraint was added in this PR.

Either add an explicit filelock>=3.20.1 constraint to the dependencies if CVE remediation is the goal, or clarify the actual reason for the Python 3.10 upgrade if the CVE claim was inaccurate.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-mistralai/.python-version` at line 1,
The PR incorrectly implies CVE-2025-68146 is fixed by moving to Python 3.10;
either add an explicit dependency constraint filelock>=3.20.1 to pyproject.toml
(ensuring the lock/update and tests run) to actually remediate the CVE, or
update the commit/PR text to remove the CVE claim and state the real reason for
setting .python-version to "3.10" (clarify compatibility/maintenance intent);
reference the .python-version file, pyproject.toml dependencies, and the
filelock package/CVE-2025-68146 when making the change.

Comment on lines 74 to 75
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

uv constraint-dependencies tool.uv pyproject.toml

💡 Result:

Use the tool.uv setting constraint-dependencies in pyproject.toml to add version constraints that uv will apply when resolving your project's runtime dependencies. Constraints restrict which versions can be selected but do not by themselves cause a package to be installed (the package must be requested by a direct or transitive dependency).

Key points

  • Setting: [tool.uv].constraint-dependencies is a list of requirement strings (e.g. "grpcio<1.65"). [1][2]
  • It is only read from the workspace root pyproject.toml for uv lock / uv sync / uv run (uv ignores per-package declarations or uv.toml). [1][2]
  • Use it when you need to force or limit versions of transitive dependencies without adding them as direct dependencies.

Example pyproject.toml snippet
[tool.uv]
constraint-dependencies = ["grpcio<1.65"]

References
[1] uv settings — constraint-dependencies (official docs).
[2] uv Workspace pyproject ToolUv fields (docs.rs).


🏁 Script executed:

# Check uv version requirement and urllib3 dependencies in the project
cd packages/opentelemetry-instrumentation-ollama
cat pyproject.toml | head -100

Repository: traceloop/openllmetry

Length of output: 1760


🏁 Script executed:

# Search for urllib3 references across the project to identify potential conflicts
rg "urllib3" --type py --type toml -A 2 -B 2

Repository: traceloop/openllmetry

Length of output: 13976


🏁 Script executed:

# Check if there's a workspace root pyproject.toml
ls -la pyproject.toml 2>/dev/null && echo "=== Root pyproject.toml found ===" || echo "No root pyproject.toml"

# Check if this is configured as a workspace
cat pyproject.toml | grep -A 10 "\[tool.uv\]" 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Verify workspace member structure
git ls-files | grep "pyproject.toml" | head -5

Repository: traceloop/openllmetry

Length of output: 377


🏁 Script executed:

# Check for uv workspace configuration (uv.lock or other indicators)
git ls-files | grep -E "(uv\.lock|pyproject\.toml$)" | head -1
ls -la uv.lock 2>/dev/null || echo "No uv.lock found"

Repository: traceloop/openllmetry

Length of output: 140


Remove or relocate [tool.uv] constraint-dependencies to workspace root.

The constraint-dependencies setting in lines 74–75 has no effect in per-package pyproject.toml files. According to uv documentation, [tool.uv] is only read from the workspace root pyproject.toml; per-package declarations are ignored. Since this repository has no root pyproject.toml, move the constraint to a workspace-level configuration file for it to take effect.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-ollama/pyproject.toml` around lines 74
- 75, The per-package pyproject.toml contains a [tool.uv]
constraint-dependencies entry which is ignored by uv; remove the [tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"] block from the package
pyproject.toml and add the same setting under [tool.uv] in the repository
workspace root pyproject.toml (or other workspace-level configuration that uv
reads) so the urllib3 constraint is applied; if there is currently no root
pyproject.toml create one and place the [tool.uv] constraint-dependencies entry
there.

Comment on lines 75 to 76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

uv constraint-dependencies pyproject.toml documentation

💡 Result:

Do you mean the "uv" tool/package (which project?), or are you referring to "uvicorn", "poetry", or another build/packaging tool? I need which project/URL to search so I can find the exact pyproject.toml "constraint-dependencies" documentation.


🏁 Script executed:

# Find CI configuration files
find . -type f \( -name "*.yml" -o -name "*.yaml" \) | grep -E "(workflow|ci|action)" | head -20

Repository: traceloop/openllmetry

Length of output: 2464


🏁 Script executed:

# Check the pyproject.toml file for context
cat -n packages/opentelemetry-instrumentation-openai-agents/pyproject.toml | head -100

Repository: traceloop/openllmetry

Length of output: 2372


🏁 Script executed:

# Check CI configuration
cat -n .github/workflows/ci.yml | head -100

Repository: traceloop/openllmetry

Length of output: 3937


🏁 Script executed:

# Search for other uses of [tool.uv] in the repo
rg '\[tool\.uv\]' -A 2

Repository: traceloop/openllmetry

Length of output: 6017


🌐 Web query:

astral-sh uv constraint-dependencies pyproject.toml

💡 Result:

Use uv's tool.uv settings in your workspace root pyproject.toml.

  • constraint-dependencies — constraints applied when resolving runtime/regular dependencies. Provide a list of version specifiers (strings). uv only reads this from the workspace root. Example:
    [tool.uv]
    constraint-dependencies = ["grpcio<1.65", "httpx==0.27.2"]. [1]

  • build-constraint-dependencies — constraints applied when resolving build (PEP 517) dependencies. Also only read from the workspace root. Example:
    [tool.uv]
    build-constraint-dependencies = ["setuptools==60.0.0"]. [1]

Notes:

  • Both fields are list[str] and do not cause packages to be installed by themselves — they only restrict versions selected during resolution. [1][2]

Sources:
[1] uv Settings (tool.uv) — docs.astral.sh.
[2] Managing dependencies — docs.astral.sh.


🏁 Script executed:

# Check for workspace root pyproject.toml
ls -la pyproject.toml 2>/dev/null || echo "No pyproject.toml in root"

# Check uv.toml for workspace config
ls -la uv.toml 2>/dev/null || echo "No uv.toml in root"

# Look for workspace configuration
rg 'workspace' pyproject.toml 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 109


🏁 Script executed:

# Check the structure to find where workspace root might be
find . -maxdepth 2 -name "pyproject.toml" -type f | head -10

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Let me check if there's a workspace configuration elsewhere
cat -n pyproject.toml 2>/dev/null || echo "Checking repository structure..."

# Check what nx setup is (since the CI uses npx nx)
cat -n nx.json 2>/dev/null | head -50

Repository: traceloop/openllmetry

Length of output: 259


🏁 Script executed:

# Verify the CI actually runs uv commands by looking at more of ci.yml
cat -n .github/workflows/ci.yml | tail -100

Repository: traceloop/openllmetry

Length of output: 3945


Move constraint-dependencies to workspace root pyproject.toml.

uv supports the constraint-dependencies field, and CI does install uv. However, according to uv documentation, constraint-dependencies is only read and enforced from the workspace root pyproject.toml, not from individual package files. The current configuration in 32+ packages is not being enforced by uv. Create a root pyproject.toml with this constraint, or remove it from individual packages if a workspace is not intended.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml` around
lines 75 - 76, The [tool.uv] constraint-dependencies entry currently defined as
constraint-dependencies = ["urllib3>=2.6.3"] in package pyproject.toml must be
moved to the workspace root pyproject.toml (or removed from per-package files)
because uv only reads this field at the root; to fix, add a [tool.uv] section in
the root pyproject.toml containing constraint-dependencies = ["urllib3>=2.6.3"]
(or delete the constraint-dependencies line from each package pyproject.toml) so
the urllib3 constraint is actually enforced.

"pytest>=8.2.2,<9",
"requests>=2.31.0,<3",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the current vcrpy dependency in the pyproject.toml file
cat packages/opentelemetry-instrumentation-openai/pyproject.toml | grep -A 2 -B 2 vcrpy

Repository: traceloop/openllmetry

Length of output: 135


🏁 Script executed:

# List test files in the package
fd -t f "test_" packages/opentelemetry-instrumentation-openai/tests --max-depth 3

Repository: traceloop/openllmetry

Length of output: 1433


🏁 Script executed:

# Look for VCR cassettes (typically YAML files) in the test directory
fd -t f "\.(yaml|yml)$" packages/opentelemetry-instrumentation-openai/tests

Repository: traceloop/openllmetry

Length of output: 22763


🏁 Script executed:

# Look at a sample cassette to understand the format
head -30 packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat.yaml

Repository: traceloop/openllmetry

Length of output: 743


🌐 Web query:

vcrpy breaking changes version 6 to version 8 major version upgrade

💡 Result:

Summary of major breaking changes when upgrading from vcrpy 6.x → 8.x (notes pulled from the project changelogs/releases):

  • Cassette/serialization changes: v6 warned that some cassettes may need to be recreated because previous releases saved a binary format incorrectly (you may need to re-record cassettes or run migration tools). Also custom serializer/deserializer APIs changed in earlier 1.x-era migration notes — verify any custom persisters/serializers. [1]

  • Dependency / supported-Python changes:

    • v8 drops support for Python 3.9 (so minimum supported Python is >=3.10/3.11 depending on the exact v8 patch). [2]
    • v8 drops support for urllib3 < 2 (you must be using urllib3 v2+). [2]
  • HTTP client integrations and behavior:

    • v6 included fixes for httpx support; v8 rewrote httpx support to patch httpcore instead of httpx and fixed several httpx-related exceptions and custom-transport support — if you rely on httpx behavior or custom transports, test those flows after upgrade. [1][2]
    • HTTPS proxy handling was changed/fixed in v8 (proxy addresses no longer appear in cassette URIs). [2]
  • Removed/changed optional integrations:

    • v6 removed support for boto (the old boto package) while keeping boto3; if you used boto (not boto3) that will break. v6 also dropped simplejson. Check for other removed/changed optional dependencies in the v6 notes. [1]

Migration advice (short):

  • Re-record cassettes (or back them up and run any migration scripts) and run your test-suite to detect cassette-format or serializer issues. [1][3]
  • Ensure runtime environment uses supported Python and dependency versions (upgrade urllib3 and Python if needed). [2]
  • Run integration tests for httpx/requests/urllib3-based code paths and for any custom persisters/serializers. [1][2]

Sources:

  • vcrpy v6 changelog (notable breaking items: cassette format/binary, drop boto/simplejson). [1] https://vcrpy.readthedocs.io/en/v6.0.0/changelog.html
  • vcrpy v8.0.0 release notes / changelog (drop Python 3.9, drop urllib3 <2, httpx rewrite, proxy handling fix). [2] (GitHub release / changelog)

Verify tests pass and re-record VCR cassettes for vcrpy 8.x compatibility.

Major version bump to vcrpy 8.x includes breaking changes that may require cassette re-recording. Key compatibility concerns:

  • Cassette format may need recreation
  • Python 3.9 no longer supported (minimum 3.10+)
  • urllib3 <2 no longer supported
  • httpx support was rewritten

Run the test suite and re-record cassettes if tests fail due to cassette format incompatibility.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-openai/pyproject.toml` at line 42,
Update test infra and VCR cassettes to be compatible with vcrpy 8.x by running
the full test suite and re-recording any failing cassettes; specifically, after
bumping the dependency string "vcrpy>=8.0.0,<9" in pyproject.toml, run tests
locally/CI, recreate VCR cassettes using the project's test helper (the cassette
recording utility used in your repo), and commit the new cassette files, while
also ensuring CI/Python matrix uses Python 3.10+ and that any
httpx/urllib3-related fixtures or adapters in your test helpers are updated to
the vcrpy 8.x API.
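The interpreter and urllib3 floors called out above can be sanity-checked before re-recording with a small, repo-independent helper (the function below is illustrative, not part of this codebase):

```python
def parse_release(version: str) -> tuple:
    """Parse the numeric release segment of a version string, e.g. '2.6.3' -> (2, 6, 3)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at pre-release suffixes like 'rc1'
        parts.append(int(piece))
    return tuple(parts)

# vcrpy 8.x floors: Python >= 3.10 and urllib3 >= 2.0
assert parse_release("2.6.3") >= (2, 0)    # urllib3 2.6.3 is new enough
assert parse_release("1.26.18") < (2, 0)   # urllib3 1.x must be upgraded
```

Tuple comparison makes the floor check trivial: `(1, 26, 18) < (2, 0)` is true, so any urllib3 1.x release fails the vcrpy 8 requirement.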

@ellipsis-dev
Contributor

ellipsis-dev bot commented Jan 17, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/sample-app/pyproject.toml`:
- Line 17: Remove the two unused dependencies "llama-index-embeddings-openai"
and "llama-index-llms-huggingface" from the dependency list in pyproject.toml;
locate the entries by their package names and delete those lines, then ensure
the surrounding lines (commas and list formatting) remain valid so the TOML
dependency array parses correctly and the remaining used packages (llama-index,
llama-index-embeddings-huggingface, llama-index-llms-huggingface-api,
llama-index-vector-stores-chroma, llama-index-llms-openai) are unaffected.

"requests>=2.32.3,<3",
"pinecone-client>=3.2.2,<4",
"llama-index>=0.12.52,<0.13.0",
"llama-index>=0.14.12,<0.15.0",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Read the pyproject.toml to see the actual dependency versions
cat -n packages/sample-app/pyproject.toml | sed -n '10,50p'

Repository: traceloop/openllmetry

Length of output: 1685


🏁 Script executed:

# Check for test files in packages/sample-app
fd -type f -name "*.py" packages/sample-app | grep -i test

Repository: traceloop/openllmetry

Length of output: 235


🏁 Script executed:

# Get the full context of the dependencies section
cat -n packages/sample-app/pyproject.toml | sed -n '14,45p'

Repository: traceloop/openllmetry

Length of output: 1393


🏁 Script executed:

# Check for test files in packages/sample-app directory
find packages/sample-app -name "*.py" | grep -i test

Repository: traceloop/openllmetry

Length of output: 186


🏁 Script executed:

# List all Python files in packages/sample-app
ls -la packages/sample-app/ | head -20

Repository: traceloop/openllmetry

Length of output: 812


🏁 Script executed:

# Check if there's a tests directory
find packages/sample-app -type d -name "test*"

Repository: traceloop/openllmetry

Length of output: 90


🏁 Script executed:

# Check the test files
cat packages/sample-app/tests/test_placeholder.py

Repository: traceloop/openllmetry

Length of output: 97


🏁 Script executed:

# Check conftest.py
cat packages/sample-app/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 124


🏁 Script executed:

# Now let's check the sample_app directory to understand what uses llama-index
ls -la packages/sample-app/sample_app/

Repository: traceloop/openllmetry

Length of output: 5992


🏁 Script executed:

# Search for llama-index usage in the sample app
rg "llama.index|from llama_index" packages/sample-app/ --type py

Repository: traceloop/openllmetry

Length of output: 1498


🏁 Script executed:

# Search for imports of llama_index.llms.huggingface (non-api version)
rg "from llama_index.llms.huggingface[^_]|import.*huggingface[^_]" packages/sample-app/ --type py

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check if any package in the repo imports from llama_index.llms.huggingface
rg "from llama_index.llms.huggingface[^_]|import.*HuggingFaceEmbedding|import.*HuggingFace" packages/ --type py

Repository: traceloop/openllmetry

Length of output: 464


🏁 Script executed:

# Get the full pyproject.toml to see the actual current versions before the change
git log -1 --format="%H" packages/sample-app/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 105


🏁 Script executed:

# Check git diff to see what was actually changed
git diff HEAD~1 HEAD packages/sample-app/pyproject.toml | head -80

Repository: traceloop/openllmetry

Length of output: 251


🏁 Script executed:

# Show what changed in the current commit
git show --stat packages/sample-app/pyproject.toml | head -50

Repository: traceloop/openllmetry

Length of output: 333


🏁 Script executed:

# Check the actual diff for the pyproject.toml file
git show packages/sample-app/pyproject.toml | head -100

Repository: traceloop/openllmetry

Length of output: 4023


🏁 Script executed:

# Look at the entire dependencies section to verify what's in the current state
cat -n packages/sample-app/pyproject.toml | sed -n '13,50p'

Repository: traceloop/openllmetry

Length of output: 1587


🏁 Script executed:

# Check for usage of llama-index-embeddings-openai
rg "llama_index.embeddings.openai|OpenAIEmbedding" packages/sample-app/ --type py

Repository: traceloop/openllmetry

Length of output: 307


Remove unused llama-index dependencies on lines 33 and 34.

The version bumps for active dependencies are appropriate. However, llama-index-embeddings-openai (line 33) and llama-index-llms-huggingface (line 34) are unused in the sample app:

  • Line 33: The chroma_app.py uses ChromaDB's OpenAIEmbeddingFunction directly, not llama-index's embeddings
  • Line 34: Only llama-index-llms-huggingface-api (line 35) is imported; the base huggingface LLM package is not referenced

Verified dependencies in use:

  • llama-index (core)
  • llama-index-embeddings-huggingface
  • llama-index-llms-huggingface-api
  • llama-index-vector-stores-chroma
  • llama-index-llms-openai

Note: Test validation cannot be performed as the only test file is a placeholder (test_placeholder.py contains only pass).

🤖 Prompt for AI Agents
In `@packages/sample-app/pyproject.toml` at line 17, Remove the two unused
dependencies "llama-index-embeddings-openai" and "llama-index-llms-huggingface"
from the dependency list in pyproject.toml; locate the entries by their package
names and delete those lines, then ensure the surrounding lines (commas and list
formatting) remain valid so the TOML dependency array parses correctly and the
remaining used packages (llama-index, llama-index-embeddings-huggingface,
llama-index-llms-huggingface-api, llama-index-vector-stores-chroma,
llama-index-llms-openai) are unaffected.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml`:
- Around line 37-47: The dependency version ranges for langchain and langgraph
are too permissive and allow known vulnerable releases; update the constraints
so langchain is at least 1.2.5 (e.g., change "langchain>=1.0.0,<2.0.0" to
"langchain>=1.2.5,<2.0.0") and langgraph is at least 1.0.1 (e.g., change
"langgraph>=1.0.0,<2.0.0" to "langgraph>=1.0.1,<2.0.0"); also review the
anthropic constraint ("anthropic>=0.75.0,<0.76.0") and raise the minimum if a
patched release exists or lock it to a vetted safe version—update the pyproject
dependency entries for langchain, langgraph (and anthropic if needed)
accordingly.

Comment on lines 37 to 47
"anthropic>=0.75.0,<0.76.0",
"boto3>=1.35.49,<2",
"langchain-anthropic>=0.3.13,<0.4.0",
"langchain-aws>=0.2.11,<0.3.0",
"langchain-cohere==0.3.1",
"langchain-community>=0.3.3,<0.4.0",
"langchain-huggingface>=0.1.2,<0.2.0",
"langchain-openai>=0.3.1,<0.4.0",
"langchain>=0.3.15,<0.4.0",
"langchain-anthropic>=1.0.0,<2.0.0",
"langchain-aws>=1.0.0,<2.0.0",
"langchain-cohere>=0.5.0,<0.6.0",
"langchain-community>=0.4.0,<0.5.0",
"langchain-huggingface>=1.0.0,<2.0.0",
"langchain-openai>=1.0.0,<2.0.0",
"langchain>=1.0.0,<2.0.0",
"langchainhub>=0.1.21,<0.2.0",
"langgraph>=0.4,<0.5.0",
"langgraph>=1.0.0,<2.0.0",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

langchain 1.0.0 CVE security fixes

💡 Result:

Short answer — yes. Two high-profile vulnerabilities affecting LangChain 1.0.0 are:

  • CVE-2025-65106 — a prompt/template-injection flaw affecting langchain 1.0.0 through 1.0.6 (patched in 1.0.7). [1]
  • CVE-2025-68664 — a serialization-injection/secret-extraction flaw affecting langchain >=1.0.0 and <1.2.5 (patched in 1.2.5). (A related JS/TS CVE impacts @langchain/core / langchain.js; those fixes are in the JS releases noted in advisories.) [2][3]

Action: upgrade any Python services using vulnerable versions to at least 1.0.7 (for the template fix) and to 1.2.5 (for the serialization fix) as appropriate, and also update affected LangChain.js packages per the advisories. [1][2][3]

Sources: [1] NVD entry for CVE-2025-65106; [2] LangChain security advisory summaries (serialization fix / CVE-2025-68664); [3] reporting on the JS-related CVE and version ranges.


🌐 Web query:

langgraph 1.0.0 CVE security vulnerabilities fixed

💡 Result:

Short answer: langgraph 1.0.0 is vulnerable to a deserialization RCE (CVE-2025-64439); that issue was fixed by upgrading to langgraph >= 1.0.1. The separate SQLite SQL‑injection advisories (CVE-2025-64104 / CVE-2025-8709) affect the SQLite checkpoint package and were fixed in langgraph-checkpoint-sqlite 2.0.11 (not by 1.0.0). Sources: Snyk (deserialization fix to 1.0.1) [1], vulnerability writeup (CVE-2025-64439) [2], GitHub advisory / CVE for the SQLite SQL injection (fixed in 2.0.11) [3].

References

  • [1] Snyk — Deserialization of Untrusted Data / langgraph (fix: upgrade to ≥1.0.1).
  • [2] Wiz / vulnerability database — CVE-2025-64439 (JsonPlusSerializer RCE).
  • [3] GitHub Advisory / NVD entries — LangGraph SQLite SQL injection (fixed in 2.0.11).

🌐 Web query:

anthropic 0.75.0 CVE security fixes

💡 Result:

Short answer: The anthropic Python SDK v0.75.0 (release 24 Nov 2025) does not list any CVE/security advisories in its release notes — the changelog shows features/bugfixes but no explicit CVE fixes. However, multiple CVEs have been published for Anthropic’s claude-code npm packages (e.g., CVE-2025-54795, CVE-2025-58764, CVE-2025-59536, CVE-2025-65099) which are separate projects and have their own patches. [1][2][3]

Sources:

If you’d like, I can:

  • list the specific claude-code CVEs and their fixed versions, or
  • check other Anthropic repos (claude-code, sandbox-runtime) for security advisories and recommended upgrades.

Version constraints allow vulnerable versions of langchain and langgraph.

The specified version ranges (langchain>=1.0.0,<2.0.0 and langgraph>=1.0.0,<2.0.0) permit vulnerable versions:

  • langchain 1.0.0–1.0.6 contain CVE-2025-65106 (template injection; fixed in 1.0.7)
  • langchain 1.0.0–1.2.4 contain CVE-2025-68664 (serialization injection; fixed in 1.2.5)
  • langgraph 1.0.0 contains CVE-2025-64439 (deserialization RCE; fixed in 1.0.1)

Additionally, anthropic v0.75.0 release notes list no CVE fixes. Tighten constraints to exclude known vulnerable versions, or specify minimum versions that include security patches (e.g., langchain>=1.2.5 and langgraph>=1.0.1).

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml` around lines
37 - 47, The dependency version ranges for langchain and langgraph are too
permissive and allow known vulnerable releases; update the constraints so
langchain is at least 1.2.5 (e.g., change "langchain>=1.0.0,<2.0.0" to
"langchain>=1.2.5,<2.0.0") and langgraph is at least 1.0.1 (e.g., change
"langgraph>=1.0.0,<2.0.0" to "langgraph>=1.0.1,<2.0.0"); also review the
anthropic constraint ("anthropic>=0.75.0,<0.76.0") and raise the minimum if a
patched release exists or lock it to a vetted safe version—update the pyproject
dependency entries for langchain, langgraph (and anthropic if needed)
accordingly.
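A hardened dependency stanza reflecting the patched versions above (minimums taken from the cited advisories; treat this as a suggestion, not a verified pin set) could read:

```toml
dependencies = [
  # CVE-2025-65106 fixed in 1.0.7; CVE-2025-68664 fixed in 1.2.5
  "langchain>=1.2.5,<2.0.0",
  # CVE-2025-64439 (deserialization RCE) fixed in 1.0.1
  "langgraph>=1.0.1,<2.0.0",
]
```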


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-writer/pyproject.toml`:
- Around line 75-76: The pyproject.toml already has a [tool.uv]
constraint-dependencies entry; verify that the array includes the secure
minimums "urllib3>=2.6.3" and "pyarrow>=18.1.0" so transitive installs via uv
are pinned, and add whichever of the two entries is missing.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-chromadb/pyproject.toml`:
- Line 12: Update the Python requirement in the three package pyproject.toml
files so they match chromadb: change requires-python in
packages/opentelemetry-semantic-conventions-ai/pyproject.toml,
packages/opentelemetry-instrumentation-qdrant/pyproject.toml, and
packages/opentelemetry-instrumentation-milvus/pyproject.toml to ">=3.10,<4"
(ensure the literal string matches the chromadb line and save each file).

In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml`:
- Line 43: The vcrpy bump to "vcrpy>=8.0.0,<9" requires Python >=3.10 and
urllib3 >=2.0.0; update pyproject.toml so requires-python is ">=3.10" (or
adjust the CI matrix) and add or align a test dependency constraint for urllib3
(e.g., "urllib3>=2.0.0") so environments meet vcrpy's requirements, then run
the full test suite and re-record any VCR cassettes containing binary HTTP
bodies if tests fail.

In `@packages/opentelemetry-instrumentation-lancedb/pyproject.toml`:
- Line 12: The pyproject.toml entry requires-python = ">=3.10,<4" is a breaking
min-version bump and must be documented in the release notes/CHANGELOG; add a
concise changelog entry under the upcoming release (or Unreleased) stating that
this package (packages/opentelemetry-instrumentation-lancedb) now requires
Python >=3.10 (dropping 3.9), mark it as a breaking change, and include a short
rationale and migration note (e.g., users on 3.9 must upgrade or pin prior
versions); update the CHANGELOG.md or RELEASE_NOTES with this entry and ensure
the entry mirrors the repository-wide note about 29/32 packages moving to
>=3.10.

In `@packages/opentelemetry-instrumentation-mistralai/pyproject.toml`:
- Line 42: The project updated the vcrpy dependency in pyproject.toml
("vcrpy>=8.0.0,<9"), so re-record all VCR cassettes used by the test suite to
ensure they match vcrpy 8.x behavior; run the relevant integration tests that
use VCR, delete or regenerate the existing cassette files, verify recorded HTTP
interactions and sensitive-data filtering are correct, and commit the new
cassette files so tests pass under the updated vcrpy version.
♻️ Duplicate comments (5)
packages/opentelemetry-instrumentation-agno/pyproject.toml (1)

44-44: Re-record VCR cassettes with vcrpy 8 and scrub secrets.

The major vcrpy bump can change cassette formats and matching; please re-record/verify cassettes and ensure PII/secret filtering remains in place. Based on learnings, ensure cassettes are regenerated and scrubbed.

packages/opentelemetry-instrumentation-cohere/pyproject.toml (1)

12-12: CI test matrix gap for Python 3.10 minimum version.

This concern was already raised in a previous review: the CI workflow tests only Python 3.11, so the new >=3.10 floor remains untested. Please address the prior feedback before merging.

packages/opentelemetry-instrumentation-langchain/pyproject.toml (1)

37-37: Tighten LangChain/LangGraph minimums to patched releases.
These ranges still allow known vulnerable versions (and this PR is a CVE remediation). Please raise minimums to patched versions and confirm whether the new anthropic minimum is security‑motivated or just compatibility.

🔧 Proposed constraint hardening (adjust to latest patched versions)
-  "langchain>=1.0.0,<2.0.0",
+  "langchain>=1.2.5,<2.0.0",
...
-  "langgraph>=1.0.0,<2.0.0",
+  "langgraph>=1.0.1,<2.0.0",
What are the latest patched minimum versions for langchain (CVE-2025-65106, CVE-2025-68664) and langgraph (CVE-2025-64439)?

Also applies to: 39-49

packages/sample-app/pyproject.toml (1)

34-37: Re-verify usage of llama-index-embeddings-openai and llama-index-llms-huggingface.
Line 35-36 were previously identified as unused in the sample app; please confirm they’re now required, otherwise remove to cut dependency surface.

If you want to verify quickly, this read-only scan should help:

#!/bin/bash
# Search for llama-index OpenAI embeddings / HuggingFace LLM usage in sample-app
rg -nP --type=py -C2 'llama_index\.embeddings\.openai|llama_index\.llms\.huggingface' packages/sample-app
rg -nP --type=py -C2 'OpenAIEmbedding|HuggingFace' packages/sample-app
packages/opentelemetry-instrumentation-openai/pyproject.toml (1)

42-42: Re-validate vcrpy 8.x cassette compatibility.
Run the relevant tests and re-record any failing cassettes due to vcrpy’s major changes.

🧹 Nitpick comments (2)
packages/sample-app/pyproject.toml (1)

29-31: Verify LangChain 1.x migration and necessity of langchain-classic.
Lines 29-31 move to LangChain 1.x; please confirm the sample-app imports are aligned with the new module paths and that langchain-classic is actually needed (drop it if unused).

packages/opentelemetry-instrumentation-lancedb/pyproject.toml (1)

37-37: Consider adding an upper bound for lancedb test dependency.

The upper bound was removed (previously <0.10.0). While this is a test-only dependency, adding an upper bound like <0.27 or <1 would improve reproducibility and prevent unexpected CI failures from future breaking releases.

Suggested change
-  "lancedb>=0.26.0",
+  "lancedb>=0.26.0,<1",

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for requires-python settings across all pyproject.toml files
echo "=== requires-python settings in all pyproject.toml files ==="
rg -n --glob 'pyproject.toml' 'requires-python'

echo -e "\n=== Python 3.9 references in .github directory (CI configs) ==="
rg -n '3\.9' .github 2>/dev/null || echo "No 3.9 references found in .github"

echo -e "\n=== Python 3.9 references in entire repo (limited to relevant files) ==="
rg -n '3\.9' --glob '*.toml' --glob '*.yml' --glob '*.yaml' --glob '*.py' . | head -50

Repository: traceloop/openllmetry

Length of output: 50379


Address Python version inconsistency across packages.

The chromadb change to requires-python = ">=3.10,<4" is correct, but three other packages still allow Python 3.9:

  • packages/opentelemetry-semantic-conventions-ai/pyproject.toml
  • packages/opentelemetry-instrumentation-qdrant/pyproject.toml
  • packages/opentelemetry-instrumentation-milvus/pyproject.toml

Align these to ">=3.10,<4" for consistency across the repository. CI configuration has been properly updated.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-chromadb/pyproject.toml` at line 12,
Update the Python requirement in the three package pyproject.toml files so they
match chromadb: change requires-python in
packages/opentelemetry-semantic-conventions-ai/pyproject.toml,
packages/opentelemetry-instrumentation-qdrant/pyproject.toml, and
packages/opentelemetry-instrumentation-milvus/pyproject.toml to ">=3.10,<4"
(ensure the literal string matches the chromadb line and save each file).

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Verify vcrpy 8.0.0 exists on PyPI and check latest version
echo "=== Checking vcrpy versions on PyPI ==="
curl -s https://pypi.org/pypi/vcrpy/json | jq -r '.releases | keys | .[]' | sort -V | tail -15

echo -e "\n=== Latest vcrpy version ==="
curl -s https://pypi.org/pypi/vcrpy/json | jq -r '.info.version'

echo -e "\n=== vcrpy 8.0.0 release info ==="
curl -s https://pypi.org/pypi/vcrpy/8.0.0/json | jq -r '.info | {version, release_date: .release_url}'

Repository: traceloop/openllmetry

Length of output: 348


🌐 Web query:

vcrpy 8.0.0 breaking changes cassette format

💡 Result:

Short answer: vcrpy 8.0.0 did not introduce a new cassette format or a breaking cassette-format change. The cassette format has been versioned as "1" since the 1.x series; older 0.x cassettes remain incompatible and must be re-recorded or migrated, but no further cassette-format change was made in 8.0.0. [1][2]

Sources:

  • serialize implementation showing cassette format version = 1 and the old 0.x → 1.x migration warning. [1]
  • vcrpy 8.0.0 release notes (breaking changes list — drops Python 3.9 and older urllib3 support; no cassette-format change). [2]

🌐 Web query:

vcrpy 6.x to 8.x migration guide compatibility

💡 Result:

Brief migration checklist (vcrpy 6.x → 8.x)

  • Python: ensure your environment meets v8’s Python support (v8 drops older Pythons — verify target version per v8 release). [1][2]
  • urllib3: v8 removes support for urllib3 < 2 — require urllib3 >= 2.0.0. Update your pinned dependency. [2][3]
  • HTTP libraries:
    • httpx/httpcore: v8 rewrote httpx support to patch httpcore; exercise any tests that use custom httpx transports and watch for Response-reading issues. Run your httpx tests after upgrade. [2][3]
    • Confirm other adapters you use (requests, aiohttp, urllib3, tornado, boto3, etc.) remain supported in v8. [2]
  • Cassettes: some earlier releases (notably around v6) changed how binary bodies were saved — you will likely need to re-record or migrate old cassette files (backup then re-record). Use the cassette-migration tooling or delete & re-record where feasible. [1][2]
  • Removed/changed integrations:
    • v6 already dropped legacy boto (keeps boto3); review any boto (non-boto3) usage. [1]
    • If you maintain custom serializers/deserializers or matcher extensions, verify they still conform to the current serializer API. [2]
  • New features / options: v8 added options like drop_unused_requests for cassette cleanup — consider using them to tidy recordings. [3]
  • Test plan: run full test suite with v8, paying special attention to HTTP-heavy tests, proxy handling, redirects and any custom transports. Back up cassettes before testing.

Key sources:

  • vcrpy 6.0.0 changelog (notes about binary cassette changes, boto drop). [1]
  • vcrpy docs (installation / compatibility for v8, supported libraries). [2]
  • v8.0.0 release notes (breaking changes: urllib3 support, Python support, httpx/httpcore rewrite, new options). [3]

If you want, I can:

  • produce a concrete pip/requirements diff for your project,
  • search your repo (or a sample) for things likely to break (custom serializers, boto usage, httpx transports),
    or
  • draft a short automated test script to validate cassettes after upgrade. Which would you like?

vcrpy major version bump to 8.0.0 is valid, but verify Python and urllib3 compatibility.

vcrpy 8.0.0 exists on PyPI and doesn't introduce cassette format breaking changes. However, v8 requires Python 3.10+ and urllib3 ≥ 2.0.0, which may affect your test environment. While cassette format stability is assured, run the test suite after this upgrade to ensure all VCR recordings remain compatible—cassettes recorded under v6 may need re-recording if they contain binary HTTP bodies due to earlier v6 handling changes.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml` at line 43, The
vcrpy bump to "vcrpy>=8.0.0,<9" requires Python >=3.10 and urllib3 >=2.0.0;
update pyproject.toml to ensure python_requires is >=3.10 (or adjust CI matrix)
and add or align a test dependency constraint for urllib3 (e.g.,
"urllib3>=2.0.0") so environments meet vcrpy's requirements, then run the full
test suite and re-record any VCR cassettes containing binary HTTP bodies if
tests fail.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if Python version requirement is consistent across all packages in this repo

# Find all pyproject.toml files and check their requires-python values
echo "=== Python version requirements across packages ==="
fd -t f 'pyproject.toml' packages/ --exec grep -H 'requires-python' {} \; | sort -u

# Check for any changelog or release notes mentioning Python version changes
echo ""
echo "=== Checking for changelog/release notes files ==="
fd -t f \( -name 'CHANGELOG*' -o -name 'RELEASE*' -o -name 'HISTORY*' -o -name 'NEWS*' \) | head -20

# Also check if there are git-related docs
echo ""
echo "=== Checking root directory for documentation ==="
ls -la | grep -E '\.(md|rst|txt)$'

Repository: traceloop/openllmetry

Length of output: 3995


🏁 Script executed:

#!/bin/bash
# Check CHANGELOG for Python version bump documentation
echo "=== Recent CHANGELOG entries mentioning Python ==="
head -100 CHANGELOG.md | grep -A 5 -B 5 -i "python"

echo ""
echo "=== Full lancedb pyproject.toml ==="
cat -n packages/opentelemetry-instrumentation-lancedb/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

#!/bin/bash
# Read the lancedb pyproject.toml file
echo "=== lancedb pyproject.toml content ==="
cat packages/opentelemetry-instrumentation-lancedb/pyproject.toml

echo ""
echo "=== Searching CHANGELOG for 'Python 3.10' mention ==="
grep -n "Python 3.10\|python.*3\.10" CHANGELOG.md | head -20

Repository: traceloop/openllmetry

Length of output: 1905


Ensure the Python 3.10 requirement change is documented in release notes.

The Python 3.10 minimum version bump is intentional and consistent with the ecosystem: 29 of 32 packages in this repository now require Python >=3.10. However, this breaking change should be explicitly documented in the CHANGELOG or release notes so users know Python 3.9 support has been dropped.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-lancedb/pyproject.toml` at line 12,
The pyproject.toml entry requires-python = ">=3.10,<4" is a breaking min-version
bump and must be documented in the release notes/CHANGELOG; add a concise
changelog entry under the upcoming release (or Unreleased) stating that this
package (packages/opentelemetry-instrumentation-lancedb) now requires Python
>=3.10 (dropping 3.9), mark it as a breaking change, and include a short
rationale and migration note (e.g., users on 3.9 must upgrade or pin prior
versions); update the CHANGELOG.md or RELEASE_NOTES with this entry and ensure
the entry mirrors the repository-wide note about 29/32 packages moving to
>=3.10.
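For reference, a changelog entry of roughly this shape (wording illustrative, not taken from the repo) would satisfy the request above:

```markdown
## Unreleased

### Breaking Changes

- `opentelemetry-instrumentation-lancedb` now requires Python >=3.10; support
  for Python 3.9 has been dropped, matching the repository-wide move of 29/32
  packages. Users on Python 3.9 should upgrade their interpreter or pin an
  earlier release of this package.
```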

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify vcrpy 8.0.0 exists on PyPI and check available versions
curl -s https://pypi.org/pypi/vcrpy/json | jq -r '.releases | keys[]' | sort -V | tail -20

Repository: traceloop/openllmetry

Length of output: 184


🌐 Web query:

vcrpy 8.0.0 release notes breaking changes changelog

💡 Result:

Breaking changes in vcrpy 8.0.0

  • Drop support for Python 3.9. [1]
  • Drop support for urllib3 < 2 (addresses CVE warnings from urllib3 1.x). [1][2]

See the full release notes/changelog on the GitHub release page for v8.0.0. [1]

Sources:
[1] vcrpy v8.0.0 release notes — GitHub.
[2] vcrpy v8.0.0 summary — NewReleases.


🏁 Script executed:

cat packages/opentelemetry-instrumentation-mistralai/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 1728


VCR cassettes should be re-recorded after this vcrpy upgrade.

vcrpy 8.0.0 exists and is compatible with this package (Python >=3.10). However, upgrading from vcrpy 6.x to 8.x may require re-recording test cassettes to ensure API interactions are properly captured and prevent test failures.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-mistralai/pyproject.toml` at line 42,
The project updated the vcrpy dependency in pyproject.toml ("vcrpy>=8.0.0,<9"),
so re-record all VCR cassettes used by the test suite to ensure they match vcrpy
8.x behavior; run the relevant integration tests that use VCR, delete or
regenerate the existing cassette files, verify recorded HTTP interactions and
sensitive-data filtering are correct, and commit the new cassette files so tests
pass under the updated vcrpy version.

@ellipsis-dev
Copy link
Contributor

ellipsis-dev bot commented Jan 18, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py`:
- Around line 39-50: The test test_agents_and_tools sets llm =
OpenAI(model="gpt-4o-mini") but existing VCR cassettes still contain responses
for gpt-3.5-turbo-0613; re-record the VCR cassette(s) for this test so they
reflect the new model: run the test suite (or the specific pytest invocation
with VCR enabled) to regenerate test_agents_and_tools.yaml, verify the new
cassette contains gpt-4o-mini entries, and commit the updated cassette; ensure
conftest.py VCR filters remain intact so no secrets are captured.
- Line 135: The VCR cassette for test_agent_with_multiple_tools was recorded
against the old model and must be re-recorded to match the updated Cohere model;
update the test by re-running the test that defines llm =
Cohere(model="command-a-03-2025") (and any helper that initializes the agent) so
VCR captures interactions for "command-a-03-2025" and commit the new cassette;
ensure VCR filtering remains configured to strip authorization/api-key headers
and verify the cassette filename referenced by the test is replaced with the
newly recorded file.
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (2)

86-100: Hardcoded token counts are brittle and will break if cassettes are re-recorded.

The exact token count assertions (e.g., output_tokens == 44, input_tokens == 538) are tightly coupled to the current VCR cassette recordings. If the cassettes need to be re-recorded due to API changes or model updates, these assertions will fail.

Consider using > 0 checks (similar to the Cohere test below) or range-based assertions to make the tests more resilient while still validating that token usage is captured.

♻️ Suggested approach
-    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 44
-    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 538
-    assert llm_span_1.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 582
+    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] > 0
+    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] > 0
+    assert llm_span_1.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] > 0

     # Verify second LLM span
     assert len(llm_spans) >= 2, "Expected at least 2 LLM spans"
     llm_span_2 = llm_spans[1]
     assert llm_span_2.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
     assert llm_span_2.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL] == "gpt-4o-mini"
     assert GenAIAttributes.GEN_AI_RESPONSE_MODEL in llm_span_2.attributes
     assert f"{GenAIAttributes.GEN_AI_PROMPT}.0.content" in llm_span_2.attributes
     assert f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" in llm_span_2.attributes
-    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 30
-    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 594
-    assert llm_span_2.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 624
+    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] > 0
+    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] > 0
+    assert llm_span_2.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] > 0

189-206: Consider consolidating attribute key checks.

The dual-key fallback pattern (GenAIAttributes.GEN_AI_REQUEST_MODEL in ... or "gen_ai.request.model" in ...) is repeated multiple times. If both formats are equivalent (they should be, since GenAIAttributes.GEN_AI_REQUEST_MODEL should equal "gen_ai.request.model"), this redundancy can be simplified.

If they differ due to version compatibility, consider defining a helper function to reduce duplication:

♻️ Optional helper approach
def has_attr(span, *keys):
    """Check if span has any of the provided attribute keys."""
    return any(k in span.attributes for k in keys)

def get_attr(span, *keys):
    """Get first matching attribute value from span."""
    for k in keys:
        if k in span.attributes:
            return span.attributes[k]
    return None

Comment on lines 39 to +50
@pytest.mark.vcr
def test_agents_and_tools(instrument_legacy, span_exporter, log_exporter):
@pytest.mark.asyncio
async def test_agents_and_tools(instrument_legacy, span_exporter, log_exporter):
def multiply(a: int, b: int) -> int:
"""Multiply two integers and returns the result integer"""
return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)
llm = OpenAI(model="gpt-3.5-turbo-0613")
agent = ReActAgent.from_tools([multiply_tool], llm=llm, verbose=True)
llm = OpenAI(model="gpt-4o-mini")
agent = ReActAgent(tools=[multiply_tool], llm=llm, verbose=True, streaming=False)

agent.chat("What is 2 times 3?")
await agent.run("What is 2 times 3?")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the cassettes directory for this test
find . -type d -name "cassettes" | head -5

Repository: traceloop/openllmetry

Length of output: 406


🏁 Script executed:

#!/bin/bash
# List cassette files related to test_agents
find . -path "*/cassettes/*test_agents*" -type f 2>/dev/null | head -10

Repository: traceloop/openllmetry

Length of output: 1346


🏁 Script executed:

#!/bin/bash
# Get the actual test file to verify it matches the snippet
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py | sed -n '39,50p'

Repository: traceloop/openllmetry

Length of output: 624


🏁 Script executed:

#!/bin/bash
# Check directory structure around the test file
ls -la packages/opentelemetry-instrumentation-llamaindex/tests/ | head -20

Repository: traceloop/openllmetry

Length of output: 721


🏁 Script executed:

#!/bin/bash
# Check the model name in the cassettes
grep -i "gpt-4o-mini\|gpt-3.5-turbo" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools_with_events_with_content.yaml | head -5

Repository: traceloop/openllmetry

Length of output: 251


🏁 Script executed:

#!/bin/bash
# Check for API keys, tokens, or secrets in cassettes
grep -i -E "(api.?key|secret|password|bearer|authorization|sk-[a-zA-Z0-9])" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools_with_events_with_content.yaml | head -10

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check the file size and structure of the cassettes
wc -l ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools_*.yaml

Repository: traceloop/openllmetry

Length of output: 355


🏁 Script executed:

#!/bin/bash
# Check if there's a plain test_agents_and_tools.yaml cassette
ls -la ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/ | grep "test_agents_and_tools"

Repository: traceloop/openllmetry

Length of output: 407


🏁 Script executed:

#!/bin/bash
# Check the main test_agents_and_tools.yaml cassette content
head -50 ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml

Repository: traceloop/openllmetry

Length of output: 2923


🏁 Script executed:

#!/bin/bash
# Count occurrences of each model in the main cassette
echo "=== Model references in main cassette ==="
grep -c "gpt-3.5-turbo-0613" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml || echo "gpt-3.5-turbo-0613: 0 occurrences"
grep -c "gpt-4o-mini" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml || echo "gpt-4o-mini: 0 occurrences"

Repository: traceloop/openllmetry

Length of output: 110


🏁 Script executed:

#!/bin/bash
# Check if cassettes scrub sensitive headers properly
head -100 ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml | grep -i "authorization\|api-key\|x-api-key" || echo "No obvious unscubbed auth headers found in first 100 lines"

Repository: traceloop/openllmetry

Length of output: 123


🏁 Script executed:

#!/bin/bash
# Check conftest.py for VCR configuration and filters
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py | grep -A 20 -B 5 "vcr\|filter"

Repository: traceloop/openllmetry

Length of output: 1013


🏁 Script executed:

#!/bin/bash
# Show the exact lines with the old model in cassettes
grep -n "gpt-3.5-turbo-0613" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml

Repository: traceloop/openllmetry

Length of output: 348


🏁 Script executed:

#!/bin/bash
# Verify there are zero gpt-4o-mini references in cassettes
grep -n "gpt-4o-mini" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml | wc -l

Repository: traceloop/openllmetry

Length of output: 66


VCR cassettes must be re-recorded before merging. The test code uses model="gpt-4o-mini" (line 47), but the cassettes still contain gpt-3.5-turbo-0613 at lines 30, 154, and 275 of test_agents_and_tools.yaml. Per coding guidelines, cassettes must be re-recorded when API interactions change to ensure test accuracy. The VCR filter configuration in conftest.py properly scrubs authorization headers, so no secrets are exposed in cassettes.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py`
around lines 39 - 50, The test test_agents_and_tools sets llm =
OpenAI(model="gpt-4o-mini") but existing VCR cassettes still contain responses
for gpt-3.5-turbo-0613; re-record the VCR cassette(s) for this test so they
reflect the new model: run the test suite (or the specific pytest invocation
with VCR enabled) to regenerate test_agents_and_tools.yaml, verify the new
cassette contains gpt-4o-mini entries, and commit the updated cassette; ensure
conftest.py VCR filters remain intact so no secrets are captured.


sql_database = make_sql_table()
llm = Cohere()
llm = Cohere(model="command-a-03-2025")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Cohere command model API 2025 2026 available models list

💡 Result:

Latest available "Command" models (names & notable variants) — sourced from Cohere docs:

  • command-a-03-2025. [1]
  • command-a-vision-07-2025 (multimodal / vision). [1][3]
  • command-a-reasoning-08-2025 (reasoning / agentic). [2]
  • command-a-translate-08-2025 (translation). [4]
  • command-r7b-12-2024. [1]
  • command-r-08-2024 and command-r-plus-08-2024 (listed as recommended replacements for older command models). [5][1]

Note: Cohere has deprecated older command models (e.g., command-r-03-2024, command-r-plus-04-2024, command-light, command). See the deprecation notice for details. [5]

Sources: Cohere model overview and changelog/release notes. [1][2][3][4][5]


🏁 Script executed:

# Check the test_agents.py file for VCR cassette configuration and the full context around line 135
cd packages/opentelemetry-instrumentation-llamaindex || true
head -200 tests/test_agents.py | tail -100

Repository: traceloop/openllmetry

Length of output: 4686


🏁 Script executed:

# Search for VCR cassettes or fixtures related to this test
find . -name "*cassette*" -o -name "*vcr*" 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 1423


🏁 Script executed:

# Check imports in the test file to see how VCR is configured
grep -n "vcr\|cassette\|pytest" tests/test_agents.py | head -30

Repository: traceloop/openllmetry

Length of output: 118


🏁 Script executed:

# Find the test_agents.py file
find . -name "test_agents.py" -type f

Repository: traceloop/openllmetry

Length of output: 209


🏁 Script executed:

# List the cassettes directory for llamaindex
ls -la ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/ 2>/dev/null | head -50

Repository: traceloop/openllmetry

Length of output: 506


🏁 Script executed:

# Search for test_agent_with_multiple_tools in the llamaindex tests
find ./packages/opentelemetry-instrumentation-llamaindex/tests -name "*.py" -exec grep -l "test_agent_with_multiple_tools\|command-a-03-2025" {} \;

Repository: traceloop/openllmetry

Length of output: 137


🏁 Script executed:

# Check the cassettes for test_agents
ls -la ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/ | head -30

Repository: traceloop/openllmetry

Length of output: 1073


🏁 Script executed:

# Read the test_agents.py file around line 135
sed -n '120,150p' ./packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py

Repository: traceloop/openllmetry

Length of output: 1071


🏁 Script executed:

# Check the beginning of test_agents.py for VCR and other configurations
head -50 ./packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py

Repository: traceloop/openllmetry

Length of output: 1961


🏁 Script executed:

# Check if there's a cassette for test_agent_with_multiple_tools
find ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents -name "*multiple*"

Repository: traceloop/openllmetry

Length of output: 465


🏁 Script executed:

# Check the cassettes for any potential secrets or API keys
head -100 ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml

Repository: traceloop/openllmetry

Length of output: 6959


🏁 Script executed:

# Search for API key patterns in the cassette
grep -i "api.key\|authorization\|bearer\|secret" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | head -20

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check the test file to see how VCR is configured for filtering
grep -A 20 -B 5 "pytest.ini\|conftest\|vcr_config\|filter" ./packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py | head -40

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check if there's a conftest.py that configures VCR filtering
find ./packages/opentelemetry-instrumentation-llamaindex/tests -name "conftest.py" -exec cat {} \;

Repository: traceloop/openllmetry

Length of output: 4570


Re-record the VCR cassette for test_agent_with_multiple_tools.

The model command-a-03-2025 is valid, but the cassette contains recorded interactions using the older command-r model. The cassette must be re-recorded to match the updated model specification. VCR filtering is properly configured to exclude sensitive headers (authorization, api-key), so secrets are protected.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py` at
line 135, The VCR cassette for test_agent_with_multiple_tools was recorded
against the old model and must be re-recorded to match the updated Cohere model;
update the test by re-running the test that defines llm =
Cohere(model="command-a-03-2025") (and any helper that initializes the agent) so
VCR captures interactions for "command-a-03-2025" and commit the new cassette;
ensure VCR filtering remains configured to strip authorization/api-key headers
and verify the cassette filename referenced by the test is replaced with the
newly recorded file.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml`:
- Around line 75-76: Add a filelock constraint to the
[tool.uv].constraint-dependencies entry by including "filelock>=3.20.3" and
remove "pip>=25.3" from that list (or move the pip constraint out of
constraint-dependencies into a dev/build tooling configuration if it is truly
needed for build tooling); update the constraint-dependencies array that
currently contains ["urllib3>=2.6.3", "pip>=25.3"] to instead include
"filelock>=3.20.3" and only keep runtime transitive deps like "urllib3>=2.6.3".

In `@packages/opentelemetry-instrumentation-bedrock/pyproject.toml`:
- Line 41: The dependency bump to "vcrpy>=8.0.0,<9" in pyproject.toml requires
regenerating test cassettes because vcrpy v8 rejects legacy v6/v7 cassettes; run
the migration tool (python -m vcr.migration) or delete and re-record cassettes
in the package opentelemetry-instrumentation-bedrock tests directory, then
re-run the test suite to regenerate them; after regeneration ensure you apply
the project's VCR filters/scrubbers to remove any secrets/PII from the new
cassettes before committing.

In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml`:
- Around line 91-92: Update the constraint-dependencies entry: keep
urllib3>=2.6.3 and pip>=25.3 as-is, correct the CVE attribution for
langgraph-checkpoint (it addresses CVE-2025-64439) and add
langgraph-checkpoint-sqlite>=2.0.11 to address the SQLite injection CVEs
(CVE-2025-64104 and CVE-2025-8709); modify the constraint-dependencies list (the
symbol to change is constraint-dependencies) to include
"langgraph-checkpoint-sqlite>=2.0.11" alongside the existing
"langgraph-checkpoint>=4.0.0", and ensure any documentation or commands
referencing package management use the "uv run <command>" pattern per
guidelines.

In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml`:
- Line 11: Update the minimum Python version in this package by setting
requires-python = ">=3.10,<4" in pyproject.toml (the requires-python entry) and
ensure consistency across the package: update any Python classifiers in
pyproject.toml or setup metadata, adjust CI job matrix entries that reference
Python 3.9 to start at 3.10, and verify that any runtime checks or packaging
metadata (e.g., in tool.poetry or project.urls) do not still enumerate 3.9
support so the package metadata and CI reflect the new >=3.10 requirement.

In `@packages/opentelemetry-instrumentation-pinecone/pyproject.toml`:
- Line 12: The release workflow currently tests only Python 3.11 while
pyproject.toml declares requires-python = ">=3.10,<4"; update
.github/workflows/release.yml to include Python 3.10 in the python-version
matrix for all release jobs (the Python matrix entries around the job
definitions at lines referenced in the review) so the declared minimum is
validated, and add a classifiers section to the project's pyproject.toml (add
the classifiers key under the project metadata) including at minimum
"Programming Language :: Python :: 3.10" and "Programming Language :: Python ::
3.11" to make supported versions explicit in package metadata.
- Line 44: Update the pytest-recording constraint in pyproject.toml to a version
that declares vcrpy 8 support (e.g., change pytest-recording to ">=0.13.2" or
preferably ">=0.13.4"), then re-record the VCR cassettes used by the Pinecone
tests (notably test_pinecone_retrieval.yaml) using vcrpy 8 to accommodate
httpx/httpcore and proxy/redirect behavior changes; after re-recording, verify
cassettes have sensitive data scrubbed (ensure filtered Authorization header and
other secrets are properly filtered) before committing.

In `@packages/opentelemetry-instrumentation-together/pyproject.toml`:
- Line 43: Upgrade to vcrpy 8.0.0 requires re-recording and scrubbing test
cassettes; re-run the tests that use pytest-recording to re-record the cassettes
under tests/cassettes/test_chat/ and tests/cassettes/test_completion/ (ensure
record_mode is set to 'all' or appropriate in your pytest-recording/conftest.py
fixture), then sanitize those new cassettes using pytest-recording filters (or
the VCR.py before_record/after_record hooks configured in conftest.py) to remove
keys, tokens, PII and any headers (e.g., Authorization, Cookie, API keys) before
committing; verify playback succeeds with urllib3>=2 and Python 3.10+.

In `@packages/opentelemetry-instrumentation-writer/pyproject.toml`:
- Around line 75-76: The constraint-dependencies entry in this package's
[tool.uv] section is ineffective because uv only reads constraints from the
workspace root; move the line constraint-dependencies = ["urllib3>=2.6.3",
"pyarrow>=18.1.0", "pip>=25.3"] out of
packages/opentelemetry-instrumentation-writer/pyproject.toml and add it to the
root pyproject.toml under its [tool.uv] section so the uv resolver will honor
these constraints.
♻️ Duplicate comments (6)
packages/sample-app/pyproject.toml (1)

35-36: Remove unused llama-index packages to reduce attack surface.

These two dependencies still look unused in the sample app. If no imports exist, remove them to avoid unnecessary installs and CVE exposure.

🧹 Proposed cleanup
-  "llama-index-embeddings-openai>=0.5.0,<0.6.0",
-  "llama-index-llms-huggingface>=0.6.0,<0.7.0",

Run this to confirm there are no usages:

#!/bin/bash
rg -n "llama_index\.embeddings\.openai|OpenAIEmbedding" 
rg -nP "llama_index\.llms\.huggingface(?!_api)"
packages/opentelemetry-instrumentation-openai-agents/pyproject.toml (1)

74-76: constraint-dependencies not enforced in per-package files.

As noted in a previous review, uv's constraint-dependencies is only read from the workspace root pyproject.toml, not from individual package files. This configuration will have no effect here.

packages/opentelemetry-instrumentation-mistralai/pyproject.toml (1)

42-42: vcrpy upgrade to 8.0.0 for CVE remediation.

The vcrpy version bump addresses CVE warnings from urllib3 1.x by requiring urllib3 >= 2.

The need to re-record VCR cassettes after this upgrade was already flagged in a previous review.

packages/opentelemetry-instrumentation-openai/pyproject.toml (1)

12-12: Python 3.10+ and vcrpy 8.x changes are consistent.

The Python version bump to 3.10+ aligns with vcrpy 8.x requirements (which dropped Python 3.9 support). Ensure test cassettes have been re-recorded if any tests fail due to cassette format changes.

Also applies to: 42-42

packages/opentelemetry-instrumentation-langchain/pyproject.toml (1)

37-49: Vulnerable version ranges already flagged in previous review.

The CVE concerns with langchain>=1.0.0 and langgraph>=1.0.0 allowing vulnerable versions have been identified in a prior review comment. Please address that feedback by tightening these constraints to exclude known vulnerable releases.

packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1)

12-12: Dropping Python 3.9 is a breaking change — see previous review comment.

The past review correctly identified that this change is unnecessary for CVE remediation. The filelock CVEs affect the filelock package version, not the Python interpreter version. The suggested fix was to add filelock>=3.20.3 to constraint-dependencies instead.

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-lancedb/pyproject.toml (1)

37-37: Consider adding an upper bound to the lancedb dependency.

The lancedb test dependency was changed from >=0.9.0,<0.10.0 to >=0.26.0 without an upper bound. This significant version jump (0.9.x → 0.26.x) may include breaking API changes, and removing the upper bound could lead to unexpected test failures when new major versions are released.

♻️ Suggested fix
-  "lancedb>=0.26.0",
+  "lancedb>=0.26.0,<0.27.0",

Comment on lines +75 to +76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "pip>=25.3"]

⚠️ Potential issue | 🟠 Major

Missing filelock constraint; pip constraint seems unnecessary.

If the Python 3.10 bump was motivated by filelock CVEs (as mentioned in the PR title for CVE remediation), the filelock>=3.20.3 constraint should be added here — this was the suggested fix from the past review to avoid dropping Python 3.9.

Additionally, pip>=25.3 is an unusual constraint since pip is not typically a transitive runtime dependency. If this is for development/build tooling, it may not need to be in constraint-dependencies.

Suggested fix
 [tool.uv]
-constraint-dependencies = ["urllib3>=2.6.3", "pip>=25.3"]
+constraint-dependencies = ["urllib3>=2.6.3", "filelock>=3.20.3"]
#!/bin/bash
# Check if pip is actually a transitive dependency in this package
rg -l "pip" packages/opentelemetry-instrumentation-anthropic/ --type py || echo "No pip references found in source"

# Check if filelock is used as a transitive dependency
rg "filelock" packages/opentelemetry-instrumentation-anthropic/ || echo "No direct filelock references"
🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml` around lines
75 - 76, Add a filelock constraint to the [tool.uv].constraint-dependencies
entry by including "filelock>=3.20.3" and remove "pip>=25.3" from that list (or
move the pip constraint out of constraint-dependencies into a dev/build tooling
configuration if it is truly needed for build tooling); update the
constraint-dependencies array that currently contains ["urllib3>=2.6.3",
"pip>=25.3"] to instead include "filelock>=3.20.3" and only keep runtime
transitive deps like "urllib3>=2.6.3".

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find test cassettes in the bedrock instrumentation package
find packages/opentelemetry-instrumentation-bedrock -type f \( -name "*.yaml" -o -name "*.json" -o -name "*cassette*" \) 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 2802


🏁 Script executed:

# Also check the overall test structure for bedrock
find packages/opentelemetry-instrumentation-bedrock -type d -name "tests" -o -name "test" 2>/dev/null

Repository: traceloop/openllmetry

Length of output: 117


🏁 Script executed:

# List contents of test directory if it exists
ls -la packages/opentelemetry-instrumentation-bedrock/tests/ 2>/dev/null || echo "No tests directory"

Repository: traceloop/openllmetry

Length of output: 411


🌐 Web query:

vcrpy 8.0.0 breaking changes migration from v6 v7 cassette format

💡 Result:

Short answer

  • VCR.py 8 uses the modern "versioned" cassette format (cassette format version = 1) and will reject older un‑versioned/legacy cassettes. If you have cassettes produced by much older VCR.py releases, they're not compatible and must be migrated or re‑recorded. [1][2]

What to do when migrating from v6/v7

  • Easiest: delete existing cassette files and re-run tests so VCR.py re‑records them. This is the recommended approach if re‑recording is feasible. [1][2]
  • Migration helper: VCR.py includes a migration script that attempts to convert old cassette files to the new format:
    python -m vcr.migration PATH
    (PATH = a cassette file or directory). Back up cassettes before running it. [1][2]

Notes and gotchas

  • Custom serializers: if you implemented a custom serializer/deserializer, update it to the newer API (serializers take dict -> string; deserializers take string -> dict). [1]
  • v6.x warning: some v6 releases noted binary format issues and recommended re‑creating affected cassettes — if you saw that in v6 you may still need to re‑record even after attempting migration. [3]
  • If migration fails the safest route is to delete + re‑record. The serialize implementation will raise an error when it detects truly old cassette structures. [2]

References

  1. VCR.py docs — Upgrade / New Cassette Format and serializer notes. [vcrpy docs]. [1]
  2. vcr.serialize source (cassette format version, detection + error message and serialize/deserialize behavior). [2]
  3. vcrpy 6.0.0 changelog (note about possible need to recreate cassettes due to binary format). [3]

Would you like me to run the migration for a specific cassette directory (I can show the exact command and checks) or inspect a sample cassette to see if it needs migration?
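
The serializer API change called out in the notes can be sketched as follows (the class name and JSON choice are illustrative; only the dict-to-string contract comes from the upgrade notes, and registration via vcr.register_serializer is assumed rather than shown):

```python
import json


class JsonCassetteSerializer:
    """Sketch of a vcrpy-8-style custom serializer.

    Per the upgrade notes, serialize takes a cassette dict and returns a
    string, and deserialize takes a string and returns a dict.
    """

    def serialize(self, cassette_dict: dict) -> str:
        # Produce the on-disk cassette text from the in-memory dict
        return json.dumps(cassette_dict, indent=2)

    def deserialize(self, cassette_string: str) -> dict:
        # Parse the on-disk cassette text back into a dict
        return json.loads(cassette_string)
```

A quick round-trip (serialize then deserialize) should return the original dict unchanged.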


🏁 Script executed:

# Inspect a sample cassette to check if it has version header (versioned format)
head -20 "packages/opentelemetry-instrumentation-bedrock/tests/metrics/cassettes/test_bedrock_metrics/test_invoke_model_metrics.yaml"

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Check another cassette from traces
head -20 "packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_anthropic/test_anthropic_2_completion.yaml"

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Search for "version:" field in cassettes to see if they're versioned
rg "^version:" packages/opentelemetry-instrumentation-bedrock/tests/ --type yaml | head -5

Repository: traceloop/openllmetry

Length of output: 151


vcrpy 8.x upgrade requires cassette regeneration.

This major version bump (6 → 8, skipping v7) changes the cassette format. vcrpy 8 uses the versioned cassette format and rejects legacy cassettes from v6/v7. The existing test cassettes must be regenerated by deleting them and re-running the tests, or by using the migration script (python -m vcr.migration packages/opentelemetry-instrumentation-bedrock/tests/). Ensure regenerated cassettes are scrubbed of any secrets or PII using VCR filters.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-bedrock/pyproject.toml` at line 41,
The dependency bump to "vcrpy>=8.0.0,<9" in pyproject.toml requires regenerating
test cassettes because vcrpy v8 rejects legacy v6/v7 cassettes; run the
migration tool (python -m vcr.migration) or delete and re-record cassettes in
the package opentelemetry-instrumentation-bedrock tests directory, then re-run
the test suite to regenerate them; after regeneration ensure you apply the
project's VCR filters/scrubbers to remove any secrets/PII from the new cassettes
before committing.

Comment on lines +91 to +92
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "langgraph-checkpoint>=4.0.0", "pip>=25.3"]


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

urllib3 2.6.3 CVE vulnerabilities security fixes

💡 Result:

Short answer: urllib3 2.6.3 patches a high‑severity decompression‑bomb bypass (CVE‑2026‑21441) that affected the streaming API when following HTTP redirects. The release note also includes a behavioral change for Retry‑After handling and other bug fixes — upgrade to 2.6.3. [1][2][3]

Sources

  • urllib3 2.6.3 release notes (GitHub). [1]
  • NVD / CVE‑2026‑21441 details. [2]
  • Fedora / advisory summary noting the same fix. [3]

🌐 Web query:

langgraph-checkpoint 4.0.0 CVE-2025-64104 CVE-2025-8709 SQLite injection

💡 Result:

Short answer: The two CVEs (CVE-2025-64104 and CVE-2025-8709) describe SQL‑injection vulnerabilities in LangGraph’s SQLite checkpoint/store implementation (improper string concatenation of filter operators). They affect the langgraph-checkpoint-sqlite package (known affected release: 2.0.10) and were fixed in 2.0.11 — so upgrade to >= 2.0.11 or apply the upstream patch. [1][2][3]

References:

  • NVD entry / summary for CVE-2025-8709. [1]
  • GitHub Advisory for the SQLite store SQL‑injection (patch and fixed release). [2]
  • OpenCVE / advisory summary for CVE-2025-64104 (notes fix in 2.0.11). [3]

🌐 Web query:

pip 25.3 CVE security vulnerabilities

💡 Result:

Short answer: pip 25.3 fixes a known tar-extraction symlink vulnerability tracked as CVE-2025-8869 (affects pip versions before 25.3). The flaw could allow a malicious sdist to overwrite files outside the extraction directory; upgrading to pip 25.3 (or using a Python that implements PEP 706) is the recommended remediation. [1][2][3]

Sources:

  • Advisory summary of the issue and fix. [1]
  • Ubuntu security notice describing CVE-2025-8869 and mitigations (upgrade pip or use Python with PEP 706). [2]
  • Debian security tracker showing the fix in pip 25.3+dfsg-1. [3]

🌐 Web query:

langgraph-checkpoint changelog 4.0.0 security

💡 Result:

Summary — langgraph-checkpoint v4.0.0 (security items)

  • v4.0.0 released on Jan 12, 2026; release notes show bug fixes (no new RCE patch text in the tag itself). [1]
  • A prior critical RCE in the JsonPlusSerializer (CVE‑2025‑64439) affected langgraph‑checkpoint < 3.0; upgrading to 3.0+ (and thus 4.0.0) removes unsafe deserialization of custom objects in the legacy "json" mode. [2][3]
  • Security scanners / databases report no known vulnerabilities in 4.0.0. [4][5]

Action: upgrade to langgraph‑checkpoint >= 3.0 (4.0.0 recommended) and, if you must deserialize old “json”-mode payloads, audit/whitelist allowed modules via the serde config. [2][5]

Sources:
[1] GitHub release: langchain-ai/langgraph — langgraph-checkpoint==4.0.0 (Jan 12, 2026).
[2] GitLab advisory / CVE‑2025‑64439 — JsonPlusSerializer RCE; fixed in 3.0+.
[3] LangGraph docs changelog noting requirement of langgraph-checkpoint >3.0 to avoid deserialization vuln.
[4] Safety DB / package page — latest version 4.0.0 with no known vulnerabilities.
[5] Snyk package page — no known security issues in 4.0.0.


Correct the langgraph-checkpoint CVE attribution and add missing SQLite checkpoint constraint.

The constraint-dependencies has an inaccurate CVE attribution:

  • urllib3>=2.6.3 correctly addresses CVE-2026-21441 (decompression-bomb bypass)
  • pip>=25.3 correctly addresses CVE-2025-8869 (tar-extraction symlink vulnerability)
  • langgraph-checkpoint>=4.0.0 does NOT fix CVE-2025-64104 or CVE-2025-8709 — those SQLite injection CVEs affect langgraph-checkpoint-sqlite 2.0.10 and are fixed in langgraph-checkpoint-sqlite 2.0.11. The langgraph-checkpoint 4.0.0 constraint instead fixes CVE-2025-64439 (JsonPlusSerializer RCE in versions < 3.0).

To address the SQLite injection CVEs, add langgraph-checkpoint-sqlite>=2.0.11 to the constraint-dependencies. Note that these constraints only apply when using the uv package manager; pip users will not benefit from them. Per coding guidelines, ensure package management commands use uv run <command>.
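A corrected constraint block, per the attribution above, might look like this (a sketch; the version floors are taken from the advisories cited in this comment):

```toml
[tool.uv]
constraint-dependencies = [
    "urllib3>=2.6.3",                       # CVE-2026-21441
    "pip>=25.3",                            # CVE-2025-8869
    "langgraph-checkpoint>=4.0.0",          # CVE-2025-64439 (JsonPlusSerializer RCE)
    "langgraph-checkpoint-sqlite>=2.0.11",  # CVE-2025-64104 / CVE-2025-8709
]
```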

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml` around lines
91 - 92, Update the constraint-dependencies entry: keep urllib3>=2.6.3 and
pip>=25.3 as-is, correct the CVE attribution for langgraph-checkpoint (it
addresses CVE-2025-64439) and add langgraph-checkpoint-sqlite>=2.0.11 to address
the SQLite injection CVEs (CVE-2025-64104 and CVE-2025-8709); modify the
constraint-dependencies list (the symbol to change is constraint-dependencies)
to include "langgraph-checkpoint-sqlite>=2.0.11" alongside the existing
"langgraph-checkpoint>=4.0.0", and ensure any documentation or commands
referencing package management use the "uv run <command>" pattern per
guidelines.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Python version requirements across all pyproject.toml files in the repo
echo "=== Python version requirements across packages ==="
fd -t f 'pyproject.toml' | xargs grep -l 'requires-python' | while read f; do
  echo "--- $f ---"
  grep 'requires-python' "$f"
done

Repository: traceloop/openllmetry

Length of output: 3391


🌐 Web query:

Python 3.9 end-of-life date official announcement

💡 Result:

Official end-of-life date for Python 3.9: October 31, 2025. [1][2]

Sources:

  • PEP 596 (Python 3.9 release schedule / lifespan). [1]
  • Python.org release notice for Python 3.9.25 (marks 3.9 reached EOL on 2025-10-31). [2]

Python 3.10 minimum version bump is appropriate.

Dropping Python 3.9 support is well-justified: Python 3.9 reached end-of-life on October 31, 2025, and this change aligns with the broader ecosystem update. However, note that the transition is still in progress—while most packages now require Python 3.10+, three packages still support 3.9:

  • opentelemetry-instrumentation-milvus
  • opentelemetry-instrumentation-qdrant
  • opentelemetry-semantic-conventions-ai

The change in openai-agents is consistent with the majority of the codebase.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml` at line
11, Update the minimum Python version in this package by setting requires-python
= ">=3.10,<4" in pyproject.toml (the requires-python entry) and ensure
consistency across the package: update any Python classifiers in pyproject.toml
or setup metadata, adjust CI job matrix entries that reference Python 3.9 to
start at 3.10, and verify that any runtime checks or packaging metadata (e.g.,
in tool.poetry or project.urls) do not still enumerate 3.9 support so the
package metadata and CI reflect the new >=3.10 requirement.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check packaging metadata and CI for Python version declarations
echo "=== Checking requires-python declarations ==="
rg -n "requires-python" --type toml --type cfg
echo -e "\n=== Checking python_requires declarations ==="
rg -n "python_requires" --type py --type cfg
echo -e "\n=== Checking Programming Language classifiers ==="
rg -n "Programming Language :: Python" --type toml --type cfg
echo -e "\n=== Checking python-version in GitHub workflows ==="
rg -n "python-version" .github/workflows/

Repository: traceloop/openllmetry

Length of output: 1158


🏁 Script executed:

#!/bin/bash
# Display the specific pinecone pyproject.toml file
echo "=== Pinecone package pyproject.toml ==="
cat -n packages/opentelemetry-instrumentation-pinecone/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 2479


🏁 Script executed:

#!/bin/bash
# Check if there are any runtime Python version checks in the pinecone instrumentation
echo "=== Searching for sys.version checks in pinecone package ==="
fd . packages/opentelemetry-instrumentation-pinecone -type f \( -name "*.py" \)
rg "sys.version" packages/opentelemetry-instrumentation-pinecone

Repository: traceloop/openllmetry

Length of output: 313


🏁 Script executed:

#!/bin/bash
# Get context around the CI workflow python-version configurations
echo "=== ci.yml context around python-version declarations ==="
sed -n '50,85p' .github/workflows/ci.yml
echo ""
sed -n '135,170p' .github/workflows/ci.yml

Repository: traceloop/openllmetry

Length of output: 2591


🏁 Script executed:

#!/bin/bash
# Check if classifiers exist in the pinecone pyproject.toml
echo "=== Check for classifiers in pinecone pyproject.toml ==="
grep -n "classifiers" packages/opentelemetry-instrumentation-pinecone/pyproject.toml
# If not found, check a few other packages
echo -e "\n=== Check other packages for requires-python and classifiers ==="
for pkg in packages/opentelemetry-instrumentation-*/pyproject.toml; do
  echo "--- $pkg ---"
  grep -n "requires-python\|classifiers" "$pkg" | head -5
done | head -50

Repository: traceloop/openllmetry

Length of output: 2780


🏁 Script executed:

#!/bin/bash
# Check release.yml for which package versions it releases
echo "=== release.yml workflow content (relevant sections) ==="
sed -n '1,50p' .github/workflows/release.yml
echo ""
echo "=== release.yml job configurations ==="
grep -n "python-version\|needs\|jobs:" .github/workflows/release.yml

Repository: traceloop/openllmetry

Length of output: 1641


Align release workflow with Python 3.10 minimum requirement.

Line 12 declares requires-python = ">=3.10,<4", and CI does test on 3.10 in the test matrix. However, .github/workflows/release.yml (lines 22, 63, 104, 137) uses only Python 3.11 across all release jobs. Consider adding Python 3.10 to the release workflow matrix to explicitly validate the declared minimum before publishing.

Additionally, no classifiers are present in the project's pyproject.toml files. Consider adding classifiers metadata (e.g., Programming Language :: Python :: 3.10, Programming Language :: Python :: 3.11) to clarify version support in package metadata.
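For example, a classifiers block matching the declared minimum could look like the following (illustrative; which upper versions to enumerate is the maintainers' call):

```toml
[project]
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
```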

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-pinecone/pyproject.toml` at line 12,
The release workflow currently tests only Python 3.11 while pyproject.toml
declares requires-python = ">=3.10,<4"; update .github/workflows/release.yml to
include Python 3.10 in the python-version matrix for all release jobs (the
Python matrix entries around the job definitions at lines referenced in the
review) so the declared minimum is validated, and add a classifiers section to
the project's pyproject.toml (add the classifiers key under the project
metadata) including at minimum "Programming Language :: Python :: 3.10" and
"Programming Language :: Python :: 3.11" to make supported versions explicit in
package metadata.

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

pytest-recording vcrpy 8.x compatibility breaking changes

💡 Result:

Short summary — what changed in VCR.py 8 that can break pytest-recording users

  • VCR.py 8 dropped support for Python 3.9 and also requires urllib3 >= 2 (i.e., dropped urllib3 < 2). If your environment uses Python 3.9 or older urllib3, upgrading to v8 will be incompatible. [1]

  • VCR.py 8 contains internal/behaviour changes (rewritten httpx support to patch httpcore, fixes to proxy/redirect handling, new options such as drop_unused_requests) that may require re-recording cassettes or updating code that relied on prior vcrpy internals. [1]

  • pytest-recording itself has recent releases and packaging updates tied to newer Python versions; confirm you have a pytest-recording release that declares compatibility with vcrpy 8 (or that your test environment meets vcrpy8 requirements). If you depend on an older pytest-recording pinned to older vcrpy versions, upgrading vcrpy to 8 may break things until pytest-recording explicitly supports v8. Check/upgrade pytest-recording to a release that supports vcrpy 8 or keep vcrpy <8. [2][3]

What to do

  • If you plan to upgrade to vcrpy 8: ensure your Python version >= the one v8 supports and urllib3 >= 2, update pytest-recording to a release that declares compatibility, and re-run/ re-record cassettes where behavior changed (httpx, proxy/redirects). [1][2]

Sources

  • vcrpy v8.0.0 release notes (breaking changes: drop Python 3.9, drop urllib3 <2; httpx/httpcore rework; new options). [1]
  • pytest-recording project / PyPI (project and releases; check package metadata for Python/version constraints). [2][3]

References
[1] vcrpy v8 release notes / changelog.
[2] pytest-recording GitHub / project README.
[3] pytest-recording PyPI release metadata.


🏁 Script executed:

# First, let's examine the pyproject.toml file and surrounding context
cat packages/opentelemetry-instrumentation-pinecone/pyproject.toml | head -60

Repository: traceloop/openllmetry

Length of output: 186


🏁 Script executed:

# Check if there are test files and cassettes in the pinecone instrumentation
find packages/opentelemetry-instrumentation-pinecone -type f -name "*.py" -path "*/tests/*" | head -20

Repository: traceloop/openllmetry

Length of output: 264


🏁 Script executed:

# Look for VCR cassette files
find packages/opentelemetry-instrumentation-pinecone -type f \( -name "*.yaml" -o -name "*.json" \) | grep -i cassette

Repository: traceloop/openllmetry

Length of output: 168


🏁 Script executed:

# Check the dependencies more carefully
grep -A 10 "dependencies\|pytest-recording\|vcrpy" packages/opentelemetry-instrumentation-pinecone/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 1040


🌐 Web query:

pytest-recording 0.13.1 vcrpy 8 compatibility support

💡 Result:

Short answer: pytest-recording 0.13.1 (released Dec 6, 2023) predates VCR.py v8 and does not explicitly guarantee compatibility with vcrpy 8 — upgrade to a later pytest-recording (0.13.2+ / recommended 0.13.4) for better chance of compatibility. [1][2][3]

Sources:

  • pytest-recording release listing (0.13.1 date). [1]
  • pytest-recording project page / README (integration with VCR.py; no explicit vcrpy‑8 guarantee on older releases). [2]
  • VCR.py (vcrpy) documentation (v8 series). [3]

References:
[1] piwheels release list for pytest-recording (0.13.1 — 2023-12-06).
[2] pytest-recording project / PyPI pages.
[3] vcrpy documentation (v8).


🏁 Script executed:

# Check conftest.py to see VCR configuration
cat packages/opentelemetry-instrumentation-pinecone/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 1893


🏁 Script executed:

# Examine the cassette file - check its format and timestamp if available
head -50 packages/opentelemetry-instrumentation-pinecone/tests/cassettes/test_query/test_pinecone_retrieval.yaml

Repository: traceloop/openllmetry

Length of output: 1605


🏁 Script executed:

# Check git history of cassette to see if it was recently updated
cd packages/opentelemetry-instrumentation-pinecone && git log --oneline -n 10 tests/cassettes/test_query/test_pinecone_retrieval.yaml 2>/dev/null || echo "Git history not available"

Repository: traceloop/openllmetry

Length of output: 93


🏁 Script executed:

# Examine test_query.py to see how VCR is used
cat packages/opentelemetry-instrumentation-pinecone/tests/test_query.py

Repository: traceloop/openllmetry

Length of output: 7137


Upgrade pytest-recording to support vcrpy 8 and re-record cassettes.

The pinned range pytest-recording>=0.13.1,<0.14.0 resolves to a December 2023 release that predates vcrpy 8 and does not declare compatibility with it; upgrade to pytest-recording>=0.13.2 (preferably 0.13.4+), which does. Additionally, vcrpy 8 introduced breaking changes (httpx/httpcore rewrite, proxy/redirect handling fixes) that require cassettes to be re-recorded. Verify that test_pinecone_retrieval.yaml has been re-recorded with vcrpy 8, and confirm the cassettes contain no secrets (the filtered authorization header and other sensitive data should be properly scrubbed).

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-pinecone/pyproject.toml` at line 44,
Update the pytest-recording constraint in pyproject.toml to a version that
declares vcrpy 8 support (e.g., change pytest-recording to ">=0.13.2" or
preferably ">=0.13.4"), then re-record the VCR cassettes used by the Pinecone
tests (notably test_pinecone_retrieval.yaml) using vcrpy 8 to accommodate
httpx/httpcore and proxy/redirect behavior changes; after re-recording, verify
cassettes have sensitive data scrubbed (ensure filtered Authorization header and
other secrets are properly filtered) before committing.

"pytest>=8.2.2,<9",
"together>=1.2.0,<2",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, examine the pyproject.toml file to understand the change and context
head -n 50 packages/opentelemetry-instrumentation-together/pyproject.toml | tail -n 15

Repository: traceloop/openllmetry

Length of output: 384


🏁 Script executed:

# Search for any test configuration or cassette-related setup
fd cassettes packages/opentelemetry-instrumentation-together/tests

Repository: traceloop/openllmetry

Length of output: 129


🏁 Script executed:

# Look for vcrpy usage in test files
rg "vcr\." packages/opentelemetry-instrumentation-together/tests -l

Repository: traceloop/openllmetry

Length of output: 47


🌐 Web query:

vcrpy 8.0.0 breaking changes between 6.x and 8.x release notes

💡 Result:

Summary of breaking changes you need to know (from v6.x → v8.0.0):

  • v6.0.0 (notable BREAKING items):

    • Fix to httpx support that may require re-recording cassettes because a binary format changed. [1]
    • Drop support for boto (vcrpy still supports boto3). [1]
    • Drop simplejson dependency. [1]
  • v8.0.0 (notable BREAKING items):

    • Drop support for Python 3.9. [2]
    • Drop support for urllib3 < 2 (moves to urllib3 v2+ to address CVE/warning issues). [2]

Sources:


🌐 Web query:

vcrpy library pypi version history changelog

💡 Result:

You can find vcrpy's PyPI release history and files on its PyPI page, and its detailed changelog on the project docs/GitHub:

  • PyPI release history (all released versions, upload dates, and downloadable files). [1]
  • Official changelog (per-release notes) in the vcrpy documentation on Read the Docs. [2]
  • Alternate aggregated changelog/version list (LibHunt). [3]

Links:
[1] PyPI project page — vcrpy (release history & files).
[2] vcrpy Changelog — Read the Docs (per-release notes).
[3] VCR.py changelog overview — LibHunt.


🏁 Script executed:

# Check Python version requirement in pyproject.toml
rg "requires-python" packages/opentelemetry-instrumentation-together/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Check urllib3 version requirement
rg "urllib3" packages/opentelemetry-instrumentation-together/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Check if there's pytest-recording configuration or cassette setup
head -n 100 packages/opentelemetry-instrumentation-together/tests/conftest.py 2>/dev/null || echo "No conftest.py found"

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Look for any cassette files to see if they might be stale
find packages/opentelemetry-instrumentation-together/tests/cassettes -name "*.yaml" -o -name "*.json" | head -10

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

grep "requires-python" packages/opentelemetry-instrumentation-together/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 94


🏁 Script executed:

cat packages/opentelemetry-instrumentation-together/pyproject.toml | grep -A 20 "dependencies"

Repository: traceloop/openllmetry

Length of output: 973


🏁 Script executed:

ls -la packages/opentelemetry-instrumentation-together/tests/cassettes/ | head -15

Repository: traceloop/openllmetry

Length of output: 295


Verify vcrpy 8.0.0 cassettes are re-recorded and clean.

The upgrade dependencies are compatible: Python 3.10+ is required (satisfies vcrpy 8.0.0's drop of Python 3.9), and urllib3 >=2.6.3 is already specified (meets vcrpy's urllib3 >= 2 requirement). Since the tests go through the pytest-recording wrapper, direct API impact should be minimal.

Re-record VCR cassettes in tests/cassettes/test_chat/ and tests/cassettes/test_completion/ to ensure compatibility, and ensure all cassettes are scrubbed of secrets or PII using pytest-recording's filtering mechanisms.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-together/pyproject.toml` at line 43,
Upgrade to vcrpy 8.0.0 requires re-recording and scrubbing test cassettes;
re-run the tests that use pytest-recording to re-record the cassettes under
tests/cassettes/test_chat/ and tests/cassettes/test_completion/ (ensure
record_mode is set to 'all' or appropriate in your pytest-recording/conftest.py
fixture), then sanitize those new cassettes using pytest-recording filters (or
the VCR.py before_record/after_record hooks configured in conftest.py) to remove
keys, tokens, PII and any headers (e.g., Authorization, Cookie, API keys) before
committing; verify playback succeeds with urllib3>=2 and Python 3.10+.

Comment on lines +75 to +76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0", "pip>=25.3"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check workspace structure and all [tool.uv] sections
echo "=== Files matching pyproject.toml in repo ==="
fd 'pyproject.toml' --type f

echo -e "\n=== All [tool.uv] sections ==="
rg -B2 '\[tool\.uv\]' --glob 'pyproject.toml' -A3

echo -e "\n=== All constraint-dependencies occurrences ==="
rg 'constraint-dependencies' --glob 'pyproject.toml' -B1 -A1

Repository: traceloop/openllmetry

Length of output: 186


🌐 Web query:

uv constraint-dependencies workspace root scope behavior

💡 Result:

Short answer: uv only reads constraint-related settings from the workspace root pyproject.toml. Specifically, constraint-dependencies (and build-constraint-dependencies, override-dependencies, and exclude-dependencies) set at the workspace root are applied by uv commands like uv lock / uv sync / uv run; uv ignores those same keys when declared in non-root workspace members or in uv.toml files. See the docs for the exact settings and the same note in the locking docs. [1][2]

Sources

  • uv reference — settings (constraint-dependencies / build-constraint-dependencies / override-dependencies notes). [1]
  • uv locking docs (how constraint files are used and the note that uv appends constraint-dependencies from the workspace root). [2]

🌐 Web query:

uv package manager pyproject.toml constraint-dependencies non-root package

💡 Result:

Short answer: uv only reads constraint-dependencies from the workspace root. To apply constraints to a non-root (workspace member) package, put the constraint-dependencies in the root pyproject.toml (or use a constraints file passed to uv pip compile). uv will ignore constraint-dependencies declared inside other workspace members or uv.toml files. [1][2]

Example (root pyproject.toml):
[tool.uv]
constraint-dependencies = ["somepkg<2.0"] # applies to resolution for workspace members

If you need build-time constraints, use build-constraint-dependencies at the workspace root or a build constraints file. [1][3]

Sources:

  • uv settings (constraint-dependencies / build-constraint-dependencies). [1]
  • uv virtual/workspace dependency behavior and using sources/constraints. [2]
  • troubleshooting / build-constraint-dependencies example. [3]

🌐 Web query:

uv monorepo constraint-dependencies inheritance or propagation

💡 Result:

Short answer: uv's constraint-dependencies do not inherit/propagate from workspace members — uv (uv lock / uv sync / uv run) reads constraint-dependencies (and build-constraint-dependencies) only from the pyproject.toml at the workspace root and ignores declarations in other workspace members or uv.toml files. Use the root pyproject.toml to declare constraints for the whole monorepo. [1][2]

References:

  • uv settings docs (constraint-dependencies / build-constraint-dependencies note). [1]
  • uv resolution docs (how constraints are applied during resolution). [2]

[1] https://docs.astral.sh/uv/reference/settings/
[2] https://docs.astral.sh/uv/concepts/resolution/


Move constraint-dependencies to the workspace root pyproject.toml.

uv only reads constraint-dependencies from the workspace root; constraints declared in non-root workspace members (like this package) are silently ignored. Move constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0", "pip>=25.3"] to the [tool.uv] section in the root pyproject.toml so these constraints actually take effect during dependency resolution.
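Concretely, the root pyproject.toml would carry the block instead (a sketch; merge with whatever constraints the root already declares):

```toml
# Root pyproject.toml: uv only honors constraints declared here,
# not in workspace members.
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0", "pip>=25.3"]
```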

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-writer/pyproject.toml` around lines 75
- 76, The constraint-dependencies entry in this package's [tool.uv] section is
ineffective because uv only reads constraints from the workspace root; move the
line constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0",
"pip>=25.3"] out of packages/opentelemetry-instrumentation-writer/pyproject.toml
and add it to the root pyproject.toml under its [tool.uv] section so the uv
resolver will honor these constraints.

@ellipsis-dev
Contributor

ellipsis-dev bot commented Jan 18, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@galkleinman galkleinman merged commit adba2bf into main Jan 18, 2026
12 of 14 checks passed