
Conversation


@galkleinman galkleinman commented Jan 15, 2026

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

  • Breaking Changes

    • Minimum Python requirement raised to >=3.10 across packages (>=3.12 for LlamaIndex instrumentation).
  • New Features

    • Spans now include tool-invocation details.
    • More robust LLM metadata extraction (model and token usage) across formats.
  • Updates

    • Dependency ecosystem bumped (LlamaIndex, LangChain and related).
    • Expanded/modernized test cassettes; many tests converted to async and updated model defaults.



coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Raised per-package Python minimums (mostly to 3.10; one llamaindex toolchain to 3.12), updated many pyproject test/dependency ranges and added [tool.uv] constraint blocks, migrated LangChain imports, converted llamaindex tests to async/updated APIs, refreshed many VCR cassettes, and enhanced span handling and LLM response parsing.

Changes

Cohort / File(s) / Summary

• Python & tooling bumps
  Files: packages/*/.python-version, packages/*/pyproject.toml, packages/opentelemetry-semantic-conventions-ai/.python-version
  Bump .python-version pins (mostly 3.9.5 → 3.10; one llamaindex → 3.12); raise requires-python in many pyproject.toml files; add [tool.uv] constraint-dependencies; bump test tooling (notably vcrpy → >=8.0.0).

• LlamaIndex instrumentation core
  Files: packages/opentelemetry-instrumentation-llamaindex/.../dispatcher_wrapper.py, .../span_utils.py
  Added workflow tool-call handling to span creation (extracting tool.name and tool.arguments for WorkflowToolCall) and reworked set_llm_chat_response_model_attributes to robustly extract model and token usage across multiple response formats and fallback shapes.

• LlamaIndex tests & fixtures
  Files: packages/opentelemetry-instrumentation-llamaindex/tests/*.py, packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/...
  Converted tests to async (@pytest.mark.asyncio), updated ReActAgent/LLM/Cohere signatures and models, relaxed span assertions; massively expanded/refreshed VCR cassettes (tool-call permutations, upload/poll/result flows, HTTPX UA/version changes).

• LangChain tests & cassettes
  Files: packages/opentelemetry-instrumentation-langchain/tests/*.py, packages/opentelemetry-instrumentation-langchain/tests/cassettes/...
  Repointed imports to langchain_core/langchain_classic/langchain_community; updated model strings; made assertions log/field-based; updated many cassettes to Cohere v2 SSE/event-stream formats and changed request/response formats.

• Sample app updates
  Files: packages/sample-app/pyproject.toml, packages/sample-app/sample_app/...
  Upgraded llama-index/langchain ecosystem deps, adjusted the dependencies list, and fixed the import path for HuggingFaceInferenceAPI → llama_index.llms.huggingface_api.

• Widespread cassette refreshes
  Files: packages/*/tests/cassettes/...
  Large number of fixture updates: header/timing/trace metadata, gzip payload substitutions, SSE/event-stream transitions, and client UA/version changes — primarily recorded test data modifications.

• Misc imports & test adjustments
  Files: packages/*/tests/*.py, packages/sample-app/sample_app/*.py
  Multiple import-path replacements (langchain → langchain_core/langchain_classic/langchain_community), minor test renames/refocusing (e.g., llamaparse parse/aparse presence), and relaxed assertion shapes.

• Individual package pyproject additions
  Files: packages/opentelemetry-instrumentation-*/pyproject.toml (many)
  Added small per-package pyproject.toml entries (mostly [tool.uv]) and per-package minor dependency/test-version bumps reflecting the global toolchain change.
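The "robust extraction" called out for span_utils.py above can be sketched roughly as follows. The field names and fallback shapes here are assumptions drawn from common LLM client response formats, not the actual span_utils.py implementation:

```python
# Hedged sketch of extracting model name and token usage across several
# response shapes: a dict-style `raw` payload, an attribute-style raw
# object, and usage stored under either `usage` or `usage_metadata`.
# All key names are assumptions for illustration.
def extract_model_and_usage(response):
    raw = getattr(response, "raw", None) or {}
    if not isinstance(raw, dict):
        # Attribute-style raw object: fall back to its __dict__.
        raw = getattr(raw, "__dict__", None) or {}

    model = raw.get("model") or getattr(response, "model", None)

    usage = raw.get("usage") or raw.get("usage_metadata") or {}
    if not isinstance(usage, dict):
        usage = getattr(usage, "__dict__", None) or {}

    # OpenAI-style and Anthropic-style key names as fallbacks.
    prompt = usage.get("prompt_tokens") or usage.get("input_tokens")
    completion = usage.get("completion_tokens") or usage.get("output_tokens")
    total = usage.get("total_tokens")
    if total is None and prompt is not None and completion is not None:
        total = prompt + completion
    return model, prompt, completion, total
```

Each extracted value would then be set as a span attribute only when non-None, so partial responses still record whatever metadata they carry.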

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through pins of Python bright,
Tests learned to await by candlelight,
Cassettes now stream with eventful song,
Spans hum tools and tokens all day long,
I leave a carrot for CI tonight. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 20.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly describes the main change (updating dependencies to remediate CVEs); it is concise, specific, and accurately reflects the primary objective of the changeset.


✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@galkleinman galkleinman marked this pull request as ready for review January 15, 2026 16:46

@ellipsis-dev ellipsis-dev bot left a comment


Skipped PR review on f4ce84c because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_structured_llm/test_structured_llm_achat_model_attributes.yaml (1)

17-115: Filter sensitive headers and project identifiers in cassette recordings.

The cassette exposes openai-project (proj_tzz1TbPPOXaf6j9tEkVUBIAa) and openai-organization (traceloop), which are account/organization identifiers that should not be persisted in test fixtures. Additionally, volatile headers like Date, CF-RAY, Set-Cookie, and traceparent will cause unnecessary cassette diffs on re-recording.

Extend the filter_headers in packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py to include these headers. Consider also filtering other CloudFlare and tracing headers that don't affect test assertions.
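A minimal sketch of the extended fixture body, following the suggestion above. The pre-existing entries ("authorization", "api-key") and the ignore_hosts value are assumptions to be merged with the real conftest.py; note that vcrpy's filter_headers applies to request headers, so response-side identifiers like openai-project may additionally need a before_record_response hook:

```python
# Sketch of the vcr_config fixture body for
# packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py.
# Decorate with @pytest.fixture(scope="module") as in the existing file;
# shown undecorated here for clarity.
def vcr_config():
    return {
        "filter_headers": [
            # existing entries (assumed)
            "authorization",
            "api-key",
            # account/organization identifiers
            "openai-project",
            "openai-organization",
            "x-session-id",
            "x-correlation-id",
            # volatile headers that churn on every re-recording
            "Date",
            "CF-RAY",
            "Set-Cookie",
            "traceparent",
        ],
        # placeholder; keep whatever the current fixture returns here
        "ignore_hosts": ["localhost"],
    }
```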

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-llamaindex/pyproject.toml`:
- Around line 38-44: Bump validation: run the llama-parse/llama-index
compatibility tests and adjust wrapper code if runtime failures occur;
specifically run the test suite referenced in
packages/opentelemetry-instrumentation-llamaindex/tests/test_llamaparse.py with
uv run pytest to reproduce issues, check any wrapper functions around
llama-index classes/methods (e.g., your instrumentation wrappers that call Parse
API or Workflow/checkpointer-related shims) for changes in llama-parse 0.6+
(JSON config payload) and llama-index 0.14+ (removed
checkpointer/sub-workflows/deprecated Workflow methods), and update the wrapper
logic to use the new Parse JSON payload shape or remove references to removed
Workflow/checkpointer APIs so tests pass under the new dependency versions.

In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml`:
- Around line 677-900: The VCR cassette includes stale requests for models
"command-r" and "command-r-plus" that returned 404s; re-record the cassette
(packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml)
using the current test harness so it only captures the actual calls made by the
test (which instantiate Cohere(model="command-a-03-2025") in test_agents.py) —
run the test suite or the specific test that uses the cassette, delete or
replace the old cassette and commit the newly recorded YAML so the cassette
matches the current codepath and no longer contains the obsolete fallback
requests.

In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml`:
- Around line 358-359: The cassette contains a recorded failed interaction for
the deprecated model "gpt-3.5-turbo-0613" with "stream": true that returns a
404; locate the request/response block that mentions model: "gpt-3.5-turbo-0613"
and response code: 404 / message: "Not Found" and either remove that entire
failed interaction from the YAML or re-record the cassette so the interaction is
replaced by a successful call (preferably using a supported model and without
streaming if not intended); ensure the cleaned cassette no longer contains the
404 response or the deprecated model entry.

In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml`:
- Around line 797-1545: The VCR cassette contains sensitive x-session-id and
x-correlation-id response headers that must be scrubbed; update the vcr_config()
function (in conftest.py) so its returned "filter_headers" list includes
"x-session-id" and "x-correlation-id" in addition to existing entries like
"authorization" and "api-key", preserving the other keys (e.g., "ignore_hosts")
in the returned config.

In `@packages/sample-app/pyproject.toml`:
- Line 17: Remove the unused dependency "llama-index-llms-huggingface" from
pyproject.toml (it is not imported or used anywhere); keep the necessary
"llama-index-llms-huggingface-api" (used for HuggingFaceInferenceAPI) and
"llama-index-embeddings-huggingface" (used for HuggingFaceEmbedding), so delete
the "llama-index-llms-huggingface" entry to reduce dependency bloat and attack
surface.
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_aload_data_instrumentation.yaml (1)

797-1545: Confirm the async test’s polling/request payload matches the new cassette.

This cassette now reflects the updated request body and headers. If the async test still sends the old payload or polls with different headers, VCR will fail to match the recorded interactions on replay.

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml (1)

473-478: Consider scrubbing OpenAI project identifiers from cassette.

The cassette contains OpenAI project identifiers (e.g., proj_tzz1TbPPOXaf6j9tEkVUBIAa) in the response headers. While not a secret, these identifiers could be considered organizational PII. Based on learnings, VCR cassettes should avoid containing PII. Consider using VCR's filter_headers or before_record_response to scrub these values.
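Since openai-project and openai-organization are response headers, a before_record_response hook is the mechanism that reliably scrubs them. A hedged sketch, relying on vcrpy's documented hook interface (the response is a dict whose "headers" map names to lists of values):

```python
# Sketch of a before_record_response hook that redacts OpenAI account
# identifiers before the interaction is written to a cassette. The set
# of header names is an assumption; extend as needed.
SCRUBBED_RESPONSE_HEADERS = {"openai-project", "openai-organization"}

def scrub_response_headers(response):
    headers = response.get("headers", {})
    for name in list(headers):
        if name.lower() in SCRUBBED_RESPONSE_HEADERS:
            # vcrpy stores header values as lists of strings
            headers[name] = ["REDACTED"]
    return response

# Wire it up in the vcr_config fixture, e.g.:
#   return {"before_record_response": scrub_response_headers, ...}
```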

packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (1)

196-202: Consider strengthening tool span assertions.

The conditional check if "tool.name" in tool_span.attributes makes the assertion optional. If call_tool.task spans should always have a tool.name attribute, consider making this assertion unconditional. Otherwise, if this is intentionally flexible for different code paths, the current approach is acceptable.
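If the attribute is meant to be mandatory, the unconditional form could look like the sketch below. The span name and attribute keys mirror the comment above; the plain-dict span stand-in is for illustration only:

```python
# Sketch of an unconditional tool-span assertion: every call_tool.task
# span must carry a non-empty tool.name. Intended as a helper inside
# test_agents.py; the real tests would pass span.name and
# span.attributes from the exported ReadableSpans.
def assert_tool_attributes(span_name, attributes):
    if span_name != "call_tool.task":
        return  # only tool-call spans are checked
    assert "tool.name" in attributes, f"{span_name} span missing tool.name"
    assert attributes["tool.name"], "tool.name must be non-empty"
```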

Comment on lines +677 to +900
- request:
body: '{"message":"Which city has the highest population and how many years will
it take to reach 20 million inhabitants if it''s population increases by 1 million
a year?","model":"command-r","chat_history":[{"role":"System","message":"You
are designed to help with a variety of tasks, from answering questions to providing
summaries to other types of analyses.\n\n## Tools\n\nYou have access to a wide
variety of tools. You are responsible for using the tools in any sequence you
deem appropriate to complete the task at hand.\nThis may require breaking the
task into subtasks and using different tools to complete each subtask.\n\nYou
have access to the following tools:\n> Tool Name: calc_tool\nTool Description:
Useful for calculating the number of years until a city reaches a target population.\nTool
Args: {\"properties\": {\"target_population\": {\"title\": \"Target Population\",
\"type\": \"integer\"}, \"current_population\": {\"title\": \"Current Population\",
\"type\": \"integer\"}, \"yearly_increase\": {\"title\": \"Yearly Increase\",
\"type\": \"integer\"}}, \"required\": [\"target_population\", \"current_population\",
\"yearly_increase\"], \"type\": \"object\"}\n\n> Tool Name: sql_tool\nTool Description:
Useful for translating a natural language query into a SQL query over a table
which contains the names of cities, together with their population and country\nTool
Args: {\"properties\": {\"input\": {\"title\": \"Input\", \"type\": \"string\"}},
\"required\": [\"input\"], \"type\": \"object\"}\n\n\n\n## Output Format\n\nPlease
answer in the same language as the question and use the following format:\n\n```\nThought:
The current language of the user is: (user''s language). I need to use a tool
to help me answer the question.\nAction: tool name (one of calc_tool, sql_tool)
if using a tool.\nAction Input: the input to the tool, in a JSON format representing
the kwargs (e.g. {\"input\": \"hello world\", \"num_beams\": 5})\n```\n\nPlease
ALWAYS start with a Thought.\n\nNEVER surround your response with markdown code
markers. You may use code markers within your response if you need to.\n\nPlease
use a valid JSON format for the Action Input. Do NOT do this {''input'': ''hello
world'', ''num_beams'': 5}. If you include the \"Action:\" line, then you MUST
include the \"Action Input:\" line too, even if the tool does not need kwargs,
in that case you MUST use \"Action Input: {}\".\n\nIf this format is used, the
tool will respond in the following format:\n\n```\nObservation: tool response\n```\n\nYou
should keep repeating the above format till you have enough information to answer
the question without using any more tools. At that point, you MUST respond in
one of the following two formats:\n\n```\nThought: I can answer without using
any more tools. I''ll use the user''s language to answer\nAnswer: [your answer
here (In the same language as the user''s question)]\n```\n\n```\nThought: I
cannot answer the question with the provided tools.\nAnswer: [your answer here
(In the same language as the user''s question)]\n```\n\n## Current Conversation\n\nBelow
is the current conversation consisting of interleaving human and assistant messages.\n"}],"stream":false}'
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '3186'
content-type:
- application/json
host:
- api.cohere.com
user-agent:
- cohere/5.20.1
x-client-name:
- llama_index
x-fern-language:
- Python
x-fern-sdk-name:
- cohere
x-fern-sdk-version:
- 5.20.1
method: POST
uri: https://api.cohere.com/v1/chat
response:
body:
string: !!binary |
H4sIAAAAAAAA/yzMTWrDMBBH8av8mS6yifwhW5j6GjmBohk3hkgTNEpLKb17wc3y8eD3QzvTSsvk
R55ScMP74t28DMHFmWc3XceQ2PPmvdCZspjFD6GVsrLccUqacyzs6glf0VAl66cwtOAijyb5KhVj
OMMPPnS4iODW2sPWvmdN1iW9SZUuaT66P1R7e6nYtCLivluDbvif+NYnUix4mmAv1iRyR79/AQAA
//9hA1FqyAAAAA==
headers:
Alt-Svc:
- h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding:
- chunked
Via:
- 1.1 google
access-control-expose-headers:
- X-Debug-Trace-ID
cache-control:
- no-cache, no-store, no-transform, must-revalidate, private, max-age=0
content-encoding:
- gzip
content-type:
- application/json
date:
- Sat, 17 Jan 2026 11:21:18 GMT
expires:
- Thu, 01 Jan 1970 00:00:00 GMT
pragma:
- no-cache
server:
- envoy
vary:
- Origin,Accept-Encoding
x-accel-expires:
- '0'
x-debug-trace-id:
- 6ce2557b14e869b1ed6d37ed9fd80b38
x-endpoint-monthly-call-limit:
- '1000'
x-envoy-upstream-service-time:
- '15'
x-trial-endpoint-call-limit:
- '20'
x-trial-endpoint-call-remaining:
- '19'
status:
code: 404
message: Not Found
- request:
body: '{"message":"Which city has the highest population and how many years will
it take to reach 20 million inhabitants if it''s population increases by 1 million
a year?","model":"command-r-plus","chat_history":[{"role":"System","message":"You
are designed to help with a variety of tasks, from answering questions to providing
summaries to other types of analyses.\n\n## Tools\n\nYou have access to a wide
variety of tools. You are responsible for using the tools in any sequence you
deem appropriate to complete the task at hand.\nThis may require breaking the
task into subtasks and using different tools to complete each subtask.\n\nYou
have access to the following tools:\n> Tool Name: calc_tool\nTool Description:
Useful for calculating the number of years until a city reaches a target population.\nTool
Args: {\"properties\": {\"target_population\": {\"title\": \"Target Population\",
\"type\": \"integer\"}, \"current_population\": {\"title\": \"Current Population\",
\"type\": \"integer\"}, \"yearly_increase\": {\"title\": \"Yearly Increase\",
\"type\": \"integer\"}}, \"required\": [\"target_population\", \"current_population\",
\"yearly_increase\"], \"type\": \"object\"}\n\n> Tool Name: sql_tool\nTool Description:
Useful for translating a natural language query into a SQL query over a table
which contains the names of cities, together with their population and country\nTool
Args: {\"properties\": {\"input\": {\"title\": \"Input\", \"type\": \"string\"}},
\"required\": [\"input\"], \"type\": \"object\"}\n\n\n\n## Output Format\n\nPlease
answer in the same language as the question and use the following format:\n\n```\nThought:
The current language of the user is: (user''s language). I need to use a tool
to help me answer the question.\nAction: tool name (one of calc_tool, sql_tool)
if using a tool.\nAction Input: the input to the tool, in a JSON format representing
the kwargs (e.g. {\"input\": \"hello world\", \"num_beams\": 5})\n```\n\nPlease
ALWAYS start with a Thought.\n\nNEVER surround your response with markdown code
markers. You may use code markers within your response if you need to.\n\nPlease
use a valid JSON format for the Action Input. Do NOT do this {''input'': ''hello
world'', ''num_beams'': 5}. If you include the \"Action:\" line, then you MUST
include the \"Action Input:\" line too, even if the tool does not need kwargs,
in that case you MUST use \"Action Input: {}\".\n\nIf this format is used, the
tool will respond in the following format:\n\n```\nObservation: tool response\n```\n\nYou
should keep repeating the above format till you have enough information to answer
the question without using any more tools. At that point, you MUST respond in
one of the following two formats:\n\n```\nThought: I can answer without using
any more tools. I''ll use the user''s language to answer\nAnswer: [your answer
here (In the same language as the user''s question)]\n```\n\n```\nThought: I
cannot answer the question with the provided tools.\nAnswer: [your answer here
(In the same language as the user''s question)]\n```\n\n## Current Conversation\n\nBelow
is the current conversation consisting of interleaving human and assistant messages.\n"}],"stream":false}'
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '3191'
content-type:
- application/json
host:
- api.cohere.com
user-agent:
- cohere/5.20.1
x-client-name:
- llama_index
x-fern-language:
- Python
x-fern-sdk-name:
- cohere
x-fern-sdk-version:
- 5.20.1
method: POST
uri: https://api.cohere.com/v1/chat
response:
body:
string: !!binary |
H4sIAAAAAAAA/yzMQW6DMBCF4as8uYtsYig0VIFr5ASD/dwgYQZ5TKuq6t0rpVn++qXvxy3RTS6J
XLs4iu/T++gvabz4q5Be3uaefZjj0AV3dplm8kE3uayRK05Bc5Yt+uL39bATvsRQmPWTEbrhxr0y
zyzohjP6135ocCNxr3W3qW2jBmuC3lnYBM2Pbh+0vTxpJC0QrItVaML/xLceCLLhMGLZrFJi437/
AgAA///c5RiNzQAAAA==
headers:
Alt-Svc:
- h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding:
- chunked
Via:
- 1.1 google
access-control-expose-headers:
- X-Debug-Trace-ID
cache-control:
- no-cache, no-store, no-transform, must-revalidate, private, max-age=0
content-encoding:
- gzip
content-type:
- application/json
date:
- Sat, 17 Jan 2026 11:22:27 GMT
expires:
- Thu, 01 Jan 1970 00:00:00 GMT
pragma:
- no-cache
server:
- envoy
vary:
- Origin,Accept-Encoding
x-accel-expires:
- '0'
x-debug-trace-id:
- e75eb1b282207c5d4eda1fd4e27b567c
x-endpoint-monthly-call-limit:
- '1000'
x-envoy-upstream-service-time:
- '12'
x-trial-endpoint-call-limit:
- '20'
x-trial-endpoint-call-remaining:
- '19'
status:
code: 404
message: Not Found

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the test file that uses this cassette and examine its structure
find . -name "test_agent_with_multiple_tools.py" -o -name "*test_agents*" | head -20

Repository: traceloop/openllmetry

Length of output: 1083


🏁 Script executed:

#!/bin/bash
# Search for fallback logic, error handling, and model selection in the llamaindex instrumentation
rg -n "command-r-plus|command-r|command-a-03-2025" packages/opentelemetry-instrumentation-llamaindex/tests --type py -B2 -A2

Repository: traceloop/openllmetry

Length of output: 597


🏁 Script executed:

#!/bin/bash
# Check for VCR configuration and cassette recording modes
rg -n "vcr|cassette|record_mode|before_record" packages/opentelemetry-instrumentation-llamaindex --type py -B1 -A1

Repository: traceloop/openllmetry

Length of output: 5360


🏁 Script executed:

#!/bin/bash
# Read the test_agents.py file to find test_agent_with_multiple_tools and understand the test structure
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py | head -220

Repository: traceloop/openllmetry

Length of output: 9822


🏁 Script executed:

#!/bin/bash
# Get the full vcr_config from conftest.py
sed -n '122,150p' packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 704


🏁 Script executed:

#!/bin/bash
# Search for error handling or fallback logic related to 404s or model retries
rg -n "except|try|404|retry|fallback" packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py -B2 -A2

Repository: traceloop/openllmetry

Length of output: 1194


🏁 Script executed:

#!/bin/bash
# Check if there's any model fallback logic in the cohere integration or conftest
rg -n "command-r|fallback|default.*model" packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py packages/opentelemetry-instrumentation-llamaindex/src/ --type py

Repository: traceloop/openllmetry

Length of output: 159


🏁 Script executed:

#!/bin/bash
# Check the cassette file itself - look at the actual requests to confirm the models being called
head -100 packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | rg -n "command-r|command-a-03|uri:|model"

Repository: traceloop/openllmetry

Length of output: 194


🏁 Script executed:

#!/bin/bash
# Check if there's a conftest fixture that modifies the LLM or its behavior
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py | head -130

Repository: traceloop/openllmetry

Length of output: 5051


🏁 Script executed:

#!/bin/bash
# Search in the instrumentation directory structure
find packages/opentelemetry-instrumentation-llamaindex -type f -name "*.py" | head -20

Repository: traceloop/openllmetry

Length of output: 2214


🏁 Script executed:

#!/bin/bash
# Look for any cohere-related instrumentation that might have fallback logic
rg -n "command-r|model.*fallback|default.*model" packages/opentelemetry-instrumentation-llamaindex --type py -B2 -A2

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check the cassette timestamps to understand when it was recorded
head -20 packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml

Repository: traceloop/openllmetry

Length of output: 1662


🏁 Script executed:

#!/bin/bash
# Check the full cassette structure to see how many models are attempted
rg -n '"model":' packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | head -20

Repository: traceloop/openllmetry

Length of output: 1243


🏁 Script executed:

#!/bin/bash
# Look for any pytest-vcr or vcr-related configuration that might auto-record or handle retries
rg -n "vcr|cassette|record_mode|match_on" packages/opentelemetry-instrumentation-llamaindex/tests --type py

Repository: traceloop/openllmetry

Length of output: 1477


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation or comments in the cassette about why multiple models are present
head -50 packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | cat -v

Repository: traceloop/openllmetry

Length of output: 3580


Re-record the VCR cassette to match current test code.

The cassette contains requests for command-r and command-r-plus models that respond with 404 errors, followed by successful requests using command-a-03-2025. However, the current test code directly specifies only Cohere(model="command-a-03-2025") at line 161 of test_agents.py. The 404 responses appear to be artifacts from an older version of the Cohere or llama-index library that attempted model fallback. Re-record the cassette using the current code to ensure it reflects actual API behavior and prevent masking potential regressions in the success path.

🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml`
around lines 677 - 900, The VCR cassette includes stale requests for models
"command-r" and "command-r-plus" that returned 404s; re-record the cassette
(packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml)
using the current test harness so it only captures the actual calls made by the
test (which instantiate Cohere(model="command-a-03-2025") in test_agents.py) —
run the test suite or the specific test that uses the cassette, delete or
replace the old cassette and commit the newly recorded YAML so the cassette
matches the current codepath and no longer contains the obsolete fallback
requests.

Comment on lines +358 to +359
code: 404
message: Not Found

⚠️ Potential issue | 🟡 Minor

Cassette contains a 404 failure response.

The interaction at lines 244-359 records a request to the deprecated model gpt-3.5-turbo-0613 with "stream":true that returns a 404 Not Found. This failed interaction is recorded in the cassette and may cause unexpected test behavior or confusion. Consider removing this failed request block or re-recording the cassette to only include successful interactions.
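If re-recording is not practical, the failed interaction can be pruned from the parsed cassette. A throwaway sketch, assuming vcrpy's cassette layout (a top-level "interactions" list where each entry has response.status.code); loading and saving with yaml.safe_load/yaml.safe_dump is left to the caller:

```python
# Drop every recorded interaction whose response status is an error
# (4xx/5xx), keeping only successful calls. Operates on the parsed
# cassette structure, not the YAML file itself.
def drop_failed_interactions(cassette):
    cassette["interactions"] = [
        i for i in cassette.get("interactions", [])
        if i.get("response", {}).get("status", {}).get("code", 200) < 400
    ]
    return cassette
```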

🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml`
around lines 358 - 359, The cassette contains a recorded failed interaction for
the deprecated model "gpt-3.5-turbo-0613" with "stream": true that returns a
404; locate the request/response block that mentions model: "gpt-3.5-turbo-0613"
and response code: 404 / message: "Not Found" and either remove that entire
failed interaction from the YAML or re-record the cassette so the interaction is
replaced by a successful call (preferably using a supported model and without
streaming if not intended); ensure the cleaned cassette no longer contains the
404 response or the deprecated model entry.

Comment on lines +797 to +1545
- request:
body: from_python_package=true&input_url=https%3A%2F%2Farxiv.org%2Fpdf%2F1706.03762.pdf&language=en
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '93'
content-type:
- application/x-www-form-urlencoded
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: POST
uri: https://api.cloud.llamaindex.ai/api/parsing/upload
response:
body:
string: '{"id":"f63d3e7d-5e4e-4edc-8568-ef276937ad1d","status":"PENDING","error_code":null,"error_message":null}'
headers:
Connection:
- keep-alive
Content-Length:
- '103'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:23 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- e686c240-db78-402c-84f1-6fcf437f80c2
x-session-id:
- 3aa231d1-3199-439c-83cb-c129f3a03490
status:
code: 200
message: OK
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: GET
uri: https://api.cloud.llamaindex.ai/api/parsing/job/f63d3e7d-5e4e-4edc-8568-ef276937ad1d
response:
body:
string: '{"id":"f63d3e7d-5e4e-4edc-8568-ef276937ad1d","status":"PENDING","error_code":null,"error_message":null}'
headers:
Connection:
- keep-alive
Content-Length:
- '103'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:24 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- 8e70feb5-82ce-42b2-9fc4-21441ddd6870
x-session-id:
- c07122ec-6421-4b53-9174-529321dfe7e1
status:
code: 200
message: OK
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: GET
uri: https://api.cloud.llamaindex.ai/api/parsing/job/f63d3e7d-5e4e-4edc-8568-ef276937ad1d
response:
body:
string: '{"id":"f63d3e7d-5e4e-4edc-8568-ef276937ad1d","status":"SUCCESS","error_code":null,"error_message":null}'
headers:
Connection:
- keep-alive
Content-Length:
- '103'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:27 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- 514677a3-d31f-4f00-9f66-edba7cb15e52
x-session-id:
- 4ad4092c-cb2b-460c-bd8c-03c7551e0fb0
status:
code: 200
message: OK
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- api.cloud.llamaindex.ai
user-agent:
- python-httpx/0.28.1
method: GET
uri: https://api.cloud.llamaindex.ai/api/parsing/job/f63d3e7d-5e4e-4edc-8568-ef276937ad1d/result/text
response:
body:
string: "{\"text\":\" arXiv:1706.03762v7 [cs.CL] 2 Aug 2023\\n\\n Provided
proper attribution is provided, Google hereby grants permission to\\nreproduce
the tables and figures in this paper solely for use in journalistic or\\n
\ scholarly works.\\n\\n Attention Is All You Need\\n\\n Ashish Vaswani\u2217
\ Noam Shazeer\u2217 Niki Parmar\u2217 Jakob Uszkoreit\u2217\\n Google
Brain Google Brain Google Research Google Research\\navaswani@google.com
\ noam@google.com nikip@google.com usz@google.com\\n\\n Llion Jones\u2217
\ Aidan N. Gomez\u2217 \u2020 \u0141ukasz Kaiser\u2217\\nGoogle
Research University of Toronto Google Brain\\nllion@google.com
\ aidan@cs.toronto.edu lukaszkaiser@google.com\\n\\n Illia
Polosukhin\u2217 \u2021\\n illia.polosukhin@gmail.com\\n\\n
\ Abstract\\n\\n The
dominant sequence transduction models are based on complex recurrent or\\n
\ convolutional neural networks that include an encoder
and a decoder. The best\\n performing models also connect
the encoder and decoder through an attention\\n mechanism. We propose
a new simple network architecture, the Transformer,\\n based
solely on attention mechanisms, dispensing with recurrence and convolutions\\n
\ entirely. Experiments on two machine translation tasks show these models
to\\n be superior in quality while being more parallelizable
and requiring significantly\\n less time to train. Our model achieves
28.4 BLEU on the WMT 2014 English-\\n to-German translation
task, improving over the existing best results, including\\n ensembles,
by over 2 BLEU. On the WMT 2014 English-to-French translation task,\\n our
model establishes a new single-model state-of-the-art BLEU score of 41.8 after\\n
\ training for 3.5 days on eight GPUs, a small fraction
of the training costs of the\\n best models from the literature.
We show that the Transformer generalizes well to\\n other
tasks by applying it successfully to English constituency parsing both with\\n
\ large and limited training data.\\n\\n \u2217Equal contribution. Listing
order is random. Jakob proposed replacing RNNs with self-attention and started\\n
\ the effort to evaluate this idea. Ashish, with Illia, designed and implemented
the first Transformer models and\\n has been crucially involved in every
aspect of this work. Noam proposed scaled dot-product attention, multi-head\\n
\ attention and the parameter-free position representation and became
the other person involved in nearly every\\n detail. Niki designed,
implemented, tuned and evaluated countless model variants in our original
codebase and\\n tensor2tensor. Llion also experimented with novel model
variants, was responsible for our initial codebase, and\\n efficient inference
and visualizations. Lukasz and Aidan spent countless long days designing various
parts of and\\n implementing tensor2tensor, replacing our earlier codebase,
greatly improving results and massively accelerating\\n our research.\\n
\ \u2020Work performed while at Google Brain.\\n \u2021Work performed
while at Google Research.\\n\\n 31st Conference on Neural Information Processing
Systems (NIPS 2017), Long Beach, CA, USA.\\n\\n---\\n\\n1 Introduction\\n\\nRecurrent
neural networks, long short-term memory [13] and gated recurrent [7] neural
networks\\nin particular, have been firmly established as state of the art
approaches in sequence modeling and\\ntransduction problems such as language
modeling and machine translation [35, 2, 5]. Numerous\\nefforts have since
continued to push the boundaries of recurrent language models and encoder-decoder\\narchitectures
[38, 24, 15].\\nRecurrent models typically factor computation along the symbol
positions of the input and output\\nsequences. Aligning the positions to steps
in computation time, they generate a sequence of hidden\\nstates ht, as a
function of the previous hidden state ht\u22121 and the input for position
t. This inherently\\nsequential nature precludes parallelization within training
examples, which becomes critical at longer\\nsequence lengths, as memory constraints
limit batching across examples. Recent work has achieved\\nsignificant improvements
in computational efficiency through factorization tricks [21] and conditional\\ncomputation
[32], while also improving model performance in case of the latter. The fundamental\\nconstraint
of sequential computation, however, remains.\\nAttention mechanisms have become
an integral part of compelling sequence modeling and transduction\nmodels
in various tasks, allowing modeling of dependencies without regard to their
distance in\\nthe input or output sequences [2, 19]. In all but a few cases
[27], however, such attention mechanisms\\nare used in conjunction with a
recurrent network.\\nIn this work we propose the Transformer, a model architecture
eschewing recurrence and instead\\nrelying entirely on an attention mechanism
to draw global dependencies between input and output.\\nThe Transformer allows
for significantly more parallelization and can reach a new state of the art
in\\ntranslation quality after being trained for as little as twelve hours
on eight P100 GPUs.\\n\\n2 Background\\n\\nThe goal of reducing sequential
computation also forms the foundation of the Extended Neural GPU\\n[16], ByteNet
[18] and ConvS2S [9], all of which use convolutional neural networks as basic
building\\nblock, computing hidden representations in parallel for all input
and output positions. In these models,\\nthe number of operations required
to relate signals from two arbitrary input or output positions grows\\nin
the distance between positions, linearly for ConvS2S and logarithmically for
ByteNet. This makes\\nit more difficult to learn dependencies between distant
positions [12]. In the Transformer this is\\nreduced to a constant number
of operations, albeit at the cost of reduced effective resolution due\\nto
averaging attention-weighted positions, an effect we counteract with Multi-Head
Attention as\\ndescribed in section 3.2.\\nSelf-attention, sometimes called
intra-attention is an attention mechanism relating different positions\\nof
a single sequence in order to compute a representation of the sequence. Self-attention
has been\\nused successfully in a variety of tasks including reading comprehension,
abstractive summarization,\\ntextual entailment and learning task-independent
sentence representations [4, 27, 28, 22].\\nEnd-to-end memory networks are
based on a recurrent attention mechanism instead of sequence-\\naligned recurrence
and have been shown to perform well on simple-language question answering
and\\nlanguage modeling tasks [34].\\nTo the best of our knowledge, however,
the Transformer is the first transduction model relying\\nentirely on self-attention
to compute representations of its input and output without using sequence-\\naligned
RNNs or convolution. In the following sections, we will describe the Transformer,
motivate\\nself-attention and discuss its advantages over models such as [17,
18] and [9].\\n\\n3 Model Architecture\\n\\nMost competitive neural sequence
transduction models have an encoder-decoder structure [5, 2, 35].\\nHere,
the encoder maps an input sequence of symbol representations (x1, ..., xn)
to a sequence\\nof continuous representations z = (z1, ..., zn). Given z,
the decoder then generates an output\\nsequence (y1, ..., ym) of symbols one
element at a time. At each step the model is auto-regressive\\n[10], consuming
the previously generated symbols as additional input when generating the next.\\n\\n
      \ 2\\n\\n---\\n\\nFigure 1: The Transformer - model architecture.\\n\\nThe
Transformer follows this overall architecture using stacked self-attention
and point-wise, fully\\nconnected layers for both the encoder and decoder,
shown in the left and right halves of Figure 1,\\nrespectively.\\n\\n3.1 Encoder
and Decoder Stacks\\nEncoder: The encoder is composed of a stack of N
= 6 identical layers. Each layer has two\\nsub-layers. The first is a multi-head
self-attention mechanism, and the second is a simple, position-\\nwise fully
connected feed-forward network. We employ a residual connection [11] around
each of\\nthe two sub-layers, followed by layer normalization [1]. That is,
the output of each sub-layer is\\nLayerNorm(x + Sublayer(x)), where Sublayer(x)
is the function implemented by the sub-layer\\nitself. To facilitate these
residual connections, all sub-layers in the model, as well as the embedding\\nlayers,
produce outputs of dimension dmodel = 512.\\n\\nDecoder: The decoder is also
composed of a stack of N = 6 identical layers. In addition to the two\\nsub-layers
in each encoder layer, the decoder inserts a third sub-layer, which performs
multi-head\\nattention over the output of the encoder stack. Similar to the
encoder, we employ residual connections\\naround each of the sub-layers, followed
by layer normalization. We also modify the self-attention\\nsub-layer in the
decoder stack to prevent positions from attending to subsequent positions.
This\nmasking, combined with the fact that the output embeddings are offset by
one position, ensures that the\\npredictions for position i can depend only
on the known outputs at positions less than i.\\n\\n3.2 Attention\\nAn attention
function can be described as mapping a query and a set of key-value pairs
to an output,\\nwhere the query, keys, values, and output are all vectors.
The output is computed as a weighted sum\\n\\n 3\\n\\n---\\n\\n
      \ Figure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention
      consists of several attention layers running in parallel.\n\n
\ of the values, where the weight assigned to each value is computed by
a compatibility function of the\\n query with the corresponding key.\\n\\n
\ 3.2.1 Scaled Dot-Product Attention\\n We call our particular attention
\\\"Scaled Dot-Product Attention\\\" (Figure 2). The input consists of\\n
      \ queries and keys of dimension dk, and values of dimension dv. We compute the dot products of the\n
      \ query with all keys, divide each by \u221Adk, and apply a softmax function to obtain the weights on the\n
      \ values.\n In practice, we compute the attention function on a set of queries simultaneously, packed together\n
      \ into a matrix Q. The keys and values are also packed together into matrices K and V . We compute\n
      \ the matrix of outputs as:\n\n Attention(Q, K, V ) = softmax(QK^T / \u221Adk)V (1)\n\n
      \ The two most commonly used attention functions
are additive attention [2], and dot-product (multiplicative)\n attention. Dot-product attention is identical
to our algorithm, except for the scaling factor\n of 1/\u221Adk. Additive attention computes the compatibility
function using a feed-forward network with\n a single
hidden layer. While the two are similar in theoretical complexity, dot-product
attention is\\n much faster and more space-efficient in practice, since
it can be implemented using highly optimized\\n matrix multiplication code.\\n
\ While for small values of dk the two mechanisms perform similarly, additive
attention outperforms\\n dot product attention without scaling for larger
values of dk [3]. We suspect that for large values of\\n dk, the dot products
grow large in magnitude, pushing the softmax function into regions where it
has\\n extremely small gradients 4. To counteract this effect, we scale
the dot products by 1 .\\n \u221Adk\\n\\n
\ 3.2.2 Multi-Head Attention\\n Instead of performing a single attention
function with dmodel-dimensional keys, values and queries,\\n we found
it beneficial to linearly project the queries, keys and values h times with
different, learned\\n linear projections to dk, dk and dv dimensions, respectively.
On each of these projected versions of\\n queries, keys and values we then
perform the attention function in parallel, yielding dv-dimensional\\n 4To
illustrate why the dot products get large, assume that the components of q
and k are independent random\\n variables with mean 0 and variance 1. Then
      \ their dot product, q \xB7 k = \u2211 qi ki (i = 1, ..., dk), has mean 0 and variance
      \ dk.\n\n
\ 4\\n\\n---\\n\\noutput values.
These are concatenated and once again projected, resulting in the final values,
as\\ndepicted in Figure 2.\\nMulti-head attention allows the model to jointly
attend to information from different representation\\nsubspaces at different
positions. With a single attention head, averaging inhibits this.\\n\\nMultiHead(Q,
K, V ) = Concat(head1, ..., headh)W O\\nwhere headi = Attention(QW Q, KWK,
V W V )\\ni i i\\n\\nWhere the
projections are parameter matrices W Q \u2208 Rdmodel\xD7d\u2096, W K \u2208
Rdmodel\xD7d\u2096, W V \u2208 Rdmodel\xD7dv\\nand W O \u2208 Rhdv\xD7dmodel
. i i i\\nIn this work
we employ h = 8 parallel attention layers, or heads. For each of these
we use\\ndk = dv = dmodel/h = 64. Due to the reduced dimension of each head,
the total computational cost\\nis similar to that of single-head attention
with full dimensionality.\\n\\n3.2.3 Applications of Attention in our Model\\nThe
Transformer uses multi-head attention in three different ways:\\n\\n \u2022
\ In \\\"encoder-decoder attention\\\" layers, the queries come from the previous
decoder layer,\\n and the memory keys and values come from the output
of the encoder. This allows every\\n position in the decoder to attend
over all positions in the input sequence. This mimics the\\n typical
encoder-decoder attention mechanisms in sequence-to-sequence models such as\\n
\ [38, 2, 9].\\n \u2022 The encoder contains self-attention
layers. In a self-attention layer all of the keys, values\\n and
queries come from the same place, in this case, the output of the previous
layer in the\\n encoder. Each position in the encoder can attend
to all positions in the previous layer of the\\n encoder.\\n \u2022
\ Similarly, self-attention layers in the decoder allow each position in the
decoder to attend to\\n all positions in the decoder up to and including
that position. We need to prevent leftward\\n information flow in
the decoder to preserve the auto-regressive property. We implement this\\n
\ inside of scaled dot-product attention by masking out (setting to
\u2212\u221E) all values in the input\\n of the softmax which correspond
to illegal connections. See Figure 2.\\n\\n3.3 Position-wise Feed-Forward
Networks\\n\\nIn addition to attention sub-layers, each of the layers in our
encoder and decoder contains a fully\\nconnected feed-forward network, which
is applied to each position separately and identically. This\\nconsists of
two linear transformations with a ReLU activation in between.\\n\\n FFN(x)
= max(0, xW1 + b1)W2 + b2 (2)\\n\\n While the linear
transformations are the same across different positions, they use different
parameters\\nfrom layer to layer. Another way of describing this is as two
convolutions with kernel size 1.\\nThe dimensionality of input and output
is dmodel = 512, and the inner-layer has dimensionality\ndff = 2048.\n\n3.4
\ Embeddings and Softmax\\n\\nSimilarly to other sequence transduction models,
we use learned embeddings to convert the input\\ntokens and output tokens
to vectors of dimension dmodel. We also use the usual learned linear transfor-\\nmation
and softmax function to convert the decoder output to predicted next-token
probabilities. In\\nour model, we share the same weight matrix between the
two embedding layers and the pre-softmax\\nlinear transformation, similar
to [30]. In the embedding layers, we multiply those weights by \u221Admodel.\\n\\n
\ 5\\n\\n---\\n\\nTable 1: Maximum
path lengths, per-layer complexity and minimum number of sequential operations\\nfor
different layer types. n is the sequence length, d is the representation dimension,
k is the kernel\\nsize of convolutions and r the size of the neighborhood
in restricted self-attention.\\n\\n Layer Type Complexity
per Layer Sequential Maximum Path Length\\n Operations\\n
Self-Attention O(n2 \xB7 d) O(1) O(1)\\n
Recurrent O(n \xB7 d2) O(n) O(n)\\n
Convolutional O(k \xB7 n \xB7 d2) O(1) O(logk(n))\\n
Self-Attention (restricted) O(r \xB7 n \xB7 d) O(1) O(n/r)\\n\\n3.5
\ Positional Encoding\\nSince our model contains no recurrence and no convolution,
in order for the model to make use of the\\norder of the sequence, we must
inject some information about the relative or absolute position of the\\ntokens
in the sequence. To this end, we add \\\"positional encodings\\\" to the input
embeddings at the\\nbottoms of the encoder and decoder stacks. The positional
encodings have the same dimension dmodel\\nas the embeddings, so that the
two can be summed. There are many choices of positional encodings,\\nlearned
and fixed [9].\\nIn this work, we use sine and cosine functions of different
frequencies:\n\n PE(pos, 2i) = sin(pos / 10000^(2i/dmodel))\n PE(pos, 2i+1)
      = cos(pos / 10000^(2i/dmodel))\n\nwhere pos is the position and i is the dimension. That is, each dimension
of the positional encoding\\ncorresponds to a sinusoid. The wavelengths form
a geometric progression from 2\u03C0 to 10000 \xB7 2\u03C0. We\\nchose this
function because we hypothesized it would allow the model to easily learn
to attend by\\nrelative positions, since for any fixed offset k, P Epos+k
can be represented as a linear function of\\nP Epos.\\nWe also experimented
with using learned positional embeddings [9] instead, and found that the two\\nversions
produced nearly identical results (see Table 3 row (E)). We chose the sinusoidal
version\\nbecause it may allow the model to extrapolate to sequence lengths
longer than the ones encountered\\nduring training.\\n\\n4 Why Self-Attention\\n\\nIn
this section we compare various aspects of self-attention layers to the recurrent
and convolutional\nlayers commonly used for mapping one variable-length
sequence of symbol representations\\n(x1, ..., xn) to another sequence of
equal length (z1, ..., zn), with xi, zi \u2208 Rd, such as a hidden\\nlayer
in a typical sequence transduction encoder or decoder. Motivating our use
of self-attention we\\nconsider three desiderata.\\nOne is the total computational
complexity per layer. Another is the amount of computation that can\\nbe parallelized,
as measured by the minimum number of sequential operations required.\\nThe
third is the path length between long-range dependencies in the network. Learning
long-range\\ndependencies is a key challenge in many sequence transduction
tasks. One key factor affecting the\\nability to learn such dependencies is
the length of the paths forward and backward signals have to\\ntraverse in
the network. The shorter these paths between any combination of positions
in the input\\nand output sequences, the easier it is to learn long-range
dependencies [12]. Hence we also compare\\nthe maximum path length between
any two input and output positions in networks composed of the\\ndifferent
layer types.\\nAs noted in Table 1, a self-attention layer connects all positions
with a constant number of sequentially\\nexecuted operations, whereas a recurrent
layer requires O(n) sequential operations. In terms of\\ncomputational complexity,
self-attention layers are faster than recurrent layers when the sequence\\n\\n
\ 6\\n\\n---\\n\\nlength n is
smaller than the representation dimensionality d, which is most often the
case with\\nsentence representations used by state-of-the-art models in machine
translations, such as word-piece\\n[38] and byte-pair [31] representations.
To improve computational performance for tasks involving\\nvery long sequences,
self-attention could be restricted to considering only a neighborhood of size
r in\\nthe input sequence centered around the respective output position.
This would increase the maximum\\npath length to O(n/r). We plan to investigate
this approach further in future work.\\nA single convolutional layer with
kernel width k < n does not connect all pairs of input and output\\npositions.
Doing so requires a stack of O(n/k) convolutional layers in the case of contiguous
kernels,\\nor O(logk(n)) in the case of dilated convolutions [18], increasing
the length of the longest paths\\nbetween any two positions in the network.
Convolutional layers are generally more expensive than\\nrecurrent layers,
by a factor of k. Separable convolutions [6], however, decrease the complexity\\nconsiderably,
to O(k \xB7 n \xB7 d + n \xB7 d2). Even with k = n, however, the complexity
of a separable\\nconvolution is equal to the combination of a self-attention
layer and a point-wise feed-forward layer,\nthe approach we take in our model.\nAs
a side benefit, self-attention could yield more interpretable models. We inspect
attention distributions\\nfrom our models and present and discuss examples
in the appendix. Not only do individual attention\\nheads clearly learn to
perform different tasks, many appear to exhibit behavior related to the syntactic\\nand
semantic structure of the sentences.\\n\\n5 Training\\n\\nThis section
describes the training regime for our models.\\n\\n5.1 Training Data and
Batching\\n\\nWe trained on the standard WMT 2014 English-German dataset consisting
of about 4.5 million\\nsentence pairs. Sentences were encoded using byte-pair
encoding [3], which has a shared source-\\ntarget vocabulary of about 37000
tokens. For English-French, we used the significantly larger WMT\\n2014 English-French
dataset consisting of 36M sentences and split tokens into a 32000 word-piece\\nvocabulary
[38]. Sentence pairs were batched together by approximate sequence length.
Each training\\nbatch contained a set of sentence pairs containing approximately
25000 source tokens and 25000\\ntarget tokens.\\n\\n5.2 Hardware and Schedule\\n\\nWe
trained our models on one machine with 8 NVIDIA P100 GPUs. For our base models
using\\nthe hyperparameters described throughout the paper, each training
step took about 0.4 seconds. We\\ntrained the base models for a total of 100,000
steps or 12 hours. For our big models,(described on the\\nbottom line of table
3), step time was 1.0 seconds. The big models were trained for 300,000 steps\\n(3.5
days).\\n\\n5.3 Optimizer\\n\\nWe used the Adam optimizer [20] with \u03B21
= 0.9, \u03B22 = 0.98 and \u03F5 = 10\u22129. We varied the learning\\nrate
over the course of training, according to the formula:\n\n lrate
= dmodel^(\u22120.5) \xB7 min(step_num^(\u22120.5), step_num \xB7 warmup_steps^(\u22121.5))
      \ (3)\n\nThis corresponds to increasing the learning
rate linearly for the first warmup_steps training steps,\\nand decreasing
it thereafter proportionally to the inverse square root of the step number.
We used\\nwarmup_steps = 4000.\\n\\n5.4 Regularization\\n\\nWe employ three
types of regularization during training:\\n\\n 7\\n\\n---\\n\\nTable
2: The Transformer achieves better BLEU scores than previous state-of-the-art
models on the\\nEnglish-to-German and English-to-French newstest2014 tests
at a fraction of the training cost.\n\nModel | BLEU EN-DE | BLEU EN-FR | Training
Cost (FLOPs) EN-DE | Training Cost (FLOPs) EN-FR\nByteNet [18] | 23.75 | | |\nDeep-Att
+ PosUnk [39] | | 39.2 | | 1.0 \xB7 10^20\nGNMT + RL [38] | 24.6 | 39.92 | 2.3
\xB7 10^19 | 1.4 \xB7 10^20\nConvS2S [9] | 25.16 | 40.46 | 9.6 \xB7 10^18 | 1.5
\xB7 10^20\nMoE [32] | 26.03 | 40.56 | 2.0 \xB7 10^19 | 1.2 \xB7 10^20\nDeep-Att
+ PosUnk Ensemble [39] | | 40.4 | | 8.0 \xB7 10^20\nGNMT + RL Ensemble [38] |
26.30 | 41.16 | 1.8 \xB7 10^20 | 1.1 \xB7 10^21\nConvS2S Ensemble [9] | 26.36
| 41.29 | 7.7 \xB7 10^19 | 1.2 \xB7 10^21\nTransformer (base model) | 27.3 |
38.1 | 3.3 \xB7 10^18 |\nTransformer (big) | 28.4 | 41.8 | 2.3
\xB7 10^19 |\n\nResidual Dropout We apply dropout [33] to the output of
each sub-layer, before it is added to the\\nsub-layer input and normalized.
In addition, we apply dropout to the sums of the embeddings and the\\npositional
encodings in both the encoder and decoder stacks. For the base model, we use
a rate of\\nPdrop = 0.1.\\n\\nLabel Smoothing During training,
we employed label smoothing of value \u03F5ls = 0.1 [36]. This\\nhurts perplexity,
as the model learns to be more unsure, but improves accuracy and BLEU score.\\n\\n6
\ Results\\n\\n6.1 Machine Translation\\n\\nOn the WMT 2014 English-to-German
translation task, the big transformer model (Transformer (big)\\nin Table
2) outperforms the best previously reported models (including ensembles) by
more than 2.0\\nBLEU, establishing a new state-of-the-art BLEU score of 28.4.
The configuration of this model is\\nlisted in the bottom line of Table 3.
Training took 3.5 days on 8 P100 GPUs. Even our base model\\nsurpasses all
previously published models and ensembles, at a fraction of the training cost
of any of\\nthe competitive models.\\nOn the WMT 2014 English-to-French translation
task, our big model achieves a BLEU score of 41.0,\\noutperforming all of
the previously published single models, at less than 1/4 the training cost
of the\\nprevious state-of-the-art model. The Transformer (big) model trained
for English-to-French used\\ndropout rate Pdrop = 0.1, instead of 0.3.\\nFor
the base models, we used a single model obtained by averaging the last 5 checkpoints,
which\\nwere written at 10-minute intervals. For the big models, we averaged
the last 20 checkpoints. We\\nused beam search with a beam size of 4 and length
penalty \u03B1 = 0.6 [38]. These hyperparameters\\nwere chosen after experimentation
on the development set. We set the maximum output length during\\ninference
to input length + 50, but terminate early when possible [38].\\nTable 2 summarizes
our results and compares our translation quality and training costs to other
model\\narchitectures from the literature. We estimate the number of floating
point operations used to train a\\nmodel by multiplying the training time,
the number of GPUs used, and an estimate of the sustained\\nsingle-precision
floating-point capacity of each GPU 5.\\n\\n6.2 Model Variations\\n\\nTo
evaluate the importance of different components of the Transformer, we varied
our base model\\nin different ways, measuring the change in performance on
English-to-German translation on the\\n\\n 5We used values of 2.8, 3.7,
6.0 and 9.5 TFLOPS for K80, K40, M40 and P100, respectively.\\n\\n 8\\n\\n---\\n\\nTable
3: Variations on the Transformer architecture. Unlisted values are identical
to those of the base\\nmodel. All metrics are on the English-to-German translation
development set, newstest2013. Listed\\nperplexities are per-wordpiece, according
to our byte-pair encoding, and should not be compared to\\nper-word perplexities.\\n\\n
\ N dmodel dff h dk dv Pdrop \u03F5ls train
\ PPL BLEU params\\n steps
\ (dev) (dev) \xD7106\\nbase 6 512 2048 8 64
\ 64 0.1 0.1 100K 4.92 25.8 65\\n 1
\ 512 512 5.29 24.9\\n(A) 4
\ 128 128 5.00 25.5\\n 16
\ 32 32 4.91 25.8\\n 32
\ 16 16 5.01 25.4\\n(B) 16
\ 5.16 25.1 58\\n 32
\ 5.01 25.4 60\\n 2 6.11
\ 23.7 36\\n 4 5.19
\ 25.3 50\\n 8 4.88
\ 25.5 80\\n(C) 256 32 32 5.75
\ 24.5 28\\n 1024 128 128 4.66
\ 26.0 168\\n 1024 5.12
\ 25.4 53\\n 4096 4.75
\ 26.2 90\\n 0.0
\ 5.77 24.6\\n(D) 0.2
\ 0.0 4.95 25.5\\n 4.67
\ 25.3\\n 0.2
\ 5.47 25.7\\n(E) positional embedding instead
of sinusoids 4.92 25.7\\nbig 6 1024 4096
\ 16 0.3 300K 4.33 26.4 213\\n\\ndevelopment
set, newstest2013. We used beam search as described in the previous section,
but no\\ncheckpoint averaging. We present these results in Table 3.\\nIn Table
3 rows (A), we vary the number of attention heads and the attention key and
value dimensions,\\nkeeping the amount of computation constant, as described
in Section 3.2.2. While single-head\\nattention is 0.9 BLEU worse than the
best setting, quality also drops off with too many heads.\\nIn Table 3 rows
(B), we observe that reducing the attention key size dk hurts model quality.
This\\nsuggests that determining compatibility is not easy and that a more
sophisticated compatibility\\nfunction than dot product may be beneficial.
We further observe in rows (C) and (D) that, as expected,\\nbigger models
are better, and dropout is very helpful in avoiding over-fitting. In row (E)
we replace our\\nsinusoidal positional encoding with learned positional embeddings
[9], and observe nearly identical\\nresults to the base model.\\n\\n6.3 English
Constituency Parsing\\n\\nTo evaluate if the Transformer can generalize to
other tasks we performed experiments on English\\nconstituency parsing. This
task presents specific challenges: the output is subject to strong structural\\nconstraints
and is significantly longer than the input. Furthermore, RNN sequence-to-sequence\\nmodels
have not been able to attain state-of-the-art results in small-data regimes
[37].\\nWe trained a 4-layer transformer with dmodel = 1024 on the Wall Street
Journal (WSJ) portion of the\\nPenn Treebank [25], about 40K training sentences.
We also trained it in a semi-supervised setting,\\nusing the larger high-confidence
and BerkeleyParser corpora with approximately 17M sentences
used a vocabulary of 16K tokens for the WSJ only setting and a vocabulary
of 32K tokens\\nfor the semi-supervised setting.\\nWe performed only a small
number of experiments to select the dropout, both attention and residual\\n(section
5.4), learning rates and beam size on the Section 22 development set, all
other parameters\\nremained unchanged from the English-to-German base translation
model. During inference, we\\n\\n 9\\n\\n---\\n\\n
\ Table 4: The Transformer generalizes well to English constituency parsing
(Results are on Section 23\\n of WSJ)\\n Parser Training
      \ WSJ 23 F1\nVinyals & Kaiser et al. (2014) [37] WSJ only, discriminative
\ 88.3\\n Petrov et al. (2006) [29] WSJ only, discriminative
\ 90.4\\n Zhu et al. (2013) [40] WSJ only, discriminative
\ 90.4\\n Dyer et al. (2016) [8] WSJ only, discriminative
\ 91.7\\n Transformer (4 layers) WSJ only, discriminative
\ 91.3\\n Zhu et al. (2013) [40] semi-supervised 91.3\\n
\ Huang & Harper (2009) [14] semi-supervised 91.3\\n
\ McClosky et al. (2006) [26] semi-supervised 92.1\\nVinyals
& Kaiser et al. (2014) [37] semi-supervised 92.1\n Transformer
(4 layers) semi-supervised 92.7\\n Luong et al.
(2015) [23] multi-task 93.0\\n Dyer et al.
(2016) [8] generative 93.3\\n\\n increased the
maximum output length to input length + 300. We used a beam size of 21 and
\u03B1 = 0.3\\n for both WSJ only and the semi-supervised setting.\\n Our
results in Table 4 show that despite the lack of task-specific tuning our
      \ model performs surprisingly\n well, yielding better results than all
previously reported models with the exception of the\\n Recurrent Neural
Network Grammar [8].\\n In contrast to RNN sequence-to-sequence models
      \ [37], the Transformer outperforms the BerkeleyParser\n [29] even when
training only on the WSJ training set of 40K sentences.\\n\\n 7 Conclusion\\n\\n
\ In this work, we presented the Transformer, the first sequence transduction
model based entirely on\\n attention, replacing the recurrent layers most
commonly used in encoder-decoder architectures with\\n multi-headed self-attention.\\n
\ For translation tasks, the Transformer can be trained significantly faster
than architectures based\\n on recurrent or convolutional layers. On
both WMT 2014 English-to-German and WMT 2014\\n English-to-French translation
tasks, we achieve a new state of the art. In the former task our best\\n model
outperforms even all previously reported ensembles.\\n We are excited about
the future of attention-based models and plan to apply them to other tasks.
We\\n plan to extend the Transformer to problems involving input and output
modalities other than text and\\n to investigate local, restricted attention
mechanisms to efficiently handle large inputs and outputs\\n such as images,
audio and video. Making generation less sequential is another research goal
of ours.\\n The code we used to train and evaluate our models is available
at https://github.com/\\n tensorflow/tensor2tensor.\\n\\n Acknowledgements
We are grateful to Nal Kalchbrenner and Stephan Gouws for their fruitful\\n
\ comments, corrections and inspiration.\\n\\n References\\n [1]
\ Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization.
arXiv preprint\\n arXiv:1607.06450, 2016.\\n [2] Dzmitry Bahdanau,
Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly\\n
\ learning to align and translate. CoRR, abs/1409.0473, 2014.\\n [3]
\ Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc V. Le. Massive exploration
of neural\\n machine translation architectures. CoRR, abs/1703.03906,
2017.\\n [4] Jianpeng Cheng, Li Dong, and Mirella Lapata. Long short-term
memory-networks for machine\\n reading. arXiv preprint arXiv:1601.06733,
2016.\\n\\n 10\\n\\n---\\n\\n
[5] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares,
Holger Schwenk,\\n and Yoshua Bengio. Learning phrase representations
using rnn encoder-decoder for statistical\\n machine translation. CoRR,
abs/1406.1078, 2014.\\n [6] Francois Chollet. Xception: Deep learning with
depthwise separable convolutions. arXiv\\n preprint arXiv:1610.02357,
2016.\\n [7] Junyoung Chung, \xC7aglar G\xFCl\xE7ehre, Kyunghyun Cho, and
Yoshua Bengio. Empirical evaluation\\n of gated recurrent neural networks
on sequence modeling. CoRR, abs/1412.3555, 2014.\\n [8] Chris Dyer, Adhiguna
Kuncoro, Miguel Ballesteros, and Noah A. Smith. Recurrent neural\\n network
grammars. In Proc. of NAACL, 2016.\\n [9] Jonas Gehring, Michael Auli, David
Grangier, Denis Yarats, and Yann N. Dauphin. Convolu-\\n tional sequence
to sequence learning. arXiv preprint arXiv:1705.03122v2, 2017.\\n[10] Alex
Graves. Generating sequences with recurrent neural networks. arXiv
preprint\\n arXiv:1308.0850, 2013.\\n[11] Kaiming He, Xiangyu Zhang,
Shaoqing Ren, and Jian Sun. Deep residual learning for im-\\n age recognition.
\ In Proceedings of the IEEE Conference on Computer Vision and Pattern\\n
\ Recognition, pages 770\u2013778, 2016.\\n[12] Sepp Hochreiter, Yoshua
Bengio, Paolo Frasconi, and J\xFCrgen Schmidhuber. Gradient flow in\\n recurrent
nets: the difficulty of learning long-term dependencies, 2001.\\n[13] Sepp
Hochreiter and J\xFCrgen Schmidhuber. Long short-term memory. Neural
computation,\\n 9(8):1735\u20131780, 1997.\\n[14] Zhongqiang Huang and
Mary Harper. Self-training PCFG grammars with latent annotations\\n across
languages. In Proceedings of the 2009 Conference on Empirical Methods in Natural\\n
\ Language Processing, pages 832\u2013841. ACL, August 2009.\\n[15] Rafal
Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring\\n
\ the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.\\n[16]
\ \u0141ukasz Kaiser and Samy Bengio. Can active memory replace attention?
In Advances in Neural\\n Information Processing Systems, (NIPS), 2016.\\n[17]
\ \u0141ukasz Kaiser and Ilya Sutskever. Neural GPUs learn algorithms. In
International Conference\\n on Learning Representations (ICLR), 2016.\\n[18]
\ Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex
Graves, and Ko-\\n ray Kavukcuoglu. Neural machine translation in linear
time. arXiv preprint arXiv:1610.10099v2,\\n 2017.\\n[19] Yoon Kim, Carl
Denton, Luong Hoang, and Alexander M. Rush. Structured attention networks.\\n
\ In International Conference on Learning Representations, 2017.\\n[20]
\ Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization.
In ICLR, 2015.\\n[21] Oleksii Kuchaiev and Boris Ginsburg. Factorization
tricks for LSTM networks. arXiv preprint\\n arXiv:1703.10722, 2017.\\n[22]
\ Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang,
Bowen\\n Zhou, and Yoshua Bengio. A structured self-attentive sentence
embedding. arXiv preprint\\n arXiv:1703.03130, 2017.\\n[23] Minh-Thang
Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. Multi-task\\n
\ sequence to sequence learning. arXiv preprint arXiv:1511.06114, 2015.\\n[24]
\ Minh-Thang Luong, Hieu Pham, and Christopher D Manning. Effective approaches
to attention-\\n based neural machine translation. arXiv preprint arXiv:1508.04025,
2015.\\n\\n 11\\n\\n---\\n\\n[25] Mitchell
P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large
annotated\\n corpus of english: The penn treebank. Computational linguistics,
19(2):313\u2013330, 1993.\\n\\n[26] David McClosky, Eugene Charniak, and
Mark Johnson. Effective self-training for parsing. In\\n Proceedings
of the Human Language Technology Conference of the NAACL, Main Conference,\\n
\ pages 152\u2013159. ACL, June 2006.\\n\\n[27] Ankur Parikh, Oscar T\xE4ckstr\xF6m,
Dipanjan Das, and Jakob Uszkoreit. A decomposable attention\\n model.
In Empirical Methods in Natural Language Processing, 2016.\\n\\n[28] Romain
Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive\\n
\ summarization. arXiv preprint arXiv:1705.04304, 2017.\\n\\n[29] Slav
Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact,\\n
\ and interpretable tree annotation. In Proceedings of the 21st International
Conference on\\n Computational Linguistics and 44th Annual Meeting of
the ACL, pages 433\u2013440. ACL, July\\n 2006.\\n\\n[30] Ofir Press
and Lior Wolf. Using the output embedding to improve language models. arXiv\\n
\ preprint arXiv:1608.05859, 2016.\\n\\n[31] Rico Sennrich, Barry Haddow,
and Alexandra Birch. Neural machine translation of rare words\\n with
subword units. arXiv preprint arXiv:1508.07909, 2015.\\n\\n[32] Noam Shazeer,
Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton,\\n
\ and Jeff Dean. Outrageously large neural networks: The sparsely-gated
mixture-of-experts\\n layer. arXiv preprint arXiv:1701.06538, 2017.\\n\\n[33]
\ Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and
Ruslan Salakhutdi-\\n nov. Dropout: a simple way to prevent neural networks
from overfitting. Journal of Machine\\n Learning Research, 15(1):1929\u20131958,
2014.\\n\\n[34] Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob
Fergus. End-to-end memory\\n networks. In C. Cortes, N. D. Lawrence,
D. D. Lee, M. Sugiyama, and R. Garnett, editors,\\n Advances in Neural
Information Processing Systems 28, pages 2440\u20132448. Curran Associates,\\n
\ Inc., 2015.\\n\\n[35] Ilya Sutskever, Oriol Vinyals, and Quoc VV Le.
Sequence to sequence learning with neural\\n networks. In Advances in
Neural Information Processing Systems, pages 3104\u20133112, 2014.\\n\\n[36]
\ Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and
Zbigniew Wojna.\\n Rethinking the inception architecture for computer
vision. CoRR, abs/1512.00567, 2015.\\n\\n[37] Vinyals & Kaiser, Koo, Petrov,
Sutskever, and Hinton. Grammar as a foreign language. In\\n Advances
in Neural Information Processing Systems, 2015.\\n\\n[38] Yonghui Wu, Mike
Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang\\n Macherey,
Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google\u2019s neural
machine\\n translation system: Bridging the gap between human and machine
translation. arXiv preprint\\n arXiv:1609.08144, 2016.\\n\\n[39] Jie
Zhou, Ying Cao, Xuguang Wang, Peng Li, and Wei Xu. Deep recurrent models
with\\n fast-forward connections for neural machine translation. CoRR,
abs/1606.04199, 2016.\\n\\n[40] Muhua Zhu, Yue Zhang, Wenliang Chen, Min
Zhang, and Jingbo Zhu. Fast and accurate\\n shift-reduce constituent
parsing. In Proceedings of the 51st Annual Meeting of the ACL (Volume\\n 1:
Long Papers), pages 434\u2013443. ACL, August 2013.\\n\\n 12\\n\\n---\\n\\nInput-Input
Layer5\\nAttention Visualizations\\n\\ngovernments registration\\nAmerican
\ process\\nmajority passed making difficult<EOS> <pad>\\n
\ <pad> <pad>\\n <pad>
<pad>\\n since voting more <pad>\\nspirit have
\ laws 2009\\nIt is in this thata of new the or .\\n\\nIt is
\ this thata of the or . <EOS>\\n difficult\\n
\ in spirit more <pad>\\n
\ since process <pad>\\n laws
\ <pad> <pad> <pad>\\n have
new voting <pad>\\n American\\n
\ majority passed 2009\\n making\\n
\ governments registration\\n\\nFigure
3: An example of the attention mechanism following long-distance dependencies
in the\\nencoder self-attention in layer 5 of 6. Many of the attention heads
attend to a distant dependency of\\nthe verb \u2018making\u2019, completing
the phrase \u2018making...more difficult\u2019. Attentions here shown only
for\\nthe word \u2018making\u2019. Different colors represent different heads.
Best viewed in color.\\n\\n13\\n\\n---\\n\\nInput-Input Layer5\\n\\napplication\\n
\ missing <EOS>\\nLaw never perfect should what opinion
\ <pad>\\nThe will be , butits be just- thisis we are , in my
\ .\\n\\nInput-Input Layer5 . <EOS>\\nThe , its - this ,\\n
\ be perfectbut be just is what are in my\\nLaw never should
\ we missing <pad>\\n will application opinion\\n\\napplication\\n
\ missing <EOS>\\nLaw never perfect should what opinion
\ <pad>\\nThe will be , but its be just- thisis we are , in my
\ .\\n\\nThe , its - this , . <EOS>\\n be perfectbut
\ be just is what are in my\\nLaw never should we
\ missing <pad>\\n will application opinion\\n\\nFigure
4: Two attention heads, also in layer 5 of 6, apparently involved in anaphora
resolution. Top:\\nFull attentions for head 5. Bottom: Isolated attentions
from just the word \u2018its\u2019 for attention heads 5\\nand 6. Note that
the attentions are very sharp for this word.\\n\\n14\\n\\n---\\n\\nInput-Input
Layer5\\n\\napplication\\n missing <EOS>\\nLaw never perfect
\ should what opinion <pad>\\nThe will be , but its be just-
\ thisis we are , in my .\\n\\nThe , its - this , . <EOS>\\n
\ be perfectbut be just is what are in my\\nLaw never should
\ we missing <pad>\\n will application opinion\\nInput-Input
Layer5\\n\\napplication\\n missing <EOS>\\nLaw never perfect
\ should what opinion <pad>\\nThe will be , but its be just-
\ thisis we are , in my .\\n\\nThe , its - this , . <EOS>\\n
\ be perfectbut be just is what are in my\\nLaw never should
\ we missing <pad>\\n will application opinion\\n\\nFigure
5: Many of the attention heads exhibit behaviour that seems related to the
structure of the\\nsentence. We give two such examples above, from two different
heads from the encoder self-attention\\nat layer 5 of 6. The heads clearly
learned to perform different tasks.\\n\\n15\",\"job_metadata\":{\"credits_used\":0,\"job_credits_usage\":0,\"job_pages\":0,\"job_auto_mode_triggered_pages\":0,\"job_is_cache_hit\":true}}"
headers:
Connection:
- keep-alive
Content-Length:
- '47254'
Content-Type:
- application/json
Date:
- Sat, 17 Jan 2026 11:05:27 GMT
Strict-Transport-Security:
- max-age=31536000; includeSubDomains
x-correlation-id:
- 0697e9d8-7d1f-4f67-96a7-01a19b6f5d46
x-session-id:
- bfb1df66-56af-47c2-be0f-ce3e346f3211
status:
code: 200
message: OK

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the test file that uses this cassette and VCR configuration
echo "=== Looking for test file using this cassette ==="
find . -name "*test_llamaparse*.py" -type f

echo -e "\n=== Searching for VCR matcher configuration ==="
rg -A 5 -B 5 "test_llamaparse_load_data_instrumentation" --type=py

echo -e "\n=== Checking for VCR/pytest-recording config ==="
rg "record_mode|match_on|vcr|cassette" --type=py packages/opentelemetry-instrumentation-llamaindex/tests | head -30

Repository: traceloop/openllmetry

Length of output: 2875


🏁 Script executed:

#!/bin/bash
# Check the test file structure
echo "=== Checking for test file ==="
test_file=$(find . -name "*test_llamaparse*.py" -type f)
if [ -n "$test_file" ]; then
  echo "Found: $test_file"
  echo "=== First 100 lines of test file ==="
  head -100 "$test_file"
fi

Repository: traceloop/openllmetry

Length of output: 3087


🏁 Script executed:

#!/bin/bash
# Search for API key/secret patterns in cassette file
cassette_file="packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml"

echo "=== Checking cassette for potential secrets ==="
# Look for common secret patterns
grep -i "api.key\|api_key\|authorization\|bearer\|token\|password\|secret" "$cassette_file" || echo "No obvious secrets found in headers"

echo -e "\n=== Checking response body for sensitive data ==="
# Look at the response body more carefully
grep -o "Authorization\|X-API-Key\|x-api-key" "$cassette_file" || echo "No auth headers detected"

echo -e "\n=== Checking what UUIDs/IDs are present ==="
grep -o "id\": \"[^\"]*\"" "$cassette_file" | head -10

Repository: traceloop/openllmetry

Length of output: 1632


🏁 Script executed:

#!/bin/bash
# Check the vcr_config in conftest.py for matcher settings
echo "=== VCR Configuration in conftest.py ==="
grep -A 20 "def vcr_config" packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 745


🏁 Script executed:

#!/bin/bash
# Verify if session IDs and correlation IDs should be scrubbed
echo "=== Checking if session/correlation IDs are consistent or should be scrubbed ==="
grep -o "x-session-id:[^}]*\|x-correlation-id:[^}]*" packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml | sort | uniq -c

Repository: traceloop/openllmetry

Length of output: 193


🏁 Script executed:

#!/bin/bash
# Check if the default VCR matcher settings and what it matches on
echo "=== Verifying pytest-vcr default matching behavior ==="
rg "match_on|record_mode" packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py

echo -e "\n=== Check if pytest-vcr is being used and its defaults ==="
grep -r "pytest.mark.vcr\|pytest_vcr\|vcr_config" packages/opentelemetry-instrumentation-llamaindex/tests/*.py | head -5

Repository: traceloop/openllmetry

Length of output: 617


🏁 Script executed:

#!/bin/bash
# Verify the actual request body to see if parsing_instruction presence matters
echo "=== Checking if body variations could cause matcher issues ==="
python3 << 'EOF'
# pytest-vcr with default config matches on:
# - method, scheme, host, port, path, query
# - NOT on request body or response headers
# - This means body changes (removing parsing_instruction) should NOT cause mismatches

print("Default VCR matchers (per pytest-vcr):")
print("  - method")
print("  - scheme")
print("  - host")
print("  - port")
print("  - path")
print("  - query")
print("\nDoes NOT match on: request body, response headers")
print("\nConclusion: The cassette body/header changes should NOT break VCR matching")
EOF

Repository: traceloop/openllmetry

Length of output: 352


Scrub session and correlation ID headers from the cassette using VCR filters.

The cassette contains x-session-id and x-correlation-id response headers that should be filtered to prevent potentially sensitive session data from being stored. Update vcr_config() in conftest.py to add these headers to the filter list.

Recommended change:
def vcr_config():
    return {
        "filter_headers": ["authorization", "api-key", "x-session-id", "x-correlation-id"],
        "ignore_hosts": ["raw.githubusercontent.com"],
    }
🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_llamaparse/test_llamaparse_load_data_instrumentation.yaml`
around lines 797 - 1545, The VCR cassette contains sensitive x-session-id and
x-correlation-id response headers that must be scrubbed; update the vcr_config()
function (in conftest.py) so its returned "filter_headers" list includes
"x-session-id" and "x-correlation-id" in addition to existing entries like
"authorization" and "api-key", preserving the other keys (e.g., "ignore_hosts")
in the returned config.
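One caveat to the recommended change: in vcrpy, `filter_headers` scrubs request headers, whereas `x-session-id` and `x-correlation-id` appear here as response headers. To scrub those as well, a `before_record_response` hook can be added to the VCR config. A minimal sketch (the function name is ours, not part of the repo):

```python
def scrub_response_headers(response):
    """Remove session/correlation ID headers from recorded responses."""
    headers = response.get("headers", {})
    for header in ("x-session-id", "x-correlation-id"):
        headers.pop(header, None)
    return response
```

This would then be wired in via `"before_record_response": scrub_response_headers` in the dict returned by `vcr_config()`, alongside the existing `filter_headers` and `ignore_hosts` entries.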

@ellipsis-dev
Copy link
Contributor

ellipsis-dev bot commented Jan 17, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/dispatcher_wrapper.py`:
- Around line 232-244: The code unconditionally logs tool arguments for
call_tool spans (in the call_tool branch using ev, WorkflowToolCall,
span.set_attribute("tool.arguments", json.dumps(...), JSONEncoder)), which can
expose PII; update this block to consult the package's masking/suppression
config or utility (e.g., the existing masking flag/function used elsewhere)
before serializing ev.tool_kwargs—only call json.dumps and set the
"tool.arguments" attribute when the config permits, otherwise either omit the
attribute or set a redacted placeholder; ensure you reference the same ev,
WorkflowToolCall and span.set_attribute symbols so the gating logic is applied
exactly in this call_tool handling path.

In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py`:
- Around line 82-147: The token coercion and choices handling can raise
TypeError or misreport counts: when converting billed.input_tokens /
billed.output_tokens (variable billed) to int, guard against None (only int(...)
if value is not None) and for the dict-path use
.get("input_tokens")/.get("output_tokens") without defaulting to 0 so missing
keys remain None; only set span attributes after confirming non-None values.
Also extend the finish-reason handling around the choices variable to accept
dict-style choices (e.g., handle choices[0].get("finish_reason") when choices is
a list of dicts) before setting the span attribute via span.set_attribute with
GenAIAttributes or SpanAttributes.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (1)

65-118: Reduce brittleness of prompt-index assertions.
The test currently assumes prompt indices .0 and .1; llama-index prompt ordering can shift. Consider asserting that any prompt content matches the user question instead of fixed indices.

♻️ Suggested adjustment
-    assert f"{GenAIAttributes.GEN_AI_PROMPT}.0.content" in llm_span_1.attributes
-    assert f"{GenAIAttributes.GEN_AI_PROMPT}.1.content" in llm_span_1.attributes
-    assert llm_span_1.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.1.content"] == (
-        "What is 2 times 3?"
-    )
+    prompt_contents = [
+        v
+        for k, v in llm_span_1.attributes.items()
+        if k.startswith(f"{GenAIAttributes.GEN_AI_PROMPT}.") and k.endswith(".content")
+    ]
+    assert "What is 2 times 3?" in prompt_contents

Comment on lines +232 to +244
# Extract tool information for call_tool spans (workflow-based agents)
if method_name == "call_tool":
try:
# The 'ev' argument is a WorkflowToolCall event
ev = bound_args.arguments.get("ev")
if ev and isinstance(ev, WorkflowToolCall):
span.set_attribute("tool.name", ev.tool_name)
span.set_attribute(
"tool.arguments",
json.dumps(ev.tool_kwargs, cls=JSONEncoder)
)
except Exception:
pass

⚠️ Potential issue | 🟠 Major

Gate tool arguments behind a configuration mechanism to protect PII.

ev.tool_kwargs contains tool function arguments, which can include user input and sensitive data. Currently, tool.arguments is always recorded as JSON, regardless of any prompt suppression settings. This poses a privacy risk when the instrumentation is configured to limit sensitive data recording.

Implement conditional gating (similar to the existing masking configuration in the package) to respect suppression semantics when recording tool arguments. This could either reference a configuration flag or a utility function that determines whether to record such sensitive attributes.

Example approach
         if method_name == "call_tool":
             try:
                 # The 'ev' argument is a WorkflowToolCall event
                 ev = bound_args.arguments.get("ev")
                 if ev and isinstance(ev, WorkflowToolCall):
                     span.set_attribute("tool.name", ev.tool_name)
-                    span.set_attribute(
-                        "tool.arguments",
-                        json.dumps(ev.tool_kwargs, cls=JSONEncoder)
-                    )
+                    if should_record_sensitive_data():
+                        span.set_attribute(
+                            "tool.arguments",
+                            json.dumps(ev.tool_kwargs, cls=JSONEncoder),
+                        )
             except Exception:
                 pass
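The `should_record_sensitive_data()` helper in the example above is hypothetical. One minimal way to implement it, assuming the `TRACELOOP_TRACE_CONTENT` environment-variable convention used elsewhere in openllmetry (content recorded by default, suppressed when explicitly disabled):

```python
import os


def should_record_sensitive_data() -> bool:
    """Return True unless content tracing is explicitly disabled.

    Assumes the TRACELOOP_TRACE_CONTENT convention: any value other
    than "true" (case-insensitive) suppresses sensitive attributes.
    """
    return os.getenv("TRACELOOP_TRACE_CONTENT", "true").lower() == "true"
```

If the package already exposes an equivalent utility (e.g. a `should_send_prompts`-style check), reusing that is preferable to introducing a new flag.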
🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/dispatcher_wrapper.py`
around lines 232 - 244, The code unconditionally logs tool arguments for
call_tool spans (in the call_tool branch using ev, WorkflowToolCall,
span.set_attribute("tool.arguments", json.dumps(...), JSONEncoder)), which can
expose PII; update this block to consult the package's masking/suppression
config or utility (e.g., the existing masking flag/function used elsewhere)
before serializing ev.tool_kwargs—only call json.dumps and set the
"tool.arguments" attribute when the config permits, otherwise either omit the
attribute or set a redacted placeholder; ensure you reference the same ev,
WorkflowToolCall and span.set_attribute symbols so the gating logic is applied
exactly in this call_tool handling path.

Comment on lines +82 to +147
# Get model name - handle both dict and object formats
model = None
if hasattr(raw, "model"):
model = raw.model
elif isinstance(raw, dict) and "model" in raw:
model = raw.get("model")
if model:
span.set_attribute(GenAIAttributes.GEN_AI_RESPONSE_MODEL, model)

# Handle token usage - support multiple formats
input_tokens = None
output_tokens = None
total_tokens = None

# Try OpenAI format first: raw.usage with completion_tokens, prompt_tokens
usage = getattr(raw, "usage", None) or (raw.get("usage") if isinstance(raw, dict) else None)
if usage:
if hasattr(usage, "completion_tokens"):
output_tokens = usage.completion_tokens
input_tokens = usage.prompt_tokens
total_tokens = usage.total_tokens
elif isinstance(usage, dict):
output_tokens = usage.get("completion_tokens")
input_tokens = usage.get("prompt_tokens")
total_tokens = usage.get("total_tokens")

# Try Cohere format: raw.meta.tokens or raw.meta.billed_units
if input_tokens is None or output_tokens is None:
meta = getattr(raw, "meta", None) or (raw.get("meta") if isinstance(raw, dict) else None)
if meta:
# Try meta.tokens first (actual token counts)
tokens = getattr(meta, "tokens", None) or (meta.get("tokens") if isinstance(meta, dict) else None)
if tokens:
if hasattr(tokens, "input_tokens"):
input_tokens = tokens.input_tokens
output_tokens = tokens.output_tokens
elif isinstance(tokens, dict):
input_tokens = tokens.get("input_tokens")
output_tokens = tokens.get("output_tokens")

# Fallback to meta.billed_units if tokens not found
if input_tokens is None or output_tokens is None:
billed = getattr(meta, "billed_units", None) or (
meta.get("billed_units") if isinstance(meta, dict) else None
)
if billed:
if hasattr(billed, "input_tokens"):
input_tokens = int(billed.input_tokens)
output_tokens = int(billed.output_tokens)
elif isinstance(billed, dict):
input_tokens = int(billed.get("input_tokens", 0))
output_tokens = int(billed.get("output_tokens", 0))

# Set token attributes if found
if output_tokens is not None:
span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, int(output_tokens))
if input_tokens is not None:
span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, int(input_tokens))
if total_tokens is not None:
span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, int(total_tokens))
elif input_tokens is not None and output_tokens is not None:
# Calculate total if not provided (e.g., for Cohere)
span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, int(input_tokens) + int(output_tokens))

# Handle finish reason for OpenAI-style responses
choices = getattr(raw, "choices", None)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py

Repository: traceloop/openllmetry

Length of output: 173


🏁 Script executed:

sed -n '70,160p' packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py | cat -n

Repository: traceloop/openllmetry

Length of output: 4487


🏁 Script executed:

rg -n "def dont_throw" packages/opentelemetry-instrumentation-llamaindex/

Repository: traceloop/openllmetry

Length of output: 189


🏁 Script executed:

rg -n "@dont_throw" packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py | head -20

Repository: traceloop/openllmetry

Length of output: 204


🏁 Script executed:

sed -n '38,50p' packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py | cat -n

Repository: traceloop/openllmetry

Length of output: 555


🏁 Script executed:

sed -n '38,55p' packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py | cat -n

Repository: traceloop/openllmetry

Length of output: 782


Add None guards for token coercion and support dict-based choices for consistency.

Lines 60–61: int(billed.input_tokens) and int(billed.output_tokens) can raise TypeError if values are None, which the @dont_throw decorator suppresses. Additionally, the dict fallback (lines 63–64) defaults missing keys to 0 instead of treating them as absent, creating asymmetric behavior that can misreport token counts. Lines 78–82 only handle object-style choices[0].finish_reason without supporting dict format, inconsistent with the dict handling established elsewhere in this function.

Suggested fixes
             if input_tokens is None or output_tokens is None:
                 billed = getattr(meta, "billed_units", None) or (
                     meta.get("billed_units") if isinstance(meta, dict) else None
                 )
                 if billed:
                     if hasattr(billed, "input_tokens"):
-                        input_tokens = int(billed.input_tokens)
-                        output_tokens = int(billed.output_tokens)
+                        if billed.input_tokens is not None:
+                            input_tokens = int(billed.input_tokens)
+                        if billed.output_tokens is not None:
+                            output_tokens = int(billed.output_tokens)
                     elif isinstance(billed, dict):
-                        input_tokens = int(billed.get("input_tokens", 0))
-                        output_tokens = int(billed.get("output_tokens", 0))
+                        if "input_tokens" in billed and billed["input_tokens"] is not None:
+                            input_tokens = int(billed["input_tokens"])
+                        if "output_tokens" in billed and billed["output_tokens"] is not None:
+                            output_tokens = int(billed["output_tokens"])

-    choices = getattr(raw, "choices", None)
+    choices = getattr(raw, "choices", None) or (
+        raw.get("choices") if isinstance(raw, dict) else None
+    )
     if choices:
-        span.set_attribute(
-            SpanAttributes.LLM_RESPONSE_FINISH_REASON, choices[0].finish_reason
-        )
+        first = choices[0]
+        finish_reason = (
+            first.get("finish_reason") if isinstance(first, dict) else first.finish_reason
+        )
+        if finish_reason is not None:
+            span.set_attribute(
+                SpanAttributes.LLM_RESPONSE_FINISH_REASON, finish_reason
+            )
🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py`
around lines 82 - 147, The token coercion and choices handling can raise
TypeError or misreport counts: when converting billed.input_tokens /
billed.output_tokens (variable billed) to int, guard against None (only int(...)
if value is not None) and for the dict-path use
.get("input_tokens")/.get("output_tokens") without defaulting to 0 so missing
keys remain None; only set span attributes after confirming non-None values.
Also extend the finish-reason handling around the choices variable to accept
dict-style choices (e.g., handle choices[0].get("finish_reason") when choices is
a list of dicts) before setting the span attribute via span.set_attribute with
GenAIAttributes or SpanAttributes.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-agno/pyproject.toml`:
- Around line 75-76: The [tool.uv] constraint-dependencies entry with
"urllib3>=2.6.3" is placed in a package pyproject.toml but UV only reads
constraints from the workspace-root; move the constraint-dependencies
declaration (the [tool.uv] table and its constraint-dependencies array
containing "urllib3>=2.6.3") into the repository root pyproject.toml and remove
it from the package-level pyproject.toml so the urllib3 constraint is applied
during dependency resolution.
- Line 44: The project bumped vcrpy to "vcrpy>=8.0.0,<9" but the VCR cassettes
were not regenerated; re-record or verify all cassettes under
packages/opentelemetry-instrumentation-agno/tests/cassettes/ using vcrpy 8.x
(run the test suite or cassette regeneration command with Python ≥3.10 and
urllib3 ≥2.x), update any cassette format or host/httpx-related differences as
needed to ensure tests pass, and commit the regenerated cassette files alongside
the pyproject.toml change.

In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml`:
- Line 12: Revert the Python requirement back to supports-3.9 by changing
requires-python from ">=3.10,<4" to ">=3.9,<4" and add an explicit constraint
for the vulnerable dependency instead of dropping 3.9: add a
constraint-dependencies entry under the [tool.uv] (or the project's constraints
section) that includes "filelock>=3.20.3" (you can also include the suggested
"urllib3>=2.6.3"), so the fixes are applied via dependency pinning rather than
removing support for Python 3.9; update the pyproject.toml accordingly, touching
the requires-python field and adding the constraint-dependencies list.

In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml`:
- Line 12: The pyproject.toml change sets requires-python = ">=3.10,<4" but CI
only tests 3.11; update the CI matrix in the build-packages job of
.github/workflows/ci.yml to include Python 3.10 (add "3.10" to the
python-version/matrix entry used by the build-packages job) so the minimum
supported interpreter is validated, or alternatively run local validation and
note that in CI — reference the requires-python entry in pyproject.toml and the
build-packages job matrix when making the change.

In `@packages/opentelemetry-instrumentation-mistralai/.python-version`:
- Line 1: The PR incorrectly implies CVE-2025-68146 is fixed by moving to Python
3.10; either add an explicit dependency constraint filelock>=3.20.1 to
pyproject.toml (ensuring the lock/update and tests run) to actually remediate
the CVE, or update the commit/PR text to remove the CVE claim and state the real
reason for setting .python-version to "3.10" (clarify compatibility/maintenance
intent); reference the .python-version file, pyproject.toml dependencies, and
the filelock package/CVE-2025-68146 when making the change.

In `@packages/opentelemetry-instrumentation-ollama/pyproject.toml`:
- Around line 74-75: The per-package pyproject.toml contains a [tool.uv]
constraint-dependencies entry which is ignored by uv; remove the [tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"] block from the package
pyproject.toml and add the same setting under [tool.uv] in the repository
workspace root pyproject.toml (or other workspace-level configuration that uv
reads) so the urllib3 constraint is applied; if there is currently no root
pyproject.toml create one and place the [tool.uv] constraint-dependencies entry
there.

In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml`:
- Around line 75-76: The [tool.uv] constraint-dependencies entry currently
defined as constraint-dependencies = ["urllib3>=2.6.3"] in package
pyproject.toml must be moved to the workspace root pyproject.toml (or removed
from per-package files) because uv only reads this field at the root; to fix,
add a [tool.uv] section in the root pyproject.toml containing
constraint-dependencies = ["urllib3>=2.6.3"] (or delete the
constraint-dependencies line from each package pyproject.toml) so the urllib3
constraint is actually enforced.

In `@packages/opentelemetry-instrumentation-openai/pyproject.toml`:
- Line 42: Update test infra and VCR cassettes to be compatible with vcrpy 8.x
by running the full test suite and re-recording any failing cassettes;
specifically, after bumping the dependency string "vcrpy>=8.0.0,<9" in
pyproject.toml, run tests locally/CI, recreate VCR cassettes using the project's
test helper (the cassette recording utility used in your repo), and commit the
new cassette files, while also ensuring CI/Python matrix uses Python 3.10+ and
that any httpx/urllib3-related fixtures or adapters in your test helpers are
updated to the vcrpy 8.x API.
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-llamaindex/pyproject.toml (1)

38-44: Significant llama-index ecosystem version bumps.

Multiple test dependencies jumped major versions. The past review comment flagged breaking changes in llama-index 0.14+ (removed checkpointer, sub-workflows, deprecated Workflow methods) and llama-parse 0.6+ (JSON config payload changes), which was marked as addressed in commit 0d1309f.

Ensure the test suite passes with these new versions via uv run pytest packages/opentelemetry-instrumentation-llamaindex/tests/.

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-lancedb/pyproject.toml (1)

77-78: Confirm that urllib3>=2.6.3 is validated on PyPI; consider adding an upper bound for defensive versioning.

urllib3>=2.6.3 is available on PyPI (released Jan 7, 2026) and is compatible with transitive dependencies (e.g., vcrpy 8.0.0-8.x). Adding an upper bound <3 is recommended to prevent unexpected major version upgrades when urllib3 3.0 is released.

♻️ Suggested constraint tightening
[tool.uv]
-constraint-dependencies = ["urllib3>=2.6.3"]
+constraint-dependencies = ["urllib3>=2.6.3,<3"]

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

python3 << 'EOF'
import urllib.request
import json

# Check vcrpy versions on PyPI
url = "https://pypi.org/pypi/vcrpy/json"
try:
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode())
        
    # Get available versions
    releases = data['releases']
    versions_6x = sorted([v for v in releases.keys() if v.startswith('6.')])
    versions_8x = sorted([v for v in releases.keys() if v.startswith('8.')])
    current_version = data['info']['version']
    
    print(f"Current PyPI version: {current_version}")
    print(f"\nAvailable 6.x versions: {versions_6x[-3:] if versions_6x else 'None'}")
    print(f"Available 8.x versions: {versions_8x[-3:] if versions_8x else 'None'}")
    print(f"All 8.x versions exist: {len(versions_8x) > 0}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Find all .yaml or .json cassette files in test directories
find . -path "**/tests/**" -type f \( -name "*.yaml" -o -name "*.json" \) | head -20

Repository: traceloop/openllmetry

Length of output: 2468


🌐 Web query:

vcrpy 8.0 breaking changes migration from 6.x

💡 Result:

Summary checklist to migrate from vcrpy 6.x → 8.x

  1. Upgrade Python to a supported version (v8 drops Python 3.9). Install/use Python >=3.10. [1]
  2. Ensure urllib3 >= 2 (v8 dropped support for urllib3 < 2). Upgrade dependency before installing vcrpy. [1]
  3. Install vcrpy 8.x: pip install --upgrade "vcrpy>=8.0.0". Test your test-suite. [1]
  4. Test httpx/aiohttp behaviour: v8 rewrote httpx support (now patches httpcore) and fixed several httpx/http proxy issues — run tests that use httpx and any custom httpx transports. [1]
  5. Re-record cassettes if necessary: vcrpy 6.0 warned that some cassettes produced by earlier releases needed re-creating due to binary-format issues. If you see cassette/serialization errors or mismatched responses, delete or re-record affected cassettes. [2]
  6. Optional: take advantage of new v8 features (e.g., drop_unused_requests cassette option) and verify logging/URI/proxy changes do not affect assertions. [1]

References

  • vcrpy v8.0.0 release notes (breaking changes, httpx, urllib3, new options). [1]
  • vcrpy 6.0.0 changelog (note about possibly needing to recreate some cassettes). [2]
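The first two checklist items can be sanity-checked programmatically before bumping the dependency. This stdlib-only sketch is our own illustration (the function name is not part of vcrpy); it reports any environment blockers for the 6.x → 8.x upgrade:

```python
# Illustrative pre-flight check for the vcrpy 6.x -> 8.x upgrade.
# Standard library only; vcrpy itself is not imported here.
import sys
from importlib import metadata


def vcrpy8_blockers() -> list[str]:
    """Collect reasons this environment cannot run vcrpy 8.x yet."""
    blockers = []
    if sys.version_info < (3, 10):
        blockers.append("vcrpy 8 dropped Python 3.9; upgrade the interpreter")
    try:
        major = int(metadata.version("urllib3").split(".")[0])
        if major < 2:
            blockers.append("vcrpy 8 requires urllib3 >= 2; upgrade urllib3")
    except metadata.PackageNotFoundError:
        # urllib3 not installed yet; the installer will resolve a 2.x release
        pass
    return blockers


print(vcrpy8_blockers())
```

An empty list means the interpreter and urllib3 constraints are satisfied; cassette and httpx behavior (items 4-5) still need the actual test suite.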

🏁 Script executed:

# Check the full pyproject.toml for Python and urllib3 constraints
cat -n ./packages/opentelemetry-instrumentation-agno/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 2331


🏁 Script executed:

# Check git history/diff for cassette changes (last 5 commits)
git log --oneline -10 -- ./packages/opentelemetry-instrumentation-agno/tests/cassettes/

Repository: traceloop/openllmetry

Length of output: 107


🏁 Script executed:

# Inspect a sample cassette file to check for potential secrets/PII
head -50 ./packages/opentelemetry-instrumentation-agno/tests/cassettes/test_agent/test_agent_run_basic.yaml

Repository: traceloop/openllmetry

Length of output: 1677


🏁 Script executed:

# Check if cassettes were modified in this PR by checking git diff stat
git diff --stat -- "./packages/opentelemetry-instrumentation-agno/tests/cassettes/"

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check recent changes to cassettes more broadly
git log --oneline -20 -- "**/tests/cassettes/" | head -10

Repository: traceloop/openllmetry

Length of output: 47


Regenerate all VCR cassettes with vcrpy 8.x before merging.

The major version bump from vcrpy 6.x to 8.x includes breaking changes (httpx rewrite, urllib3 >=2 requirement, potential cassette format changes). While the Python (≥3.10) and urllib3 (≥2.6.3) constraints are properly configured, the cassettes have not been regenerated in this PR. Per best practices, all cassettes in packages/opentelemetry-instrumentation-agno/tests/cassettes/ must be re-recorded or verified to work with vcrpy 8.x.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-agno/pyproject.toml` at line 44, The
project bumped vcrpy to "vcrpy>=8.0.0,<9" but the VCR cassettes were not
regenerated; re-record or verify all cassettes under
packages/opentelemetry-instrumentation-agno/tests/cassettes/ using vcrpy 8.x
(run the test suite or cassette regeneration command with Python ≥3.10 and
urllib3 ≥2.x), update any cassette format or host/httpx-related differences as
needed to ensure tests pass, and commit the regenerated cassette files alongside
the pyproject.toml change.

Comment on lines 75 to 76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

urllib3 2.6.3 CVE security advisory

💡 Result:

Summary: urllib3 2.6.3 (released Jan 7, 2026) fixes a high‑severity decompression-bomb/DoS vulnerability tracked as CVE‑2026‑21441. Versions >=1.22 and <2.6.3 are affected; upgrade to 2.6.3 (or later) to mitigate. [1][2][3]

Sources:

  • urllib3 2.6.3 release on PyPI (upload/release metadata). [1]
  • urllib3 GitHub release notes (2.6.3) mentioning the security fix (GHSA-38jv-5279-wg99). [2]
  • CVE summary for CVE-2026-21441 describing affected versions and impact. [3]

🏁 Script executed:

# Check the pyproject.toml file to see full context and dependencies
cat packages/opentelemetry-instrumentation-agno/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 1799


🌐 Web query:

uv package manager constraint-dependencies pyproject.toml documentation

💡 Result:

Summary — what tool.uv.constraint-dependencies does

  • Purpose: global version constraints that restrict what versions uv may choose when resolving your project’s dependencies; adding a package here does NOT cause it to be installed by itself. [1][2]
  • Location: put it under [tool.uv] in the workspace-root pyproject.toml; uv lock / uv sync / uv run only read this field from the workspace root (they ignore member pyproject.toml or uv.toml declarations). [1][2]
  • Syntax: list of requirement strings (e.g. "grpcio<1.65" or "setuptools==60.0.0" for build constraints). [1][2]
  • Interaction with pip/compile: uv will append these pyproject constraints to any --constraint files when compiling/locking. [1]

Minimal example (pyproject.toml)
[tool.uv]
constraint-dependencies = ["grpcio<1.65"]

References

  1. uv reference — Settings (constraint-dependencies / build-constraint-dependencies). [docs.astral.sh]. [1]
  2. uv Workspace/pyproject (ToolUv struct) documentation (fields: constraint_dependencies, build_constraint_dependencies; root-only note). [docs.rs]. [2]

Move [tool.uv] constraint-dependencies to the workspace-root pyproject.toml.

The constraint-dependencies field for urllib3>=2.6.3 is placed in a package-specific pyproject.toml, but per uv documentation, uv only reads constraint-dependencies from the workspace-root pyproject.toml and explicitly ignores declarations in member packages. This means the constraint won't be applied during dependency resolution.

While the CVE remediation goal is valid (urllib3 2.6.3, released Jan 7, 2026, fixes CVE-2026-21441), move the [tool.uv] section to the repository's root pyproject.toml for the constraint to take effect.
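For reference, a minimal repository-root pyproject.toml that uv would actually read might look like the sketch below. The project name and workspace glob are assumptions based on the repo layout, not existing configuration:

```toml
# Hypothetical repository-root pyproject.toml; uv reads [tool.uv] only here.
[project]
name = "openllmetry-root"
version = "0.0.0"
requires-python = ">=3.10"

[tool.uv.workspace]
members = ["packages/*"]

[tool.uv]
# Applied as a resolution constraint to every workspace member.
constraint-dependencies = ["urllib3>=2.6.3"]
```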

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-agno/pyproject.toml` around lines 75 -
76, The [tool.uv] constraint-dependencies entry with "urllib3>=2.6.3" is placed
in a package pyproject.toml but UV only reads constraints from the
workspace-root; move the constraint-dependencies declaration (the [tool.uv]
table and its constraint-dependencies array containing "urllib3>=2.6.3") into
the repository root pyproject.toml and remove it from the package-level
pyproject.toml so the urllib3 constraint is applied during dependency
resolution.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

filelock CVE Python 3.9 2024 2025

💡 Result:

Short answer: Yes — the filelock Python package had TOCTOU/symlink CVEs in 2025 (CVE-2025-68146) and 2026 (CVE-2026-22701). Both are issues in filelock itself (not in CPython) and can affect projects running on Python 3.9 if they use vulnerable filelock versions. Fixes: upgrade to filelock >= 3.20.1 (CVE-2025-68146) and >= 3.20.3 (CVE-2026-22701). Sources: NVD and advisory pages. [1][2][3]

References


🏁 Script executed:

cat -n packages/opentelemetry-instrumentation-anthropic/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 2384


🏁 Script executed:

rg -i "filelock" packages/opentelemetry-instrumentation-anthropic/

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Search for filelock as a transitive dependency by checking opentelemetry-instrumentation
git ls-files | grep -E "(pyproject\.toml|requirements)" | head -20

Repository: traceloop/openllmetry

Length of output: 1319


🏁 Script executed:

# Check if there are other changes in the same PR that might explain Python 3.10
git log --oneline --all | head -20

Repository: traceloop/openllmetry

Length of output: 1169


🏁 Script executed:

# Check the parent package to see if it requires Python 3.10
rg "requires-python" packages/opentelemetry-instrumentation/pyproject.toml 2>/dev/null || echo "Not found"

Repository: traceloop/openllmetry

Length of output: 74


🌐 Web query:

CVE filelock Python 3.9 specific vulnerability

💡 Result:

Short answer: There is no CVE that targets Python 3.9 itself — the known filelock CVEs are vulnerabilities in the filelock library (not the Python interpreter) and affect certain filelock versions regardless of which Python interpreter version you run. See below.

Details:

  • CVE-2025-68146 — TOCTOU race in Unix/Windows lock-file creation; fixed in filelock 3.20.1. Affects filelock users on all platforms; not specific to Python 3.9. [1][2]
  • CVE-2026-22701 — TOCTOU race in SoftFileLock; fixed in filelock 3.20.3. Also a filelock-version issue, not a Python-3.9-specific interpreter bug. [2][3]

Note: some Linux distro package names/versions (e.g., “python-filelock 3.9.0-1”) refer to the filelock package version, not the Python interpreter version; those packaged versions may be listed as “vulnerable” if they contain an affected filelock release. [3]

Recommended action: upgrade filelock to >=3.20.3 (or >=3.20.1 to address CVE-2025-68146) or apply the vendor mitigations in the advisories. [1][2]

Sources:
[1] GitLab Advisory / filelock CVE-2025-68146.
[2] NVD / CVE-2026-22701 and AWS/Explore pages (fixes in 3.20.3 / 3.20.1).
[3] Debian security tracker showing packaged filelock versions and vulnerability status.


Dropping Python 3.9 is an unnecessary breaking change.

The filelock CVEs (CVE-2025-68146, CVE-2026-22701) are vulnerabilities in the filelock package itself, not Python-version specific. They affect all Python versions equally and can be fixed by pinning filelock >= 3.20.3. Dropping Python 3.9 support is not required to remediate these CVEs.

Instead, add an explicit filelock constraint to the project dependencies or [tool.uv] constraint-dependencies section:

constraint-dependencies = ["urllib3>=2.6.3", "filelock>=3.20.3"]

Then revert requires-python to ">=3.9,<4" to preserve backward compatibility.
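Put together, the two suggested edits would look roughly like this (keys abridged; this leaves aside the separate question of whether uv honors [tool.uv] outside the workspace root):

```toml
# Sketch of the suggested pyproject.toml changes.
[project]
requires-python = ">=3.9,<4"  # restored: the CVEs do not require dropping 3.9

[tool.uv]
# Pin the actually-vulnerable packages instead.
constraint-dependencies = ["urllib3>=2.6.3", "filelock>=3.20.3"]
```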

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml` at line 12,
Revert the Python requirement back to supports-3.9 by changing requires-python
from ">=3.10,<4" to ">=3.9,<4" and add an explicit constraint for the vulnerable
dependency instead of dropping 3.9: add a constraint-dependencies entry under
the [tool.uv] (or the project's constraints section) that includes
"filelock>=3.20.3" (you can also include the suggested "urllib3>=2.6.3"), so the
fixes are applied via dependency pinning rather than removing support for Python
3.9; update the pyproject.toml accordingly, touching the requires-python field
and adding the constraint-dependencies list.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1. Find and display the pyproject.toml file content
echo "=== Cohere instrumentation pyproject.toml ==="
cat packages/opentelemetry-instrumentation-cohere/pyproject.toml

echo -e "\n=== Searching for Python version pins across repo ==="
rg -n -C2 '(python-version|PYTHON_VERSION|3\.9|3\.10|requires-python)' --type-list | head -50

echo -e "\n=== CI workflow files ==="
fd -e yml -e yaml . .github 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 2507


🏁 Script executed:

#!/bin/bash
# Check root pyproject.toml and parent package dependencies
echo "=== Root pyproject.toml (if exists) ==="
if [ -f "pyproject.toml" ]; then
  cat pyproject.toml | grep -A5 -B5 "requires-python\|python"
fi

echo -e "\n=== Checking sibling packages Python requirements ==="
fd -e pyproject.toml "packages/.*pyproject.toml" | while read f; do
  echo "--- $f ---"
  grep "requires-python" "$f"
done

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

#!/bin/bash
# Check CI workflow for Python version matrix
echo "=== GitHub Actions CI workflow ==="
cat .github/workflows/ci.yml | head -100

echo -e "\n=== Search for 3.9 references ==="
rg "3\.9" --no-heading -n 2>/dev/null || echo "No 3.9 references found"

echo -e "\n=== Search for python-version references ==="
rg "python-version|python.version" --no-heading -n 2>/dev/null || echo "No version pins found"

Repository: traceloop/openllmetry

Length of output: 50379


🏁 Script executed:

#!/bin/bash
# Check Python requirements in other packages
echo "=== All package requires-python statements ==="
fd -e pyproject.toml "packages/" | xargs grep "requires-python" 2>/dev/null

echo -e "\n=== Check for any tooling that pins older Python ==="
rg "3\.9" --type py --type toml --max-count=10 2>/dev/null || echo "No 3.9 found in code"

Repository: traceloop/openllmetry

Length of output: 1299


Add Python 3.10 to CI test matrix to validate the minimum version floor.

The CI workflow currently only tests with Python 3.11 (see build-packages job matrix in .github/workflows/ci.yml). Setting requires-python = ">=3.10,<4" without testing against 3.10 creates a gap where the minimum version constraint is untested. Either add 3.10 to the matrix or verify that tooling and dependencies have been validated against 3.10 locally before merging.
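In the workflow file, validating the floor could look like the fragment below. Job and step names are assumed from a typical GitHub Actions setup; check .github/workflows/ci.yml for the real keys:

```yaml
# Hypothetical build-packages matrix entry with the 3.10 floor included.
jobs:
  build-packages:
    strategy:
      matrix:
        python-version: ["3.10", "3.11"]
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
```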

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml` at line 12, The
pyproject.toml change sets requires-python = ">=3.10,<4" but CI only tests 3.11;
update the CI matrix in the build-packages job of .github/workflows/ci.yml to
include Python 3.10 (add "3.10" to the python-version/matrix entry used by the
build-packages job) so the minimum supported interpreter is validated, or
alternatively run local validation and note that in CI — reference the
requires-python entry in pyproject.toml and the build-packages job matrix when
making the change.

@@ -1 +1 @@
-3.9.5
+3.10

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check pyproject.toml in mistralai instrumentation package
echo "=== pyproject.toml in mistralai instrumentation ==="
find packages/opentelemetry-instrumentation-mistralai -name "pyproject.toml" -type f | head -1 | xargs cat 2>/dev/null || echo "File not found"

echo ""
echo "=== All .python-version files in packages directory ==="
find packages -name ".python-version" -type f | sort | xargs -I {} sh -c 'echo "=== {} ==="; cat {}'

echo ""
echo "=== Check if setup.py or setup.cfg exist in mistralai package ==="
find packages/opentelemetry-instrumentation-mistralai -type f \( -name "setup.py" -o -name "setup.cfg" \) | xargs ls -la

Repository: traceloop/openllmetry

Length of output: 5669


🌐 Web query:

filelock CVE Python 3.10 security vulnerability

💡 Result:

Short answer: A TOCTOU (time-of-check/time-of-use) race condition in the filelock Python library (fixed in filelock 3.20.1) allows a local attacker who can create symlinks to cause lock-file creation to follow a symlink and truncate or corrupt arbitrary files. Affects filelock versions prior to 3.20.1 on Unix/Linux/macOS and Windows (requires ability to create symlinks). Upgrade to filelock >= 3.20.1; if you cannot upgrade immediately, restrict lock-file directories (chmod 0700), avoid predictable lock paths, or use SoftFileLock as a partial mitigation. [1][2][3]

Sources:

  • GitLab advisory (CVE-2025-68146) — filelock TOCTOU/symlink attack. [1]
  • NVD entry for CVE-2025-68146 (description, affected versions, fix 3.20.1). [2]
  • AWS/ALAS and distro trackers summarizing impact and mitigations. [3]

🏁 Script executed:

#!/bin/bash
# Search for filelock constraints in pyproject.toml and related files
echo "=== Searching for filelock in mistralai package files ==="
find packages/opentelemetry-instrumentation-mistralai -type f \( -name "*.toml" -o -name "*.txt" -o -name "*.cfg" \) | xargs grep -l filelock 2>/dev/null || echo "No direct filelock references found"

echo ""
echo "=== Check for uv.lock or requirements files with filelock ==="
find packages -maxdepth 1 -name "uv.lock" -o -name "requirements*.txt" 2>/dev/null | xargs grep filelock 2>/dev/null || echo "No filelock version found in lock files"

echo ""
echo "=== Check root level uv.lock or similar ==="
find . -maxdepth 2 -name "uv.lock" -type f | head -1 | xargs grep filelock 2>/dev/null | head -5 || echo "Checking failed or no filelock entry"

echo ""
echo "=== Check if there's a constraint on urllib3 or other packages ==="
grep -r "urllib3" packages/opentelemetry-instrumentation-mistralai/ 2>/dev/null || echo "No urllib3 constraints found in mistralai package"

Repository: traceloop/openllmetry

Length of output: 600


Clarify CVE remediation or add filelock version constraint.

The configuration is consistent (.python-version and pyproject.toml both require Python ≥3.10) and mistralai is compatible with Python 3.10. However, the commit message's claim that Python 3.10 resolves the filelock CVE is unsupported: CVE-2025-68146 is fixed only by upgrading filelock to ≥3.20.1, not by the Python version itself. No filelock version constraint was added in this PR.

Either add an explicit filelock>=3.20.1 constraint to the dependencies if CVE remediation is the goal, or clarify the actual reason for the Python 3.10 upgrade if the CVE claim was inaccurate.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-mistralai/.python-version` at line 1,
The PR incorrectly implies CVE-2025-68146 is fixed by moving to Python 3.10;
either add an explicit dependency constraint filelock>=3.20.1 to pyproject.toml
(ensuring the lock/update and tests run) to actually remediate the CVE, or
update the commit/PR text to remove the CVE claim and state the real reason for
setting .python-version to "3.10" (clarify compatibility/maintenance intent);
reference the .python-version file, pyproject.toml dependencies, and the
filelock package/CVE-2025-68146 when making the change.

Comment on lines 74 to 75
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

uv constraint-dependencies tool.uv pyproject.toml

💡 Result:

Use the tool.uv setting constraint-dependencies in pyproject.toml to add version constraints that uv will apply when resolving your project's runtime dependencies. Constraints restrict which versions can be selected but do not by themselves cause a package to be installed (the package must be requested by a direct or transitive dependency).

Key points

  • Setting: [tool.uv].constraint-dependencies is a list of requirement strings (e.g. "grpcio<1.65"). [1][2]
  • It is only read from the workspace root pyproject.toml for uv lock / uv sync / uv run (uv ignores per-package declarations or uv.toml). [1][2]
  • Use it when you need to force or limit versions of transitive dependencies without adding them as direct dependencies.

Example pyproject.toml snippet
[tool.uv]
constraint-dependencies = ["grpcio<1.65"]

References
[1] uv settings — constraint-dependencies (official docs).
[2] uv Workspace pyproject ToolUv fields (docs.rs).


🏁 Script executed:

# Check uv version requirement and urllib3 dependencies in the project
cd packages/opentelemetry-instrumentation-ollama
cat pyproject.toml | head -100

Repository: traceloop/openllmetry

Length of output: 1760


🏁 Script executed:

# Search for urllib3 references across the project to identify potential conflicts
rg "urllib3" --type py --type toml -A 2 -B 2

Repository: traceloop/openllmetry

Length of output: 13976


🏁 Script executed:

# Check if there's a workspace root pyproject.toml
ls -la pyproject.toml 2>/dev/null && echo "=== Root pyproject.toml found ===" || echo "No root pyproject.toml"

# Check if this is configured as a workspace
cat pyproject.toml | grep -A 10 "\[tool.uv\]" 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Verify workspace member structure
git ls-files | grep "pyproject.toml" | head -5

Repository: traceloop/openllmetry

Length of output: 377


🏁 Script executed:

# Check for uv workspace configuration (uv.lock or other indicators)
git ls-files | grep -E "(uv\.lock|pyproject\.toml$)" | head -1
ls -la uv.lock 2>/dev/null || echo "No uv.lock found"

Repository: traceloop/openllmetry

Length of output: 140


Remove or relocate [tool.uv] constraint-dependencies to workspace root.

The constraint-dependencies setting in lines 74–75 has no effect in per-package pyproject.toml files. According to uv documentation, [tool.uv] is only read from the workspace root pyproject.toml; per-package declarations are ignored. Since this repository has no root pyproject.toml, move the constraint to a workspace-level configuration file for it to take effect.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-ollama/pyproject.toml` around lines 74
- 75, The per-package pyproject.toml contains a [tool.uv]
constraint-dependencies entry which is ignored by uv; remove the [tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"] block from the package
pyproject.toml and add the same setting under [tool.uv] in the repository
workspace root pyproject.toml (or other workspace-level configuration that uv
reads) so the urllib3 constraint is applied; if there is currently no root
pyproject.toml create one and place the [tool.uv] constraint-dependencies entry
there.

Comment on lines 75 to 76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3"]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

uv constraint-dependencies pyproject.toml documentation

💡 Result:

Do you mean the "uv" tool/package (which project?), or are you referring to "uvicorn", "poetry", or another build/packaging tool? I need which project/URL to search so I can find the exact pyproject.toml "constraint-dependencies" documentation.


🏁 Script executed:

# Find CI configuration files
find . -type f \( -name "*.yml" -o -name "*.yaml" \) | grep -E "(workflow|ci|action)" | head -20

Repository: traceloop/openllmetry

Length of output: 2464


🏁 Script executed:

# Check the pyproject.toml file for context
cat -n packages/opentelemetry-instrumentation-openai-agents/pyproject.toml | head -100

Repository: traceloop/openllmetry

Length of output: 2372


🏁 Script executed:

# Check CI configuration
cat -n .github/workflows/ci.yml | head -100

Repository: traceloop/openllmetry

Length of output: 3937


🏁 Script executed:

# Search for other uses of [tool.uv] in the repo
rg '\[tool\.uv\]' -A 2

Repository: traceloop/openllmetry

Length of output: 6017


🌐 Web query:

astral-sh uv constraint-dependencies pyproject.toml

💡 Result:

Use uv's tool.uv settings in your workspace root pyproject.toml.

  • constraint-dependencies — constraints applied when resolving runtime/regular dependencies. Provide a list of version specifiers (strings). uv only reads this from the workspace root. Example:
    [tool.uv]
    constraint-dependencies = ["grpcio<1.65", "httpx==0.27.2"]. [1]

  • build-constraint-dependencies — constraints applied when resolving build (PEP 517) dependencies. Also only read from the workspace root. Example:
    [tool.uv]
    build-constraint-dependencies = ["setuptools==60.0.0"]. [1]

Notes:

  • Both fields are list[str] and do not cause packages to be installed by themselves — they only restrict versions selected during resolution. [1][2]

Sources:
[1] uv Settings (tool.uv) — docs.astral.sh.
[2] Managing dependencies — docs.astral.sh.


🏁 Script executed:

# Check for workspace root pyproject.toml
ls -la pyproject.toml 2>/dev/null || echo "No pyproject.toml in root"

# Check uv.toml for workspace config
ls -la uv.toml 2>/dev/null || echo "No uv.toml in root"

# Look for workspace configuration
rg 'workspace' pyproject.toml 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 109


🏁 Script executed:

# Check the structure to find where workspace root might be
find . -maxdepth 2 -name "pyproject.toml" -type f | head -10

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Let me check if there's a workspace configuration elsewhere
cat -n pyproject.toml 2>/dev/null || echo "Checking repository structure..."

# Check what nx setup is (since the CI uses npx nx)
cat -n nx.json 2>/dev/null | head -50

Repository: traceloop/openllmetry

Length of output: 259


🏁 Script executed:

# Verify the CI actually runs uv commands by looking at more of ci.yml
cat -n .github/workflows/ci.yml | tail -100

Repository: traceloop/openllmetry

Length of output: 3945


Move constraint-dependencies to workspace root pyproject.toml.

uv supports the constraint-dependencies field, and CI does install uv. However, according to uv documentation, constraint-dependencies is only read and enforced from the workspace root pyproject.toml, not from individual package files. The current configuration in 32+ packages is not being enforced by uv. Create a root pyproject.toml with this constraint, or remove it from individual packages if a workspace is not intended.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml` around
lines 75 - 76, The [tool.uv] constraint-dependencies entry currently defined as
constraint-dependencies = ["urllib3>=2.6.3"] in package pyproject.toml must be
moved to the workspace root pyproject.toml (or removed from per-package files)
because uv only reads this field at the root; to fix, add a [tool.uv] section in
the root pyproject.toml containing constraint-dependencies = ["urllib3>=2.6.3"]
(or delete the constraint-dependencies line from each package pyproject.toml) so
the urllib3 constraint is actually enforced.

"pytest>=8.2.2,<9",
"requests>=2.31.0,<3",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the current vcrpy dependency in the pyproject.toml file
cat packages/opentelemetry-instrumentation-openai/pyproject.toml | grep -A 2 -B 2 vcrpy

Repository: traceloop/openllmetry

Length of output: 135


🏁 Script executed:

# List test files in the package
fd -t f "test_" packages/opentelemetry-instrumentation-openai/tests --max-depth 3

Repository: traceloop/openllmetry

Length of output: 1433


🏁 Script executed:

# Look for VCR cassettes (typically YAML files) in the test directory
fd -t f "\.(yaml|yml)$" packages/opentelemetry-instrumentation-openai/tests

Repository: traceloop/openllmetry

Length of output: 22763


🏁 Script executed:

# Look at a sample cassette to understand the format
head -30 packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat.yaml

Repository: traceloop/openllmetry

Length of output: 743


🌐 Web query:

vcrpy breaking changes version 6 to version 8 major version upgrade

💡 Result:

Summary of major breaking changes when upgrading from vcrpy 6.x → 8.x (notes pulled from the project changelogs/releases):

  • Cassette/serialization changes: v6 warned that some cassettes may need to be recreated because previous releases saved a binary format incorrectly (you may need to re-record cassettes or run migration tools). Also custom serializer/deserializer APIs changed in earlier 1.x-era migration notes — verify any custom persisters/serializers. [1]

  • Dependency / supported-Python changes:

    • v8 drops support for Python 3.9 (so minimum supported Python is >=3.10/3.11 depending on the exact v8 patch). [2]
    • v8 drops support for urllib3 < 2 (you must be using urllib3 v2+). [2]
  • HTTP client integrations and behavior:

    • v6 included fixes for httpx support; v8 rewrote httpx support to patch httpcore instead of httpx and fixed several httpx-related exceptions and custom-transport support — if you rely on httpx behavior or custom transports, test those flows after upgrade. [1][2]
    • HTTPS proxy handling was changed/fixed in v8 (proxy addresses no longer appear in cassette URIs). [2]
  • Removed/changed optional integrations:

    • v6 removed support for boto (the old boto package) while keeping boto3; if you used boto (not boto3) that will break. v6 also dropped simplejson. Check for other removed/changed optional dependencies in the v6 notes. [1]

Migration advice (short):

  • Re-record cassettes (or back them up and run any migration scripts) and run your test-suite to detect cassette-format or serializer issues. [1][3]
  • Ensure runtime environment uses supported Python and dependency versions (upgrade urllib3 and Python if needed). [2]
  • Run integration tests for httpx/requests/urllib3-based code paths and for any custom persisters/serializers. [1][2]

Sources:

  • vcrpy v6 changelog (notable breaking items: cassette format/binary, drop boto/simplejson). [1] https://vcrpy.readthedocs.io/en/v6.0.0/changelog.html
  • vcrpy v8.0.0 release notes / changelog (drop Python 3.9, drop urllib3 <2, httpx rewrite, proxy handling fix). [2] (GitHub release / changelog)

Verify tests pass and re-record VCR cassettes for vcrpy 8.x compatibility.

Major version bump to vcrpy 8.x includes breaking changes that may require cassette re-recording. Key compatibility concerns:

  • Cassette format may need recreation
  • Python 3.9 no longer supported (minimum 3.10+)
  • urllib3 <2 no longer supported
  • httpx support was rewritten

Run the test suite and re-record cassettes if tests fail due to cassette format incompatibility.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-openai/pyproject.toml` at line 42,
Update test infra and VCR cassettes to be compatible with vcrpy 8.x by running
the full test suite and re-recording any failing cassettes; specifically, after
bumping the dependency string "vcrpy>=8.0.0,<9" in pyproject.toml, run tests
locally/CI, recreate VCR cassettes using the project's test helper (the cassette
recording utility used in your repo), and commit the new cassette files, while
also ensuring CI/Python matrix uses Python 3.10+ and that any
httpx/urllib3-related fixtures or adapters in your test helpers are updated to
the vcrpy 8.x API.
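The interpreter and urllib3 floors called out above can be sanity-checked before re-recording with a small, repo-independent helper (the function below is illustrative, not part of this codebase):

```python
def parse_release(version: str) -> tuple:
    """Parse the numeric release segment of a version string, e.g. '2.6.3' -> (2, 6, 3)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at pre-release suffixes like 'rc1'
        parts.append(int(piece))
    return tuple(parts)

# vcrpy 8.x floors: Python >= 3.10 and urllib3 >= 2.0
assert parse_release("2.6.3") >= (2, 0)    # urllib3 2.6.3 is new enough
assert parse_release("1.26.18") < (2, 0)   # urllib3 1.x must be upgraded
```

Tuple comparison makes the floor check trivial: `(1, 26, 18) < (2, 0)` is true, so any urllib3 1.x release fails the vcrpy 8 requirement.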

@ellipsis-dev
Contributor

ellipsis-dev bot commented Jan 17, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/sample-app/pyproject.toml`:
- Line 17: Remove the two unused dependencies "llama-index-embeddings-openai"
and "llama-index-llms-huggingface" from the dependency list in pyproject.toml;
locate the entries by their package names and delete those lines, then ensure
the surrounding lines (commas and list formatting) remain valid so the TOML
dependency array parses correctly and the remaining used packages (llama-index,
llama-index-embeddings-huggingface, llama-index-llms-huggingface-api,
llama-index-vector-stores-chroma, llama-index-llms-openai) are unaffected.

"requests>=2.32.3,<3",
"pinecone-client>=3.2.2,<4",
"llama-index>=0.12.52,<0.13.0",
"llama-index>=0.14.12,<0.15.0",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Read the pyproject.toml to see the actual dependency versions
cat -n packages/sample-app/pyproject.toml | sed -n '10,50p'

Repository: traceloop/openllmetry

Length of output: 1685


🏁 Script executed:

# Check for test files in packages/sample-app
fd -type f -name "*.py" packages/sample-app | grep -i test

Repository: traceloop/openllmetry

Length of output: 235


🏁 Script executed:

# Get the full context of the dependencies section
cat -n packages/sample-app/pyproject.toml | sed -n '14,45p'

Repository: traceloop/openllmetry

Length of output: 1393


🏁 Script executed:

# Check for test files in packages/sample-app directory
find packages/sample-app -name "*.py" | grep -i test

Repository: traceloop/openllmetry

Length of output: 186


🏁 Script executed:

# List all Python files in packages/sample-app
ls -la packages/sample-app/ | head -20

Repository: traceloop/openllmetry

Length of output: 812


🏁 Script executed:

# Check if there's a tests directory
find packages/sample-app -type d -name "test*"

Repository: traceloop/openllmetry

Length of output: 90


🏁 Script executed:

# Check the test files
cat packages/sample-app/tests/test_placeholder.py

Repository: traceloop/openllmetry

Length of output: 97


🏁 Script executed:

# Check conftest.py
cat packages/sample-app/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 124


🏁 Script executed:

# Now let's check the sample_app directory to understand what uses llama-index
ls -la packages/sample-app/sample_app/

Repository: traceloop/openllmetry

Length of output: 5992


🏁 Script executed:

# Search for llama-index usage in the sample app
rg "llama.index|from llama_index" packages/sample-app/ --type py

Repository: traceloop/openllmetry

Length of output: 1498


🏁 Script executed:

# Search for imports of llama_index.llms.huggingface (non-api version)
rg "from llama_index.llms.huggingface[^_]|import.*huggingface[^_]" packages/sample-app/ --type py

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check if any package in the repo imports from llama_index.llms.huggingface
rg "from llama_index.llms.huggingface[^_]|import.*HuggingFaceEmbedding|import.*HuggingFace" packages/ --type py

Repository: traceloop/openllmetry

Length of output: 464


🏁 Script executed:

# Get the full pyproject.toml to see the actual current versions before the change
git log -1 --format="%H" packages/sample-app/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 105


🏁 Script executed:

# Check git diff to see what was actually changed
git diff HEAD~1 HEAD packages/sample-app/pyproject.toml | head -80

Repository: traceloop/openllmetry

Length of output: 251


🏁 Script executed:

# Show what changed in the current commit
git show --stat packages/sample-app/pyproject.toml | head -50

Repository: traceloop/openllmetry

Length of output: 333


🏁 Script executed:

# Check the actual diff for the pyproject.toml file
git show packages/sample-app/pyproject.toml | head -100

Repository: traceloop/openllmetry

Length of output: 4023


🏁 Script executed:

# Look at the entire dependencies section to verify what's in the current state
cat -n packages/sample-app/pyproject.toml | sed -n '13,50p'

Repository: traceloop/openllmetry

Length of output: 1587


🏁 Script executed:

# Check for usage of llama-index-embeddings-openai
rg "llama_index.embeddings.openai|OpenAIEmbedding" packages/sample-app/ --type py

Repository: traceloop/openllmetry

Length of output: 307


Remove unused llama-index dependencies on lines 33 and 34.

The version bumps for active dependencies are appropriate. However, llama-index-embeddings-openai (line 33) and llama-index-llms-huggingface (line 34) are unused in the sample app:

  • Line 33: The chroma_app.py uses ChromaDB's OpenAIEmbeddingFunction directly, not llama-index's embeddings
  • Line 34: Only llama-index-llms-huggingface-api (line 35) is imported; the base huggingface LLM package is not referenced

Verified dependencies in use:

  • llama-index (core)
  • llama-index-embeddings-huggingface
  • llama-index-llms-huggingface-api
  • llama-index-vector-stores-chroma
  • llama-index-llms-openai

Note: Test validation cannot be performed as the only test file is a placeholder (test_placeholder.py contains only pass).

🤖 Prompt for AI Agents
In `@packages/sample-app/pyproject.toml` at line 17, Remove the two unused
dependencies "llama-index-embeddings-openai" and "llama-index-llms-huggingface"
from the dependency list in pyproject.toml; locate the entries by their package
names and delete those lines, then ensure the surrounding lines (commas and list
formatting) remain valid so the TOML dependency array parses correctly and the
remaining used packages (llama-index, llama-index-embeddings-huggingface,
llama-index-llms-huggingface-api, llama-index-vector-stores-chroma,
llama-index-llms-openai) are unaffected.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml`:
- Around line 37-47: The dependency version ranges for langchain and langgraph
are too permissive and allow known vulnerable releases; update the constraints
so langchain is at least 1.2.5 (e.g., change "langchain>=1.0.0,<2.0.0" to
"langchain>=1.2.5,<2.0.0") and langgraph is at least 1.0.1 (e.g., change
"langgraph>=1.0.0,<2.0.0" to "langgraph>=1.0.1,<2.0.0"); also review the
anthropic constraint ("anthropic>=0.75.0,<0.76.0") and raise the minimum if a
patched release exists or lock it to a vetted safe version—update the pyproject
dependency entries for langchain, langgraph (and anthropic if needed)
accordingly.

Comment on lines 37 to 47
"anthropic>=0.75.0,<0.76.0",
"boto3>=1.35.49,<2",
"langchain-anthropic>=0.3.13,<0.4.0",
"langchain-aws>=0.2.11,<0.3.0",
"langchain-cohere==0.3.1",
"langchain-community>=0.3.3,<0.4.0",
"langchain-huggingface>=0.1.2,<0.2.0",
"langchain-openai>=0.3.1,<0.4.0",
"langchain>=0.3.15,<0.4.0",
"langchain-anthropic>=1.0.0,<2.0.0",
"langchain-aws>=1.0.0,<2.0.0",
"langchain-cohere>=0.5.0,<0.6.0",
"langchain-community>=0.4.0,<0.5.0",
"langchain-huggingface>=1.0.0,<2.0.0",
"langchain-openai>=1.0.0,<2.0.0",
"langchain>=1.0.0,<2.0.0",
"langchainhub>=0.1.21,<0.2.0",
"langgraph>=0.4,<0.5.0",
"langgraph>=1.0.0,<2.0.0",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

langchain 1.0.0 CVE security fixes

💡 Result:

Short answer — yes. Two high-profile vulnerabilities affecting LangChain 1.0.0 are:

  • CVE-2025-65106 — a prompt/template-injection flaw affecting langchain 1.0.0 through 1.0.6 (patched in 1.0.7). [1]
  • CVE-2025-68664 — a serialization-injection/secret-extraction flaw affecting langchain >=1.0.0 and <1.2.5 (patched in 1.2.5). (A related JS/TS CVE impacts @langchain/core / langchain.js; those fixes are in the JS releases noted in advisories.) [2][3]

Action: upgrade any Python services using vulnerable versions to at least 1.0.7 (for the template fix) and to 1.2.5 (for the serialization fix) as appropriate, and also update affected LangChain.js packages per the advisories. [1][2][3]

Sources: [1] NVD entry for CVE-2025-65106; [2] LangChain security advisory summaries (serialization fix / CVE-2025-68664); [3] reporting on the JS-related CVE and version ranges.


🌐 Web query:

langgraph 1.0.0 CVE security vulnerabilities fixed

💡 Result:

Short answer: langgraph 1.0.0 is vulnerable to a deserialization RCE (CVE-2025-64439); that issue was fixed by upgrading to langgraph >= 1.0.1. The separate SQLite SQL‑injection advisories (CVE-2025-64104 / CVE-2025-8709) affect the SQLite checkpoint package and were fixed in langgraph-checkpoint-sqlite 2.0.11 (not by 1.0.0). Sources: Snyk (deserialization fix to 1.0.1) [1], vulnerability writeup (CVE-2025-64439) [2], GitHub advisory / CVE for the SQLite SQL injection (fixed in 2.0.11) [3].

References

  • [1] Snyk — Deserialization of Untrusted Data / langgraph (fix: upgrade to ≥1.0.1).
  • [2] Wiz / vulnerability database — CVE-2025-64439 (JsonPlusSerializer RCE).
  • [3] GitHub Advisory / NVD entries — LangGraph SQLite SQL injection (fixed in 2.0.11).

🌐 Web query:

anthropic 0.75.0 CVE security fixes

💡 Result:

Short answer: The anthropic Python SDK v0.75.0 (release 24 Nov 2025) does not list any CVE/security advisories in its release notes — the changelog shows features/bugfixes but no explicit CVE fixes. However, multiple CVEs have been published for Anthropic’s claude-code npm packages (e.g., CVE-2025-54795, CVE-2025-58764, CVE-2025-59536, CVE-2025-65099) which are separate projects and have their own patches. [1][2][3]

Sources:

If you’d like, I can:

  • list the specific claude-code CVEs and their fixed versions, or
  • check other Anthropic repos (claude-code, sandbox-runtime) for security advisories and recommended upgrades.

Version constraints allow vulnerable versions of langchain and langgraph.

The specified version ranges (langchain>=1.0.0,<2.0.0 and langgraph>=1.0.0,<2.0.0) permit vulnerable versions:

  • langchain 1.0.0–1.0.6 contain CVE-2025-65106 (template injection; fixed in 1.0.7)
  • langchain 1.0.0–1.2.4 contain CVE-2025-68664 (serialization injection; fixed in 1.2.5)
  • langgraph 1.0.0 contains CVE-2025-64439 (deserialization RCE; fixed in 1.0.1)

Additionally, anthropic v0.75.0 release notes list no CVE fixes. Tighten constraints to exclude known vulnerable versions, or specify minimum versions that include security patches (e.g., langchain>=1.2.5 and langgraph>=1.0.1).

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml` around lines
37 - 47, The dependency version ranges for langchain and langgraph are too
permissive and allow known vulnerable releases; update the constraints so
langchain is at least 1.2.5 (e.g., change "langchain>=1.0.0,<2.0.0" to
"langchain>=1.2.5,<2.0.0") and langgraph is at least 1.0.1 (e.g., change
"langgraph>=1.0.0,<2.0.0" to "langgraph>=1.0.1,<2.0.0"); also review the
anthropic constraint ("anthropic>=0.75.0,<0.76.0") and raise the minimum if a
patched release exists or lock it to a vetted safe version—update the pyproject
dependency entries for langchain, langgraph (and anthropic if needed)
accordingly.
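A hardened dependency stanza reflecting the patched versions above (minimums taken from the cited advisories; treat this as a suggestion, not a verified pin set) could read:

```toml
dependencies = [
  # CVE-2025-65106 fixed in 1.0.7; CVE-2025-68664 fixed in 1.2.5
  "langchain>=1.2.5,<2.0.0",
  # CVE-2025-64439 (deserialization RCE) fixed in 1.0.1
  "langgraph>=1.0.1,<2.0.0",
]
```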


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-writer/pyproject.toml`:
- Around line 75-76: The pyproject.toml already has a [tool.uv]
constraint-dependencies entry; verify that the array includes the secure
minimums "urllib3>=2.6.3" and "pyarrow>=18.1.0" so transitive installs via uv
are pinned, and add whichever of the two entries is missing.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-chromadb/pyproject.toml`:
- Line 12: Update the Python requirement in the three package pyproject.toml
files so they match chromadb: change requires-python in
packages/opentelemetry-semantic-conventions-ai/pyproject.toml,
packages/opentelemetry-instrumentation-qdrant/pyproject.toml, and
packages/opentelemetry-instrumentation-milvus/pyproject.toml to ">=3.10,<4"
(ensure the literal string matches the chromadb line and save each file).

In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml`:
- Line 43: The vcrpy bump to "vcrpy>=8.0.0,<9" requires Python >=3.10 and
urllib3 >=2.0.0; update pyproject.toml so requires-python is ">=3.10" (or
adjust the CI matrix) and add or align a test dependency constraint for urllib3
(e.g., "urllib3>=2.0.0") so environments meet vcrpy's requirements, then run
the full test suite and re-record any VCR cassettes containing binary HTTP
bodies if tests fail.

In `@packages/opentelemetry-instrumentation-lancedb/pyproject.toml`:
- Line 12: The pyproject.toml entry requires-python = ">=3.10,<4" is a breaking
min-version bump and must be documented in the release notes/CHANGELOG; add a
concise changelog entry under the upcoming release (or Unreleased) stating that
this package (packages/opentelemetry-instrumentation-lancedb) now requires
Python >=3.10 (dropping 3.9), mark it as a breaking change, and include a short
rationale and migration note (e.g., users on 3.9 must upgrade or pin prior
versions); update the CHANGELOG.md or RELEASE_NOTES with this entry and ensure
the entry mirrors the repository-wide note about 29/32 packages moving to
>=3.10.

In `@packages/opentelemetry-instrumentation-mistralai/pyproject.toml`:
- Line 42: The project updated the vcrpy dependency in pyproject.toml
("vcrpy>=8.0.0,<9"), so re-record all VCR cassettes used by the test suite to
ensure they match vcrpy 8.x behavior; run the relevant integration tests that
use VCR, delete or regenerate the existing cassette files, verify recorded HTTP
interactions and sensitive-data filtering are correct, and commit the new
cassette files so tests pass under the updated vcrpy version.
♻️ Duplicate comments (5)
packages/opentelemetry-instrumentation-agno/pyproject.toml (1)

44-44: Re-record VCR cassettes with vcrpy 8 and scrub secrets.

The major vcrpy bump can change cassette formats and matching; please re-record/verify cassettes and ensure PII/secret filtering remains in place. Based on learnings, ensure cassettes are regenerated and scrubbed.

packages/opentelemetry-instrumentation-cohere/pyproject.toml (1)

12-12: CI test matrix gap for Python 3.10 minimum version.

This concern was already raised in a previous review: the CI workflow tests only Python 3.11, so the new >=3.10 floor remains untested. Please address the prior feedback before merging.

packages/opentelemetry-instrumentation-langchain/pyproject.toml (1)

37-37: Tighten LangChain/LangGraph minimums to patched releases.
These ranges still allow known vulnerable versions (and this PR is a CVE remediation). Please raise minimums to patched versions and confirm whether the new anthropic minimum is security‑motivated or just compatibility.

🔧 Proposed constraint hardening (adjust to latest patched versions)
-  "langchain>=1.0.0,<2.0.0",
+  "langchain>=1.2.5,<2.0.0",
...
-  "langgraph>=1.0.0,<2.0.0",
+  "langgraph>=1.0.1,<2.0.0",
What are the latest patched minimum versions for langchain (CVE-2025-65106, CVE-2025-68664) and langgraph (CVE-2025-64439)?

Also applies to: 39-49

packages/sample-app/pyproject.toml (1)

34-37: Re-verify usage of llama-index-embeddings-openai and llama-index-llms-huggingface.
Line 35-36 were previously identified as unused in the sample app; please confirm they’re now required, otherwise remove to cut dependency surface.

If you want to verify quickly, this read-only scan should help:

#!/bin/bash
# Search for llama-index OpenAI embeddings / HuggingFace LLM usage in sample-app
rg -nP --type=py -C2 'llama_index\.embeddings\.openai|llama_index\.llms\.huggingface' packages/sample-app
rg -nP --type=py -C2 'OpenAIEmbedding|HuggingFace' packages/sample-app
packages/opentelemetry-instrumentation-openai/pyproject.toml (1)

42-42: Re-validate vcrpy 8.x cassette compatibility.
Run the relevant tests and re-record any failing cassettes due to vcrpy’s major changes.

🧹 Nitpick comments (2)
packages/sample-app/pyproject.toml (1)

29-31: Verify LangChain 1.x migration and necessity of langchain-classic.
Lines 29-31 move to LangChain 1.x; please confirm the sample-app imports are aligned with the new module paths and that langchain-classic is actually needed (drop it if unused).

packages/opentelemetry-instrumentation-lancedb/pyproject.toml (1)

37-37: Consider adding an upper bound for lancedb test dependency.

The upper bound was removed (previously <0.10.0). While this is a test-only dependency, adding an upper bound like <0.27 or <1 would improve reproducibility and prevent unexpected CI failures from future breaking releases.

Suggested change
-  "lancedb>=0.26.0",
+  "lancedb>=0.26.0,<1",

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for requires-python settings across all pyproject.toml files
echo "=== requires-python settings in all pyproject.toml files ==="
rg -n --glob 'pyproject.toml' 'requires-python'

echo -e "\n=== Python 3.9 references in .github directory (CI configs) ==="
rg -n '3\.9' .github 2>/dev/null || echo "No 3.9 references found in .github"

echo -e "\n=== Python 3.9 references in entire repo (limited to relevant files) ==="
rg -n '3\.9' --glob '*.toml' --glob '*.yml' --glob '*.yaml' --glob '*.py' . | head -50

Repository: traceloop/openllmetry

Length of output: 50379


Address Python version inconsistency across packages.

The chromadb change to requires-python = ">=3.10,<4" is correct, but three other packages still allow Python 3.9:

  • packages/opentelemetry-semantic-conventions-ai/pyproject.toml
  • packages/opentelemetry-instrumentation-qdrant/pyproject.toml
  • packages/opentelemetry-instrumentation-milvus/pyproject.toml

Align these to ">=3.10,<4" for consistency across the repository. CI configuration has been properly updated.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-chromadb/pyproject.toml` at line 12,
Update the Python requirement in the three package pyproject.toml files so they
match chromadb: change requires-python in
packages/opentelemetry-semantic-conventions-ai/pyproject.toml,
packages/opentelemetry-instrumentation-qdrant/pyproject.toml, and
packages/opentelemetry-instrumentation-milvus/pyproject.toml to ">=3.10,<4"
(ensure the literal string matches the chromadb line and save each file).

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Verify vcrpy 8.0.0 exists on PyPI and check latest version
echo "=== Checking vcrpy versions on PyPI ==="
curl -s https://pypi.org/pypi/vcrpy/json | jq -r '.releases | keys | .[]' | sort -V | tail -15

echo -e "\n=== Latest vcrpy version ==="
curl -s https://pypi.org/pypi/vcrpy/json | jq -r '.info.version'

echo -e "\n=== vcrpy 8.0.0 release info ==="
curl -s https://pypi.org/pypi/vcrpy/8.0.0/json | jq -r '.info | {version, release_date: .release_url}'

Repository: traceloop/openllmetry

Length of output: 348


🌐 Web query:

vcrpy 8.0.0 breaking changes cassette format

💡 Result:

Short answer: vcrpy 8.0.0 did not introduce a new cassette format or a breaking cassette-format change. The cassette format has been versioned as "1" since the 1.x series; older 0.x cassettes remain incompatible and must be re-recorded or migrated, but no further cassette-format change was made in 8.0.0. [1][2]

Sources:

  • serialize implementation showing cassette format version = 1 and the old 0.x → 1.x migration warning. [1]
  • vcrpy 8.0.0 release notes (breaking changes list — drops Python 3.9 and older urllib3 support; no cassette-format change). [2]

🌐 Web query:

vcrpy 6.x to 8.x migration guide compatibility

💡 Result:

Brief migration checklist (vcrpy 6.x → 8.x)

  • Python: ensure your environment meets v8’s Python support (v8 drops older Pythons — verify target version per v8 release). [1][2]
  • urllib3: v8 removes support for urllib3 < 2 — require urllib3 >= 2.0.0. Update your pinned dependency. [2][3]
  • HTTP libraries:
    • httpx/httpcore: v8 rewrote httpx support to patch httpcore; exercise any tests that use custom httpx transports and watch for Response-reading issues. Run your httpx tests after upgrade. [2][3]
    • Confirm other adapters you use (requests, aiohttp, urllib3, tornado, boto3, etc.) remain supported in v8. [2]
  • Cassettes: some earlier releases (notably around v6) changed how binary bodies were saved — you will likely need to re-record or migrate old cassette files (backup then re-record). Use the cassette-migration tooling or delete & re-record where feasible. [1][2]
  • Removed/changed integrations:
    • v6 already dropped legacy boto (keeps boto3); review any boto (non-boto3) usage. [1]
    • If you maintain custom serializers/deserializers or matcher extensions, verify they still conform to the current serializer API. [2]
  • New features / options: v8 added options like drop_unused_requests for cassette cleanup — consider using them to tidy recordings. [3]
  • Test plan: run full test suite with v8, paying special attention to HTTP-heavy tests, proxy handling, redirects and any custom transports. Back up cassettes before testing.

Key sources:

  • vcrpy 6.0.0 changelog (notes about binary cassette changes, boto drop). [1]
  • vcrpy docs (installation / compatibility for v8, supported libraries). [2]
  • v8.0.0 release notes (breaking changes: urllib3 support, Python support, httpx/httpcore rewrite, new options). [3]

If you want, I can:

  • produce a concrete pip/requirements diff for your project,
  • search your repo (or a sample) for things likely to break (custom serializers, boto usage, httpx transports),
    or
  • draft a short automated test script to validate cassettes after upgrade. Which would you like?

vcrpy major version bump to 8.0.0 is valid, but verify Python and urllib3 compatibility.

vcrpy 8.0.0 exists on PyPI and doesn't introduce cassette format breaking changes. However, v8 requires Python 3.10+ and urllib3 ≥ 2.0.0, which may affect your test environment. While cassette format stability is assured, run the test suite after this upgrade to ensure all VCR recordings remain compatible—cassettes recorded under v6 may need re-recording if they contain binary HTTP bodies due to earlier v6 handling changes.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-cohere/pyproject.toml` at line 43, The
vcrpy bump to "vcrpy>=8.0.0,<9" requires Python >=3.10 and urllib3 >=2.0.0;
update pyproject.toml to ensure python_requires is >=3.10 (or adjust CI matrix)
and add or align a test dependency constraint for urllib3 (e.g.,
"urllib3>=2.0.0") so environments meet vcrpy's requirements, then run the full
test suite and re-record any VCR cassettes containing binary HTTP bodies if
tests fail.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if Python version requirement is consistent across all packages in this repo

# Find all pyproject.toml files and check their requires-python values
echo "=== Python version requirements across packages ==="
fd -t f 'pyproject.toml' packages/ --exec grep -H 'requires-python' {} \; | sort -u

# Check for any changelog or release notes mentioning Python version changes
echo ""
echo "=== Checking for changelog/release notes files ==="
fd -t f \( -name 'CHANGELOG*' -o -name 'RELEASE*' -o -name 'HISTORY*' -o -name 'NEWS*' \) | head -20

# Also check if there are git-related docs
echo ""
echo "=== Checking root directory for documentation ==="
ls -la | grep -E '\.(md|rst|txt)$'

Repository: traceloop/openllmetry

Length of output: 3995


🏁 Script executed:

#!/bin/bash
# Check CHANGELOG for Python version bump documentation
echo "=== Recent CHANGELOG entries mentioning Python ==="
head -100 CHANGELOG.md | grep -A 5 -B 5 -i "python"

echo ""
echo "=== Full lancedb pyproject.toml ==="
cat -n packages/opentelemetry-instrumentation-lancedb/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

#!/bin/bash
# Read the lancedb pyproject.toml file
echo "=== lancedb pyproject.toml content ==="
cat packages/opentelemetry-instrumentation-lancedb/pyproject.toml

echo ""
echo "=== Searching CHANGELOG for 'Python 3.10' mention ==="
grep -n "Python 3.10\|python.*3\.10" CHANGELOG.md | head -20

Repository: traceloop/openllmetry

Length of output: 1905


Ensure the Python 3.10 requirement change is documented in release notes.

The Python 3.10 minimum version bump is intentional and consistent with the ecosystem: 29 of 32 packages in this repository now require Python >=3.10. However, this breaking change should be explicitly documented in the CHANGELOG or release notes so users know Python 3.9 support has been dropped.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-lancedb/pyproject.toml` at line 12,
The pyproject.toml entry requires-python = ">=3.10,<4" is a breaking min-version
bump and must be documented in the release notes/CHANGELOG; add a concise
changelog entry under the upcoming release (or Unreleased) stating that this
package (packages/opentelemetry-instrumentation-lancedb) now requires Python
>=3.10 (dropping 3.9), mark it as a breaking change, and include a short
rationale and migration note (e.g., users on 3.9 must upgrade or pin prior
versions); update the CHANGELOG.md or RELEASE_NOTES with this entry and ensure
the entry mirrors the repository-wide note about 29/32 packages moving to
>=3.10.
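For reference, a changelog entry of roughly this shape (wording illustrative, not taken from the repo) would satisfy the request above:

```markdown
## Unreleased

### Breaking Changes

- `opentelemetry-instrumentation-lancedb` now requires Python >=3.10; support
  for Python 3.9 has been dropped, matching the repository-wide move of 29/32
  packages. Users on Python 3.9 should upgrade their interpreter or pin an
  earlier release of this package.
```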

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify vcrpy 8.0.0 exists on PyPI and check available versions
curl -s https://pypi.org/pypi/vcrpy/json | jq -r '.releases | keys[]' | sort -V | tail -20

Repository: traceloop/openllmetry

Length of output: 184


🌐 Web query:

vcrpy 8.0.0 release notes breaking changes changelog

💡 Result:

Breaking changes in vcrpy 8.0.0

  • Drop support for Python 3.9. [1]
  • Drop support for urllib3 < 2 (addresses CVE warnings from urllib3 1.x). [1][2]

See the full release notes/changelog on the GitHub release page for v8.0.0. [1]

Sources:
[1] vcrpy v8.0.0 release notes — GitHub.
[2] vcrpy v8.0.0 summary — NewReleases.


🏁 Script executed:

cat packages/opentelemetry-instrumentation-mistralai/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 1728


VCR cassettes should be re-recorded after this vcrpy upgrade.

vcrpy 8.0.0 exists and is compatible with this package (Python >=3.10). However, upgrading from vcrpy 6.x to 8.x may require re-recording test cassettes to ensure API interactions are properly captured and prevent test failures.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-mistralai/pyproject.toml` at line 42,
The project updated the vcrpy dependency in pyproject.toml ("vcrpy>=8.0.0,<9"),
so re-record all VCR cassettes used by the test suite to ensure they match vcrpy
8.x behavior; run the relevant integration tests that use VCR, delete or
regenerate the existing cassette files, verify recorded HTTP interactions and
sensitive-data filtering are correct, and commit the new cassette files so tests
pass under the updated vcrpy version.

@ellipsis-dev
Copy link
Contributor

ellipsis-dev bot commented Jan 18, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py`:
- Around line 39-50: The test test_agents_and_tools sets llm =
OpenAI(model="gpt-4o-mini") but existing VCR cassettes still contain responses
for gpt-3.5-turbo-0613; re-record the VCR cassette(s) for this test so they
reflect the new model: run the test suite (or the specific pytest invocation
with VCR enabled) to regenerate test_agents_and_tools.yaml, verify the new
cassette contains gpt-4o-mini entries, and commit the updated cassette; ensure
conftest.py VCR filters remain intact so no secrets are captured.
- Line 135: The VCR cassette for test_agent_with_multiple_tools was recorded
against the old model and must be re-recorded to match the updated Cohere model;
update the test by re-running the test that defines llm =
Cohere(model="command-a-03-2025") (and any helper that initializes the agent) so
VCR captures interactions for "command-a-03-2025" and commit the new cassette;
ensure VCR filtering remains configured to strip authorization/api-key headers
and verify the cassette filename referenced by the test is replaced with the
newly recorded file.
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (2)

86-100: Hardcoded token counts are brittle and will break if cassettes are re-recorded.

The exact token count assertions (e.g., output_tokens == 44, input_tokens == 538) are tightly coupled to the current VCR cassette recordings. If the cassettes need to be re-recorded due to API changes or model updates, these assertions will fail.

Consider using > 0 checks (similar to the Cohere test below) or range-based assertions to make the tests more resilient while still validating that token usage is captured.

♻️ Suggested approach
-    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 44
-    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 538
-    assert llm_span_1.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 582
+    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] > 0
+    assert llm_span_1.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] > 0
+    assert llm_span_1.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] > 0

     # Verify second LLM span
     assert len(llm_spans) >= 2, "Expected at least 2 LLM spans"
     llm_span_2 = llm_spans[1]
     assert llm_span_2.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
     assert llm_span_2.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL] == "gpt-4o-mini"
     assert GenAIAttributes.GEN_AI_RESPONSE_MODEL in llm_span_2.attributes
     assert f"{GenAIAttributes.GEN_AI_PROMPT}.0.content" in llm_span_2.attributes
     assert f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" in llm_span_2.attributes
-    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 30
-    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 594
-    assert llm_span_2.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 624
+    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] > 0
+    assert llm_span_2.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] > 0
+    assert llm_span_2.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] > 0

189-206: Consider consolidating attribute key checks.

The dual-key fallback pattern (GenAIAttributes.GEN_AI_REQUEST_MODEL in ... or "gen_ai.request.model" in ...) is repeated multiple times. If both formats are equivalent (they should be, since GenAIAttributes.GEN_AI_REQUEST_MODEL should equal "gen_ai.request.model"), this redundancy can be simplified.

If they differ due to version compatibility, consider defining a helper function to reduce duplication:

♻️ Optional helper approach
def has_attr(span, *keys):
    """Check if span has any of the provided attribute keys."""
    return any(k in span.attributes for k in keys)

def get_attr(span, *keys):
    """Get first matching attribute value from span."""
    for k in keys:
        if k in span.attributes:
            return span.attributes[k]
    return None

Comment on lines 39 to +50
@pytest.mark.vcr
def test_agents_and_tools(instrument_legacy, span_exporter, log_exporter):
@pytest.mark.asyncio
async def test_agents_and_tools(instrument_legacy, span_exporter, log_exporter):
def multiply(a: int, b: int) -> int:
"""Multiply two integers and returns the result integer"""
return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)
llm = OpenAI(model="gpt-3.5-turbo-0613")
agent = ReActAgent.from_tools([multiply_tool], llm=llm, verbose=True)
llm = OpenAI(model="gpt-4o-mini")
agent = ReActAgent(tools=[multiply_tool], llm=llm, verbose=True, streaming=False)

agent.chat("What is 2 times 3?")
await agent.run("What is 2 times 3?")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the cassettes directory for this test
find . -type d -name "cassettes" | head -5

Repository: traceloop/openllmetry

Length of output: 406


🏁 Script executed:

#!/bin/bash
# List cassette files related to test_agents
find . -path "*/cassettes/*test_agents*" -type f 2>/dev/null | head -10

Repository: traceloop/openllmetry

Length of output: 1346


🏁 Script executed:

#!/bin/bash
# Get the actual test file to verify it matches the snippet
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py | sed -n '39,50p'

Repository: traceloop/openllmetry

Length of output: 624


🏁 Script executed:

#!/bin/bash
# Check directory structure around the test file
ls -la packages/opentelemetry-instrumentation-llamaindex/tests/ | head -20

Repository: traceloop/openllmetry

Length of output: 721


🏁 Script executed:

#!/bin/bash
# Check the model name in the cassettes
grep -i "gpt-4o-mini\|gpt-3.5-turbo" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools_with_events_with_content.yaml | head -5

Repository: traceloop/openllmetry

Length of output: 251


🏁 Script executed:

#!/bin/bash
# Check for API keys, tokens, or secrets in cassettes
grep -i -E "(api.?key|secret|password|bearer|authorization|sk-[a-zA-Z0-9])" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools_with_events_with_content.yaml | head -10

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check the file size and structure of the cassettes
wc -l ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools_*.yaml

Repository: traceloop/openllmetry

Length of output: 355


🏁 Script executed:

#!/bin/bash
# Check if there's a plain test_agents_and_tools.yaml cassette
ls -la ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/ | grep "test_agents_and_tools"

Repository: traceloop/openllmetry

Length of output: 407


🏁 Script executed:

#!/bin/bash
# Check the main test_agents_and_tools.yaml cassette content
head -50 ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml

Repository: traceloop/openllmetry

Length of output: 2923


🏁 Script executed:

#!/bin/bash
# Count occurrences of each model in the main cassette
echo "=== Model references in main cassette ==="
grep -c "gpt-3.5-turbo-0613" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml || echo "gpt-3.5-turbo-0613: 0 occurrences"
grep -c "gpt-4o-mini" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml || echo "gpt-4o-mini: 0 occurrences"

Repository: traceloop/openllmetry

Length of output: 110


🏁 Script executed:

#!/bin/bash
# Check if cassettes scrub sensitive headers properly
head -100 ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml | grep -i "authorization\|api-key\|x-api-key" || echo "No obvious unscubbed auth headers found in first 100 lines"

Repository: traceloop/openllmetry

Length of output: 123


🏁 Script executed:

#!/bin/bash
# Check conftest.py for VCR configuration and filters
cat -n packages/opentelemetry-instrumentation-llamaindex/tests/conftest.py | grep -A 20 -B 5 "vcr\|filter"

Repository: traceloop/openllmetry

Length of output: 1013


🏁 Script executed:

#!/bin/bash
# Show the exact lines with the old model in cassettes
grep -n "gpt-3.5-turbo-0613" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml

Repository: traceloop/openllmetry

Length of output: 348


🏁 Script executed:

#!/bin/bash
# Verify there are zero gpt-4o-mini references in cassettes
grep -n "gpt-4o-mini" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agents_and_tools.yaml | wc -l

Repository: traceloop/openllmetry

Length of output: 66


VCR cassettes must be re-recorded before merging. The test code uses model="gpt-4o-mini" (line 47), but the cassettes still contain gpt-3.5-turbo-0613 at lines 30, 154, and 275 of test_agents_and_tools.yaml. Per coding guidelines, cassettes must be re-recorded when API interactions change to ensure test accuracy. The VCR filter configuration in conftest.py properly scrubs authorization headers, so no secrets are exposed in cassettes.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py`
around lines 39 - 50, The test test_agents_and_tools sets llm =
OpenAI(model="gpt-4o-mini") but existing VCR cassettes still contain responses
for gpt-3.5-turbo-0613; re-record the VCR cassette(s) for this test so they
reflect the new model: run the test suite (or the specific pytest invocation
with VCR enabled) to regenerate test_agents_and_tools.yaml, verify the new
cassette contains gpt-4o-mini entries, and commit the updated cassette; ensure
conftest.py VCR filters remain intact so no secrets are captured.


sql_database = make_sql_table()
llm = Cohere()
llm = Cohere(model="command-a-03-2025")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Cohere command model API 2025 2026 available models list

💡 Result:

Latest available "Command" models (names & notable variants) — sourced from Cohere docs:

  • command-a-03-2025. [1]
  • command-a-vision-07-2025 (multimodal / vision). [1][3]
  • command-a-reasoning-08-2025 (reasoning / agentic). [2]
  • command-a-translate-08-2025 (translation). [4]
  • command-r7b-12-2024. [1]
  • command-r-08-2024 and command-r-plus-08-2024 (listed as recommended replacements for older command models). [5][1]

Note: Cohere has deprecated older command models (e.g., command-r-03-2024, command-r-plus-04-2024, command-light, command). See the deprecation notice for details. [5]

Sources: Cohere model overview and changelog/release notes. [1][2][3][4][5]


🏁 Script executed:

# Check the test_agents.py file for VCR cassette configuration and the full context around line 135
cd packages/opentelemetry-instrumentation-llamaindex || true
head -200 tests/test_agents.py | tail -100

Repository: traceloop/openllmetry

Length of output: 4686


🏁 Script executed:

# Search for VCR cassettes or fixtures related to this test
find . -name "*cassette*" -o -name "*vcr*" 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 1423


🏁 Script executed:

# Check imports in the test file to see how VCR is configured
grep -n "vcr\|cassette\|pytest" tests/test_agents.py | head -30

Repository: traceloop/openllmetry

Length of output: 118


🏁 Script executed:

# Find the test_agents.py file
find . -name "test_agents.py" -type f

Repository: traceloop/openllmetry

Length of output: 209


🏁 Script executed:

# List the cassettes directory for llamaindex
ls -la ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/ 2>/dev/null | head -50

Repository: traceloop/openllmetry

Length of output: 506


🏁 Script executed:

# Search for test_agent_with_multiple_tools in the llamaindex tests
find ./packages/opentelemetry-instrumentation-llamaindex/tests -name "*.py" -exec grep -l "test_agent_with_multiple_tools\|command-a-03-2025" {} \;

Repository: traceloop/openllmetry

Length of output: 137


🏁 Script executed:

# Check the cassettes for test_agents
ls -la ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/ | head -30

Repository: traceloop/openllmetry

Length of output: 1073


🏁 Script executed:

# Read the test_agents.py file around line 135
sed -n '120,150p' ./packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py

Repository: traceloop/openllmetry

Length of output: 1071


🏁 Script executed:

# Check the beginning of test_agents.py for VCR and other configurations
head -50 ./packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py

Repository: traceloop/openllmetry

Length of output: 1961


🏁 Script executed:

# Check if there's a cassette for test_agent_with_multiple_tools
find ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents -name "*multiple*"

Repository: traceloop/openllmetry

Length of output: 465


🏁 Script executed:

# Check the cassettes for any potential secrets or API keys
head -100 ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml

Repository: traceloop/openllmetry

Length of output: 6959


🏁 Script executed:

# Search for API key patterns in the cassette
grep -i "api.key\|authorization\|bearer\|secret" ./packages/opentelemetry-instrumentation-llamaindex/tests/cassettes/test_agents/test_agent_with_multiple_tools.yaml | head -20

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check the test file to see how VCR is configured for filtering
grep -A 20 -B 5 "pytest.ini\|conftest\|vcr_config\|filter" ./packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py | head -40

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Check if there's a conftest.py that configures VCR filtering
find ./packages/opentelemetry-instrumentation-llamaindex/tests -name "conftest.py" -exec cat {} \;

Repository: traceloop/openllmetry

Length of output: 4570


Re-record the VCR cassette for test_agent_with_multiple_tools.

The model command-a-03-2025 is valid, but the cassette contains recorded interactions using the older command-r model. The cassette must be re-recorded to match the updated model specification. VCR filtering is properly configured to exclude sensitive headers (authorization, api-key), so secrets are protected.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py` at
line 135, The VCR cassette for test_agent_with_multiple_tools was recorded
against the old model and must be re-recorded to match the updated Cohere model;
update the test by re-running the test that defines llm =
Cohere(model="command-a-03-2025") (and any helper that initializes the agent) so
VCR captures interactions for "command-a-03-2025" and commit the new cassette;
ensure VCR filtering remains configured to strip authorization/api-key headers
and verify the cassette filename referenced by the test is replaced with the
newly recorded file.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml`:
- Around line 75-76: Add a filelock constraint to the
[tool.uv].constraint-dependencies entry by including "filelock>=3.20.3" and
remove "pip>=25.3" from that list (or move the pip constraint out of
constraint-dependencies into a dev/build tooling configuration if it is truly
needed for build tooling); update the constraint-dependencies array that
currently contains ["urllib3>=2.6.3", "pip>=25.3"] to instead include
"filelock>=3.20.3" and only keep runtime transitive deps like "urllib3>=2.6.3".

In `@packages/opentelemetry-instrumentation-bedrock/pyproject.toml`:
- Line 41: The dependency bump to "vcrpy>=8.0.0,<9" in pyproject.toml requires
regenerating test cassettes because vcrpy v8 rejects legacy v6/v7 cassettes; run
the migration tool (python -m vcr.migration) or delete and re-record cassettes
in the package opentelemetry-instrumentation-bedrock tests directory, then
re-run the test suite to regenerate them; after regeneration ensure you apply
the project's VCR filters/scrubbers to remove any secrets/PII from the new
cassettes before committing.

In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml`:
- Around line 91-92: Update the constraint-dependencies entry: keep
urllib3>=2.6.3 and pip>=25.3 as-is, correct the CVE attribution for
langgraph-checkpoint (it addresses CVE-2025-64439) and add
langgraph-checkpoint-sqlite>=2.0.11 to address the SQLite injection CVEs
(CVE-2025-64104 and CVE-2025-8709); modify the constraint-dependencies list (the
symbol to change is constraint-dependencies) to include
"langgraph-checkpoint-sqlite>=2.0.11" alongside the existing
"langgraph-checkpoint>=4.0.0", and ensure any documentation or commands
referencing package management use the "uv run <command>" pattern per
guidelines.

In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml`:
- Line 11: Update the minimum Python version in this package by setting
requires-python = ">=3.10,<4" in pyproject.toml (the requires-python entry) and
ensure consistency across the package: update any Python classifiers in
pyproject.toml or setup metadata, adjust CI job matrix entries that reference
Python 3.9 to start at 3.10, and verify that any runtime checks or packaging
metadata (e.g., in tool.poetry or project.urls) do not still enumerate 3.9
support so the package metadata and CI reflect the new >=3.10 requirement.

In `@packages/opentelemetry-instrumentation-pinecone/pyproject.toml`:
- Line 12: The release workflow currently tests only Python 3.11 while
pyproject.toml declares requires-python = ">=3.10,<4"; update
.github/workflows/release.yml to include Python 3.10 in the python-version
matrix for all release jobs (the Python matrix entries around the job
definitions at lines referenced in the review) so the declared minimum is
validated, and add a classifiers section to the project's pyproject.toml (add
the classifiers key under the project metadata) including at minimum
"Programming Language :: Python :: 3.10" and "Programming Language :: Python ::
3.11" to make supported versions explicit in package metadata.
- Line 44: Update the pytest-recording constraint in pyproject.toml to a version
that declares vcrpy 8 support (e.g., change pytest-recording to ">=0.13.2" or
preferably ">=0.13.4"), then re-record the VCR cassettes used by the Pinecone
tests (notably test_pinecone_retrieval.yaml) using vcrpy 8 to accommodate
httpx/httpcore and proxy/redirect behavior changes; after re-recording, verify
cassettes have sensitive data scrubbed (ensure filtered Authorization header and
other secrets are properly filtered) before committing.

In `@packages/opentelemetry-instrumentation-together/pyproject.toml`:
- Line 43: Upgrade to vcrpy 8.0.0 requires re-recording and scrubbing test
cassettes; re-run the tests that use pytest-recording to re-record the cassettes
under tests/cassettes/test_chat/ and tests/cassettes/test_completion/ (ensure
record_mode is set to 'all' or appropriate in your pytest-recording/conftest.py
fixture), then sanitize those new cassettes using pytest-recording filters (or
the VCR.py before_record/after_record hooks configured in conftest.py) to remove
keys, tokens, PII and any headers (e.g., Authorization, Cookie, API keys) before
committing; verify playback succeeds with urllib3>=2 and Python 3.10+.

In `@packages/opentelemetry-instrumentation-writer/pyproject.toml`:
- Around line 75-76: The constraint-dependencies entry in this package's
[tool.uv] section is ineffective because uv only reads constraints from the
workspace root; move the line constraint-dependencies = ["urllib3>=2.6.3",
"pyarrow>=18.1.0", "pip>=25.3"] out of
packages/opentelemetry-instrumentation-writer/pyproject.toml and add it to the
root pyproject.toml under its [tool.uv] section so the uv resolver will honor
these constraints.
♻️ Duplicate comments (6)
packages/sample-app/pyproject.toml (1)

35-36: Remove unused llama-index packages to reduce attack surface.

These two dependencies still look unused in the sample app. If no imports exist, remove them to avoid unnecessary installs and CVE exposure.

🧹 Proposed cleanup
-  "llama-index-embeddings-openai>=0.5.0,<0.6.0",
-  "llama-index-llms-huggingface>=0.6.0,<0.7.0",

Run this to confirm there are no usages:

#!/bin/bash
rg -n "llama_index\.embeddings\.openai|OpenAIEmbedding" 
rg -nP "llama_index\.llms\.huggingface(?!_api)"
packages/opentelemetry-instrumentation-openai-agents/pyproject.toml (1)

74-76: constraint-dependencies not enforced in per-package files.

As noted in a previous review, uv's constraint-dependencies is only read from the workspace root pyproject.toml, not from individual package files. This configuration will have no effect here.

packages/opentelemetry-instrumentation-mistralai/pyproject.toml (1)

42-42: vcrpy upgrade to 8.0.0 for CVE remediation.

The vcrpy version bump addresses CVE warnings from urllib3 1.x by requiring urllib3 >= 2.

The need to re-record VCR cassettes after this upgrade was already flagged in a previous review.

packages/opentelemetry-instrumentation-openai/pyproject.toml (1)

12-12: Python 3.10+ and vcrpy 8.x changes are consistent.

The Python version bump to 3.10+ aligns with vcrpy 8.x requirements (which dropped Python 3.9 support). Ensure test cassettes have been re-recorded if any tests fail due to cassette format changes.

Also applies to: 42-42

packages/opentelemetry-instrumentation-langchain/pyproject.toml (1)

37-49: Vulnerable version ranges already flagged in previous review.

The CVE concerns with langchain>=1.0.0 and langgraph>=1.0.0 allowing vulnerable versions have been identified in a prior review comment. Please address that feedback by tightening these constraints to exclude known vulnerable releases.

packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1)

12-12: Dropping Python 3.9 is a breaking change — see previous review comment.

The past review correctly identified that this change is unnecessary for CVE remediation. The filelock CVEs affect the filelock package version, not the Python interpreter version. The suggested fix was to add filelock>=3.20.3 to constraint-dependencies instead.

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-lancedb/pyproject.toml (1)

37-37: Consider adding an upper bound to the lancedb dependency.

The lancedb test dependency was changed from >=0.9.0,<0.10.0 to >=0.26.0 without an upper bound. This significant version jump (0.9.x → 0.26.x) may include breaking API changes, and removing the upper bound could lead to unexpected test failures when new major versions are released.

♻️ Suggested fix
-  "lancedb>=0.26.0",
+  "lancedb>=0.26.0,<0.27.0",

Comment on lines +75 to +76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "pip>=25.3"]

⚠️ Potential issue | 🟠 Major

Missing filelock constraint; pip constraint seems unnecessary.

If the Python 3.10 bump was motivated by filelock CVEs (as mentioned in the PR title for CVE remediation), the filelock>=3.20.3 constraint should be added here — this was the suggested fix from the past review to avoid dropping Python 3.9.

Additionally, pip>=25.3 is an unusual constraint since pip is not typically a transitive runtime dependency. If this is for development/build tooling, it may not need to be in constraint-dependencies.

Suggested fix
 [tool.uv]
-constraint-dependencies = ["urllib3>=2.6.3", "pip>=25.3"]
+constraint-dependencies = ["urllib3>=2.6.3", "filelock>=3.20.3"]
#!/bin/bash
# Check if pip is actually a transitive dependency in this package
rg -l "pip" packages/opentelemetry-instrumentation-anthropic/ --type py || echo "No pip references found in source"

# Check if filelock is used as a transitive dependency
rg "filelock" packages/opentelemetry-instrumentation-anthropic/ || echo "No direct filelock references"
🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml` around lines
75 - 76, Add a filelock constraint to the [tool.uv].constraint-dependencies
entry by including "filelock>=3.20.3" and remove "pip>=25.3" from that list (or
move the pip constraint out of constraint-dependencies into a dev/build tooling
configuration if it is truly needed for build tooling); update the
constraint-dependencies array that currently contains ["urllib3>=2.6.3",
"pip>=25.3"] to instead include "filelock>=3.20.3" and only keep runtime
transitive deps like "urllib3>=2.6.3".

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find test cassettes in the bedrock instrumentation package
find packages/opentelemetry-instrumentation-bedrock -type f \( -name "*.yaml" -o -name "*.json" -o -name "*cassette*" \) 2>/dev/null | head -20

Repository: traceloop/openllmetry

Length of output: 2802


🏁 Script executed:

# Also check the overall test structure for bedrock
find packages/opentelemetry-instrumentation-bedrock -type d -name "tests" -o -name "test" 2>/dev/null

Repository: traceloop/openllmetry

Length of output: 117


🏁 Script executed:

# List contents of test directory if it exists
ls -la packages/opentelemetry-instrumentation-bedrock/tests/ 2>/dev/null || echo "No tests directory"

Repository: traceloop/openllmetry

Length of output: 411


🌐 Web query:

vcrpy 8.0.0 breaking changes migration from v6 v7 cassette format

💡 Result:

Short answer

  • VCR.py 8 uses the modern "versioned" cassette format (cassette format version = 1) and will reject older un‑versioned/legacy cassettes. If you have cassettes produced by much older VCR.py releases, they're not compatible and must be migrated or re‑recorded. [1][2]

What to do when migrating from v6/v7

  • Easiest: delete existing cassette files and re-run tests so VCR.py re‑records them. This is the recommended approach if re‑recording is feasible. [1][2]
  • Migration helper: VCR.py includes a migration script that attempts to convert old cassette files to the new format:
    python -m vcr.migration PATH
    (PATH = a cassette file or directory). Back up cassettes before running it. [1][2]

Notes and gotchas

  • Custom serializers: if you implemented a custom serializer/deserializer, update it to the newer API (serializers take dict -> string; deserializers take string -> dict). [1]
  • v6.x warning: some v6 releases noted binary format issues and recommended re‑creating affected cassettes — if you saw that in v6 you may still need to re‑record even after attempting migration. [3]
  • If migration fails the safest route is to delete + re‑record. The serialize implementation will raise an error when it detects truly old cassette structures. [2]

References

  1. VCR.py docs — Upgrade / New Cassette Format and serializer notes. [vcrpy docs]. [1]
  2. vcr.serialize source (cassette format version, detection + error message and serialize/deserialize behavior). [2]
  3. vcrpy 6.0.0 changelog (note about possible need to recreate cassettes due to binary format). [3]

Would you like me to run the migration for a specific cassette directory (I can show the exact command and checks) or inspect a sample cassette to see if it needs migration?
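
The serializer API change called out in the notes can be sketched as follows (the class name and JSON choice are illustrative; only the dict-to-string contract comes from the upgrade notes, and registration via vcr.register_serializer is assumed rather than shown):

```python
import json


class JsonCassetteSerializer:
    """Sketch of a vcrpy-8-style custom serializer.

    Per the upgrade notes, serialize takes a cassette dict and returns a
    string, and deserialize takes a string and returns a dict.
    """

    def serialize(self, cassette_dict: dict) -> str:
        # Produce the on-disk cassette text from the in-memory dict
        return json.dumps(cassette_dict, indent=2)

    def deserialize(self, cassette_string: str) -> dict:
        # Parse the on-disk cassette text back into a dict
        return json.loads(cassette_string)
```

A quick round-trip (serialize then deserialize) should return the original dict unchanged.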


🏁 Script executed:

# Inspect a sample cassette to check if it has version header (versioned format)
head -20 "packages/opentelemetry-instrumentation-bedrock/tests/metrics/cassettes/test_bedrock_metrics/test_invoke_model_metrics.yaml"

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Check another cassette from traces
head -20 "packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_anthropic/test_anthropic_2_completion.yaml"

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Search for "version:" field in cassettes to see if they're versioned
rg "^version:" packages/opentelemetry-instrumentation-bedrock/tests/ --type yaml | head -5

Repository: traceloop/openllmetry

Length of output: 151


vcrpy 8.x upgrade requires cassette regeneration.

This major version bump (6 → 8, skipping v7) changes the cassette format. vcrpy 8 uses the versioned cassette format and rejects legacy cassettes from v6/v7. The existing test cassettes must be regenerated by deleting them and re-running the tests, or by using the migration script (python -m vcr.migration packages/opentelemetry-instrumentation-bedrock/tests/). Ensure regenerated cassettes are scrubbed of any secrets or PII using VCR filters.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-bedrock/pyproject.toml` at line 41,
The dependency bump to "vcrpy>=8.0.0,<9" in pyproject.toml requires regenerating
test cassettes because vcrpy v8 rejects legacy v6/v7 cassettes; run the
migration tool (python -m vcr.migration) or delete and re-record cassettes in
the package opentelemetry-instrumentation-bedrock tests directory, then re-run
the test suite to regenerate them; after regeneration ensure you apply the
project's VCR filters/scrubbers to remove any secrets/PII from the new cassettes
before committing.

Comment on lines +91 to +92
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "langgraph-checkpoint>=4.0.0", "pip>=25.3"]


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

urllib3 2.6.3 CVE vulnerabilities security fixes

💡 Result:

Short answer: urllib3 2.6.3 patches a high‑severity decompression‑bomb bypass (CVE‑2026‑21441) that affected the streaming API when following HTTP redirects. The release note also includes a behavioral change for Retry‑After handling and other bug fixes — upgrade to 2.6.3. [1][2][3]

Sources

  • urllib3 2.6.3 release notes (GitHub). [1]
  • NVD / CVE‑2026‑21441 details. [2]
  • Fedora / advisory summary noting the same fix. [3]

🌐 Web query:

langgraph-checkpoint 4.0.0 CVE-2025-64104 CVE-2025-8709 SQLite injection

💡 Result:

Short answer: The two CVEs (CVE-2025-64104 and CVE-2025-8709) describe SQL‑injection vulnerabilities in LangGraph’s SQLite checkpoint/store implementation (improper string concatenation of filter operators). They affect the langgraph-checkpoint-sqlite package (known affected release: 2.0.10) and were fixed in 2.0.11 — so upgrade to >= 2.0.11 or apply the upstream patch. [1][2][3]

References:

  • NVD entry / summary for CVE-2025-8709. [1]
  • GitHub Advisory for the SQLite store SQL‑injection (patch and fixed release). [2]
  • OpenCVE / advisory summary for CVE-2025-64104 (notes fix in 2.0.11). [3]

🌐 Web query:

pip 25.3 CVE security vulnerabilities

💡 Result:

Short answer: pip 25.3 fixes a known tar-extraction symlink vulnerability tracked as CVE-2025-8869 (affects pip versions before 25.3). The flaw could allow a malicious sdist to overwrite files outside the extraction directory; upgrading to pip 25.3 (or using a Python that implements PEP 706) is the recommended remediation. [1][2][3]

Sources:

  • Advisory summary of the issue and fix. [1]
  • Ubuntu security notice describing CVE-2025-8869 and mitigations (upgrade pip or use Python with PEP 706). [2]
  • Debian security tracker showing the fix in pip 25.3+dfsg-1. [3]

🌐 Web query:

langgraph-checkpoint changelog 4.0.0 security

💡 Result:

Summary — langgraph-checkpoint v4.0.0 (security items)

  • v4.0.0 released on Jan 12, 2026; release notes show bug fixes (no new RCE patch text in the tag itself). [1]
  • A prior critical RCE in the JsonPlusSerializer (CVE‑2025‑64439) affected langgraph‑checkpoint < 3.0; upgrading to 3.0+ (and thus 4.0.0) removes unsafe deserialization of custom objects in the legacy "json" mode. [2][3]
  • Security scanners / databases report no known vulnerabilities in 4.0.0. [4][5]

Action: upgrade to langgraph‑checkpoint >= 3.0 (4.0.0 recommended) and, if you must deserialize old “json”-mode payloads, audit/whitelist allowed modules via the serde config. [2][5]

Sources:
[1] GitHub release: langchain-ai/langgraph — langgraph-checkpoint==4.0.0 (Jan 12, 2026).
[2] GitLab advisory / CVE‑2025‑64439 — JsonPlusSerializer RCE; fixed in 3.0+.
[3] LangGraph docs changelog noting requirement of langgraph-checkpoint >3.0 to avoid deserialization vuln.
[4] Safety DB / package page — latest version 4.0.0 with no known vulnerabilities.
[5] Snyk package page — no known security issues in 4.0.0.


Correct the langgraph-checkpoint CVE attribution and add missing SQLite checkpoint constraint.

The constraint-dependencies has an inaccurate CVE attribution:

  • urllib3>=2.6.3 correctly addresses CVE-2026-21441 (decompression-bomb bypass)
  • pip>=25.3 correctly addresses CVE-2025-8869 (tar-extraction symlink vulnerability)
  • langgraph-checkpoint>=4.0.0 does NOT fix CVE-2025-64104 or CVE-2025-8709 — those SQLite injection CVEs affect langgraph-checkpoint-sqlite 2.0.10 and are fixed in langgraph-checkpoint-sqlite 2.0.11. The langgraph-checkpoint 4.0.0 constraint instead fixes CVE-2025-64439 (JsonPlusSerializer RCE in versions < 3.0).

To address the SQLite injection CVEs, add langgraph-checkpoint-sqlite>=2.0.11 to the constraint-dependencies. Note that these constraints only apply when using the uv package manager; pip users will not benefit from them. Per coding guidelines, ensure package management commands use uv run <command>.
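A corrected constraint block, per the attribution above, might look like this (a sketch; the version floors are taken from the advisories cited in this comment):

```toml
[tool.uv]
constraint-dependencies = [
    "urllib3>=2.6.3",                       # CVE-2026-21441
    "pip>=25.3",                            # CVE-2025-8869
    "langgraph-checkpoint>=4.0.0",          # CVE-2025-64439 (JsonPlusSerializer RCE)
    "langgraph-checkpoint-sqlite>=2.0.11",  # CVE-2025-64104 / CVE-2025-8709
]
```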

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-langchain/pyproject.toml` around lines
91 - 92, Update the constraint-dependencies entry: keep urllib3>=2.6.3 and
pip>=25.3 as-is, correct the CVE attribution for langgraph-checkpoint (it
addresses CVE-2025-64439) and add langgraph-checkpoint-sqlite>=2.0.11 to address
the SQLite injection CVEs (CVE-2025-64104 and CVE-2025-8709); modify the
constraint-dependencies list (the symbol to change is constraint-dependencies)
to include "langgraph-checkpoint-sqlite>=2.0.11" alongside the existing
"langgraph-checkpoint>=4.0.0", and ensure any documentation or commands
referencing package management use the "uv run <command>" pattern per
guidelines.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Python version requirements across all pyproject.toml files in the repo
echo "=== Python version requirements across packages ==="
fd -t f 'pyproject.toml' | xargs grep -l 'requires-python' | while read f; do
  echo "--- $f ---"
  grep 'requires-python' "$f"
done

Repository: traceloop/openllmetry

Length of output: 3391


🌐 Web query:

Python 3.9 end-of-life date official announcement

💡 Result:

Official end-of-life date for Python 3.9: October 31, 2025. [1][2]

Sources:

  • PEP 596 (Python 3.9 release schedule / lifespan). [1]
  • Python.org release notice for Python 3.9.25 (marks 3.9 reached EOL on 2025-10-31). [2]

Python 3.10 minimum version bump is appropriate.

Dropping Python 3.9 support is well-justified: Python 3.9 reached end-of-life on October 31, 2025, and this change aligns with the broader ecosystem update. However, note that the transition is still in progress—while most packages now require Python 3.10+, three packages still support 3.9:

  • opentelemetry-instrumentation-milvus
  • opentelemetry-instrumentation-qdrant
  • opentelemetry-semantic-conventions-ai

The change in openai-agents is consistent with the majority of the codebase.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-openai-agents/pyproject.toml` at line
11, Update the minimum Python version in this package by setting requires-python
= ">=3.10,<4" in pyproject.toml (the requires-python entry) and ensure
consistency across the package: update any Python classifiers in pyproject.toml
or setup metadata, adjust CI job matrix entries that reference Python 3.9 to
start at 3.10, and verify that any runtime checks or packaging metadata (e.g.,
in tool.poetry or project.urls) do not still enumerate 3.9 support so the
package metadata and CI reflect the new >=3.10 requirement.

license = "Apache-2.0"
readme = "README.md"
requires-python = ">=3.9,<4"
requires-python = ">=3.10,<4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check packaging metadata and CI for Python version declarations
echo "=== Checking requires-python declarations ==="
rg -n "requires-python" --type toml --type cfg
echo -e "\n=== Checking python_requires declarations ==="
rg -n "python_requires" --type py --type cfg
echo -e "\n=== Checking Programming Language classifiers ==="
rg -n "Programming Language :: Python" --type toml --type cfg
echo -e "\n=== Checking python-version in GitHub workflows ==="
rg -n "python-version" .github/workflows/

Repository: traceloop/openllmetry

Length of output: 1158


🏁 Script executed:

#!/bin/bash
# Display the specific pinecone pyproject.toml file
echo "=== Pinecone package pyproject.toml ==="
cat -n packages/opentelemetry-instrumentation-pinecone/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 2479


🏁 Script executed:

#!/bin/bash
# Check if there are any runtime Python version checks in the pinecone instrumentation
echo "=== Searching for sys.version checks in pinecone package ==="
fd . packages/opentelemetry-instrumentation-pinecone -type f \( -name "*.py" \)
rg "sys.version" packages/opentelemetry-instrumentation-pinecone

Repository: traceloop/openllmetry

Length of output: 313


🏁 Script executed:

#!/bin/bash
# Get context around the CI workflow python-version configurations
echo "=== ci.yml context around python-version declarations ==="
sed -n '50,85p' .github/workflows/ci.yml
echo ""
sed -n '135,170p' .github/workflows/ci.yml

Repository: traceloop/openllmetry

Length of output: 2591


🏁 Script executed:

#!/bin/bash
# Check if classifiers exist in the pinecone pyproject.toml
echo "=== Check for classifiers in pinecone pyproject.toml ==="
grep -n "classifiers" packages/opentelemetry-instrumentation-pinecone/pyproject.toml
# If not found, check a few other packages
echo -e "\n=== Check other packages for requires-python and classifiers ==="
for pkg in packages/opentelemetry-instrumentation-*/pyproject.toml; do
  echo "--- $pkg ---"
  grep -n "requires-python\|classifiers" "$pkg" | head -5
done | head -50

Repository: traceloop/openllmetry

Length of output: 2780


🏁 Script executed:

#!/bin/bash
# Check release.yml for which package versions it releases
echo "=== release.yml workflow content (relevant sections) ==="
sed -n '1,50p' .github/workflows/release.yml
echo ""
echo "=== release.yml job configurations ==="
grep -n "python-version\|needs\|jobs:" .github/workflows/release.yml

Repository: traceloop/openllmetry

Length of output: 1641


Align release workflow with Python 3.10 minimum requirement.

Line 12 declares requires-python = ">=3.10,<4", and CI does test on 3.10 in the test matrix. However, .github/workflows/release.yml (lines 22, 63, 104, 137) uses only Python 3.11 across all release jobs. Consider adding Python 3.10 to the release workflow matrix to explicitly validate the declared minimum before publishing.

Additionally, no classifiers are present in the project's pyproject.toml files. Consider adding classifiers metadata (e.g., Programming Language :: Python :: 3.10, Programming Language :: Python :: 3.11) to clarify version support in package metadata.
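For example, a classifiers block matching the declared minimum could look like the following (illustrative; which upper versions to enumerate is the maintainers' call):

```toml
[project]
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
```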

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-pinecone/pyproject.toml` at line 12,
The release workflow currently tests only Python 3.11 while pyproject.toml
declares requires-python = ">=3.10,<4"; update .github/workflows/release.yml to
include Python 3.10 in the python-version matrix for all release jobs (the
Python matrix entries around the job definitions at lines referenced in the
review) so the declared minimum is validated, and add a classifiers section to
the project's pyproject.toml (add the classifiers key under the project
metadata) including at minimum "Programming Language :: Python :: 3.10" and
"Programming Language :: Python :: 3.11" to make supported versions explicit in
package metadata.

"pytest-sugar==1.0.0",
"pytest>=8.2.2,<9",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

pytest-recording vcrpy 8.x compatibility breaking changes

💡 Result:

Short summary — what changed in VCR.py 8 that can break pytest-recording users

  • VCR.py 8 dropped support for Python 3.9 and also requires urllib3 >= 2 (i.e., dropped urllib3 < 2). If your environment uses Python 3.9 or older urllib3, upgrading to v8 will be incompatible. [1]

  • VCR.py 8 contains internal/behaviour changes (rewritten httpx support to patch httpcore, fixes to proxy/redirect handling, new options such as drop_unused_requests) that may require re-recording cassettes or updating code that relied on prior vcrpy internals. [1]

  • pytest-recording itself has recent releases and packaging updates tied to newer Python versions; confirm you have a pytest-recording release that declares compatibility with vcrpy 8 (or that your test environment meets vcrpy8 requirements). If you depend on an older pytest-recording pinned to older vcrpy versions, upgrading vcrpy to 8 may break things until pytest-recording explicitly supports v8. Check/upgrade pytest-recording to a release that supports vcrpy 8 or keep vcrpy <8. [2][3]

What to do

  • If you plan to upgrade to vcrpy 8: ensure your Python version >= the one v8 supports and urllib3 >= 2, update pytest-recording to a release that declares compatibility, and re-run/ re-record cassettes where behavior changed (httpx, proxy/redirects). [1][2]

Sources

  • vcrpy v8.0.0 release notes (breaking changes: drop Python 3.9, drop urllib3 <2; httpx/httpcore rework; new options). [1]
  • pytest-recording project / PyPI (project and releases; check package metadata for Python/version constraints). [2][3]

References
[1] vcrpy v8 release notes / changelog.
[2] pytest-recording GitHub / project README.
[3] pytest-recording PyPI release metadata.


🏁 Script executed:

# First, let's examine the pyproject.toml file and surrounding context
cat packages/opentelemetry-instrumentation-pinecone/pyproject.toml | head -60

Repository: traceloop/openllmetry

Length of output: 186


🏁 Script executed:

# Check if there are test files and cassettes in the pinecone instrumentation
find packages/opentelemetry-instrumentation-pinecone -type f -name "*.py" -path "*/tests/*" | head -20

Repository: traceloop/openllmetry

Length of output: 264


🏁 Script executed:

# Look for VCR cassette files
find packages/opentelemetry-instrumentation-pinecone -type f \( -name "*.yaml" -o -name "*.json" \) | grep -i cassette

Repository: traceloop/openllmetry

Length of output: 168


🏁 Script executed:

# Check the dependencies more carefully
grep -A 10 "dependencies\|pytest-recording\|vcrpy" packages/opentelemetry-instrumentation-pinecone/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 1040


🌐 Web query:

pytest-recording 0.13.1 vcrpy 8 compatibility support

💡 Result:

Short answer: pytest-recording 0.13.1 (released Dec 6, 2023) predates VCR.py v8 and does not explicitly guarantee compatibility with vcrpy 8 — upgrade to a later pytest-recording (0.13.2+ / recommended 0.13.4) for better chance of compatibility. [1][2][3]

Sources:

  • pytest-recording release listing (0.13.1 date). [1]
  • pytest-recording project page / README (integration with VCR.py; no explicit vcrpy‑8 guarantee on older releases). [2]
  • VCR.py (vcrpy) documentation (v8 series). [3]

References:
[1] piwheels release list for pytest-recording (0.13.1 — 2023-12-06).
[2] pytest-recording project / PyPI pages.
[3] vcrpy documentation (v8).


🏁 Script executed:

# Check conftest.py to see VCR configuration
cat packages/opentelemetry-instrumentation-pinecone/tests/conftest.py

Repository: traceloop/openllmetry

Length of output: 1893


🏁 Script executed:

# Examine the cassette file - check its format and timestamp if available
head -50 packages/opentelemetry-instrumentation-pinecone/tests/cassettes/test_query/test_pinecone_retrieval.yaml

Repository: traceloop/openllmetry

Length of output: 1605


🏁 Script executed:

# Check git history of cassette to see if it was recently updated
cd packages/opentelemetry-instrumentation-pinecone && git log --oneline -n 10 tests/cassettes/test_query/test_pinecone_retrieval.yaml 2>/dev/null || echo "Git history not available"

Repository: traceloop/openllmetry

Length of output: 93


🏁 Script executed:

# Examine test_query.py to see how VCR is used
cat packages/opentelemetry-instrumentation-pinecone/tests/test_query.py

Repository: traceloop/openllmetry

Length of output: 7137


Upgrade pytest-recording to support vcrpy 8 and re-record cassettes.

The pinned range pytest-recording>=0.13.1,<0.14.0 resolves to a December 2023 release that predates vcrpy 8 and does not declare compatibility with it; upgrade to pytest-recording>=0.13.2 (preferably 0.13.4+), which does. Additionally, vcrpy 8 introduced breaking changes (httpx/httpcore rewrite, proxy/redirect handling fixes) that require cassettes to be re-recorded. Verify that test_pinecone_retrieval.yaml has been re-recorded with vcrpy 8, and confirm the cassettes contain no secrets (the filtered authorization header and other sensitive data should be properly scrubbed).

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-pinecone/pyproject.toml` at line 44,
Update the pytest-recording constraint in pyproject.toml to a version that
declares vcrpy 8 support (e.g., change pytest-recording to ">=0.13.2" or
preferably ">=0.13.4"), then re-record the VCR cassettes used by the Pinecone
tests (notably test_pinecone_retrieval.yaml) using vcrpy 8 to accommodate
httpx/httpcore and proxy/redirect behavior changes; after re-recording, verify
cassettes have sensitive data scrubbed (ensure filtered Authorization header and
other secrets are properly filtered) before committing.

"pytest>=8.2.2,<9",
"together>=1.2.0,<2",
"vcrpy>=6.0.1,<7",
"vcrpy>=8.0.0,<9",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, examine the pyproject.toml file to understand the change and context
head -n 50 packages/opentelemetry-instrumentation-together/pyproject.toml | tail -n 15

Repository: traceloop/openllmetry

Length of output: 384


🏁 Script executed:

# Search for any test configuration or cassette-related setup
fd cassettes packages/opentelemetry-instrumentation-together/tests

Repository: traceloop/openllmetry

Length of output: 129


🏁 Script executed:

# Look for vcrpy usage in test files
rg "vcr\." packages/opentelemetry-instrumentation-together/tests -l

Repository: traceloop/openllmetry

Length of output: 47


🌐 Web query:

vcrpy 8.0.0 breaking changes between 6.x and 8.x release notes

💡 Result:

Summary of breaking changes you need to know (from v6.x → v8.0.0):

  • v6.0.0 (notable BREAKING items):

    • Fix to httpx support that may require re-recording cassettes because a binary format changed. [1]
    • Drop support for boto (vcrpy still supports boto3). [1]
    • Drop simplejson dependency. [1]
  • v8.0.0 (notable BREAKING items):

    • Drop support for Python 3.9. [2]
    • Drop support for urllib3 < 2 (moves to urllib3 v2+ to address CVE/warning issues). [2]

Sources:


🌐 Web query:

vcrpy library pypi version history changelog

💡 Result:

You can find vcrpy's PyPI release history and files on its PyPI page, and its detailed changelog on the project docs/GitHub:

  • PyPI release history (all released versions, upload dates, and downloadable files). [1]
  • Official changelog (per-release notes) in the vcrpy documentation on Read the Docs. [2]
  • Alternate aggregated changelog/version list (LibHunt). [3]

Links:
[1] PyPI project page — vcrpy (release history & files).
[2] vcrpy Changelog — Read the Docs (per-release notes).
[3] VCR.py changelog overview — LibHunt.


🏁 Script executed:

# Check Python version requirement in pyproject.toml
rg "requires-python" packages/opentelemetry-instrumentation-together/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Check urllib3 version requirement
rg "urllib3" packages/opentelemetry-instrumentation-together/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Check if there's pytest-recording configuration or cassette setup
head -n 100 packages/opentelemetry-instrumentation-together/tests/conftest.py 2>/dev/null || echo "No conftest.py found"

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

# Look for any cassette files to see if they might be stale
find packages/opentelemetry-instrumentation-together/tests/cassettes -name "*.yaml" -o -name "*.json" | head -10

Repository: traceloop/openllmetry

Length of output: 151


🏁 Script executed:

grep "requires-python" packages/opentelemetry-instrumentation-together/pyproject.toml

Repository: traceloop/openllmetry

Length of output: 94


🏁 Script executed:

cat packages/opentelemetry-instrumentation-together/pyproject.toml | grep -A 20 "dependencies"

Repository: traceloop/openllmetry

Length of output: 973


🏁 Script executed:

ls -la packages/opentelemetry-instrumentation-together/tests/cassettes/ | head -15

Repository: traceloop/openllmetry

Length of output: 295


Verify vcrpy 8.0.0 cassettes are re-recorded and clean.

The upgrade dependencies are compatible: Python 3.10+ is required (satisfies vcrpy 8.0.0's drop of Python 3.9), and urllib3 >=2.6.3 is already specified (meets vcrpy's urllib3 >= 2 requirement). Since the tests go through the pytest-recording wrapper, direct API impact should be minimal.

Re-record VCR cassettes in tests/cassettes/test_chat/ and tests/cassettes/test_completion/ to ensure compatibility, and ensure all cassettes are scrubbed of secrets or PII using pytest-recording's filtering mechanisms.

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-together/pyproject.toml` at line 43,
Upgrade to vcrpy 8.0.0 requires re-recording and scrubbing test cassettes;
re-run the tests that use pytest-recording to re-record the cassettes under
tests/cassettes/test_chat/ and tests/cassettes/test_completion/ (ensure
record_mode is set to 'all' or appropriate in your pytest-recording/conftest.py
fixture), then sanitize those new cassettes using pytest-recording filters (or
the VCR.py before_record/after_record hooks configured in conftest.py) to remove
keys, tokens, PII and any headers (e.g., Authorization, Cookie, API keys) before
committing; verify playback succeeds with urllib3>=2 and Python 3.10+.

Comment on lines +75 to +76
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0", "pip>=25.3"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check workspace structure and all [tool.uv] sections
echo "=== Files matching pyproject.toml in repo ==="
fd 'pyproject.toml' --type f

echo -e "\n=== All [tool.uv] sections ==="
rg -B2 '\[tool\.uv\]' --glob 'pyproject.toml' -A3

echo -e "\n=== All constraint-dependencies occurrences ==="
rg 'constraint-dependencies' --glob 'pyproject.toml' -B1 -A1

Repository: traceloop/openllmetry

Length of output: 186


🌐 Web query:

uv constraint-dependencies workspace root scope behavior

💡 Result:

Short answer: uv only reads constraint-related settings from the workspace root pyproject.toml. Specifically, constraint-dependencies (and build-constraint-dependencies, override-dependencies, and exclude-dependencies) set at the workspace root are applied by uv commands like uv lock / uv sync / uv run; uv ignores those same keys when declared in non-root workspace members or in uv.toml files. See the docs for the exact settings and the same note in the locking docs. [1][2]

Sources

  • uv reference — settings (constraint-dependencies / build-constraint-dependencies / override-dependencies notes). [1]
  • uv locking docs (how constraint files are used and the note that uv appends constraint-dependencies from the workspace root). [2]

🌐 Web query:

uv package manager pyproject.toml constraint-dependencies non-root package

💡 Result:

Short answer: uv only reads constraint-dependencies from the workspace root. To apply constraints to a non-root (workspace member) package, put the constraint-dependencies in the root pyproject.toml (or use a constraints file passed to uv pip compile). uv will ignore constraint-dependencies declared inside other workspace members or uv.toml files. [1][2]

Example (root pyproject.toml):
[tool.uv]
constraint-dependencies = ["somepkg<2.0"] # applies to resolution for workspace members

If you need build-time constraints, use build-constraint-dependencies at the workspace root or a build constraints file. [1][3]

Sources:

  • uv settings (constraint-dependencies / build-constraint-dependencies). [1]
  • uv virtual/workspace dependency behavior and using sources/constraints. [2]
  • troubleshooting / build-constraint-dependencies example. [3]

🌐 Web query:

uv monorepo constraint-dependencies inheritance or propagation

💡 Result:

Short answer: uv's constraint-dependencies do not inherit/propagate from workspace members — uv (uv lock / uv sync / uv run) reads constraint-dependencies (and build-constraint-dependencies) only from the pyproject.toml at the workspace root and ignores declarations in other workspace members or uv.toml files. Use the root pyproject.toml to declare constraints for the whole monorepo. [1][2]

References:

  • uv settings docs (constraint-dependencies / build-constraint-dependencies note). [1]
  • uv resolution docs (how constraints are applied during resolution). [2]

[1] https://docs.astral.sh/uv/reference/settings/
[2] https://docs.astral.sh/uv/concepts/resolution/


Move constraint-dependencies to the workspace root pyproject.toml.

uv only reads constraint-dependencies from the workspace root; constraints declared in non-root workspace members (like this package) are silently ignored. Move constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0", "pip>=25.3"] to the [tool.uv] section in the root pyproject.toml so these constraints actually take effect during dependency resolution.
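Concretely, the root pyproject.toml would carry the block instead (a sketch; merge with whatever constraints the root already declares):

```toml
# Root pyproject.toml: uv only honors constraints declared here,
# not in workspace members.
[tool.uv]
constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0", "pip>=25.3"]
```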

🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-writer/pyproject.toml` around lines 75
- 76, The constraint-dependencies entry in this package's [tool.uv] section is
ineffective because uv only reads constraints from the workspace root; move the
line constraint-dependencies = ["urllib3>=2.6.3", "pyarrow>=18.1.0",
"pip>=25.3"] out of packages/opentelemetry-instrumentation-writer/pyproject.toml
and add it to the root pyproject.toml under its [tool.uv] section so the uv
resolver will honor these constraints.

@ellipsis-dev
Contributor

ellipsis-dev bot commented Jan 18, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@galkleinman galkleinman merged commit adba2bf into main Jan 18, 2026
12 of 14 checks passed