Skip to content

Conversation

@hassiebp
Copy link
Contributor

@hassiebp hassiebp commented Nov 6, 2024

Important

Move serialization to background threads, improve logging, and add performance overhead test for Langfuse tracing.

  • Behavior:
    • Move serialization to background threads in langfuse/task_manager.py.
    • Handle serialization errors in Consumer._next() in langfuse/task_manager.py.
    • Add _filter_io_from_event_body() to filter input and output from logs in langfuse/client.py.
  • Logging:
    • Update logging to exclude input and output in langfuse/client.py and langfuse/task_manager.py.
    • Add debug logs for batch processing in Consumer class in langfuse/task_manager.py.
  • Testing:
    • Add test_langfuse_overhead() in tests/test_langchain.py to measure performance overhead of Langfuse tracing.
  • Misc:
    • Remove unused kwargs_log in _log_debug_event() in langfuse/callback/langchain.py.

This description was created by Ellipsis for 3a22aee. It will automatically update as commits are pushed.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: Experimental PR review

PR Summary

This PR optimizes performance by moving heavy operations like serialization, sampling and masking to background threads in the Langfuse Python SDK.

  • Moves Pydantic model serialization from main thread to Consumer threads in langfuse/task_manager.py to reduce synchronous processing overhead
  • Removes detailed kwargs logging from debug events in langfuse/callback/langchain.py to minimize logging overhead
  • Adds filtering of sensitive input/output data from debug logs in langfuse/client.py for better security
  • Introduces performance test test_langfuse_overhead to verify tracing overhead remains under 10ms
  • Defers sampling and masking operations to background threads by moving them from TaskManager to Consumer class

4 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

This PR adds proper queue task completion handling in the task manager's _next method. Here's a focused summary of the key changes:

  • Added queue.task_done() call after sampling out events to properly mark filtered tasks as complete
  • Added queue.task_done() call after serialization errors to ensure failed tasks don't remain in queue
  • Added debug logging for item sizes and batch processing in Consumer class
  • Added truncation of large items that exceed size limits with proper logging
  • Added error handling for serialization failures with detailed logging

The changes focus on improving queue management reliability and preventing potential memory leaks by ensuring all tasks are properly marked as complete, whether they are filtered out, fail serialization, or exceed size limits.

1 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

This PR adds performance testing and improves error handling in the task manager's background processing.

  • Added test_concurrency() in tests/test_core_sdk.py to verify concurrent processing of 100 generations
  • Added test_multiple_tasks_without_predecessor() in tests/test_task_manager.py to validate batch processing reliability
  • Added detailed error handling for serialization failures in Consumer._next() with proper error logging
  • Added size limit validation and truncation for oversized items with MAX_MSG_SIZE and BATCH_SIZE_LIMIT constants

1 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

Based on the latest changes and previous reviews, here's a focused summary of the new updates:

Adds filtering of metadata from debug logs and updates sampling rate access in tests.

  • Modified _filter_io_from_event_body in langfuse/client.py to also exclude 'metadata' field from debug logs for better verbosity control
  • Updated sampling rate access pattern in test_sdk_setup.py to use direct _sample_rate property instead of _sampler.sample_rate
  • Added test assertions to verify sampling rate configuration through task manager property
  • Improved test coverage for debug log filtering with metadata exclusion

The changes are focused on logging improvements and test maintenance, building on the previous performance optimizations without introducing major new functionality.

2 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile

@hassiebp hassiebp merged commit 87f98f0 into main Nov 6, 2024
2 of 3 checks passed
@hassiebp hassiebp deleted the perf-move-serializations-to-background branch November 6, 2024 17:46
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

Based on the latest changes and previous reviews, here's a focused summary of the new updates:

The PR adjusts performance test thresholds and adds validation in test_langchain.py:

  • Increased performance test threshold from 10ms to 50ms in test_langfuse_overhead() to account for background thread processing overhead
  • Added assertion to verify full execution takes >1 second to validate meaningful overhead measurement
  • Test generates large random dictionary input to stress test serialization performance
  • Test compares execution times with and without Langfuse tracing to measure actual overhead impact

The changes focus on making the performance tests more realistic and reliable by adjusting thresholds and adding validation, without changing the core test functionality.

1 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants