perf: move serialization to background threads #994

hassiebp · 2024-11-06T16:36:15Z

Important

Move serialization to background threads, improve logging, and add performance overhead test for Langfuse tracing.

Behavior:
- Move serialization to background threads in langfuse/task_manager.py.
- Handle serialization errors in Consumer._next() in langfuse/task_manager.py.
- Add _filter_io_from_event_body() to filter input and output from logs in langfuse/client.py.
Logging:
- Update logging to exclude input and output in langfuse/client.py and langfuse/task_manager.py.
- Add debug logs for batch processing in Consumer class in langfuse/task_manager.py.
Testing:
- Add test_langfuse_overhead() in tests/test_langchain.py to measure performance overhead of Langfuse tracing.
Misc:
- Remove unused kwargs_log in _log_debug_event() in langfuse/callback/langchain.py.

This description was created by ellipsis-dev for 3a22aee. It will automatically update as commits are pushed.

greptile-apps

Disclaimer: Experimental PR review

PR Summary

This PR optimizes performance by moving heavy operations like serialization, sampling and masking to background threads in the Langfuse Python SDK.

- Moves Pydantic model serialization from main thread to Consumer threads in langfuse/task_manager.py to reduce synchronous processing overhead
- Removes detailed kwargs logging from debug events in langfuse/callback/langchain.py to minimize logging overhead
- Adds filtering of sensitive input/output data from debug logs in langfuse/client.py for better security
- Introduces performance test test_langfuse_overhead to verify tracing overhead remains under 10ms
- Defers sampling and masking operations to background threads by moving them from TaskManager to Consumer class
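To illustrate the general pattern described above, here is a minimal sketch that assumes nothing about the SDK's real classes. The `Consumer` class, the `enqueue` helper, and the `None` sentinel are hypothetical stand-ins, and `json.dumps` stands in for Pydantic model serialization. The caller's thread only performs a queue put, while the expensive step runs in a background thread.

```python
import json
import queue
import threading


class Consumer(threading.Thread):
    """Hypothetical consumer that serializes events off the caller's thread."""

    def __init__(self, events: queue.Queue, sink: list):
        super().__init__(daemon=True)
        self._events = events
        self._sink = sink

    def run(self):
        while True:
            event = self._events.get()
            try:
                if event is None:  # sentinel: stop the thread
                    return
                # The potentially expensive step happens here, in the background.
                self._sink.append(json.dumps(event, default=str))
            finally:
                # Always account for the item, including the sentinel.
                self._events.task_done()


def enqueue(events: queue.Queue, event: dict) -> None:
    """Called on the main thread: only a cheap queue put, no serialization."""
    events.put(event)


events = queue.Queue()
serialized = []
consumer = Consumer(events, serialized)
consumer.start()

enqueue(events, {"id": "1", "input": {"prompt": "hello"}})
enqueue(events, {"id": "2", "input": {"prompt": "world"}})
events.put(None)  # ask the consumer to stop
events.join()     # blocks until every item has been marked done
```

The trade-off in a design like this is that serialization errors surface in the background thread rather than at the call site, so they have to be logged or otherwise handled there.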

4 file(s) reviewed, 2 comment(s)

langfuse/client.py

langfuse/task_manager.py

greptile-apps

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

This PR adds proper queue task completion handling in the task manager's _next method. Here's a focused summary of the key changes:

- Added queue.task_done() call after sampling out events to properly mark filtered tasks as complete
- Added queue.task_done() call after serialization errors to ensure failed tasks don't remain in queue
- Added debug logging for item sizes and batch processing in Consumer class
- Added truncation of large items that exceed size limits with proper logging
- Added error handling for serialization failures with detailed logging

The changes focus on improving queue management reliability and preventing potential memory leaks by ensuring all tasks are properly marked as complete, whether they are filtered out, fail serialization, or exceed size limits.
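The reason this matters can be shown with a small sketch built on the standard library's `queue.Queue`. The `next_batch` function and its `keep` predicate are hypothetical and are not the SDK's `_next` implementation. The point is that every item taken from the queue needs a matching `task_done()` call, otherwise `Queue.join()` would block forever during a flush.

```python
import json
import queue


def next_batch(q, keep, max_items=10):
    """Drain up to max_items serialized events, marking skipped ones as done."""
    batch = []
    while len(batch) < max_items:
        try:
            event = q.get_nowait()
        except queue.Empty:
            break
        if not keep(event):
            q.task_done()  # sampled out: nothing will be uploaded for it
            continue
        try:
            batch.append(json.dumps(event))
        except (TypeError, ValueError):
            q.task_done()  # serialization failed: drop it, but still mark it done
            continue
    return batch


q = queue.Queue()
q.put({"id": 1, "keep": True})
q.put({"id": 2, "keep": False})                   # will be sampled out
q.put({"id": 3, "keep": True, "bad": object()})   # not JSON serializable

batch = next_batch(q, keep=lambda e: e["keep"])
for _ in batch:
    q.task_done()  # the caller marks delivered items done after upload

q.join()  # returns immediately because no task is left unaccounted for
```

If either of the two early `task_done()` calls were removed, the final `q.join()` in this sketch would never return.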

1 file(s) reviewed, no comment(s)

langfuse/task_manager.py

langfuse/callback/langchain.py

langfuse/client.py

langfuse/task_manager.py

greptile-apps

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

This PR adds performance testing and improves error handling in the task manager's background processing.

- Added test_concurrency() in tests/test_core_sdk.py to verify concurrent processing of 100 generations
- Added test_multiple_tasks_without_predecessor() in tests/test_task_manager.py to validate batch processing reliability
- Added detailed error handling for serialization failures in Consumer._next() with proper error logging
- Added size limit validation and truncation for oversized items with MAX_MSG_SIZE and BATCH_SIZE_LIMIT constants
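As a rough illustration of per-item and per-batch size limits, the sketch below reuses the constant names `MAX_MSG_SIZE` and `BATCH_SIZE_LIMIT` from the summary. Their values here are made up, and the helper functions, the `"<truncated>"` marker, and the choice of which fields to truncate are all assumptions, not the SDK's behavior.

```python
import json

MAX_MSG_SIZE = 1_000      # illustrative value, not the SDK's
BATCH_SIZE_LIMIT = 2_500  # illustrative value, not the SDK's
BULKY_FIELDS = ("input", "output", "metadata")  # assumed candidates for truncation


def size_of(event: dict) -> int:
    """Size in bytes of the JSON-encoded event."""
    return len(json.dumps(event).encode("utf-8"))


def truncate(event: dict) -> dict:
    """Replace bulky fields, largest first, until the event fits MAX_MSG_SIZE."""
    event = dict(event)  # shallow copy so the caller's dict is left untouched
    by_size = sorted(
        BULKY_FIELDS,
        key=lambda f: size_of({f: event.get(f)}),
        reverse=True,
    )
    for field in by_size:
        if size_of(event) <= MAX_MSG_SIZE:
            break
        if field in event:
            event[field] = "<truncated>"
    return event


def pack(events: list) -> list:
    """Collect truncated events until adding one more would exceed the batch limit."""
    batch, total = [], 0
    for event in events:
        event = truncate(event)
        size = size_of(event)
        if total + size > BATCH_SIZE_LIMIT:
            break
        batch.append(event)
        total += size
    return batch


small = {"id": "a", "input": "x"}
big = {"id": "b", "input": "y" * 5000, "output": "ok"}
result = pack([small, big])
```

In this sketch only the oversized `input` of the second event is replaced; its small `output` field survives because the event already fits after the first truncation.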

1 file(s) reviewed, 1 comment(s)

langfuse/task_manager.py

greptile-apps

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

Based on the latest changes and previous reviews, here's a focused summary of the new updates:

Adds filtering of metadata from debug logs and updates sampling rate access in tests.

- Modified _filter_io_from_event_body in langfuse/client.py to also exclude 'metadata' field from debug logs for better verbosity control
- Updated sampling rate access pattern in test_sdk_setup.py to use direct _sample_rate property instead of _sampler.sample_rate
- Added test assertions to verify sampling rate configuration through task manager property
- Improved test coverage for debug log filtering with metadata exclusion

The changes are focused on logging improvements and test maintenance, building on the previous performance optimizations without introducing major new functionality.
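A filter of this kind could look roughly like the sketch below. The function is a stand-in written for illustration and is not copied from langfuse/client.py. The field names come from the summary above, while the logger name and the sample event body are invented.

```python
import logging

logger = logging.getLogger("example")  # hypothetical logger name

HIDDEN_FIELDS = ("input", "output", "metadata")


def filter_io_from_event_body(event_body: dict) -> dict:
    """Return a copy of the event body without the potentially large or sensitive fields."""
    return {k: v for k, v in event_body.items() if k not in HIDDEN_FIELDS}


body = {
    "id": "trace-1",
    "name": "chat",
    "input": {"prompt": "a long or sensitive prompt"},
    "output": "a long or sensitive completion",
    "metadata": {"user": "someone"},
}
filtered = filter_io_from_event_body(body)
logger.debug("Creating trace %s", filtered)
```

Because the function builds a new dict, the original event body still carries its input, output, and metadata when it is sent on for ingestion.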

2 file(s) reviewed, no comment(s)

greptile-apps

Disclaimer: Experimental PR review

PR Summary

(updates since last review)

Based on the latest changes and previous reviews, here's a focused summary of the new updates:

The PR adjusts performance test thresholds and adds validation in test_langchain.py:

- Increased performance test threshold from 10ms to 50ms in test_langfuse_overhead() to account for background thread processing overhead
- Added assertion to verify full execution takes >1 second to validate meaningful overhead measurement
- Test generates large random dictionary input to stress test serialization performance
- Test compares execution times with and without Langfuse tracing to measure actual overhead impact

The changes focus on making the performance tests more realistic and reliable by adjusting thresholds and adding validation, without changing the core test functionality.
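The shape of such an overhead measurement can be sketched as follows. This is not the test from tests/test_langchain.py. The workload, the queue-based tracing stand-in, and every helper name are hypothetical, and the workload is kept short, so the sketch omits the check that the full run takes more than one second. Only the 50ms threshold is taken from the summary.

```python
import queue
import random
import string
import time


def random_payload(n_keys=1000):
    """Build a large dictionary of random strings to act as the traced input."""
    def rand(k):
        return "".join(random.choices(string.ascii_letters, k=k))
    return {rand(8): rand(32) for _ in range(n_keys)}


def workload(payload):
    """Stand-in for the real work being traced."""
    time.sleep(0.02)
    return len(payload)


events = queue.Queue()


def traced_workload(payload):
    """Same work, plus a cheap enqueue that stands in for the tracing call."""
    events.put({"input": payload})  # no serialization on this thread
    return workload(payload)


def best_of(fn, payload, repeats=3):
    """Return the fastest of several timed runs to reduce scheduling noise."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        timings.append(time.perf_counter() - start)
    return min(timings)


payload = random_payload()
baseline = best_of(workload, payload)
traced = best_of(traced_workload, payload)
overhead = traced - baseline

assert overhead < 0.05, f"tracing overhead too high: {overhead:.4f}s"
```

Taking the minimum over repeated runs is one way to keep a timing assertion like this from failing on a noisy machine.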

1 file(s) reviewed, no comment(s)

hassiebp added 4 commits November 6, 2024 16:51

perf: move all serialization to background threads

c263f35

remove io from debug logs

1b56dba

move masking and serialization to bg

5f1ce0c

add test and move sampler to bg

d8b9e73

greptile-apps bot reviewed Nov 6, 2024

langfuse/client.py (resolved)

langfuse/task_manager.py (resolved)

fix: mark skipped items as done in queue

f86d03e

greptile-apps bot reviewed Nov 6, 2024

ellipsis-dev bot reviewed Nov 6, 2024

langfuse/task_manager.py (resolved)

maxdeichmann reviewed Nov 6, 2024

langfuse/callback/langchain.py (resolved)

maxdeichmann reviewed Nov 6, 2024

langfuse/client.py (outdated, resolved)

maxdeichmann reviewed Nov 6, 2024

langfuse/task_manager.py (outdated, resolved)

add exclude_none

a4685dc

greptile-apps bot reviewed Nov 6, 2024

langfuse/task_manager.py (resolved)

hassiebp added 2 commits November 6, 2024 18:27

add metadata to log filter

4137cb0

fix test

b7ee6f3

greptile-apps bot reviewed Nov 6, 2024

fix test

3a22aee

hassiebp merged commit 87f98f0 into main Nov 6, 2024
2 of 3 checks passed

hassiebp deleted the perf-move-serializations-to-background branch November 6, 2024 17:46

greptile-apps bot reviewed Nov 6, 2024



hassiebp commented Nov 6, 2024 • edited by ellipsis-dev bot


3 participants
