feat(evals): add run_batched_evaluation #1436

hassiebp · 2025-11-12T13:59:48Z

Important

Adds run_batched_evaluation to Langfuse client for large-scale evaluation with error handling, retry logic, and resume capability, along with comprehensive tests.

Behavior:
- Adds run_batched_evaluation() to Langfuse client in client.py for large-scale evaluation of traces and observations.
- Implements error handling, retry logic, and resume capability.
Modules:
- New batch_evaluation.py module for batch evaluation logic.
Testing:
- Adds tests/test_batch_evaluation.py with 40+ test cases.
Misc:
- Moves import statements to the top in client.py to adhere to style guide.
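
The retry and resume behavior listed above can be illustrated with a minimal, self-contained sketch. It does not use the Langfuse SDK: `ResumeToken`, `fetch_with_retry`, and `run_batches` are hypothetical names, and the page-based token is an assumption, since this page does not show the fields of the real `BatchEvaluationResumeToken`.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ResumeToken:
    """Hypothetical checkpoint: the page to retry and items finished so far."""
    next_page: int
    items_processed: int


def fetch_with_retry(fetch: Callable[[int], List[str]], page: int,
                     max_retries: int = 3, base_delay: float = 0.0) -> List[str]:
    """Call fetch(page), backing off exponentially between failed attempts."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(page)
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller decide what to do
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def run_batches(fetch: Callable[[int], List[str]],
                evaluate: Callable[[str], float],
                resume_from: Optional[ResumeToken] = None
                ) -> Tuple[List[float], Optional[ResumeToken]]:
    """Evaluate page after page; on a persistent fetch failure, return a token."""
    page = resume_from.next_page if resume_from else 1
    done = resume_from.items_processed if resume_from else 0
    scores: List[float] = []
    while True:
        try:
            items = fetch_with_retry(fetch, page)
        except Exception:
            # Stop cleanly and report where a later run should pick up.
            return scores, ResumeToken(next_page=page, items_processed=done)
        if not items:
            return scores, None  # an empty page marks the end of the data
        for item in items:
            scores.append(evaluate(item))
            done += 1
        page += 1
```

Passing the returned token back as `resume_from` restarts at the failed page instead of re-evaluating pages that already succeeded.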

This description was created for 931bdd2. You can customize this summary. It will automatically update as commits are pushed.

Disclaimer: Experimental PR review

Greptile Overview

Greptile Summary

This PR adds run_batched_evaluation to enable large-scale evaluation of traces, observations, and sessions. The implementation includes mapper functions, evaluators, composite evaluators, comprehensive error handling, retry logic, and resume capability.

Key Changes

- Added new batch_evaluation.py module with core implementation
- Added run_batched_evaluation() method to Langfuse client
- Exported new types (EvaluatorInputs, MapperFunction, CompositeEvaluatorFunction, EvaluatorStats, BatchEvaluationResumeToken, BatchEvaluationResult) to public API
- Added comprehensive test suite with 40+ test cases
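
To make the roles of the mapper, evaluators, and composite evaluator concrete, here is a rough sketch using stand-in types. `SketchInputs` and `SketchEvaluation` are hypothetical simplifications, not the exported `EvaluatorInputs` or the SDK's evaluation type; only the `input` and `output` fields are suggested by the PR's sequence diagram, and every other name is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SketchInputs:
    """Stand-in for the data a mapper hands to each evaluator."""
    input: Any
    output: Any


@dataclass
class SketchEvaluation:
    """Stand-in for one evaluation result (name and value are assumed fields)."""
    name: str
    value: float


def trace_mapper(item: Dict[str, Any]) -> SketchInputs:
    # Mapper: turn a raw fetched item into evaluator inputs.
    return SketchInputs(input=item["input"], output=item["output"])


def length_evaluator(inputs: SketchInputs) -> SketchEvaluation:
    return SketchEvaluation("output_length", float(len(inputs.output)))


def non_empty_evaluator(inputs: SketchInputs) -> SketchEvaluation:
    return SketchEvaluation("non_empty", 1.0 if inputs.output else 0.0)


def mean_composite(item: Dict[str, Any],
                   evaluations: List[SketchEvaluation]) -> SketchEvaluation:
    # Composite evaluator: derive one score from the per-evaluator results.
    total = sum(e.value for e in evaluations)
    return SketchEvaluation("mean_score", total / len(evaluations))


def evaluate_item(item: Dict[str, Any],
                  mapper: Callable[[Dict[str, Any]], SketchInputs],
                  evaluators: List[Callable[[SketchInputs], SketchEvaluation]],
                  composite: Optional[Callable[..., SketchEvaluation]] = None
                  ) -> List[SketchEvaluation]:
    """Map once, run every evaluator, then optionally append a composite."""
    inputs = mapper(item)
    evaluations = [evaluator(inputs) for evaluator in evaluators]
    if composite is not None and evaluations:
        evaluations.append(composite(item, evaluations))
    return evaluations
```

For example, `evaluate_item({"input": "hi", "output": "abcd"}, trace_mapper, [length_evaluator, non_empty_evaluator], mean_composite)` yields three evaluations, the last being the mean of the first two.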

Issues Found

Import statement inside method violates style guide (should be at module top)

Confidence Score: 4/5

- Safe to merge with minor style improvement
- Well-architected implementation with comprehensive error handling, retry logic, and extensive test coverage. Only minor style violation found (inline import). Code is production-ready with proper protocols, type hints, and documentation.
- langfuse/_client/client.py needs import moved to top per style guide

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| langfuse/_client/client.py | 4/5 | added `run_batched_evaluation` method with comprehensive docs; import should be moved to top per style guide |
| langfuse/batch_evaluation.py | 5/5 | new module implementing batch evaluation with proper error handling, retry logic, and resume capability |

Sequence Diagram

sequenceDiagram
    participant User
    participant Langfuse as Langfuse Client
    participant Runner as BatchEvaluationRunner
    participant API as Langfuse API
    participant Mapper
    participant Evaluator
    
    User->>Langfuse: run_batched_evaluation(scope, mapper, evaluators, ...)
    Langfuse->>Runner: create BatchEvaluationRunner
    Langfuse->>Runner: run_async(...)
    
    loop For each batch (pagination)
        Runner->>API: fetch_batch_with_retry(scope, filter, page)
        API-->>Runner: items batch
        
        loop For each item in batch (concurrent)
            Runner->>Mapper: map(item)
            Mapper-->>Runner: EvaluatorInputs
            
            loop For each evaluator
                Runner->>Evaluator: evaluate(input, output, ...)
                Evaluator-->>Runner: Evaluation(s)
                Runner->>Langfuse: create_score(trace_id/obs_id/session_id)
            end
            
            opt If composite_evaluator
                Runner->>Evaluator: composite_evaluator(item, evaluations)
                Evaluator-->>Runner: composite Evaluation
                Runner->>Langfuse: create_score(...)
            end
        end
    end
    
    Runner->>Langfuse: flush()
    Runner-->>Langfuse: BatchEvaluationResult
    Langfuse-->>User: BatchEvaluationResult
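
The paginated outer loop and the concurrent inner loop in the diagram could look roughly like the following asyncio sketch. This is an illustration under assumptions, not the code in batch_evaluation.py: `run_flow`, `fetch_page`, `record_score`, and `max_concurrency` are hypothetical names, and the score sink is a plain callback rather than a real `create_score` call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

Item = Dict[str, Any]


async def run_flow(fetch_page: Callable[[int], Awaitable[List[Item]]],
                   mapper: Callable[[Item], Any],
                   evaluators: List[Callable[[Any], Awaitable[Tuple[str, float]]]],
                   record_score: Callable[[str, str, float], None],
                   max_concurrency: int = 2) -> int:
    """Fetch pages until one is empty; evaluate each page's items concurrently."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def process(item: Item) -> None:
        # The semaphore caps how many items are evaluated at the same time.
        async with semaphore:
            inputs = mapper(item)
            for evaluator in evaluators:
                name, value = await evaluator(inputs)
                record_score(item["id"], name, value)

    page = 1
    total = 0
    while True:
        batch = await fetch_page(page)
        if not batch:
            break
        # All items of one page run concurrently, bounded by the semaphore.
        await asyncio.gather(*(process(item) for item in batch))
        total += len(batch)
        page += 1
    return total
```

Bounding concurrency per page keeps memory use proportional to the page size rather than to the full dataset.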

Context used:

Rule from dashboard - Move imports to the top of the module instead of placing them within functions or methods. (source)
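
As a small illustration of this rule (with a hypothetical helper, not code from client.py), the import sits at module level and the function body only uses it:

```python
import json  # module-level import, resolved once when the module is loaded


def serialize_scores(scores: dict) -> str:
    # The rule discourages writing `import json` here, inside the function.
    return json.dumps(scores, sort_keys=True)
```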

greptile-apps

4 files reviewed, no comments


tests/test_batch_evaluation.py

hassiebp added 2 commits November 12, 2025 14:59

feat(evals): add run_batched_evaluation

e92b1f3

Merge branch 'main' into add-batch-evals

9902def

greptile-apps bot reviewed Nov 12, 2025


hassiebp added 12 commits November 12, 2025 15:11

add str method

f6ab661

push

a6cd970

push

92e72ed

remove sessions

e069711

add composite evaluator to run_experiments

66288a6

add item evaluations

06a6e37

Merge branch 'main' into add-batch-evals

da756c5

push

dde2f5b

push

fec1786

push

47a37df

push

c7d8fde

push

929a6a2

ellipsis-dev bot reviewed Nov 14, 2025


tests/test_batch_evaluation.py (outdated, resolved)

hassiebp added 4 commits November 14, 2025 11:47

push

48ed142

push

31beb2b

push

61c07fe

push

931bdd2

hassiebp merged commit 4c57007 into main Nov 14, 2025
12 checks passed

hassiebp deleted the add-batch-evals branch November 14, 2025 12:14

hassiebp commented Nov 12, 2025 (edited by ellipsis-dev bot)


2 participants
