@misran3 misran3 commented Jan 8, 2026

Multi-LLM Support Implementation

Summary

This PR adds comprehensive multi-LLM provider support to the Android Use agent, enabling users to choose between OpenAI, Anthropic (Claude), Google Gemini, and AWS Bedrock. The implementation uses Pydantic AI for a unified interface and structured outputs, making the codebase provider-agnostic while improving type safety and reliability.

Key Features

Multi-Provider Support

  • OpenAI (GPT-4o, GPT-4o-mini, etc.)
  • Anthropic Claude (Claude Sonnet 4, etc.)
  • Google Gemini (Gemini 2.0 Flash, etc.)
  • AWS Bedrock (Claude on Bedrock)

Structured Outputs

  • Pydantic models for type-safe action validation
  • Automatic schema validation
  • Better error messages
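To illustrate the kind of fail-fast action validation described above: the sketch below is hypothetical (the real models live in action_models.py and use Pydantic; the field names here are assumptions). It uses stdlib dataclasses so it runs without dependencies, but the idea is the same: invalid actions are rejected before they reach the device.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the Pydantic models in action_models.py.
# Field names and constraints are illustrative assumptions.

@dataclass(frozen=True)
class TapAction:
    x: int
    y: int

    def __post_init__(self):
        # Reject out-of-range coordinates before the action executes.
        if self.x < 0 or self.y < 0:
            raise ValueError(
                f"Tap coordinates must be non-negative, got ({self.x}, {self.y})"
            )

@dataclass(frozen=True)
class TypeAction:
    text: str

    def __post_init__(self):
        if not self.text:
            raise ValueError("TypeAction requires non-empty text")
```

With Pydantic, the same constraints would be declared with `Field(ge=0)` and enforced automatically when the LLM's structured output is parsed.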

Environment-Based Configuration

  • Simple provider switching via LLM_PROVIDER env var
  • Provider-specific API key management
  • Optional model overrides per provider
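The selection logic might look roughly like the following. This is a sketch, not the actual llm_manager.py code; the function name `resolve_provider_and_model` is hypothetical, and the defaults simply mirror the table later in this description.

```python
import os

# Default models per provider, mirroring the "Available Models" table below.
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-sonnet-4",
    "gemini": "gemini-2.0-flash-exp",
    "bedrock": "anthropic.claude-sonnet-4-20250514-v1:0",
}

def resolve_provider_and_model(env=os.environ):
    # Hypothetical sketch of the env-based selection; real logic is in llm_manager.py.
    provider = env.get("LLM_PROVIDER", "").lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(DEFAULT_MODELS)}, got {provider!r}"
        )
    # Optional per-provider override, e.g. OPENAI_MODEL or GEMINI_MODEL.
    model = env.get(f"{provider.upper()}_MODEL", DEFAULT_MODELS[provider])
    return provider, model
```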

Comprehensive Testing

  • 11/11 unit tests passing
  • Provider initialization tests
  • Error handling tests
  • Integration tests

Architecture Changes

Before

```python
# Direct OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
response = client.chat.completions.create(model="gpt-4o", ...)
decision = json.loads(response.choices[0].message.content)
```

After

```python
# Unified LLM manager with structured outputs
llm_manager = LLMManager()  # Reads LLM_PROVIDER env var
action = await llm_manager.get_decision(goal, screen_context)
# Returns validated Pydantic model (TapAction, TypeAction, etc.)
```

New Files

  • action_models.py - Pydantic models for structured action outputs
  • llm_manager.py - Multi-provider LLM manager
  • tests/test_action_models.py - Action model validation tests
  • tests/test_llm_manager.py - LLM manager initialization tests
  • tests/test_kernel_integration.py - Integration tests
  • tests/MANUAL_TEST_RESULTS.md - Manual testing template
  • examples/openai_example.sh - OpenAI usage example
  • examples/anthropic_example.sh - Anthropic usage example
  • examples/gemini_example.sh - Gemini usage example
  • examples/bedrock_example.sh - Bedrock usage example

Modified Files

  • kernel.py - Refactored to use LLM manager, async/await pattern
  • requirements.txt - Updated to pydantic-ai-slim[openai,anthropic,google,bedrock]
  • README.md - Added multi-LLM configuration instructions

Usage

Quick Start

```bash
# Choose your provider
export LLM_PROVIDER="openai"  # or anthropic, gemini, bedrock

# Set API key
export OPENAI_API_KEY="sk-..."

# Optional: Override default model
export OPENAI_MODEL="gpt-4o"

# Run the agent
python kernel.py
```

Provider-Specific Examples

```bash
# OpenAI
export LLM_PROVIDER="openai"
export OPENAI_API_KEY="sk-..."

# Anthropic Claude
export LLM_PROVIDER="anthropic"
export ANTHROPIC_API_KEY="sk-..."

# Google Gemini (typically the cheapest option)
export LLM_PROVIDER="gemini"
export GOOGLE_API_KEY="..."

# AWS Bedrock
export LLM_PROVIDER="bedrock"
export AWS_PROFILE="default"
```

Available Models

| Provider  | Default Model                           | Override Env Var  |
|-----------|-----------------------------------------|-------------------|
| OpenAI    | gpt-4o                                  | `OPENAI_MODEL`    |
| Anthropic | claude-sonnet-4                         | `ANTHROPIC_MODEL` |
| Gemini    | gemini-2.0-flash-exp                    | `GEMINI_MODEL`    |
| Bedrock   | anthropic.claude-sonnet-4-20250514-v1:0 | `BEDROCK_MODEL`   |

Testing

Automated Tests

```bash
pytest tests/ -v
# 11/11 tests passing
```
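As a rough illustration of the error-handling tests mentioned above (the real tests live in tests/test_llm_manager.py; `FakeLLMManager` below is a hypothetical stand-in, written with plain asserts rather than pytest helpers so the sketch runs on its own):

```python
# Hypothetical stand-in mimicking the manager's fail-fast provider validation.
class FakeLLMManager:
    SUPPORTED = {"openai", "anthropic", "gemini", "bedrock"}

    def __init__(self, provider):
        if provider not in self.SUPPORTED:
            raise ValueError(f"Unsupported provider: {provider!r}")
        self.provider = provider

def test_known_provider_initializes():
    assert FakeLLMManager("openai").provider == "openai"

def test_unknown_provider_fails_fast():
    try:
        FakeLLMManager("cohere")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unknown provider")
```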

Manual Testing

  • Error handling verified via unit tests
  • Real API testing requires valid credentials
  • See tests/MANUAL_TEST_RESULTS.md for testing checklist

Breaking Changes

⚠️ Migration Required

  1. Environment Variables: Must set LLM_PROVIDER env var

    # Before
    export OPENAI_API_KEY="sk-..."
    
    # After
    export LLM_PROVIDER="openai"
    export OPENAI_API_KEY="sk-..."
  2. Async Pattern: run_agent() is now async

    # Before
    run_agent(goal)
    
    # After
    asyncio.run(run_agent(goal))
  3. Dependencies: New requirement format

    # Before: openai>=1.12.0
    # After: pydantic-ai-slim[openai,anthropic,google,bedrock]
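The async migration in item 2 can be sketched end to end as follows. `run_agent` here is a stand-in for the now-async entry point in kernel.py, with a placeholder body; the real function awaits LLM calls in its decision loop.

```python
import asyncio

# Stand-in for the now-async run_agent() in kernel.py (body is a placeholder).
async def run_agent(goal: str) -> str:
    await asyncio.sleep(0)  # placeholder for awaited llm_manager calls
    return f"completed: {goal}"

def main():
    # Synchronous callers now wrap the coroutine with asyncio.run().
    result = asyncio.run(run_agent("open settings"))
    print(result)

if __name__ == "__main__":
    main()
```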
    

Benefits

  1. Cost Optimization: Switch to cheaper providers (Gemini) for non-critical workflows
  2. Reliability: Switch to an alternative provider when one is down (automatic retry/fallback is listed under Next Steps)
  3. Type Safety: Pydantic models catch errors before execution
  4. Developer Experience: Clear error messages, better testing
  5. Future-Proof: Easy to add new providers (Ollama, Llama, etc.)

Next Steps

  • Manual testing with real API keys
  • Refactor package structure for modularity and PyPI distribution
  • Add Ollama/Llama support for local models
  • Optional retry/fallback logic between providers

Questions & Contact

Have questions about this PR or want to discuss implementation details?

📧 Email: mohammed.misran@pitt.edu

📅 Schedule a call: Calendly

Feel free to reach out for clarifications, feedback, or collaboration opportunities!

- Added Pydantic AI for unified LLM interface
- Support for OpenAI, Anthropic, Gemini, AWS Bedrock
- Provider selection via LLM_PROVIDER env var
- Model overrides via provider-specific env vars
- Structured outputs using Pydantic models
- Fail-fast error handling
- Updated documentation and examples
@misran3 misran3 marked this pull request as draft January 8, 2026 12:10

misran3 commented Jan 8, 2026

Hey @ethanjlimgit, I've raised this as a draft PR to be transparent about the work in progress and get early feedback.

Current Status:

  • All unit tests passing (11/11)
  • Currently performing manual testing with real API keys
  • Verifying each provider works as expected

Feel free to review the code, ask questions, or share thoughts! I'm happy to discuss any aspect of the implementation or make adjustments based on your feedback.

Thanks!
