@misran3 misran3 commented Jan 8, 2026

Multi-LLM Support Implementation

Summary

This PR adds comprehensive multi-LLM provider support to the Android Use agent, enabling users to choose between OpenAI, Anthropic (Claude), Google Gemini, and AWS Bedrock. The implementation uses Pydantic AI for a unified interface and structured outputs, making the codebase provider-agnostic while improving type safety and reliability.

Key Features

Multi-Provider Support

  • OpenAI (GPT-4o, GPT-4o-mini, etc.)
  • Anthropic Claude (Claude Sonnet 4, etc.)
  • Google Gemini (Gemini 2.0 Flash, etc.)
  • AWS Bedrock (Claude on Bedrock)

Structured Outputs

  • Pydantic models for type-safe action validation
  • Automatic schema validation
  • Better error messages
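To illustrate the kind of fail-fast action validation described above: the sketch below is hypothetical (the real models live in action_models.py and use Pydantic; the field names here are assumptions). It uses stdlib dataclasses so it runs without dependencies, but the idea is the same: invalid actions are rejected before they reach the device.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the Pydantic models in action_models.py.
# Field names and constraints are illustrative assumptions.

@dataclass(frozen=True)
class TapAction:
    x: int
    y: int

    def __post_init__(self):
        # Reject out-of-range coordinates before the action executes.
        if self.x < 0 or self.y < 0:
            raise ValueError(
                f"Tap coordinates must be non-negative, got ({self.x}, {self.y})"
            )

@dataclass(frozen=True)
class TypeAction:
    text: str

    def __post_init__(self):
        if not self.text:
            raise ValueError("TypeAction requires non-empty text")
```

With Pydantic, the same constraints would be declared with `Field(ge=0)` and enforced automatically when the LLM's structured output is parsed.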

Environment-Based Configuration

  • Simple provider switching via LLM_PROVIDER env var
  • Provider-specific API key management
  • Optional model overrides per provider
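The selection logic might look roughly like the following. This is a sketch, not the actual llm_manager.py code; the function name `resolve_provider_and_model` is hypothetical, and the defaults simply mirror the table later in this description.

```python
import os

# Default models per provider, mirroring the "Available Models" table below.
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-sonnet-4",
    "gemini": "gemini-2.0-flash-exp",
    "bedrock": "anthropic.claude-sonnet-4-20250514-v1:0",
}

def resolve_provider_and_model(env=os.environ):
    # Hypothetical sketch of the env-based selection; real logic is in llm_manager.py.
    provider = env.get("LLM_PROVIDER", "").lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(DEFAULT_MODELS)}, got {provider!r}"
        )
    # Optional per-provider override, e.g. OPENAI_MODEL or GEMINI_MODEL.
    model = env.get(f"{provider.upper()}_MODEL", DEFAULT_MODELS[provider])
    return provider, model
```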

Comprehensive Testing

  • 11/11 unit tests passing
  • Provider initialization tests
  • Error handling tests
  • Integration tests

Architecture Changes

Before

```python
# Direct OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
response = client.chat.completions.create(model="gpt-4o", ...)
decision = json.loads(response.choices[0].message.content)
```

After

```python
# Unified LLM manager with structured outputs
llm_manager = LLMManager()  # Reads LLM_PROVIDER env var
action = await llm_manager.get_decision(goal, screen_context)
# Returns validated Pydantic model (TapAction, TypeAction, etc.)
```

New Files

  • action_models.py - Pydantic models for structured action outputs
  • llm_manager.py - Multi-provider LLM manager
  • tests/test_action_models.py - Action model validation tests
  • tests/test_llm_manager.py - LLM manager initialization tests
  • tests/test_kernel_integration.py - Integration tests
  • tests/MANUAL_TEST_RESULTS.md - Manual testing template
  • examples/openai_example.sh - OpenAI usage example
  • examples/anthropic_example.sh - Anthropic usage example
  • examples/gemini_example.sh - Gemini usage example
  • examples/bedrock_example.sh - Bedrock usage example

Modified Files

  • kernel.py - Refactored to use LLM manager, async/await pattern
  • requirements.txt - Updated to pydantic-ai-slim[openai,anthropic,google,bedrock]
  • README.md - Added multi-LLM configuration instructions

Usage

Quick Start

```bash
# Choose your provider
export LLM_PROVIDER="openai"  # or anthropic, gemini, bedrock

# Set API key
export OPENAI_API_KEY="sk-..."

# Optional: Override default model
export OPENAI_MODEL="gpt-4o"

# Run the agent
python kernel.py
```

Provider-Specific Examples

```bash
# OpenAI
export LLM_PROVIDER="openai"
export OPENAI_API_KEY="sk-..."

# Anthropic Claude
export LLM_PROVIDER="anthropic"
export ANTHROPIC_API_KEY="sk-..."

# Google Gemini (typically the cheapest option)
export LLM_PROVIDER="gemini"
export GOOGLE_API_KEY="..."

# AWS Bedrock
export LLM_PROVIDER="bedrock"
export AWS_PROFILE="default"
```

Available Models

| Provider  | Default Model                           | Override Env Var  |
|-----------|-----------------------------------------|-------------------|
| OpenAI    | gpt-4o                                  | `OPENAI_MODEL`    |
| Anthropic | claude-sonnet-4                         | `ANTHROPIC_MODEL` |
| Gemini    | gemini-2.0-flash-exp                    | `GEMINI_MODEL`    |
| Bedrock   | anthropic.claude-sonnet-4-20250514-v1:0 | `BEDROCK_MODEL`   |

Testing

Automated Tests

```bash
pytest tests/ -v
# 11/11 tests passing
```
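As a rough illustration of the error-handling tests mentioned above (the real tests live in tests/test_llm_manager.py; `FakeLLMManager` below is a hypothetical stand-in, written with plain asserts rather than pytest helpers so the sketch runs on its own):

```python
# Hypothetical stand-in mimicking the manager's fail-fast provider validation.
class FakeLLMManager:
    SUPPORTED = {"openai", "anthropic", "gemini", "bedrock"}

    def __init__(self, provider):
        if provider not in self.SUPPORTED:
            raise ValueError(f"Unsupported provider: {provider!r}")
        self.provider = provider

def test_known_provider_initializes():
    assert FakeLLMManager("openai").provider == "openai"

def test_unknown_provider_fails_fast():
    try:
        FakeLLMManager("cohere")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unknown provider")
```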

Manual Testing

  • Error handling verified via unit tests
  • Real API testing requires valid credentials
  • See tests/MANUAL_TEST_RESULTS.md for testing checklist

Breaking Changes

⚠️ Migration Required

  1. Environment Variables: Must set LLM_PROVIDER env var

    # Before
    export OPENAI_API_KEY="sk-..."
    
    # After
    export LLM_PROVIDER="openai"
    export OPENAI_API_KEY="sk-..."
  2. Async Pattern: run_agent() is now async

    # Before
    run_agent(goal)
    
    # After
    asyncio.run(run_agent(goal))
  3. Dependencies: New requirement format

    # Before: openai>=1.12.0
    # After: pydantic-ai-slim[openai,anthropic,google,bedrock]
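The async migration in item 2 can be sketched end to end as follows. `run_agent` here is a stand-in for the now-async entry point in kernel.py, with a placeholder body; the real function awaits LLM calls in its decision loop.

```python
import asyncio

# Stand-in for the now-async run_agent() in kernel.py (body is a placeholder).
async def run_agent(goal: str) -> str:
    await asyncio.sleep(0)  # placeholder for awaited llm_manager calls
    return f"completed: {goal}"

def main():
    # Synchronous callers now wrap the coroutine with asyncio.run().
    result = asyncio.run(run_agent("open settings"))
    print(result)

if __name__ == "__main__":
    main()
```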
    

Benefits

  1. Cost Optimization: Switch to cheaper providers (Gemini) for non-critical workflows
  2. Reliability: Switch to an alternative provider when one is down (automatic retry/fallback is listed under Next Steps)
  3. Type Safety: Pydantic models catch errors before execution
  4. Developer Experience: Clear error messages, better testing
  5. Future-Proof: Easy to add new providers (Ollama, Llama, etc.)

Next Steps

  • Manual testing with real API keys
  • Refactor package structure for modularity and PyPI distribution
  • Add Ollama/Llama support for local models
  • Optional retry/fallback logic between providers

Questions & Contact

Have questions about this PR or want to discuss implementation details?

📧 Email: mohammed.misran@pitt.edu

📅 Schedule a call: Calendly

Feel free to reach out for clarifications, feedback, or collaboration opportunities!

- Added Pydantic AI for unified LLM interface
- Support for OpenAI, Anthropic, Gemini, AWS Bedrock
- Provider selection via LLM_PROVIDER env var
- Model overrides via provider-specific env vars
- Structured outputs using Pydantic models
- Fail-fast error handling
- Updated documentation and examples
@misran3 misran3 marked this pull request as draft January 8, 2026 12:10

misran3 commented Jan 8, 2026

Hey @ethanjlimgit, I've raised this as a draft PR to be transparent about the work in progress and get early feedback.

Current Status:

  • All unit tests passing (11/11)
  • Currently performing manual testing with real API keys
  • Verifying each provider works as expected

Feel free to review the code, ask questions, or share thoughts! I'm happy to discuss any aspect of the implementation or make adjustments based on your feedback.

Thanks!
