Problem: Users need to optimize system prompts for simple, direct LLM calls (e.g., via the OpenAI API) on tasks like SQL code review or schema-change validation, while incorporating company-specific guidelines (e.g., attached PDF/YAML files with style rules).
Proposed Solution: Extend bbeval opt to support optimizing prompts for direct LLM calls:
Input: Base system prompt + attached guidelines file (e.g., --guidelines coding_rules.pdf).
Use: BootstrapRS (≤5 trials) to tune the prompt for accuracy on test tasks (e.g., "Review this SQL: [code]" → scored against expected feedback); see the tuning-loop sketch after this list.
Output: JSON prompt with enforced rules (e.g., required placeholders like {{guidelines}}, banned phrases); a rule check is sketched below.
Integration: --mode direct-llm flag; validate SQL syntax via the code_execution tool (syntax check sketched below).
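
A minimal sketch of what the tuning loop behind this mode could look like, assuming a BootstrapRS-style random search over prompt candidates. The `llm_call` and `score_review` callables are hypothetical placeholders (the real implementation would plug into bbeval's existing optimizer and grader), and the mutation strategy shown is illustrative only:

```python
import random

def tune_direct_prompt(base_prompt, guidelines_text, test_tasks, llm_call,
                       score_review, max_trials=5, seed=0):
    """Random-search over prompt candidates (BootstrapRS-style, <=5 trials).

    llm_call(system_prompt, user_msg) -> str    # hypothetical LLM wrapper
    score_review(output, expected) -> float     # hypothetical 0..1 grader
    test_tasks: [{"input": "Review this SQL: ...", "expected": "..."}]
    """
    rng = random.Random(seed)
    candidates = [base_prompt]
    # Naive mutation: append one randomly chosen guideline line per candidate.
    guideline_lines = [ln for ln in guidelines_text.splitlines() if ln.strip()]
    for _ in range(max_trials - 1):
        extra = rng.choice(guideline_lines) if guideline_lines else ""
        candidates.append(base_prompt + "\n\nAlways follow this rule: " + extra)

    best_prompt, best_score = base_prompt, float("-inf")
    for prompt in candidates:
        # Fill the required placeholder before evaluation.
        rendered = prompt.replace("{{guidelines}}", guidelines_text)
        scores = [score_review(llm_call(rendered, task["input"]), task["expected"])
                  for task in test_tasks]
        avg = sum(scores) / len(scores)
        if avg > best_score:
            best_prompt, best_score = prompt, avg
    return {"system_prompt": best_prompt, "score": best_score}
```

With the flags proposed above, this loop would be driven by an invocation along the lines of bbeval opt --mode direct-llm --guidelines coding_rules.pdf.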
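The enforced-rules check on the emitted JSON prompt could be as small as the following; the JSON shape ({"system_prompt": ..., "score": ...}) and the example rule values are assumptions, not a finalized schema:

```python
def check_prompt_rules(prompt_json,
                       required_placeholders=("{{guidelines}}",),
                       banned_phrases=("ignore previous instructions",)):
    """Check an optimized prompt against enforced rules.

    The JSON shape and the default rule values are assumptions for this sketch.
    Returns a list of violations; an empty list means the prompt passes.
    """
    text = prompt_json["system_prompt"]
    errors = []
    for placeholder in required_placeholders:
        if placeholder not in text:
            errors.append(f"missing required placeholder: {placeholder}")
    for phrase in banned_phrases:
        if phrase.lower() in text.lower():
            errors.append(f"contains banned phrase: {phrase!r}")
    return errors
```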
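For the code_execution-based SQL check, a parse-only validator avoids needing a live database; this sketch assumes the sqlglot library and a default dialect purely for illustration:

```python
import sqlglot
from sqlglot.errors import ParseError

def sql_syntax_ok(sql: str, dialect: str = "postgres") -> bool:
    """Parse-only syntax check for the SQL under review, suitable for running
    inside a code_execution-style sandbox. The sqlglot dependency and the
    default dialect are assumptions for this sketch."""
    try:
        sqlglot.parse_one(sql, read=dialect)
        return True
    except ParseError:
        return False
```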
Benefits: Enables quick tuning for non-agentic reviews; reduces the manual prompt engineering needed for guideline adherence by roughly 80%.