diff --git a/.gitignore b/.gitignore index b67ecefe9..e1b579559 100644 --- a/.gitignore +++ b/.gitignore @@ -5,6 +5,9 @@ workspace/* set_env.sh sample_case_results.csv +# Project config (use taskweaver_config.json.example as template) +project/taskweaver_config.json + # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..227e2edbe --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,301 @@ +# AGENTS.md - TaskWeaver Development Guide + +**Generated:** 2026-01-26 | **Commit:** 7c2888e | **Branch:** liqun/add_variables_to_code_generator + +This document provides guidance for AI coding agents working on the TaskWeaver codebase. + +## Subdirectory Knowledge Bases +- [`taskweaver/llm/AGENTS.md`](taskweaver/llm/AGENTS.md) - LLM provider abstractions +- [`taskweaver/ces/AGENTS.md`](taskweaver/ces/AGENTS.md) - Code execution service (Jupyter kernels) +- [`taskweaver/code_interpreter/AGENTS.md`](taskweaver/code_interpreter/AGENTS.md) - Code interpreter role variants +- [`taskweaver/memory/AGENTS.md`](taskweaver/memory/AGENTS.md) - Memory data model (Post/Round/Conversation) +- [`taskweaver/ext_role/AGENTS.md`](taskweaver/ext_role/AGENTS.md) - Extended role definitions + +## Project Overview + +TaskWeaver is a **code-first agent framework** for data analytics tasks. It uses Python 3.10+ and follows a modular architecture with dependency injection (using `injector`). + +## Build & Development Commands + +### Installation +```bash +# Use the existing conda environment +conda activate taskweaver + +# Or create a new one +conda create -n taskweaver python=3.10 +conda activate taskweaver + +# Install dependencies +pip install -r requirements.txt + +# Install in editable mode +pip install -e . +``` + +**Note**: The project uses a conda environment named `taskweaver`. + +### Running Tests +```bash +# Run all unit tests +pytest tests/unit_tests -v + +# Run a single test file +pytest tests/unit_tests/test_plugin.py -v + +# Run a specific test function +pytest tests/unit_tests/test_plugin.py::test_load_plugin_yaml -v + +# Run tests with coverage +pytest tests/unit_tests -v --cov=taskweaver --cov-report=html + +# Collect tests without running (useful for verification) +pytest tests/unit_tests --collect-only +``` + +### Linting & Formatting +```bash +# Run pre-commit hooks (autoflake, isort, black, flake8) +pre-commit run --all-files + +# Run individual tools +black --config=.linters/pyproject.toml . +isort --settings-path=.linters/pyproject.toml . +flake8 --config=.linters/tox.ini taskweaver/ +``` + +### Running the Application +```bash +# CLI mode +python -m taskweaver -p ./project/ + +# As a module +python -m taskweaver +``` + +## Code Style Guidelines + +### Formatting Configuration +- **Line length**: 120 characters (configured in `.linters/pyproject.toml`) +- **Formatter**: Black with `--config=.linters/pyproject.toml` +- **Import sorting**: isort with `profile = "black"` + +### Import Organization +```python +# Standard library imports first +import os +from dataclasses import dataclass +from typing import Any, Dict, List, Optional + +# Third-party imports +from injector import inject + +# Local imports (known_first_party = ["taskweaver"]) +from taskweaver.config.config_mgt import AppConfigSource +from taskweaver.logging import TelemetryLogger +``` + +### Type Annotations +- **Required**: All function parameters and return types must have type hints +- **Use `Optional[T]`** for nullable types +- **Use `List`, `Dict`, `Tuple`** from `typing` module +- **Dataclasses** are preferred for structured data + +```python +from dataclasses import dataclass +from typing import Any, Dict, List, Optional + +@dataclass +class Post: + id: str + send_from: str + send_to: str + message: str + attachment_list: List[Attachment] + + @staticmethod + def create( + message: Optional[str], + send_from: str, + send_to: str = "Unknown", + ) -> Post: + ... +``` + +### Naming Conventions +- **Classes**: PascalCase (`CodeGenerator`, `PluginRegistry`) +- **Functions/methods**: snake_case (`compose_prompt`, `get_attachment`) +- **Variables**: snake_case (`plugin_pool`, `chat_history`) +- **Constants**: UPPER_SNAKE_CASE (`MAX_RETRY_COUNT`) +- **Private members**: prefix with underscore (`_configure`, `_get_config_value`) +- **Config classes**: suffix with `Config` (`PlannerConfig`, `RoleConfig`) + +### Dependency Injection Pattern +TaskWeaver uses the `injector` library for DI. Follow this pattern: + +```python +from injector import inject, Module, provider + +class MyConfig(ModuleConfig): + def _configure(self) -> None: + self._set_name("my_module") + self.some_setting = self._get_str("setting_name", "default_value") + +class MyService: + @inject + def __init__( + self, + config: MyConfig, + logger: TelemetryLogger, + other_dependency: OtherService, + ): + self.config = config + self.logger = logger +``` + +### Error Handling +- Use specific exception types when possible +- Log errors with context before re-raising +- Use assertions for internal invariants + +```python +try: + result = self.llm_api.chat_completion_stream(...) +except (JSONDecodeError, AssertionError) as e: + self.logger.error(f"Failed to parse LLM output due to {str(e)}") + self.tracing.set_span_status("ERROR", str(e)) + raise +``` + +### Docstrings +Use triple-quoted docstrings for classes and public methods: + +```python +def get_embeddings(self, strings: List[str]) -> List[List[float]]: + """ + Embedding API + + :param strings: list of strings to be embedded + :return: list of embeddings + """ +``` + +### Trailing Commas +Always use trailing commas in multi-line structures (enforced by `add-trailing-comma`): + +```python +app_injector = Injector( + [LoggingModule, PluginModule], # trailing comma +) + +config = { + "key1": "value1", + "key2": "value2", # trailing comma +} +``` + +## Project Structure + +``` +taskweaver/ +├── app/ # Application entry points and session management +├── ces/ # Code execution service (see ces/AGENTS.md) +├── chat/ # Chat interfaces (console, web) +├── cli/ # CLI implementation +├── code_interpreter/ # Code generation and interpretation (see code_interpreter/AGENTS.md) +├── config/ # Configuration management +├── ext_role/ # Extended roles (see ext_role/AGENTS.md) +├── llm/ # LLM integrations (see llm/AGENTS.md) +├── logging/ # Logging and telemetry +├── memory/ # Conversation memory (see memory/AGENTS.md) +├── misc/ # Utilities and component registry +├── module/ # Core modules (tracing, events) +├── planner/ # Planning logic +├── plugin/ # Plugin system +├── role/ # Role base classes +├── session/ # Session management +├── utils/ # Helper utilities +└── workspace/ # Workspace management + +tests/ +└── unit_tests/ # Unit tests (pytest) + ├── data/ # Test fixtures (plugins, prompts, examples) + └── ces/ # Code execution tests +``` + +### Module and Role Overview (what lives where) + +- **app/**: Bootstraps dependency injection; wires TaskWeaverApp, SessionManager, config binding. +- **session/**: Orchestrates Planner + worker roles, memory, workspace management, event emitter, tracing. +- **planner/**: Planner role; LLM-powered task decomposition and planning logic. +- **code_interpreter/**: Code generation and execution (full, CLI-only, plugin-only); code verification/AST checks. +- **memory/**: Conversation history, rounds, posts, attachments, experiences; RoundCompressor utilities. +- **llm/**: LLM API facades; providers include OpenAI/Azure, Anthropic, Ollama, Google GenAI, Qwen, ZhipuAI, Groq, Azure ML, mock; embeddings via OpenAI/Azure, Ollama, Google GenAI, sentence_transformers, Qwen, ZhipuAI. +- **plugin/**: Plugin base classes and registry/context for function-style plugins. +- **role/**: Core role abstractions, RoleRegistry, PostTranslator. +- **ext_role/**: Extended roles (web_search, web_explorer, image_reader, document_retriever, recepta, echo). +- **module/**: Core modules like tracing and event_emitter wiring. +- **logging/**: TelemetryLogger and logging setup. +- **workspace/**: Session-scoped working directories and execution cwd helpers. + +## Testing Patterns + +### Using Fixtures +```python +import pytest +from injector import Injector + +@pytest.fixture() +def app_injector(request: pytest.FixtureRequest): + from taskweaver.config.config_mgt import AppConfigSource + config = {"llm.api_key": "test_key"} + app_injector = Injector([LoggingModule, PluginModule]) + app_config = AppConfigSource(config=config) + app_injector.binder.bind(AppConfigSource, to=app_config) + return app_injector +``` + +### Test Markers +```python +@pytest.mark.app_config({"custom.setting": "value"}) +def test_with_custom_config(app_injector): + ... +``` + +## Flake8 Ignores +The following are intentionally ignored (see `.linters/tox.ini`): +- `E402`: Module level import not at top of file +- `W503`: Line break before binary operator +- `W504`: Line break after binary operator +- `E203`: Whitespace before ':' +- `F401`: Import not used (only in `__init__.py`) + +## Key Patterns + +### Creating Unique IDs +```python +from taskweaver.utils import create_id +post_id = "post-" + create_id() # Format: post-YYYYMMDD-HHMMSS- +``` + +### Reading/Writing YAML +```python +from taskweaver.utils import read_yaml, write_yaml +data = read_yaml("path/to/file.yaml") +write_yaml("path/to/file.yaml", data) +``` + +### Configuration Access +```python +class MyConfig(ModuleConfig): + def _configure(self) -> None: + self._set_name("my_module") + self.enabled = self._get_bool("enabled", False) + self.path = self._get_path("base_path", "/default/path") + self.model = self._get_str("model", None, required=False) +``` + +## CI/CD +- Tests run on Python 3.11 via GitHub Actions +- Pre-commit hooks include: autoflake, isort, black, flake8, gitleaks, detect-secrets +- All PRs to `main` trigger the pytest workflow diff --git a/docs/design/code-interpreter-vars.md b/docs/design/code-interpreter-vars.md new file mode 100644 index 000000000..5ebf847df --- /dev/null +++ b/docs/design/code-interpreter-vars.md @@ -0,0 +1,64 @@ +# Code Interpreter Visible Variable Surfacing + +## Problem +The code interpreter generates Python in a persistent kernel but the prompt does not explicitly remind the model which variables already exist in that kernel. This can lead to redundant redefinitions or missed reuse of prior results. We want to surface only the newly defined (non-library) variables to the model in subsequent turns. + +## Goals +- Capture the current user/kernel-visible variables after each execution (excluding standard libs and plugins). +- Propagate these variables to the code interpreter’s prompt so it can reuse them. +- Keep noise low: skip modules/functions and internal/builtin names; truncate large reprs. +- Maintain backward compatibility; do not break existing attachments or execution flow. + +## Non-Goals +- Full introspection of module internals or large data snapshots. +- Persisting variables across sessions beyond current conversation. + +## Design Overview +1) **Collect kernel variables after execution** + - In the IPython magics layer (`_taskweaver_exec_post_check`) call a context helper to extract visible variables from `local_ns`. + - Filtering rules: + - Skip names starting with `_`. + - Skip builtins and common libs: `__builtins__`, `In`, `Out`, `get_ipython`, `exit`, `quit`, `pd`, `np`, `plt`. + - Skip modules, functions, and plugin instances (data-only snapshot). + - For values, store `(name, repr(value))`, truncated to 500 chars; numpy arrays get shape/dtype-aware pretty repr; fall back to `` on repr errors. + - Store the snapshot on `ExecutorPluginContext.latest_variables`. + +2) **Return variables with execution result** + - `Executor.get_post_execution_state` includes `variables` (list of `(name, repr)` tuples). + - `Environment._parse_exec_result` copies these into `ExecutionResult.variables` (added to dataclass). + +3) **Surface variables to user and prompt** + - `CodeExecutor.format_code_output` renders available variables when there is no explicit result/output, using `pretty_repr` to keep lines concise. + - `CodeInterpreter.reply` attaches a `session_variables` attachment (JSON list of tuples) when variables are present. + - Prompt threading strategy: + - Assistant turns: `session_variables` is **ignored** via `ignored_types` to avoid polluting assistant history with execution-state metadata. + - Final user turn of the latest round: the attachment is decoded and appended as a “Currently available variables” block so the model can reuse state in the next code generation. + +4) **Attachment type** + - Added `AttachmentType.session_variables` to carry the variable snapshot per execution. + +## Open Items / Next Steps +- Revisit filtering to ensure we skip large data/DF previews (could add size/type caps). +- Validate end-to-end with unit tests for: variable capture, attachment propagation, prompt inclusion, and formatting. + + +## Files Touched +- `taskweaver/ces/runtime/context.py` — collect and store visible variables. +- `taskweaver/ces/runtime/executor.py` — expose variables in post-execution state. +- `taskweaver/ces/environment.py` — carry variables into `ExecutionResult`. +- `taskweaver/ces/common.py` — add `variables` to `ExecutionResult` dataclass. +- `taskweaver/memory/attachment.py` — add `session_variables` attachment type. +- `taskweaver/code_interpreter/code_interpreter/code_interpreter.py` — attach captured vars to posts. +- `taskweaver/code_interpreter/code_interpreter/code_generator.py` — ignore var attachments in assistant text; include in feedback. +- `taskweaver/code_interpreter/code_executor.py` — display available variables when no explicit output. +- `taskweaver/utils/__init__.py` — add `pretty_repr` helper for safe truncation. + +## Rationale +- Keeps the model aware of live state without inflating prompts with full outputs. +- Avoids re-importing/recomputing when variables already exist. +- Uses attachments so downstream consumers (UI/logs) can also show the state. + +## Risks / Mitigations +- **Large values**: truncated repr and filtered types keep prompt size bounded; consider type-based caps later. +- **Noise from libs**: explicit ignore list for common imports; can expand as needed. +- **Compatibility**: new attachment type is additive; existing flows remain unchanged. diff --git a/docs/design/threading_model.md b/docs/design/threading_model.md new file mode 100644 index 000000000..7f766f79c --- /dev/null +++ b/docs/design/threading_model.md @@ -0,0 +1,613 @@ +# TaskWeaver Threading Model - Design Document + +**Generated:** 2026-01-26 | **Author:** AI Agent | **Status:** Documentation + +## Overview + +TaskWeaver employs a **dual-thread architecture** for console-based user interaction. When a user submits a request, the main process spawns two threads: + +1. **Execution Thread** - Runs the actual task processing (LLM calls, code execution) +2. **Animation Thread** - Handles real-time console display with status updates and animations + +These threads communicate via an **event-driven architecture** using a shared update queue protected by threading primitives. + +## Architecture Diagram + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ Main Thread │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ TaskWeaverChatApp.run() │ │ +│ │ └── _handle_message(input) │ │ +│ │ └── TaskWeaverRoundUpdater.handle_message() │ │ +│ │ ├── Polls for confirmation requests │ │ +│ │ └── Handles user confirmation input │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ┌───────────────┴───────────────┐ │ +│ ▼ ▼ │ +│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │ +│ │ Execution Thread (t_ex) │ │ Animation Thread (t_ui) │ │ +│ │ │ │ │ │ +│ │ session.send_message() │ │ _animate_thread() │ │ +│ │ ├── Planner.reply() │ │ ├── Process updates │ │ +│ │ ├── CodeInterpreter │ │ ├── Render status bar │ │ +│ │ │ .reply() │ │ ├── Display messages │ │ +│ │ │ ├── generate code │ │ ├── Animate spinner │ │ +│ │ │ ├── verify code │ │ └── Pause on confirm │ │ +│ │ │ ├── WAIT confirm ◄─┼───┼──────────────────────────── │ │ +│ │ │ └── execute code │ │ │ │ +│ │ └── Event emission ──────┼───┼──► pending_updates queue │ │ +│ │ │ │ │ │ +│ └─────────────────────────────┘ └─────────────────────────────┘ │ +│ │ │ │ +│ └───────────────┬───────────────┘ │ +│ ▼ │ +│ exit_event.set() │ +│ Main thread joins │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Key Components + +### 1. TaskWeaverRoundUpdater (chat/console/chat.py) + +The central coordinator that manages both threads and handles events. + +```python +class TaskWeaverRoundUpdater(SessionEventHandlerBase): + def __init__(self): + self.exit_event = threading.Event() # Signals completion + self.update_cond = threading.Condition() # Wakes animation thread + self.lock = threading.Lock() # Protects shared state + + self.pending_updates: List[Tuple[str, str]] = [] # Event queue + + # Pause/resume handshake for animation thread + self.pause_animation = threading.Event() # Main requests pause + self.animation_paused = threading.Event() # Animation acknowledges pause + + self.result: Optional[str] = None +``` + +### 2. Thread Spawning (handle_message) + +```python +def handle_message(self, session, message, files): + def execution_thread(): + try: + round = session.send_message(message, event_handler=self, files=files) + last_post = round.post_list[-1] + if last_post.send_to == "User": + self.result = last_post.message + finally: + self.exit_event.set() + with self.update_cond: + self.update_cond.notify_all() + + t_ui = threading.Thread(target=lambda: self._animate_thread(), daemon=True) + t_ex = threading.Thread(target=execution_thread, daemon=True) + + t_ui.start() + t_ex.start() + + # Main thread waits for completion + while True: + self.exit_event.wait(0.1) + if self.exit_event.is_set(): + break +``` + +### 3. Event Flow + +``` +┌──────────────────┐ emit() ┌──────────────────┐ handle() ┌──────────────────┐ +│ PostEventProxy │─────────────►│ SessionEventEmitter│────────────►│TaskWeaverRoundUpdater│ +└──────────────────┘ └──────────────────┘ └──────────────────┘ + │ + ▼ + ┌──────────────────┐ + │ pending_updates │ + │ (queue) │ + └──────────────────┘ + │ + ▼ + ┌──────────────────┐ + │ Animation Thread │ + │ (consumer) │ + └──────────────────┘ +``` + +## Event Types + +### Session Events (EventScope.session) +| Event | Description | +|-------|-------------| +| `session_start` | Session initialization | +| `session_end` | Session termination | +| `session_new_round` | New conversation round | + +### Round Events (EventScope.round) +| Event | Description | +|-------|-------------| +| `round_start` | User query processing begins | +| `round_end` | User query processing complete | +| `round_error` | Error during processing | +| `round_new_post` | New message in round | + +### Post Events (EventScope.post) +| Event | Description | +|-------|-------------| +| `post_start` | Role begins generating response | +| `post_end` | Role finished response | +| `post_error` | Error in post generation | +| `post_status_update` | Status text change ("generating code", "executing") | +| `post_send_to_update` | Recipient change | +| `post_message_update` | Message content streaming | +| `post_attachment_update` | Attachment (code, plan, etc.) update | + +## Animation Thread Details + +The animation thread (`_animate_thread`) runs a continuous loop with confirmation-aware synchronization: + +```python +def _animate_thread(self): + while True: + # Check if pause is requested FIRST, before any output + if self.pause_animation.is_set(): + # Signal that animation has paused + self.animation_paused.set() + # Wait until pause is lifted + while self.pause_animation.is_set(): + if self.exit_event.is_set(): + break + with self.update_cond: + self.update_cond.wait(0.1) + continue + + # Animation is running, clear the paused signal + self.animation_paused.clear() + + clear_line() + + # Process all pending updates atomically + with self.lock: + for action, opt in self.pending_updates: + if action == "start_post": + # Display role header: ╭───< Planner > + elif action == "end_post": + # Display completion: ╰──● sending to User + # ... other actions + self.pending_updates.clear() + + if self.exit_event.is_set(): + break + + # Check again before printing status line + if self.pause_animation.is_set(): + continue + + # Display animated status bar + display_status_bar(role, status, get_ani_frame(counter)) + + # Rate limit animation + with self.update_cond: + self.update_cond.wait(0.2) +``` + +### Console Output Format + +``` + ╭───< Planner > + ├─► [plan] 1. Parse input data... + ├──● The task involves processing the CSV file... + ╰──● sending message to CodeInterpreter + + ╭───< CodeInterpreter > + ├─► [reply_content] import pandas as pd... + ├─► [verification] CORRECT + ├─► [execution_status] SUCCESS + ├──● [Execution result]... + ╰──● sending message to Planner +``` + +## Synchronization Primitives + +| Primitive | Purpose | +|-----------|---------| +| `threading.Lock` (`lock`) | Protects `pending_updates` queue during read/write | +| `threading.Event` (`exit_event`) | Signals execution completion | +| `threading.Event` (`pause_animation`) | Main requests animation to pause | +| `threading.Event` (`animation_paused`) | Animation acknowledges it has paused | +| `threading.Condition` (`update_cond`) | Wakes animation thread when updates available | + +### Critical Sections + +1. **Event emission** (execution thread writes): +```python +with self.lock: + self.pending_updates.append(("status_update", msg)) +``` + +2. **Update processing** (animation thread reads): +```python +with self.lock: + for action, opt in self.pending_updates: + # Process... + self.pending_updates.clear() +``` + +## Additional Threading: Stream Smoother + +The LLM module (`llm/__init__.py`) uses a separate threading model for **LLM response streaming**: + +```python +def _stream_smoother(self, stream_init): + """ + Smooths LLM token streaming for better UX. + + Problem: LLM tokens arrive in bursts (fast) then pauses (slow). + Solution: Buffer tokens and emit at normalized rate. + """ + buffer_content = "" + finished = False + + def base_stream_puller(): + # Thread: Pull from LLM, add to buffer + for msg in stream_init(): + with update_lock: + buffer_content += msg["content"] + + thread = threading.Thread(target=base_stream_puller) + thread.start() + + # Main: Drain buffer at smoothed rate + while not finished: + yield normalized_chunk() +``` + +## Thread Lifecycle + +``` +Time ──────────────────────────────────────────────────────────────────────────────────► + +Main ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████████████░░░░░░░░░░░░░░░░░████████████████ + spawn wait(exit_event) confirm? wait confirm? join + +Execution ░░░░██████████████████████████████░░░░░░░░░░░░██████████████████░░░░░░░░░░░░░░ + Planner → CodeInterpreter WAIT(cond) continue→result + +Animation ░░░░██░██░██░██░██░██░██░██░██░██░░░░░░░░░░░░░██░██░██░██░██░██░░░░░░░░░░░░░░░ + render → sleep → render PAUSED resume → render + +Legend: █ = active, ░ = waiting/idle +``` + +### With Confirmation Flow + +``` +Time ──────────────────────────────────────────────────────────────────► + +Main ████░░░░░░░░░░░░░░░░████████████████░░░░░░░░░░░░████████████████ + spawn polling show code polling join + get input + +Execution ░░░░██████████████████░░░░░░░░░░░░░██████████████░░░░░░░░░░░░░░ + generate code BLOCKED execute code done + request confirm (waiting) (if approved) + +Animation ░░░░██░██░██░██░██░░░░░░░░░░░░░░░░░░██░██░██░██░░░░░░░░░░░░░░░░ + animate STOPPED resume + (no output) animation + +Legend: █ = active, ░ = waiting/idle +``` + +## Error Handling + +### Keyboard Interrupt +```python +try: + while True: + self.exit_event.wait(0.1) + if self.exit_event.is_set(): + break +except KeyboardInterrupt: + error_message("Interrupted by user") + exit(1) # Immediate exit - session state unknown +``` + +### Execution Errors +```python +def execution_thread(): + try: + round = session.send_message(...) + except Exception as e: + self.response.append("Error") + raise e + finally: + self.exit_event.set() # Always signal completion +``` + +## Design Rationale + +### Why Two Threads? + +1. **Non-blocking UI**: LLM calls and code execution can take seconds/minutes. Animation thread keeps UI responsive. + +2. **Real-time feedback**: Users see incremental progress (streaming text, status updates) rather than waiting for complete response. + +3. **Clean separation**: Execution logic doesn't need to know about display; display doesn't block execution. + +### Why Event Queue? + +1. **Decoupling**: Event emitters (Planner, CodeInterpreter) don't know about console display. + +2. **Batching**: Multiple rapid events can be processed in single animation frame. + +3. **Thread safety**: Queue with lock is simpler than direct UI updates from multiple threads. + +## Comparison with Other Modes + +| Mode | Threading | Display | +|------|-----------|---------| +| Console (`chat_taskweaver`) | 2 threads (exec + anim) | Real-time animated | +| Web/API | Single thread per request | WebSocket/SSE streaming | +| Programmatic | Caller's thread | Event callbacks | + +## Animation Pause Handshake Pattern + +The console UI uses a simple, extensible handshake pattern to temporarily pause animation output when exclusive console access is needed. + +### The Pattern + +```python +# Two events form the handshake +pause_animation = threading.Event() # Request: "please pause" +animation_paused = threading.Event() # Acknowledgment: "I have paused" +``` + +### How It Works + +**Requester (main thread or any code needing exclusive console):** +```python +# 1. Request pause +self.pause_animation.set() + +# 2. Wait for acknowledgment +self.animation_paused.wait() + +# 3. Safe to use console exclusively +do_exclusive_console_work() + +# 4. Release +self.animation_paused.clear() +self.pause_animation.clear() +``` + +**Animation thread (responder):** +```python +while True: + # Check at START of loop, before any output + if self.pause_animation.is_set(): + self.animation_paused.set() # Acknowledge + while self.pause_animation.is_set(): # Wait for release + wait() + continue + + self.animation_paused.clear() # Signal "I'm running" + do_animation_output() +``` + +### Timing Diagram + +``` +Main Thread Animation Thread +─────────── ──────────────── + [Loop start] + pause_animation? → NO + Clear animation_paused + Print status line... +Set pause_animation ─────────────────────────────────────────► +Wait for animation_paused [Loop start] + │ pause_animation? → YES + │ Set animation_paused + ◄────────────────────────────────────────────────────┘ +(wait returns) Wait in loop... +Show prompt, get input (no output) +Clear animation_paused +Clear pause_animation ───────────────────────────────────────► + [Loop continues] + pause_animation? → NO + Resume output +``` + +### Why This Pattern + +1. **Simple**: Two events, clear semantics +2. **Safe**: Animation always checks before output +3. **Extensible**: Any code can use it, not just confirmation +4. **No locks needed**: Handshake guarantees ordering + +### Current Usage + +| Feature | Uses Handshake | +|---------|----------------| +| Code confirmation prompt | ✓ | +| (Future) Interactive debugging | Can use same pattern | +| (Future) Multi-line input | Can use same pattern | + +### Adding New Features + +To add a new feature that needs exclusive console access: + +```python +def my_new_feature(self): + # Pause animation + self.pause_animation.set() + self.animation_paused.wait(timeout=5.0) + + try: + # Your exclusive console work here + result = get_user_input() + finally: + # Always release, even on error + self.animation_paused.clear() + self.pause_animation.clear() + + return result +``` + +--- + +## Code Execution Confirmation + +When `code_interpreter.require_confirmation` is enabled, TaskWeaver will pause before executing generated code to get user confirmation. + +### Confirmation Flow + +``` +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ Execution Thread│ │ Main Thread │ │Animation Thread │ +└────────┬────────┘ └────────┬────────┘ └────────┬────────┘ + │ │ │ + │ generate code │ │ [Loop start] + │ verify code │ │ Check confirmation_active + │ │ │ → false, continue + │ │ │ Clear animation_stopped + │ │ │ Acquire output_lock + │ │ │ Print status line + │ │ │ Release output_lock + │ │ │ │ + │ set _confirmation_event│ │ [Loop start] + │ emit confirmation_req │ │ Check confirmation_active + │ WAIT on _confirm_cond ─┼────────────────────────┼─→ true! + │ (blocked) │ │ Set animation_stopped ◄──┐ + │ │ detect confirmation │ Wait in loop │ + │ │ set confirmation_active│ │ + │ │ wait animation_stopped ─────────────────────────────┘ + │ │ acquire output_lock │ (cannot acquire lock) + │ │ clear line, show code │ │ + │ │ get user input [y/N] │ │ (waiting) + │ │ show result │ │ + │ │ release output_lock │ │ + │ │ clear animation_stopped│ │ + │ │ clear confirmation_active │ + │ │ set _confirmation_result │ + │ │ notify _confirm_cond │ │ + │ ◄──────────────────────┼────────────────────────┤ │ + │ (unblocked) │ │ [Loop continues] + │ read & clear result │ │ confirmation_active=false + │ │ │ Resume normal animation + │ if approved: │ │ │ + │ execute code │ │ │ + │ else: │ │ │ + │ cancel execution │ │ │ + ▼ ▼ ▼ +``` + +### Configuration + +Enable confirmation in your `taskweaver_config.json`: + +```json +{ + "code_interpreter.require_confirmation": true +} +``` + +### Synchronization Primitives + +The confirmation system uses a two-level synchronization approach to prevent race conditions where the animation thread could overwrite user input: + +#### Event Emitter Primitives (in `SessionEventEmitter`) + +| Primitive | Set By | Cleared By | Purpose | +|-----------|--------|------------|---------| +| `_confirmation_event` | Execution thread | Main thread | Signals that confirmation is pending | +| `_confirmation_cond` | Main thread | - | Condition variable for blocking/waking execution thread | +| `_confirmation_result` | Main thread | Execution thread | Stores user's decision (True/False) | + +#### Console UI Primitives (in `TaskWeaverRoundUpdater`) + +| Primitive | Set By | Cleared By | Purpose | +|-----------|--------|------------|---------| +| `pause_animation` | Main thread | Main thread | Requests animation to pause | +| `animation_paused` | Animation thread | Main thread | Confirms animation has paused | + +### Thread Responsibilities + +**Execution Thread:** +- Sets `_confirmation_event` when code needs confirmation +- Emits `post_confirmation_request` event +- Waits on `_confirmation_cond` until user responds +- Reads and clears `_confirmation_result` + +**Main Thread (`_handle_confirmation`):** +1. Sets `pause_animation` to signal animation thread +2. Waits for `animation_paused` to ensure animation has paused +3. Displays code and gets user input (safe from interference) +4. Clears `animation_paused` and `pause_animation` +5. Sets `_confirmation_result` and notifies `_confirmation_cond` + +**Animation Thread (`_animate_thread`):** +1. Checks `pause_animation` at **start of each loop iteration** +2. If set: sets `animation_paused` and waits in a loop until `pause_animation` cleared +3. If not set: clears `animation_paused` and proceeds with output + +### Why This Design Prevents Race Conditions + +The handshake guarantees animation has stopped before main thread shows the prompt: + +``` +Animation Thread Main Thread +──────────────── ─────────── +[Loop iteration] +Check pause_animation → false +Clear animation_paused +Print status line + Set pause_animation + Wait for animation_paused +[Next loop iteration] +Check pause_animation → TRUE +Set animation_paused ───────────────────► animation_paused.wait() returns +Wait in loop Show prompt, get input +(no output) Clear animation_paused + Clear pause_animation +Check pause_animation → false +Resume normal operation +``` + +### Key Implementation Points + +1. **Early check**: Animation thread checks `pause_animation` at the **very start** of its loop, before any output operations +2. **Explicit acknowledgment**: `animation_paused` confirms animation has paused (not just signaled to pause) +3. **Clean display**: Main thread clears any leftover animation before showing code +4. **Extensible**: Any code needing exclusive console access can use the same handshake + +## File References + +| File | Component | +|------|-----------| +| `chat/console/chat.py` | `TaskWeaverRoundUpdater`, `_animate_thread`, `_handle_confirmation` | +| `module/event_emitter.py` | `SessionEventEmitter`, `TaskWeaverEvent`, `PostEventProxy`, `ConfirmationHandler` | +| `code_interpreter/code_interpreter/code_interpreter.py` | `CodeInterpreter.reply()` (confirmation request) | +| `llm/__init__.py` | `_stream_smoother` (LLM streaming) | +| `ces/manager/defer.py` | `deferred_var` (kernel warm-up) | + +## Summary + +TaskWeaver's console interface uses a clean dual-thread model: +- **Execution thread**: Runs the agent pipeline (Planner → CodeInterpreter → result) +- **Animation thread**: Consumes events and renders real-time console output + +Communication happens via an event queue (`pending_updates`) protected by a lock, with a condition variable for efficient wake-up. This design provides responsive UI feedback during long-running AI operations while maintaining clean separation of concerns. + +### Animation Pause Handshake + +When exclusive console access is needed (e.g., confirmation prompts), use the handshake: +1. Set `pause_animation` → wait for `animation_paused` +2. Do exclusive work +3. Clear `animation_paused` → clear `pause_animation` + +This pattern is simple, safe, and extensible to future features. diff --git a/playground/UI/.chainlit/config.toml b/playground/UI/.chainlit/config.toml deleted file mode 100644 index 874bf08c5..000000000 --- a/playground/UI/.chainlit/config.toml +++ /dev/null @@ -1,84 +0,0 @@ -[project] -# Whether to enable telemetry (default: true). No personal data is collected. -enable_telemetry = false - -# List of environment variables to be provided by each user to use the app. -user_env = [] - -# Duration (in seconds) during which the session is saved when the connection is lost -session_timeout = 3600 - -# Enable third parties caching (e.g LangChain cache) -cache = false - -# Follow symlink for asset mount (see https://github.com/Chainlit/chainlit/issues/317) -# follow_symlink = true - -[features] -# Show the prompt playground -prompt_playground = true - -# Process and display HTML in messages. This can be a security risk (see https://stackoverflow.com/questions/19603097/why-is-it-dangerous-to-render-user-generated-html-or-javascript) -unsafe_allow_html = true - -# Process and display mathematical expressions. This can clash with "$" characters in messages. -latex = false - -# Authorize users to upload files with messages -spontaneous_file_upload.enabled = true - -# Allows user to use speech to text -[features.speech_to_text] - enabled = false - # See all languages here https://github.com/JamesBrill/react-speech-recognition/blob/HEAD/docs/API.md#language-string - # language = "en-US" - -[UI] -# Name of the app and chatbot. -name = "TaskWeaver" - -# Show the readme while the conversation is empty. -show_readme_as_default = true - -# Description of the app and chatbot. This is used for HTML tags. -# description = "Chat with TaskWeaver" - -# Large size content are by default collapsed for a cleaner ui -default_collapse_content = false - -# The default value for the expand messages settings. -default_expand_messages = true - -# Hide the chain of thought details from the user in the UI. -hide_cot = false - -# Link to your github repo. This will add a github button in the UI's header. -# github = "https://github.com/microsoft/TaskWeaver" - -# Specify a CSS file that can be used to customize the user interface. -# The CSS file can be served from the public directory or via an external link. -custom_css = "/public/style_v1.css" - -# Override default MUI light theme. (Check theme.ts) -[UI.theme.light] - #background = "#FAFAFA" - #paper = "#FFFFFF" - - [UI.theme.light.primary] - #main = "#F80061" - #dark = "#980039" - #light = "#FFE7EB" - -# Override default MUI dark theme. (Check theme.ts) -[UI.theme.dark] - #background = "#FAFAFA" - #paper = "#FFFFFF" - - [UI.theme.dark.primary] - #main = "#F80061" - #dark = "#980039" - #light = "#FFE7EB" - - -[meta] -generated_by = "0.7.700" diff --git a/playground/UI/app.py b/playground/UI/app.py deleted file mode 100644 index 6a081c1a3..000000000 --- a/playground/UI/app.py +++ /dev/null @@ -1,462 +0,0 @@ -import atexit -import functools -import os -import re -import sys -from typing import Any, Dict, List, Optional, Tuple, Union - -import requests - -# change current directory to the directory of this file for loading resources -os.chdir(os.path.dirname(__file__)) - -try: - import chainlit as cl - - print( - "If UI is not started, please go to the folder playground/UI and run `chainlit run app.py` to start the UI", - ) -except Exception: - raise Exception( - "Package chainlit is required for using UI. Please install it manually by running: " - "`pip install chainlit` and then run `chainlit run app.py`", - ) - -repo_path = os.path.join(os.path.dirname(__file__), "../../") -sys.path.append(repo_path) -from taskweaver.app.app import TaskWeaverApp -from taskweaver.memory.attachment import AttachmentType -from taskweaver.memory.type_vars import RoleName -from taskweaver.module.event_emitter import PostEventType, RoundEventType, SessionEventHandlerBase -from taskweaver.session.session import Session - -project_path = os.path.join(repo_path, "project") -app = TaskWeaverApp(app_dir=project_path, use_local_uri=True) -atexit.register(app.stop) -app_session_dict: Dict[str, Session] = {} - - -def elem(name: str, cls: str = "", attr: Dict[str, str] = {}, **attr_dic: str): - all_attr = {**attr, **attr_dic} - if cls: - all_attr.update({"class": cls}) - - attr_str = "" - if len(all_attr) > 0: - attr_str += "".join(f' {k}="{v}"' for k, v in all_attr.items()) - - def inner(*children: str): - children_str = "".join(children) - return f"<{name}{attr_str}>{children_str}" - - return inner - - -def txt(content: str, br: bool = True): - content = content.replace("<", "<").replace(">", ">") - if br: - content = content.replace("\n", "
") - else: - content = content.replace("\n", " ") - return content - - -div = functools.partial(elem, "div") -span = functools.partial(elem, "span") -blinking_cursor = span("tw-end-cursor")() - - -def file_display(files: List[Tuple[str, str]], session_cwd_path: str): - elements: List[cl.Element] = [] - for file_name, file_path in files: - # if image, no need to display as another file - if file_path.endswith((".png", ".jpg", ".jpeg", ".gif")): - image = cl.Image( - name=file_path, - display="inline", - path=file_path if os.path.isabs(file_path) else os.path.join(session_cwd_path, file_path), - size="large", - ) - elements.append(image) - elif file_path.endswith((".mp3", ".wav", ".flac")): - audio = cl.Audio( - name="converted_speech", - display="inline", - path=file_path if os.path.isabs(file_path) else os.path.join(session_cwd_path, file_path), - ) - elements.append(audio) - else: - if file_path.endswith(".csv"): - import pandas as pd - - data = ( - pd.read_csv(file_path) - if os.path.isabs(file_path) - else pd.read_csv(os.path.join(session_cwd_path, file_path)) - ) - row_count = len(data) - table = cl.Text( - name=file_path, - content=f"There are {row_count} in the data. The top {min(row_count, 5)} rows are:\n" - + data.head(n=5).to_markdown(), - display="inline", - ) - elements.append(table) - else: - print(f"Unsupported file type: {file_name} for inline display.") - # download files from plugin context - file = cl.File( - name=file_name, - display="inline", - path=file_path if os.path.isabs(file_path) else os.path.join(session_cwd_path, file_path), - ) - elements.append(file) - return elements - - -def is_link_clickable(url: str): - if url: - try: - response = requests.get(url) - # If the response status code is 200, the link is clickable - return response.status_code == 200 - except requests.exceptions.RequestException: - return False - else: - return False - - -class ChainLitMessageUpdater(SessionEventHandlerBase): - def __init__(self, root_step: cl.Step): - self.root_step = root_step - self.reset_cur_step() - self.suppress_blinking_cursor() - - def reset_cur_step(self): - self.cur_step: Optional[cl.Step] = None - self.cur_attachment_list: List[Tuple[str, AttachmentType, str, bool]] = [] - self.cur_post_status: str = "Updating" - self.cur_send_to: RoleName = "Unknown" - self.cur_message: str = "" - self.cur_message_is_end: bool = False - self.cur_message_sent: bool = False - - def suppress_blinking_cursor(self): - cl.run_sync(self.root_step.stream_token("")) - if self.cur_step is not None: - cl.run_sync(self.cur_step.stream_token("")) - - def handle_round( - self, - type: RoundEventType, - msg: str, - extra: Any, - round_id: str, - **kwargs: Any, - ): - if type == RoundEventType.round_error: - self.root_step.is_error = True - self.root_step.output = msg - cl.run_sync(self.root_step.update()) - - def handle_post( - self, - type: PostEventType, - msg: str, - extra: Any, - post_id: str, - round_id: str, - **kwargs: Any, - ): - if type == PostEventType.post_start: - self.reset_cur_step() - self.cur_step = cl.Step(name=extra["role"], show_input=True, root=False) - cl.run_sync(self.cur_step.__aenter__()) - elif type == PostEventType.post_end: - assert self.cur_step is not None - content = self.format_post_body(True) - cl.run_sync(self.cur_step.stream_token(content, True)) - cl.run_sync(self.cur_step.__aexit__(None, None, None)) # type: ignore - self.reset_cur_step() - elif type == PostEventType.post_error: - pass - elif type == PostEventType.post_attachment_update: - assert self.cur_step is not None, "cur_step should not be None" - id: str = extra["id"] - a_type: AttachmentType = extra["type"] - is_end: bool = extra["is_end"] - # a_extra: Any = extra["extra"] - if len(self.cur_attachment_list) == 0 or id != self.cur_attachment_list[-1][0]: - self.cur_attachment_list.append((id, a_type, msg, is_end)) - - else: - prev_msg = self.cur_attachment_list[-1][2] - self.cur_attachment_list[-1] = (id, a_type, prev_msg + msg, is_end) - - elif type == PostEventType.post_send_to_update: - self.cur_send_to = extra["role"] - elif type == PostEventType.post_message_update: - self.cur_message += msg - if extra["is_end"]: - self.cur_message_is_end = True - elif type == PostEventType.post_status_update: - self.cur_post_status = msg - - if self.cur_step is not None: - content = self.format_post_body(False) - cl.run_sync(self.cur_step.stream_token(content, True)) - if self.cur_message_is_end and not self.cur_message_sent: - self.cur_message_sent = True - self.cur_step.elements = [ - *(self.cur_step.elements or []), - cl.Text( - content=self.cur_message, - display="inline", - ), - ] - cl.run_sync(self.cur_step.update()) - self.suppress_blinking_cursor() - - def get_message_from_user(self, prompt: str, timeout: int = 120) -> Optional[str]: - ask_user_msg = cl.AskUserMessage(content=prompt, author=" ", timeout=timeout) - res = cl.run_sync(ask_user_msg.send()) - cl.run_sync(ask_user_msg.remove()) - if res is not None: - res_msg = cl.Message.from_dict(res) - msg_txt = res_msg.content - cl.run_sync(res_msg.remove()) - return msg_txt - return None - - def get_confirm_from_user( - self, - prompt: str, - actions: List[Union[Tuple[str, str], str]], - timeout: int = 120, - ) -> Optional[str]: - cl_actions: List[cl.Action] = [] - for arg_action in actions: - if isinstance(arg_action, str): - cl_actions.append(cl.Action(name=arg_action, value=arg_action)) - else: - name, value = arg_action - cl_actions.append(cl.Action(name=name, value=value)) - ask_user_msg = cl.AskActionMessage(content=prompt, actions=cl_actions, author=" ", timeout=timeout) - res = cl.run_sync(ask_user_msg.send()) - cl.run_sync(ask_user_msg.remove()) - if res is not None: - for action in cl_actions: - if action.value == res["value"]: - return action.value - return None - - def format_post_body(self, is_end: bool) -> str: - content_chunks: List[str] = [] - - for attachment in self.cur_attachment_list: - a_type = attachment[1] - - # skip artifact paths always - if a_type in [AttachmentType.artifact_paths]: - continue - - # skip Python in final result - if is_end and a_type in [AttachmentType.reply_content]: - continue - - content_chunks.append(self.format_attachment(attachment)) - - if self.cur_message != "": - if self.cur_send_to == "Unknown": - content_chunks.append("**Message**:") - else: - content_chunks.append(f"**Message To {self.cur_send_to}**:") - - if not self.cur_message_sent: - content_chunks.append( - self.format_message(self.cur_message, self.cur_message_is_end), - ) - - if not is_end: - content_chunks.append( - div("tw-status")( - span("tw-status-updating")( - elem("svg", viewBox="22 22 44 44")(elem("circle")()), - ), - span("tw-status-msg")(txt(self.cur_post_status + "...")), - ), - ) - - return "\n\n".join(content_chunks) - - def format_attachment( - self, - attachment: Tuple[str, AttachmentType, str, bool], - ) -> str: - id, a_type, msg, is_end = attachment - header = div("tw-atta-header")( - div("tw-atta-key")( - " ".join([item.capitalize() for item in a_type.value.split("_")]), - ), - div("tw-atta-id")(id), - ) - atta_cnt: List[str] = [] - - if a_type in [AttachmentType.plan, AttachmentType.init_plan]: - items: List[str] = [] - lines = msg.split("\n") - for idx, row in enumerate(lines): - item = row - if "." in row and row.split(".")[0].isdigit(): - item = row.split(".", 1)[1].strip() - items.append( - div("tw-plan-item")( - div("tw-plan-idx")(str(idx + 1)), - div("tw-plan-cnt")( - txt(item), - blinking_cursor if not is_end and idx == len(lines) - 1 else "", - ), - ), - ) - atta_cnt.append(div("tw-plan")(*items)) - elif a_type in [AttachmentType.execution_result]: - atta_cnt.append( - elem("pre", "tw-execution-result")( - elem("code")(txt(msg)), - ), - ) - elif a_type in [AttachmentType.reply_content]: - atta_cnt.append( - elem("pre", "tw-python", {"data-lang": "python"})( - elem("code", "language-python")(txt(msg, br=False)), - ), - ) - else: - atta_cnt.append(txt(msg)) - if not is_end: - atta_cnt.append(blinking_cursor) - - return div("tw-atta")( - header, - div("tw-atta-cnt")(*atta_cnt), - ) - - def format_message(self, message: str, is_end: bool) -> str: - content = txt(message, br=False) - begin_regex = re.compile(r"^```(\w*)$\n", re.MULTILINE) - end_regex = re.compile(r"^```$\n?", re.MULTILINE) - - if not is_end: - end_tag = " " + blinking_cursor - else: - end_tag = "" - - while True: - start_label = begin_regex.search(content) - if not start_label: - break - start_pos = content.index(start_label[0]) - lang_tag = start_label[1] - content = "".join( - [ - content[:start_pos], - f'
',
-                    content[start_pos + len(start_label[0]) :],
-                ],
-            )
-
-            end_pos = end_regex.search(content)
-            if not end_pos:
-                content += end_tag + "
" - end_tag = "" - break - end_pos_pos = content.index(end_pos[0]) - content = f"{content[:end_pos_pos]}{content[end_pos_pos + len(end_pos[0]):]}" - - content += end_tag - return content - - -@cl.on_chat_start -async def start(): - user_session_id = cl.user_session.get("id") - app_session_dict[user_session_id] = app.get_session() - print("Starting new session") - - -@cl.on_chat_end -async def end(): - user_session_id = cl.user_session.get("id") - app_session = app_session_dict[user_session_id] - print(f"Stopping session {app_session.session_id}") - app_session.stop() - app_session_dict.pop(user_session_id) - - -@cl.on_message -async def main(message: cl.Message): - user_session_id = cl.user_session.get("id") # type: ignore - session: Session = app_session_dict[user_session_id] # type: ignore - session_cwd_path = session.execution_cwd - - # display loader before sending message - async with cl.Step(name="", show_input=True, root=True) as root_step: - response_round = await cl.make_async(session.send_message)( - message.content, - files=[ - { - "name": element.name if element.name else "file", - "path": element.path, - } - for element in message.elements - if element.type == "file" or element.type == "image" - ], - event_handler=ChainLitMessageUpdater(root_step), - ) - - artifact_paths = [ - p - for p in response_round.post_list - for a in p.attachment_list - if a.type == AttachmentType.artifact_paths - for p in a.content - ] - - for post in [p for p in response_round.post_list if p.send_to == "User"]: - files: List[Tuple[str, str]] = [] - if len(artifact_paths) > 0: - for file_path in artifact_paths: - # if path is image or csv (the top 5 rows), display it - file_name = os.path.basename(file_path) - files.append((file_name, file_path)) - - # Extract the file path from the message and display it - user_msg_content = post.message - pattern = r"(!?)\[(.*?)\]\((.*?)\)" - matches = re.findall(pattern, user_msg_content) - for match in matches: - img_prefix, file_name, file_path = match - if "://" in file_path: - if not is_link_clickable(file_path): - user_msg_content = user_msg_content.replace( - f"{img_prefix}[{file_name}]({file_path})", - file_name, - ) - continue - files.append((file_name, file_path)) - user_msg_content = user_msg_content.replace( - f"{img_prefix}[{file_name}]({file_path})", - file_name, - ) - elements = file_display(files, session_cwd_path) - await cl.Message( - author="TaskWeaver", - content=f"{user_msg_content}", - elements=elements if len(elements) > 0 else None, - ).send() - - -if __name__ == "__main__": - from chainlit.cli import run_chainlit - - run_chainlit(__file__) diff --git a/playground/UI/chainlit.md b/playground/UI/chainlit.md deleted file mode 100644 index 4eae7eafc..000000000 --- a/playground/UI/chainlit.md +++ /dev/null @@ -1,16 +0,0 @@ -# Welcome to *TaskWeaver* ! - -*Hi there, User! 👋 We're excited to have you on board.* - -TaskWeaver is a code-first agent framework for seamlessly planning and executing data analytics tasks. This innovative framework interprets user requests through coded snippets and efficiently coordinates a variety of plugins in the form of functions to execute data analytics tasks. It supports key Features like: rich data structure, customized algorithms, incorporating domain-specific knowledge, stateful conversation, code verification, easy to use, debug and extend. - -## Useful Links 🔗 - -- **Quick Start:** Quick start TaskWeaver with [README](https://github.com/microsoft/TaskWeaver?tab=readme-ov-file#-quick-start) ✨ -- **Advanced Configurations:** Get started with our [TaskWeaver Documents](https://microsoft.github.io/TaskWeaver/) 📚 -- **Technical Report:** Check out our [TaskWeaver Report](https://export.arxiv.org/abs/2311.17541) for more details! 📖 -- **Discord Channel:** Join the TaskWeaver [Discord Channel](https://discord.gg/Z56MXmZgMb) for discussions 💬 - -We can't wait to see what you create with TaskWeaver! - -**Start the Conversation!** diff --git a/playground/UI/public/favicon.ico b/playground/UI/public/favicon.ico deleted file mode 100644 index be759b297..000000000 Binary files a/playground/UI/public/favicon.ico and /dev/null differ diff --git a/playground/UI/public/logo_dark.png b/playground/UI/public/logo_dark.png deleted file mode 100644 index 5e4eaa009..000000000 Binary files a/playground/UI/public/logo_dark.png and /dev/null differ diff --git a/playground/UI/public/logo_light.png b/playground/UI/public/logo_light.png deleted file mode 100644 index 5e4eaa009..000000000 Binary files a/playground/UI/public/logo_light.png and /dev/null differ diff --git a/playground/UI/public/style_v1.css b/playground/UI/public/style_v1.css deleted file mode 100644 index 65758fb8b..000000000 --- a/playground/UI/public/style_v1.css +++ /dev/null @@ -1,192 +0,0 @@ -img[alt='logo'] { - max-height: 40px !important; - display: inline-block; -} - -.post { - border: 1px solid #ccc; - padding: 10px; - margin-bottom: 10px; - max-width: 800px; -} -.markdown-body { - padding-left: 10px; - padding-right: 10px; -} - -.tw-atta { - display: block; - border: solid 2px #aaa5; - border-radius: 10px; - background-color: #fff4; - overflow: hidden; - box-shadow: 0 0 10px 2px #ccc5; - margin: 10px 0; -} - -.tw-atta-header { - height: 20px; - border-bottom: solid 2px #aaa5; - padding: 5px 10px; - background-color: #5090ff55; - font-weight: 500; - display: flex; -} - -.tw-atta-key { - flex: 1; -} -.tw-atta-id { - opacity: 0.3; - font-size: 0.8em; -} - -.tw-atta-cnt { - padding: 10px 20px; -} - -.markdown-body .tw-plan { - position: relative; -} -div.markdown-body div.tw-plan::before { - content: ''; - display: block; - width: 4px; - height: calc(100% + 20px); - position: absolute; - background-color: #eee; - top: -10px; - left: 15px; -} - -div.markdown-body div.tw-plan-item { - display: flex; -} - -.markdown-body div.tw-plan-idx { - flex: 0 0 20px; - position: relative; - width: 20px; - height: 20px; - border-radius: 12px; - text-align: center; - line-height: 20px; - border: solid 2px #a0c0ff; - background-color: #c0e0ff; - margin: 5px !important; - margin-top: 5px; - font-weight: 500; - color: #555; -} - -.markdown-body div.tw-plan-cnt { - margin: 5px 10px; - margin-top: 5px; -} - -.markdown-body .tw-status { - display: inline-block; - padding: 5px 10px; - border-radius: 3px; - font-size: 14px; - line-height: 20px; - font-weight: 500; - color: #555; - white-space: nowrap; - background-color: #eee; - min-width: 120px; - margin: 10px; -} - -.markdown-body .tw-status-msg { - margin: 10px; - padding: 0; - height: 20px; -} - -/* Updater spinner (adopted from MUI for align with Chainlit) */ -@keyframes tw-updating-status-ani-dash { - 0% { - stroke-dasharray: 1px, 200px; - stroke-dashoffset: 0; - } - - 50% { - stroke-dasharray: 100px, 200px; - stroke-dashoffset: -15px; - } - - 100% { - stroke-dasharray: 100px, 200px; - stroke-dashoffset: -125px; - } -} - -@keyframes tw-updating-status-ani-circle { - 0% { - transform: rotate(0deg); - } - - 100% { - transform: rotate(360deg); - } -} - -.markdown-body .tw-status-updating { - width: 20px; - height: 20px; - display: inline-block; - color: #aaa; - animation: 1.4s linear 0s infinite normal none running - tw-updating-status-ani-circle; -} - -.markdown-body .tw-status-updating svg { - display: block; -} - -.markdown-body .tw-status-updating svg circle { - stroke: currentColor; - stroke-dasharray: 80px, 200px; - stroke-dashoffset: 0; - stroke-width: 4; - fill: none; - r: 20; - cx: 44; - cy: 44; - animation: tw-updating-status-ani-dash 1.4s ease-in-out infinite; -} - -@keyframes tw-blinking-dot { - 0% { - opacity: 0.2; - } - - 20% { - opacity: 1; - } - - 100% { - opacity: 0.2; - } -} - -span.tw-end-cursor { - content: ''; - display: inline-flex; - width: 10px; - border-radius: 5px; - margin-left: 10px; -} - -span.tw-end-cursor::after { - content: ''; - position: relative; - display: block; - width: 10px; - height: 10px; - border-radius: 5px; - background-color: #a0c0ff; - margin: auto; - animation: tw-blinking-dot 0.7s ease-in-out infinite; -} diff --git a/project/examples/planner_examples/example-planner-default-1.yaml b/project/examples/planner_examples/example-planner-default-1.yaml index 1cd2fde0d..069aae3b6 100644 --- a/project/examples/planner_examples/example-planner-default-1.yaml +++ b/project/examples/planner_examples/example-planner-default-1.yaml @@ -14,11 +14,6 @@ rounds: - type: plan_reasoning content: |- The user wants to count the rows of the data file /home/data.csv. The first step is to load the data file and count the rows of the loaded data. - - type: init_plan - content: |- - 1. Load the data file - 2. Count the rows of the loaded data - 3. Check the execution result and report the result to the user - type: plan content: |- 1. Instruct CodeInterpreter to load the data file and count the rows of the loaded data @@ -40,11 +35,6 @@ rounds: The data file /home/data.csv is loaded and there are 100 rows in the data file The execution result is correct The user query is successfully answered - - type: init_plan - content: |- - 1. Load the data file - 2. Count the rows of the loaded data - 3. Check the execution result and report the result to the user - type: plan content: |- 1. Instruct CodeInterpreter to load the data file and count the rows of the loaded data diff --git a/project/examples/planner_examples/example-planner-default-2.yaml b/project/examples/planner_examples/example-planner-default-2.yaml index ba548245d..86e61a181 100644 --- a/project/examples/planner_examples/example-planner-default-2.yaml +++ b/project/examples/planner_examples/example-planner-default-2.yaml @@ -14,13 +14,10 @@ rounds: - type: plan_reasoning content: |- The user greets the Planner - - type: init_plan - content: |- - 1. Respond to the user's greeting - type: plan content: |- 1. Respond to the user's greeting - type: current_plan_step content: 1. Respond to the user's greeting - type: stop - content: Completed \ No newline at end of file + content: Completed diff --git a/project/examples/planner_examples/example-planner-echo.yaml b/project/examples/planner_examples/example-planner-echo.yaml index f94715e74..54b02d79c 100644 --- a/project/examples/planner_examples/example-planner-echo.yaml +++ b/project/examples/planner_examples/example-planner-echo.yaml @@ -14,9 +14,6 @@ rounds: - type: plan_reasoning content: |- The user wants to echo the input 'Hello World' - - type: init_plan - content: |- - 1. Ask Echo to echo the user's input, 'Hello World' - type: plan content: |- 1. Ask Echo to echo the user's input, 'Hello World' @@ -35,9 +32,6 @@ rounds: - type: plan_reasoning content: |- The user query is successfully answered - - type: init_plan - content: |- - 1. Ask Echo to echo the user's input, 'Hello World' - type: plan content: |- 1. Ask Echo to echo the user's input, 'Hello World' diff --git a/project/examples/planner_examples/example-planner-recepta.yaml b/project/examples/planner_examples/example-planner-recepta.yaml index d32574006..075e0af71 100644 --- a/project/examples/planner_examples/example-planner-recepta.yaml +++ b/project/examples/planner_examples/example-planner-recepta.yaml @@ -48,12 +48,6 @@ rounds: - type: plan_reasoning content: |- The first step is to extract the following information from the employee summary: `Employee ID`, `Department`, `Project Code`, `Role`, `Manager`, `Location`. - - type: init_plan - content: |- - 1. extract the following information from the employee summary: `Employee ID`, `Department`, `Project Code`, `Role`, `Manager`, `Location` - 2. validate the role - 3. construct and print the SQL query using the extracted details - 4. check the execution result and report the result to the user - type: plan content: |- 1. extract information from the employee summary @@ -78,12 +72,6 @@ rounds: - type: plan_reasoning content: |- I have extracted the information from the employee summary. The next step is to verify if the role is one of the recognized roles. - - type: init_plan - content: |- - 1. extract the following information from the employee summary: `Employee ID`, `Department`, `Project Code`, `Role`, `Manager`, `Location` - 2. validate the role - 3. construct and print the SQL query using the extracted details - 4. check the execution result and report the result to the user - type: plan content: |- 1. extract information from the employee summary @@ -119,12 +107,6 @@ rounds: - type: plan_reasoning content: |- I can prepare and print the SQL query using the extracted details to the User. - - type: init_plan - content: |- - 1. extract the following information from the employee summary: `Employee ID`, `Department`, `Project Code`, `Role`, `Manager`, `Location` - 2. validate the role - 3. construct and print the SQL query using the extracted details - 4. check the execution result and report the result to the user - type: plan content: |- 1. extract information from the employee summary @@ -134,4 +116,4 @@ rounds: - type: current_plan_step content: 3. construct and print the SQL query using the extracted details and report the result to the user - type: stop - content: Completed \ No newline at end of file + content: Completed diff --git a/project/taskweaver_config.json b/project/taskweaver_config.json deleted file mode 100644 index 7a17b3d89..000000000 --- a/project/taskweaver_config.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "llm.api_base": "https://api.openai.com/v1", - "llm.api_key": "", - "llm.model": "gpt-4-1106-preview" -} \ No newline at end of file diff --git a/project/taskweaver_config.json.example b/project/taskweaver_config.json.example new file mode 100644 index 000000000..24d30a9b6 --- /dev/null +++ b/project/taskweaver_config.json.example @@ -0,0 +1,6 @@ +{ + "llm.api_type": "openai", + "llm.api_base": "https://api.openai.com/v1", + "llm.api_key": "YOUR_API_KEY", + "llm.model": "gpt-4" +} diff --git a/taskweaver/ces/AGENTS.md b/taskweaver/ces/AGENTS.md new file mode 100644 index 000000000..dc06fb748 --- /dev/null +++ b/taskweaver/ces/AGENTS.md @@ -0,0 +1,73 @@ +# Code Execution Service (CES) - AGENTS.md + +Jupyter kernel-based code execution with local and container modes. + +## Structure + +``` +ces/ +├── environment.py # Environment class - kernel management (~700 lines) +├── common.py # ExecutionResult, ExecutionArtifact, EnvPlugin dataclasses +├── client.py # CES client for remote execution +├── __init__.py # Exports +├── kernel/ # Custom Jupyter kernel implementation +│ └── ext.py # IPython magic commands for TaskWeaver +├── runtime/ # Runtime support files +└── manager/ # Session/kernel lifecycle management +``` + +## Key Classes + +### Environment (environment.py) +Main orchestrator for code execution: +- `EnvMode.Local`: Direct kernel via `MultiKernelManager` +- `EnvMode.Container`: Docker container with mounted volumes + +### ExecutionResult (common.py) +```python +@dataclass +class ExecutionResult: + execution_id: str + code: str + is_success: bool + error: str + output: str + stdout: List[str] + stderr: List[str] + log: List[str] + artifact: List[ExecutionArtifact] + variables: Dict[str, str] # Session variables from execution +``` + +## Execution Flow + +1. `start_session()` - Creates kernel (local or container) +2. `load_plugin()` - Registers plugins in kernel namespace +3. `execute_code()` - Runs code, captures output/artifacts +4. `stop_session()` - Cleanup kernel/container + +## Container Mode Specifics + +- Image: `taskweavercontainers/taskweaver-executor:latest` +- Ports: 5 ports mapped (shell, iopub, stdin, hb, control) +- Volumes: `ces/` and `cwd/` mounted read-write +- Connection file written to `ces/conn-{session}-{kernel}.json` + +## Custom Kernel Magics (kernel/ext.py) + +```python +%_taskweaver_session_init {session_id} +%_taskweaver_plugin_register {name} +%_taskweaver_plugin_load {name} +%_taskweaver_exec_pre_check {index} {exec_id} +%_taskweaver_exec_post_check {index} {exec_id} +%%_taskweaver_update_session_var +``` + +## Adding Plugin Support + +Plugins are loaded via magic commands: +1. `_taskweaver_plugin_register` - Registers plugin class +2. `_taskweaver_plugin_load` - Instantiates with config + +Session variables updated via `%%_taskweaver_update_session_var` magic. diff --git a/taskweaver/ces/common.py b/taskweaver/ces/common.py index ebcd89fef..5fd5f1f7e 100644 --- a/taskweaver/ces/common.py +++ b/taskweaver/ces/common.py @@ -68,6 +68,7 @@ class ExecutionResult: log: List[Tuple[str, str, str]] = dataclasses.field(default_factory=list) artifact: List[ExecutionArtifact] = dataclasses.field(default_factory=list) + variables: List[Tuple[str, str]] = dataclasses.field(default_factory=list) class Client(ABC): diff --git a/taskweaver/ces/environment.py b/taskweaver/ces/environment.py index e5f30ed16..5b2cfe868 100644 --- a/taskweaver/ces/environment.py +++ b/taskweaver/ces/environment.py @@ -697,6 +697,8 @@ def _parse_exec_result( preview=artifact_dict["preview"], ) result.artifact.append(artifact_item) + elif key == "variables": + result.variables = value else: pass diff --git a/taskweaver/ces/kernel/ctx_magic.py b/taskweaver/ces/kernel/ctx_magic.py index efc672486..cf9dba4e5 100644 --- a/taskweaver/ces/kernel/ctx_magic.py +++ b/taskweaver/ces/kernel/ctx_magic.py @@ -58,6 +58,7 @@ def _taskweaver_exec_pre_check(self, line: str): def _taskweaver_exec_post_check(self, line: str, local_ns: Dict[str, Any]): if "_" in local_ns: self.executor.ctx.set_output(local_ns["_"]) + self.executor.ctx.extract_visible_variables(local_ns) return fmt_response(True, "", self.executor.get_post_execution_state()) @cell_magic diff --git a/taskweaver/ces/runtime/context.py b/taskweaver/ces/runtime/context.py index e5ad1c142..1c4b38818 100644 --- a/taskweaver/ces/runtime/context.py +++ b/taskweaver/ces/runtime/context.py @@ -1,7 +1,14 @@ import os +import types from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple +try: + import numpy as _np # type: ignore +except Exception: # pragma: no cover - optional dependency + _np = None + from taskweaver.module.prompt_util import PromptUtil +from taskweaver.plugin.base import Plugin from taskweaver.plugin.context import ArtifactType, LogErrorLevel, PluginContext if TYPE_CHECKING: @@ -15,6 +22,7 @@ def __init__(self, executor: Any) -> None: self.artifact_list: List[Dict[str, str]] = [] self.log_messages: List[Tuple[LogErrorLevel, str, str]] = [] self.output: List[Tuple[str, str]] = [] + self.latest_variables: List[Tuple[str, str]] = [] @property def execution_id(self) -> str: @@ -147,6 +155,54 @@ def get_session_var( return self.executor.session_var[variable_name] return default + def extract_visible_variables(self, local_ns: Dict[str, Any]) -> List[Tuple[str, str]]: + ignore_names = { + "__builtins__", + "In", + "Out", + "get_ipython", + "exit", + "quit", + "pd", + "np", + "plt", + } + + visible: List[Tuple[str, str]] = [] + for name, value in local_ns.items(): + if name.startswith("_") or name in ignore_names: + continue + + if isinstance(value, (types.ModuleType, types.FunctionType)): + continue + + if isinstance(value, Plugin) or getattr(value, "__module__", "").startswith("taskweaver_ext.plugin"): + continue + + if _np is not None and isinstance(value, _np.ndarray): + try: + rendered = _np.array2string( + value, + max_line_width=120, + threshold=20, + edgeitems=3, + ) + rendered = f"ndarray shape={value.shape} dtype={value.dtype} value={rendered}" + except Exception: + rendered = "" + visible.append((name, rendered[:500])) + continue + + try: + rendered = repr(value) + except Exception: + rendered = "" + + visible.append((name, rendered[:500])) + + self.latest_variables = visible + return visible + def wrap_text_with_delimiter_temporal(self, text: str) -> str: """wrap text with delimiter""" return PromptUtil.wrap_text_with_delimiter( diff --git a/taskweaver/ces/runtime/executor.py b/taskweaver/ces/runtime/executor.py index 6359c3ba6..9a7420348 100644 --- a/taskweaver/ces/runtime/executor.py +++ b/taskweaver/ces/runtime/executor.py @@ -226,6 +226,7 @@ def get_post_execution_state(self): "artifact": self.ctx.artifact_list, "log": self.ctx.log_messages, "output": self.ctx.get_normalized_output(), + "variables": list(self.ctx.latest_variables), } def log(self, level: LogErrorLevel, message: str): diff --git a/taskweaver/chat/console/chat.py b/taskweaver/chat/console/chat.py index b7125425b..b17622d12 100644 --- a/taskweaver/chat/console/chat.py +++ b/taskweaver/chat/console/chat.py @@ -9,7 +9,13 @@ import click -from taskweaver.module.event_emitter import PostEventType, RoundEventType, SessionEventHandlerBase, SessionEventType +from taskweaver.module.event_emitter import ( + ConfirmationHandler, + PostEventType, + RoundEventType, + SessionEventHandlerBase, + SessionEventType, +) if TYPE_CHECKING: from taskweaver.memory.attachment import AttachmentType @@ -65,7 +71,30 @@ def user_input_message(prompt: str = " Human ") -> str: continue -class TaskWeaverRoundUpdater(SessionEventHandlerBase): +def user_confirmation_input(prompt: str = "Execute code? [Y/n]: ") -> str: + import prompt_toolkit + + session = prompt_toolkit.PromptSession[str]( + multiline=False, + ) + + while True: + try: + user_input: str = session.prompt( + prompt_toolkit.formatted_text.FormattedText( + [ + ("bg:ansiyellow fg:ansiblack", " Confirm "), + ("fg:ansiyellow", "▶"), + ("", f" {prompt}"), + ], + ), + ) + return user_input.strip().lower() + except KeyboardInterrupt: + return "n" + + +class TaskWeaverRoundUpdater(SessionEventHandlerBase, ConfirmationHandler): def __init__(self): self.exit_event = threading.Event() self.update_cond = threading.Condition() @@ -74,10 +103,23 @@ def __init__(self): self.last_attachment_id = "" self.pending_updates: List[Tuple[str, str]] = [] + # Handshake pair for pausing animation (e.g., during confirmation prompts) + self.pause_animation = threading.Event() # Main requests pause + self.animation_paused = threading.Event() # Animation acknowledges pause + self.animation_paused.set() # Initially paused (not animating yet) + self.messages: List[Tuple[str, str]] = [] self.response: List[str] = [] self.result: Optional[str] = None + def request_confirmation( + self, + code: str, + round_id: str, + post_id: Optional[str], + ) -> bool: + return True + def handle_session( self, type: SessionEventType, @@ -153,6 +195,8 @@ def handle_message( message: str, files: List[Dict[Literal["name", "path", "content"], str]], ) -> Optional[str]: + session.event_emitter.confirmation_handler = self + def execution_thread(): try: round = session.send_message( @@ -171,7 +215,7 @@ def execution_thread(): with self.update_cond: self.update_cond.notify_all() - t_ui = threading.Thread(target=lambda: self._animate_thread(), daemon=True) + t_ui = threading.Thread(target=lambda: self._animate_thread(session), daemon=True) t_ex = threading.Thread(target=execution_thread, daemon=True) t_ui.start() @@ -179,6 +223,10 @@ def execution_thread(): exit_no_wait: bool = False try: while True: + if session.event_emitter.confirmation_pending: + self._handle_confirmation(session) + continue + self.exit_event.wait(0.1) if self.exit_event.is_set(): break @@ -186,7 +234,6 @@ def execution_thread(): error_message("Interrupted by user") exit_no_wait = True - # keyboard interrupt leave the session in unknown state, exit directly exit(1) finally: self.exit_event.set() @@ -200,8 +247,45 @@ def execution_thread(): return self.result - def _animate_thread(self): - # get terminal width + def _handle_confirmation(self, session: Session) -> None: + from colorama import ansi + + # Signal animation thread to pause + self.pause_animation.set() + + # Wait for animation thread to acknowledge it has paused + self.animation_paused.wait(timeout=5.0) + + # Clear any leftover animation output + print(ansi.clear_line(), end="\r") + + code = session.event_emitter.pending_confirmation_code or "" + + # Display code in a style consistent with the UI + click.secho( + click.style(" ├─► ", fg="blue") + + click.style("[", fg="blue") + + click.style("confirm", fg="bright_cyan") + + click.style("]", fg="blue"), + ) + for line in code.split("\n"): + click.secho(click.style(" │ ", fg="blue") + click.style(line, fg="bright_black")) + + response = user_confirmation_input() + approved = response not in ("n", "no") + + if approved: + click.secho(click.style(" │ ", fg="blue") + click.style("✓ approved", fg="green")) + else: + click.secho(click.style(" │ ", fg="blue") + click.style("✗ cancelled", fg="red")) + + # Allow animation thread to resume + self.animation_paused.clear() + self.pause_animation.clear() + + session.event_emitter.provide_confirmation(approved) + + def _animate_thread(self, session: Session): terminal_column = shutil.get_terminal_size().columns counter = 0 status_msg = "preparing" @@ -306,6 +390,22 @@ def format_status_message(limit: int): last_time = 0 while True: + # Check if we should pause FIRST, before any output + if self.pause_animation.is_set(): + # Signal that animation has paused + self.animation_paused.set() + # Wait until pause is lifted + while self.pause_animation.is_set(): + if self.exit_event.is_set(): + break + with self.update_cond: + self.update_cond.wait(0.1) + # Reset for next iteration + continue + + # Animation is running, clear the paused signal + self.animation_paused.clear() + clear_line() with self.lock: for action, opt in self.pending_updates: @@ -367,6 +467,10 @@ def format_status_message(limit: int): if self.exit_event.is_set(): break + # Check again before printing status line + if self.pause_animation.is_set(): + continue + cur_message_prefix: str = " TaskWeaver " cur_ani_frame = get_ani_frame(counter) cur_message_display_len = ( diff --git a/taskweaver/cli/web.py b/taskweaver/cli/web.py deleted file mode 100644 index b05b8e4c7..000000000 --- a/taskweaver/cli/web.py +++ /dev/null @@ -1,54 +0,0 @@ -import click - -from taskweaver.cli.util import require_workspace - - -@click.command() -@require_workspace() -@click.option( - "--host", - "-h", - default="localhost", - help="Host to run TaskWeaver web server", - type=str, - show_default=True, -) -@click.option("--port", "-p", default=8080, help="Port to run TaskWeaver web server", type=int, show_default=True) -@click.option( - "--debug", - "-d", - is_flag=True, - default=False, - help="Run TaskWeaver web server in debug mode", - show_default=True, -) -@click.option( - "--open/--no-open", - "-o/-n", - is_flag=True, - default=True, - help="Open TaskWeaver web server in browser", - show_default=True, -) -def web(host: str, port: int, debug: bool, open: bool): - """Start TaskWeaver web server""" - - from taskweaver.chat.web import start_web_service - - if not debug: - # debug mode will restart app iteratively, skip the plugin listing - # display_enabled_examples_plugins() - pass - - def post_app_start(): - if open: - click.secho("launching web browser...", fg="green") - open_url = f"http://{'localhost' if host == '0.0.0.0' else host}:{port}" - click.launch(open_url) - - start_web_service( - host, - port, - is_debug=debug, - post_app_start=post_app_start if open else None, - ) diff --git a/taskweaver/code_interpreter/AGENTS.md b/taskweaver/code_interpreter/AGENTS.md new file mode 100644 index 000000000..cb545e7ab --- /dev/null +++ b/taskweaver/code_interpreter/AGENTS.md @@ -0,0 +1,83 @@ +# Code Interpreter - AGENTS.md + +Code generation and execution roles with multiple variants. + +## Structure + +``` +code_interpreter/ +├── interpreter.py # Interpreter ABC (update_session_variables) +├── code_executor.py # CodeExecutor - bridges to CES +├── code_verification.py # AST-based code validation +├── plugin_selection.py # Plugin selection logic +├── code_interpreter/ # Full code interpreter +│ ├── code_interpreter.py # CodeInterpreter role (~320 lines) +│ ├── code_generator.py # LLM-based code generation +│ └── code_interpreter.role.yaml +├── code_interpreter_cli_only/ # CLI-only variant (no plugins) +│ ├── code_interpreter_cli_only.py +│ ├── code_generator_cli_only.py +│ └── code_interpreter_cli_only.role.yaml +└── code_interpreter_plugin_only/ # Plugin-only variant (no free-form code) + ├── code_interpreter_plugin_only.py + ├── code_generator_plugin_only.py + └── code_interpreter_plugin_only.role.yaml +``` + +## Role Variants + +| Variant | Plugins | Free-form Code | Use Case | +|---------|---------|----------------|----------| +| `code_interpreter` | Yes | Yes | Full capability | +| `code_interpreter_cli_only` | No | Yes | Restricted to CLI | +| `code_interpreter_plugin_only` | Yes | No | Only plugin calls | + +## Key Classes + +### CodeInterpreter (Role, Interpreter) +- Orchestrates: CodeGenerator -> verification -> CodeExecutor +- Retry logic: up to `max_retry_count` (default 3) on failures +- Config: `CodeInterpreterConfig` (verification settings, blocked functions) + +### CodeGenerator +- LLM-based code generation from conversation context +- Outputs: code + explanation via PostEventProxy +- Configurable verification: allowed_modules, blocked_functions + +### CodeExecutor +- Wraps CES Environment +- Plugin loading from PluginRegistry +- Session variable management + +## Code Verification (code_verification.py) + +AST-based checks: +- `allowed_modules`: Whitelist of importable modules +- `blocked_functions`: Blacklist (default: `eval`, `exec`, `open`, etc.) + +```python +code_verify_errors = code_snippet_verification( + code, + code_verification_on=True, + allowed_modules=["pandas", "numpy"], + blocked_functions=["eval", "exec"], +) +``` + +## Role YAML Schema + +```yaml +module: taskweaver.code_interpreter.code_interpreter.code_interpreter.CodeInterpreter +alias: CodeInterpreter # Used in message routing +intro: | + - Description of capabilities + - {plugin_description} placeholder for dynamic plugin list +``` + +## Execution Flow + +1. `reply()` called with Memory context +2. CodeGenerator produces code via LLM +3. Code verification (if enabled) +4. CodeExecutor runs code in CES kernel +5. Results formatted back to Post diff --git a/taskweaver/code_interpreter/code_executor.py b/taskweaver/code_interpreter/code_executor.py index 40974950d..f7859f0d5 100644 --- a/taskweaver/code_interpreter/code_executor.py +++ b/taskweaver/code_interpreter/code_executor.py @@ -10,6 +10,7 @@ from taskweaver.module.tracing import Tracing, get_tracer, tracing_decorator from taskweaver.plugin.context import ArtifactType from taskweaver.session import SessionMetadata +from taskweaver.utils import pretty_repr TRUNCATE_CHAR_LENGTH = 1500 @@ -190,6 +191,11 @@ def format_code_output( lines.append( "The result of above Python code after execution is:\n" + str(output), ) + elif len(result.variables) > 0: + lines.append("The following variables are currently available in the Python session:\n") + for name, val in result.variables: + lines.append(f"- {name}: {pretty_repr(val, limit=500)}") + lines.append("") elif result.is_success: if len(result.stdout) > 0: lines.append( diff --git a/taskweaver/code_interpreter/code_interpreter/code_generator.py b/taskweaver/code_interpreter/code_interpreter/code_generator.py index 5800fb1c0..321631dc3 100644 --- a/taskweaver/code_interpreter/code_interpreter/code_generator.py +++ b/taskweaver/code_interpreter/code_interpreter/code_generator.py @@ -1,7 +1,7 @@ import datetime import json import os -from typing import List, Optional +from typing import List, Literal, Optional, Union from injector import inject @@ -119,10 +119,10 @@ def compose_verification_requirements( + ", ".join([f"{module}" for module in self.allowed_modules]), ) - if len(self.allowed_modules) == 0: + if self.allowed_modules is not None and len(self.allowed_modules) == 0: requirements.append(f"- {self.role_name} cannot import any Python modules.") - if len(self.blocked_functions) > 0: + if self.blocked_functions is not None and len(self.blocked_functions) > 0: requirements.append( f"- {self.role_name} cannot use the following Python functions: " + ", ".join([f"{function}" for function in self.blocked_functions]), @@ -207,10 +207,11 @@ def compose_conversation( AttachmentType.code_error, AttachmentType.execution_status, AttachmentType.execution_result, + AttachmentType.session_variables, ] is_first_post = True - last_post: Post = None + last_post: Optional[Post] = None for round_index, conversation_round in enumerate(rounds): for post_index, post in enumerate(conversation_round.post_list): # compose user query @@ -292,10 +293,24 @@ def compose_conversation( if len(user_message) > 0: # add requirements to the last user message if is_final_post and add_requirements: + available_vars_section = "" + session_vars = post.get_attachment(AttachmentType.session_variables) + if session_vars is not None and len(session_vars) > 0: + try: + decoded_vars = json.loads(session_vars[0].content) + if isinstance(decoded_vars, list) and len(decoded_vars) > 0: + formatted_vars = "\n".join([f"- {name}: {value}" for name, value in decoded_vars]) + available_vars_section = ( + "\nCurrently available variables in the Python session:\n" + formatted_vars + ) + except Exception: + pass user_message += "\n" + self.query_requirements_template.format( CODE_GENERATION_REQUIREMENTS=self.compose_verification_requirements(), ROLE_NAME=self.role_name, ) + if available_vars_section: + user_message += available_vars_section chat_history.append( format_chat_message(role="user", message=user_message), ) @@ -365,7 +380,7 @@ def reply( }, ) - def early_stop(_type: AttachmentType, value: str) -> bool: + def early_stop(_type: Union[AttachmentType, Literal["message", "send_to"]], value: str) -> bool: if _type in [AttachmentType.reply_content]: return True else: @@ -443,6 +458,7 @@ def format_code_feedback(post: Post) -> str: feedback = "" verification_status = "" execution_status = "" + variable_lines = [] for attachment in post.attachment_list: if attachment.type == AttachmentType.verification and attachment.content == "CORRECT": feedback += "## Verification\nCode verification has been passed.\n" @@ -466,4 +482,13 @@ def format_code_feedback(post: Post) -> str: execution_status = "FAILURE" elif attachment.type == AttachmentType.execution_result and execution_status != "NONE": feedback += f"{attachment.content}\n" + elif attachment.type == AttachmentType.session_variables: + try: + variables = json.loads(attachment.content) + if isinstance(variables, list) and len(variables) > 0: + variable_lines.extend([f"- {name}: {value}" for name, value in variables]) + except Exception: + pass + if len(variable_lines) > 0: + feedback += "## Available Variables\n" + "\n".join(variable_lines) + "\n" return feedback diff --git a/taskweaver/code_interpreter/code_interpreter/code_interpreter.py b/taskweaver/code_interpreter/code_interpreter/code_interpreter.py index 82dc9da6e..a5adeaa2e 100644 --- a/taskweaver/code_interpreter/code_interpreter/code_interpreter.py +++ b/taskweaver/code_interpreter/code_interpreter/code_interpreter.py @@ -1,3 +1,4 @@ +import json import os from typing import Dict, Literal, Optional @@ -58,6 +59,7 @@ def _configure(self): ) self.code_prefix = self._get_str("code_prefix", "") + self.require_confirmation = self._get_bool("require_confirmation", False) def update_verification( @@ -233,6 +235,27 @@ def reply( elif len(code_verify_errors) == 0: update_verification(post_proxy, "CORRECT", "No error is found.") + if self.config.require_confirmation: + post_proxy.update_status("awaiting confirmation") + self.logger.info("Requesting user confirmation for code execution") + + confirmed = self.event_emitter.request_code_confirmation( + code.content, + post_proxy.post.id, + ) + + if not confirmed: + self.logger.info("Code execution cancelled by user") + self.tracing.set_span_status("OK", "User cancelled code execution.") + update_execution( + post_proxy, + "NONE", + "Code execution was cancelled by user.", + ) + post_proxy.update_message("Code execution was cancelled.") + self.retry_count = 0 + return post_proxy.end() + executable_code = f"{code.content}" full_code_prefix = None if self.config.code_prefix: @@ -247,6 +270,12 @@ def reply( code=executable_code, ) + if len(exec_result.variables) > 0: + post_proxy.update_attachment( + json.dumps(exec_result.variables), + AttachmentType.session_variables, + ) + code_output = self.executor.format_code_output( exec_result, with_code=False, diff --git a/taskweaver/ext_role/AGENTS.md b/taskweaver/ext_role/AGENTS.md new file mode 100644 index 000000000..3768efa54 --- /dev/null +++ b/taskweaver/ext_role/AGENTS.md @@ -0,0 +1,104 @@ +# Extended Roles - AGENTS.md + +Custom role definitions extending TaskWeaver capabilities. + +## Structure + +``` +ext_role/ +├── __init__.py +├── web_search/ # Web search role +│ ├── web_search.py +│ └── web_search.role.yaml +├── web_explorer/ # Browser automation role +│ ├── web_explorer.py +│ ├── driver.py # Selenium/Playwright driver +│ ├── planner.py # Web action planning +│ └── web_explorer.role.yaml +├── image_reader/ # Image analysis role +│ ├── image_reader.py +│ └── image_reader.role.yaml +├── document_retriever/ # Document RAG role +│ ├── document_retriever.py +│ └── document_retriever.role.yaml +├── recepta/ # Custom tool orchestration +│ ├── recepta.py +│ └── recepta.role.yaml +└── echo/ # Debug/test echo role + ├── echo.py + └── echo.role.yaml +``` + +## Role YAML Schema + +Each role requires a `.role.yaml` file: + +```yaml +module: taskweaver.ext_role.{name}.{name}.{ClassName} +alias: {DisplayName} # Used in message routing +intro: | + - Capability description line 1 + - Capability description line 2 +``` + +## Creating a New Extended Role + +1. Create directory: `ext_role/my_role/` +2. Create `my_role.py`: +```python +from taskweaver.role import Role +from taskweaver.role.role import RoleConfig, RoleEntry + +class MyRoleConfig(RoleConfig): + def _configure(self): + # Config inherits from parent dir name + self.custom_setting = self._get_str("custom", "default") + +class MyRole(Role): + @inject + def __init__( + self, + config: MyRoleConfig, + logger: TelemetryLogger, + tracing: Tracing, + event_emitter: SessionEventEmitter, + role_entry: RoleEntry, + ): + super().__init__(config, logger, tracing, event_emitter, role_entry) + + def reply(self, memory: Memory, **kwargs) -> Post: + # Implement role logic + post_proxy = self.event_emitter.create_post_proxy(self.alias) + # ... process and respond + return post_proxy.end() +``` + +3. Create `my_role.role.yaml`: +```yaml +module: taskweaver.ext_role.my_role.my_role.MyRole +alias: MyRole +intro: | + - This role does X + - It can handle Y +``` + +4. Enable in session config: +```json +{ + "session.roles": ["planner", "code_interpreter", "my_role"] +} +``` + +## Role Discovery + +RoleRegistry scans: +- `ext_role/*/\*.role.yaml` +- `code_interpreter/*/\*.role.yaml` + +Registry refreshes every 5 minutes (TTL). + +## Naming Convention + +- Directory name = role name = config namespace +- Class name = PascalCase of directory name +- Alias used for `send_to`/`send_from` in Posts diff --git a/taskweaver/llm/AGENTS.md b/taskweaver/llm/AGENTS.md new file mode 100644 index 000000000..f631a8165 --- /dev/null +++ b/taskweaver/llm/AGENTS.md @@ -0,0 +1,62 @@ +# LLM Module - AGENTS.md + +Provider abstraction layer for LLM and embedding services. + +## Structure + +``` +llm/ +├── base.py # Abstract base: CompletionService, EmbeddingService, LLMModuleConfig +├── util.py # ChatMessageType, format_chat_message, token counting +├── openai.py # OpenAI/Azure OpenAI provider (largest file ~430 lines) +├── anthropic.py # Anthropic Claude provider +├── google_genai.py # Google Generative AI provider +├── ollama.py # Ollama local LLM provider +├── qwen.py # Alibaba Qwen provider +├── zhipuai.py # ZhipuAI provider +├── groq.py # Groq provider +├── azure_ml.py # Azure ML endpoints +├── sentence_transformer.py # Local embedding via sentence_transformers +├── mock.py # Mock provider for testing +├── placeholder.py # Placeholder when no LLM configured +└── __init__.py # LLMApi facade class +``` + +## Key Patterns + +### Provider Registration +New providers must: +1. Subclass `CompletionService` or `EmbeddingService` from `base.py` +2. Implement `chat_completion()` generator or `get_embeddings()` +3. Register in `__init__.py` LLMApi class's provider mapping + +### Config Hierarchy +```python +class MyProviderConfig(LLMServiceConfig): + def _configure(self) -> None: + self._set_name("my_provider") # creates llm.my_provider.* namespace + self.custom_setting = self._get_str("custom_setting", "default") +``` + +### ChatMessageType +```python +ChatMessageType = TypedDict("ChatMessageType", { + "role": str, # "system", "user", "assistant" + "content": str, + "name": NotRequired[str], +}) +``` + +## Adding a New LLM Provider + +1. Create `taskweaver/llm/newprovider.py` +2. Implement config class extending `LLMServiceConfig` +3. Implement service class extending `CompletionService` +4. Add to `_completion_service_map` in `__init__.py` +5. Document in `llm.api_type` config options + +## Common Gotchas + +- `response_format` options: `"json_object"`, `"text"`, `"json_schema"` +- Streaming: All providers return `Generator[ChatMessageType, None, None]` +- OpenAI file is largest (~430 lines) - handles both OpenAI and Azure OpenAI diff --git a/taskweaver/memory/AGENTS.md b/taskweaver/memory/AGENTS.md new file mode 100644 index 000000000..9cc5dc752 --- /dev/null +++ b/taskweaver/memory/AGENTS.md @@ -0,0 +1,105 @@ +# Memory Module - AGENTS.md + +Conversation history data model: Post, Round, Conversation, Attachment. + +## Structure + +``` +memory/ +├── memory.py # Memory class - session conversation store +├── conversation.py # Conversation - list of Rounds +├── round.py # Round - single user query + responses +├── post.py # Post - single message between roles +├── attachment.py # Attachment - typed data on Posts +├── type_vars.py # Type aliases (RoleName, etc.) +├── experience.py # Experience storage and retrieval +├── compression.py # RoundCompressor for prompt compression +├── plugin.py # PluginModule for DI +├── shared_memory_entry.py # SharedMemoryEntry for cross-role data +└── utils.py # Utility functions +``` + +## Data Model Hierarchy + +``` +Memory +└── Conversation + └── Round[] + ├── user_query: str + ├── state: "created" | "finished" | "failed" + └── Post[] + ├── send_from: str (role name) + ├── send_to: str (role name) + ├── message: str + └── Attachment[] + ├── type: AttachmentType + ├── content: str + └── extra: Any +``` + +## Key Classes + +### Post (post.py) +```python +@dataclass +class Post: + id: str + send_from: str + send_to: str + message: str + attachment_list: List[Attachment] + + @staticmethod + def create(message: str, send_from: str, send_to: str) -> Post +``` + +### AttachmentType (attachment.py) +```python +class AttachmentType(str, Enum): + # Planning + plan = "plan" + current_plan_step = "current_plan_step" + + # Code execution + reply_content = "reply_content" # Generated code + verification = "verification" + execution_status = "execution_status" + execution_result = "execution_result" + + # Control flow + revise_message = "revise_message" + invalid_response = "invalid_response" + + # Shared state + shared_memory_entry = "shared_memory_entry" + session_variables = "session_variables" +``` + +### SharedMemoryEntry (shared_memory_entry.py) +Cross-role communication: +```python +@dataclass +class SharedMemoryEntry: + type: str # "plan", "experience_sub_path", etc. + scope: str # "round" or "conversation" + content: str +``` + +## Memory Patterns + +### Role-specific Round Filtering +```python +# Get rounds relevant to a specific role +rounds = memory.get_role_rounds(role="Planner", include_failure_rounds=False) +``` + +### Shared Memory Queries +```python +# Get shared entries by type +entries = memory.get_shared_memory_entries(entry_type="plan") +``` + +## Serialization + +All dataclasses support `to_dict()` and `from_dict()` for YAML/JSON persistence. +Experience saving: `memory.save_experience(exp_dir, thin_mode=True)` diff --git a/taskweaver/memory/attachment.py b/taskweaver/memory/attachment.py index dc9bcd334..dad72516e 100644 --- a/taskweaver/memory/attachment.py +++ b/taskweaver/memory/attachment.py @@ -9,7 +9,6 @@ class AttachmentType(Enum): # Planner Type - init_plan = "init_plan" plan = "plan" current_plan_step = "current_plan_step" plan_reasoning = "plan_reasoning" @@ -44,6 +43,9 @@ class AttachmentType(Enum): invalid_response = "invalid_response" text = "text" + # CodeInterpreter - visible variables snapshot + session_variables = "session_variables" + # shared memory entry shared_memory_entry = "shared_memory_entry" @@ -111,7 +113,7 @@ def to_dict(self) -> AttachmentDict: } @staticmethod - def from_dict(content: AttachmentDict) -> Attachment: + def from_dict(content: AttachmentDict) -> Optional[Attachment]: # deprecated types if content["type"] in ["python", "sample", "text"]: raise ValueError( @@ -120,6 +122,10 @@ def from_dict(content: AttachmentDict) -> Attachment: f"on how to fix it.", ) + removed_types = ["init_plan"] + if content["type"] in removed_types: + return None + type = AttachmentType(content["type"]) return Attachment.create( type=type, diff --git a/taskweaver/memory/post.py b/taskweaver/memory/post.py index 4b053691c..eafb06790 100644 --- a/taskweaver/memory/post.py +++ b/taskweaver/memory/post.py @@ -73,14 +73,18 @@ def to_dict(self) -> Dict[str, Any]: @staticmethod def from_dict(content: Dict[str, Any]) -> Post: """Convert the dict to a post. Will assign a new id to the post.""" + attachments = [] + if content["attachment_list"] is not None: + for attachment in content["attachment_list"]: + parsed = Attachment.from_dict(attachment) + if parsed is not None: + attachments.append(parsed) return Post( id="post-" + secrets.token_hex(6), message=content["message"], send_from=content["send_from"], send_to=content["send_to"], - attachment_list=[Attachment.from_dict(attachment) for attachment in content["attachment_list"]] - if content["attachment_list"] is not None - else [], + attachment_list=attachments, ) def add_attachment(self, attachment: Attachment) -> None: diff --git a/taskweaver/module/event_emitter.py b/taskweaver/module/event_emitter.py index 7b7708c89..df1629ba9 100644 --- a/taskweaver/module/event_emitter.py +++ b/taskweaver/module/event_emitter.py @@ -1,6 +1,7 @@ from __future__ import annotations import abc +import threading from contextlib import contextmanager from dataclasses import dataclass from enum import Enum @@ -40,6 +41,8 @@ class PostEventType(Enum): post_send_to_update = "post_send_to_update" post_message_update = "post_message_update" post_attachment_update = "post_attachment_update" + post_confirmation_request = "post_confirmation_request" + post_confirmation_response = "post_confirmation_response" @dataclass @@ -123,6 +126,32 @@ def handle_post( pass +class ConfirmationHandler(abc.ABC): + """Protocol for handling execution confirmations. + + Implementers of this protocol can intercept code execution + and request user confirmation before proceeding. + """ + + @abc.abstractmethod + def request_confirmation( + self, + code: str, + round_id: str, + post_id: Optional[str], + ) -> bool: + """Request confirmation for code execution. + + Args: + code: The code that is about to be executed. + round_id: The current round ID. + post_id: The current post ID (may be None). + + Returns: + True to proceed with execution, False to abort. + """ + + class PostEventProxy: def __init__(self, emitter: SessionEventEmitter, round_id: str, post: Post) -> None: self.emitter = emitter @@ -233,6 +262,75 @@ def __init__(self): self.handlers: List[SessionEventHandler] = [] self.current_round_id: Optional[str] = None + self._confirmation_handler: Optional[ConfirmationHandler] = None + self._confirmation_event = threading.Event() + self._confirmation_cond = threading.Condition() + self._confirmation_result: Optional[bool] = None + self._confirmation_code: Optional[str] = None + self._confirmation_post_id: Optional[str] = None + + @property + def confirmation_handler(self) -> Optional[ConfirmationHandler]: + return self._confirmation_handler + + @confirmation_handler.setter + def confirmation_handler(self, handler: Optional[ConfirmationHandler]): + self._confirmation_handler = handler + + @property + def confirmation_pending(self) -> bool: + return self._confirmation_event.is_set() + + @property + def pending_confirmation_code(self) -> Optional[str]: + return self._confirmation_code + + def request_code_confirmation(self, code: str, post_id: Optional[str] = None) -> bool: + if self._confirmation_handler is None: + return True + + self._confirmation_code = code + self._confirmation_post_id = post_id + self._confirmation_event.set() + + self.emit( + TaskWeaverEvent( + EventScope.post, + PostEventType.post_confirmation_request, + self.current_round_id, + post_id, + code, + extra={"code": code}, + ), + ) + + with self._confirmation_cond: + while self._confirmation_result is None: + self._confirmation_cond.wait() + result = self._confirmation_result + self._confirmation_result = None + self._confirmation_code = None + self._confirmation_post_id = None + + self.emit( + TaskWeaverEvent( + EventScope.post, + PostEventType.post_confirmation_response, + self.current_round_id, + post_id, + "approved" if result else "rejected", + extra={"approved": result}, + ), + ) + + return result + + def provide_confirmation(self, approved: bool): + with self._confirmation_cond: + self._confirmation_result = approved + self._confirmation_event.clear() # Clear immediately so main thread won't see stale True + self._confirmation_cond.notify_all() + def emit(self, event: TaskWeaverEvent): for handler in self.handlers: handler.handle(event) diff --git a/taskweaver/planner/planner.py b/taskweaver/planner/planner.py index 31de46201..c5329222b 100644 --- a/taskweaver/planner/planner.py +++ b/taskweaver/planner/planner.py @@ -268,8 +268,6 @@ def check_post_validity(post: Post): missing_elements.append("message") attachment_types = [attachment.type for attachment in post.attachment_list] - if AttachmentType.init_plan not in attachment_types: - missing_elements.append("init_plan") if AttachmentType.plan not in attachment_types: missing_elements.append("plan") if AttachmentType.current_plan_step not in attachment_types: diff --git a/taskweaver/planner/planner_prompt.yaml b/taskweaver/planner/planner_prompt.yaml index 1b87b82c0..32ab3a954 100644 --- a/taskweaver/planner/planner_prompt.yaml +++ b/taskweaver/planner/planner_prompt.yaml @@ -44,24 +44,16 @@ instruction_template: |- 2. Planner use its own skills to complete the task step, which is recommended when the task step is simple. ## Planner's planning process - You need to make a step-by-step plan to complete the User's task. The planning process includes 2 phases: `init_plan` and `plan`. - In the `init_plan` phase, you need to decompose the User's task into subtasks and list them as the detailed plan steps. - In the `plan` phase, you need to refine the initial plan by merging adjacent steps that have sequential dependency or no dependency, unless the merged step becomes too complicated. - - ### init_plan - - Decompose User's task into subtasks and list them as the detailed subtask steps. - - Annotate the dependencies between these steps. There are 2 dependency types: - 1. Sequential Dependency: the current subtask depends on the previous subtask, but they can be executed in one step by a Worker, - and no additional information is required. - 2. Interactive Dependency: the current subtask depends on the previous subtask but they cannot be executed in one step by a Worker, - typically without necessary information (e.g., hyperparameters, data path, model name, file content, data schema, etc.). - 3. No Dependency: the current subtask can be executed independently without any dependency. - - The initial plan must contain dependency annotations for sequential and interactive dependencies. - - ### plan - - Planner should try to merge adjacent steps that have sequential dependency or no dependency. - - Planner should not merge steps with interactive dependency. - - The final plan must not contain dependency annotations. + You need to make a step-by-step plan to complete the User's task. + When creating the plan, you should mentally decompose the task into subtasks and consider their dependencies: + - Sequential Dependency: the current subtask depends on the previous subtask, but they can be executed in one step by a Worker, and no additional information is required. + - Interactive Dependency: the current subtask depends on the previous subtask but they cannot be executed in one step by a Worker, typically without necessary information (e.g., hyperparameters, data path, model name, file content, data schema, etc.). + - No Dependency: the current subtask can be executed independently without any dependency. + + Based on this analysis, create a compact plan by: + - Merging adjacent steps that have sequential dependency or no dependency into single steps + - Keeping steps with interactive dependency separate (they require intermediate results before proceeding) + - The final plan should be concise and actionable, without dependency annotations ## Planner's communication process - Planner should communicate with the User and Workers by specifying the `send_to` field in the response. @@ -73,37 +65,30 @@ instruction_template: |- + AdditionalInformation: The User's request is incomplete or missing critical information and requires additional information. + SecurityRisks: The User's request contains potential security risks or illegal activities and requires rejection. + TaskFailure: The task fails after few attempts and requires the User's confirmation to proceed. + + UserCancelled: The User has explicitly cancelled the operation (e.g., declined code execution confirmation). Do NOT retry or continue the task - stop immediately and acknowledge the cancellation. + + ### Examples of planning + The examples below show how to think about task decomposition and create compact plans: - ### Examples of planning process [Example 1] User: count rows for ./data.csv - init_plan: - 1. Read ./data.csv file - 2. Count the rows of the loaded data - 3. Check the execution result and report the result to the user + Reasoning: Reading and counting can be done in one step (sequential dependency), but we need execution results before reporting (interactive dependency). plan: 1. Read ./data.csv file and count the rows of the loaded data 2. Check the execution result and report the result to the user [Example 2] User: Read a manual file and follow the instructions in it. - init_plan: - 1. Read the file content and show its content to the user - 2. Follow the instructions based on the file content. - 3. Confirm the completion of the instructions and report the result to the user + Reasoning: We must read the file first to know what instructions to follow (interactive dependency), then execute them (interactive dependency). plan: 1. Read the file content and show its content to the user - 2. follow the instructions based on the file content. + 2. Follow the instructions based on the file content 3. Confirm the completion of the instructions and report the result to the user [Example 3] User: detect anomaly on ./data.csv - init_plan: - 1. Read the ./data.csv and show me the top 5 rows to understand the data schema - 2. Confirm the columns to be detected anomalies - 3. Detect anomalies on the loaded data - 4. Check the execution result and report the detected anomalies to the user + Reasoning: Reading data and confirming columns can be merged (sequential), but anomaly detection needs the confirmed columns (interactive). plan: 1. Read the ./data.csv and show me the top 5 rows to understand the data schema and confirm the columns to be detected anomalies 2. Detect anomalies on the loaded data @@ -111,12 +96,7 @@ instruction_template: |- [Example 4] User: read a.csv and b.csv and join them together - init_plan: - 1. Load a.csv as dataframe and show me the top 5 rows to understand the data schema - 2. Load b.csv as dataframe and show me the top 5 rows to understand the data schema - 3. Ask which column to join - 4. Join the two dataframes - 5. Check the execution result and report the joined data to the user + Reasoning: Loading both files and asking about join column can be merged (sequential/no dependency), but joining needs the column choice (interactive). plan: 1. Load a.csv and b.csv as dataframes, show me the top 5 rows to understand the data schema, and ask which column to join 2. Join the two dataframes @@ -126,6 +106,7 @@ instruction_template: |- - When the request involves loading a file or pulling a table from db, Planner should always set the first subtask to reading the content to understand the structure or schema of the data. - When the request involves text analysis, Planner should always set the first subtask to read and print the text content to understand its content structure. - When the request involves read instructions for task execution, Planner should always update the plan to the steps and sub-steps in the instructions and then follow the updated plan to execute necessary actions. + - When a Worker responds with "Code execution was cancelled by user" or similar cancellation message, Planner must immediately stop the task with stop="UserCancelled" and NOT retry or attempt alternative approaches. ## Planner's response format - Planner must strictly format the response into the following JSON object: @@ -148,22 +129,18 @@ response_json_schema: |- "type": "string", "description": "The reasoning of the Planner's decision. It should include the analysis of the User's request, the Workers' responses, and the current environment context." }, - "init_plan": { - "type": "string", - "description": "The initial plan to decompose the User's task into subtasks and list them as the detailed subtask steps. The initial plan must contain dependency annotations for sequential and interactive dependencies." - }, "plan": { "type": "string", - "description": "The refined plan by merging adjacent steps that have sequential dependency or no dependency. The final plan must not contain dependency annotations." + "description": "The step-by-step plan to complete the User's task. Steps with sequential or no dependency should be merged. Steps with interactive dependency should be kept separate." }, "current_plan_step": { "type": "string", "description": "The current step Planner is executing." }, - "stop": { + "stop": { "type": "string", "description": "The stop reason when the Planner needs to talk to the User. Set it to 'InProcess' if the Planner is not talking to the User.", - "enum": ["InProcess", "Completed", "Clarification", "AdditionalInformation", "SecurityRisks", "TaskFailure"] + "enum": ["InProcess", "Completed", "Clarification", "AdditionalInformation", "SecurityRisks", "TaskFailure", "UserCancelled"] }, "send_to": { "type": "string", @@ -176,7 +153,6 @@ response_json_schema: |- }, "required": [ "plan_reasoning", - "init_plan", "plan", "current_plan_step", "stop", diff --git a/taskweaver/utils/__init__.py b/taskweaver/utils/__init__.py index e82610ae7..ee773773f 100644 --- a/taskweaver/utils/__init__.py +++ b/taskweaver/utils/__init__.py @@ -80,6 +80,18 @@ def json_dump(obj: Any, fp: Any): json.dump(obj, fp, cls=EnhancedJSONEncoder) +def pretty_repr(val: Any, limit: int = 200) -> str: + try: + rendered = repr(val) + except Exception: + rendered = "" + + if len(rendered) > limit: + omitted = len(rendered) - limit + return f"{rendered[:limit]}...omitted {omitted} chars..." + return rendered + + def generate_md5_hash(content: str) -> str: from hashlib import md5 diff --git a/tests/unit_tests/data/examples/planner_examples/example-planner.yaml b/tests/unit_tests/data/examples/planner_examples/example-planner.yaml index 525463ab3..066396022 100644 --- a/tests/unit_tests/data/examples/planner_examples/example-planner.yaml +++ b/tests/unit_tests/data/examples/planner_examples/example-planner.yaml @@ -11,11 +11,6 @@ rounds: send_from: Planner send_to: CodeInterpreter attachment_list: - - type: init_plan - content: |- - 1. load the data file - 2. count the rows of the loaded data - 3. report the result to the user - type: plan content: |- 1. instruct CodeInterpreter to load the data file and count the rows of the loaded data @@ -30,14 +25,9 @@ rounds: send_from: Planner send_to: User attachment_list: - - type: init_plan - content: |- - 1. load the data file - 2. count the rows of the loaded data - 3. report the result to the user - type: plan content: |- 1. instruct CodeInterpreter to load the data file and count the rows of the loaded data 2. report the result to the user - type: current_plan_step - content: 2. report the result to the user \ No newline at end of file + content: 2. report the result to the user diff --git a/tests/unit_tests/data/examples/planner_examples/sub/example-planner.yaml b/tests/unit_tests/data/examples/planner_examples/sub/example-planner.yaml index 525463ab3..066396022 100644 --- a/tests/unit_tests/data/examples/planner_examples/sub/example-planner.yaml +++ b/tests/unit_tests/data/examples/planner_examples/sub/example-planner.yaml @@ -11,11 +11,6 @@ rounds: send_from: Planner send_to: CodeInterpreter attachment_list: - - type: init_plan - content: |- - 1. load the data file - 2. count the rows of the loaded data - 3. report the result to the user - type: plan content: |- 1. instruct CodeInterpreter to load the data file and count the rows of the loaded data @@ -30,14 +25,9 @@ rounds: send_from: Planner send_to: User attachment_list: - - type: init_plan - content: |- - 1. load the data file - 2. count the rows of the loaded data - 3. report the result to the user - type: plan content: |- 1. instruct CodeInterpreter to load the data file and count the rows of the loaded data 2. report the result to the user - type: current_plan_step - content: 2. report the result to the user \ No newline at end of file + content: 2. report the result to the user diff --git a/tests/unit_tests/data/prompts/planner_prompt.yaml b/tests/unit_tests/data/prompts/planner_prompt.yaml index 851daf707..84133a835 100644 --- a/tests/unit_tests/data/prompts/planner_prompt.yaml +++ b/tests/unit_tests/data/prompts/planner_prompt.yaml @@ -33,54 +33,38 @@ instruction_template: |- - Planner can ignore the permission or file access issues since Workers are powerful and can handle them. ## Planner's planning process - You need to make a step-by-step plan to complete the User's task. The planning process includes 2 phases: `init_plan` and `plan`. - In the `init_plan` phase, you need to decompose the User's task into subtasks and list them as the detailed plan steps. - In the `plan` phase, you need to refine the initial plan by merging adjacent steps that have sequential dependency or no dependency, unless the merged step becomes too complicated. + You need to make a step-by-step plan to complete the User's task. + When creating the plan, you should mentally decompose the task into subtasks and consider their dependencies: + - Sequential Dependency: the current subtask depends on the previous subtask, but they can be executed in one step by a Worker, and no additional information is required. + - Interactive Dependency: the current subtask depends on the previous subtask but they cannot be executed in one step by a Worker, typically without necessary information (e.g., hyperparameters, data path, model name, file content, data schema, etc.). + - No Dependency: the current subtask can be executed independently without any dependency. - ### init_plan - - Decompose User's task into subtasks and list them as the detailed subtask steps. - - Annotate the dependencies between these steps. There are 2 dependency types: - 1. Sequential Dependency: the current subtask depends on the previous subtask, but they can be executed in one step by a Worker, - and no additional information is required. - 2. Interactive Dependency: the current subtask depends on the previous subtask but they cannot be executed in one step by a Worker, - typically without necessary information (e.g., hyperparameters, data path, model name, file content, data schema, etc.). - 3. No Dependency: the current subtask can be executed independently without any dependency. - - The initial plan must contain dependency annotations for sequential and interactive dependencies. + Based on this analysis, create a compact plan by: + - Merging adjacent steps that have sequential dependency or no dependency into single steps + - Keeping steps with interactive dependency separate (they require intermediate results before proceeding) + - The final plan should be concise and actionable, without dependency annotations - ### plan - - Planner should try to merge adjacent steps that have sequential dependency or no dependency. - - Planner should not merge steps with interactive dependency. - - The final plan must not contain dependency annotations. + ### Examples of planning + The examples below show how to think about task decomposition and create compact plans: - ### Examples of planning process [Example 1] User: count rows for ./data.csv - init_plan: - 1. Read ./data.csv file - 2. Count the rows of the loaded data - 3. Check the execution result and report the result to the user + Reasoning: Reading and counting can be done in one step (sequential dependency), but we need execution results before reporting (interactive dependency). plan: 1. Read ./data.csv file and count the rows of the loaded data 2. Check the execution result and report the result to the user [Example 2] User: Read a manual file and follow the instructions in it. - init_plan: - 1. Read the file content and show its content to the user - 2. Follow the instructions based on the file content. - 3. Confirm the completion of the instructions and report the result to the user + Reasoning: We must read the file first to know what instructions to follow (interactive dependency), then execute them (interactive dependency). plan: 1. Read the file content and show its content to the user - 2. follow the instructions based on the file content. + 2. Follow the instructions based on the file content 3. Confirm the completion of the instructions and report the result to the user [Example 3] User: detect anomaly on ./data.csv - init_plan: - 1. Read the ./data.csv and show me the top 5 rows to understand the data schema - 2. Confirm the columns to be detected anomalies - 3. Detect anomalies on the loaded data - 4. Check the execution result and report the detected anomalies to the user + Reasoning: Reading data and confirming columns can be merged (sequential), but anomaly detection needs the confirmed columns (interactive). plan: 1. Read the ./data.csv and show me the top 5 rows to understand the data schema and confirm the columns to be detected anomalies 2. Detect anomalies on the loaded data @@ -88,12 +72,7 @@ instruction_template: |- [Example 4] User: read a.csv and b.csv and join them together - init_plan: - 1. Load a.csv as dataframe and show me the top 5 rows to understand the data schema - 2. Load b.csv as dataframe and show me the top 5 rows to understand the data schema - 3. Ask which column to join - 4. Join the two dataframes - 5. Check the execution result and report the joined data to the user + Reasoning: Loading both files and asking about join column can be merged (sequential/no dependency), but joining needs the column choice (interactive). plan: 1. Load a.csv and b.csv as dataframes, show me the top 5 rows to understand the data schema, and ask which column to join 2. Join the two dataframes @@ -120,13 +99,9 @@ response_json_schema: |- "response": { "type": "object", "properties": { - "init_plan": { - "type": "string", - "description": "The initial plan to decompose the User's task into subtasks and list them as the detailed subtask steps. The initial plan must contain dependency annotations for sequential and interactive dependencies." - }, "plan": { "type": "string", - "description": "The refined plan by merging adjacent steps that have sequential dependency or no dependency. The final plan must not contain dependency annotations." + "description": "The step-by-step plan to complete the User's task. Steps with sequential or no dependency should be merged. Steps with interactive dependency should be kept separate." }, "current_plan_step": { "type": "string", @@ -146,7 +121,6 @@ response_json_schema: |- } }, "required": [ - "init_plan", "plan", "current_plan_step", "send_to", diff --git a/tests/unit_tests/test_animation_pause_handshake.py b/tests/unit_tests/test_animation_pause_handshake.py new file mode 100644 index 000000000..8b14b238f --- /dev/null +++ b/tests/unit_tests/test_animation_pause_handshake.py @@ -0,0 +1,230 @@ +"""Tests for the animation pause handshake pattern in TaskWeaverRoundUpdater. + +The handshake uses two events: +- pause_animation: Main thread requests animation to pause +- animation_paused: Animation thread acknowledges it has paused +""" + +import threading +import time + + +class MockAnimationPauseHandshake: + """Minimal implementation of the handshake pattern for testing.""" + + def __init__(self): + self.pause_animation = threading.Event() + self.animation_paused = threading.Event() + self.animation_paused.set() # Initially paused (not running yet) + + self.exit_event = threading.Event() + self.update_cond = threading.Condition() + + # Track what animation thread does + self.output_count = 0 + self.output_log = [] + + def animation_thread_loop(self): + """Simulates the animation thread loop.""" + while not self.exit_event.is_set(): + # Check pause at START of loop + if self.pause_animation.is_set(): + self.animation_paused.set() + while self.pause_animation.is_set(): + if self.exit_event.is_set(): + return + with self.update_cond: + self.update_cond.wait(0.01) + continue + + self.animation_paused.clear() + + # Simulate output + self.output_count += 1 + self.output_log.append(f"output-{self.output_count}") + + with self.update_cond: + self.update_cond.wait(0.05) + + def request_pause(self, timeout: float = 1.0) -> bool: + """Request animation to pause and wait for acknowledgment.""" + self.pause_animation.set() + return self.animation_paused.wait(timeout=timeout) + + def release_pause(self): + """Release the pause and allow animation to resume.""" + self.animation_paused.clear() + self.pause_animation.clear() + + def stop(self): + """Stop the animation thread.""" + self.exit_event.set() + with self.update_cond: + self.update_cond.notify_all() + + +def test_handshake_pauses_animation(): + """Animation should stop outputting when pause is requested.""" + handler = MockAnimationPauseHandshake() + + t = threading.Thread(target=handler.animation_thread_loop, daemon=True) + t.start() + + # Let animation run for a bit + time.sleep(0.1) + count_before_pause = handler.output_count + assert count_before_pause > 0, "Animation should have produced output" + + # Request pause + assert handler.request_pause(), "Pause should be acknowledged" + + # Record count and wait + count_at_pause = handler.output_count + time.sleep(0.1) + count_after_wait = handler.output_count + + # No new output during pause + assert count_after_wait == count_at_pause, "Animation should not output while paused" + + # Release and verify animation resumes + handler.release_pause() + time.sleep(0.1) + count_after_release = handler.output_count + + assert count_after_release > count_at_pause, "Animation should resume after release" + + handler.stop() + t.join(timeout=1) + + +def test_handshake_blocks_until_acknowledged(): + """request_pause should block until animation acknowledges.""" + handler = MockAnimationPauseHandshake() + handler.animation_paused.clear() # Simulate animation running + + acknowledged = threading.Event() + + def delayed_acknowledge(): + time.sleep(0.1) + handler.animation_paused.set() + acknowledged.set() + + t = threading.Thread(target=delayed_acknowledge, daemon=True) + t.start() + + start = time.time() + result = handler.request_pause(timeout=1.0) + elapsed = time.time() - start + + assert result is True + assert elapsed >= 0.1, "Should have waited for acknowledgment" + assert acknowledged.is_set() + + t.join(timeout=1) + + +def test_handshake_timeout(): + """request_pause should timeout if animation doesn't acknowledge.""" + handler = MockAnimationPauseHandshake() + handler.animation_paused.clear() # Simulate animation that never acknowledges + + start = time.time() + result = handler.request_pause(timeout=0.1) + elapsed = time.time() - start + + # Event.wait returns False on timeout + assert result is False + assert elapsed >= 0.1 + + +def test_handshake_multiple_pause_resume_cycles(): + """Handshake should work correctly across multiple pause/resume cycles.""" + handler = MockAnimationPauseHandshake() + + t = threading.Thread(target=handler.animation_thread_loop, daemon=True) + t.start() + + for i in range(3): + # Let animation run + time.sleep(0.05) + handler.output_count + + # Pause + assert handler.request_pause(), f"Cycle {i}: Pause should be acknowledged" + count_at_pause = handler.output_count + time.sleep(0.05) + assert handler.output_count == count_at_pause, f"Cycle {i}: Should not output while paused" + + # Resume + handler.release_pause() + time.sleep(0.05) + assert handler.output_count > count_at_pause, f"Cycle {i}: Should resume after release" + + handler.stop() + t.join(timeout=1) + + +def test_handshake_exit_during_pause(): + """Animation thread should exit cleanly even while paused.""" + handler = MockAnimationPauseHandshake() + + t = threading.Thread(target=handler.animation_thread_loop, daemon=True) + t.start() + + # Pause animation + time.sleep(0.05) + handler.request_pause() + + # Exit while paused + handler.stop() + + # Thread should exit + t.join(timeout=1) + assert not t.is_alive(), "Thread should have exited" + + +def test_handshake_no_output_race(): + """Verify no output occurs between pause request and acknowledgment.""" + handler = MockAnimationPauseHandshake() + + t = threading.Thread(target=handler.animation_thread_loop, daemon=True) + t.start() + + # Let it run + time.sleep(0.05) + + # Pause and immediately record + handler.pause_animation.set() + handler.animation_paused.wait(timeout=1.0) + count_at_ack = handler.output_count + + # Wait and verify no change + time.sleep(0.1) + assert handler.output_count == count_at_ack, "No output should occur after acknowledgment" + + handler.release_pause() + handler.stop() + t.join(timeout=1) + + +def test_animation_paused_initially_set(): + """animation_paused should be set initially (before animation starts).""" + handler = MockAnimationPauseHandshake() + assert handler.animation_paused.is_set(), "Should be paused initially" + + +def test_animation_paused_cleared_when_running(): + """animation_paused should be cleared when animation is actively running.""" + handler = MockAnimationPauseHandshake() + + t = threading.Thread(target=handler.animation_thread_loop, daemon=True) + t.start() + + # Wait for animation to start running + time.sleep(0.1) + + # Should be cleared (running) + assert not handler.animation_paused.is_set(), "Should be cleared when running" + + handler.stop() + t.join(timeout=1) diff --git a/tests/unit_tests/test_event_emitter_confirmation.py b/tests/unit_tests/test_event_emitter_confirmation.py new file mode 100644 index 000000000..45d72af6b --- /dev/null +++ b/tests/unit_tests/test_event_emitter_confirmation.py @@ -0,0 +1,189 @@ +import threading +import time + +from taskweaver.module.event_emitter import ConfirmationHandler, PostEventType, SessionEventEmitter + + +class MockConfirmationHandler(ConfirmationHandler): + def __init__(self, auto_approve: bool = True): + self.auto_approve = auto_approve + self.confirmation_requested = False + self.last_code = None + self.last_round_id = None + self.last_post_id = None + + def request_confirmation(self, code: str, round_id: str, post_id: str | None) -> bool: + self.confirmation_requested = True + self.last_code = code + self.last_round_id = round_id + self.last_post_id = post_id + return self.auto_approve + + +def test_confirmation_auto_approve_when_no_handler(): + emitter = SessionEventEmitter() + emitter.start_round("test-round-1") + + result = emitter.request_code_confirmation("print('hello')", "post-1") + + assert result is True + assert not emitter.confirmation_pending + + +def test_confirmation_pending_property(): + emitter = SessionEventEmitter() + emitter.start_round("test-round-1") + handler = MockConfirmationHandler(auto_approve=True) + emitter.confirmation_handler = handler + + def request_thread(): + emitter.request_code_confirmation("print('hello')", "post-1") + + def provide_thread(): + while not emitter.confirmation_pending: + time.sleep(0.01) + + assert emitter.confirmation_pending + assert emitter.pending_confirmation_code == "print('hello')" + + emitter.provide_confirmation(True) + + t1 = threading.Thread(target=request_thread) + t2 = threading.Thread(target=provide_thread) + + t1.start() + t2.start() + + t1.join(timeout=2) + t2.join(timeout=2) + + assert not t1.is_alive() + assert not t2.is_alive() + + +def test_confirmation_approved(): + emitter = SessionEventEmitter() + emitter.start_round("test-round-1") + handler = MockConfirmationHandler() + emitter.confirmation_handler = handler + + result = None + + def request_thread(): + nonlocal result + result = emitter.request_code_confirmation("print('test')", "post-1") + + def provide_thread(): + while not emitter.confirmation_pending: + time.sleep(0.01) + emitter.provide_confirmation(True) + + t1 = threading.Thread(target=request_thread) + t2 = threading.Thread(target=provide_thread) + + t1.start() + t2.start() + + t1.join(timeout=2) + t2.join(timeout=2) + + assert result is True + + +def test_confirmation_rejected(): + emitter = SessionEventEmitter() + emitter.start_round("test-round-1") + handler = MockConfirmationHandler() + emitter.confirmation_handler = handler + + result = None + + def request_thread(): + nonlocal result + result = emitter.request_code_confirmation("rm -rf /", "post-1") + + def provide_thread(): + while not emitter.confirmation_pending: + time.sleep(0.01) + emitter.provide_confirmation(False) + + t1 = threading.Thread(target=request_thread) + t2 = threading.Thread(target=provide_thread) + + t1.start() + t2.start() + + t1.join(timeout=2) + t2.join(timeout=2) + + assert result is False + + +def test_confirmation_events_emitted(): + emitter = SessionEventEmitter() + emitter.start_round("test-round-1") + handler = MockConfirmationHandler() + emitter.confirmation_handler = handler + + events_captured = [] + + class EventCapture: + def handle(self, event): + events_captured.append(event) + + emitter.register(EventCapture()) + + def request_thread(): + emitter.request_code_confirmation("test_code", "post-1") + + def provide_thread(): + while not emitter.confirmation_pending: + time.sleep(0.01) + emitter.provide_confirmation(True) + + t1 = threading.Thread(target=request_thread) + t2 = threading.Thread(target=provide_thread) + + t1.start() + t2.start() + + t1.join(timeout=2) + t2.join(timeout=2) + + request_events = [e for e in events_captured if e.t == PostEventType.post_confirmation_request] + response_events = [e for e in events_captured if e.t == PostEventType.post_confirmation_response] + + assert len(request_events) == 1 + assert request_events[0].msg == "test_code" + assert request_events[0].extra["code"] == "test_code" + + assert len(response_events) == 1 + assert response_events[0].msg == "approved" + assert response_events[0].extra["approved"] is True + + +def test_confirmation_state_cleared_after_response(): + emitter = SessionEventEmitter() + emitter.start_round("test-round-1") + handler = MockConfirmationHandler() + emitter.confirmation_handler = handler + + def request_thread(): + emitter.request_code_confirmation("code1", "post-1") + + def provide_thread(): + while not emitter.confirmation_pending: + time.sleep(0.01) + emitter.provide_confirmation(True) + + t1 = threading.Thread(target=request_thread) + t2 = threading.Thread(target=provide_thread) + + t1.start() + t2.start() + + t1.join(timeout=2) + t2.join(timeout=2) + + assert not emitter.confirmation_pending + assert emitter.pending_confirmation_code is None diff --git a/website/docs/quickstart.md b/website/docs/quickstart.md index e5b89ad8c..5ab018828 100644 --- a/website/docs/quickstart.md +++ b/website/docs/quickstart.md @@ -37,7 +37,9 @@ A project directory typically contains the following files and folders: ## OpenAI Configuration Before running TaskWeaver, you need to provide your OpenAI API key and other necessary information. -You can do this by editing the `taskweaver_config.json` file. +You can do this by creating a `taskweaver_config.json` file in your project directory. +A template file `taskweaver_config.json.example` is provided - copy it to `taskweaver_config.json` and fill in your credentials. + If you are using Azure OpenAI, you need to set the following parameters in the `taskweaver_config.json` file: ### Azure OpenAI ```json @@ -83,6 +85,5 @@ Human: ___ ``` There are other ways to start TaskWeaver: -- [A Chainlit UI interface](./usage/webui.md): TaskWeaver provides an experimental web-based interface to interact with the system. - [A Library](./usage/library.md): You can also use TaskWeaver as a library in your Python code. - [The all-in-one Docker image](./usage/docker.md): We provide a Docker image that contains all the dependencies to run TaskWeaver. diff --git a/website/docs/usage/webui.md b/website/docs/usage/webui.md deleted file mode 100644 index d1c31daaa..000000000 --- a/website/docs/usage/webui.md +++ /dev/null @@ -1,36 +0,0 @@ -# Web UI - -:::warning -Please note that this Web UI is a playground for development and testing purposes only. -Be cautious when running the Web UI, as anyone can access it if the port is open to the public. -If you want to deploy a Web UI for production, you need to address security concerns, such as authentication and authorization, -making sure the server is secure. -::: - -Follow the instruction in [Quick Start](../quickstart.md) to clone the repository and fill in the necessary configurations. - -Install the `chainlit` package by `pip install -U "chainlit<1.1.300"` if you don't have it in your environment. - -:::note -Chainlit has a major update in version 1.1.300 that may cause compatibility issues. -Please make sure you have the correct version installed. -::: - -Start the service by running the following command. - - -```bash -# assume you are in the TaskWeaver folder -cd playground/UI/ -# make sure you are in playground/UI/ folder -chainlit run app.py -``` - -Open the browser with http://localhost:8000 if it doesn't open automatically. -:::info -We now support uploading files using the Web UI. -::: -Below are some screenshots of the Web UI: -![TaskWeaver UI Screenshot 1](../../static/img/ui_screenshot_1.png) -![TaskWeaver UI Screenshot 2](../../static/img/ui_screenshot_2.png) - diff --git a/website/sidebars.js b/website/sidebars.js index 577af653c..1d60a040c 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -60,7 +60,6 @@ const sidebars = { collapsed: false, items: [ 'usage/cmd', - 'usage/webui', 'usage/library', 'usage/docker', ], diff --git a/website/static/img/ui_screenshot_1.png b/website/static/img/ui_screenshot_1.png deleted file mode 100644 index 0726254b7..000000000 Binary files a/website/static/img/ui_screenshot_1.png and /dev/null differ diff --git a/website/static/img/ui_screenshot_2.png b/website/static/img/ui_screenshot_2.png deleted file mode 100644 index 84df8c931..000000000 Binary files a/website/static/img/ui_screenshot_2.png and /dev/null differ