ServiceNow · amanjaiswal73892 · Jul 29, 2025 · Jul 29, 2025
diff --git a/experiments/osworld_docker_test.py b/experiments/osworld_docker_test.py
diff --git a/src/agentlab/benchmarks/setup.md → src/agentlab/benchmarks/osworld.md b/src/agentlab/benchmarks/setup.md → src/agentlab/benchmarks/osworld.md
@@ -31,7 +31,7 @@ The main entry point `experiments/run_osworld.py` is currently configured with h
 2. **Environment Variables:**
    - `AGENTLAB_DEBUG=1`: Automatically runs the debug subset (7 tasks from `osworld_debug_task_ids.json`)
 
-### Running OSWorld Tasks
+### Task subsets
 
 We provide different subsets of tasks:
 
@@ -42,10 +42,28 @@ We provide different subsets of tasks:
 ### Example Commands
 
 ```bash
-# Run with default debug subset (7 tasks)
+# Run the default debug subset sequentially in a VMware VM
 python experiments/run_osworld.py
 ```
 
+### Parallel Execution with Docker
+To run OSWorld in parallel using Docker, ensure you have Docker installed and configured.
+To install it, follow the section from the OSWorld README on [Docker setup](https://github.com/xlang-ai/OSWorld?tab=readme-ov-file#docker-server-with-kvm-support-for-better-performance).
+Ensure that your Docker installation supports KVM, as OSWorld requires it for running VMs.
+We also recommend pulling the latest Docker image for OSWorld before running the benchmark:
+
+```bash
+docker pull happysixd/osworld-docker
+```
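A quick way to sanity-check the KVM requirement before launching the benchmark is to look for the `/dev/kvm` device node that Linux exposes when KVM is available. The sketch below assumes a Linux host; the helper name `kvm_available` is hypothetical and is not part of AgentLab or OSWorld.

```python
import os


def kvm_available(device: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and is readable and writable.

    Assumption: a Linux host, where KVM is exposed as /dev/kvm.
    """
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if kvm_available():
        print("KVM device found and accessible.")
    else:
        print("KVM device missing or not accessible; check virtualization settings and permissions.")
```

A positive result only shows that the current user can open the device on the host; the OSWorld Docker setup linked above remains the reference for the full configuration.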
+
+After setting up Docker, you can change the `use_vmware` parameter in the script to `False` and run:
+
+```bash
+python experiments/run_osworld.py
+```
+You can control the number of parallel jobs by setting the `n_jobs` parameter in the script; the default is 4.
+We recommend setting `n_jobs` to `your_number_of_cpu_cores - 2` to leave some resources for the host system and the benchmark itself.
+
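The `n_jobs` recommendation can be computed rather than guessed. This is a minimal sketch, assuming the value is then copied by hand into the `n_jobs` parameter in `experiments/run_osworld.py`; the helper `recommended_n_jobs` is hypothetical and does not exist in AgentLab.

```python
import os
from typing import Optional


def recommended_n_jobs(reserve: int = 2, cpu_count: Optional[int] = None) -> int:
    """Return the core count minus `reserve`, never less than 1.

    `cpu_count` can be passed explicitly; otherwise os.cpu_count() is used,
    falling back to 1 if the core count cannot be determined.
    """
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cores - reserve)


if __name__ == "__main__":
    print(f"Suggested n_jobs: {recommended_n_jobs()}")
```

On machines with three or fewer cores the helper returns 1, which amounts to sequential execution.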
 
 ### Configuration Notes