
Commit 85d2979

Add optimal execution trading example
Created a new example that demonstrates optimal trading execution:
- Implemented market simulator with price impact model
- Added evaluator with metrics for cost improvement vs baseline
- Fixed parameter ordering in simulate_execution function
- Included timeout handling for search algorithms
1 parent aa567a7 commit 85d2979

File tree

3 files changed: +520 -0 lines changed

Lines changed: 249 additions & 0 deletions
README
======

Optimal-Execution Toy Benchmark for OpenEvolve
---------------------------------------------

This repository contains a **minimal yet complete** benchmark that lets an evolutionary-search engine learn how to execute a fixed quantity of shares in an order-book with market impact.
It mirrors the structure of the earlier “function-minimisation” example but replaces the mathematical objective with a *trading* objective:

*Minimise implementation-shortfall / slippage when buying or selling a random volume during a short horizon.*

The benchmark is intentionally lightweight – short Python, no external dependencies – yet it shows every building-block you would find in a realistic execution engine:

1. synthetic order-book generation
2. execution-schedule parameterisation
3. a search / learning loop confined to an `EVOLVE-BLOCK`
4. an **independent evaluator** that scores candidates on unseen market scenarios.

-------------------------------------------------------------------------------
Repository Layout
-----------------

```
.
├── initial_program.py   # candidate – contains the EVOLVE-BLOCK
├── evaluator.py         # ground-truth evaluator
└── README.md            # ← you are here
```

Why two files?
`initial_program.py` is what the evolutionary framework mutates.
`evaluator.py` is trusted, *never* mutated and imports nothing except the
candidate’s public `run_search()` function.

-------------------------------------------------------------------------------
Quick-start
-----------

```
python initial_program.py
# Runs the candidate’s own training loop (random-search on α)

python evaluator.py initial_program.py
# Scores the candidate on fresh market scenarios
```

Typical console output:

```
Best alpha: 1.482 | Estimated average slippage: 0.00834
{'value_score': 0.213, 'speed_score': 0.667,
 'reliability': 1.0, 'overall_score': 0.269}
```

-------------------------------------------------------------------------------
1. Mechanics – Inside the Candidate (`initial_program.py`)
----------------------------------------------------------

The file is split into two parts:

### 1.1 EVOLVE-BLOCK (mutable)

```python
# EVOLVE-BLOCK-START … EVOLVE-BLOCK-END
```

Only the code between those delimiters will be altered by OpenEvolve.
Everything else is *frozen*; it plays the role of a “library.”
Current strategy:

1. **Parameter** – a single scalar `alpha (α)`
   • α < 0 → front-loads the schedule
   • α = 0 → uniform (TWAP)
   • α > 0 → back-loads the schedule

2. **Search** – naïve random search over α
   (`search_algorithm()` evaluates ~250 random α’s and keeps the best.)

3. **Fitness** – measured by `evaluate_alpha()` which, in turn, calls the
   **fixed** simulator (`simulate_execution`) for many random scenarios and
   averages per-share slippage.
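To make step 2 concrete, the naïve random search could look roughly like the sketch below. The name `random_search`, the sampling range and the toy objective are illustrative; the actual EVOLVE-BLOCK calls its own `evaluate_alpha()` and keeps the α with the lowest average slippage.

```python
import random

def random_search(objective, num_samples: int = 250,
                  alpha_range: tuple[float, float] = (-2.0, 2.0)) -> tuple[float, float]:
    """Evaluate num_samples random alphas and keep the cheapest one."""
    best_alpha, best_cost = 0.0, float("inf")
    for _ in range(num_samples):
        alpha = random.uniform(*alpha_range)
        cost = objective(alpha)          # e.g. average simulated slippage over scenarios
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha, best_cost

# Toy usage with a stand-in objective (the real candidate would pass evaluate_alpha):
best_alpha, best_cost = random_search(lambda a: (a - 1.0) ** 2)
```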
Return signature required by the evaluator:

```python
def run_search() -> tuple[float, float]:
    return best_alpha, estimated_cost
```

The first element (α) is mandatory; anything after that is ignored by the
evaluator but can be useful for debugging.
### 1.2 Fixed “library” code (non-mutable)

* `create_schedule(volume, horizon, alpha)`
  Weights each slice `(t+1)^α`, then normalises so the slices sum to the total volume.

* `simulate_execution(...)`
  Ultra-simplified micro-structure:

  • The mid-price `P_t` follows a Gaussian random walk
  • The current spread is constant (`±spread/2`)
  • Market impact grows linearly with child-order size relative to
    book depth:
    `impact = (size / depth) * spread/2`

  Execution price for each slice:

  ```
  BUY : P_t + spread/2 + impact
  SELL: P_t - spread/2 - impact
  ```

Slippage is summed over the horizon and returned *per share*.
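A minimal sketch of how these two helpers could be written is shown below. The default values (`horizon`, `spread`, `depth`, volatility) and the arrival-price benchmark are assumptions for illustration; the real `initial_program.py` fixes its own constants and signature.

```python
import random

def create_schedule(volume: float, horizon: int, alpha: float) -> list[float]:
    """Weight slice t by (t + 1) ** alpha, then normalise so the slices sum to volume."""
    weights = [(t + 1) ** alpha for t in range(horizon)]
    total = sum(weights)
    return [volume * w / total for w in weights]

def simulate_execution(volume: float, side: str, alpha: float,
                       horizon: int = 10, mid0: float = 100.0, spread: float = 0.02,
                       depth: float = 5_000.0, sigma: float = 0.05) -> float:
    """Average per-share slippage of the schedule versus the arrival mid-price."""
    schedule = create_schedule(volume, horizon, alpha)
    mid, slippage = mid0, 0.0
    for size in schedule:
        mid += random.gauss(0.0, sigma)        # Gaussian random-walk mid-price
        impact = (size / depth) * spread / 2   # linear in child-order size vs. book depth
        if side == "buy":
            price = mid + spread / 2 + impact  # BUY : P_t + spread/2 + impact
            slippage += size * (price - mid0)
        else:
            price = mid - spread / 2 - impact  # SELL: P_t - spread/2 - impact
            slippage += size * (mid0 - price)
    return slippage / volume                   # summed over the horizon, returned per share
```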
-------------------------------------------------------------------------------

2. Mechanics – The Evaluator (`evaluator.py`)
---------------------------------------------

The evaluator is the **oracle**; it owns the test scenarios and the scoring
function. A successful candidate must *generalise*: the random numbers in
the evaluator are independent from those inside the candidate.

### 2.1 Process flow
For each of `NUM_TRIALS = 10`:

1. Draw a *fresh* `(volume, side)` pair
   `volume ∈ [100, 1000]`, `side ∈ {buy, sell}`

2. Call `run_search()` **once** (time-limited to 8 s)

3. Extract α and compute:

   ```
   cost_candidate = simulate_execution(vol, side, α)
   cost_baseline  = simulate_execution(vol, side, 0.0)   # uniform TWAP
   improvement    = (cost_baseline - cost_candidate) / max(cost_baseline, 1e-9)
   ```

4. Store runtime and improvement.
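A hedged sketch of that trial loop is shown below. The real `evaluator.py` additionally enforces the 8-second time-out and imports `run_search` from the candidate file; here both callables are passed in so the sketch stays self-contained.

```python
import random
import time

def run_trials(run_search, simulate_execution, num_trials: int = 10, seed: int = 0):
    """Draw fresh scenarios, time run_search(), and compare against a TWAP baseline."""
    rng = random.Random(seed)
    improvements, runtimes, successes = [], [], 0
    for _ in range(num_trials):
        volume = rng.uniform(100, 1000)
        side = rng.choice(["buy", "sell"])
        start = time.time()
        try:
            alpha = float(run_search()[0])     # only the first element (alpha) is used
        except Exception:
            continue                           # failed trial: contributes nothing
        runtimes.append(time.time() - start)
        cost_candidate = simulate_execution(volume, side, alpha)
        cost_baseline = simulate_execution(volume, side, 0.0)   # uniform TWAP
        improvements.append((cost_baseline - cost_candidate) / max(cost_baseline, 1e-9))
        successes += 1
    return improvements, runtimes, successes
```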
### 2.2 Scores

After the 10 trials:

```
value_score       = mean(max(0, improvement))      ∈ [0, 1]
speed_score       = min(10, 1/mean(runtime)) / 10  ∈ [0, 1]
reliability_score = success / 10                   ∈ [0, 1]

overall_score = 0.8·value + 0.1·speed + 0.1·reliability
```

Intuition:

* **Value** (quality of execution) dominates.
* **Speed** rewards fast optimisation but is capped.
* **Reliability** ensures the candidate rarely crashes or times out.
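Translating those formulas into code, the aggregation could look like this sketch (the helper name `aggregate_scores` is illustrative; the real evaluator may handle edge cases differently):

```python
from statistics import mean

def aggregate_scores(improvements, runtimes, successes, num_trials: int = 10) -> dict:
    """Fold per-trial results into the weighted overall score described above."""
    value_score = mean(max(0.0, imp) for imp in improvements) if improvements else 0.0
    speed_score = min(10.0, 1.0 / max(mean(runtimes), 1e-9)) / 10.0 if runtimes else 0.0
    reliability = successes / num_trials
    return {
        "value_score": value_score,
        "speed_score": speed_score,
        "reliability": reliability,
        "overall_score": 0.8 * value_score + 0.1 * speed_score + 0.1 * reliability,
    }
```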
### 2.3 Stage-based evaluation (optional)

* `evaluate_stage1()` – smoke-test; passes if `overall_score > 0.05`
* `evaluate_stage2()` – identical to `evaluate()`

This mirrors the two-stage funnel from the previous demo.
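As a sketch, the stage-1 gate amounts to a simple threshold check. In the repository `evaluate_stage1()` takes the program path directly; here the scoring function is passed in so the example stays self-contained:

```python
def evaluate_stage1(evaluate_fn, program_path: str, threshold: float = 0.05) -> bool:
    """Cheap gate: only candidates that clear a low score bar proceed to stage 2."""
    return evaluate_fn(program_path)["overall_score"] > threshold
```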
-------------------------------------------------------------------------------

3. Extending the Benchmark
--------------------------

The framework is deliberately tiny so you can experiment.

Ideas:
1. **Richer parameterisation**
   • Add `beta` for *U-shape* schedule
   • Add *child-order participation cap* (%ADV)

2. **Better search / learning**
   • Replace random search with gradient-free CMA-ES, Bayesian optimisation or
     even RL inside the EVOLVE-BLOCK.

3. **Enhanced market model**
   • Stochastic spread
   • Non-linear impact (`impact ∝ volume^γ`)
   • Resilience (price reverts after child order)

4. **Multi-objective scoring**
   Mix risk metrics (variance of slippage) into the evaluator.
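To make idea 1 above concrete, here is one purely illustrative way a `beta` parameter could bend the schedule into a U-shape (this helper is not part of the repository):

```python
def create_u_schedule(volume: float, horizon: int, alpha: float, beta: float) -> list[float]:
    """Two-parameter schedule: alpha back-loads, beta front-loads; together they
    can bend the profile into a U-shape (heavy at both ends, light in the middle)."""
    weights = [(t + 1) ** alpha + (horizon - t) ** beta for t in range(horizon)]
    total = sum(weights)
    return [volume * w / total for w in weights]
```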
When you add knobs, remember:

* All **simulation logic for evaluation must live in `evaluator.py`**.
  Candidates cannot peek at or tamper with it.
* The evaluator must still be able to extract the *decision variables* from
  the tuple returned by `run_search()`.

-------------------------------------------------------------------------------
4. Known Limitations
--------------------

1. **Impact model is linear & memory-less**
   Good for demonstration; unrealistic for real-world HFT.

2. **No order-book micro-structure**
   We do not simulate queue positions, cancellations, hidden liquidity, etc.

3. **Single parameter α**
   Optimal execution in reality depends on volatility, spread forecast,
   order-book imbalance and so forth. Here we sidestep all that for clarity.

4. **Random search baseline**
   Evolutionary engines will easily outperform it; that is the point – we
   want a hill to climb.

-------------------------------------------------------------------------------
5. FAQ
------

Q: **Why does the evaluator re-implement `simulate_execution`?**
A: To guarantee the candidate cannot cheat by hard-coding answers from its own
   RNG realisations.

Q: **What happens if my `run_search()` returns something weird?**
A: The evaluator casts the *first* item to `float`. Non-numeric or `NaN`
   values yield zero score.
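Roughly the defensive behaviour described above, as a sketch (the helper name `extract_alpha` is illustrative; the evaluator performs an equivalent inline check):

```python
import math

def extract_alpha(result):
    """Defensively pull alpha out of whatever run_search() returned."""
    try:
        alpha = float(result[0] if isinstance(result, (tuple, list)) else result)
    except (TypeError, ValueError, IndexError):
        return None                    # non-numeric output -> the trial scores zero
    return None if math.isnan(alpha) else alpha
```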
Q: **Is it okay to import heavy libraries (pandas, torch) inside the EVOLVE-BLOCK?**
A: Technically yes, but remember the 8-second time-out, and the judge’s machine
   may not have a GPU or large RAM.

-------------------------------------------------------------------------------

6. License
----------

The example is released under the MIT License – do whatever you like, but
please keep references to the original authorship when redistributing.
