diff --git a/documentation/cookbook/index.md b/documentation/cookbook/index.md index 4089f536b..c8c15d1e1 100644 --- a/documentation/cookbook/index.md +++ b/documentation/cookbook/index.md @@ -18,12 +18,24 @@ Each recipe provides a focused solution to a specific problem, with working code ## Structure -The Cookbook is organized into three main sections: +The Cookbook is organized into the following sections: - **SQL Recipes** - Common SQL patterns, window functions, and time-series queries + - **[Capital Markets](/docs/cookbook/sql/finance/)** - Technical indicators, execution analysis, and risk metrics for financial data + - **[Time-Series Patterns](/docs/cookbook/sql/time-series/elapsed-time/)** - Common patterns for working with time-series data + - **[Advanced SQL](/docs/cookbook/sql/advanced/rows-before-after-value-match/)** - Complex query patterns like pivoting, funnels, and histograms - **Programmatic** - Language-specific client examples and integration patterns - **Operations** - Deployment, configuration, and operational tasks +### Post-trade and execution analysis + +QuestDB's time-series joins (`ASOF JOIN`, `HORIZON JOIN`) and high-resolution timestamps make it well-suited for **Transaction Cost Analysis (TCA)** and post-trade workflows. 
The [Execution & Post-Trade Analysis](/docs/cookbook/sql/finance/) section includes recipes for: + +- [Slippage measurement](/docs/cookbook/sql/finance/slippage/) - Per-fill and aggregated slippage against mid and top-of-book +- [Markout analysis](/docs/cookbook/sql/finance/markout/) - Post-trade price reversion curves and adverse selection detection +- [Last look detection](/docs/cookbook/sql/finance/last-look/) - Millisecond-granularity counterparty analysis +- [Implementation shortfall](/docs/cookbook/sql/finance/implementation-shortfall/) - Cost decomposition into spread, permanent, and temporary impact + ## Running the examples **Most recipes run directly on our [live demo instance at demo.questdb.com](https://demo.questdb.com)** without any local setup. Queries that can be executed on the demo site are marked with a direct link to run them. diff --git a/documentation/cookbook/sql/finance/ecn-scorecard.md b/documentation/cookbook/sql/finance/ecn-scorecard.md new file mode 100644 index 000000000..afffc0fae --- /dev/null +++ b/documentation/cookbook/sql/finance/ecn-scorecard.md @@ -0,0 +1,233 @@ +--- +title: ECN scorecard +sidebar_label: ECN scorecard +description: Compare venue fill quality with a single dashboard query combining spread, slippage, fill size, and passive ratio +--- + +When evaluating execution across multiple venues, you often need several metrics side by side: spread conditions, slippage, fill sizes, and order type mix. Rather than running separate queries, this recipe produces a single **ECN scorecard** that summarizes fill quality per venue and symbol. + +## Problem + +You want a single dashboard-ready query that ranks venues by execution quality, combining spread at fill time, slippage against mid and top of book, average fill size, and what proportion of fills were passive. 
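Before reaching for SQL, it helps to pin down the per-fill arithmetic. A minimal Python sketch with toy numbers (the fills are assumed to be already paired with the prevailing quote, as an as-of join would do; all names and values are illustrative, not demo data):

```python
# Toy buy-side fills, each pre-paired with the quote prevailing at fill time.
fills = [
    # (fill_price, quantity, passive, best_bid, best_ask)
    (1.10010, 100, True,  1.10000, 1.10020),  # filled exactly at mid
    (1.10025, 200, False, 1.10000, 1.10020),  # paid through the ask
]

def mid(bid, ask):
    return (bid + ask) / 2

n = len(fills)
# Spread at fill time, in basis points of the mid.
avg_spread_bps = sum((ask - bid) / mid(bid, ask) * 1e4
                     for _, _, _, bid, ask in fills) / n
# Cost-positive slippage: fill price minus mid, so paying above mid is positive.
avg_slippage_bps = sum((px - mid(bid, ask)) / px * 1e4
                       for px, _, _, bid, ask in fills) / n
# Fraction of fills that rested passively.
passive_ratio = sum(1 for _, _, passive, _, _ in fills if passive) / n
```

The query below produces these same quantities per ECN and symbol, with the as-of pairing done by the database.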
+ +## Solution + +Use `ASOF JOIN` to pair each fill with the prevailing order book, then aggregate multiple metrics per ECN and symbol: + +```questdb-sql demo title="ECN fill quality scorecard (buy side)" +SELECT + t.symbol, + t.ecn, + count() AS fill_count, + sum(t.quantity) AS total_volume, + avg(t.quantity) AS avg_fill_size, + avg((m.best_ask - m.best_bid) + / ((m.best_bid + m.best_ask) / 2) * 10000) AS avg_spread_bps, + avg((t.price - (m.best_bid + m.best_ask) / 2) + / t.price * 10000) AS avg_slippage_bps, + avg((t.price - m.best_ask) + / t.price * 10000) AS avg_slippage_vs_ask_bps, + avg(CASE WHEN t.passive THEN 1.0 ELSE 0.0 END) AS passive_ratio +FROM fx_trades t +ASOF JOIN market_data m ON (symbol) +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn +ORDER BY t.symbol, avg_slippage_bps; +``` + +## How it works + +Each row is one symbol-ECN combination. The metrics in each row: + +- **`fill_count`** and **`total_volume`** — how much activity the ECN sees for this symbol. Context for statistical significance. +- **`avg_fill_size`** — average quantity per fill. Venues with larger average fills may show more slippage simply due to size. +- **`avg_spread_bps`** — average spread at the time of each fill. Tells you what market conditions looked like when you traded on this venue. +- **`avg_slippage_bps`** — average slippage vs mid, computed as fill price minus mid. Since this is buy-side, negative means you bought below mid (price improvement), positive means you paid above mid. +- **`avg_slippage_vs_ask_bps`** — average slippage vs the best ask. Isolates how much worse than the quoted ask you actually paid. Negative means you got price improvement vs the ask. +- **`passive_ratio`** — fraction of fills that were passive (limit orders). Higher passive ratio typically correlates with better slippage. + +Results are ordered by `avg_slippage_bps` ascending, so the best-performing (lowest-cost) ECN for each symbol appears first. 
+ +:::note Buy-side only +This query filters to `side = 'buy'` because the slippage formulas are direction-specific; restricting to one side keeps them free of `CASE` expressions. For a sell-side scorecard, flip the slippage formulas so that positive still means cost: use `(mid - t.price) / t.price` for slippage vs mid, and `(m.best_bid - t.price) / t.price` for slippage vs bid. +::: + +## Interpreting results + +Compare rows for the same symbol across different ECNs: + +- **Low spread + low slippage**: The best combination — tight market and good fills. +- **Low spread + high slippage**: Tight quotes but fills executing poorly. May indicate latency issues or thin top-of-book liquidity. +- **High passive ratio + negative slippage**: Expected — passive fills provide liquidity and often get price improvement. +- **Large `avg_fill_size` + high slippage**: Size-driven impact. The venue may have less depth, causing larger orders to walk the book. +- **Low `fill_count`**: Treat metrics with caution — small sample sizes can be misleading. + +## ECN markout curves + +The scorecard above is a static snapshot. To see how fill quality evolves over time after execution, overlay markout curves per ECN. An ECN where markouts go steeply negative is delivering toxic flow — informed traders are picking you off there: + +```questdb-sql title="ECN markout curves side by side (buy side)" +SELECT + t.symbol, + t.ecn, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + avg(((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000) AS avg_markout_bps, + sum(((m.best_bid + m.best_ask) / 2 - t.price) + * t.quantity) AS total_pnl +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + RANGE FROM 0s TO 5m STEP 5s AS h +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn, h.offset +ORDER BY t.symbol, t.ecn, h.offset; +``` + +Plot these curves overlaid per ECN for each symbol. Compare the shapes: + +- **Flat near zero**: Neutral flow — no systematic post-trade price movement. This is healthy. 
+- **Rising (positive)**: Mean-reverting flow — the market comes back after the fill. You're providing liquidity at good levels on this venue. +- **Falling (negative)**: Toxic flow — the market moves against you after fills on this ECN. Informed traders may be concentrated there. +- **Sharp initial drop then flat**: The initial cost is the spread, and the market doesn't move further. Normal for aggressive fills on a well-functioning venue. + +Combine with the scorecard's `passive_ratio` and `avg_fill_size` to understand *why* a venue shows toxicity — it may simply be where your largest aggressive orders execute, rather than a venue-specific problem. + +## Toxicity by time of day + +Toxicity isn't static — an ECN may show clean markouts during London hours but turn toxic during Asia when liquidity thins out. Grouping by hour reveals intraday patterns: + +```questdb-sql title="ECN toxicity by hour (buy side)" +SELECT + t.symbol, + t.ecn, + hour(t.timestamp) AS hour_utc, + h.offset, + count() AS n, + avg(((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000) AS markout_5s_bps, + avg((m.best_ask - m.best_bid) + / ((m.best_bid + m.best_ask) / 2) * 10000) AS avg_spread_bps +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + LIST (5s) AS h +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn, hour(t.timestamp), h.offset +ORDER BY t.symbol, t.ecn, hour_utc; +``` + +The 5-second markout is used as a quick toxicity signal — long enough for informed flow to show up, short enough to stay responsive. + +Compare `markout_5s_bps` against `avg_spread_bps` for each hour. If an ECN shows tight spreads but deeply negative markouts during certain hours, the tight spreads are bait — you're earning a small spread but losing much more to adverse selection. Consider reducing or withdrawing liquidity on that venue during those hours. 
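The spread-vs-markout comparison can be automated downstream of the query. A hedged Python sketch with hypothetical hourly numbers, using the half-spread a liquidity provider could hope to earn as one plausible threshold:

```python
# Hypothetical hourly stats (bps) for one ECN/symbol. Illustrative only.
hourly = [
    # (hour_utc, markout_5s_bps, avg_spread_bps)
    (8,  -0.3, 1.5),   # London: mild markout, comfortable spread
    (14, -0.8, 1.2),
    (22, -2.4, 0.9),   # Asia: tight spread but deeply negative markout
]

# Flag hours where 5-second adverse selection exceeds the half-spread on offer.
toxic_hours = [hour for hour, markout, spread in hourly
               if markout < -(spread / 2)]
```

Here hour 22 is the classic bait pattern: the tightest spread of the day paired with the worst markout.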
+ +## Passive vs aggressive toxicity + +The aggregate markout curves above blend passive and aggressive fills together. Splitting by `t.passive` reveals a critical distinction — toxicity on passive fills means your resting orders are being picked off, while toxicity on aggressive fills means you're crossing into a market that moves against you immediately: + +```questdb-sql title="Passive vs aggressive toxicity per ECN (buy side)" +SELECT + t.symbol, + t.ecn, + t.passive, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + avg(((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000) AS avg_markout_bps +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + LIST (0, 1s, 5s, 10s, 1m) AS h +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn, t.passive, h.offset +ORDER BY t.symbol, t.ecn, t.passive, h.offset; +``` + +Compare the markout curves for `passive = true` vs `passive = false` on each ECN: + +- **Healthy passive fills**: Positive markout at offset 0 (you earned the spread), gradually decaying toward zero. You rested at a good level and the market didn't move against you. +- **Toxic passive fills**: Markout turns negative quickly. Someone on that ECN is systematically sniping your resting orders — they trade against you just before the market moves in their direction. +- **Healthy aggressive fills**: Small negative markout at offset 0 (you paid the spread), staying flat or recovering. Normal cost of crossing. +- **Toxic aggressive fills**: Markout becomes increasingly negative. The market continues to move against you after you cross, suggesting you're consistently late or trading against informed flow. + +An ECN showing clean aggregate markouts can still have a problem if passive fills are deeply toxic while aggressive fills look fine — the two patterns cancel out in the blend. Always check both sides separately. 
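The cancellation effect is easy to demonstrate numerically. A sketch with hypothetical markouts and volumes, showing how a toxic passive book and healthy aggressive flow can blend into a markout that looks fine:

```python
# Hypothetical 5s markouts (bps) and volumes on one ECN. Illustrative only.
passive_markout_bps, passive_vol = -3.0, 500       # resting orders picked off
aggressive_markout_bps, aggressive_vol = 2.0, 800  # aggressive fills recovering

# Volume-weighted blend of the two flow types.
blended_bps = (
    (passive_markout_bps * passive_vol
     + aggressive_markout_bps * aggressive_vol)
    / (passive_vol + aggressive_vol)
)
# The blend sits near zero even though the passive side is deeply toxic.
```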
+ +## Composite toxicity score + +Rank ECNs by a single toxicity metric — the volume-weighted 5-second markout — alongside an `adverse_fill_ratio` that shows what fraction of fills moved against you: + +```questdb-sql title="Composite toxicity score per ECN (buy side)" +SELECT + t.symbol, + t.ecn, + h.offset, + count() AS fill_count, + sum(t.quantity) AS total_volume, + sum(((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000 * t.quantity) + / sum(t.quantity) AS vw_markout_5s_bps, + avg(CASE + WHEN (m.best_bid + m.best_ask) / 2 < t.price THEN 1.0 + ELSE 0.0 + END) AS adverse_fill_ratio +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + LIST (5s) AS h +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn, h.offset +ORDER BY t.symbol, vw_markout_5s_bps; +``` + +The two metrics complement each other: + +- **`vw_markout_5s_bps`** — volume-weighted 5-second markout in basis points. Negative means the market moved against you after fills on this ECN. Volume-weighting ensures large fills dominate the score. +- **`adverse_fill_ratio`** — fraction of fills where the mid-price at 5 seconds was worse than the execution price. Tells you whether toxicity is driven by a few large bad fills or is systemic across the board. + +An ECN with a mildly negative `vw_markout_5s_bps` but 80%+ `adverse_fill_ratio` is fundamentally hostile — nearly every fill moves against you, even if the average magnitude is small. Conversely, a deeply negative `vw_markout_5s_bps` with a low `adverse_fill_ratio` suggests a few large toxic fills are dragging down the average, which may be addressable by adjusting size limits on that venue. + +## Pivoted ECN scorecard + +The sections above produce one row per ECN per horizon offset. 
Using `PIVOT`, you can reshape the results into a wide format — one row per symbol-ECN combination with fill count, average size, volume, and markout at each horizon as separate columns: + +```questdb-sql title="Pivoted ECN scorecard (buy side)" +WITH markouts AS ( + SELECT + t.symbol, + t.ecn, + t.price, + t.quantity, + h.offset, + m.best_bid, + m.best_ask + FROM fx_trades t + HORIZON JOIN market_data m ON (symbol) + LIST (0, 5s, 1m) AS h + WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +) +SELECT * FROM markouts +PIVOT ( + count() AS fills, + avg(quantity) AS avg_size, + sum(quantity) AS volume, + avg(((best_bid + best_ask) / 2 - price) / price * 10000) AS markout_bps + FOR offset IN (0 AS at_fill, 5000000000 AS t_5s, 60000000000 AS t_1m) + GROUP BY symbol, ecn +) +ORDER BY t_5s_markout_bps; +``` + +The result has columns like `at_fill_fills`, `at_fill_markout_bps`, `t_5s_markout_bps`, `t_1m_markout_bps`, etc. — one set per horizon. This is useful for dashboard views where you want a single wide table rather than long-form output. + +Raw markouts can be misleading if an ECN rejects most of your flow and only fills the toxic orders. Compare `at_fill_fills` and `at_fill_avg_size` across ECNs — an ECN that fills fewer, smaller orders but shows clean markouts may simply be rejecting the hard-to-fill flow. A more complete picture requires comparing fill sizes against quoted sizes or incorporating reject rates from an orders table. 
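The reshaping that `PIVOT` performs can be mimicked in plain Python. A hedged sketch, with illustrative values, showing only the wide-column naming convention (offset alias joined to metric alias):

```python
# Long-form rows for one symbol/ECN: (offset_alias, metric_alias, value).
# Values are illustrative only.
long_rows = [
    ("at_fill", "fills", 120), ("at_fill", "markout_bps", -1.2),
    ("t_5s",    "fills", 120), ("t_5s",    "markout_bps", -2.7),
    ("t_1m",    "fills", 120), ("t_1m",    "markout_bps", -2.5),
]

# One wide row: each column name is "<offset_alias>_<metric_alias>".
wide = {f"{offset}_{metric}": value for offset, metric, value in long_rows}
```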
+ +:::info Related documentation +- [ASOF JOIN](/docs/query/sql/asof-join/) +- [HORIZON JOIN](/docs/query/sql/horizon-join/) +- [Slippage per fill recipe](slippage.md) +- [Markout analysis recipe](markout.md) +- [Bid-ask spread recipe](bid-ask-spread.md) +::: diff --git a/documentation/cookbook/sql/finance/implementation-shortfall-order.md b/documentation/cookbook/sql/finance/implementation-shortfall-order.md new file mode 100644 index 000000000..f88e5fc83 --- /dev/null +++ b/documentation/cookbook/sql/finance/implementation-shortfall-order.md @@ -0,0 +1,246 @@ +--- +title: Order-level implementation shortfall +sidebar_label: Implementation shortfall (order) +description: Calculate total implementation shortfall per order by comparing volume-weighted execution price against arrival mid +--- + +The [fill-level IS decomposition](implementation-shortfall.md) breaks down cost into spread, permanent, and temporary components per symbol. This recipe calculates **total implementation shortfall per order** — comparing the volume-weighted average execution price across all fills against the mid-price at the time the first fill arrived. + +This is the headline TCA metric: how much did the entire order cost relative to where the market was when you started executing? + +## Problem + +Orders in `fx_trades` are often split into multiple partial fills (rows sharing the same `order_id`). You want to compute a single cost metric per order that accounts for all fills, weighted by size, and benchmarked against the arrival price (the mid at the time of the first fill). 
+ +## Solution + +Use `ASOF JOIN` to capture the mid-price at each fill, then aggregate by `order_id` to get the volume-weighted average execution price and arrival mid: + +```questdb-sql demo title="Total implementation shortfall per order" +WITH fills_enriched AS ( + SELECT + f.order_id, + f.symbol, + f.side, + f.price, + f.quantity, + f.timestamp, + (m.best_bid + m.best_ask) / 2 AS mid_at_fill + FROM fx_trades f + ASOF JOIN market_data m ON (symbol) + WHERE f.timestamp IN '$yesterday' +), +order_summary AS ( + SELECT + order_id, + symbol, + side, + first(mid_at_fill) AS arrival_mid, + sum(price * quantity) / sum(quantity) AS avg_exec_price, + sum(quantity) AS total_qty, + count() AS n_fills, + min(timestamp) AS first_fill_ts, + max(timestamp) AS last_fill_ts + FROM fills_enriched + GROUP BY order_id, symbol, side +) +SELECT + order_id, + symbol, + side, + n_fills, + total_qty, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (avg_exec_price - arrival_mid) + / arrival_mid * 10000 AS total_is_bps +FROM order_summary +ORDER BY total_is_bps DESC; +``` + +## How it works + +### Step 1: Enrich fills with market state + +The `ASOF JOIN` pairs each fill with the most recent order book snapshot to compute the mid-price at execution time. + +### Step 2: Aggregate to order level + +The `order_summary` CTE groups fills by `order_id` and computes: + +- **`arrival_mid`** — `first(mid_at_fill)` gives the mid at the time of the earliest fill, which serves as the arrival price benchmark +- **`avg_exec_price`** — volume-weighted average price across all fills: `sum(price * quantity) / sum(quantity)` +- **`n_fills`** and **`total_qty`** — order size context + +### Step 3: Compute IS + +The final SELECT calculates the shortfall in basis points: + +``` +IS = direction * (avg_exec_price - arrival_mid) / arrival_mid * 10000 +``` + +Where `direction` is +1 for buys, -1 for sells — so positive IS always means you paid more than the arrival benchmark. 
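The sign convention can be sanity-checked with toy numbers (illustrative prices, not demo data):

```python
def is_bps(side, avg_exec_price, arrival_mid):
    """Implementation shortfall in bps; positive always means cost."""
    direction = 1 if side == "buy" else -1
    return direction * (avg_exec_price - arrival_mid) / arrival_mid * 1e4

# Buying 0.02 above the arrival mid and selling 0.02 below it
# both normalize to the same positive 2 bps cost.
buy_is = is_bps("buy", 100.02, 100.00)
sell_is = is_bps("sell", 99.98, 100.00)
```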
+ +Results are ordered worst-first (`DESC`) so the most expensive orders appear at the top. + +## Interpreting results + +- **Near-zero IS**: The order executed close to the arrival price. Good execution for the order size. +- **Positive IS (cost)**: The order executed worse than the arrival mid. For multi-fill orders, later fills may have walked the book or the market moved during execution. +- **Negative IS (savings)**: The order beat the arrival benchmark. Can happen with patient limit orders or favorable market movement during execution. +- **High `n_fills`**: Orders with many partial fills are more likely to show IS due to market movement between fills. Compare IS against `n_fills` and `last_fill_ts - first_fill_ts` to understand whether cost came from market impact or execution duration. + +## Execution drift (delay cost) + +Total IS tells you *how much* an order cost, but not *when* that cost accrued. Execution drift measures how much the mid-price moved against you between the first and last fill — isolating the cost of taking time to complete the order: + +```questdb-sql demo title="Mid-price drift during order execution" +WITH fills_enriched AS ( + SELECT + f.order_id, + f.symbol, + f.side, + f.price, + f.quantity, + f.timestamp, + (m.best_bid + m.best_ask) / 2 AS mid_at_fill + FROM fx_trades f + ASOF JOIN market_data m ON (symbol) + WHERE f.timestamp IN '$yesterday' +), +order_bounds AS ( + SELECT + order_id, + symbol, + side, + first(mid_at_fill) AS arrival_mid, + last(mid_at_fill) AS mid_at_last_fill, + min(timestamp) AS first_fill_ts, + max(timestamp) AS last_fill_ts + FROM fills_enriched + GROUP BY order_id, symbol, side +) +SELECT + order_id, + symbol, + side, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (mid_at_last_fill - arrival_mid) + / arrival_mid * 10000 AS execution_drift_bps, + last_fill_ts - first_fill_ts AS execution_duration +FROM order_bounds +ORDER BY execution_drift_bps DESC; +``` + +`execution_drift_bps` measures how much the mid 
moved against you from first fill to last fill. `execution_duration` shows how long the order took to complete. + +:::note Arrival price vs first fill +In this dataset, the arrival price and first fill are effectively the same moment. In a real trading system, the arrival price would be the mid at decision time (before the order was sent), and **delay cost** would be the drift from decision to first fill. With `fx_trades`, the best available proxy is drift during execution — from first fill to last fill. +::: + +High drift on long-duration orders suggests the market is moving against you while you execute. This can indicate that order sizes are too large for the available liquidity, or that execution is too slow. Compare with total IS — if drift accounts for most of the IS, faster execution would reduce costs. + +## Spread cost per order + +Isolate the spread component of execution cost — the quantity-weighted half-spread paid across all fills in an order: + +```questdb-sql demo title="Spread cost per order" +WITH fills_enriched AS ( + SELECT + f.order_id, + f.symbol, + f.side, + f.price, + f.quantity, + m.best_ask - m.best_bid AS spread_at_fill + FROM fx_trades f + ASOF JOIN market_data m ON (symbol) + WHERE f.timestamp IN '$yesterday' +) +SELECT + order_id, + symbol, + sum(0.5 * spread_at_fill * quantity) + / sum(quantity) AS avg_halfspread, + sum(0.5 * spread_at_fill / price * 10000 * quantity) + / sum(quantity) AS spread_cost_bps, + sum(quantity) AS total_qty +FROM fills_enriched +GROUP BY order_id, symbol +ORDER BY spread_cost_bps DESC; +``` + +Two spread metrics per order: + +- **`avg_halfspread`** — quantity-weighted average half-spread in price terms. This is the baseline cost of crossing the spread, weighted by how much volume went through at each spread level. +- **`spread_cost_bps`** — the same in basis points, normalized by fill price. + +Compare `spread_cost_bps` against total IS to understand how much of the execution cost was simply the spread vs. 
market impact. If spread cost accounts for most of the IS, execution quality is reasonable — you're paying the market price for immediacy. If total IS significantly exceeds spread cost, the excess is market impact or adverse drift. + +## Permanent vs temporary impact per order + +Decompose each order's total IS into permanent impact (information content) and temporary impact (transient dislocation that reverts). This uses `HORIZON JOIN` to capture the mid at fill time and 30 minutes later, then `PIVOT` to reshape into columns: + +```questdb-sql title="Order-level IS decomposition into permanent and temporary impact" +WITH order_markouts AS ( + SELECT + f.order_id, + f.symbol, + f.side, + h.offset, + sum((m.best_bid + m.best_ask) / 2 * f.quantity) + / sum(f.quantity) AS weighted_mid, + sum(f.price * f.quantity) / sum(f.quantity) AS avg_exec_price, + sum(f.quantity) AS total_qty + FROM fx_trades f + HORIZON JOIN market_data m ON (f.symbol = m.symbol) + LIST (0s, 30m) AS h + WHERE f.timestamp IN '$yesterday' +), +pivoted AS ( + SELECT * FROM order_markouts + PIVOT ( + first(weighted_mid) AS mid + FOR offset IN ( + 0 AS at_fill, + 1800000000000 AS at_30m + ) + GROUP BY order_id, symbol, side, avg_exec_price, total_qty + ) +) +SELECT + order_id, + symbol, + side, + total_qty, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (avg_exec_price - at_fill_mid) + / at_fill_mid * 10000 AS total_is_bps, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (at_30m_mid - at_fill_mid) + / at_fill_mid * 10000 AS permanent_bps, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (avg_exec_price - at_30m_mid) + / at_fill_mid * 10000 AS temporary_bps +FROM pivoted +ORDER BY total_is_bps DESC; +``` + +The first CTE does the heavy lifting — it computes the quantity-weighted mid and quantity-weighted average execution price per order *at each horizon offset*, so the aggregation happens before the PIVOT. The PIVOT then simply reshapes the two offsets (0s and 30m) into columns. 
+ +This gives you three metrics per order: + +- **`total_is_bps`** — same as the headline IS above, for reference +- **`permanent_bps`** — how much the mid moved permanently (arrival mid vs mid 30 minutes after execution). High permanent impact suggests your order carried information or was perceived as informed. +- **`temporary_bps`** — how much of the cost reverted (fill price vs post-execution mid). High temporary impact means you moved the market but it bounced back — you paid for liquidity consumption, not information. + +The identity holds: **total IS = permanent + temporary**. An order with mostly permanent impact is genuinely moving the market. An order with mostly temporary impact is just paying for immediacy. + +:::info Related documentation +- [ASOF JOIN](/docs/query/sql/asof-join/) +- [HORIZON JOIN](/docs/query/sql/horizon-join/) +- [PIVOT](/docs/query/sql/pivot/) +- [GROUP BY](/docs/query/sql/group-by/) +- [Implementation shortfall decomposition recipe](implementation-shortfall.md) +- [Slippage per fill recipe](slippage.md) +::: diff --git a/documentation/cookbook/sql/finance/implementation-shortfall.md b/documentation/cookbook/sql/finance/implementation-shortfall.md new file mode 100644 index 000000000..81c3ca000 --- /dev/null +++ b/documentation/cookbook/sql/finance/implementation-shortfall.md @@ -0,0 +1,119 @@ +--- +title: Implementation shortfall decomposition +sidebar_label: Implementation shortfall +description: Decompose total execution cost into effective spread, permanent impact, and temporary impact using HORIZON JOIN and PIVOT +--- + +Implementation Shortfall (IS) is a standard Transaction Cost Analysis framework originally developed for equities (the Perold framework), where it is widely used to evaluate broker and algo execution quality. The same decomposition applies to FX and other asset classes — the underlying idea of separating spread cost from market impact is universal. 
The example below uses FX trade data, but the approach works for any instrument with order book snapshots. + +IS decomposes total execution cost into three components: + +- **Effective spread** — the immediate cost of crossing the spread. Measures how far the fill price deviated from the mid at the time of execution. +- **Permanent impact** — the portion of price movement that persists after the trade. This reflects the information content of the trade — if the market permanently moves against you, your trade may have been informed (or was perceived as such). +- **Temporary impact** — the portion that reverts. This is the transient market impact caused by your order consuming liquidity, which fades as the book replenishes. + +The relationship is: **effective spread = permanent impact + temporary impact**. + +## Problem + +You want to break down trading costs beyond simple slippage. For each symbol and side, you need to know how much of the execution cost was due to the spread, how much was genuine market impact, and how much was temporary dislocation that reverted. 
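Before the SQL, the decomposition identity stated above can be verified with toy numbers. All three components share the arrival mid as denominator, as the recipe's query does:

```python
# Toy buy fill (illustrative): filled above the arrival mid, partial reversion.
fill_px, mid_at_fill, mid_at_30m = 100.03, 100.00, 100.01

def bps(delta, ref):
    return delta / ref * 1e4

effective_spread = bps(fill_px - mid_at_fill, mid_at_fill)  # immediate cost
permanent = bps(mid_at_30m - mid_at_fill, mid_at_fill)      # move that stuck
temporary = bps(fill_px - mid_at_30m, mid_at_fill)          # cost that reverted

# The mid_at_30m terms cancel, so the decomposition is exact.
```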
+ +## Solution + +Use `HORIZON JOIN` to capture the mid-price at execution time and 30 minutes later, then `PIVOT` to reshape the offsets into columns for the decomposition: + +```questdb-sql title="Implementation shortfall decomposition by symbol" +WITH markouts AS ( + SELECT + f.symbol, + f.price, + f.quantity, + f.side, + h.offset, + (m.best_bid + m.best_ask) / 2 AS mid + FROM fx_trades f + HORIZON JOIN market_data m ON (f.symbol = m.symbol) + LIST (0, 1800s) AS h + WHERE f.timestamp IN '$yesterday' +), +pivoted AS ( + SELECT * FROM markouts + PIVOT ( + avg(mid) AS mid, + avg(price) AS px, + sum(quantity) AS vol + FOR offset IN ( + 0 AS at_fill, + 1800000000000 AS at_30m + ) + GROUP BY symbol, side + ) +) +SELECT + symbol, + side, + at_fill_vol AS total_volume, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (at_fill_px - at_fill_mid) / at_fill_mid * 10000 AS effective_spread_bps, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (at_30m_mid - at_fill_mid) / at_fill_mid * 10000 AS permanent_bps, + CASE WHEN side = 'buy' THEN 1 ELSE -1 END + * (at_fill_px - at_30m_mid) / at_fill_mid * 10000 AS temporary_bps +FROM pivoted +ORDER BY symbol, side; +``` + +## How it works + +The query has three stages: + +### 1. HORIZON JOIN — capture mid at two points in time + +```sql +HORIZON JOIN market_data m ON (f.symbol = m.symbol) + LIST (0, 1800s) AS h +``` + +For each trade, this produces two rows: +- **Offset 0** — the mid-price at the moment of execution (arrival price) +- **Offset 1800s** — the mid-price 30 minutes later (the "settled" price) + +### 2. PIVOT — reshape offsets into columns + +```sql +PIVOT ( + avg(mid) AS mid, avg(price) AS px, sum(quantity) AS vol + FOR offset IN (0 AS at_fill, 1800000000000 AS at_30m) + GROUP BY symbol, side +) +``` + +This turns the two offset rows into columns: `at_fill_mid`, `at_fill_px`, `at_fill_vol`, `at_30m_mid`, `at_30m_px`, `at_30m_vol`. The offset values in `FOR ... 
IN` are in nanoseconds (since `fx_trades` uses `TIMESTAMP_NS`), so 30 minutes = 1,800,000,000,000 ns. + +### 3. Decomposition — compute the three components + +The sign convention uses `CASE WHEN side = 'buy' THEN 1 ELSE -1 END` to normalize both sides so that positive values always mean cost (worse execution): + +| Component | Formula | Meaning | +|-----------|---------|---------| +| **Effective spread** | `fill_price - fill_mid` | Immediate cost of crossing the spread | +| **Permanent impact** | `30m_mid - fill_mid` | How much the market permanently moved against you | +| **Temporary impact** | `fill_price - 30m_mid` | How much of the initial cost reverted | + +## Interpreting results + +- **High effective spread, low permanent**: You're paying to cross the spread but the market isn't moving against you. This is the normal cost of aggressive execution. +- **High permanent impact**: Your trades carry information (or the market perceives them as informed). Consider reducing order size or using more passive execution. +- **High temporary impact**: You're moving the market temporarily but it reverts. This suggests your orders are large relative to available liquidity but not information-driven. +- **Negative temporary impact**: The market moved further against you after the fill. This is worse than expected — your initial impact understated the true cost. + +:::tip Choosing the horizon +The 30-minute horizon (`1800s`) is a common choice for FX, but the right value depends on your market and trading style. For highly liquid pairs, 5–10 minutes may be sufficient for the price to settle. For less liquid instruments, you may need 1 hour or more. Adjust the `LIST` offset to match your market's typical recovery time. 
+::: + +:::info Related documentation +- [HORIZON JOIN](/docs/query/sql/horizon-join/) +- [PIVOT](/docs/query/sql/pivot/) +- [Slippage per fill recipe](slippage.md) +- [Markout analysis recipe](markout.md) +::: diff --git a/documentation/cookbook/sql/finance/index.md b/documentation/cookbook/sql/finance/index.md index 2ed82e263..c743b6d5b 100644 --- a/documentation/cookbook/sql/finance/index.md +++ b/documentation/cookbook/sql/finance/index.md @@ -6,7 +6,7 @@ description: SQL recipes for financial analysis including technical indicators, # Capital Markets Recipes -This section contains SQL recipes for financial market analysis. Each recipe uses the +This section contains SQL recipes for financial market analysis. All recipes use the [demo dataset](/docs/cookbook/demo-data-schema/) available in the QuestDB web console. ## Price-Based Indicators @@ -53,6 +53,7 @@ Analyze trading activity and order flow dynamics. | [Volume Profile](volume-profile.md) | Volume distribution by price level | | [Volume Spike](volume-spike.md) | Detect abnormal volume | | [Aggressor Imbalance](aggressor-volume-imbalance.md) | Buy vs sell pressure | +| [VPIN](vpin.md) | Volume-synchronized informed trading probability | ## Risk Metrics @@ -71,6 +72,20 @@ Analyze market quality and trading costs. | [Bid-Ask Spread](bid-ask-spread.md) | Spread metrics and analysis | | [Liquidity Comparison](liquidity-comparison.md) | Compare liquidity across instruments | +## Post-Trade Analysis + +Measure execution quality, fill performance, and trading costs. Also available as a [top-level sidebar section](/docs/cookbook/sql/finance/) for quick access. 
+ +| Recipe | Description | +|--------|-------------| +| [Slippage](slippage.md) | Measure execution slippage per fill | +| [Slippage (aggregated)](slippage-aggregated.md) | Compare slippage across venues and counterparties | +| [Markout analysis](markout.md) | Post-trade price reversion and adverse selection | +| [Last look detection](last-look.md) | Millisecond-granularity markout for last-look analysis | +| [Implementation shortfall](implementation-shortfall.md) | Cost decomposition into spread, permanent, and temporary impact | +| [Implementation shortfall (order)](implementation-shortfall-order.md) | Total IS per order vs arrival mid | +| [ECN scorecard](ecn-scorecard.md) | Dashboard-style venue comparison combining spread, slippage, and fill metrics | + ## Market Breadth Measure overall market participation and sentiment. diff --git a/documentation/cookbook/sql/finance/last-look.md b/documentation/cookbook/sql/finance/last-look.md new file mode 100644 index 000000000..215f55c5c --- /dev/null +++ b/documentation/cookbook/sql/finance/last-look.md @@ -0,0 +1,77 @@ +--- +title: Last look detection +sidebar_label: Last look detection +description: Detect last-look behavior using millisecond-granularity markout analysis with HORIZON JOIN +--- + +In FX markets, some liquidity providers operate under a **last look** window — a brief period (typically 1–100ms) after receiving an order during which they can reject or re-price the trade. While last look is a legitimate risk management practice (allowing LPs to verify that prices haven't moved during order transit), it can be exploited through asymmetric rejection — accepting trades only when the price has moved in the LP's favor during the hold window, and rejecting when it hasn't. + +This recipe uses millisecond-granularity [markout analysis](markout.md) to detect whether specific counterparties are exploiting last look. 
The signature is a sharp price movement against you in the first few milliseconds after a fill — if the mid-price consistently moves in the counterparty's favor within their last-look window, they may be selectively accepting only trades that benefit them. + +## Problem + +You want to detect whether specific counterparties show signs of last-look adverse selection. You need markout measurements at millisecond resolution — much finer than the second-level analysis in the [general markout recipe](markout.md) — to catch behavior that happens within typical last-look windows (1–100ms). + +## Solution + +Use `HORIZON JOIN` with a `LIST` of millisecond-spaced offsets to build a high-resolution markout curve for the first few seconds after each fill: + +```questdb-sql title="Millisecond-granularity markout by counterparty" +SELECT + t.symbol, + t.counterparty, + t.passive, + h.offset / 1000000 AS horizon_ms, + count() AS n, + avg( + CASE t.side + WHEN 'buy' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000 + WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / t.price * 10000 + END + ) AS avg_markout_bps +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + LIST (0, 1T, 5T, 10T, 50T, 100T, + 500T, 1000T, 5000T) AS h +WHERE t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.counterparty, t.passive, h.offset +ORDER BY t.symbol, t.counterparty, h.offset; +``` + +The `LIST` offsets are: 0ms, 1ms, 5ms, 10ms, 50ms, 100ms, 500ms, 1s, and 5s — concentrated in the sub-100ms range where last-look behavior is visible. + +:::note h.offset resolution +Since `fx_trades` uses nanosecond timestamps (`TIMESTAMP_NS`), `h.offset` is in nanoseconds. Dividing by 1,000,000 converts to milliseconds for readability. +::: + +## How it works + +The key difference from the [general markout recipe](markout.md) is the time scale. 
Instead of uniform 1-second steps over minutes, this uses non-uniform `LIST` offsets clustered in the millisecond range where last-look decisions happen. + +The `LIST` syntax is ideal here because the offsets are non-uniform — dense at the start (1ms, 5ms, 10ms) where you need precision, and sparse further out (1s, 5s) for context. + +## Interpreting results + +Compare the markout curve across counterparties at the same symbol: + +- **Neutral counterparty**: Markout near zero at 0ms, with gradual random drift. No systematic pattern. +- **Last-look adverse selection**: Sharp negative markout in the 1–100ms range that stabilizes or worsens. The counterparty is filling you only when the market is about to move against you. +- **Last-look with reversion**: Negative markout spike at 5–50ms that then reverts toward zero by 1–5s. This suggests the counterparty rejects trades when the price would move in your favor, but the moves are temporary. +- **Passive vs aggressive**: Last-look behavior primarily affects aggressive orders (taker flow). Passive fills from the same counterparty may show a different pattern. + +### What to look for + +A counterparty is likely using last look adversely if: + +1. **Markout drops sharply in 1–50ms** — faster than you can react +2. **The drop is counterparty-specific** — other counterparties at the same symbol don't show it +3. **The pattern is persistent** — it appears consistently across days, not just in isolated events +4. 
**Passive fills are unaffected** — the behavior targets your aggressive flow specifically + +:::info Related documentation +- [HORIZON JOIN](/docs/query/sql/horizon-join/) +- [Markout analysis recipe](markout.md) +- [Slippage per fill recipe](slippage.md) +::: diff --git a/documentation/cookbook/sql/finance/markout.md b/documentation/cookbook/sql/finance/markout.md new file mode 100644 index 000000000..ec29b7b25 --- /dev/null +++ b/documentation/cookbook/sql/finance/markout.md @@ -0,0 +1,255 @@ +--- +title: Post-trade markout analysis +sidebar_label: Markout analysis +description: Measure post-trade price reversion using HORIZON JOIN to evaluate execution quality and detect adverse selection +--- + +Markout analysis measures how the market mid-price moves **after** a trade executes. It is the natural complement to [slippage](slippage.md): + +- **Slippage** tells you how much you paid at the moment of execution. +- **Markout** tells you what happened next — did the market move in your favor (reversion) or against you (adverse selection)? + +A positive markout means the trade was profitable in hindsight: for buys, the mid-price rose; for sells, it fell. A negative markout means the market moved against you, which may indicate you were trading against informed flow. + +By computing markouts at multiple time horizons (e.g., every second for 5 minutes), you build a **markout curve** — the standard tool for evaluating execution quality over time. + +## Problem + +You want to evaluate whether your fills are subject to adverse selection. For each trade, you need to know how the mid-price evolved over the seconds and minutes following execution, broken down by venue, counterparty, and passive/aggressive. 
+ +## Solution + +Use `HORIZON JOIN` to compute the mid-price at multiple time offsets after each trade, then aggregate into a markout curve: + +```questdb-sql title="Post-trade markout curve by venue and counterparty" +SELECT + t.symbol, + t.ecn, + t.counterparty, + t.passive, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + avg( + CASE t.side + WHEN 'buy' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000 + WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / t.price * 10000 + END + ) AS avg_markout_bps, + sum( + CASE t.side + WHEN 'buy' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + * t.quantity + WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2) + * t.quantity + END + ) AS total_pnl +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + RANGE FROM 0s TO 5m STEP 1s AS h +WHERE t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn, t.counterparty, t.passive, h.offset +ORDER BY t.symbol, t.ecn, t.counterparty, t.passive, h.offset; +``` + +## How it works + +[`HORIZON JOIN`](/docs/query/sql/horizon-join/) is the key construct. For each trade and each time offset in the range, it performs an ASOF match against `market_data` at `trade_timestamp + offset`. The `RANGE FROM 0s TO 5m STEP 1s` generates 301 offsets (0s, 1s, 2s, ... 300s), giving you a markout reading every second for 5 minutes after each trade. + +The two metrics: + +- **`avg_markout_bps`** — average price movement in basis points, normalized by fill price. Positive means the market moved in your favor. At offset 0, this is simply the negative of slippage-vs-mid. +- **`total_pnl`** — actual P&L in currency terms (price difference × quantity). This captures the dollar impact, not just the rate — 0.1 bps on $100M of volume is very different from 0.1 bps on $1M. 
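As a sanity check on the formulas, here is a minimal Python sketch of the same per-fill arithmetic the query performs (illustrative only — the prices and quantity below are made-up numbers, not demo data):

```python
def markout_bps(side, fill_price, best_bid, best_ask):
    """Markout in bps: positive when the mid moved in the trade's favor."""
    mid = (best_bid + best_ask) / 2
    if side == "buy":
        return (mid - fill_price) / fill_price * 10000
    return (fill_price - mid) / fill_price * 10000


def markout_pnl(side, fill_price, best_bid, best_ask, quantity):
    """Currency P&L: signed mid-vs-fill difference times quantity."""
    mid = (best_bid + best_ask) / 2
    diff = mid - fill_price if side == "buy" else fill_price - mid
    return diff * quantity


# Buy at 1.1050; some time later the book is 1.1052 / 1.1054 (mid 1.1053).
# The mid rose after the buy, so the markout is positive (about 2.7 bps).
print(round(markout_bps("buy", 1.1050, 1.1052, 1.1054), 2))
print(markout_pnl("buy", 1.1050, 1.1052, 1.1054, 1_000_000))
```

The same fill evaluated at each `h.offset` against the book prevailing at that offset traces out one point of the markout curve.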
+ +The markout formula flips the sign convention compared to slippage: + +- **For buys**: positive if mid rose after the fill (profit) +- **For sells**: positive if mid fell after the fill (profit) + +As the offset increases, you see how the market evolved after each trade. + +## Variations + +### Markout at specific horizons + +Use `LIST` instead of `RANGE` for non-uniform time points — useful when you care about specific benchmarks (e.g., 1s, 5s, 30s, 1m, 5m): + +```questdb-sql title="Markout at key horizons" +SELECT + t.ecn, + t.passive, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + round(avg( + CASE t.side + WHEN 'buy' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000 + WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / t.price * 10000 + END + ), 3) AS avg_markout_bps +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + LIST (0, 1s, 5s, 30s, 1m, 5m) AS h +WHERE t.timestamp IN '$yesterday' +GROUP BY t.ecn, t.passive, h.offset +ORDER BY t.ecn, t.passive, h.offset; +``` + +### Pre- and post-trade analysis + +Use negative offsets to detect information leakage — whether the market was already moving before your trade: + +```questdb-sql title="Price movement around trade events" +SELECT + h.offset / 1000000000 AS horizon_sec, + count() AS n, + round(avg( + CASE t.side + WHEN 'buy' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000 + WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / t.price * 10000 + END + ), 3) AS avg_markout_bps +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + RANGE FROM -30s TO 30s STEP 1s AS h +WHERE t.timestamp IN '$yesterday' +GROUP BY h.offset +ORDER BY h.offset; +``` + +If the markout is already trending before offset 0, it suggests the market was moving before your order — a sign of information leakage or that you are reacting to stale signals. + +### Markout by side + +Add `t.side` to the grouping to detect asymmetry between buy and sell execution. 
A counterparty might look fine on average but show adverse selection on one side only: + +```questdb-sql title="Markout curve by side" +SELECT + t.ecn, + t.side, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + round(avg( + CASE t.side + WHEN 'buy' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000 + WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / t.price * 10000 + END + ), 3) AS avg_markout_bps +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + LIST (0, 1s, 5s, 30s, 1m, 5m) AS h +WHERE t.timestamp IN '$yesterday' +GROUP BY t.ecn, t.side, h.offset +ORDER BY t.ecn, t.side, h.offset; +``` + +If buy markouts diverge significantly from sell markouts at the same venue, it may indicate directional information leakage or asymmetric adverse selection. + +### Single-side markout + +When analyzing one side at a time, you can drop the `CASE` entirely for a simpler formula: + +```questdb-sql title="Buy-side markout — positive means price moved up after you bought" +SELECT + t.symbol, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + avg(((m.best_bid + m.best_ask) / 2 - t.price) / t.price * 10000) AS avg_markout_bps, + sum(((m.best_bid + m.best_ask) / 2 - t.price) * t.quantity) AS total_pnl +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + RANGE FROM 0s TO 10m STEP 5s AS h +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, h.offset +ORDER BY t.symbol, h.offset; +``` + +```questdb-sql title="Sell-side markout — positive means price moved down after you sold" +SELECT + t.symbol, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + avg((t.price - (m.best_bid + m.best_ask) / 2) / t.price * 10000) AS avg_markout_bps, + sum((t.price - (m.best_bid + m.best_ask) / 2) * t.quantity) AS total_pnl +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + RANGE FROM 0s TO 10m STEP 5s AS h +WHERE t.side = 'sell' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, h.offset +ORDER BY t.symbol, 
h.offset; +``` + +This approach is useful when you want to run separate analyses per side, or when feeding results into dashboards that track buy and sell P&L independently. + +### Counterparty toxicity + +Group by counterparty to identify which LPs are sending you toxic flow — trades that consistently move against you shortly after execution: + +```questdb-sql title="Counterparty toxicity markout (buy side)" +SELECT + t.symbol, + t.counterparty, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + avg(((m.best_bid + m.best_ask) / 2 - t.price) / t.price * 10000) AS avg_markout_bps, + sum(t.quantity) AS total_volume +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + LIST (0, 1s, 5s, 10s, + 30s, 1m, 5m) AS h +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.counterparty, h.offset +ORDER BY t.symbol, t.counterparty, h.offset; +``` + +A counterparty whose markout is persistently negative across horizons is likely trading on information you don't have. Compare `total_volume` alongside markout — a small counterparty with terrible markout may not matter, but a large one warrants flow management. + +### Passive vs aggressive with spread context + +Compare markout between passive (limit) and aggressive (market) orders, with the half-spread as a baseline. 
Aggressive fills should cost roughly half the spread; if the markout is worse than that, execution quality needs attention: + +```questdb-sql title="Passive vs aggressive markout with half-spread baseline (buy side)" +SELECT + t.symbol, + t.ecn, + t.passive, + h.offset / 1000000000 AS horizon_sec, + count() AS n, + avg(((m.best_bid + m.best_ask) / 2 - t.price) + / t.price * 10000) AS avg_markout_bps, + avg((m.best_ask - m.best_bid) + / ((m.best_bid + m.best_ask) / 2) * 10000) / 2 AS avg_half_spread_bps +FROM fx_trades t +HORIZON JOIN market_data m ON (symbol) + RANGE FROM 0s TO 5m STEP 1s AS h +WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn, t.passive, h.offset +ORDER BY t.symbol, t.ecn, t.passive, h.offset; +``` + +At offset 0, aggressive fills typically show `avg_markout_bps` close to negative `avg_half_spread_bps` (you crossed the spread). If markout recovers toward zero over subsequent offsets, execution is healthy — you paid the spread but the market didn't move further against you. If markout stays flat or worsens, it signals adverse selection beyond the spread cost. + +## Interpreting the markout curve + +- **Flat near zero**: No significant post-trade price impact. Fills are neutral. +- **Rising markout (positive trend)**: Price reverts in your favor after the fill. This is the ideal scenario — it suggests you are capturing spread or providing liquidity at good levels. +- **Falling markout (negative trend)**: Adverse selection — the market moves against you after the fill. This may indicate you are being picked off by informed counterparties or reacting too slowly. +- **Passive vs aggressive**: Passive fills typically show better markouts because they provide liquidity. Aggressive fills often show initial negative markout equal to the spread cost, which may or may not revert. 
+- **Counterparty differences**: Persistent negative markout against specific counterparties is a strong signal of adverse selection and may warrant flow management. + +:::info Related documentation +- [HORIZON JOIN](/docs/query/sql/horizon-join/) +- [ASOF JOIN](/docs/query/sql/asof-join/) +- [Slippage per fill recipe](slippage.md) +- [Slippage (aggregated) recipe](slippage-aggregated.md) +::: diff --git a/documentation/cookbook/sql/finance/post-trade-overview.md b/documentation/cookbook/sql/finance/post-trade-overview.md new file mode 100644 index 000000000..fea2d37a0 --- /dev/null +++ b/documentation/cookbook/sql/finance/post-trade-overview.md @@ -0,0 +1,127 @@ +--- +title: Post-trade analysis overview +sidebar_label: Overview +description: + Post-trade and transaction cost analysis (TCA) recipes for FX and equities + in QuestDB — slippage, markout curves, implementation shortfall, venue + scoring, and VPIN using ASOF JOIN, HORIZON JOIN, PIVOT, and window functions. +--- + +Post-trade analysis — also called transaction cost analysis (TCA) — measures +execution quality after the fact. Market makers use it to detect adverse +selection on their resting orders. Buy-side desks use it to evaluate broker and +venue performance. Compliance teams use it to demonstrate best execution. + +QuestDB is well suited to this workload because TCA is fundamentally a +time-series join problem: pair each trade with the state of the order book at +the time of execution, then again at various points in the future. The key SQL +features used across these recipes are: + +- [**ASOF JOIN**](/docs/query/sql/asof-join/) — match each trade to the most + recent order book snapshot. The foundation of all slippage calculations. +- [**HORIZON JOIN**](/docs/query/sql/horizon-join/) — match each trade to the + order book at multiple time offsets in a single pass. Powers markout curves, + implementation shortfall decomposition, and multi-horizon venue scoring. 
+- [**PIVOT**](/docs/query/sql/pivot/) — reshape horizon offsets from rows into + columns for dashboard-style wide tables. +- [**Window functions**](/docs/query/functions/window-functions/overview/) — + cumulative sums and rolling averages for volume-bucketed metrics like VPIN. + +## Key concepts + +Before diving into the recipes, here are the core TCA metrics in the order +you'll encounter them: + +- **Slippage** — the difference between your execution price and a reference + price (mid or top-of-book) at the time of the fill. The simplest measure of + execution cost. Positive means you paid more than the reference. +- **Markout** — how the market moves *after* your fill. The complement of + slippage: slippage tells you what you paid, markout tells you what happened + next. Negative markout means the market moved against you (adverse selection). +- **Implementation shortfall** — total cost decomposed into *why* you paid it: + spread cost (the bid-ask spread you crossed), permanent impact (the market + moved because of your order), and temporary impact (cost that reverted). +- **Adverse selection** — when counterparties or venues systematically trade + against you just before the market moves in their favor. The central problem + TCA tries to detect and quantify. +- **VPIN** — Volume-synchronized Probability of Informed Trading. A + volume-bucketed measure of order flow imbalance that detects informed trading + activity without relying on post-trade price movement. + +## Recipes + +The recipes build on each other. Slippage answers "how much did I pay?", markout +answers "what happened after?", implementation shortfall answers "why did I +pay?", venue scoring answers "where should I trade?", and VPIN answers "who is +informed?" + +### 1. Slippage — how much did I pay? + +Compare each fill to the prevailing order book at the time of execution. 
+ +- [**Slippage per fill**](slippage.md) — cost vs mid and top-of-book for + individual trades +- [**Aggregated slippage**](slippage-aggregated.md) — roll up by ECN, + counterparty, size bucket, hour of day, or daily P&L + +### 2. Markout — what happened after? + +Track post-fill price movement at multiple time horizons. + +- [**Markout analysis**](markout.md) — markout curves by side, counterparty, + and passive vs aggressive +- [**Last look detection**](last-look.md) — millisecond-granularity markout to + identify asymmetric rejection patterns in FX + +### 3. Implementation shortfall — why did I pay? + +Decompose total cost into spread, permanent impact, and temporary impact. + +- [**IS decomposition by symbol**](implementation-shortfall.md) — Perold + framework separating effective spread, permanent, and temporary components +- [**Order-level IS**](implementation-shortfall-order.md) — per-order cost + including execution drift, spread cost, and impact breakdown + +### 4. Venue scoring — where should I trade? + +Compare execution quality across venues and counterparties to inform routing. + +- [**ECN scorecard**](ecn-scorecard.md) — fill quality, toxicity by hour, + passive vs aggressive breakdown, composite toxicity score, and pivoted + multi-horizon view + +### 5. Flow toxicity — who is informed? + +Detect informed trading using volume-synchronized metrics instead of +price-based markout. + +- [**VPIN**](vpin.md) — Volume-synchronized Probability of Informed Trading, + including per-ECN variant + +## Data schema + +All recipes use the [demo dataset](/docs/cookbook/demo-data-schema/). The two +tables are joined by `symbol` and aligned by timestamp: + +- **`fx_trades`** — trade executions with `symbol`, `ecn`, `side`, `passive`, + `price`, `quantity`, `counterparty`, `order_id` (nanosecond timestamps) +- **`market_data`** — order book snapshots with `symbol`, `bids[][]`, + `asks[][]`, `best_bid`, and `best_ask` (microsecond timestamps). 
+  The `bids` and `asks` arrays hold price and size at each level of the
+  book: `[1][1]` is the best price, `[1][-1]` is the price at the deepest
+  level. The `best_bid` and `best_ask` columns provide the top-of-book
+  prices directly for convenience and efficiency, since most post-trade
+  analytics queries need only the best price.
+
+The tables use different timestamp resolutions. QuestDB's time-series joins
+handle
+[mixed-precision timestamps](/docs/query/sql/asof-join/#mixed-precision-timestamps)
+automatically — no explicit casting is needed.
+
+:::info Related documentation
+- [ASOF JOIN](/docs/query/sql/asof-join/)
+- [HORIZON JOIN](/docs/query/sql/horizon-join/)
+- [PIVOT](/docs/query/sql/pivot/)
+- [Window functions](/docs/query/functions/window-functions/overview/)
+- [Demo data schema](/docs/cookbook/demo-data-schema/)
+:::
diff --git a/documentation/cookbook/sql/finance/slippage-aggregated.md b/documentation/cookbook/sql/finance/slippage-aggregated.md
new file mode 100644
index 000000000..9ce43b900
--- /dev/null
+++ b/documentation/cookbook/sql/finance/slippage-aggregated.md
@@ -0,0 +1,286 @@
+---
+title: Aggregated slippage by venue and counterparty
+sidebar_label: Slippage (aggregated)
+description: Aggregate execution slippage by ECN, counterparty, and passive/aggressive to compare venue and counterparty quality
+---
+
+The [per-fill slippage recipe](slippage.md) measures slippage on individual trades. This recipe aggregates those measurements to answer higher-level questions: which ECN gives you the best execution? Which counterparties are cheapest to trade against? Do passive fills outperform aggressive ones?
+
+## Problem
+
+You want to compare average execution quality across different dimensions — venue (ECN), counterparty, and order type (passive vs aggressive) — to identify where you get the best and worst fills.
+ +## Solution + +Group slippage calculations by the dimensions of interest and compute averages: + +```questdb-sql demo title="Aggregate slippage by ECN, counterparty, and passive/aggressive" +SELECT + t.symbol, + t.ecn, + t.counterparty, + t.passive, + count() AS trade_count, + sum(t.quantity) AS total_qty, + avg( + CASE t.side + WHEN 'buy' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / ((m.best_bid + m.best_ask) / 2) * 10000 + WHEN 'sell' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / ((m.best_bid + m.best_ask) / 2) * 10000 + END + ) AS avg_slippage_vs_mid_bps, + avg( + CASE t.side + WHEN 'buy' THEN (t.price - m.best_ask) / m.best_ask * 10000 + WHEN 'sell' THEN (m.best_bid - t.price) / m.best_bid * 10000 + END + ) AS avg_slippage_vs_tob_bps, + avg( + (m.best_ask - m.best_bid) + / ((m.best_bid + m.best_ask) / 2) * 10000 + ) AS avg_spread_bps +FROM fx_trades t +ASOF JOIN market_data m ON (symbol) +WHERE t.timestamp IN '$yesterday' +GROUP BY t.symbol, t.ecn, t.counterparty, t.passive +ORDER BY avg_slippage_vs_mid_bps DESC; +``` + +## How it works + +This builds on the same `ASOF JOIN` approach from the [per-fill slippage recipe](slippage.md), but wraps the slippage calculations in `avg()` and groups by the dimensions you want to compare. + +The three metrics per group: + +- **`avg_slippage_vs_mid_bps`** — average cost relative to mid price. Includes half the spread as baseline. +- **`avg_slippage_vs_tob_bps`** — average cost beyond the top of book. Isolates execution quality from spread cost. +- **`avg_spread_bps`** — average spread at the time of each trade. Helps contextualize slippage: high slippage in a wide-spread environment is different from high slippage in a tight market. + +Results are ordered worst-first (`DESC`) so the most expensive groups appear at the top. 
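Before averaging, each fill contributes one value per metric. This small Python sketch (illustrative, with made-up book prices) shows how the three per-fill terms relate — for an aggressive fill that crosses the spread exactly, slippage vs mid equals half the spread while slippage vs top-of-book is zero:

```python
def slippage_vs_mid_bps(side, price, best_bid, best_ask):
    """Cost vs mid in bps; includes roughly half the spread as a baseline."""
    mid = (best_bid + best_ask) / 2
    signed = price - mid if side == "buy" else mid - price
    return signed / mid * 10000


def slippage_vs_tob_bps(side, price, best_bid, best_ask):
    """Cost beyond the touch: best ask for buys, best bid for sells."""
    if side == "buy":
        return (price - best_ask) / best_ask * 10000
    return (best_bid - price) / best_bid * 10000


def spread_bps(best_bid, best_ask):
    mid = (best_bid + best_ask) / 2
    return (best_ask - best_bid) / mid * 10000


# An aggressive buy that lifts the best ask exactly: zero slippage vs the
# top of book, while slippage vs mid equals half the spread.
best_bid, best_ask = 1.1048, 1.1052
print(slippage_vs_tob_bps("buy", 1.1052, best_bid, best_ask))
print(round(slippage_vs_mid_bps("buy", 1.1052, best_bid, best_ask), 3))
print(round(spread_bps(best_bid, best_ask) / 2, 3))
```

This is why `avg_spread_bps` belongs alongside the two slippage averages: it separates the unavoidable spread component from genuine execution underperformance.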
+ +## Variations + +### By ECN only + +Drop `counterparty` to get a cleaner venue-level comparison: + +```questdb-sql demo title="Slippage by ECN and passive/aggressive" +SELECT + t.ecn, + t.passive, + count() AS trade_count, + round(avg( + CASE t.side + WHEN 'buy' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / ((m.best_bid + m.best_ask) / 2) * 10000 + WHEN 'sell' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / ((m.best_bid + m.best_ask) / 2) * 10000 + END + ), 3) AS avg_slippage_bps +FROM fx_trades t +ASOF JOIN market_data m ON (symbol) +WHERE t.timestamp IN '$yesterday' +GROUP BY t.ecn, t.passive +ORDER BY t.ecn, t.passive; +``` + +### Time-bucketed analysis + +Add `SAMPLE BY` to see how execution quality changes throughout the day: + +```questdb-sql demo title="Hourly slippage by ECN" +SELECT + t.timestamp, + t.ecn, + count() AS trade_count, + round(avg( + CASE t.side + WHEN 'buy' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / ((m.best_bid + m.best_ask) / 2) * 10000 + WHEN 'sell' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / ((m.best_bid + m.best_ask) / 2) * 10000 + END + ), 3) AS avg_slippage_bps +FROM fx_trades t +ASOF JOIN market_data m ON (symbol) +WHERE t.timestamp IN '$yesterday' +SAMPLE BY 1h; +``` + +### Cost by size bucket + +How does execution cost scale with order size? 
Bucket fills by quantity, then use [HORIZON JOIN](/docs/query/sql/horizon-join/) with `PIVOT` to see markout and spread at multiple horizons in a single wide row per symbol and bucket: + +```questdb-sql title="Cost by size bucket — pivoted (buy side)" +WITH fills AS ( + SELECT + t.symbol, + t.price, + t.quantity, + h.offset, + (m.best_bid + m.best_ask) / 2 AS mid, + m.best_ask - m.best_bid AS spread, + CASE + WHEN t.quantity < 100000 THEN 'S' + WHEN t.quantity < 1000000 THEN 'M' + WHEN t.quantity < 10000000 THEN 'L' + ELSE 'XL' + END AS size_bucket + FROM fx_trades t + HORIZON JOIN market_data m ON (symbol) + LIST (0, 5s, 1m) AS h + WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +) +SELECT * FROM fills +PIVOT ( + count() AS n, + avg((mid - price) / price * 10000) AS markout_bps, + avg(spread / mid * 10000) AS spread_bps + FOR offset IN (0 AS at_fill, 5000000000 AS t_5s, 60000000000 AS t_1m) + GROUP BY symbol, size_bucket +) +ORDER BY symbol, size_bucket; +``` + +The result has columns like `at_fill_n`, `at_fill_markout_bps`, `t_5s_markout_bps`, `t_1m_spread_bps`, etc. Compare across size buckets: + +- **Markout degradation with size**: If `t_5s_markout_bps` becomes more negative as bucket size increases, larger fills are systematically more toxic — the market moves against you more after big trades. +- **Spread widening with size**: If `at_fill_spread_bps` increases for larger buckets, you're trading in wider markets when you trade big — possibly because you only get filled on large clips when spreads are wide. +- **Sample size caveat**: XL buckets may have very few fills. Check `at_fill_n` before drawing conclusions. + +Adjust the bucket thresholds to match your typical trade sizes. The boundaries above (100K / 1M / 10M) are reasonable for major FX pairs. + +### Counterparty cost attribution + +Which counterparties are the most expensive to trade with, all-in? 
Group by counterparty, ECN, and passive/aggressive, then pivot across horizons to see whether the cost is immediate (spread) or delayed (adverse selection): + +```questdb-sql title="Counterparty cost attribution — pivoted (buy side)" +WITH cp_costs AS ( + SELECT + t.symbol, + t.counterparty, + t.ecn, + t.passive, + t.price, + t.quantity, + h.offset, + m.best_bid, + m.best_ask, + (m.best_bid + m.best_ask) / 2 AS mid + FROM fx_trades t + HORIZON JOIN market_data m ON (symbol) + LIST (0, 5s, 1m) AS h + WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +) +SELECT * FROM cp_costs +PIVOT ( + count() AS fills, + sum(quantity) AS volume, + avg((mid - price) / price * 10000) AS markout_bps + FOR offset IN (0 AS at_fill, 5000000000 AS t_5s, 60000000000 AS t_1m) + GROUP BY symbol, counterparty, ecn, passive +) +ORDER BY t_1m_markout_bps; +``` + +Ordered by `t_1m_markout_bps` ascending, the most toxic counterparties appear first. Read the results across horizons: + +- **Large negative `at_fill_markout_bps` that stays flat**: You paid a wide spread upfront but the market didn't move further. The cost is the spread, not adverse selection — this counterparty is expensive but not toxic. +- **Small negative `at_fill_markout_bps` that deepens at `t_5s` and `t_1m`**: The initial fill looked reasonable, but the market moved against you afterwards. This counterparty is delivering informed or toxic flow. +- **Passive rows with deepening negative markout**: The counterparty is systematically picking off your resting orders just before the market moves. This is the most actionable signal — consider tightening or withdrawing quotes to this counterparty on the affected ECN. + +### Intraday cost profile + +When is it cheapest to trade? 
Group by `hour(t.timestamp)` and pivot across horizons to build a heatmap of execution cost throughout the day: + +```questdb-sql title="Intraday cost profile — hourly heatmap (buy side)" +WITH hourly AS ( + SELECT + t.symbol, + t.price, + t.quantity, + hour(t.timestamp) AS hour_utc, + h.offset, + m.best_bid, + m.best_ask, + (m.best_bid + m.best_ask) / 2 AS mid + FROM fx_trades t + HORIZON JOIN market_data m ON (symbol) + LIST (0, 5s, 1m) AS h + WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +) +SELECT * FROM hourly +PIVOT ( + count() AS n, + avg((mid - price) / price * 10000) AS markout_bps, + avg((best_ask - best_bid) / mid * 10000) AS spread_bps + FOR offset IN (0 AS at_fill, 5000000000 AS t_5s, 60000000000 AS t_1m) + GROUP BY symbol, hour_utc +) +ORDER BY symbol, hour_utc; +``` + +Each row is one symbol-hour combination with fill count, markout, and spread at each horizon. Look for: + +- **Spread spikes**: Hours with high `at_fill_spread_bps` are wide-market periods (typically Asia session for EUR/USD, or around fixes and rollovers). Execution during these windows is inherently more expensive. +- **Markout divergence**: If `t_1m_markout_bps` is significantly worse during certain hours while `at_fill_spread_bps` is similar, the problem isn't wider spreads — it's adverse selection concentrated in those hours. Route less flow or quote wider during those windows. +- **Session boundaries**: The London/NY overlap (12:00–16:00 UTC) typically shows the tightest spreads and flattest markouts for major pairs. Deviations from this pattern are worth investigating. + +### Daily P&L attribution + +Roll up execution costs into a daily P&L view per symbol and ECN. 
Unlike the bps-based metrics above, this uses absolute P&L (`(mid - price) * quantity`) so you can see dollar impact: + +```questdb-sql title="Daily P&L attribution (buy side)" +WITH daily AS ( + SELECT + t.symbol, + t.ecn, + t.price, + t.quantity, + t.timestamp::date AS trade_date, + h.offset, + (m.best_bid + m.best_ask) / 2 AS mid + FROM fx_trades t + HORIZON JOIN market_data m ON (symbol) + LIST (0, 1m, 5m) AS h + WHERE t.side = 'buy' + AND t.timestamp IN '$yesterday' +) +SELECT * FROM daily +PIVOT ( + count() AS fills, + sum(quantity) AS volume, + sum((mid - price) * quantity) AS pnl + FOR offset IN (0 AS at_fill, 60000000000 AS t_1m, 300000000000 AS t_5m) + GROUP BY trade_date, symbol, ecn +) +ORDER BY trade_date, symbol, ecn; +``` + +Each row is one date-symbol-ECN combination. The three P&L columns tell different stories: + +- **`at_fill_pnl`** — immediate spread cost. How much you lost to the bid-ask spread at the moment of execution. +- **`t_5m_pnl`** — realized P&L including short-term market impact. This is the more complete measure of execution cost. +- **`t_5m_pnl - at_fill_pnl`** — post-fill market movement. Positive means the market moved in your favor after the fill (mean reversion); negative means adverse selection eroded your position further. + +Track these daily to spot trends. A venue that shows deteriorating `t_5m_pnl` over several days may be attracting more informed flow, even if `at_fill_pnl` stays stable. + +## Interpreting results + +- **Passive vs aggressive**: Passive fills (limit orders) typically show lower or negative slippage since they provide liquidity. Aggressive fills (market orders) cross the spread and show higher slippage. +- **ECN differences**: Venues with deeper liquidity tend to show lower slippage for large orders. Differences in latency and matching engine behavior also play a role. +- **Counterparty patterns**: Some counterparties may consistently offer better or worse fills. 
Persistent adverse slippage from a counterparty may indicate information asymmetry. +- **Spread context**: Always consider `avg_spread_bps` alongside slippage. An ECN with higher slippage but tighter spreads may still offer better all-in execution cost. + +:::info Related documentation +- [ASOF JOIN](/docs/query/sql/asof-join/) +- [HORIZON JOIN](/docs/query/sql/horizon-join/) +- [PIVOT](/docs/query/sql/pivot/) +- [GROUP BY](/docs/query/sql/group-by/) +- [SAMPLE BY](/docs/query/sql/sample-by/) +- [Slippage per fill recipe](slippage.md) +::: diff --git a/documentation/cookbook/sql/finance/slippage.md b/documentation/cookbook/sql/finance/slippage.md new file mode 100644 index 000000000..9c137495c --- /dev/null +++ b/documentation/cookbook/sql/finance/slippage.md @@ -0,0 +1,82 @@ +--- +title: Slippage per fill +sidebar_label: Slippage +description: Measure execution slippage against the prevailing mid price and top-of-book for every trade fill +--- + +Slippage is the difference between the price at which a trade actually executes and a reference price at the moment of execution. It is a core metric in **Transaction Cost Analysis (TCA)** and tells you how much the market moved against you (or in your favor) on each fill. + +There are two common reference points: + +- **Mid price** — the midpoint between the best bid and best ask. Slippage against mid captures total implicit cost, including half the spread. +- **Top of book (TOB)** — the best ask for buys, best bid for sells. Slippage against TOB isolates how much worse you did beyond the spread, for example due to latency, order size, or thin liquidity at the top level. + +Positive slippage means the fill was worse than the reference (you paid more or received less). Negative slippage means price improvement. + +## Problem + +You want to evaluate fill quality for every trade execution. 
For each fill, you need to know the prevailing order book state at the time of execution so you can calculate how much slippage occurred, both relative to mid and relative to the side of the book you were trading against. + +## Solution + +Use `ASOF JOIN` to pair each trade with the most recent order book snapshot, then calculate slippage in basis points: + +```questdb-sql demo title="Slippage per fill" +SELECT + t.timestamp, + t.symbol, + t.ecn, + t.counterparty, + t.side, + t.passive, + t.price, + t.quantity, + m.best_bid, + m.best_ask, + (m.best_bid + m.best_ask) / 2 AS mid, + (m.best_ask - m.best_bid) AS spread, + CASE t.side + WHEN 'buy' THEN (t.price - (m.best_bid + m.best_ask) / 2) + / ((m.best_bid + m.best_ask) / 2) * 10000 + WHEN 'sell' THEN ((m.best_bid + m.best_ask) / 2 - t.price) + / ((m.best_bid + m.best_ask) / 2) * 10000 + END AS slippage_bps, + CASE t.side + WHEN 'buy' THEN (t.price - m.best_ask) / m.best_ask * 10000 + WHEN 'sell' THEN (m.best_bid - t.price) / m.best_bid * 10000 + END AS slippage_vs_tob_bps +FROM fx_trades t +ASOF JOIN market_data m ON (symbol) +WHERE t.timestamp IN '$yesterday' +ORDER BY t.timestamp; +``` + +## How it works + +**ASOF JOIN** is the key here. For each row in `fx_trades`, it finds the most recent row in `market_data` with the same `symbol` whose timestamp is at or before the trade timestamp. This gives you the order book that was prevailing when the trade executed. + +The two slippage measures: + +- **`slippage_bps`** (vs mid) — how far the fill price deviated from the midpoint. A buy at 1.1050 when mid is 1.1048 gives positive slippage (you paid above mid). This includes roughly half the spread as a baseline cost. + +- **`slippage_vs_tob_bps`** (vs top of book) — how far the fill price deviated from the relevant side: best ask for buys, best bid for sells. If you buy at the best ask exactly, this is zero. Positive values mean you walked the book or experienced latency; negative values mean you got price improvement. 
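The two formulas are easy to sanity-check by hand. A minimal Python sketch of the arithmetic for a single buy fill (the quote and fill prices below are made up for illustration, not taken from the demo dataset):

```python
# Slippage arithmetic for one hypothetical buy fill.
best_bid = 1.1046
best_ask = 1.1048
price = 1.1050  # fill price for a buy

mid = (best_bid + best_ask) / 2  # 1.1047

# vs mid: includes roughly half the spread as a baseline cost
slippage_bps = (price - mid) / mid * 10000

# vs top of book: best ask is the reference for a buy
slippage_vs_tob_bps = (price - best_ask) / best_ask * 10000

print(round(slippage_bps, 2))         # 2.72 — paid above mid
print(round(slippage_vs_tob_bps, 2))  # 1.81 — walked past the best ask
```

Both values are positive, i.e. the fill was worse than either reference; a fill at exactly `best_ask` would give a `slippage_vs_tob_bps` of zero and a `slippage_bps` of roughly half the spread.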
+ +The sign convention is the same for both sides: positive = worse execution, negative = price improvement. + +## Interpreting results + +- **Near-zero `slippage_vs_tob_bps`**: Fills are executing at or near the top of book. Typical for passive or well-timed aggressive orders. +- **Positive `slippage_vs_tob_bps`**: The fill walked beyond the best level. Common for large orders that consume top-of-book liquidity and fill at deeper levels. +- **Negative `slippage_vs_tob_bps`**: Price improvement — filled better than the quoted price. Can happen with passive fills or favorable market movement. +- **`slippage_bps` around half-spread**: Expected baseline for aggressive orders. If slippage consistently exceeds half the spread, execution quality may need attention. + +:::note Order book timeliness +The accuracy of slippage measurement depends on how frequently order book snapshots are captured. With `ASOF JOIN`, each trade is matched to the most recent snapshot, so higher-frequency snapshots yield more precise results. On the demo dataset, `market_data` updates frequently enough for meaningful analysis. +::: + +:::info Related documentation +- [ASOF JOIN](/docs/query/sql/asof-join/) +- [CASE expressions](/docs/query/sql/case/) +- [Arrays in QuestDB](/docs/query/datatypes/array/) +- [Bid-ask spread recipe](bid-ask-spread.md) +::: diff --git a/documentation/cookbook/sql/finance/vpin.md b/documentation/cookbook/sql/finance/vpin.md new file mode 100644 index 000000000..84c067ba2 --- /dev/null +++ b/documentation/cookbook/sql/finance/vpin.md @@ -0,0 +1,156 @@ +--- +title: VPIN (Volume-synchronized Probability of Informed Trading) +sidebar_label: VPIN +description: Estimate the probability of informed trading using volume-bucketed order flow imbalance +--- + +VPIN measures the probability that informed traders are active in the market by looking at order flow imbalance across fixed-volume buckets. 
Unlike time-based metrics, VPIN synchronizes to volume — each bucket contains the same total traded quantity, so high-activity and low-activity periods are weighted equally. + +## Problem + +You want to detect when informed traders are likely active. Time-based imbalance metrics can be noisy — a 1-minute window during a quiet period captures very different market dynamics than a 1-minute window during a news event. VPIN normalizes by volume instead of time, giving a more consistent signal. + +## Solution + +Split the trade stream into fixed-volume buckets, compute the buy/sell imbalance within each bucket, then take a rolling average over the last N buckets: + +```questdb-sql demo title="VPIN — volume-synchronized informed trading probability" +WITH bucketed AS ( + SELECT + t.timestamp, + t.symbol, + t.side, + t.price, + t.quantity, + floor( + sum(t.quantity) OVER (PARTITION BY symbol ORDER BY timestamp) + / 1000000 + ) AS vol_bucket + FROM fx_trades t + WHERE t.symbol = 'EURUSD' + AND t.timestamp IN '$yesterday' +), +bucket_stats AS ( + SELECT + symbol, + vol_bucket, + min(timestamp) AS bucket_start, + max(timestamp) AS bucket_end, + count() AS trade_count, + sum(quantity) AS total_vol, + sum(CASE WHEN side = 'buy' THEN quantity ELSE 0.0 END) AS buy_vol, + sum(CASE WHEN side = 'sell' THEN quantity ELSE 0.0 END) AS sell_vol, + abs( + sum(CASE WHEN side = 'buy' THEN quantity ELSE 0.0 END) + - sum(CASE WHEN side = 'sell' THEN quantity ELSE 0.0 END) + ) / sum(quantity) AS bucket_imbalance + FROM bucketed + GROUP BY symbol, vol_bucket +) +SELECT + symbol, + vol_bucket, + bucket_start, + bucket_end, + total_vol, + buy_vol, + sell_vol, + bucket_imbalance, + avg(bucket_imbalance) OVER ( + PARTITION BY symbol + ORDER BY vol_bucket + ROWS BETWEEN 49 PRECEDING AND CURRENT ROW + ) AS vpin +FROM bucket_stats +ORDER BY vol_bucket; +``` + +## How it works + +### Step 1 — Volume bucketing + +The first CTE assigns a `vol_bucket` ID to each trade using a cumulative volume sum 
divided by the bucket size (1,000,000 units). All trades within the same bucket share the same ID. This is the key difference from time-based analysis — each bucket represents the same amount of market activity regardless of how long it took. + +### Step 2 — Bucket imbalance + +For each bucket, compute the absolute imbalance between buy and sell volume as a fraction of total volume. A bucket where 90% of the volume was buy-initiated has an imbalance of 0.8 (|0.9 − 0.1|). A perfectly balanced bucket has imbalance 0.0. + +### Step 3 — Rolling VPIN + +Average the bucket imbalance over a rolling window of 50 buckets. This is the VPIN estimate. The window size controls the trade-off between responsiveness and noise — fewer buckets react faster but are noisier. + +## Interpreting results + +VPIN ranges from 0 to 1: + +- **VPIN near 0**: Order flow is balanced — roughly equal buying and selling. Low probability of informed trading. +- **VPIN near 0.5**: Moderate imbalance. Normal for trending markets. +- **VPIN above 0.7**: Heavily one-sided flow. Informed traders are likely dominating. This is the danger zone for market makers — consider widening quotes or reducing exposure. + +Watch for **VPIN spikes** — sudden jumps from a stable baseline indicate a regime change, often preceding large price moves. The 2010 Flash Crash, for example, was preceded by elevated VPIN readings. + +## Tuning parameters + +- **Bucket size** (`1000000`): Adjust per symbol to get a reasonable number of buckets per day. For major FX pairs with billions in daily volume, 1M per bucket is fine. For less liquid instruments, reduce the bucket size. +- **Rolling window** (`50 buckets`): The original VPIN paper uses 50 buckets. Shorter windows (20–30) are more responsive but noisier. Longer windows (100+) give a smoother signal but lag. +- **Symbol filter**: VPIN is computed per symbol. The `WHERE t.symbol = 'EURUSD'` filter ensures volume bucketing doesn't mix symbols. 
To compute VPIN for multiple symbols, remove the filter — the `PARTITION BY symbol` in the window function handles separation. + +## VPIN per ECN + +Partition by ECN to see which venues carry more informed flow. An ECN with consistently higher VPIN is attracting (or routing) more informed traders: + +```questdb-sql demo title="VPIN per ECN" +WITH bucketed AS ( + SELECT + t.timestamp, + t.symbol, + t.ecn, + t.side, + t.quantity, + floor( + sum(t.quantity) OVER (PARTITION BY symbol, ecn ORDER BY timestamp) + / 1000000 + ) AS vol_bucket + FROM fx_trades t + WHERE t.symbol = 'EURUSD' + AND t.timestamp IN '$yesterday' +), +bucket_stats AS ( + SELECT + symbol, + ecn, + vol_bucket, + min(timestamp) AS bucket_start, + max(timestamp) AS bucket_end, + sum(quantity) AS total_vol, + abs( + sum(CASE WHEN side = 'buy' THEN quantity ELSE 0.0 END) + - sum(CASE WHEN side = 'sell' THEN quantity ELSE 0.0 END) + ) / sum(quantity) AS bucket_imbalance + FROM bucketed + GROUP BY symbol, ecn, vol_bucket +) +SELECT + symbol, + ecn, + vol_bucket, + bucket_start, + bucket_end, + total_vol, + bucket_imbalance, + avg(bucket_imbalance) OVER ( + PARTITION BY symbol, ecn + ORDER BY vol_bucket + ROWS BETWEEN 49 PRECEDING AND CURRENT ROW + ) AS vpin +FROM bucket_stats +ORDER BY ecn, vol_bucket; +``` + +Compare VPIN time series across ECNs. An ECN that shows elevated VPIN while others stay flat is where informed flow is concentrated. Combine with the [ECN scorecard](ecn-scorecard.md) to cross-reference against markout-based toxicity — the two signals should align. When they diverge (high VPIN but flat markouts), the imbalance may be from correlated retail flow rather than informed trading. 
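The three steps can be prototyped outside the database to build intuition. A minimal Python sketch over a synthetic, time-ordered trade stream (the bucket size and window are deliberately tiny illustrative parameters, not recommendations):

```python
# VPIN sketch: volume bucketing -> per-bucket imbalance -> rolling mean.
# Trades and parameters are synthetic.
BUCKET_SIZE = 100  # volume per bucket (illustrative)
WINDOW = 3         # rolling window in buckets (the paper uses 50)

trades = [("buy", 60), ("buy", 50), ("sell", 40), ("sell", 90),
          ("buy", 30), ("sell", 70), ("buy", 80), ("buy", 20)]

# Step 1: assign trades to buckets via cumulative volume, mirroring
# floor(sum(quantity) OVER (...) / bucket_size) in the query
volumes = {}  # bucket_id -> [buy_vol, sell_vol]
cum = 0
for side, qty in trades:
    bucket = cum // BUCKET_SIZE
    cum += qty
    vols = volumes.setdefault(bucket, [0, 0])
    vols[0 if side == "buy" else 1] += qty

# Step 2: absolute buy/sell imbalance per bucket
imbalances = [abs(b - s) / (b + s)
              for b, s in (volumes[k] for k in sorted(volumes))]

# Step 3: rolling average of the last WINDOW buckets = VPIN
vpin = [sum(imbalances[max(0, i - WINDOW + 1):i + 1]) / min(WINDOW, i + 1)
        for i in range(len(imbalances))]

print(imbalances)                   # [1.0, 1.0, 0.4, 1.0, 1.0]
print([round(v, 3) for v in vpin])  # [1.0, 1.0, 0.8, 0.8, 0.8]
```

As in the SQL version, bucket boundaries fall where the cumulative sum crosses a multiple of the bucket size, so bucket volumes are only approximately equal.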
+
+:::info Related documentation
+- [Window functions](/docs/query/functions/window-functions/overview/)
+- [Aggressor volume imbalance recipe](aggressor-volume-imbalance.md)
+- [ECN scorecard recipe](ecn-scorecard.md)
+:::
diff --git a/documentation/query/sql/asof-join.md b/documentation/query/sql/asof-join.md
index cf0992b3b..f9836a278 100644
--- a/documentation/query/sql/asof-join.md
+++ b/documentation/query/sql/asof-join.md
@@ -350,6 +350,14 @@ To summarize:
   table with no designated timestamp, if and only if you are certain the data
   is already sorted.
 
+### Mixed-precision timestamps
+
+ASOF JOIN handles tables with different timestamp resolutions automatically. For
+example, you can join a `TIMESTAMP` (microsecond) table with a `TIMESTAMP_NS`
+(nanosecond) table without explicit casting — QuestDB aligns the timestamps
+internally. This also applies to [LT JOIN](/docs/query/sql/join/#lt-join) and
+[SPLICE JOIN](/docs/query/sql/join/#splice-join).
+
 ### TOLERANCE clause
 
 The `TOLERANCE` clause enhances ASOF and LT JOINs by limiting how far back in
@@ -415,6 +423,14 @@ distributions.
 If your query is performing poorly, consult the
 [SQL optimizer hints](/docs/concepts/deep-dive/sql-optimizer-hints) page and
 try out the non-default algorithms.
 
+## Cookbook recipes using ASOF JOIN
+
+For practical examples of `ASOF JOIN` in financial analysis workflows:
+
+- [Slippage per fill](/docs/cookbook/sql/finance/slippage/) — pair each trade with the prevailing order book to measure execution quality
+- [Aggregated slippage](/docs/cookbook/sql/finance/slippage-aggregated/) — compare slippage across venues, counterparties, and order types
+- [Implementation shortfall (order)](/docs/cookbook/sql/finance/implementation-shortfall-order/) — calculate total execution cost per order vs arrival price
+
 ## SPLICE JOIN
 
 Want to join all records from both tables?
diff --git a/documentation/query/sql/horizon-join.md b/documentation/query/sql/horizon-join.md
new file mode 100644
index 000000000..7f88a2dac
--- /dev/null
+++ b/documentation/query/sql/horizon-join.md
@@ -0,0 +1,203 @@
+---
+title: HORIZON JOIN keyword
+sidebar_label: HORIZON JOIN
+description:
+  Reference documentation for HORIZON JOIN, a specialized time-series join
+  for markout analysis and event impact studies in QuestDB.
+---
+
+HORIZON JOIN is a specialized time-series join designed for **markout analysis**
+— a common financial analytics pattern where you need to analyze how prices or
+metrics evolve at specific time offsets relative to events (e.g., trades,
+orders).
+
+It is a variant of the [`JOIN` keyword](/docs/query/sql/join/) that combines
+[ASOF JOIN](/docs/query/sql/asof-join/) matching with a set of forward (or
+backward) time offsets, computing aggregations at each offset in a single pass.
+
+## Syntax
+
+### RANGE form
+
+Generate offsets at regular intervals from `FROM` to `TO` (inclusive) with the
+given `STEP`:
+
+```questdb-sql title="HORIZON JOIN with RANGE"
+SELECT <columns> [, <aggregate_functions>]
+FROM <left_table> AS <alias>
+HORIZON JOIN <right_table> AS <alias> [ON (<join_keys>)]
+RANGE FROM <from_offset> TO <to_offset> STEP <step> AS <horizon_alias>
+[GROUP BY <columns>]
+[ORDER BY <columns> ...]
+```
+
+For example, `RANGE FROM 0s TO 5m STEP 1m` generates offsets at 0s, 1m, 2m,
+3m, 4m, 5m.
+
+### LIST form
+
+Specify explicit offsets as interval literals:
+
+```questdb-sql title="HORIZON JOIN with LIST"
+SELECT <columns> [, <aggregate_functions>]
+FROM <left_table> AS <alias>
+HORIZON JOIN <right_table> AS <alias> [ON (<join_keys>)]
+LIST (<offset1>, <offset2>, ...) AS <horizon_alias>
+[GROUP BY <columns>]
+[ORDER BY <columns> ...]
+```
+
+For example, `LIST (0, 1s, 5s, 30s, 1m)` generates offsets at those specific
+points. Offsets must be monotonically increasing. Unitless `0` is allowed as
+shorthand for zero offset.
+
+## How it works
+
+For each row in the left-hand table and each offset in the horizon:
+
+1. Compute `left_timestamp + offset`
+2. Perform an ASOF match against the right-hand table at that computed timestamp
+3.
When join keys are provided (via `ON`), only right-hand rows matching the
+   keys are considered
+
+Results are implicitly grouped by the non-aggregate SELECT columns (horizon
+offset, left-hand table keys, etc.), and aggregate functions are applied across
+all matched rows.
+
+## The horizon pseudo-table
+
+The `RANGE` or `LIST` clause defines a virtual table of time offsets, aliased by
+the `AS <horizon_alias>` clause. This pseudo-table exposes two columns:
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `<alias>.offset` | `LONG` | The offset value in the left-hand table's designated timestamp resolution. For example, with microsecond timestamps, `h.offset / 1000000` converts to seconds; with nanosecond timestamps, `h.offset / 1000000000` converts to seconds and `h.offset / 1000000` to milliseconds. |
+| `<alias>.timestamp` | `TIMESTAMP` | The computed horizon timestamp (`left_timestamp + offset`). Available for grouping or expressions. |
+
+## Interval units
+
+All offset values in `RANGE` (`FROM`, `TO`, `STEP`) and `LIST` **must include a
+unit suffix**. Bare numbers are not valid — write `5s`, not `5` or `5000000000`.
+The only exception is `0`, which is allowed without a unit as shorthand for zero
+offset.
+
+Both `RANGE` and `LIST` use the same interval expression syntax as
+[SAMPLE BY](/docs/query/sql/sample-by/):
+
+| Unit | Meaning |
+|------|---------|
+| `n` | Nanoseconds |
+| `U` | Microseconds |
+| `T` | Milliseconds |
+| `s` | Seconds |
+| `m` | Minutes |
+| `h` | Hours |
+| `d` | Days |
+| `w` | Weeks |
+
+Note that `h.offset` is always returned as a `LONG` in the left-hand table's
+timestamp resolution (e.g., nanoseconds for `TIMESTAMP_NS` tables), regardless
+of the unit used in the `RANGE` or `LIST` definition. When matching offset
+values in a `PIVOT ... FOR offset IN (...)` clause, use the raw numeric value
+(e.g., `1800000000000` for 30 minutes in nanoseconds), not the interval literal.
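To avoid hand-computing those raw values, the unit table above can be encoded in a small helper. A hypothetical Python sketch (`offset_raw` is an illustrative name, not part of QuestDB):

```python
# Convert a SAMPLE BY-style interval literal to the raw LONG value that
# h.offset reports. Hypothetical helper for building PIVOT ... IN (...) lists.
UNIT_NS = {"n": 1, "U": 1_000, "T": 1_000_000, "s": 1_000_000_000,
           "m": 60_000_000_000, "h": 3_600_000_000_000,
           "d": 86_400_000_000_000, "w": 604_800_000_000_000}

def offset_raw(literal, resolution="ns"):
    """Raw offset for a left table with ns ("ns") or us ("us") timestamps."""
    if literal == "0":  # unitless zero is the only bare number allowed
        return 0
    ns = int(literal[:-1]) * UNIT_NS[literal[-1]]
    return ns if resolution == "ns" else ns // 1_000

print(offset_raw("30m"))       # 1800000000000 — the value from the note above
print(offset_raw("5s", "us"))  # 5000000
```

Negative offsets such as `-2m` work through the same sign-carrying integer parse.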
+ +## Examples + +The examples below use the [demo dataset](/docs/cookbook/demo-data-schema/) tables +`fx_trades` (trade executions) and `market_data` (order book snapshots with 2D +arrays for bids/asks). + +### Post-trade markout at uniform horizons + +Measure the average mid-price at 1-second intervals after each trade — a classic +way to evaluate execution quality and price impact: + +```questdb-sql title="Post-trade markout curve" +SELECT + h.offset / 1000000000 AS horizon_sec, + t.symbol, + avg((m.best_bid + m.best_ask) / 2) AS avg_mid +FROM fx_trades AS t +HORIZON JOIN market_data AS m ON (symbol) +RANGE FROM 1s TO 60s STEP 1s AS h +ORDER BY t.symbol, horizon_sec; +``` + +Since `fx_trades` uses nanosecond timestamps (`TIMESTAMP_NS`), `h.offset` is in +nanoseconds. Dividing by 1,000,000,000 converts to seconds. + +### Markout P&L at non-uniform horizons + +Compute the average post-trade markout at specific horizons using `LIST`: + +```questdb-sql title="Markout at specific time points" +SELECT + h.offset / 1000000000 AS horizon_sec, + t.symbol, + avg((m.best_bid + m.best_ask) / 2 - t.price) AS avg_markout +FROM fx_trades AS t +HORIZON JOIN market_data AS m ON (symbol) +LIST (1s, 5s, 30s, 1m) AS h +ORDER BY t.symbol, horizon_sec; +``` + +### Pre- and post-trade price movement + +Use negative offsets to see price levels before and after trades — useful for +detecting information leakage or adverse selection: + +```questdb-sql title="Price movement around trade events" +SELECT + h.offset / 1000000000 AS horizon_sec, + t.symbol, + avg((m.best_bid + m.best_ask) / 2) AS avg_mid, + count() AS sample_size +FROM fx_trades AS t +HORIZON JOIN market_data AS m ON (symbol) +RANGE FROM -5s TO 5s STEP 1s AS h +ORDER BY t.symbol, horizon_sec; +``` + +### Volume-weighted markout + +Compute an overall volume-weighted markout without grouping by symbol: + +```questdb-sql title="Volume-weighted markout across all symbols" +SELECT + h.offset / 1000000000 AS horizon_sec, + 
sum(((m.best_bid + m.best_ask) / 2 - t.price) * t.quantity) + / sum(t.quantity) AS vwap_markout +FROM fx_trades AS t +HORIZON JOIN market_data AS m ON (symbol) +RANGE FROM 1s TO 60s STEP 1s AS h +ORDER BY horizon_sec; +``` + +## Mixed-precision timestamps + +The left-hand and right-hand tables can use different timestamp resolutions +(e.g., `TIMESTAMP` with microseconds and `TIMESTAMP_NS` with nanoseconds). +QuestDB aligns the timestamps internally — no explicit casting is needed. + +When the tables differ in resolution, `h.offset` uses the resolution of the +**left-hand table** (the event table). + +## Current limitations + +- **No other joins**: HORIZON JOIN cannot be combined with other joins in the + same level of the query. Joins can be done in an outer query. +- **No right-hand side filter**: `WHERE` clause filters apply to the left-hand + table only; right-hand table filters are not yet supported. +- **Both tables must have a designated timestamp**: The left-hand and right-hand + tables must each have a designated timestamp column. +- **RANGE constraints**: `STEP` must be positive; `FROM` must be less than or + equal to `TO`. +- **LIST constraints**: Offsets must be interval literals (e.g., `1s`, `-2m`, + `0`) and monotonically increasing. + +:::info Related documentation +- [ASOF JOIN](/docs/query/sql/asof-join/) +- [JOIN](/docs/query/sql/join/) +- [SAMPLE BY](/docs/query/sql/sample-by/) +- [Markout analysis recipe](/docs/cookbook/sql/finance/markout/) +::: diff --git a/documentation/query/sql/window-join.md b/documentation/query/sql/window-join.md index 35df75f95..581924f81 100644 --- a/documentation/query/sql/window-join.md +++ b/documentation/query/sql/window-join.md @@ -85,6 +85,12 @@ JOIN. 4. 
Symbol-based join conditions enable "Fast Join" optimization when matching on symbol columns +## Mixed-precision timestamps + +The left and right tables can use different timestamp resolutions (e.g., +`TIMESTAMP` with microseconds and `TIMESTAMP_NS` with nanoseconds). QuestDB +aligns the timestamps internally — no explicit casting is needed. + ## Aggregate functions WINDOW JOIN supports all aggregate functions on the right table. However, the diff --git a/documentation/sidebars.js b/documentation/sidebars.js index 77e5b3a68..826441907 100644 --- a/documentation/sidebars.js +++ b/documentation/sidebars.js @@ -418,6 +418,7 @@ module.exports = { "query/sql/distinct", "query/sql/fill", "query/sql/group-by", + "query/sql/horizon-join", "query/sql/join", "query/sql/latest-on", "query/sql/limit", @@ -730,6 +731,30 @@ module.exports = { label: "Tutorials & Cookbook", type: "category", items: [ + { + type: "category", + label: "Post-Trade Analysis", + collapsed: false, + link: { + type: "doc", + id: "cookbook/sql/finance/post-trade-overview", + }, + items: [ + { + type: "doc", + id: "cookbook/sql/finance/post-trade-overview", + label: "Overview", + }, + "cookbook/sql/finance/slippage", + "cookbook/sql/finance/slippage-aggregated", + "cookbook/sql/finance/markout", + "cookbook/sql/finance/last-look", + "cookbook/sql/finance/implementation-shortfall", + "cookbook/sql/finance/implementation-shortfall-order", + "cookbook/sql/finance/ecn-scorecard", + "cookbook/sql/finance/vpin", + ], + }, { type: "category", label: "Cookbook", @@ -752,33 +777,85 @@ module.exports = { }, items: [ { - type: "doc", - id: "cookbook/sql/finance/index", - label: "Overview", + type: "category", + label: "Price-Based Indicators", + collapsed: true, + items: [ + "cookbook/sql/finance/ohlc", + "cookbook/sql/finance/vwap", + "cookbook/sql/finance/bollinger-bands", + "cookbook/sql/finance/bollinger-bandwidth", + ], + }, + { + type: "category", + label: "Momentum Indicators", + collapsed: true, + items: [ + 
"cookbook/sql/finance/rsi", + "cookbook/sql/finance/macd", + "cookbook/sql/finance/stochastic", + "cookbook/sql/finance/rate-of-change", + ], + }, + { + type: "category", + label: "Volatility Indicators", + collapsed: true, + items: [ + "cookbook/sql/finance/atr", + "cookbook/sql/finance/rolling-stddev", + "cookbook/sql/finance/donchian-channels", + "cookbook/sql/finance/keltner-channels", + "cookbook/sql/finance/realized-volatility", + ], + }, + { + type: "category", + label: "Volume & Order Flow", + collapsed: true, + items: [ + "cookbook/sql/finance/obv", + "cookbook/sql/finance/volume-profile", + "cookbook/sql/finance/volume-spike", + "cookbook/sql/finance/aggressor-volume-imbalance", + "cookbook/sql/finance/vpin", + ], + }, + { + type: "category", + label: "Risk Metrics", + collapsed: true, + items: [ + "cookbook/sql/finance/maximum-drawdown", + ], + }, + { + type: "category", + label: "Market Microstructure", + collapsed: true, + items: [ + "cookbook/sql/finance/bid-ask-spread", + "cookbook/sql/finance/liquidity-comparison", + ], + }, + { + type: "category", + label: "Market Breadth", + collapsed: true, + items: [ + "cookbook/sql/finance/tick-trin", + ], + }, + { + type: "category", + label: "Math Utilities", + collapsed: true, + items: [ + "cookbook/sql/finance/compound-interest", + "cookbook/sql/finance/cumulative-product", + ], }, - "cookbook/sql/finance/aggressor-volume-imbalance", - "cookbook/sql/finance/atr", - "cookbook/sql/finance/bid-ask-spread", - "cookbook/sql/finance/bollinger-bands", - "cookbook/sql/finance/bollinger-bandwidth", - "cookbook/sql/finance/compound-interest", - "cookbook/sql/finance/cumulative-product", - "cookbook/sql/finance/donchian-channels", - "cookbook/sql/finance/keltner-channels", - "cookbook/sql/finance/liquidity-comparison", - "cookbook/sql/finance/macd", - "cookbook/sql/finance/maximum-drawdown", - "cookbook/sql/finance/obv", - "cookbook/sql/finance/ohlc", - "cookbook/sql/finance/rate-of-change", - 
"cookbook/sql/finance/realized-volatility", - "cookbook/sql/finance/rolling-stddev", - "cookbook/sql/finance/rsi", - "cookbook/sql/finance/stochastic", - "cookbook/sql/finance/tick-trin", - "cookbook/sql/finance/volume-profile", - "cookbook/sql/finance/volume-spike", - "cookbook/sql/finance/vwap", ], }, { diff --git a/plugins/remote-repo-example/index.js b/plugins/remote-repo-example/index.js index 036341ff9..818566a99 100644 --- a/plugins/remote-repo-example/index.js +++ b/plugins/remote-repo-example/index.js @@ -59,10 +59,12 @@ module.exports = () => ({ } const codeUrl = `${rawRoot}/${path}` const codeRequest = await nodeFetch(codeUrl) - if (codeRequest.status != 200) - throw new Error( - `Could not load code from ${codeUrl}: Error ${codeRequest.status}`, + if (codeRequest.status != 200) { + console.warn( + `[remote-repo-example] Skipping ${codeUrl}: Error ${codeRequest.status}`, ) + continue + } const code = await codeRequest.text() const id = `${name}/${lang}` examples[id] = {