perf(core): Deep dive optimizations for hot path#355

Open
cofin wants to merge 66 commits into main from feat/performance

Conversation

@cofin cofin commented Feb 2, 2026

This PR implements the deep-dive optimizations identified in the core-hotpath-opt flow.

Key Changes

  • Query Cache (_qc_*): LRU cache for prepared statements - bypasses SQL parsing and parameter transformation on repeated queries
  • Micro-caching: Single-slot cache in SQLProcessor to bypass dictionary lookups for repeated queries
  • String Fast Paths: Internal SQL object caching for raw string statements in prepare_statement
  • Parameter Optimization: Optimized SQL.copy to fast-track parameter updates and streamlined parameter fingerprinting
  • Observability: Added an is_idle check that skips instrumentation overhead when observability is disabled
  • Result Construction: Optimized ExecutionResult creation and metadata handling
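
To make the two caching layers concrete, here is a minimal sketch. The QueryCache and CachedQuery names appear later in the commit log, but the fields, sizes, and the single-slot micro-cache shown here are illustrative, not the actual implementation:

```python
from collections import OrderedDict
from typing import Any, NamedTuple


class CachedQuery(NamedTuple):
    """Artifacts reused on a cache hit; the real fields differ."""
    compiled_sql: str
    param_count: int


class QueryCache:
    """Small LRU keyed by the raw SQL string, evicting the oldest entry."""

    def __init__(self, max_size: int = 256) -> None:
        self._entries: OrderedDict[str, CachedQuery] = OrderedDict()
        self._max_size = max_size

    def get(self, sql: str) -> CachedQuery | None:
        entry = self._entries.get(sql)
        if entry is not None:
            self._entries.move_to_end(sql)  # mark as most recently used
        return entry

    def store(self, sql: str, entry: CachedQuery) -> None:
        self._entries[sql] = entry
        self._entries.move_to_end(sql)
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry


class MicroCache:
    """Single-slot cache: skips even the dict lookup when the same SQL repeats back-to-back."""

    __slots__ = ("_sql", "_value")

    def __init__(self) -> None:
        self._sql: str | None = None
        self._value: Any = None

    def lookup(self, sql: str) -> Any:
        return self._value if sql == self._sql else None

    def remember(self, sql: str, value: Any) -> None:
        self._sql, self._value = sql, value
```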

Benchmark Results (10k rows, sqlite)

┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Driver ┃ Library    ┃ Scenario          ┃ Time (s) ┃ % Slower vs Raw ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ sqlite │ raw        │ initialization    │   0.0018 │               — │
│ sqlite │ sqlspec    │ initialization    │   0.0035 │           98.2% │
│ sqlite │ sqlalchemy │ initialization    │   0.0852 │         4694.1% │
│ sqlite │ raw        │ write_heavy       │   0.0817 │               — │
│ sqlite │ sqlspec    │ write_heavy       │   0.0773 │           -5.4% │
│ sqlite │ sqlalchemy │ write_heavy       │   0.0493 │          -39.6% │
│ sqlite │ raw        │ read_heavy        │   0.0238 │               — │
│ sqlite │ sqlspec    │ read_heavy        │   0.0393 │           65.1% │
│ sqlite │ sqlalchemy │ read_heavy        │   0.0346 │           45.6% │
│ sqlite │ raw        │ iterative_inserts │   0.0188 │               — │
│ sqlite │ sqlspec    │ iterative_inserts │   0.2947 │         1465.1% │
│ sqlite │ sqlalchemy │ iterative_inserts │   0.4270 │         2167.7% │
│ sqlite │ raw        │ repeated_queries  │   9.1935 │               — │
│ sqlite │ sqlspec    │ repeated_queries  │   9.3915 │            2.2% │
│ sqlite │ sqlalchemy │ repeated_queries  │   9.9289 │            8.0% │
└────────┴────────────┴───────────────────┴──────────┴─────────────────┘

How to interpret these results

| Scenario | What it tests | sqlspec (vs raw) | sqlalchemy (vs raw) |
| --- | --- | --- | --- |
| write_heavy | Bulk insert via execute_many | -5% (faster!) | -40% |
| read_heavy | Bulk read via fetchall | +65% | +46% |
| iterative_inserts | Individual inserts in a loop | +1465% | +2167% |
| repeated_queries | Same SELECT with varying params | +2.2% | +8.0% |

Key insight: The repeated_queries scenario shows the query cache in action. When the same SQL statement is executed repeatedly with different parameters:

  1. First execution: Full parsing, parameter transformation, and statement preparation
  2. Subsequent executions: Cache hit → skip parsing → directly bind new parameters

This reduces sqlspec's overhead from ~1500% (iterative inserts) to just ~2% (repeated queries).
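
A minimal usage sketch of the repeated_queries pattern; only the session.execute() entry point is taken from the PR description, the surrounding setup and the query itself are hypothetical:

```python
# Hypothetical query and session; only session.execute() is from the PR text.
sql = "SELECT name, value FROM items WHERE id = ?"

for item_id in range(10_000):
    # Iteration 1: full parse, parameter transform, statement build.
    # Iterations 2+: query-cache hit, only the new parameters are bound.
    result = session.execute(sql, (item_id,))
```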

Why iterative_inserts is slow

Each call to session.execute() must:

  • Parse the SQL string
  • Transform parameters to native format
  • Build the Statement object
  • Execute and build result

For bulk operations, use execute_many() which amortizes this cost across all rows.
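
A sketch contrasting the two patterns, assuming the execute()/execute_many() method names used elsewhere in this PR; the table and columns are made up:

```python
rows = [(i, f"name-{i}") for i in range(10_000)]

# Slow: every call re-enters the parse/transform/build pipeline.
for row in rows:
    session.execute("INSERT INTO items (id, name) VALUES (?, ?)", row)

# Fast: one pass through the pipeline; parameters are bound per row by the driver.
session.execute_many("INSERT INTO items (id, name) VALUES (?, ?)", rows)
```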

Benchmark Tooling

Added scripts/bench.py (originally from @euri10's PR #354) with enhancements:

uv run python scripts/bench.py --driver sqlite --rows 10000

Scenarios:

  • initialization - Connection and table setup overhead
  • write_heavy - Bulk insert via execute_many
  • read_heavy - Bulk insert + fetchall
  • iterative_inserts - Individual execute calls in a loop
  • repeated_queries - Single-row queries with varying params (tests query cache)

@cofin changed the title from "perf(core): Deep dive optimizations for hot path (~42% faster)" to "perf(core): Deep dive optimizations for hot path" on Feb 3, 2026
cofin added 29 commits February 3, 2026 15:40
- Add internal SQL object cache for string statements
- Optimize SQL.copy to bypass initialization
- Implement micro-cache in SQLProcessor for repeated queries
- Optimize observability idle check
- Streamline parameter processing and result construction
- Remove unnecessary dict() copy in _unpack_parse_cache_entry
- Remove expression.copy() on parse cache store (only copy on retrieve when needed)
- Defer expression.copy() to _apply_ast_transformers when transformers active
- Fast type dispatch (type(x) is dict) vs ABC isinstance checks
- Remove sorted() for dict keys in structural fingerprinting (use insertion order)
- Cache is_idle check in ObservabilityRuntime (lifecycle/observers immutable)
- Use frozenset intersection for parameter char detection in validator
- Optimize ParameterProfile.styles computation for single-style case
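
For illustration, the fast-type-dispatch item above looks roughly like this; the function and its behaviour are illustrative, not the actual sqlspec code:

```python
from collections.abc import Mapping


def _coerce_params(params):
    """Illustrative dispatch: exact dict/tuple instances are the overwhelmingly common case."""
    if type(params) is dict:            # fast path: exact type check, no ABC machinery
        return params
    if type(params) is tuple:
        return params
    if isinstance(params, Mapping):     # slow path: subclasses and ABC-registered types
        return dict(params)
    return tuple(params)                # any other iterable of positional parameters
```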

Benchmark (10,000 INSERTs):
- Before: ~20x slowdown vs raw sqlite3
- After: ~15.5x slowdown (tuple params), ~18.8x (dict params)
- Function calls reduced: 1.33M → 1.18M (11% fewer)
- isinstance() calls reduced: 280k → 200k (28% fewer)
Add benchmark functions to isolate SQLGlot overhead:
- bench_sqlite_sqlglot: Cached SQL (minimal overhead)
- bench_sqlite_sqlglot_copy: expression.copy() per call
- bench_sqlite_sqlglot_nocache: .sql() regeneration per call

These help identify whether overhead comes from SQLGlot
parsing/generation vs SQLSpec's own processing.

Key findings:
- SQLGlot cached parsing adds ~0% overhead
- expression.copy() per call: 16x overhead (synthetic)
- SQLSpec actual overhead: distributed across pipeline
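
The three variants can be reproduced in isolation with the public sqlglot API; this is a rough standalone sketch, not the bench_sqlite_sqlglot* functions from scripts/bench.py:

```python
import timeit

import sqlglot

QUERY = "SELECT id, name FROM items WHERE id = ?"
expression = sqlglot.parse_one(QUERY, read="sqlite")
cached_sql = expression.sql(dialect="sqlite")

# Variant 1: fully cached SQL string (what the query cache effectively buys us).
t_cached = timeit.timeit(lambda: cached_sql, number=100_000)

# Variant 2: defensive expression.copy() on every call, measured in isolation.
t_copy = timeit.timeit(lambda: expression.copy(), number=100_000)

# Variant 3: regenerate the SQL text from the AST on every call.
t_regen = timeit.timeit(lambda: expression.sql(dialect="sqlite"), number=100_000)

print(f"cached={t_cached:.4f}s  copy={t_copy:.4f}s  regen={t_regen:.4f}s")
```
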
cofin added 16 commits February 3, 2026 15:40
- Updated type hints to use the new syntax for union types in driver.py, _async.py, and _common.py.
- Improved readability by formatting long lines and breaking them into multiple lines in driver.py and _common.py.
- Removed unnecessary comments and cleaned up import statements in config.py and typing.py.
- Enhanced exception handling in AsyncMigrationCommands to use async input for user confirmation.
- Refactored logic in CorrelationExtractor to simplify return statements.
- Updated the write_fixture_async function to use AsyncPath for resolving paths asynchronously.
- Improved test readability and consistency in test_sync_adapters.py and test_fast_path.py by formatting long lines.
- Create new sqlspec/driver/_query_cache.py module
- Move CachedQuery namedtuple and QueryCache class
- Rename _QueryCache to QueryCache (now public)
- Rename _FAST_PATH_QUERY_CACHE_SIZE to QC_MAX_SIZE
- Add clear() and __len__() methods to QueryCache
- Update test imports
- Remove unused OrderedDict import from _common.py

Part of driver-arch-cleanup PRD, Chapter 1: qc-extract
Attribute renames:
- _fast_path_binder → _qc_binder
- _fast_path_enabled → _qc_enabled
- _query_cache → _qc

Method renames:
- _update_fast_path_flag → _update_qc_flag
- _fast_rebind → qc_rebind
- _build_fast_statement → qc_build
- _try_cached_compiled → qc_lookup
- _execute_compiled → qc_execute
- _maybe_cache_fast_path → qc_store
- _configure_fast_path_binder → _configure_qc_binder

Test file renamed: test_fast_path.py → test_query_cache.py

Part of driver-arch-cleanup PRD, Chapter 2: qc-rename

Move eligibility checks and preparation logic from qc_lookup into new
qc_prepare method in _common.py. This eliminates ~15 lines of duplicated
logic between sync and async implementations.

Before: qc_lookup in both _common.py and _async.py contained identical
eligibility checking, cache lookup, rebinding, and statement building.

After: qc_prepare does all preparation work, qc_lookup becomes a thin
wrapper that calls qc_prepare then qc_execute.

Chapter 3 of driver-arch-cleanup_20260203 PRD.
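
The shape of that refactor, with illustrative signatures; the real methods live on the driver base class in _common.py and use its attributes:

```python
class _QcShapeSketch:
    """Shape only; not the actual sqlspec driver base class."""

    def qc_prepare(self, sql: str, parameters: tuple):
        """Shared preparation: enabled check, cache lookup, parameter rebind."""
        if not self._qc_enabled:
            return None
        cached = self._qc.get(sql)
        if cached is None or cached.param_count != len(parameters):
            return None
        return self.qc_rebind(cached, parameters)

    def qc_lookup(self, sql: str, parameters: tuple):
        """Thin wrapper: prepare, then delegate to the driver-specific executor."""
        prepared = self.qc_prepare(sql, parameters)
        return None if prepared is None else self.qc_execute(prepared)
```
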
cofin and others added 13 commits February 3, 2026 21:42
Move eligibility validation from qc_prepare (hot lookup path) to
qc_store (store path, executed once per unique query).

Before: qc_prepare had 6 condition checks including needs_static_script_compilation
and many-params guard.

After: qc_prepare has only 2 essential checks:
1. _qc_enabled flag
2. cache lookup + param count match

All detailed validation happens at store time, ensuring only valid
queries enter the cache in the first place.

Chapter 4 of driver-arch-cleanup_20260203 PRD.
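
A sketch of the resulting split: detailed validation runs once at store time, so the hot lookup path only needs the enabled flag and a parameter-count match. MAX_CACHED_PARAMS and the statement attributes are illustrative placeholders, though is_script, is_many, and needs_static_script_compilation are named in the commits:

```python
MAX_CACHED_PARAMS = 32  # hypothetical guard; the real threshold lives in the driver code


def qc_store(self, sql: str, statement, parameters) -> None:
    """Gatekeeper sketch: only statements that are safe to replay enter the cache."""
    if statement.is_script or statement.is_many:
        return  # scripts and executemany batches are never cached
    if statement.needs_static_script_compilation:
        return
    if len(parameters) > MAX_CACHED_PARAMS:
        return  # many-params guard, applied once at store time
    self._qc.store(sql, statement)  # cache whatever prepared artifacts the driver replays
```
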
The base class _qc_execute now handles the full fast-path execution:
- Removed SqliteDriver.qc_execute (redundant with base class)
- Removed AiosqliteDriver.qc_execute (redundant with base class)
- Renamed qc_lookup -> _qc_lookup (internal API)
- Added unreachable assertion to _qc_execute (all paths return/raise)
- Fixed return type cast in execute() fast-path

The `is_script`/`is_many` branches were dead code since _qc_store
filters them out before caching.
Add comprehensive benchmark tooling originally contributed by euri10
in PR #354, with enhancements for testing query cache effectiveness.

Scenarios:
- initialization: Connection and table setup overhead
- write_heavy: Bulk insert performance (execute_many)
- read_heavy: Bulk read with fetchall
- repeated_queries: Single-row queries with varying params (tests _qc_*)

Compares: raw driver vs sqlspec vs SQLAlchemy
Drivers: sqlite (asyncpg requires PostgreSQL server)

Usage:
  uv run python scripts/bench.py --driver sqlite --rows 10000

Co-authored-by: euri10 <benoit.barthelet@gmail.com>
- Remove SQLSPEC_RS_INSTALLED flag and get_sqlspec_rs() from _typing.py
- Remove _configure_qc_binder() method and calls from config.py
- Remove _qc_binder attribute and fast_path_binder handling from driver
- Simplify qc_rebind() to use Python-only parameter binding
- Fix anyio.to_thread.run_sync pyright errors in migrations
- Fix _fast_path_enabled -> _qc_enabled rename in tests
- Remove test_cached_compiled_binder_override test (tested removed feature)

The query cache (_qc_*) optimizations remain fully functional - only the
speculative Rust binder hook was removed until sqlspec_rs is ready.
Add aiosqlite scenarios to benchmark script:
- initialization, write_heavy, read_heavy
- iterative_inserts, repeated_queries
- raw aiosqlite, sqlspec, and sqlalchemy variants

Note: Revealed a bug in sqlspec aiosqlite pool - connections are not
properly isolated between different database paths. See issue tracking.
- Fix "table already exists" errors by ensuring pools are closed
  before temp files are deleted
- Add leak detection helper `_check_pool_leak()` to detect
  connection leaks in benchmarks
- Use `delete=False` with NamedTemporaryFile and manually unlink
  after pool.close_pool() to ensure proper cleanup order
- Add DROP_TEST_TABLE to all aiosqlite scenarios for consistency

Closes #360
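
A minimal sketch of the cleanup ordering, using only stdlib tempfile/os; the pool and scenario helpers are hypothetical, and async pool closing is shown synchronously for brevity:

```python
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(suffix=".db", delete=False)
tmp.close()                     # keep only the path; the file must outlive this handle
db_path = tmp.name

try:
    pool = make_pool(db_path)   # hypothetical benchmark helper
    run_scenarios(pool)         # hypothetical benchmark helper
    pool.close_pool()           # close every connection BEFORE removing the file
finally:
    os.unlink(db_path)          # now safe: nothing still holds the file open
```
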
- Add fast path for recently-used connections (skip full health check)
- Inline mark_as_in_use/mark_as_idle to reduce method call overhead
- Skip asyncio.wait_for wrapper on acquire when connection is available
- Skip timeout wrapper on release rollback (SQLite rollback is fast)
- Check pool capacity without lock first before acquiring lock
- Check closed state directly instead of through property

Also add --pool-size parameter to benchmark CLI for testing different
pool configurations.

Results (repeated_queries with 1000 rows):
- Before: 95.7% slower than raw
- After:  43.9% slower than raw (2.2x improvement)
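
A rough sketch of the acquire fast path with illustrative names: inside a single asyncio event loop there is no await between the idle-deque check and the pop, so no other coroutine can interleave, which is what makes skipping the lock and the asyncio.wait_for wrapper safe:

```python
import asyncio
from collections import deque


class TinyPool:
    """Illustrative queue-based pool, not the sqlspec aiosqlite pool."""

    def __init__(self) -> None:
        self._idle: deque = deque()
        self._lock = asyncio.Lock()

    async def acquire(self, timeout: float = 5.0):
        if self._idle:                      # fast path: a connection is ready now,
            return self._idle.popleft()     # so skip the lock and the wait_for wrapper
        async with self._lock:              # slow path: open a new connection or wait
            return await asyncio.wait_for(self._open_or_wait(), timeout)

    async def _open_or_wait(self):
        ...  # placeholder: create a connection or wait for a release
```
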
- Add raw, sqlspec, and sqlalchemy duckdb scenarios for all 5 benchmarks
- Fix temp file handling for duckdb (needs to create file itself)
- Add duckdb_engine lazy import for sqlalchemy compatibility
- Confirms duckdb pool is already efficient (thread-local design)

Results show duckdb sqlspec overhead is 3-12% vs raw driver,
compared to 20-30% for aiosqlite after optimization.
Thread-local pools (sqlite, duckdb) don't need the same
hot-path optimization as queue-based pools (aiosqlite).
- Move duckdb-engine from dev to benchmarks group
- Add aiosqlite to benchmarks group for async benchmark scenarios
- dev group includes benchmarks via include-group