Design Philosophy
Single-threaded, not multi-threaded. For a CLOB, sequential ordering is a requirement, not a limitation. Every order that modifies the book must see the effects of all prior orders. Parallelism introduces races that require synchronization, which negates the throughput gain and adds nondeterminism. A single-threaded design achieves:

- Strict price-time priority with no edge cases
- Deterministic execution that the zkVM prover can reproduce exactly
- Zero synchronization overhead on the hot path
- Simple reasoning about consistency — no locks, no CAS loops
Price-Time Priority
The engine implements standard CLOB price-time priority:

- Price priority: Better-priced orders execute first. For buyers, higher prices have priority. For sellers, lower prices have priority.
- Time priority: Among orders at the same price, earlier orders execute first.
The book keeps one `BTreeMap<FixedPoint, VecDeque<Order>>` per side:
- Asks: sorted ascending (best ask = lowest price = first entry)
- Bids: sorted descending (best bid = highest price = first entry, via a `Reverse` key wrapper)
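The layout above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `Book`, `add_ask`, `add_bid`, `best_ask`, and `best_bid` are hypothetical names, `Order` is reduced to two assumed fields, and `FixedPoint` is stood in for by a plain `i64`.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical stand-in for the engine's fixed-point price type.
type FixedPoint = i64;

/// Hypothetical minimal order record; the real struct likely carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct Order {
    id: u64,
    qty: i64,
}

#[derive(Default)]
struct Book {
    /// Ascending by price: the first entry is the lowest (best) ask.
    asks: BTreeMap<FixedPoint, VecDeque<Order>>,
    /// `Reverse` flips the key ordering: the first entry is the highest (best) bid.
    bids: BTreeMap<Reverse<FixedPoint>, VecDeque<Order>>,
}

impl Book {
    fn add_ask(&mut self, price: FixedPoint, order: Order) {
        // push_back keeps arrival order within a price level (time priority).
        self.asks.entry(price).or_default().push_back(order);
    }

    fn add_bid(&mut self, price: FixedPoint, order: Order) {
        self.bids.entry(Reverse(price)).or_default().push_back(order);
    }

    /// Best ask: lowest price, earliest order at that price.
    fn best_ask(&self) -> Option<(FixedPoint, &Order)> {
        let (price, queue) = self.asks.iter().next()?;
        Some((*price, queue.front()?))
    }

    /// Best bid: highest price, earliest order at that price.
    fn best_bid(&self) -> Option<(FixedPoint, &Order)> {
        let (Reverse(price), queue) = self.bids.iter().next()?;
        Some((*price, queue.front()?))
    }
}
```

Because both maps expose their best price as the first entry, the same "take the first level, then the front of its queue" logic serves both sides.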
CoW Cache Execution Flow
The Copy-on-Write cache is a performance optimization that eliminates redundant state reads on the hot path.
Fixed-Point Arithmetic
All prices and quantities in the engine use a custom `FixedPoint` type: a 64-bit integer with an implicit scale factor of 1,000,000. A value of 1.5, for example, is stored as the integer 1,500,000.
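To make the scaling concrete, here is a minimal sketch of such a type. The text only specifies a 64-bit integer and the scale factor; the signedness (`i64`), the method names, and the truncating multiplication are assumptions made for illustration.

```rust
use std::convert::TryFrom;

/// Implicit scale factor from the text: 1.0 is stored as 1_000_000.
const SCALE: i64 = 1_000_000;

/// Sketch of a fixed-point value. Using a signed i64 is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct FixedPoint(i64);

impl FixedPoint {
    /// Build from a whole part and a count of millionths,
    /// e.g. `from_parts(1, 500_000)` represents 1.5.
    fn from_parts(whole: i64, micros: i64) -> Self {
        FixedPoint(whole * SCALE + micros)
    }

    /// Addition needs no rescaling because both operands share the same scale.
    fn checked_add(self, rhs: Self) -> Option<Self> {
        self.0.checked_add(rhs.0).map(FixedPoint)
    }

    /// Multiplication widens to i128 so the intermediate product cannot
    /// overflow, then divides the extra scale factor back out
    /// (truncating toward zero). Returns None if the result exceeds i64.
    fn checked_mul(self, rhs: Self) -> Option<Self> {
        let wide = (self.0 as i128) * (rhs.0 as i128) / (SCALE as i128);
        i64::try_from(wide).ok().map(FixedPoint)
    }
}
```

Since the value is a plain integer, comparison and ordering are exact, which is what allows it to serve as a `BTreeMap` key and keeps results reproducible across machines.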
Order Type Handling
The matching loop handles all four time-in-force (TIF) variants in a unified `match_order()` function.
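The text does not name the four variants, so the sketch below assumes GTC, IOC, FOK, and post-only purely for illustration. It also models only an incoming buy against the ask side, with resting orders reduced to bare quantities. The point it tries to show is the "unified" shape: one shared matching loop, with TIF-specific behavior confined to a pre-check and to the handling of the remainder.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Assumed TIF variants; the source only says there are four.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tif { Gtc, Ioc, Fok, PostOnly }

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Quantity executed, and the remainder left to rest on the book.
    Accepted { filled: u64, rested: u64 },
    /// Nothing executed and nothing rests.
    Rejected,
}

/// Ask side only (simplification): price -> FIFO queue of resting quantities.
type Asks = BTreeMap<u64, VecDeque<u64>>;

/// Total resting quantity priced at or below `limit`.
fn available(asks: &Asks, limit: u64) -> u64 {
    asks.range(..=limit).flat_map(|(_, q)| q.iter()).sum()
}

/// Match an incoming buy against the asks in price-time order.
fn match_order(asks: &mut Asks, limit: u64, qty: u64, tif: Tif) -> Outcome {
    let crossable = available(asks, limit);
    // TIF-specific pre-checks, done before the book is touched.
    match tif {
        Tif::Fok if crossable < qty => return Outcome::Rejected,
        Tif::PostOnly if crossable > 0 => return Outcome::Rejected,
        _ => {}
    }
    // Shared loop: walk price levels from the best ask upward.
    let mut remaining = qty;
    while remaining > 0 {
        let price = match asks.range(..=limit).next() {
            Some((p, _)) => *p,
            None => break, // nothing left that the limit price can cross
        };
        let level = asks.get_mut(&price).expect("level was just found");
        while remaining > 0 {
            let resting = match level.front_mut() {
                Some(r) => r,
                None => break,
            };
            let fill = remaining.min(*resting);
            *resting -= fill;
            remaining -= fill;
            let head_done = *resting == 0;
            if head_done {
                level.pop_front(); // fully filled maker leaves the queue
            }
        }
        if level.is_empty() {
            asks.remove(&price); // drop exhausted price levels
        }
    }
    // TIF-specific handling of whatever is left over.
    let rested = match tif {
        Tif::Gtc | Tif::PostOnly => remaining, // remainder rests on the book
        Tif::Ioc | Tif::Fok => 0,              // remainder is cancelled
    };
    Outcome::Accepted { filled: qty - remaining, rested }
}
```

Placing the resting remainder on the bid side is omitted here; in this sketch `rested` only reports how much would be posted.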
Credit System Integration
After each fill is computed, the engine checks the maker’s credit utilization.
CommitBatch Dispatch
At the end of each processing batch (configurable interval, default 10 ms), the engine:

- Drains the cache diff into a `StateDelta`
- Serializes all requests and fills into a `CommitBatch`
- Sends `CommitBatch` to the committer over an async channel
- Clears the processed request queue
- Begins the next batch
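The steps above can be sketched as a single function. Everything beyond the names `StateDelta` and `CommitBatch` is assumed: the field layouts, the `Engine` struct, and the `commit` method are hypothetical, requests and fills are plain strings, and `std::sync::mpsc` stands in for whichever async channel the engine actually uses. Real serialization is also skipped; the batch is simply moved by value.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};

/// Hypothetical shape: the key/value pairs written during the batch.
#[derive(Debug, Default, PartialEq)]
struct StateDelta {
    writes: Vec<(u64, i64)>,
}

/// Hypothetical shape: everything the committer needs for one batch.
#[derive(Debug, PartialEq)]
struct CommitBatch {
    delta: StateDelta,
    requests: Vec<String>,
    fills: Vec<String>,
}

#[derive(Default)]
struct Engine {
    /// Entries modified since the last commit (the cache diff).
    cache_diff: HashMap<u64, i64>,
    /// Requests processed in the current batch.
    requests: Vec<String>,
    /// Fills produced in the current batch.
    fills: Vec<String>,
}

impl Engine {
    /// Runs one end-of-batch dispatch. Returns false if the committer is gone.
    fn commit(&mut self, committer: &Sender<CommitBatch>) -> bool {
        // Drain the cache diff into a StateDelta; sorting gives a
        // deterministic order regardless of HashMap iteration order.
        let mut writes: Vec<(u64, i64)> = self.cache_diff.drain().collect();
        writes.sort();

        // Package requests and fills. mem::take moves the contents out and
        // leaves empty queues behind, which also clears the request queue.
        let batch = CommitBatch {
            delta: StateDelta { writes },
            requests: std::mem::take(&mut self.requests),
            fills: std::mem::take(&mut self.fills),
        };

        // Hand the batch to the committer without blocking the engine thread.
        // The engine is now ready to begin the next batch.
        committer.send(batch).is_ok()
    }
}
```

Sending on an unbounded channel keeps the matching thread from waiting on the committer, at the cost of unbounded buffering if the committer falls behind; a bounded channel would trade that for backpressure.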
Benchmarks
All benchmarks use Criterion.rs with 100 samples, outlier rejection, and a synthetic order stream that uniformly samples across all price levels.

| Benchmark | p50 | p99 | p99.9 | Throughput |
|---|---|---|---|---|
| match_order (limit, no fill) | 0.81 μs | 1.4 μs | 2.1 μs | 67k/s |
| match_order (limit, full fill) | 1.38 μs | 2.3 μs | 5.1 μs | 725k/s |
| match_order (market, walk 5 levels) | 1.92 μs | 4.1 μs | 8.7 μs | 32k/s |
| cancel_order | 0.42 μs | 0.9 μs | 1.3 μs | 140k/s |

Comparison with Pulse, using the `match_order` (limit, full fill) figures for Vela:

| Metric | Vela | Pulse | Improvement |
|---|---|---|---|
| p50 latency | 1.38 μs | 5.1 μs | 5.8× faster |
| p99.9 latency | 5.1 μs | 19.2 μs | −73% |
| Throughput | 725k/s | 12.2k/s | 5.8× higher |

Reproduce with `cargo bench --bench engine`.