The headline 310 ns is pure engine speed: state_cache_updated_ns − recv_wall_ns, captured inside our Rust code with a monotonic clock. It does NOT include network distance, TLS, or the time a venue spent aggregating a frame before sending it; end-to-end user-visible latency includes all of those. Anything you see in the 1–200 ms range elsewhere is network plus venue, not the framework. Both kinds of numbers are real; they measure different segments of the same timeline.
One MarketState::update_ticker_owned call. HashMap entry update + OHLCV live-mirror. 89,033 samples on engine 0.4.48, monotonic clock.
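To make the measured unit concrete, here is a minimal sketch of what a single ticker update plus OHLCV live-mirror could look like when timed with `std::time::Instant` (Rust's monotonic clock). Only the name `MarketState::update_ticker_owned` comes from the text above; the `Ticker` and `Candle` types, their fields, the `timed_update` helper, and the use of `get_mut` with an insert fallback are assumptions for illustration, not the engine's actual layout.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical ticker payload; field names are illustrative only.
#[derive(Debug, Clone)]
struct Ticker {
    symbol: String,
    last: f64,
    volume: f64, // assumed to be cumulative for the current candle
}

/// Hypothetical live OHLCV candle mirrored from ticker updates.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candle {
    open: f64,
    high: f64,
    low: f64,
    close: f64,
    volume: f64,
}

#[derive(Default)]
struct MarketState {
    tickers: HashMap<String, Ticker>,
    live_candles: HashMap<String, Candle>,
}

impl MarketState {
    /// Takes the ticker by value so it can be moved into the cache.
    fn update_ticker_owned(&mut self, t: Ticker) {
        // Mirror the last price into the live candle for this symbol.
        match self.live_candles.get_mut(&t.symbol) {
            Some(c) => {
                c.high = c.high.max(t.last);
                c.low = c.low.min(t.last);
                c.close = t.last;
                c.volume = t.volume;
            }
            None => {
                let c = Candle { open: t.last, high: t.last, low: t.last, close: t.last, volume: t.volume };
                self.live_candles.insert(t.symbol.clone(), c);
            }
        }
        // Overwrite in place when the symbol is known; the key is only
        // allocated the first time a symbol is seen.
        match self.tickers.get_mut(&t.symbol) {
            Some(slot) => *slot = t,
            None => {
                self.tickers.insert(t.symbol.clone(), t);
            }
        }
    }
}

/// Times one update with the monotonic clock and returns nanoseconds.
fn timed_update(state: &mut MarketState, t: Ticker) -> u128 {
    let start = Instant::now();
    state.update_ticker_owned(t);
    start.elapsed().as_nanos()
}

fn main() {
    let mut state = MarketState::default();
    for (i, px) in [100.0, 105.0, 95.0].iter().copied().enumerate() {
        let t = Ticker { symbol: "BTCUSDT".to_string(), last: px, volume: (i + 1) as f64 };
        let ns = timed_update(&mut state, t);
        println!("update {} took {} ns", i, ns);
    }
    println!("{:?}", state.live_candles["BTCUSDT"]);
}
```

A single timed call like this is noisy. Meaningful figures need many samples and a release build, which is what the harness described further down is for.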
Every measurement here is captured inside handle_messages on engine 0.4.48 with a monotonic clock. The ticker state-cache write finishes in nanoseconds at p50. The other state writes, and pipeline operations that include JSON parse + dispatch, land in single-digit microseconds at p50. The full pipeline end-to-end (socket read to state visible) is sub-15 µs at p50. At p99, nothing on the hot path crosses a millisecond.
| metric | n | p50 | p99 | note |
|---|---|---|---|---|
| state_ticker_ns | 89,033 | 310 ns | 1.95 µs | Ticker cache write + OHLCV live-mirror. The headline. |
| state_mark_price_ns | 120 | 2.15 µs | 3.73 µs | Funding rate + open-position uPnL recompute. |
| state_order_update_ns | 3,458 | 3.69 µs | 13.86 µs | Private order-update event apply. |
| state_ob_snap_ns | 16,406 | 4.44 µs | 17.42 µs | Orderbook snapshot apply + write_book. |
| state_ob_delta_ns | 102,549 | 5.51 µs | 16.34 µs | Orderbook delta + write_book. |
| parse_ns | 176,555 | 1.76 µs | 77.95 µs | WS frame parse (text or binary). |
| end_to_end_ns | 176,555 | 14.40 µs | 248.96 µs | ws.read return to engine state visible. Parse + dispatch + state write end-to-end. |
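For readers who want to reproduce p50/p99 columns like these from their own raw samples, the sketch below uses the nearest-rank percentile definition over nanosecond samples. The function names (`percentile`, `summarize`, `format_ns`) are hypothetical, and nearest-rank is an assumption: the method the engine or its harness actually uses is not stated here and may interpolate instead.

```rust
/// Nearest-rank percentile over an ascending-sorted slice.
/// Returns None for an empty slice or a percentile outside 0..=100.
fn percentile(sorted: &[u64], p: f64) -> Option<u64> {
    if sorted.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    // rank = ceil(p * n / 100), clamped to at least 1 (1-based rank).
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Sorts the samples and returns (p50, p99) in nanoseconds.
fn summarize(mut samples: Vec<u64>) -> Option<(u64, u64)> {
    samples.sort_unstable();
    Some((percentile(&samples, 50.0)?, percentile(&samples, 99.0)?))
}

/// Renders nanoseconds as "N ns" below 1 µs, otherwise as "X.XX µs".
fn format_ns(ns: u64) -> String {
    if ns < 1_000 {
        format!("{} ns", ns)
    } else {
        format!("{:.2} µs", ns as f64 / 1_000.0)
    }
}

fn main() {
    // Synthetic samples for illustration only; not measured data.
    let samples: Vec<u64> = (1..=100u64).map(|i| i * 10).collect();
    if let Some((p50, p99)) = summarize(samples) {
        println!("p50 = {}, p99 = {}", format_ns(p50), format_ns(p99));
    }
}
```

Nearest-rank always returns a value that was actually observed, which keeps tail figures honest for small sample counts such as the 120-sample mark-price row.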
Engine latency is pure in-process compute, with no network involved. CPU model, frequency governor, and thermal headroom all move the p50, but on any "engine tier" hardware the p50 stays sub-microsecond. A throttled laptop on battery can drift past 1 µs. That is the laptop, not the engine, and the bench harness README documents how to spot it (debug build, low-power state, slow clock source). Tier A is the maintainer-measured production probe; tiers B–D are estimates pending community PRs.
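Two of the three pitfalls named above (debug build, slow clock source) can be checked from plain Rust before trusting a run. The sketch below is one possible pre-flight check, not the procedure from the harness README: the function names and the 100 ns warning threshold are assumptions chosen for illustration. Low-power state is platform-specific and is not covered.

```rust
use std::time::Instant;

/// True when compiled without optimizations (e.g. plain `cargo run`).
fn is_debug_build() -> bool {
    cfg!(debug_assertions)
}

/// Median cost in nanoseconds of two back-to-back `Instant::now()` calls.
/// A large value suggests a slow clock source, which would inflate
/// every sub-microsecond measurement. Returns 0 when `iters` is 0.
fn clock_overhead_ns(iters: usize) -> u64 {
    let mut deltas: Vec<u64> = Vec::with_capacity(iters);
    for _ in 0..iters {
        let a = Instant::now();
        let b = Instant::now();
        deltas.push(b.duration_since(a).as_nanos() as u64);
    }
    deltas.sort_unstable();
    deltas.get(deltas.len() / 2).copied().unwrap_or(0)
}

fn main() {
    if is_debug_build() {
        eprintln!("warning: debug build; rerun with --release before comparing numbers");
    }
    let overhead = clock_overhead_ns(10_000);
    println!("median clock read overhead: {} ns", overhead);
    // Hypothetical threshold: timing a ~300 ns operation with a clock
    // that costs more than ~100 ns per read is unlikely to be reliable.
    if overhead > 100 {
        eprintln!("warning: clock source looks slow for sub-microsecond timing");
    }
}
```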
| tier | hardware | config | p50 | p95 |
|---|---|---|---|---|
| A | Xeon Plat 8369B (Ice Lake) | Linux, pinned core, SCHED_FIFO, no turbo | 310 ns | 980 ns |
| B | Xeon Gold 6438 / EPYC 9354 | Linux, performance governor | 350–500 ns | 0.7–1.2 µs |
| C | Apple Silicon (M2 / M3 / M4) | macOS 14+, native arm64, plugged in | 250–450 ns | 0.6–1.1 µs |
| D | i7-12700K / Ryzen 7700X | Win11 high-perf or Linux perf gov | 400–650 ns | 0.8–1.4 µs |
The bench harness ships in the public OSS repo as a self-contained Rust crate, with no path-dependency on the engine source. Three steps from a fresh clone:
```shell
# 1. Clone the public OSS repo
git clone https://github.com/melaya-labs/melaya.git
cd melaya/benchmarks/engine

# 2. Run the criterion bench (~100k iterations, ~30 seconds)
cargo bench --bench state_ticker

# 3. Read the per-iteration CSV + summary
cat results/state_ticker_ns.csv | head
cat results/summary.json
```
For comparable numbers across machines, use the pinned Docker variant or the helper scripts shipped under scripts/. Both disable turbo, pin the bench to a specific core, and run under the performance governor.