tool_dispatch (0-arg) per call | measured | 0.60 µs · p95 0.60 µs · p99 1.10 µs · n=10000 | 2026-06-20 | Runner-side cost of dispatching a zero-arg scoped tool: name lookup and ToolResponse wrap. Excludes the tool's own work and any network time. Reproduce: pytest benches/bench_tool_dispatch.py::test_tool_dispatch_0arg -s |
tool_dispatch (5-arg) per call | measured | 0.70 µs · p95 0.70 µs · p99 0.90 µs · n=10000 | 2026-06-20 | Same path with a 5-element input dict, the median operator-registered shape. Reproduce: pytest benches/bench_tool_dispatch.py::test_tool_dispatch_5arg -s |
tool_dispatch (20-arg) per call | measured | 1.00 µs · p95 1.10 µs · p99 1.30 µs · n=10000 | 2026-06-20 | Wide-arg dispatch for long-tail tools such as melaya_create_order with all optional risk params filled in. Reproduce: pytest benches/bench_tool_dispatch.py::test_tool_dispatch_20arg -s |
pipeline_step_transition (linear) per step, 10-step chain | measured | 0.22 µs · p95 0.23 µs · p99 0.24 µs · n=2000 | 2026-06-20 | Time from one pipeline step completing to the next being invoked, in a linear chain. Pure runner overhead (graph walk + variable binding + await). Reproduce: pytest benches/bench_pipeline_orchestration.py::test_pipeline_linear -s |
pipeline_step_transition (parallel) per step, 10-step fanout | measured | 3.32 µs · p95 3.64 µs · p99 5.88 µs · n=2000 | 2026-06-20 | Same transition cost in a parallel fanout via asyncio.gather. Higher than linear here: at N=10, gather's scheduling setup dominates, and the per-step cost only drops below linear once steps block on real I/O. Reproduce: pytest benches/bench_pipeline_orchestration.py::test_pipeline_parallel -s |
registry_boot per cold boot · register-only | measured | 4.36 ms · p95 5.22 ms · p99 6.37 ms · n=30 | 2026-06-20 | The runtime walks its tool + crew modules at boot. The bench measures the introspect+register step on 250 synthetic tools (production adds Python import time on top; this number is register-only). Reproduce: pytest benches/bench_registry_boot.py -s |
rag_retrieve (10k chunks) per query, top-5 | measured | 281.10 µs · p95 447.10 µs · p99 782.10 µs · n=2000 | 2026-06-20 | embed(query) + brute-force kNN + chunk hydration over a 10k-chunk in-memory index. A production ANN index is 1.5-3× faster. Reproduce: pytest benches/bench_rag_retrieval.py::test_rag_retrieval_10k -s |
rag_retrieve (100k chunks) per query, top-5 | measured | 5.52 ms · p95 8.45 ms · p99 9.66 ms · n=2000 | 2026-06-20 | Same path, 10× larger corpus. Brute force is O(N·D), so linear scaling alone predicts ~10× growth over the 10k bench; the measured growth is ~20× (5.52 ms vs 281.10 µs). Reproduce: pytest benches/bench_rag_retrieval.py::test_rag_retrieval_100k -s |
model_wrapper_overhead per LLM turn (network mocked) | measured | 1.60 µs · p95 2.00 µs · p99 2.80 µs · n=1000 | 2026-06-20 | Runner overhead around a model API call: prompt assembly, message-history pack, post-response routing. The provider HTTP boundary is mocked to isolate runner cost from network time. Reproduce: pytest benches/bench_model_wrapper_overhead.py -s |
context_assembly per turn | measured | 1.40 µs · p95 1.60 µs · p99 1.80 µs · n=5000 | 2026-06-20 | Builds the static context block a turn sends the model: system prompt + granted knowledge docs + tool schemas. Distinct from rolling history (model_wrapper) and RAG retrieval. Reproduce: pytest benches/bench_context_assembly.py -s |
session_memory per save + load | measured | 53.00 µs · p95 76.30 µs · p99 113.30 µs · n=5000 | 2026-06-20 | Cross-run working-memory persistence: serialize a 50-turn crew memory to the session store and restore it on the next run. In-process store, so no DB latency is included. Reproduce: pytest benches/bench_session_memory.py::test_session_memory_roundtrip -s |
cost_tracking per model call | measured | 0.40 µs · p95 0.40 µs · p99 0.60 µs · n=10000 | 2026-06-20 | Records one model call's token usage against a price table and updates the running USD total plus per-model breakdown. This is what enables per-tenant billing and spend caps. Reproduce: pytest benches/bench_cost_tracking.py -s |
tracing_overhead per span | measured | 0.30 µs · p95 1.10 µs · p99 1.40 µs · n=10000 | 2026-06-20 | Per-span observability tax: open an OpenTelemetry-style span, stamp the gen_ai / cost / latency attributes, close, and hand to the exporter. This is what enabling tracing adds per traced operation. Reproduce: pytest benches/bench_tracing_overhead.py -s |
crew_orchestration per 4-persona run | measured | 1.20 µs · p95 2.00 µs · p99 2.10 µs · n=2000 | 2026-06-20 | A 4-persona crew (macro, technical, risk, execution) hands context persona to persona, with the risk persona armed to veto and halt the chain mid-run. Pure orchestration overhead. Reproduce: pytest benches/bench_crew_orchestration.py -s |
prompt_injection_scan per untrusted input | measured | 17.40 µs · p95 26.40 µs · p99 30.50 µs · n=10000 | 2026-06-20 | The prompt-injection scan run on untrusted content (RAG-retrieved docs, tool outputs) before it reaches the model: weighted pattern match against injection / jailbreak / exfiltration markers, then allow / flag / block. Wired into rag.py and the tool-output postprocess. Reproduce: pytest benches/bench_prompt_injection.py -s |
hitl_gate_overhead per write attempt | measured | 0.30 µs · p95 0.40 µs · p99 0.40 µs · n=10000 | 2026-06-20 | The synchronous safety checks run before every write is queued for approval: sidecar-state read (a reactive watcher that can halt a run), per-cycle write cap, per-tenant daily quota, running cost cap. This is the trading-grade-discipline machinery, measured separately from the human wait below. Reproduce: pytest benches/bench_hitl_gate_overhead.py -s |
hitl_approval_round_trip human-bound | method only | method documented | n/a | Time from 'approval requested' to 'approval received', median over real operator sessions. Dominated by human attention; cannot be benched synthetically. Methodology documented; awaiting a 30-day production telemetry cut. Reproduce: see results/hitl_round_trip/methodology_only.json |
concurrent_agent_executions platform limit | config | 50 | - | Configurable per-workspace cap on simultaneous agent runs (default 50); backpressure queues the rest. A deployment config knob, not a measurement. Reproduce: configured in deployment, not benched |
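
The rag_retrieve rows describe the path as embed(query) + brute-force kNN + chunk hydration. The sketch below shows that shape in plain Python so the O(N·D) cost is visible. Everything in it is an assumption made for illustration: the `Chunk` type, the toy character-bucket `embed`, and the `retrieve` function are hypothetical and are not the benchmarked implementation.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    text: str
    vector: list  # unit-normalized embedding (hypothetical layout)


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def embed(text, dim=8):
    """Toy stand-in for a real embedding model: character counts in `dim` buckets."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return normalize(vec)


def retrieve(query, index, k=5):
    """Brute-force top-k: one D-length dot product per chunk, so O(N*D) per query."""
    q = embed(query)
    scored = ((sum(a * b for a, b in zip(q, c.vector)), c) for c in index)
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    # "Hydration" here just means returning the stored text next to its score.
    return [(c.chunk_id, c.text, score) for score, c in top]


# Tiny in-memory index built from made-up chunk texts.
texts = ["order routing", "risk limits", "cold boot", "span export"]
index = [Chunk(i, t, embed(t)) for i, t in enumerate(texts)]
results = retrieve("risk limits", index, k=2)
```

Because every query touches every chunk, a 10× larger corpus costs at least 10× more arithmetic. That is the scaling an ANN index avoids.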
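
The cost_tracking row describes recording one model call's token usage against a price table and updating a running USD total plus a per-model breakdown. A minimal sketch of that bookkeeping follows. The `CostTracker` class, the model names, and the prices are hypothetical placeholders, and the spend-cap check is only one possible way to consume the running total.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical price table: USD per 1M tokens as (input, output). Values are made up.
PRICES = {
    "model-small": (Decimal("0.50"), Decimal("1.50")),
    "model-large": (Decimal("5.00"), Decimal("15.00")),
}
MILLION = Decimal(1_000_000)


class CostTracker:
    """Keeps a running USD total and a per-model breakdown for one tenant."""

    def __init__(self, prices, spend_cap_usd=None):
        self.prices = prices
        self.spend_cap_usd = spend_cap_usd
        self.total_usd = Decimal("0")
        self.by_model = defaultdict(lambda: Decimal("0"))

    def record(self, model, input_tokens, output_tokens):
        """Price one model call and fold it into the running totals."""
        in_price, out_price = self.prices[model]
        cost = (in_price * input_tokens + out_price * output_tokens) / MILLION
        self.total_usd += cost
        self.by_model[model] += cost
        return cost

    def over_cap(self):
        """True once the running total exceeds the configured spend cap."""
        return self.spend_cap_usd is not None and self.total_usd > self.spend_cap_usd


tracker = CostTracker(PRICES, spend_cap_usd=Decimal("0.01"))
first_cost = tracker.record("model-small", 1000, 1000)
capped_after_first = tracker.over_cap()
second_cost = tracker.record("model-large", 1000, 1000)
```

`Decimal` is used here so that repeated additions do not accumulate binary floating-point error in a billing total. A float-based tracker would be faster but would need rounding at the reporting boundary.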
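
The hitl_gate_overhead row lists four synchronous checks that run before a write is queued for approval: sidecar-state read, per-cycle write cap, per-tenant daily quota, and running cost cap. One way such a gate could look is sketched below. The `GateState` and `GateLimits` types, the `check_write` function, the limit values, the reason strings, and the order of the checks are all assumptions, not the benchmarked code.

```python
from dataclasses import dataclass


@dataclass
class GateState:
    sidecar_halted: bool = False   # would be set by the reactive watcher
    writes_this_cycle: int = 0
    tenant_writes_today: int = 0
    running_cost_usd: float = 0.0


@dataclass(frozen=True)
class GateLimits:
    # Hypothetical limits, chosen only for the example.
    max_writes_per_cycle: int = 3
    tenant_daily_quota: int = 100
    cost_cap_usd: float = 5.0


def check_write(state, limits):
    """Return (allowed, reason). Every check is an in-memory comparison."""
    if state.sidecar_halted:
        return False, "sidecar_halt"
    if state.writes_this_cycle >= limits.max_writes_per_cycle:
        return False, "cycle_write_cap"
    if state.tenant_writes_today >= limits.tenant_daily_quota:
        return False, "tenant_daily_quota"
    if state.running_cost_usd >= limits.cost_cap_usd:
        return False, "cost_cap"
    return True, "queued_for_approval"


limits = GateLimits()
ok = check_write(GateState(writes_this_cycle=1), limits)
halted = check_write(GateState(sidecar_halted=True), limits)
capped = check_write(GateState(writes_this_cycle=3), limits)
```

Checks of this kind are plain comparisons with no I/O, which fits a sub-microsecond per-attempt cost. The human approval wait happens only after the gate returns an allow.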