Melaya — Build AI agents for any job. Agentic platform for research, ops, outreach, reporting — and the only one where agents can actually trade.

// 05 · Agentic framework

How fast is the pipeline runner?

This page benchmarks tool dispatch overhead, RAG retrieval latency, model-call wrapper cost, the HITL round-trip distribution, and step-to-step pipeline orchestration. Results are reproducible from a fresh clone via pytest benches/, the same convention as the engine bench.

Engine vs framework

This page measures the Python agentic framework: the runner that orchestrates pipeline steps, dispatches scoped tool calls, manages RAG retrieval, gates writes through HITL, and wraps model calls. For the in-house Rust trading engine (state-cache writes at 310 ns, full pipeline at 14 µs), see engine latency.

What the runner gives you

// governance, not just speed

The latencies below show that the runner is lean. The guarantees listed alongside them are what make it safe to run agents for real clients. Ten are measured on this page; the other two describe how the platform is built.

01
Scoped, governed tools
A crew only sees the tools you grant it. Ungranted tools never enter the model's schema, so permissions are the dispatch primitive, not an afterthought.
0.6 µs
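A minimal sketch of grant-scoped dispatch, assuming a simple in-memory registry. The names `Tool`, `REGISTRY`, `schema_for`, and `dispatch` are hypothetical illustrations, not the platform's API. The point it shows: the schema is built from the granted set, so an ungranted tool is absent from what the model sees rather than present and refused.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# Hypothetical registry of every tool the platform knows about.
REGISTRY = {
    "search_docs": Tool("search_docs", "Search the workflow's documents"),
    "send_email": Tool("send_email", "Send an email on the client's behalf"),
    "place_order": Tool("place_order", "Submit a trade order"),
}


def schema_for(grants: set[str]) -> list[dict]:
    """Build the model-facing tool schema from granted tools only."""
    return [
        {"name": tool.name, "description": tool.description}
        for name, tool in REGISTRY.items()
        if name in grants
    ]


def dispatch(grants: set[str], name: str) -> Tool:
    """Resolve a tool call; a tool outside the grant set is never returned."""
    if name not in grants or name not in REGISTRY:
        raise PermissionError(f"tool not granted: {name}")
    return REGISTRY[name]
```

With `grants = {"search_docs"}`, the schema contains one entry, and a call to `place_order` raises `PermissionError` even if the model somehow names it.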
02
Human-in-the-loop writes
Every write runs the enforcement gate (reactive watcher state, a per-cycle write cap, a per-tenant quota, and a cost cap), then waits for operator approval. Reads flow freely; writes are gated.
0.3 µs
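The gate-then-approve order described above can be sketched as follows. This is an illustration under stated assumptions: `GateState`, its field names, and the default limits are hypothetical, and operator approval is modelled as a plain callback.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateState:
    watcher_halted: bool = False   # reactive watcher state
    writes_this_cycle: int = 0
    write_cap_per_cycle: int = 3   # hypothetical limit
    tenant_quota_left: int = 100   # hypothetical limit
    spent_usd: float = 0.0
    cost_cap_usd: float = 5.0      # hypothetical limit


def enforcement_gate(state: GateState, est_cost_usd: float) -> Optional[str]:
    """Return a rejection reason, or None if the write may go to approval."""
    if state.watcher_halted:
        return "watcher_halted"
    if state.writes_this_cycle >= state.write_cap_per_cycle:
        return "write_cap"
    if state.tenant_quota_left <= 0:
        return "tenant_quota"
    if state.spent_usd + est_cost_usd > state.cost_cap_usd:
        return "cost_cap"
    return None


def gated_write(
    state: GateState, est_cost_usd: float, approve: Callable[[], bool]
) -> str:
    """Run the automatic checks first, then ask the operator."""
    reason = enforcement_gate(state, est_cost_usd)
    if reason is not None:
        return f"rejected:{reason}"
    if not approve():
        return "denied_by_operator"
    # Counters move only after the write is actually approved.
    state.writes_this_cycle += 1
    state.tenant_quota_left -= 1
    state.spent_usd += est_cost_usd
    return "written"
```

Running the cheap automatic checks before the approval step means an operator is only asked about writes that could actually go through.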
03
Per-workflow RAG
Each workflow gets its own isolated vector store with hybrid retrieval, so one client's documents never bleed into another's context.
0.28 ms
04
Bring-your-own-model
20+ providers behind one wrapper (Anthropic, OpenAI, Gemini, Mistral, DeepSeek, Qwen, plus local Ollama and LM Studio), with one consistent shape across all of them.
1.6 µs
05
Cost & token accounting
Every model call is priced against a per-model table and aggregated into a running USD total, so you can bill clients and cap spend per tenant.
0.4 µs
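As a rough sketch of pricing a call against a per-model table: the model names, prices, and the `CostLedger` class below are made up for illustration and do not reflect the platform's table or any provider's real rates. `Decimal` is used so the running total does not drift with float rounding.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens: (input, output).
PRICE_TABLE = {
    "model-a": (Decimal("3.00"), Decimal("15.00")),
    "model-b": (Decimal("0.50"), Decimal("1.50")),
}
MILLION = Decimal(1_000_000)


class CostLedger:
    """Running USD total for one tenant, with a spend cap."""

    def __init__(self, cap_usd: str) -> None:
        self.cap_usd = Decimal(cap_usd)
        self.total_usd = Decimal("0")

    def record(self, model: str, input_tokens: int, output_tokens: int) -> Decimal:
        """Price one call and add it to the running total."""
        price_in, price_out = PRICE_TABLE[model]
        cost = (price_in * input_tokens + price_out * output_tokens) / MILLION
        self.total_usd += cost
        return cost

    def over_cap(self) -> bool:
        return self.total_usd > self.cap_usd
```

For example, 1,000 input tokens and 200 output tokens on `model-a` cost (3.00 × 1,000 + 15.00 × 200) / 1,000,000 = 0.006 USD under these assumed prices.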
06
Full observability
An OpenTelemetry span per tool call, model invocation, and pipeline run, carrying cost, tokens, latency, and error reasons. Operate agents you can actually see.
0.3 µs
07
Static context assembly
The system prompt, granted knowledge docs, and tool schemas are packed into the context block each turn sends the model, kept separate from rolling history.
1.4 µs
08
Cross-run crew memory
A crew's working memory persists between runs and restores on the next one, so long-running agents keep their context across sessions.
53 µs
09
Agentic crews
Multi-persona crews (macro, technical, risk, execution) hand context from persona to persona, with a risk veto and reactive sidecars that can halt the chain mid-run.
1.2 µs
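One way to picture the handoff, veto, and sidecar halt is the sketch below. It assumes each persona is a plain function over a context dict; the persona bodies, the `veto` key, and `run_crew` are hypothetical and stand in for whatever the runner actually does.

```python
from typing import Callable, Iterable

Persona = Callable[[dict], dict]
Sidecar = Callable[[dict], bool]  # returns True to halt the chain


def macro(ctx: dict) -> dict:
    return {**ctx, "macro_view": "risk-on"}


def technical(ctx: dict) -> dict:
    return {**ctx, "signal": "long"}


def risk(ctx: dict) -> dict:
    # Hypothetical rule: veto when exposure exceeds the limit.
    if ctx.get("exposure", 0.0) > ctx.get("exposure_limit", 1.0):
        return {**ctx, "veto": "exposure above limit"}
    return ctx


def execution(ctx: dict) -> dict:
    return {**ctx, "action": f"submit {ctx['signal']}"}


def run_crew(
    ctx: dict, personas: Iterable[Persona], sidecars: Iterable[Sidecar] = ()
) -> dict:
    """Pass context persona to persona; stop on a veto or a sidecar halt."""
    sidecars = list(sidecars)
    for persona in personas:
        ctx = persona(ctx)
        if "veto" in ctx:
            return {**ctx, "halted_by": persona.__name__}
        for sidecar in sidecars:
            if sidecar(ctx):
                return {**ctx, "halted_by": sidecar.__name__}
    return ctx
```

Because the veto check runs after every persona, a risk veto stops the chain before the execution persona ever receives the context.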
10
Prompt-injection defense
Anything the agent reads from an untrusted source (retrieved documents, tool results, fetched web pages) is scanned for prompt-injection, jailbreak, and data-exfiltration patterns before the model can act on it. Each pattern carries a severity score, and the total decides the outcome. Safe text passes through. Mildly suspicious text is still passed but fenced off as data the model must not obey, and the event is logged. A clearly malicious signal, such as an attempt to leak a secret or hijack the conversation format, is dropped before the model ever sees it. How strict that cutoff is can be tuned per deployment.
17 µs
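The three outcomes (pass, fence, drop) can be sketched with a small severity-scored scanner. Everything specific here is an assumption for illustration: the three regex patterns, their scores, the `fence_at` and `drop_at` thresholds, and the `<untrusted_data>` wrapper are hypothetical, and a real pattern set would be far larger.

```python
import logging
import re
from typing import Optional, Tuple

log = logging.getLogger("injection_scan")

# Hypothetical (pattern, severity) pairs.
PATTERNS = [
    (re.compile(r"ignore (all )?(previous|prior) instructions", re.I), 3),
    (re.compile(r"you are now", re.I), 2),
    (re.compile(r"(reveal|print|send).{0,40}(api key|password|secret)", re.I), 10),
]


def scan(text: str, fence_at: int = 2, drop_at: int = 8) -> Tuple[str, Optional[str]]:
    """Score untrusted text and return (outcome, text to give the model)."""
    score = sum(severity for pattern, severity in PATTERNS if pattern.search(text))
    if score >= drop_at:
        # Clearly malicious: the model never sees it.
        return "drop", None
    if score >= fence_at:
        # Suspicious: pass it on, but marked as data, and log the event.
        log.warning("fenced untrusted text, score=%d", score)
        return "fence", "<untrusted_data>\n" + text + "\n</untrusted_data>"
    return "pass", text
```

Lowering `drop_at` makes a deployment stricter; raising it sends more borderline text through the fenced path instead of dropping it.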
11
Credential isolation
Per-user encrypted vaults. Agents act through short-lived tickets and never touch raw API keys, so a client's secrets stay scoped to that client.
AES-256
12
Multi-tenant by design
Project-scoped roles and per-pipeline state isolation. One tenant's crew cannot read, halt, or spend against another's. Run many clients on one platform.
RBAC