v1.0 — 140+ models, 17 providers

TheproductionharnessforAIagents.

Cost control, security, observability, memory, evaluation, and governance — Archon wraps around your LLM calls, not the other way around.

agent.pypython
1from archon import Agent, tool, Budget
2
3@tool
4def search(query: str) -> str:
5 """Search the web."""
6 return web_search(query)
7
8agent = Agent(
9 name="researcher",
10 tools=[search],
11 model="auto",
12 budget=Budget(max_per_run=0.50),
13)
14
15result = await agent.run("Why did SVB fail?")
16print(f"${result.cost:.4f} / {result.step_count} steps")
[ 01 ] — Capabilities

The 80% nobody wants to build twice.

Every agent framework solves the same 20% — call the LLM, run a tool, return an answer. Archon ships the rest.

Cost Control

Hard per-run, per-day, per-month budgets. Auto-route to the cheapest model that can handle the task. No surprise bills.

60–70% savings on typical workloads

Security

Default-deny policy engine. Subprocess sandbox for tool execution. Seven-category output sanitizer blocks prompt injection.

Zero trust, by default

Observability

Built-in trace store and dashboard. Every run returns cost, step count, and a trace URL. No 'add observability later' step.

SQLite, WAL mode, zero deps

Memory

Four tiers — working, episodic, semantic, procedural. Temporal decay with configurable half-life. Auto-consolidation.

Remember what matters, forget what doesn't

Evaluation

Inline schema checks, async quality scoring, regression detection. Shadow deployments to validate before promoting.

Continuous quality signal

Governance

Event sourcing for every action. RBAC for tool permissions. GDPR-compliant data export and right-to-erasure.

Audit trail that survives erasure
[ 02 ] — The five-gate pipeline

Every LLM call passes five gates.

Request comes in. It leaves with an answer, a cost, and a trace. What happens between is deterministic, observable, and safe.

Gate 0101/05

Policy Check

Is this tool allowed? Are the args within bounds? The default-deny engine evaluates every invocation before it runs.

< 1ms overhead
Gate 0202/05

Model Routing

Classify complexity from 27 lexical signals. Pick the cheapest model for the tier. Downgrade when budget tightens.

Sub-ms, no LLM call
Gate 0303/05

Execute

Call the LLM via LiteLLM (140+ models). Tool calls run in a sandboxed subprocess with a hard timeout.

Isolated. Timed. Traced.
Gate 0404/05

Validate Output

Schema check on structured output. Strip injection patterns across seven categories. Detect loops.

Seven injection categories
Gate 0505/05

Log Trace

Record model, tokens, cost, latency, tool calls, routing decision. Append to immutable audit log.

SQLite WAL, append-only
Request5 GatesAgentResult(output, cost, trace_url)
[ 03 ] — Model registry

One registry. Every model.

Pricing for every major model, used live by the router for cost-aware selection. Pin one, filter by budget, or let it pick.

0+
Models in registry
0
Providers
0%
Avg cost reduction
<0ms
Trace overhead
OpenAI
Anthropic
Google
xAI
DeepSeek
Meta
Mistral
Alibaba
Cohere
AI21
Microsoft
Moonshot
Amazon
Perplexity
Zhipu
Groq
Together AI
Fireworks
Cerebras
OpenRouter
OpenAI
Anthropic
Google
xAI
DeepSeek
Meta
Mistral
Alibaba
Cohere
AI21
Microsoft
Moonshot
Amazon
Perplexity
Zhipu
Groq
Together AI
Fireworks
Cerebras
OpenRouter
Simple
Short queries, formatting, yes/no
Gemini 2.5 Flash · GPT-4.1 Nano
$0.10 – $0.50
60%
Standard
Reasoning, code gen, analysis
Claude Sonnet 4.6 · GPT-4.1 Mini
$1 – $3
25%
Complex
Multi-step reasoning, architecture
Claude Opus 4.6 · o4-mini
$5 – $25
15%

Typical 60/25/15 split saves 60–70% vs sending everything to a frontier model.

[ 04 ] — Start

Stop rebuilding the 80%.

Install Archon, wrap your LLM calls, ship with budgets, observability, and governance from day one.

Apache-2.0Rust + Python + TypeScript128 tests