v1.0 — 140+ models, 17 providers

TheproductionharnessforAIagents.

Cost control, security, observability, memory, evaluation, and governance — Archon wraps around your LLM calls, not the other way around.

Start building View on GitHub

agent.pypython

1from archon import Agent, tool, Budget
2
3@tool
4def search(query: str) -> str:
5    """Search the web."""
6    return web_search(query)
7
8agent = Agent(
9    name="researcher",
10    tools=[search],
11    model="auto",
12    budget=Budget(max_per_run=0.50),
13)
14
15result = await agent.run("Why did SVB fail?")
16print(f"${result.cost:.4f} / {result.step_count} steps")

[ 01 ] — Capabilities

The 80% nobody wants to build twice.

Every agent framework solves the same 20% — call the LLM, run a tool, return an answer. Archon ships the rest.

Cost Control

Hard per-run, per-day, per-month budgets. Auto-route to the cheapest model that can handle the task. No surprise bills.

60–70% savings on typical workloads

Security

Default-deny policy engine. Subprocess sandbox for tool execution. Seven-category output sanitizer blocks prompt injection.

Zero trust, by default

Observability

Built-in trace store and dashboard. Every run returns cost, step count, and a trace URL. No 'add observability later' step.

SQLite, WAL mode, zero deps

Memory

Four tiers — working, episodic, semantic, procedural. Temporal decay with configurable half-life. Auto-consolidation.

Remember what matters, forget what doesn't

Evaluation

Inline schema checks, async quality scoring, regression detection. Shadow deployments to validate before promoting.

Continuous quality signal

Governance

Event sourcing for every action. RBAC for tool permissions. GDPR-compliant data export and right-to-erasure.

Audit trail that survives erasure

[ 02 ] — The five-gate pipeline

Every LLM call passes five gates.

Request comes in. It leaves with an answer, a cost, and a trace. What happens between is deterministic, observable, and safe.

Gate 0101/05

Policy Check

Is this tool allowed? Are the args within bounds? The default-deny engine evaluates every invocation before it runs.

< 1ms overhead

Gate 0202/05

Model Routing

Classify complexity from 27 lexical signals. Pick the cheapest model for the tier. Downgrade when budget tightens.

Sub-ms, no LLM call

Gate 0303/05

Execute

Call the LLM via LiteLLM (140+ models). Tool calls run in a sandboxed subprocess with a hard timeout.

Isolated. Timed. Traced.

Gate 0404/05

Validate Output

Schema check on structured output. Strip injection patterns across seven categories. Detect loops.

Seven injection categories

Gate 0505/05

Log Trace

Record model, tokens, cost, latency, tool calls, routing decision. Append to immutable audit log.

SQLite WAL, append-only

Request→5 Gates→AgentResult(output, cost, trace_url)

[ 03 ] — Model registry

One registry. Every model.

Pricing for every major model, used live by the router for cost-aware selection. Pin one, filter by budget, or let it pick.

Models in registry

Providers

Avg cost reduction

<0ms

Trace overhead

OpenAI

Anthropic

Google

xAI

DeepSeek

Stop rebuilding the 80%.

Install Archon, wrap your LLM calls, ship with budgets, observability, and governance from day one.

Open dashboard Read the docs

Apache-2.0•Rust + Python + TypeScript•128 tests