🪖 Trooper

Trooper 4.0 — Chat UI with escalate. Local first. Cloud when needed. One click to switch. Context never lost.

Trooper started as a fallback proxy. It's now a local-first AI workspace.

Your agent runs. Trooper watches. You chat. You escalate. Context flows. Nothing breaks.

→ Any agent          → point at Trooper, open dashboard, see everything
→ Chat with Llama    → fast, free, private by default
→ Need more power    → one click escalates to Claude with full context
→ Back to local      → Llama knows everything Claude said
→ Claude fails       → continues on Ollama, context preserved
→ Simple prompts     → never hit the cloud

Trooper is a zero-instrumentation agent observability platform with local fallback — and now a local-first chat interface.

What's new in 4.0

Chat UI — open http://localhost:3000/chat:

Local first — all messages route to your local Ollama model by default
One-click escalate — stuck on something? Escalate to Claude with full session context injected automatically
Context never lost — back to local, Llama knows exactly what Claude said
SITREP panel — live intent, confidence, entities, open loops extracted from your conversation
Markdown rendering — code blocks, tables, headers render cleanly
Session memory — full conversation history flows between providers

What you see

In the chat UI — open http://localhost:3000/chat:

Local first. Escalate when stuck. Context never lost.

Dev: explain why my postgres query is slow despite an index on user_id

Llama: Even with an index on user_id, your query might be slow if filtering
       on other columns, or using functions like LIKE which prevent index usage.

[↑ Escalate]

— escalated to Claude —

Claude: Common reasons your Postgres query is slow despite a user_id index:
        1. Index not being used — functions, type casts
        2. Filtering on other columns — unindexed columns cause row scans
        3. Returning large result sets — add LIMIT or filter more
        Run EXPLAIN ANALYZE to see which step is slow.

[↩ Back to local]

— returned to local —

Dev: summarize what Claude just recommended

Llama: Your query might be slow due to functions preventing index usage, or filtering on unindexed columns. Run EXPLAIN ANALYZE to identify which step is slow and whether the index is being used.


![Trooper Memory Proof](assets/memory.png)

**In the dashboard** — open `http://localhost:3000/dashboard` while your agent runs:

- **Intent** — what your agent is trying to do, extracted automatically
- **Open Loops** — what it's stuck on, highlighted in real time
- **Completed Steps** — what it finished, tracked as it happens
- **Session Transcript** — every message, colour coded by role

**In every response header** — no dashboards required:

```bash
# Simple question → Ollama handled it, cloud never contacted
X-Trooper-Provider: ollama
X-Trooper-Decision: ollama (simple turn) | cloud skipped
X-Trooper-Session-Saved: 42 tokens

# Complex question → Claude handled it
X-Trooper-Provider: claude
X-Trooper-Summary: claude (direct) ✓

# Claude quota hit → fell back to Ollama, context preserved
X-Trooper-Provider: ollama
X-Trooper-Decision: ollama (fallback: credit_balance)
X-Trooper-Session-Saved: 42 tokens
```

What Trooper is

Trooper is a drop-in proxy that sits between your agent (or you) and any LLM provider. It observes every request, extracts intent and signals, and builds a live picture of what's happening — all without touching your code.

When cloud models fail — quota, rate limits, outages — it automatically falls back to your local Ollama instance while preserving full conversation context.

Trooper is no longer passive. It started as a fallback proxy. Now it watches every session, makes that data visible, and gives you a local-first chat interface to work with.

No retry logic in your code. No crashes. No lost sessions. No SDK. No instrumentation. ⏱ Runs in under 60 seconds.

Who uses Trooper

Local LLM developers — run Ollama for privacy and cost, escalate to Claude when you need more. Context flows automatically. No copy-pasting, no re-explaining.

Agent builders — see exactly what your agent is doing, what it's stuck on, and what it completed. Zero instrumentation — just point your agent at Trooper.

App developers: your users never see quota errors. Trooper fails over to local Ollama transparently while your app keeps running.

Claude Code / Cursor users — coding sessions survive quota hits. No lost context, no starting over.

Privacy-conscious developers — sensitive requests stay local. Cloud only when you choose.

Why not LiteLLM, Bifrost, or Helicone

| | LiteLLM / Bifrost | Helicone | Trooper |
|---|---|---|---|
| Chat UI | ❌ | ❌ | ✅ Local-first with escalate |
| Context handover | ❌ | ❌ | ✅ Full context on provider switch |
| Observability | ❌ | Request-level only | ✅ Intent, open loops, completed steps |
| Instrumentation needed | SDK required | None | None |
| Fallback target | Another cloud | Another cloud | Your local machine |
| Local / private | ❌ | ❌ Cloud only | ✅ Data never leaves machine |
| Setup | `pip install`, YAML | API key, cloud account | One Go binary, env vars |
| Status | Active | Maintenance mode | Active |

Quickstart

⏱ Runs in under 60 seconds.

Option 1 — Docker (no Go required)

git clone https://github.com/shouvik12/trooper
cd trooper
cp .env.example .env
# edit .env — set CLAUDE_API_KEY
docker compose up

# First run: pull the model
docker compose exec ollama ollama pull llama3.1:8b

Option 2 — Run from source (Go 1.22+)

git clone https://github.com/shouvik12/trooper
cd trooper
export CLAUDE_API_KEY=sk-ant-...   # optional — works without it
export OLLAMA_MODEL=llama3.1:8b
go run .

Open the chat UI: http://localhost:3000/chat

Open the dashboard: http://localhost:3000/dashboard

Chat UI

open http://localhost:3000/chat

How it works:

Messages default to your local Ollama model — free, private, fast
Not satisfied? Click ↑ Escalate — Claude receives full session context automatically
Claude answers. Click ↩ Back to local — Llama picks up with full context of what Claude said
SITREP panel updates live — intent, confidence, entities, open loops

No copy-pasting. No re-explaining. Context never lost.
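
The escalate and back-to-local flow can be pictured as one shared history that every provider receives in full. The sketch below is a hypothetical illustration, not Trooper's implementation: `ChatSession`, the provider callables, and the message dictionaries are all assumed names, and the real proxy also compacts context before forwarding it.

```python
class ChatSession:
    """Hypothetical session that keeps one history across provider switches."""

    def __init__(self, providers):
        # providers: name -> callable(history) -> reply text (stand-ins for real models)
        self.providers = providers
        self.history = []
        self.active = "ollama"  # local first

    def escalate(self):
        self.active = "claude"

    def back_to_local(self):
        self.active = "ollama"

    def send(self, text):
        self.history.append({"role": "user", "content": text})
        # The active provider always sees the whole conversation so far.
        reply = self.providers[self.active](list(self.history))
        self.history.append(
            {"role": "assistant", "content": reply, "provider": self.active}
        )
        return reply


# Fake providers that only report how much context they received.
fakes = {
    "ollama": lambda history: f"ollama saw {len(history)} messages",
    "claude": lambda history: f"claude saw {len(history)} messages",
}
session = ChatSession(fakes)
first = session.send("why is my query slow?")
session.escalate()
second = session.send("go deeper")
session.back_to_local()
third = session.send("summarize what Claude said")
print(first, second, third, sep="\n")
```

Because the history lives in the session rather than in either provider, switching models does not require re-sending or re-explaining anything.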

Agent usage

Point your existing client at Trooper — nothing else changes:

Python + Anthropic SDK:

import anthropic
client = anthropic.Anthropic(
    api_key="your-key",
    base_url="http://localhost:3000",  # only change
)

Python + OpenAI SDK:

from openai import OpenAI
client = OpenAI(
    api_key="your-key",
    base_url="http://localhost:3000",  # only change
)

curl:

curl http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: my-session" \
  -d '{"model": "claude-haiku-4-5", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello!"}]}'
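
Whichever client you use, the `X-Trooper-*` response headers shown earlier tell you which provider answered. The helper below is a minimal sketch for reading them; the function name and the returned dictionary shape are assumptions made for illustration, and only the header names and sample values come from this README.

```python
def summarize_trooper_headers(headers):
    """Summarize Trooper's routing headers from any header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    provider = lowered.get("x-trooper-provider", "unknown")
    # The README shows either a Decision or a Summary header, depending on the path.
    decision = lowered.get("x-trooper-decision") or lowered.get("x-trooper-summary", "")
    return {
        "provider": provider,
        "decision": decision,
        "fell_back": "fallback" in decision,
    }


example = {
    "X-Trooper-Provider": "ollama",
    "X-Trooper-Decision": "ollama (fallback: credit_balance)",
    "X-Trooper-Session-Saved": "42 tokens",
}
print(summarize_trooper_headers(example))
```

With the `requests` library, for instance, this could be called as `summarize_trooper_headers(response.headers)`.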

Smart routing

Trooper decides when the cloud is overkill.

The classifier is rule-based and deterministic — no LLM call, no latency, no cost to classify.

"how many days in a week"  →  Ollama directly 🪖  (cloud never contacted)
"explain why goroutines…"  →  Claude ✅           (needs reasoning)

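To make "rule-based and deterministic" concrete, here is a small sketch of what such a classifier could look like. The keyword list and word-count threshold are invented for illustration; Trooper's actual rules live in `classifier.go` and may differ.

```python
import re

# Hypothetical signals, chosen only for this example.
REASONING_HINTS = {"explain", "why", "debug", "compare", "design", "refactor"}
MAX_SIMPLE_WORDS = 12


def classify(prompt: str) -> str:
    """Return "ollama" for simple turns and "cloud" for everything else."""
    words = re.findall(r"[a-z0-9']+", prompt.lower())
    if "```" in prompt:                  # pasted code usually needs more reasoning
        return "cloud"
    if len(words) > MAX_SIMPLE_WORDS:    # long prompts are treated as complex
        return "cloud"
    if REASONING_HINTS.intersection(words):
        return "cloud"
    return "ollama"


print(classify("how many days in a week"))
print(classify("explain why goroutines…"))
```

The same input always produces the same route, and no model call is needed to decide, which is what keeps classification free and instant.
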
How Trooper handles context

The hard part of fallback isn't switching models — it's keeping context.

Trooper solves that with a 3-layer compaction system:

ANCHOR  (~10%)  — First 2 turns verbatim, never dropped
SITREP  (~20%)  — Rule-based summary of middle turns
TAIL    (~70%)  — Last N turns verbatim
                  Total <= 6144 tokens (configurable)
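
The three layers can be sketched in a few lines. This is an illustrative approximation only: the 4-characters-per-token estimate, the first-sentence summary rule, and the function names are assumptions, not Trooper's actual compaction logic.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def compact(turns, budget=6144, anchor_turns=2):
    """Split turns into (ANCHOR, SITREP, TAIL) within a token budget."""
    anchor = turns[:anchor_turns]           # kept verbatim, never dropped
    rest = turns[anchor_turns:]
    used = sum(estimate_tokens(t) for t in anchor)

    sitrep_budget = int(budget * 0.20)      # reserve about 20% for the summary
    tail_budget = budget - used - sitrep_budget

    # TAIL: walk backwards, keeping the most recent turns that still fit.
    tail = []
    for turn in reversed(rest):
        cost = estimate_tokens(turn)
        if cost > tail_budget:
            break
        tail.insert(0, turn)
        tail_budget -= cost

    # SITREP: placeholder rule-based summary of whatever did not fit.
    middle = rest[: len(rest) - len(tail)]
    sitrep = ""
    if middle:
        firsts = [t.split(".")[0].strip() for t in middle]
        sitrep = ("SITREP: " + "; ".join(firsts))[: sitrep_budget * 4]
    return anchor, sitrep, tail


turns = [f"turn {i}. " + "x" * 400 for i in range(40)]
anchor, sitrep, tail = compact(turns, budget=1000)
print(len(anchor), len(tail), sitrep[:30])
```

The point of the layout is that the opening instructions and the most recent exchange survive word for word, while only the middle of a long session is compressed.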

Provider chain

CLAUDE_API_KEY=sk-ant-...                          # Chain: Claude → Ollama
CLAUDE_API_KEY=sk-ant-...  GEMINI_API_KEY=AIza...  # Chain: Claude → Gemini → Ollama
CLAUDE_API_KEY=sk-ant-...  OPENAI_API_KEY=sk-...   # Chain: Claude → OpenAI → Ollama
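
Read as a rule, the chain is "every cloud provider with a key set, then Ollama". A minimal sketch of that rule follows; `build_chain` is a hypothetical helper, and the ordering when both `GEMINI_API_KEY` and `OPENAI_API_KEY` are set is an assumption, since the examples above do not cover that case.

```python
def build_chain(env):
    """Return the provider order implied by which API keys are set."""
    chain = []
    if env.get("CLAUDE_API_KEY"):
        chain.append("claude")
    if env.get("GEMINI_API_KEY"):
        chain.append("gemini")
    if env.get("OPENAI_API_KEY"):   # assumed to come after Gemini if both are set
        chain.append("openai")
    chain.append("ollama")          # the local model is always the last resort
    return chain


print(build_chain({"CLAUDE_API_KEY": "placeholder"}))
print(build_chain({"CLAUDE_API_KEY": "placeholder", "GEMINI_API_KEY": "placeholder"}))
print(build_chain({}))
```

In a real process you would pass `os.environ` instead of a literal dictionary.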

Fallback behaviour

| Status | Trooper action |
|---|---|
| `200 OK` | Pass through |
| `429 Rate Limited` | Retry with 2s backoff, then try the next provider |
| `402 Payment Required` | Fall back immediately |
| `400 Credit Balance / Invalid Key` | Fall back immediately |
| `401 Unauthorized` | Surface error; bad keys are never masked |
| `529 Overloaded` | Fall back immediately |
| Network error | Fall back immediately (30s timeout per provider) |
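
The table above translates directly into a small decision function. The sketch below mirrors it row by row; the `Action` enum and `decide` are hypothetical names, and the handling of status codes the table does not list is an assumption.

```python
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "pass through"
    RETRY_THEN_NEXT = "retry with 2s backoff, then next provider"
    FALL_BACK = "fall back immediately"
    SURFACE_ERROR = "surface error"


def decide(status=None, network_error=False):
    """Map an upstream result to the action listed in the fallback table."""
    if network_error:
        return Action.FALL_BACK
    if status == 200:
        return Action.PASS_THROUGH
    if status == 429:
        return Action.RETRY_THEN_NEXT
    if status == 401:
        return Action.SURFACE_ERROR   # bad keys are never masked
    if status in (400, 402, 529):     # 400 here means credit balance / invalid key
        return Action.FALL_BACK
    # Not covered by the table; surfacing the error is an assumption.
    return Action.SURFACE_ERROR


for code in (200, 429, 402, 401, 529):
    print(code, "->", decide(code).value)
```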

Configuration

| Variable | Default | Description |
|---|---|---|
| `CLAUDE_API_KEY` | (none) | Anthropic API key |
| `CLAUDE_MODEL` | `claude-haiku-4-5-20251001` | Default Claude model |
| `GEMINI_API_KEY` | (none) | Google Gemini API key |
| `OPENAI_API_KEY` | (none) | OpenAI API key |
| `OLLAMA_MODEL` | `llama3.1:8b` | Local model |
| `FALLBACK_URL` | `http://localhost:11434/api/chat` | Ollama endpoint |
| `CONTEXT_WINDOW` | `6144` | Token budget for context compaction |
| `QUOTA_STATUS_CODES` | `429,402,529,400,404` | HTTP codes that trigger fallback |
| `TROOPER_PORT` | `3000` | Port Trooper listens on |
| `TROOPER_BIND` | `127.0.0.1` | Bind address |
| `AUTO_RECOVERY` | `false` | Enable automatic recovery to primary provider |
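
If you generate or validate a `.env` file from a script, the defaults above can be captured in one place. This is a sketch only: `TrooperConfig` and `load_config` are hypothetical names, and the parsing rules (comma-separated codes, `true`/`false` strings) are assumptions based on the default values shown in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrooperConfig:
    claude_model: str
    ollama_model: str
    fallback_url: str
    context_window: int
    quota_status_codes: frozenset
    port: int
    bind: str
    auto_recovery: bool


def load_config(env):
    """Build a config from an environment mapping, using the documented defaults."""
    codes = env.get("QUOTA_STATUS_CODES", "429,402,529,400,404")
    return TrooperConfig(
        claude_model=env.get("CLAUDE_MODEL", "claude-haiku-4-5-20251001"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3.1:8b"),
        fallback_url=env.get("FALLBACK_URL", "http://localhost:11434/api/chat"),
        context_window=int(env.get("CONTEXT_WINDOW", "6144")),
        quota_status_codes=frozenset(int(c) for c in codes.split(",") if c.strip()),
        port=int(env.get("TROOPER_PORT", "3000")),
        bind=env.get("TROOPER_BIND", "127.0.0.1"),
        auto_recovery=env.get("AUTO_RECOVERY", "false").lower() == "true",
    )


cfg = load_config({"TROOPER_PORT": "8080"})
print(cfg.port, cfg.ollama_model, sorted(cfg.quota_status_codes))
```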

Recommended local models

| Model | Size | Notes |
|---|---|---|
| `llama3.1:8b` | 4.9GB | Default; strong all-rounder |
| `qwen2.5:3b` | 1.9GB | Fast, lightweight |
| `qwen2.5:7b` | 4.7GB | Better quality, still fast |
| `mistral:7b` | 4.1GB | Good reasoning |

Running tests

go test ./... -v
./sanity.sh

Roadmap

V4.0 — Released

✅ Chat UI — localhost:3000/chat
✅ Local-first routing — Ollama by default, Claude on demand
✅ One-click escalate with full context injection
✅ Back to local — Llama knows what Claude said
✅ SITREP panel — live intent, confidence, entities, open loops
✅ Markdown rendering — code blocks, tables, headers
✅ Session memory across provider switches
✅ Provider-aware context store

V3.3 — Released

✅ Live dashboard — localhost:3000/dashboard
✅ Sessions endpoint — localhost:3000/sessions
✅ Zero instrumentation agent observability

V3.2 — Released

✅ Subagent recovery — /recovery/{session_id}
✅ Response normalization

V3.1 — Released

✅ Smart routing — simple turns skip the cloud
✅ Deterministic classifier — zero latency to route

V3.0 — Released

✅ Circuit breaker
✅ X-Trooper headers

Recognition

Featured in Agent Brief by agentcommunity.org — curated alongside Anthropic, Shopify MCP, and LangGraph updates (April 2026)
Featured on @github_unpacked — Instagram reel with 76 saves
Featured on PatentLLM — covered alongside Qwen3.6-27B RTX 3090 local inference story (May 2026)
Featured on dev.to — local AI tooling roundup (May 2026)
Cited by kylebrodeur as inspiration for "robust, transparent HTTP rate-limit fallback triggers"
Listed on UND-RDR — underrated GitHub repo discovery index

License

MIT
