Inference management for agents

Your agent fails silently.
happyInference doesn’t let it.

A self-healing supervisor that watches another agent through its own traces, catches fabrications before they ship, repairs the root cause, and proves the fix on a held-out set — measured, never asserted.

GeminiGoogle ADKArize PhoenixPhoenix MCPA2ACloud Run
happyinference.ai/console
happyInference console mid-supervision — Agent X at 44% grounded, five fabricated DOIs struck in the glass box, root cause narrated in the stream happyInference console after the supervised fix — Agent X at 96% reliable, full pipeline green, run complete A real A2A supervision run against academic_coordinator — 27 citations watched, 2 fabrications caught by the Crossref oracle, and a regressing fix honestly auto-reverted
The same agent after one supervised fix — 96% reliable, every pipeline stage green, measured on a held-out set.
18
fabricated DOIs caught in a single 10-topic run — every one a silent failure, verified against Crossref
40%
measured held-out faithfulness of the weak agent before repair — the honest baseline, from run_history.jsonl
268
offline, deterministic tests across 42 files — all passing before every step landed
0
mock numbers in the console — every chart reads real run history, with honest empty states
The supervision loop

Watch → Catch → Diagnose → Repair → Prove

Observability platforms surface the problem and wait for a human to click approve. happyInference closes the loop autonomously — and proves the lift on data it never diagnosed on.

01

Watch

Every span of the supervised agent streams into Arize Phoenix via OpenInference auto-instrumentation.

sentinel/tracing.py
02

Catch

The redactor strikes unverifiable claims before output ships. Every DOI hits the Crossref oracle — fail-closed, so a plausible fake can’t pass.

sentinel/redactor.py
03

Diagnose

Root cause found by querying the agent’s own failing spans through the Phoenix MCP server — at runtime, not in a dashboard after the fact.

sentinel/diagnostician.py
04

Repair

GEPA-style reflective prompt optimization rewrites the failure away — no weight retraining — reinforced by Reflexion memory across runs.

sentinel/repairer.py
05

Prove

Faithfulness re-measured on a held-out set disjoint from the diagnosis batch. Regressing fixes are blocked — or auto-reverted over A2A.

sentinel/measure.py
Proof, not promises

The receipt from a real run

Recorded to data/run_history.jsonl on June 9, 2026 — a 10-topic supervision cycle against the deliberately weak research agent. Nothing below is invented; the judges can run the repo.

run 131b1f86e1f9 · worker · 10 topics 18 fabrications caught
  • 10.1038/s41574-020-00353-5Intermittent fasting & metabolic healthCrossref 404 · fabricated_source
  • 10.1145/3368015.3370139Remote work & developer productivityCrossref 404 · fabricated_source
  • 10.1126/sciadv.adg7224Four-day work weeks & employee outputCrossref 404 · fabricated_source
  • 10.1038/s41591-021-01645-0AI models detecting early-stage cancerCrossref 404 · fabricated_source
  • 10.1038/s41558-021-01114-3Carbon-capture cost per tonCrossref 404 · fabricated_source
18 caught · all silent20 reward rows3 preference pairs40% held-out faithfulness before repair
Diagnosis — from the agent’s own failing spans
“The agent’s prompt instructs it to reconstruct plausible DOIs if the exact one cannot be recalled… This encourages the agent to invent DOIs rather than admit uncertainty.”
The shipped repair
reconstruct a plausible DOI if the exact
one cannot be recalled
If you are not highly confident that a DOI
is real and published, you MUST leave the
source field empty. Never guess, approximate,
or fabricate a DOI.
Gate verdict
decision · approved status · shipped regression gate · clear
Glass box, not black box

Watch it think,
in plain English

  • Live narration. Every step of the loop is narrated as it happens — what it’s doing to the inference, and why, with the real number.
  • Claims struck live. Fabricated citations appear struck-through in the glass box the moment the oracle returns a 404.
  • Human in the loop, optionally. An approval gate pauses every fix for Approve / Reject before it is taught over A2A — toggle it off and the loop runs fully autonomous.
approval gate · SSE receipts
happyInference console on a real run — the approval gate showing the exact policy being approved, and the raw SSE receipt JSON for the observe step
Bring your own agent

Supervises agents it doesn’t own

Two real seams into any production agent — whether it cooperates or not. The demo Worker is just the reproducible failure source; the verifier is pluggable.

A2A · cooperates

Point it at an Agent Card

Connect any standards-compliant A2A agent by URL. happyInference discovers its card, observes, verifies, repairs, teaches — and proves the lift on a disjoint proof set.

# discovery at /.well-known/agent-card.json
GET /api/collaborate?agent_url=http://localhost:8010
JSON-RPC message/send · bounded tasks/get polling · auto-revert advisories
Gateway · drop-in

Or route its base URL through the gateway

For agents whose internals you can’t touch. No SDK, no code change beyond one environment variable — every response verified, the latest adopted advisory injected.

export OPENAI_BASE_URL=\
  https://gateway.happyinference.ai/v1
export X_HAPPYINFERENCE_AGENT=acme-support
non-blocking · per-agent policy via X-Sentinel-Agent
happyinference.ai/agents
Agents under supervision — the fleet view with live reliability per agent: Research Assistant at 100%, an A2A remote at 33% alerting, Agent X at 56%, academic_coordinator at 81% needs-watch
The fleet — every agent registered once over A2A, then supervised from the canvas: live reliability, catches, and drift alerts per agent.
Built on

The stack, used for real

Built for the Google Cloud Rapid Agent Hackathon, Arize track — observability data isn’t reviewed after the fact, it is the input to the repair.

Gemini
Brain for both agents — and the verification-aware reward model in the RL layer.
Google ADK
Worker + supervisor as LlmAgents; the Worker served over A2A with one to_a2a() call.
Arize Phoenix
Every span traced via OpenInference; faithfulness scored with Gemini LLM-as-a-judge evals.
Phoenix MCP
The channel for reading the supervised agent’s failing spans at runtime — the diagnosis input.
A2A protocol
Real Agent Card discovery, JSON-RPC messaging, and revert advisories across the network boundary.
Cloud Run
Hosts the FastAPI + SSE backend and the mission-control console.
Crossref
The ungameable external oracle — a fabricated DOI can’t sweet-talk an HTTP 404.
Prompt-space DPO
Textual-gradient preference optimization; reward rows and pairs exported per run.

Reliability as infrastructure,
not as a dashboard.

Connect an agent and watch a full supervision cycle — catch, diagnose, repair, and the honest before/after.

Open the live console