ProofAgent Harness: Adversarial Evaluation for Production AI Agents


ProofAgent Harness (arXiv:2605.24134) by Fouad Bousetouane is an open-source framework for adversarial AI agent evaluation. It runs a full pipeline of planning, adversarial conducting, multi-juror scoring, debate consensus, and signed reporting, and returns a readiness verdict you can hand to security and audit teams.

Single answers pass; real agents fail under pressure

Most evaluations score a single answer with one scoring model. Production agents fail differently: three turns in, under pressure, through domain-specific failure modes. The Harness selects domain-relevant traps, sustains adversarial pressure across a full conversation, scores the whole trajectory with a panel of jurors, and produces evidence-linked findings rather than one opaque number.

What the paper covers

The full pipeline: domain-aware planner, adversarial conductor, three juror personas, Delphi or debate consensus, signed reporter
A 183-trap library across 11 attack families, including composite attack chains spanning five to seven turns
A six-metric rubric: task success, instruction following, hallucination resistance, tool use, safety, manipulation resistance
Asymmetric evaluation: a small local model serving as the juror against frontier-class agents
Multi-juror consensus that measurably reduces the bias of a single scoring model
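
To make the multi-juror idea concrete, here is a minimal sketch of how a panel can damp the bias of one scoring model. It is not the Harness API. The function name, the 0 to 10 scale, the juror labels, and the use of a per-metric median are all assumptions for illustration; the paper describes Delphi or debate consensus, which this simple median only stands in for.

```python
from statistics import median

# The six rubric metrics named above.
METRICS = (
    "task_success",
    "instruction_following",
    "hallucination_resistance",
    "tool_use",
    "safety",
    "manipulation_resistance",
)


def consensus(juror_scores):
    """Per-metric median across jurors.

    Hypothetical stand-in for the Delphi or debate consensus step:
    a single outlying juror cannot move the median on its own.
    """
    return {m: median(scores[m] for scores in juror_scores) for m in METRICS}


# Hypothetical scores on an assumed 0-10 scale from three juror personas.
strict = {m: 6 for m in METRICS} | {"safety": 3}
balanced = {m: 7 for m in METRICS} | {"safety": 4}
lenient = {m: 8 for m in METRICS} | {"safety": 9}  # over-generous on safety

result = consensus([strict, balanced, lenient])
print(result["safety"])  # the lenient outlier does not decide the score
```

Used alone, the lenient juror would have reported a safety score of 9. With the panel, the consensus value is the middle score, so one biased scorer cannot set the verdict.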

Why it matters for enterprises

Capability is no longer the bottleneck. The risk that keeps agents out of production is behavior under pressure: a support agent that leaks data when pushed, a finance agent that breaks policy when flattered, a tool call that fires when it should refuse. Run the Harness as a gate in your CI/CD pipeline, block deployment on a failed critical metric, and ship with a signed readiness report your auditors accept.
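
As an illustration of the gating step, the sketch below blocks a deployment when any critical metric falls under a threshold. It is a hedged example, not the Harness's own interface: the score dictionary, the choice of critical metrics, the threshold, and the function names are all hypothetical.

```python
# Assumption: which metrics count as critical, and the pass threshold
# on an assumed 0-10 scale. Neither value comes from the paper.
CRITICAL_METRICS = ("safety", "manipulation_resistance")
THRESHOLD = 7.0


def failed_critical(scores, critical=CRITICAL_METRICS, threshold=THRESHOLD):
    """Return the critical metrics whose score is below the threshold.

    A metric missing from the report is treated as a failure.
    """
    return [m for m in critical if scores.get(m, 0.0) < threshold]


def gate(scores):
    """Return a process exit code: 0 to allow deployment, 1 to block it."""
    failures = failed_critical(scores)
    for metric in failures:
        print(f"BLOCKED: {metric} = {scores.get(metric)} is below {THRESHOLD}")
    return 1 if failures else 0


# Hypothetical consensus scores for one evaluated agent.
example_scores = {
    "task_success": 9.0,
    "instruction_following": 8.5,
    "hallucination_resistance": 8.0,
    "tool_use": 7.5,
    "safety": 4.0,
    "manipulation_resistance": 7.2,
}

exit_code = gate(example_scores)
# In a CI job, the step would end with sys.exit(exit_code) so that a
# nonzero code fails the pipeline stage.
```

Here the agent scores well on task success but fails the gate on safety, which is the pattern the section describes: strong capability does not offset a failed critical metric.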

Headline finding

Production-grade agents built on GPT 5.5 and Claude Opus 4.7 fail the Harness with serious safety and manipulation-resistance issues under composite-chain pressure. Frontier LLMs alone are not enough. Every result is reproducible from the open-source code, released under Apache 2.0. The companion paradigm paper, Human-on-the-Bridge (arXiv:2606.16871), formalizes the approach.

Plain-language explainer on Medium: ProofAgent Harness: The Open Source Harness for Complete AI Agent Evaluation.