The most rigorous, research-backed, multi-turn adversarial AI agent testing tool. Apache 2.0 open-source. Bring your own LLM. Runs locally, in CI/CD, or in production. 183 bundled traps across 11 attack families. Trusted by engineering teams shipping production-grade AI agents. Grounded in two published papers — the core whitepaper (arXiv:2605.24134) and the companion paper Human-on-the-Bridge (arXiv:2606.16871).
The Harness runs a 5-stage pipeline against any callable AI agent. A planner picks domain-relevant traps from the 183-trap library, a conductor runs N adversarial turns with realistic attacks, three juror personas independently score the transcript across 6 canonical metrics — task success, instruction following, hallucination resistance, tool use, safety, and manipulation resistance — consensus resolves disagreements through Delphi or debate rounds, and a reporter produces a signed readiness verdict with transcript-linked findings.
Single-shot evaluation misses how production AI agents actually fail. The Harness ships composite attack chains — 5 to 7 turn adversarial sequences that blend authority pressure, urgency framing, sympathy appeals, refusal-as-betrayal pivots, and policy gaslighting. Real adversaries don't use "ignore previous instructions"; they apply sustained pressure across many turns. The Harness models that.
ProofAgent Harness is grounded in two published papers. The core whitepaper, "ProofAgent Harness — Adversarial multi-turn evaluation for production AI agents" (arXiv:2605.24134), documents the full 5-stage pipeline, the 6-metric rubric, the 183-trap library with composite attack chains, asymmetric Harness-LLM evaluation, and the headline finding that production-grade agents on top of frontier LLMs (GPT 5.5, Claude Opus 4.7) fail under sustained adversarial pressure. The companion paper, "Human-on-the-Bridge — Scalable evaluation for AI agents" (arXiv:2606.16871), formalizes the paradigm: curate human expertise once, upstream, into reusable evaluation intelligence, then let small evaluator models stress-test frontier-grade agents at scale. Both are open-methodology and reproducible from the Apache 2.0 open-source code.
Install with pip install proofagent-harness. Wrap your existing AI agent as a callable that takes a string and returns a string (or an AgentResponse with tools_called and retrievals for deeper scoring). Pass it to Harness.evaluate() with your system prompt, tools, and knowledge corpus. Run locally with any LiteLLM-supported model — Anthropic, OpenAI, Gemini, Bedrock, Ollama, vLLM, lm-studio.