The most rigorous, research-backed, multi-turn adversarial AI agent testing tool. Apache 2.0 open-source. Bring your own LLM. Runs locally, in CI/CD, or in production. 183 bundled traps across 11 attack families. Trusted by engineering teams shipping production-grade AI agents. Full methodology documented in the published whitepaper (arXiv:2605.24134).
The Harness runs a 5-stage pipeline against any callable AI agent. A planner picks domain-relevant traps from the 183-trap library, a conductor runs N adversarial turns with realistic attacks, three juror personas independently score the transcript across 5 canonical metrics, consensus resolves disagreements through Delphi or debate rounds, and a reporter produces a signed readiness verdict with transcript-linked findings.
Single-shot evaluation misses how production AI agents actually fail. The Harness ships composite attack chains — 5 to 7 turn adversarial sequences that blend authority pressure, urgency framing, sympathy appeals, refusal-as-betrayal pivots, and policy gaslighting. Real adversaries don't use "ignore previous instructions"; they apply sustained pressure across many turns. The Harness models that.
Install with pip install proofagent-harness. Wrap your existing AI agent as a callable that takes a string and returns a string (or an AgentResponse with tools_called and retrievals for deeper scoring). Pass it to Harness.evaluate() with your system prompt, tools, and knowledge corpus. Run locally with any LiteLLM-supported model — Anthropic, OpenAI, Gemini, Bedrock, Ollama, vLLM, lm-studio.