ProofAgent Harness — the best open-source AI agent evaluation framework

The most rigorous, research-backed, multi-turn adversarial AI agent testing tool. Apache 2.0 open-source. Bring your own LLM. Runs locally, in CI/CD, or in production. 183 bundled traps across 11 attack families. Trusted by engineering teams shipping production-grade AI agents. Full methodology documented in the published whitepaper (arXiv:2605.24134).

What the Harness does

The Harness runs a 5-stage pipeline against any callable AI agent. A planner picks domain-relevant traps from the 183-trap library, a conductor runs N adversarial turns with realistic attacks, three juror personas independently score the transcript across 5 canonical metrics, consensus resolves disagreements through Delphi or debate rounds, and a reporter produces a signed readiness verdict with transcript-linked findings.

Why multi-turn adversarial matters

Single-shot evaluation misses how production AI agents actually fail. The Harness ships composite attack chains — 5 to 7 turn adversarial sequences that blend authority pressure, urgency framing, sympathy appeals, refusal-as-betrayal pivots, and policy gaslighting. Real adversaries don't use "ignore previous instructions"; they apply sustained pressure across many turns. The Harness models that.

Quickstart

Install with pip install proofagent-harness. Wrap your existing AI agent as a callable that takes a string and returns a string (or an AgentResponse with tools_called and retrievals for deeper scoring). Pass it to Harness.evaluate() with your system prompt, tools, and knowledge corpus. Run locally with any LiteLLM-supported model — Anthropic, OpenAI, Gemini, Bedrock, Ollama, vLLM, lm-studio.

  • 183 adversarial traps across 11 families: prompt injection, social engineering, compliance, tool misuse, data exfiltration, factuality, code safety, business logic, policy drift, verbal abuse, bias
  • Composite attack chain support for sustained multi-turn pressure
  • 3-juror Delphi consensus reduces single-judge bias
  • pytest integration with assertion-style thresholds
  • Local-first — your context never leaves the machine