ProofAgent Harness — the best open-source AI agent evaluation framework

The most rigorous, research-backed, multi-turn adversarial AI agent testing tool. Apache 2.0 open-source. Bring your own LLM. Runs locally, in CI/CD, or in production. 183 bundled traps across 11 attack families. Trusted by engineering teams shipping production-grade AI agents. Grounded in two published papers — the core whitepaper (arXiv:2605.24134) and the companion paper Human-on-the-Bridge (arXiv:2606.16871).

What the Harness does

The Harness runs a 5-stage pipeline against any callable AI agent. A planner picks domain-relevant traps from the 183-trap library, a conductor runs N adversarial turns with realistic attacks, three juror personas independently score the transcript across 6 canonical metrics — task success, instruction following, hallucination resistance, tool use, safety, and manipulation resistance — consensus resolves disagreements through Delphi or debate rounds, and a reporter produces a signed readiness verdict with transcript-linked findings.

Why multi-turn adversarial matters

Single-shot evaluation misses how production AI agents actually fail. The Harness ships composite attack chains — 5 to 7 turn adversarial sequences that blend authority pressure, urgency framing, sympathy appeals, refusal-as-betrayal pivots, and policy gaslighting. Real adversaries don't use "ignore previous instructions"; they apply sustained pressure across many turns. The Harness models that.

The research behind the Harness — two published papers

ProofAgent Harness is grounded in two published papers. The core whitepaper, "ProofAgent Harness — Adversarial multi-turn evaluation for production AI agents" (arXiv:2605.24134), documents the full 5-stage pipeline, the 6-metric rubric, the 183-trap library with composite attack chains, asymmetric Harness-LLM evaluation, and the headline finding that production-grade agents on top of frontier LLMs (GPT 5.5, Claude Opus 4.7) fail under sustained adversarial pressure. The companion paper, "Human-on-the-Bridge — Scalable evaluation for AI agents" (arXiv:2606.16871), formalizes the paradigm: curate human expertise once, upstream, into reusable evaluation intelligence, then let small evaluator models stress-test frontier-grade agents at scale. Both are open-methodology and reproducible from the Apache 2.0 open-source code.

Quickstart

Install with pip install proofagent-harness. Wrap your existing AI agent as a callable that takes a string and returns a string (or an AgentResponse with tools_called and retrievals for deeper scoring). Pass it to Harness.evaluate() with your system prompt, tools, and knowledge corpus. Run locally with any LiteLLM-supported model — Anthropic, OpenAI, Gemini, Bedrock, Ollama, vLLM, lm-studio.

  • 183 adversarial traps across 11 families: prompt injection, social engineering, compliance, tool misuse, data exfiltration, factuality, code safety, business logic, policy drift, verbal abuse, bias
  • Composite attack chain support for sustained multi-turn pressure
  • 3-juror Delphi consensus reduces single-judge bias
  • pytest integration with assertion-style thresholds
  • Local-first — your context never leaves the machine
ProofAgent — open-source AI agent evaluation ProofAgent Harness — open-source AI agent testing framework ProofAgent Harness documentation ProofAgent SDK documentation ProofAgent Platform — enterprise AI agent evaluation The 5-stage AI agent evaluation pipeline ProofAgent pricing — free open source to enterprise ProofAgent vs Phoenix, LangSmith, DeepEval, Langfuse Sample AI agent evaluation report Security and compliance for AI agent evaluation Open ecosystem for AI agent evaluation ProofAgent community blog About ProofAgent and founder Fouad Bousetouane Research behind ProofAgent — published papers ProofAgent Harness whitepaper Human-on-the-Bridge paper Privacy policy Terms of service ProofAgent Harness on GitHub proofagent-harness on PyPI ProofAgent Harness whitepaper (arXiv:2605.24134) Human-on-the-Bridge (arXiv:2606.16871)