Open ecosystem for AI agent evaluation

ProofAgent Community is the open ecosystem for adversarial, multi-turn, domain-aware AI agent evaluation — built around the open-source ProofAgent Harness (Apache 2.0).

What's in the community

The ecosystem brings together adversarial traps, juror personas, scoring rubrics, agent skills, and domain benchmarks contributed by developers, researchers, and enterprise teams who deploy AI agents in production. Everything is open and inspectable on GitHub.

  • 183 adversarial attack traps across 11 families: social engineering, prompt injection, compliance (GDPR, HIPAA, PCI, SOX), tool misuse, factuality, data exfiltration, and more
  • Composite attack chains: 5- to 7-turn adversarial sequences blending authority pressure, urgency, sympathy, and refusal-as-betrayal
  • Juror personas — rigorous, lenient, contrarian — that score agent transcripts independently before consensus
  • Scoring rubrics across 5 canonical metrics: task success, hallucination resistance, safety, instruction following, manipulation resistance
  • Agent skills and behavioral benchmarks shared across the community for reproducible evaluation
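
To make the "score independently, then reach consensus" idea concrete, here is a minimal sketch in Python. The three persona names and five metric names come from the list above. Everything else is an assumption made for illustration: the 0-to-1 score scale, the sample values, the `consensus` function name, and the use of a per-metric median. The harness's actual consensus method may differ.

```python
from statistics import median

# Metric names taken from the list above; the snake_case spelling is an assumption.
METRICS = (
    "task_success",
    "hallucination_resistance",
    "safety",
    "instruction_following",
    "manipulation_resistance",
)


def consensus(juror_scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Combine independent per-juror scores into one score per metric.

    Hypothetical aggregation: the median across jurors, so a single
    unusually strict or lenient juror cannot dominate the result.
    """
    result = {}
    for metric in METRICS:
        values = [scores[metric] for scores in juror_scores.values()]
        result[metric] = median(values)
    return result


# Illustrative scores only; these are not real evaluation results.
juror_scores = {
    "rigorous": {m: 0.25 for m in METRICS},
    "lenient": {m: 0.75 for m in METRICS},
    "contrarian": {m: 0.5 for m in METRICS},
}

final = consensus(juror_scores)
```

With three jurors, the median is simply the middle score for each metric, which is one simple way to keep any single persona from deciding the outcome alone.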

Contribute

The community accepts pull requests for new traps, agent specs, juror personas, and benchmarks. Each contribution is validated against the canonical trap manifest schema and tested against the conductor pipeline. See the GitHub repository to author your own trap pack, or distribute one as a pip-installable package named proofagent_traps_<name>.
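
The proofagent_traps_<name> naming pattern lends itself to discovery by prefix, a common Python plugin convention. The sketch below is one way such discovery could work, using the standard library's `pkgutil.iter_modules`. It is an assumption about how installed trap packs might be found, not a description of the harness's own loader; the `find_trap_packs` function and the example pack name are hypothetical.

```python
import pkgutil
from typing import Iterable, Optional

# Prefix taken from the package naming pattern mentioned above.
PREFIX = "proofagent_traps_"


def find_trap_packs(module_names: Optional[Iterable[str]] = None) -> list[str]:
    """Return the sorted names of modules that follow the trap-pack naming pattern.

    If no names are passed in, scan the top-level modules importable
    in the current environment.
    """
    if module_names is None:
        module_names = (info.name for info in pkgutil.iter_modules())
    return sorted(
        name
        for name in module_names
        # Require a non-empty <name> part after the prefix.
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    )


# Example with an explicit list; "proofagent_traps_finance" is a made-up pack name.
example = find_trap_packs(["requests", "proofagent_traps_finance", "proofagent_traps_"])
```

Calling `find_trap_packs()` with no arguments lists whatever matching packages happen to be installed in the current environment.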

Built around ProofAgent Harness

The harness ships with a five-stage pipeline: planner, conductor, jury, consensus, and reporter. Bring your own LLM via LiteLLM (Anthropic, OpenAI, Gemini, Bedrock, Ollama, vLLM, lm-studio). Run it fully locally, in CI/CD, or as part of a production deployment workflow.
