ProofAgent is grounded in two published papers by Dr. Fouad Bousetouane on adversarial AI agent evaluation. Both are open methodology and reproducible from open source code. Each paper has its own page with the full PDF embedded.
The foundational whitepaper. A full pipeline of planning, adversarial conducting, multi juror scoring, debate consensus, and signed reporting, with a 183 trap library across 11 attack families, composite attack chains, and a six metric rubric. The headline result: production grade agents on frontier LLMs fail under sustained adversarial pressure, so the agent layer needs its own stress testing infrastructure.
The paradigm behind the Harness. Instead of a human reviewer in the loop on every run, curate human expertise once, upstream, into reusable traps, juror personas, and rubrics, then let small evaluator models stress test frontier class agents at scale. Rigorous because it carries real expertise, scalable because it runs automatically, locally, and cheaply.
All evaluation methodology, trap libraries, juror personas, and scoring rubrics ship open source under Apache 2.0 in the ProofAgent Harness. Every result in both papers is reproducible from the published code with any LiteLLM supported model.