See what a real AI agent evaluation report looks like: per-metric readiness scores, transcript-linked evidence, juror reasoning, and a signed verdict you can ship to auditors and security teams.
Every ProofAgent evaluation produces a structured report. It contains the final readiness score (0-10), the certification tier (Gold, Silver, Needs Enhancement, or Not Ready), a per-metric breakdown across the 5 canonical dimensions, transcript-linked findings with severity tags, juror reasoning per turn, and concrete remediation guidance for each failure mode found.
Every finding points to the specific turn(s) in the adversarial transcript that produced it. This makes findings actionable: your engineering team can replay the failure, debug the root cause, and ship a fix. No hand-waving, no opaque scores.
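To make the transcript link concrete, here is a minimal Python sketch of how a team might pull up the turns behind a finding. The report schema is not documented here, so every field name (`readiness_score`, `tier`, `findings`, `turns`, `transcript`) and every value in the sample is a hypothetical stand-in, not the actual ProofAgent format.

```python
import json

# Hypothetical report fragment. Field names and values are illustrative
# assumptions only; the real report schema may differ.
SAMPLE_REPORT = json.loads("""
{
  "readiness_score": 6.5,
  "tier": "Needs Enhancement",
  "findings": [
    {
      "id": "F-1",
      "severity": "high",
      "turns": [2, 3],
      "remediation": "(remediation guidance would appear here)"
    }
  ],
  "transcript": [
    {"turn": 1, "role": "adversary", "text": "(opening prompt)"},
    {"turn": 2, "role": "adversary", "text": "(adversarial prompt)"},
    {"turn": 3, "role": "agent", "text": "(agent response that failed)"},
    {"turn": 4, "role": "adversary", "text": "(follow-up prompt)"}
  ]
}
""")


def turns_for_finding(report, finding_id):
    """Return the transcript turns that a given finding points to."""
    # Look up the finding by its (hypothetical) id field.
    finding = next(f for f in report["findings"] if f["id"] == finding_id)
    wanted = set(finding["turns"])
    # Keep transcript order so the failure can be replayed turn by turn.
    return [t for t in report["transcript"] if t["turn"] in wanted]


if __name__ == "__main__":
    for turn in turns_for_finding(SAMPLE_REPORT, "F-1"):
        print(turn["turn"], turn["role"], turn["text"])
```

Keeping the lookup keyed on turn numbers means the same helper works whether a finding spans one turn or several.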
Reports ship as both JSON (for programmatic consumption, dashboards, and regression tracking) and Markdown (for code review threads, audit packets, and executive summaries). On the enterprise Platform, reports are signed so that any tampering is evident.
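As one illustration of regression tracking on the JSON output, the sketch below compares per-metric scores between two reports and renders the regressions as a Markdown table for a code review thread. It assumes each report exposes a `metrics` object mapping a metric name to a score; that field name, the placeholder metric names, and the scores are all hypothetical.

```python
# Hypothetical per-metric scores from two evaluation runs. The "metrics"
# field and the metric names are assumptions, not the real schema.
BASELINE = {"metrics": {"metric_a": 8.0, "metric_b": 7.0}}
CANDIDATE = {"metrics": {"metric_a": 8.5, "metric_b": 6.0}}


def metric_regressions(baseline, candidate, tolerance=0.0):
    """Return {metric: (old, new)} for metrics whose score dropped."""
    regressions = {}
    for name, old in baseline["metrics"].items():
        new = candidate["metrics"].get(name)
        # Skip metrics missing from the candidate; flag drops beyond tolerance.
        if new is not None and new < old - tolerance:
            regressions[name] = (old, new)
    return regressions


def render_markdown(regressions):
    """Render regressions as a small Markdown table."""
    if not regressions:
        return "No per-metric regressions."
    lines = ["| Metric | Baseline | Candidate |", "| --- | --- | --- |"]
    for name, (old, new) in sorted(regressions.items()):
        lines.append(f"| {name} | {old} | {new} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown(metric_regressions(BASELINE, CANDIDATE)))
```

The `tolerance` parameter is a design choice for this sketch: it lets a team ignore small score fluctuations between runs rather than flagging every decrease.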