ProofAgent SDK documentation for AI agent evaluation

Official ProofAgent SDK docs: install with pip, wire your agent as a callable, run Harness-LLM-led or log-based evaluations, bring your own LLM, and get readiness reports with per-metric scores and transcript-linked findings.

Install

The SDK is published to PyPI. Install it by running pip install proofagent-harness. It requires Python 3.10 or later. Before running an evaluation, set the API key environment variable for your LLM provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, or the variable for any other LiteLLM-supported provider).
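
A quick preflight check can catch a missing key or an old interpreter before an evaluation starts. The sketch below uses only the standard library and does not call the SDK; the helper names configured_providers and python_is_supported are illustrative assumptions, not part of ProofAgent.

```python
import os
import sys

# Key names taken from the paragraph above; other LiteLLM-supported
# providers use their own environment variables.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")


def configured_providers(env=None):
    """Return the provider key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name)]


def python_is_supported(version_info=None):
    """True when the interpreter meets the documented Python 3.10+ requirement."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= (3, 10)


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Provider keys found:", configured_providers() or "none")
```

Passing a plain dict as env makes the check easy to exercise in tests without touching the real environment.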

Wire your agent

Your agent is any Python callable that takes a message string and returns either a string (the simple case) or an AgentResponse, which enables deeper scoring by exposing tool calls, retrievals, and memory snapshots to the Harness Jurors. Wrap your existing LangChain, LangGraph, CrewAI, OpenAI Agents SDK, or custom agent in a 5-line adapter.
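
The two return shapes described above might look roughly like this. The AgentResponse class here is a local stand-in written for illustration: its field names (text, tool_calls, retrievals, memory) are assumptions, and the SDK's real class may differ. EchoAgent is a toy agent used in place of a real framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AgentResponse:
    """Hypothetical stand-in; the SDK's real AgentResponse may use other fields."""
    text: str
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    retrievals: list[str] = field(default_factory=list)
    memory: dict[str, Any] = field(default_factory=dict)


class EchoAgent:
    """Toy agent standing in for a LangChain, CrewAI, or custom agent."""

    def run(self, prompt: str) -> dict[str, Any]:
        return {
            "output": f"You said: {prompt}",
            "tools": [{"name": "echo", "args": {"prompt": prompt}}],
        }


def simple_agent(message: str) -> str:
    """Simplest form: message string in, reply string out."""
    return f"You said: {message}"


def make_adapter(agent: EchoAgent) -> Callable[[str], AgentResponse]:
    """Wrap an existing agent as a callable that takes a message string."""

    def call(message: str) -> AgentResponse:
        result = agent.run(message)
        return AgentResponse(text=result["output"], tool_calls=result["tools"])

    return call


wrapped_agent = make_adapter(EchoAgent())
```

Either simple_agent or wrapped_agent fits the "message string in" contract; the second additionally carries the tool calls that a richer response is meant to expose.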

Run an evaluation

Call Harness.evaluate(your_agent, role=..., goal=..., context=AgentContext(...)). The harness runs the full 5-stage pipeline and returns a Report object with final_score, per_metric scores, certification tier, transcript, findings, and remediation guidance. The report can be saved as JSON or Markdown.
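
To make the shape of the result concrete, here is a sketch of how a report with those fields could be serialized. The Report dataclass below is a hypothetical stand-in that mirrors a few of the fields named above; it is not the SDK class, and the SDK's own save methods are not shown because their names are not given here. The tier attribute name and all sample values are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Report:
    """Hypothetical stand-in mirroring some fields listed above; not the SDK class."""
    final_score: float
    per_metric: dict[str, float]
    tier: str
    findings: list[str] = field(default_factory=list)


def to_json(report: Report) -> str:
    """Serialize the report to an indented JSON string."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)


def to_markdown(report: Report) -> str:
    """Render a short Markdown summary: headline score, metrics, findings."""
    lines = [f"# Readiness report: {report.final_score:.1f} ({report.tier})", ""]
    lines += [f"- {name}: {score:.1f}"
              for name, score in sorted(report.per_metric.items())]
    if report.findings:
        lines += ["", "## Findings"] + [f"- {item}" for item in report.findings]
    return "\n".join(lines)


# In real use the report would come from the call quoted above:
#   report = Harness.evaluate(your_agent, role=..., goal=..., context=AgentContext(...))
# Made-up sample values, for illustration only.
sample = Report(
    final_score=8.7,
    per_metric={"safety": 9.2, "accuracy": 8.4},
    tier="example-tier",
    findings=["Example finding"],
)
```

Keeping serialization in plain functions like these makes it simple to diff reports between runs or attach them to a build artifact.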

CI/CD integration

The SDK includes pytest integration with assertion-style thresholds, for example assert report.final_score >= 8.5 and assert report.per_metric['safety'] >= 9.0. GitHub Actions, GitLab CI, CircleCI, and any other pytest-compatible runner work out of the box.
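
A threshold test along those lines could be sketched as follows. StubReport is a hypothetical stand-in exposing only the two attributes used in the assertions above; in a real suite the report would come from an actual evaluation run. The threshold_failures helper is an illustrative addition, not an SDK function.

```python
from dataclasses import dataclass


@dataclass
class StubReport:
    """Hypothetical stand-in with only the attributes the assertions need."""
    final_score: float
    per_metric: dict[str, float]


def threshold_failures(report, min_final=8.5, min_metrics=None):
    """Return readable failure messages; an empty list means all thresholds pass."""
    min_metrics = {"safety": 9.0} if min_metrics is None else min_metrics
    failures = []
    if report.final_score < min_final:
        failures.append(f"final_score {report.final_score} < {min_final}")
    for name, floor in min_metrics.items():
        score = report.per_metric.get(name)
        if score is None or score < floor:
            failures.append(f"{name} {score} < {floor}")
    return failures


def test_agent_meets_release_bar():
    # Replace the stub with the report from a real evaluation run.
    report = StubReport(final_score=8.9, per_metric={"safety": 9.4})
    assert report.final_score >= 8.5
    assert report.per_metric["safety"] >= 9.0
    assert threshold_failures(report) == []
```

Collecting failures into a list, rather than stopping at the first failed assert, lets a CI log show every metric that fell short in a single run.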