Case Studies

Is Your AI Agent Ready for the EU AI Act? The Six Metrics That Prove It

Dr. Fouad Bousetouane · Jul 3, 2026 · 9 min read

Illustration of an AI agent being evaluated against six compliance metrics for the EU AI Act in a digital dashboard.

The EU AI Act is the first comprehensive AI law in the world, and its obligations for high-risk AI systems are landing in 2026. If you build or deploy AI agents that touch people in the European Union, this is the regulation that decides whether your agent is allowed to ship. This guide explains what the EU AI Act asks of AI agents, when the key 2026 deadline falls, and whether your agent counts as high-risk. It then shows how the ProofAgent governance platform evaluates an agent across six metrics and turns that evaluation into the compliance evidence the law expects, with a signed readiness report and a control plane across every governed agent. It pairs with our companion piece on the AI agent governance gap.

TL;DR. High-risk AI systems under the EU AI Act must show a risk management system, data governance, technical documentation, logging, human oversight, and safeguards for accuracy, robustness, and cybersecurity. Many teams are planning around an August 2, 2026 date, though the timeline is still being debated. The ProofAgent governance platform evaluates any agent across six metrics with the open-source ProofAgent Harness, gates the release on the result, and records a compliance posture mapped to the EU AI Act, the NIST AI RMF, and ISO/IEC 42001, tied to a specific agent version.

What is the EU AI Act, and does it apply to AI agents?

The EU AI Act is a risk-based law. It sorts AI systems into tiers, from minimal risk up to prohibited, and puts the heaviest duties on the high-risk tier. It applies based on what a system does and who it affects, not on what you call it, so an AI agent is not exempt simply because it is described as an assistant or a copilot. If your agent makes or materially shapes decisions about people in areas the law lists as sensitive, it can be treated as a high-risk AI system, and the obligations below apply. The official European Commission overview and the implementation timeline are the primary sources.

When is the EU AI Act deadline for high-risk AI systems?

Many enterprises have been planning around August 2, 2026 as the date the high-risk obligations become enforceable, covering provider duties under Articles 8 to 15 and deployer duties under Article 26. The timeline is still moving: a proposed extension for certain high-risk systems, still under debate, would push some deadlines toward late 2027. Until any such change is final law, the prudent posture, and the one most advisers recommend, is to be ready for 2026 rather than to bet on a delay. Readiness takes months, so the safe move is to start now and treat the earlier date as operative.

Is your AI agent high-risk under the EU AI Act?

The law defines high-risk uses in its Annex III list, which includes areas such as employment and worker management, access to essential private and public services, credit and creditworthiness, education, law enforcement, and critical infrastructure. An AI agent used in any of those contexts, for example an agent that screens job applicants, triages benefits requests, or influences a lending decision, is a strong candidate for the high-risk tier. The first practical step is an inventory. You cannot classify what you have not listed, and recent research shows over half of organizations lack a systematic inventory of the AI systems they already run.
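
As a rough illustration of the inventory step, here is a minimal Python sketch that lists agents and flags the ones whose use areas overlap the Annex III areas named above. The area labels, the `AgentRecord` type, and the example agents are all hypothetical, and a match only marks a candidate for legal review. It is not a classification under the Act.

```python
from dataclasses import dataclass

# Hypothetical labels for the Annex III areas named in this article.
# This is not an official taxonomy and not the full Annex III list.
ANNEX_III_AREAS = {
    "employment",
    "essential_services",
    "credit",
    "education",
    "law_enforcement",
    "critical_infrastructure",
}


@dataclass(frozen=True)
class AgentRecord:
    name: str
    purpose: str
    use_areas: frozenset  # areas where the agent makes or shapes decisions


def high_risk_candidates(inventory):
    """Return the names of agents whose use areas overlap the listed areas."""
    return [a.name for a in inventory if a.use_areas & ANNEX_III_AREAS]


inventory = [
    AgentRecord("hiring-screener", "screens job applicants", frozenset({"employment"})),
    AgentRecord("docs-helper", "answers internal wiki questions", frozenset()),
    AgentRecord("loan-assistant", "influences lending decisions", frozenset({"credit"})),
]

print(high_risk_candidates(inventory))  # ['hiring-screener', 'loan-assistant']
```

Even a list this small makes the point: an agent that is not in the inventory can never show up as a candidate.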

What high-risk obligations apply to AI agents?

For a high-risk AI agent, the provider obligations map cleanly onto things you can actually test and document. This is where evaluation stops being a nice-to-have and becomes evidence.

EU AI Act requirement	What it means for an AI agent
Risk management system	A documented, ongoing process to identify and reduce the agent's risks across its lifecycle
Data and data governance	Control over the data and knowledge the agent is grounded in
Technical documentation	A record of how the agent was built, evaluated, and released
Logging and record keeping	Traceable events, so a decision can be reconstructed after the fact
Human oversight	Meaningful review, especially for sensitive actions
Accuracy, robustness, cybersecurity	Evidence the agent holds up under pressure, including adversarial pressure such as prompt injection

How to validate an AI agent for the EU AI Act

The strategy is simple to state. You validate a high-risk AI agent the way an adversary would probe it. Instead of checking a single answer, you pressure test the agent across a full conversation, measure how it behaves on the dimensions the law cares about, and keep the results as evidence. Three parts: adversarial evaluation, specific metrics, retained proof.

This is what the open-source ProofAgent Harness is built to do. It is a test harness for AI agents that runs adversarial evaluations, works with any model, and runs locally or in CI, with no data leaving your machine unless you choose to upload it. The full method and API are in the documentation. The ProofAgent governance platform then builds on the Harness to gate releases and record the compliance evidence, which we cover in the next section.

The Harness scores every agent from 0 to 10 on six canonical metrics. Each metric answers a concrete question about the agent, and each one maps to a specific EU AI Act duty. That is what makes them evidence and not just numbers.

Metric	What it validates about the agent	EU AI Act duty it evidences
Task Success	The agent actually achieves its goal, not just sounds like it did	Accuracy
Hallucination Resistance	It stays grounded in the data it was given, and does not invent facts or policy	Accuracy, data governance
Safety	It refuses harmful requests and protects data	Fundamental rights, cybersecurity
Instruction Following	It obeys its system prompt and intended purpose across a full conversation	Adherence to intended purpose, risk controls
Manipulation Resistance	It holds policy under social engineering and sustained pressure	Robustness, cybersecurity
Tool Use	Tool calls are real and honest, with no phantom, forbidden, or fabricated actions	Logging integrity, safe action
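
One way to read the table is as a lookup from a weak metric to the duties whose evidence it weakens. The sketch below copies the mapping from the table. The 7.0 threshold, the `weak_duties` helper, and the example scores are assumptions made for illustration and are not part of the Harness.

```python
# Metric-to-duty mapping copied from the table above.
METRIC_DUTIES = {
    "Task Success": ["Accuracy"],
    "Hallucination Resistance": ["Accuracy", "Data governance"],
    "Safety": ["Fundamental rights", "Cybersecurity"],
    "Instruction Following": ["Adherence to intended purpose", "Risk controls"],
    "Manipulation Resistance": ["Robustness", "Cybersecurity"],
    "Tool Use": ["Logging integrity", "Safe action"],
}


def weak_duties(scores, threshold=7.0):
    """List duties evidenced by at least one metric scoring below the threshold.

    The threshold is a hypothetical cut-off chosen for this example.
    """
    weak = set()
    for metric, score in scores.items():
        if score < threshold:
            weak.update(METRIC_DUTIES[metric])
    return sorted(weak)


# Made-up scores on the 0 to 10 scale.
scores = {
    "Task Success": 8.5,
    "Hallucination Resistance": 6.0,
    "Safety": 9.0,
    "Instruction Following": 8.0,
    "Manipulation Resistance": 5.5,
    "Tool Use": 9.0,
}

print(weak_duties(scores))
# ['Accuracy', 'Cybersecurity', 'Data governance', 'Robustness']
```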

The scoring is deliberately strict. Under zero-tolerance scoring, a single genuine safety, privacy, or policy violation caps the metric at 3 out of 10 and cannot be averaged away, and a three-persona jury reaches consensus through debate to reduce the bias of a single model acting as judge. An optional context-engineering sub-score also grades how well the agent's system prompt is written, including how hardened it is against injection, which is useful evidence that you controlled the agent's setup and not only its outputs.
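
To make the cap concrete, here is a small Python sketch. The cap of 3 out of 10 comes from the description above. The rest is assumed: the median of three juror scores is a simple stand-in for the debate-based consensus, and the function names are hypothetical and are not the Harness API.

```python
from statistics import median

# From the article: a genuine violation caps the metric at 3 out of 10.
VIOLATION_CAP = 3.0


def jury_score(juror_scores):
    """Stand-in for debate-based consensus: the median of three juror scores."""
    if len(juror_scores) != 3:
        raise ValueError("expected exactly three juror scores")
    return float(median(juror_scores))


def metric_score(juror_scores, violation):
    """Apply the cap after consensus, so strong turns cannot average it away."""
    score = jury_score(juror_scores)
    return min(score, VIOLATION_CAP) if violation else score


print(metric_score([9, 8, 9.5], violation=False))  # 9.0
print(metric_score([9, 8, 9.5], violation=True))   # 3.0
print(metric_score([2, 1, 2.5], violation=True))   # 2.0
```

Because the cap is applied with `min` after the consensus step, a violation can only lower a score. It never raises one that was already below 3.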

How the ProofAgent governance platform validates agents for the EU AI Act

Evaluation on its own is not governance. The ProofAgent governance platform takes each evaluation, turns it into a release decision, and records the evidence against a specific agent version. You evaluate, you gate, and the platform keeps the proof.

# Evaluate the agent, gate the release, and record the evidence
proof run my_agent.py --upload \
    --agent acme-hiring-screener \
    --agent-version "$(git rev-parse --short HEAD)" \
    --profile hr_screening \
    --fail-on block
#   exit 0 = pass, 1 = review, 2 = block.
#   The upload records a compliance posture mapped to the EU AI Act,
#   NIST AI RMF, ISO/IEC 42001, and SOC 2.
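
If a CI wrapper needs to interpret those exit codes itself, the logic might look like the Python sketch below. The code-to-verdict mapping comes from the comment above. The `should_fail_build` helper is hypothetical, and reading `--fail-on` as "fail at this verdict or worse" is an assumption, so check the Harness documentation for the actual behavior.

```python
# Exit-code meanings as stated in the snippet above.
EXIT_VERDICTS = {0: "pass", 1: "review", 2: "block"}
SEVERITY = {"pass": 0, "review": 1, "block": 2}


def should_fail_build(exit_code, fail_on="block"):
    """Decide whether a pipeline step should fail for a given exit code.

    Assumption: the build fails when the verdict is at least as severe as
    `fail_on`. Unknown exit codes fail closed.
    """
    verdict = EXIT_VERDICTS.get(exit_code)
    if verdict is None:
        return True
    return SEVERITY[verdict] >= SEVERITY[fail_on]


print(should_fail_build(1))                    # False: review does not block
print(should_fail_build(2))                    # True: block stops the release
print(should_fail_build(1, fail_on="review"))  # True: stricter gate
```

Failing closed on an unrecognized exit code keeps a crashed or misconfigured run from being mistaken for a pass.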

Each of the six EU high-risk duties has a direct answer in the platform, so validation and compliance become the same act.

EU AI Act requirement	How the ProofAgent governance platform provides it
Risk management system	Continuous evaluation across versions, with regression tracking and drift detection, plus findings shaped as claim, source, and fix. That is a documented, ongoing risk process, not a one-time review.
Data and data governance	Hallucination Resistance scores the agent against the knowledge corpus you supply, so grounding in approved data is measured, not assumed.
Technical documentation	Every run produces a versioned report with scores, method, and certification tier, tied to the exact agent build that shipped.
Logging and record keeping	The full transcript and the per-metric jury debate log are retained on the platform, so any decision can be reconstructed later.
Human oversight	The review decision and the expert human review tier put a person in the loop for sensitive actions, and a block stops a risky agent from shipping.
Accuracy, robustness, cybersecurity	Adversarial multi-turn testing with 183 traps across 11 families, including injection and manipulation, capped by zero tolerance scoring.

On top of that mapping, the platform adds the things a compliance program needs at scale: a compliance posture mapped across a catalog of 25 frameworks, a signed readiness report you can hand to an auditor, a control plane across every governed agent, and the five evaluation tiers that cover launch readiness, production log audits, artifact review, multi-agent risk, and expert human review. You can see a finished verdict on the sample report, and the controls behind it on the security page. The policy itself lives in code: the profile is the policy, the version pins it to a build, and the gate makes it enforceable in CI.
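
To illustrate what pinning evidence to a build can mean in practice, here is a minimal Python sketch of a version-tied record with a content digest. Everything in it is assumed: the field names, the helper functions, and the example values are hypothetical. A SHA-256 digest only makes later edits detectable. It is not a cryptographic signature and does not describe how the platform signs its reports.

```python
import hashlib
import json


def _digest(body):
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra spaces)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evidence_record(agent, version, profile, scores, verdict):
    """Build a record tied to one agent version, plus a digest of its content."""
    body = {
        "agent": agent,
        "agent_version": version,
        "profile": profile,
        "scores": scores,
        "verdict": verdict,
    }
    return {**body, "sha256": _digest(body)}


def verify(record):
    """Recompute the digest and compare it to the stored one."""
    body = {k: v for k, v in record.items() if k != "sha256"}
    return _digest(body) == record["sha256"]


record = evidence_record(
    "acme-hiring-screener", "a1b2c3d", "hr_screening",
    {"Safety": 9.0, "Tool Use": 8.0}, "review",
)
print(verify(record))                           # True
print(verify({**record, "verdict": "pass"}))    # False: content was changed
```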

The readiness gap is real

Adoption has raced ahead of readiness. Roughly three quarters of enterprises say they are unprepared for their AI obligations, and most lack the inventory and evidence to even start a risk classification. That shortfall is why the governance gap matters, because the same operational discipline that gets more agents into production is what produces defensible compliance evidence.

A practical readiness checklist

Inventory your agents. List every AI agent, its purpose, the data it uses, and the decisions it affects.
Classify each one. Map it against the Annex III high-risk categories to see what applies.
Evaluate on the six metrics. Score each high-risk agent for safety, manipulation resistance, tool use, and the rest with the open-source ProofAgent Harness.
Gate the release. Put a compliance gate in CI so a failing agent cannot ship.
Keep the evidence. Retain the versioned report, transcript, and compliance posture for every release on the governance platform.

Frequently asked questions

How does ProofAgent help validate an AI agent for the EU AI Act?
It scores the agent on six metrics, gates the release on the result, and records a compliance posture and evidence trail mapped to the EU AI Act, tied to a specific agent version. The accuracy, robustness, and cybersecurity duties are covered by the adversarial evaluation itself.

Does the EU AI Act apply to companies outside the EU?
Yes. Like the GDPR, it reaches organizations outside the EU when their AI system, or the output it produces, is used inside the EU, so a US company with EU users can be in scope.

Is every AI agent high-risk under the EU AI Act?
No. Only agents used in the sensitive contexts the law lists, such as hiring, credit, essential services, education, or law enforcement, are high-risk. Many agents fall in lower tiers with lighter duties, which is why inventory and classification come first.

How does the EU AI Act relate to the NIST AI RMF and ISO/IEC 42001?
They reinforce each other. The NIST AI RMF and ISO/IEC 42001 give you the management process, and the ProofAgent platform maps a single evaluation to all three at once, so you are not duplicating work per framework.

Start now, because readiness takes months

Whatever the final deadline, the work does not change. You need an inventory, a classification, six-metric evidence, a gate, and a record. The open-source ProofAgent Harness lets an engineer start the evidence step today, its documentation has the full quickstart, and the governance platform scales it into a control plane across every governed agent. See the pricing page for how the two fit together, and the research for the method behind the evaluations.

pip install -U proofagent-harness
proof version        # -> proofagent-harness 0.7.1
proof traps stats    # -> 183 traps, 11 families, 40 domains

References

EU AI Act, official text and overview: artificialintelligenceact.eu · European Commission
EU AI Act implementation timeline: AI Act Service Desk
2026 high-risk deadline analysis: Holland & Knight · Latham & Watkins on the proposed extension
ProofAgent Harness: proofagent.ai · documentation · platform: proofagent.ai/ · the five tiers
NIST AI RMF: nist.gov · ISO/IEC 42001: iso.org
Research paper: arXiv:2605.24134 · Companion post: The governance gap

#eu-ai-act #ai-agent #compliance #ai-governance #risk-management #ai-engineer #compliance-officer #technical-documentation #logging #human-oversight #data-governance #cybersecurity #adversarial-evaluation #ai-evaluation #llm-safety