<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>ProofAgent Community Blog</title>
    <link>https://www.proofagent.ai/community/blog</link>
    <atom:link href="https://www.proofagent.ai/community/blog/rss.xml" rel="self" type="application/rss+xml" />
    <description>Research, case studies, findings, and tutorials on adversarial AI agent evaluation.</description>
    <language>en-us</language>
    <lastBuildDate>Sun, 24 May 2026 15:44:24 GMT</lastBuildDate>
    
    <item>
      <title>When the Refusal Itself Crashes: A GPT 5.5 Privacy and Security Agent Case Study</title>
      <link>https://www.proofagent.ai/community/blog/when-the-refusal-itself-crashes-a-gpt-5-5-privacy-and-security-agent-case-study</link>
      <guid isPermaLink="true">https://www.proofagent.ai/community/blog/when-the-refusal-itself-crashes-a-gpt-5-5-privacy-and-security-agent-case-study</guid>
      <pubDate>Sat, 23 May 2026 14:49:13 GMT</pubDate>
      <description>A privacy and security agent powered by GPT 5.5 resisted 25 turns of adversarial probing without leaking data, yet upstream content filters prevented some of its refusals from being delivered.</description>
      
      <category>privacy-security</category><category>ai-refusal</category><category>adversarial-evaluation</category><category>ai-engineer</category><category>security-team</category><category>privacy-ops</category><category>gpt-5-5</category><category>gemma-4b</category><category>harness-llm</category><category>content-filtering</category><category>prompt-injection</category><category>gdpr-compliance</category><category>audit-trail</category><category>llm-safety</category><category>ai-evaluation</category>
    </item>
    <item>
      <title>Claude Opus 4.7 Failed Tool Trace Test in Healthcare Triage</title>
      <link>https://www.proofagent.ai/community/blog/claude-opus-4-7-failed-tool-trace-test-healthcare-triage</link>
      <guid isPermaLink="true">https://www.proofagent.ai/community/blog/claude-opus-4-7-failed-tool-trace-test-healthcare-triage</guid>
      <pubDate>Sat, 23 May 2026 03:30:47 GMT</pubDate>
      <description>Claude Opus 4.7 failed a safety-critical tool-call test in healthcare triage. ProofAgent Harness surfaced the gap, showing why adversarial evaluation is essential for deployment.</description>
      
      <category>agent-evaluation</category><category>adversarial-testing</category><category>tool-trace</category><category>healthcare-ai</category><category>claude-opus-4-7</category><category>anthropic</category><category>ai-update</category><category>claude-ai</category>
    </item>
    <item>
      <title>2026 Trends in Adversarial AI Agent Evaluation: The Field Guide</title>
      <link>https://www.proofagent.ai/community/blog/2026-trends-in-adversarial-ai-agent-evaluation-the-field-guide</link>
      <guid isPermaLink="true">https://www.proofagent.ai/community/blog/2026-trends-in-adversarial-ai-agent-evaluation-the-field-guide</guid>
      <pubDate>Fri, 22 May 2026 23:07:17 GMT</pubDate>
      <description>Why adversarial multi-turn evaluation replaced static benchmarks for production AI agents in 2026, covering red teaming, jailbreak patterns, and comparisons of real tools, grounded in evidence.</description>
      <author>noreply@proofagent.ai (Dr. Fouad Bousetouane)</author>
      
    </item>
  </channel>
</rss>