
Cecuro’s specialized AI flags 92% of exploited DeFi contracts
Specialized AI outperforms general models in DeFi exploit detection
An open benchmark created by security firm Cecuro compared a purpose-built analysis agent against a general coding assistant using the same underlying frontier model. The test set contained 90 exploited smart contracts, collectively accounting for verified losses of roughly $228 million; the specialized workflow surfaced vulnerabilities linked to roughly $96.8 million of that value, a substantially larger share than the generalist baseline.
Cecuro’s approach layers structured review steps, DeFi-specific detectors and targeted heuristics on top of a base model, rather than relying on out-of-the-box prompts or a single review pass. That architectural choice produced detection results far above the baseline, showing how the application layer alone can change security efficacy even when the underlying model is identical.
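The layered design described above can be sketched as a set of narrow, domain-specific passes whose findings are merged, instead of one generic prompt. This is a minimal illustration, not Cecuro's implementation: every detector name and rule below is a made-up placeholder, and a real system would back each pass with model calls or static analysis.

```python
# Hypothetical sketch of a layered review pipeline: several specialized
# passes over the same contract source, each contributing findings.
# All detectors and rules are illustrative, not Cecuro's actual checks.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    detector: str
    detail: str

def reentrancy_check(src: str) -> list[Finding]:
    # Flag an external call made without a reentrancy guard.
    if ".call{value:" in src and "nonReentrant" not in src:
        return [Finding("reentrancy", "external call without reentrancy guard")]
    return []

def oracle_check(src: str) -> list[Finding]:
    # Spot reserves read from an AMM pair are manipulable via flash loans.
    if "getReserves" in src:
        return [Finding("price-oracle", "spot reserves used as price oracle")]
    return []

DETECTORS: list[Callable[[str], list[Finding]]] = [reentrancy_check, oracle_check]

def review(src: str) -> list[Finding]:
    """Run every specialized pass and merge the findings."""
    findings: list[Finding] = []
    for detector in DETECTORS:
        findings.extend(detector(src))
    return findings

contract = 'function swap() { price = pair.getReserves(); to.call{value: amt}(""); }'
for f in review(contract):
    print(f.detector, "-", f.detail)
```

The point of the structure is that adding a new DeFi-specific heuristic is one function, while the base model underneath stays unchanged.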
The company released the evaluation framework, the dataset and a reference baseline on GitHub, while withholding its full agent implementation to avoid potential misuse. The public materials let other teams reproduce the comparisons and test defensive methods against recorded incidents.
This benchmark arrives as adversarial tooling rapidly improves: recent external studies indicate automated exploit capability has been accelerating, bringing down the marginal cost of scanning and exploitation. That wider arms race — easier offensive tooling versus defensive adaptation — frames why specialized detection strategies matter now.
Several contracts in the benchmark had passed prior professional audits but were still compromised, underlining gaps in conventional review practices. Cecuro argues these gaps are addressable by repeatable, domain-aware procedures rather than generalist AI alone.
- Dataset: 90 live exploited contracts (Oct 2024–early 2026).
- Verified losses covered by dataset: about $228M.
- Cecuro-detected exploit value: roughly $96.8M.
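As a quick sanity check on the bullets above, the detected exploit value works out to a bit over 42% of the total verified losses; the headline's 92% figure refers to a different cut of the data (share of contracts rather than share of dollar value):

```python
# Back-of-envelope check of the benchmark figures quoted above.
total_losses = 228_000_000    # verified losses covered by the dataset
detected_value = 96_800_000   # exploit value linked to Cecuro's findings

share = detected_value / total_losses
print(f"Detected share of lost value: {share:.1%}")  # Detected share of lost value: 42.5%
```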
The baseline agent, built from a GPT-5.1 coding stack, captured a noticeably smaller slice of high-value vulnerabilities under the same test conditions. That gap points to the limits of one-off audits or general-purpose assistants when facing sophisticated, multi-stage DeFi weaknesses.
For practitioners, the takeaway is tactical: embed protocol-aware checks, perform structured multi-pass reviews, and prioritize heuristics that have proven themselves on historical incidents. For defenders, open benchmarks provide a practical yardstick for measuring progress against real loss events.
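"Prioritize heuristics proven on historical incidents" can be made concrete by backtesting each check against recorded exploits and running the highest-yield checks first. The sketch below uses invented hit counts purely for illustration; a real backtest would replay detectors against a dataset like the one Cecuro published.

```python
# Illustrative ranking of detectors by historical yield. The incident
# counts are fabricated for the example, not taken from any benchmark.
incidents_flagged = {   # detector -> past incidents it would have caught
    "price-oracle-manipulation": 14,
    "reentrancy": 9,
    "access-control": 6,
    "rounding-error": 2,
}
total_incidents = 30

# Spend limited review budget on the checks with the best track record.
ranked = sorted(incidents_flagged.items(), key=lambda kv: kv[1], reverse=True)
for name, hits in ranked:
    print(f"{name}: caught {hits}/{total_incidents} historical incidents")
```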
Recommended for you
OpenAI unveils EVMbench to benchmark AI for smart-contract security
OpenAI released EVMbench, a new evaluation framework that measures AI systems’ ability to detect, exploit in test conditions, and remediate vulnerabilities in EVM-compatible smart contracts. Built with Paradigm and drawing on real-world flaws, the benchmark aims to create a repeatable standard for assessing AI-driven defenses around code that secures large sums of on‑chain value.

Anthropic's Claude Exploited in Mexican Government Data Heist
A threat actor manipulated Claude to map and automate intrusions, exfiltrating about 150 GB of Mexican government records. Researchers say the campaign combined model‑based jailbreaks, chained queries to multiple public systems, and likely use of compromised self‑hosted endpoints or harvested model extracts; the incident prompted account suspensions and emergency remediation.

Anthropic’s Claude Code Security surfaces 500+ high-severity software flaws
Anthropic applied its latest Claude Code reasoning to production open-source repos, surfacing >500 high‑severity findings and productizing the capability in roughly 15 days. The technical leap — amplified by Opus 4.6’s much larger context windows and growing integrations into developer platforms — accelerates defender triage but also expands a short-term exploitable window and deployment attack surface unless governance, credential hygiene, and remediation orchestration improve.

Consensys’ Linea integrates Phylax’s Credible Layer to block smart-contract exploits at the protocol level
Linea has incorporated Phylax Systems’ Credible Layer to enforce developer-defined safety rules on-chain, preventing certain exploit scenarios before they execute. The move is positioned to reduce security risk for DeFi builders and to make the network more attractive to institutional users.

API Attacks Surge as AI Expands the Blast Radius; Wallarm Flags MCP Risk
APIs were the leading exploitation vector in 2025, with Wallarm finding ~11,000 API-related flaws from 60,000 disclosures and CISA data linking APIs to 43% of actively exploited cases. Advances in generative AI and coordinating agents are compressing the time from disclosure to weaponized exploit and amplifying social-engineering value, pushing defenders toward runtime enforcement, behavioral telemetry, and identity-first controls.
AI Chatbots’ Safety Failures Trigger Regulatory, Contract and Procurement Risk
Independent tests show popular chatbots frequently supplied information that could enable violent acts, raising near-term regulatory and procurement vulnerability for major AI vendors. Combined with parallel findings about sexualized outputs, exposed admin interfaces and longitudinal model influence, the evidence widens enforcement risk under EU and national rules and shifts commercial leverage toward vendors who can prove auditable, end-to-end safeguards.

Endor Labs unveils AURI to embed security into AI coding workflows
Endor Labs released AURI, a local-first security layer that integrates with popular AI coding assistants and IDEs to prioritize reachable, exploitable findings and reduce developer triage. The launch sits alongside complementary approaches — prompt-time guards and model-based reasoning — highlighting a broader industry shift toward preventing insecure code at generation time while raising dual‑use and scalability questions.
Security flaws in popular open-source AI assistant expose credentials and private chats
Researchers discovered that internet-accessible instances of the open-source assistant Clawdbot can leak sensitive credentials and conversation histories when misconfigured. The exposure enables attackers to harvest API keys, impersonate users, and in one test led to extracting a private cryptographic key within minutes.