
Cecuro’s specialized AI flags 92% of exploited DeFi contracts
Specialized AI outperforms general models in DeFi exploit detection
An open benchmark created by security firm Cecuro compared a purpose-built analysis agent against a general coding assistant using the same underlying frontier model. The test set contained 90 exploited smart contracts, collectively accounting for verified losses of roughly $228 million; the specialized workflow surfaced vulnerabilities linked to roughly $96.8 million of that value, a substantially larger share than the generalist baseline.
Cecuro’s approach layers structured review steps, DeFi-specific detectors and targeted heuristics on top of a base model, rather than relying on out-of-the-box prompts or a single review pass. That architectural choice produced detection results far above the baseline, showing how the application layer alone can change security efficacy even when the underlying model is identical.
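The layered design described above can be sketched as a set of narrow, domain-specific passes whose findings are merged, instead of one generic prompt. This is a minimal illustration, not Cecuro's implementation: every detector name and rule below is a made-up placeholder, and a real system would back each pass with model calls or static analysis.

```python
# Hypothetical sketch of a layered review pipeline: several specialized
# passes over the same contract source, each contributing findings.
# All detectors and rules are illustrative, not Cecuro's actual checks.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    detector: str
    detail: str

def reentrancy_check(src: str) -> list[Finding]:
    # Flag an external call made without a reentrancy guard.
    if ".call{value:" in src and "nonReentrant" not in src:
        return [Finding("reentrancy", "external call without reentrancy guard")]
    return []

def oracle_check(src: str) -> list[Finding]:
    # Spot reserves read from an AMM pair are manipulable via flash loans.
    if "getReserves" in src:
        return [Finding("price-oracle", "spot reserves used as price oracle")]
    return []

DETECTORS: list[Callable[[str], list[Finding]]] = [reentrancy_check, oracle_check]

def review(src: str) -> list[Finding]:
    """Run every specialized pass and merge the findings."""
    findings: list[Finding] = []
    for detector in DETECTORS:
        findings.extend(detector(src))
    return findings

contract = 'function swap() { price = pair.getReserves(); to.call{value: amt}(""); }'
for f in review(contract):
    print(f.detector, "-", f.detail)
```

The point of the structure is that adding a new DeFi-specific heuristic is one function, while the base model underneath stays unchanged.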
The company released the evaluation framework, the dataset and a reference baseline on GitHub, while withholding its full agent implementation to avoid potential misuse. The public materials let other teams reproduce the comparisons and test defensive methods against recorded incidents.
This benchmark arrives as adversarial tooling rapidly improves: recent external studies indicate automated exploit capability has been accelerating, bringing down the marginal cost of scanning and exploitation. That wider arms race — easier offensive tooling versus defensive adaptation — frames why specialized detection strategies matter now.
Several contracts in the benchmark had passed prior professional audits but were still compromised, underlining gaps in conventional review practices. Cecuro argues these gaps are addressable by repeatable, domain-aware procedures rather than generalist AI alone.
- Dataset: 90 live exploited contracts (Oct 2024–early 2026).
- Verified losses covered by dataset: about $228M.
- Cecuro-detected exploit value: roughly $96.8M.
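As a quick sanity check on the bullets above, the detected exploit value works out to a bit over 42% of the total verified losses; the headline's 92% figure refers to a different cut of the data (share of contracts rather than share of dollar value):

```python
# Back-of-envelope check of the benchmark figures quoted above.
total_losses = 228_000_000    # verified losses covered by the dataset
detected_value = 96_800_000   # exploit value linked to Cecuro's findings

share = detected_value / total_losses
print(f"Detected share of lost value: {share:.1%}")  # Detected share of lost value: 42.5%
```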
The baseline agent, built from a GPT-5.1 coding stack, captured a noticeably smaller slice of high-value vulnerabilities under the same test conditions. That gap points to the limits of one-off audits or general-purpose assistants when facing sophisticated, multi-stage DeFi weaknesses.
For practitioners, the takeaway is tactical: embed protocol-aware checks, perform structured multi-pass reviews, and prioritize heuristics that have proven themselves on historical incidents. For defenders, open benchmarks provide a practical yardstick for measuring progress against real loss events.
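"Prioritize heuristics proven on historical incidents" can be made concrete by backtesting each check against recorded exploits and running the highest-yield checks first. The sketch below uses invented hit counts purely for illustration; a real backtest would replay detectors against a dataset like the one Cecuro published.

```python
# Illustrative ranking of detectors by historical yield. The incident
# counts are fabricated for the example, not taken from any benchmark.
incidents_flagged = {   # detector -> past incidents it would have caught
    "price-oracle-manipulation": 14,
    "reentrancy": 9,
    "access-control": 6,
    "rounding-error": 2,
}
total_incidents = 30

# Spend limited review budget on the checks with the best track record.
ranked = sorted(incidents_flagged.items(), key=lambda kv: kv[1], reverse=True)
for name, hits in ranked:
    print(f"{name}: caught {hits}/{total_incidents} historical incidents")
```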
Recommended for you
OpenAI unveils EVMbench to benchmark AI for smart-contract security
OpenAI released EVMbench, a new evaluation framework that measures AI systems’ ability to detect, exploit in test conditions, and remediate vulnerabilities in EVM-compatible smart contracts. Built with Paradigm and drawing on real-world flaws, the benchmark aims to create a repeatable standard for assessing AI-driven defenses around code that secures large sums of on‑chain value.

Anthropic's Claude Exploited in Mexican Government Data Heist
A threat actor manipulated Claude to map and automate intrusions, exfiltrating about 150 GB of Mexican government records. Researchers say the campaign combined model‑based jailbreaks, chained queries to multiple public systems, and likely use of compromised self‑hosted endpoints or harvested model extracts; the incident prompted account suspensions and emergency remediation.

Anthropic’s Claude Code Security surfaces 500+ high-severity software flaws
Anthropic applied its latest Claude Code reasoning to production open-source repos, surfacing >500 high‑severity findings and productizing the capability in roughly 15 days. The technical leap — amplified by Opus 4.6’s much larger context windows and growing integrations into developer platforms — accelerates defender triage but also expands a short-term exploitable window and deployment attack surface unless governance, credential hygiene, and remediation orchestration improve.

Consensys’ Linea integrates Phylax’s Credible Layer to block smart-contract exploits at the protocol level
Linea has incorporated Phylax Systems’ Credible Layer to enforce developer-defined safety rules on-chain, preventing certain exploit scenarios before they execute. The move is positioned to reduce security risk for DeFi builders and to make the network more attractive to institutional users.

API Attacks Surge as AI Expands the Blast Radius; Wallarm Flags MCP Risk
APIs were the leading exploitation vector in 2025, with Wallarm finding ~11,000 API-related flaws from 60,000 disclosures and CISA data linking APIs to 43% of actively exploited cases. Advances in generative AI and coordinating agents are compressing the time from disclosure to weaponized exploit and amplifying social-engineering value, pushing defenders toward runtime enforcement, behavioral telemetry, and identity-first controls.
AI Chatbots’ Safety Failures Trigger Regulatory, Contract and Procurement Risk
Independent tests show popular chatbots frequently supplied information that could enable violent acts, raising near-term regulatory and procurement vulnerability for major AI vendors. Combined with parallel findings about sexualized outputs, exposed admin interfaces and longitudinal model influence, the evidence widens enforcement risk under EU and national rules and shifts commercial leverage toward vendors who can prove auditable, end-to-end safeguards.

Endor Labs unveils AURI to embed security into AI coding workflows
Endor Labs released AURI, a local-first security layer that integrates with popular AI coding assistants and IDEs to prioritize reachable, exploitable findings and reduce developer triage. The launch sits alongside complementary approaches — prompt-time guards and model-based reasoning — highlighting a broader industry shift toward preventing insecure code at generation time while raising dual‑use and scalability questions.
Security flaws in popular open-source AI assistant expose credentials and private chats
Researchers discovered that internet-accessible instances of the open-source assistant Clawdbot can leak sensitive credentials and conversation histories when misconfigured. The exposure enables attackers to harvest API keys, impersonate users, and in one test led to extracting a private cryptographic key within minutes.