Future Doctor unveils clinical safety‑effectiveness benchmark; MedGPT leads comparative evaluation
OpenAI unveils EVMbench to benchmark AI for smart-contract security
OpenAI released EVMbench, a new evaluation framework that measures AI systems’ ability to detect, exploit under test conditions, and remediate vulnerabilities in EVM-compatible smart contracts. Built with Paradigm and drawing on real-world flaws, the benchmark aims to create a repeatable standard for assessing AI-driven defenses around code that secures large sums of on‑chain value.
Conversational AI Is Reshaping Diagnosis: Patient Empowerment, Clinical Workflows and New Risks
Conversational AI is moving beyond chat-style explanations into semi-autonomous assistants that help patients interpret symptoms, manage records and execute multi-step tasks, while health-specific consumer offerings often sit outside clinical privacy regimes. The models can improve diagnostic exploration and clinician productivity but have produced harmful recommendations in documented cases, raising urgent needs for provenance, validation, auditable escalation paths and new governance for agentic and multimodal health tools.
UK-backed International AI Safety Report 2026 Signals Fast Capability Gains and Growing Risks
A UK‑hosted, expert-led 2026 assessment documents rapid, uneven advances in general‑purpose AI alongside concrete misuse vectors and operational failures, and — reinforced by industry surveys — warns that procurement nationalism and buyer demand for provenance are already shaping markets. The report urges urgent, coordinated policy and technical responses (stronger pre‑release testing, mandatory security baselines, procurement safeguards and interoperable standards) to prevent capability growth from outpacing defenses.

Seattle startup applies clinical expertise to curb dangerous responses from AI chatbots
Mpathic is scaling clinician-driven safety tools that stress-test and reshape conversational models to reduce harmful outputs; the company raised $15M and reports large reductions in unsafe replies as it expands partnerships across healthcare and enterprise customers. Its clinician-in-the-loop approach is positioned to address risks amplified by agentic features, persistent context, and multimodal inputs in modern conversational systems.
U.S. strategist proposes governed control layer to scale continuous AI preventive care
A new industry blueprint argues that safe, reimbursable continuous AI-driven prevention in U.S. healthcare requires a governed execution layer that mediates AI insights, human input, and payment readiness. The proposal, advanced by Capacitate, Inc.'s founder alongside a new book, frames this infrastructure as essential to unlock a multi‑trillion dollar shift toward continuous care by the 2030s.
Scale AI's Voice Showdown reshapes voice-benchmarking for frontier models
Scale AI launched Voice Showdown, a human-preference benchmark that exposes failures in language coverage, voice quality, and conversation length across leading voice models. The results — measured across 60+ languages, 11 models and 52 model-voice pairs — deliver actionable performance metrics that will redirect vendor roadmaps and procurement decisions.

BioticsAI Secures FDA Clearance for AI Fetal-Ultrasound Software
BioticsAI announced FDA clearance for its AI-driven fetal ultrasound software, a regulatory milestone that paves the way for wider clinical deployment across U.S. health systems. The startup plans to scale distribution and extend functionality for fetal medicine while emphasizing equitable performance across diverse patient groups.

TELUS study finds North American publics demand inclusion, safety and regulation as AI use surges
A TELUS-commissioned cross-border survey of over 11,000 people in Canada and the U.S. shows widespread AI adoption and strong public expectations that companies solicit input, test for harms before release, and explain AI in plain terms. The results point to a near-consensus in favour of regulatory frameworks and create a strategic imperative for firms to adopt accountable, human-centred AI practices or face reputational and adoption risks.