Mirai builds a Rust inference engine to accelerate on-device AI
Mirai: compact runtime for on-device model inference
A small London-based team has launched a runtime focused on accelerating model inference on phones and laptops, backed by $10M in seed funding. The initial implementation targets Apple Silicon, built on a Rust codebase that the founders say can lift generation throughput by roughly 37%.
Integration is designed to be lightweight: Mirai plans an SDK that lets developers embed the runtime with only a few lines of code, turning device-resident models into usable product features quickly. The team emphasizes tuning the runtime and execution path while preserving model weights and the original output quality.
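Mirai has not published its SDK, so as a purely hypothetical sketch, an "embed in a few lines" integration might look something like the following; the `Runtime` type, `load`, and `generate` names are all invented, and the body is stubbed so the example runs on its own:

```rust
// Hypothetical sketch of a minimal SDK embedding; Mirai's real API is
// unpublished, so every name here is invented and the logic is stubbed.

struct Runtime {
    model: String,
}

impl Runtime {
    /// Load a device-resident model by identifier (stubbed here; a real
    /// runtime would map weights into memory and build an execution plan).
    fn load(model: &str) -> Runtime {
        Runtime { model: model.to_string() }
    }

    /// Run inference on a prompt. A real runtime would execute the tuned
    /// graph on Apple Silicon; this stub just echoes for illustration.
    fn generate(&self, prompt: &str) -> String {
        format!("[{}] response to: {}", self.model, prompt)
    }
}

fn main() {
    let rt = Runtime::load("local-chat-8b");
    let out = rt.generate("Summarize today's notes");
    println!("{}", out);
}
```

The point of the sketch is the shape of the integration, not the internals: two calls to go from a device-resident model to a usable product feature.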
Product scope is staged. Today the stack focuses on text and voice modalities; vision support is on the roadmap. To help ecosystem validation, Mirai intends to publish on-device benchmarks so model creators can compare edge performance against cloud baselines.
- Planned features: runtime SDK, benchmark suite, and hybrid orchestration layer to fall back to cloud when needed.
- Platform expansion: talks with chipmakers and a future Android port are in progress.
- Targeted apps: low-latency assistants, transcribers, translators, and local chat agents.
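The planned benchmark suite could surface results along the lines below; the record fields, backend names, and numbers are invented for illustration (the throughput ratio is chosen to mirror the roughly 37% lift the founders cite):

```rust
// Illustrative shape of an edge-vs-cloud benchmark comparison; all field
// names and figures are invented, not published Mirai results.

struct BenchResult {
    backend: &'static str,
    tokens_per_sec: f64,
    first_token_ms: f64,
}

/// Throughput ratio of the edge run over the cloud baseline.
fn speedup(edge: &BenchResult, cloud: &BenchResult) -> f64 {
    edge.tokens_per_sec / cloud.tokens_per_sec
}

fn main() {
    let edge = BenchResult {
        backend: "mirai-on-device",
        tokens_per_sec: 41.1,
        first_token_ms: 85.0,
    };
    let cloud = BenchResult {
        backend: "cloud-baseline",
        tokens_per_sec: 30.0,
        first_token_ms: 420.0,
    };
    println!(
        "{} vs {}: {:.2}x tokens/s, first token {:.0} ms vs {:.0} ms",
        edge.backend,
        cloud.backend,
        speedup(&edge, &cloud),
        edge.first_token_ms,
        cloud.first_token_ms
    );
}
```

Publishing results in a comparable record format like this is what would let model creators weigh edge deployment against cloud baselines.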
Investors framed this as a response to rising cloud inference spend: running more compute at the edge improves per-request economics for high-volume consumer services. Backing came from a syndicate led by Uncork Capital alongside several individual technical investors.
The company also plans an orchestration layer that can route requests exceeding device capability to remote servers, acknowledging that some ML tasks will remain cloud-native for the foreseeable future. That mixed-mode approach is central to Mirai’s go-to-market story: speed and cost reduction where feasible, cloud fallback where necessary.
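The routing decision at the heart of that orchestration layer can be sketched as a simple policy; the capability threshold, the cloud-only treatment of vision (mirroring the staged modality rollout described above), and all names are assumptions for illustration:

```rust
// Sketch of mixed-mode routing: run on device when the request fits local
// capability, otherwise fall back to the cloud. Thresholds and names are
// invented; Mirai's actual policy is not public.

enum Target {
    OnDevice,
    Cloud,
}

struct Router {
    /// Maximum context length the local model can handle (assumed figure).
    max_device_tokens: usize,
}

impl Router {
    fn route(&self, prompt_tokens: usize, needs_vision: bool) -> Target {
        // Vision is cloud-only in this sketch, since on-device support
        // for that modality is still on the roadmap.
        if needs_vision || prompt_tokens > self.max_device_tokens {
            Target::Cloud
        } else {
            Target::OnDevice
        }
    }
}

fn main() {
    let router = Router { max_device_tokens: 4096 };
    let local = matches!(router.route(512, false), Target::OnDevice);
    let remote = matches!(router.route(8192, false), Target::Cloud);
    println!("short text local: {}, long text remote: {}", local, remote);
}
```

Even a policy this simple captures the go-to-market claim: the common, latency-sensitive requests stay on device, and only the overflow pays cloud costs.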
On the developer side, the promise is straightforward: fewer integration steps, lower inference latency, and reduced dependence on ongoing cloud calls. For model vendors, Mirai is opening a path to validate edge suitability through its benchmarks and tuned runtimes.
Risks remain. Hardware limits constrain large multimodal models, and success depends on partnerships with model makers and silicon vendors to tune workloads sensibly. Still, this stack could materially change cost and latency dynamics for common consumer AI features if adoption grows.