Why Mirai’s on‑device AI bet could rewrite genAI’s business model

February 19, 2026

Cloud GPUs have been the gasoline of the generative AI boom – and they are already becoming too expensive to burn. If that changes, it will not be because models magically got smaller, but because more of the work quietly moved onto your laptop and phone. London‑based Mirai, founded by the people behind Prisma and Reface, is a focused bet on exactly that shift. In this piece we look beyond the funding headline and ask: can a 14‑person team really bend the unit economics of genAI, and what does that mean for developers, cloud providers and European tech?

The news in brief

According to TechCrunch, Mirai is a London startup building an inference engine to run AI models efficiently on consumer hardware, starting with Apple Silicon laptops and desktops. The company was founded in 2025 by Dima Shvets, co‑founder of face‑swapping app Reface, and Alexey Moiseenkov, co‑founder and former CEO of viral filter app Prisma.

Mirai has raised a $10 million seed round led by Uncork Capital, with participation from a roster of well‑known angels from Snowflake, ElevenLabs, Coinbase and others. The 14‑person technical team has built a Rust‑based engine that, by its own benchmarks, can speed up on‑device model generation by up to 37 percent without changing model weights. The focus today is text and voice; vision support is planned.

The startup is preparing an SDK that should let developers integrate Mirai’s runtime with only a few lines of code. It is also working with model makers and chip vendors to tune models for edge use, plans to bring its stack to Android, and is building an orchestration layer that offloads workloads to the cloud when a device cannot handle them locally.

Why this matters

The unglamorous truth behind many headline AI products is that their unit economics are terrible. Inference – not training – is where costs explode as user numbers grow. A chat assistant that feels cheap or free to the user can silently consume tens of cents per active session in cloud compute. That is fine while venture capital is generous; it becomes a problem once growth slows and margins matter.

Mirai is attacking exactly that pain point by trying to push as much inference as possible onto hardware that users already own. If a laptop with Apple Silicon can execute a reasonably capable language or speech model locally, the marginal cost per query drops toward zero. The value shifts from the data centre to the runtime that can squeeze the most throughput out of each watt on the device.
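The "marginal cost toward zero" claim is easy to sanity‑check with a back‑of‑envelope model. All the figures below – GPU rental price, queries served per GPU‑hour, device wattage and electricity price – are illustrative assumptions, not Mirai's or any provider's published numbers:

```python
# Back-of-envelope comparison of per-query inference cost.
# Every number here is an illustrative assumption.

def cloud_cost_per_query(gpu_hourly_usd: float, queries_per_gpu_hour: int) -> float:
    """Marginal cloud cost: GPU rental amortised over queries served."""
    return gpu_hourly_usd / queries_per_gpu_hour

def device_cost_per_query(watts: float, seconds: float, usd_per_kwh: float) -> float:
    """On-device marginal cost: roughly the electricity the user's hardware draws."""
    kwh = watts * seconds / 3_600_000  # watt-seconds -> kWh
    return kwh * usd_per_kwh

cloud = cloud_cost_per_query(gpu_hourly_usd=2.50, queries_per_gpu_hour=500)
edge = device_cost_per_query(watts=15, seconds=4, usd_per_kwh=0.30)

print(f"cloud: ${cloud:.4f}/query, on-device: ${edge:.6f}/query")
```

Under these assumptions the cloud path costs half a cent per query while the on‑device path costs millionths of a dollar – and, crucially, the on‑device cost is borne by the user's electricity bill, not the developer's margin.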

For developers, the promise is twofold. First, better margins and less dependence on hyperscale clouds. Second, simpler integration: Mirai clearly wants to be the Stripe of on‑device AI, abstracting away kernel optimisation, quantisation choices and device quirks behind a few SDK calls. If they deliver, small teams could ship assistants, transcription tools or translation apps that feel realtime without negotiating GPU contracts.

The losers, at least at the margin, are the cloud providers and GPU vendors who currently monetise every token generated. They will still be needed for heavy workloads and training, but a shift of even 20 to 30 percent of inference to the edge would be meaningful. It also chips away at the lock‑in of platform vendors like Apple, Google and Qualcomm, who are all building their own on‑device AI stacks.

The bigger picture

Mirai is not inventing on‑device AI; rather, it is trying to professionalise and generalise something the big platforms have already validated. Apple uses its Neural Engine for features like photo enhancement and on‑device dictation. Google has long run prediction and translation models directly on phones. Keyboard apps have shipped compact language models on Android for years.

What has changed is the scale and expectations around generative models. Users now expect chatbots, copilots and voice agents that feel almost cloud‑grade in quality. At the same time, the market has discovered how fragile the economics of pure cloud inference are. That makes a neutral edge‑inference layer suddenly attractive, especially for developers who do not want to be fully captive to any one platform vendor.

There is also a historical echo here. A decade ago, an earlier wave of edge ML startups tried to optimise models for mobile chips and IoT devices. Many were too early; the models were small, the user value unclear, and cloud was cheap. As one of Mirai’s investors told TechCrunch, those companies often ended up acquired quietly for talent or niche technology. Today the context is different: models are orders of magnitude larger, user demand is clear, and GPU capacity is the bottleneck.

Compared to today’s incumbents, Mirai’s differentiation will not come from being able to run models on devices – others can too – but from becoming the default developer‑facing interface. Think of what Stripe did for payments or Twilio for messaging. To pull that off, Mirai has to be ruthlessly pragmatic: support whatever models and chips developers care about, ship excellent tooling and benchmarks, and accept that orchestration between edge and cloud is part of the product, not an afterthought.

The European and regional angle

For Europe, on‑device AI is not just a cost optimisation trick; it is a regulatory and sovereignty lever. GDPR already encourages data minimisation and local processing. The EU AI Act, whose risk‑based obligations are phasing in, adds further scrutiny on how sensitive data is handled and where it flows. Running more inference on user devices aligns neatly with both regimes: less personal data traverses the network or gets stored on servers.

That is particularly relevant for sectors where Europe is strong: industrial automation, automotive, healthcare, fintech and public services. Imagine a hospital transcription system that processes speech locally on a clinician’s tablet, or a banking app that performs risk checks on device before sending only aggregated signals to the backend. Those designs are easier to defend before regulators if the underlying runtime can prove deterministic performance and does not secretly leak data.
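The banking example above boils down to a simple data‑minimisation pattern: compute on the raw data locally, transmit only an aggregate. A minimal sketch, in which the risk heuristic, field names and threshold are all invented for illustration:

```python
# Sketch of the data-minimisation pattern: the sensitive computation
# runs on device, and only an aggregated signal leaves it.
# The risk heuristic and threshold are illustrative assumptions.

def local_risk_score(transactions: list[float]) -> float:
    """Compute a crude risk score on device from raw transaction amounts."""
    if not transactions:
        return 0.0
    mean = sum(transactions) / len(transactions)
    spikes = sum(1 for t in transactions if t > 3 * mean)  # unusually large payments
    return spikes / len(transactions)

def payload_for_backend(transactions: list[float]) -> dict:
    """Only the aggregate score is sent to the server, never the raw data."""
    return {"risk_score": round(local_risk_score(transactions), 3)}

payload = payload_for_backend([12.0, 15.0, 9.0, 400.0])
print(payload)  # the raw transaction list stays on the device
```

The regulatory argument is structural: the backend cannot leak data it never received, which is a much easier story to tell an auditor than access controls on a server-side store.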

London as Mirai’s base is notable. Post‑Brexit, the UK is charting its own AI regulatory path, but remains deeply connected to the European market. A neutral, cross‑platform on‑device stack built in Europe could become a preferred choice for companies wary of deep integration with US or Chinese cloud vendors. At the same time, Europe’s strong chip design and embedded systems community – from automotive suppliers in Germany to robotics hubs in Central Europe – is hungry for better tooling at the edge.

For European startups, especially in smaller markets, Mirai’s pitch hits a nerve: if you can ship a competitive AI feature without burning capital on GPU bills, you have a chance to build sustainable businesses rather than pure growth stories.

Looking ahead

Several questions will determine whether Mirai becomes a foundational piece of the AI stack or just a well‑executed niche tool.

First, platform politics. Apple, Google and major chipmakers are not passive. They offer their own SDKs, kernels and preferred runtimes. Mirai will need to prove that a cross‑platform abstraction delivers enough extra performance and convenience that developers are willing to depend on it rather than only on what ships with the OS.

Second, competition from open source. The on‑device space already has strong open projects and libraries focused on quantisation and efficient inference. Mirai is betting that many teams will prefer a supported, opinionated stack with good documentation, analytics and orchestration, even if they could stitch things together themselves. Pricing and licensing will be crucial; developers burned by high cloud costs will be sensitive to new middlemen.

Third, the hybrid story. Mirai’s own roadmap acknowledges that not everything can or should run on device. The orchestration layer that decides what stays local and what goes to the cloud will likely be where much of the product intelligence and revenue sit. That system will have to juggle latency, privacy constraints, device capability, user consent and cost optimisation – a hard technical and product problem.
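The routing decision at the heart of that orchestration layer can be sketched as a policy function. Everything below – the request fields, the capability proxy, the thresholds – is an assumption for illustration, not Mirai's design:

```python
# Minimal sketch of an edge/cloud routing policy for one inference
# request. Field names and thresholds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Request:
    tokens: int              # estimated generation length
    privacy_sensitive: bool  # e.g. health or financial content
    device_tops: float       # rough proxy for device compute budget
    cloud_allowed: bool      # user consent / org policy

def route(req: Request, local_capacity_tokens: int = 2048) -> str:
    """Return 'local' or 'cloud' for a single request."""
    if req.privacy_sensitive or not req.cloud_allowed:
        return "local"   # privacy and consent are hard constraints
    if req.tokens > local_capacity_tokens or req.device_tops < 5.0:
        return "cloud"   # too heavy for this device
    return "local"       # default to the near-zero-marginal-cost path
```

Even this toy version shows why the layer is hard: privacy and consent act as hard constraints while capability and cost are soft trade‑offs, and a real system would add latency targets, battery state and per‑model memory footprints to the mix.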

In the next 12 to 24 months, watch for concrete benchmarks, early lighthouse customers and partnerships with chip vendors or model providers. Also watch whether Mirai can move beyond Apple Silicon into the fragmented Android and Windows hardware landscape. If they manage that, the addressable market multiplies.

The bottom line

Mirai is a sharp bet on a future where genAI is not a thin client to the cloud, but a distributed capability running wherever it makes the most economic and regulatory sense. The team’s consumer‑app DNA and early focus on developer experience are promising, but the on‑device runtime space will be brutally competitive and tightly coupled to platform politics. For builders and policymakers in Europe, the real question is not whether edge inference will grow – it will – but who will own that critical layer between models and the devices in our pockets.
