1. Headline & intro
The GPU party is getting crowded, and Multiverse Computing is quietly heading for the exit. While most AI startups still fight for cloud credits and H100 allocations, the Spanish company is pushing in the opposite direction: shrink the models, run them locally, and skip the cloud altogether when you can.
In this piece, we’ll look at what Multiverse is actually launching, why compressed models matter more than most people think, and how this fits into a much bigger power shift in AI infrastructure. We’ll also unpack what this means for European companies under GDPR and the upcoming EU AI Act – and whether Multiverse can realistically become a European reference point for efficient, edge-first AI.
2. The news in brief
According to TechCrunch, Multiverse Computing has launched two related products built around its quantum-inspired compression technology, CompactifAI.
First, there is a consumer-style app, also called CompactifAI, that works like a chat assistant similar to ChatGPT or Mistral’s Le Chat. Under the hood, it can run a very small model named Gilda directly on users’ devices without a network connection, as long as the phone has enough RAM and storage. If not, a routing system called Ash Nazg automatically falls back to cloud-based models via API.
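The article doesn't describe how that routing works internally, but the general pattern — check device capability first, fall back to a hosted model otherwise — is simple to sketch. Everything below (function name, thresholds) is illustrative, not Multiverse's actual implementation:

```python
# Illustrative capability check for local-vs-cloud routing.
# Thresholds are invented -- not Multiverse's real requirements.
MIN_RAM_GB = 8    # assumed RAM needed to hold the small model
MIN_DISK_GB = 4   # assumed free disk space for the weights

def pick_backend(ram_gb: float, free_disk_gb: float) -> str:
    """Route a request: run locally if the device can hold the
    on-device model, otherwise fall back to a cloud API."""
    if ram_gb >= MIN_RAM_GB and free_disk_gb >= MIN_DISK_GB:
        return "local"   # e.g. the on-device model
    return "cloud"       # e.g. a hosted model reached via API

print(pick_backend(ram_gb=12, free_disk_gb=30))  # -> local
print(pick_backend(ram_gb=4, free_disk_gb=30))   # -> cloud
```

The interesting product decision is that the fallback is invisible to the user: the same chat surface is served by whichever backend the capability check selects.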
Second, the company is rolling out a self-serve API portal giving developers and enterprises direct access to its catalog of compressed models. These include distilled versions of models from OpenAI, Meta, DeepSeek and Mistral, as well as its latest HyperNova 60B 2602 model based on OpenAI’s publicly documented gpt-oss-120b. Multiverse already counts over 100 customers and is reportedly seeking a new €500 million round at a valuation above €1.5 billion.
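Multiverse hasn't published the portal's API shape in the coverage, but most compressed-model vendors expose an OpenAI-compatible chat schema. Purely as a hedged sketch under that assumption — the model name and every field here are illustrative — a request body might be built like this:

```python
import json

# Hypothetical sketch: assumes an OpenAI-compatible chat schema,
# a common industry convention. Model name is illustrative only.
def build_chat_request(prompt: str, model: str = "hypernova-60b") -> str:
    """Build the JSON body for a chat-completion style request."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature for steadier answers
    })

body = build_chat_request("Summarise this incident report.")
print(body)
```

In practice the body would be POSTed to the vendor's endpoint with an `Authorization: Bearer <key>` header by any HTTP client; the point is that swapping a full-size model for a compressed one should be a one-line change to the `model` field.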
3. Why this matters
Compressed models are often treated as a technical curiosity, but here they are a business and geopolitical play.
On the enterprise side, Multiverse is targeting three pain points simultaneously: cost, risk and control. GPU prices and cloud margins make large language model (LLM) usage painfully expensive at scale, especially for continuous agentic workloads such as autonomous coding or process automation. If a compressed 60B model can deliver comparable quality at lower latency and cost, the unit economics of many AI projects change overnight.
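The unit-economics point is easy to make concrete with a toy calculation. All prices below are invented for illustration — neither vendor's real rates:

```python
# Toy unit-economics comparison. Prices are assumptions,
# not real rates from any provider.
FULL_MODEL_PRICE = 5.00      # $ per million tokens (assumed)
COMPRESSED_PRICE = 1.25      # $ per million tokens (assumed)
TOKENS_PER_DAY = 50_000_000  # e.g. a fleet of coding agents

def monthly_cost(price_per_m: float, tokens_per_day: int,
                 days: int = 30) -> float:
    """Cost of a month of inference at a flat per-token price."""
    return price_per_m * tokens_per_day / 1_000_000 * days

full = monthly_cost(FULL_MODEL_PRICE, TOKENS_PER_DAY)    # 7500.0
small = monthly_cost(COMPRESSED_PRICE, TOKENS_PER_DAY)   # 1875.0
print(f"saving: {(1 - small / full):.0%}")               # saving: 75%
```

At agentic volumes — tens of millions of tokens a day, every day — even a modest per-token discount compounds into the kind of number that moves an AI project from pilot to production.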
Risk is the second driver. As TechCrunch notes, Lux Capital recently warned portfolio companies to get AI compute commitments in writing because counterparty risk in the AI supply chain is rising. One way to de-risk is legal paperwork; another is to reduce dependency on shared infrastructure. Running models on-prem or on-device doesn’t remove risk entirely, but it shifts it from scarce, politicised GPU clusters to hardware that companies directly control.
Control is the third piece. Local or edge models keep data on the device or inside the corporate perimeter, which is increasingly important for regulated industries and internal IP. This matters far more than raw benchmark scores. Many enterprises don’t need GPT-4-level creativity; they need an assistant that never leaks sensitive code or customer data outside their own environment.
The short-term losers are cloud-first AI infra startups that assumed the only path was ever-larger models deployed from hyperscale data centres. The likely winners are players that can package “good-enough” intelligence into efficient, controllable form factors – Multiverse among them, but also chipset vendors and device manufacturers who can monetise AI without renting everything from US cloud providers.
4. The bigger picture
Multiverse’s move sits at the intersection of three major trends.
First, the small-model renaissance. Mistral’s recent update of its Small family and the launch of Forge, which lets enterprises tune smaller models with explicit trade-offs, show that “smaller but specialised” is no longer a poor cousin to giant LLMs. For many workloads, a finely tuned 8–60B model is the sweet spot between capability, cost and controllability.
Second, the hybrid edge-cloud pattern. Apple Intelligence deliberately split workloads between an on-device model and a cloud “Private Cloud Compute” tier. Multiverse is following a similar pattern, but from the opposite direction: start local, then route to the cloud when hardware constraints demand it. This is likely to become the standard UX for AI – users won’t care where inference happens, only that it is fast, cheap and respects their privacy.
Third, the structural GPU crunch. Even if NVIDIA doubles capacity every year, latent demand for large models, multimodal workflows and agents is outpacing supply. Efficiency is no longer a nice-to-have; it is a survival strategy. Model compression, quantisation and distillation turn from niche optimisations into core technologies.
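Multiverse’s CompactifAI is based on quantum-inspired tensor networks, which the article doesn’t detail, but the simplest of the techniques named above to illustrate is post-training quantisation: storing weights in fewer bits and keeping one scale factor. A minimal int8 sketch (generic technique, not Multiverse’s method):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric int8 quantisation: store weights as 8-bit ints
    plus one float scale, cutting memory ~4x vs float32."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 form."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
print("compression:", w.nbytes // q.nbytes, "x")  # compression: 4 x
```

Production systems layer smarter tricks on top — per-channel scales, 4-bit formats, distillation to recover accuracy — but the core trade is the same: a small, bounded rounding error in exchange for a large, guaranteed cut in memory and bandwidth.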
Historically, this mirrors the shift from mainframes to PCs and then to smartphones: compute pushes to the edge as hardware improves and software becomes more efficient. Large centralised systems don’t disappear – cloud LLMs won’t either – but value shifts to whoever can package capability closest to the user or process.
From a competitive standpoint, Multiverse is not alone. US players like OctoML and Modular focus on optimisation and deployment, while European startups such as Mistral, Aleph Alpha and LightOn explore various efficiency and sovereignty angles. But Multiverse’s bet is slightly different: it doesn’t just provide tooling; it wants to be the provider of compressed, production-ready models accessible via API or fully local deployment.
5. The European / regional angle
For Europe, Multiverse’s strategy touches on three regulatory hot buttons: GDPR, data sovereignty and the EU AI Act.
GDPR rewards data minimisation: don’t send personal data to third parties if you don’t have to. On-device models like Gilda are a straightforward way to comply. If sensitive content never leaves the device or the company’s infrastructure, consent management, cross-border data transfers and impact assessments become much simpler. The privacy advantage of true edge inference is real – and legally meaningful.
The AI Act adds another layer. General-purpose models and high-risk use cases will face transparency, documentation and monitoring obligations. Running a smaller, auditable model under your own control is easier to document and govern than relying entirely on opaque black-box APIs hosted outside the EU. A European vendor that can provide compressed versions of leading models, with clear documentation and on-prem deployment, fits nicely into this compliance landscape.
There is also an industrial-policy angle. Europe has long worried about being dependent on US hyperscalers for cloud and AI infrastructure. A Spanish company compressing models from OpenAI or Meta into efficient, customer-controlled deployments is not full sovereignty, but it is a step toward a more diversified stack. Customers like Bosch and Iberdrola show that large European corporates are willing to experiment with such alternatives.
For smaller ecosystems – from Slovenia or Croatia to the Baltics – compressed models that run on modest hardware could be the difference between participating in the AI wave or sitting it out. Not every organisation in Europe can afford massive cloud bills, but many can afford a few well-configured servers or AI-capable laptops.
6. Looking ahead
Several questions will determine whether Multiverse can turn this momentum into durable advantage.
First, performance transparency. Claims that HyperNova 60B 2602 outperforms its gpt-oss-120b ancestor on speed and cost are attractive, but enterprises will want independent benchmarks across real workloads: RAG pipelines, coding agents, and domain-specific assistants. Expect more open evals, not just marketing charts.
Second, IP and model lineage. Compressing models from OpenAI, Meta or others is powerful, but it also raises licensing and liability questions – especially once the AI Act’s rules on general-purpose models bite. Customers will ask: who is responsible if a compressed model misbehaves, or if the original licence changes? Multiverse will need watertight contracts and clear provenance tracking.
Third, distribution. An API portal is necessary, but not sufficient, in a world where every cloud provider and model lab is launching yet another endpoint. The more strategic play is likely OEM and platform partnerships: bundling compressed models into enterprise software, industrial equipment, telecom networks, even satellites and drones where connectivity is intermittent.
On the funding side, if the rumoured €500 million round materialises at a >€1.5 billion valuation, Multiverse will be under pressure to show it can scale beyond bespoke projects into a repeatable product business. That means standardised offerings, strong documentation, and developer mindshare – areas where US competitors traditionally excel.
Over the next 12–24 months, watch for three signals: design wins in high-stakes verticals (finance, energy, defence), integrations with European cloud or telecom providers, and whether big model labs start offering their own “official” compressed variants that compete directly with Multiverse.
7. The bottom line
Multiverse Computing is betting that the future of AI is not only bigger models in bigger data centres, but also smaller, sharper models running as close as possible to where data is created. That bet aligns with economic reality, regulatory pressure and hardware trends – especially in Europe. The open question is whether one company can own the “compressed model” niche before it becomes just a standard feature of every major AI platform. For developers and decision-makers, the practical question is already on the table: how much intelligence do you actually need, and where do you want it to live?