Handshake’s Cleanlab deal shows where the real AI power is: data quality, not more GPUs

January 28, 2026

The AI boom has been framed as a race for bigger models and more GPUs, but Handshake’s acquisition of Cleanlab is a reminder that the real leverage sits elsewhere: in the messy, expensive, unglamorous world of data quality. By buying a niche research team obsessed with fixing bad labels, Handshake isn’t just padding its R&D headcount. It’s positioning itself as critical infrastructure for the labs building the next generation of foundation models. In this piece we’ll unpack what happened, why it matters strategically, and what it signals for European AI builders who depend on external data pipelines.


The news in brief

According to TechCrunch, AI data‑labeling startup Handshake has acquired Cleanlab, a young company focused on auditing and improving the quality of labeled data. Handshake, originally founded in 2013 as a recruiting platform for students and graduates, launched a human data‑labeling business roughly a year ago to serve leading AI labs. It already supplies data to several top-tier players, including OpenAI.

Cleanlab, founded in 2021 by three MIT‑trained computer scientists, builds software that automatically detects likely errors in datasets produced by human labelers, without requiring a second human review. The deal is primarily an acqui‑hire: nine key Cleanlab employees, including the founders, are joining Handshake’s research organisation. Financial terms weren’t disclosed, though Cleanlab previously raised around $30 million from investors such as Menlo Ventures, TQ Ventures, Bain Capital Ventures and Databricks Ventures.

TechCrunch reports that Cleanlab had attracted buyout interest from multiple AI data‑labeling companies, but it ultimately chose Handshake, which already supplies specialised human experts to several rival labeling companies.


Why this matters

This deal matters because it quietly shifts where power sits in the AI value chain. Handshake was already a major supplier of specialised human labour for data‑labeling projects, particularly in domains like medicine and law. By absorbing Cleanlab’s research team, it’s moving from being “just” a talent marketplace plus ops machine into owning the quality‑assurance brain of the pipeline.

In the short term, the winners are clear:

  • Handshake gets a respected research group that has spent years on algorithms for detecting mislabeled data. That should let it deliver higher‑quality datasets at lower marginal cost, which is exactly what top AI labs care about.
  • Handshake’s existing AI lab customers benefit if label error rates fall. That can translate directly into better model performance without touching architecture or compute.
  • Cleanlab’s investors and founders get an exit in an area where standalone business models are hard, because tooling often becomes bundled with larger platforms.

The losers are less obvious. Other data‑labeling players who reportedly circled Cleanlab now face a Handshake that is both their supplier and, increasingly, their technical superior in data‑quality tooling. Smaller upstarts that compete on price rather than quality will find the bar rising: if top labs can get audited, de‑noised data from a single vendor, there’s less incentive to try unproven shops.

Strategically, this tackles one of the core bottlenecks in modern AI: human‑generated labels are noisy, biased and expensive to verify. Manual double‑labeling doesn’t scale well. Cleanlab‑style auditing automates part of that verification step, turning label quality into an algorithmic, not purely human, process. That’s how Handshake can turn a labour‑intensive services business into something that looks and behaves more like defensible infrastructure.
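
To make the idea concrete, here is a minimal Python sketch of one way automated label auditing can work. It loosely follows the confident‑learning approach from the research literature and should not be read as Cleanlab's or Handshake's actual implementation, which the reporting does not describe. It assumes you already have out‑of‑sample predicted class probabilities from some model; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def class_thresholds(labels, probs):
    """Per-class threshold: the mean predicted probability of class k,
    averaged over the examples that human labelers tagged as k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for given, p in zip(labels, probs):
        sums[given] += p[given]
        counts[given] += 1
    return {k: sums[k] / counts[k] for k in counts}


def find_likely_label_errors(labels, probs):
    """Return (index, given_label, suggested_label) for suspicious examples.

    An example is flagged when the model is confident (probability at or
    above the class threshold) in some class other than the given label.
    """
    thresholds = class_thresholds(labels, probs)
    flagged = []
    for i, (given, p) in enumerate(zip(labels, probs)):
        confident = [k for k in thresholds if p[k] >= thresholds[k]]
        if not confident:
            continue  # model is unsure everywhere; do not flag
        suggested = max(confident, key=lambda k: p[k])
        if suggested != given:
            flagged.append((i, given, suggested))
    return flagged


# Toy data (hypothetical): six examples, two classes.
labels = [0, 0, 0, 1, 1, 1]
probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],  # labeled 0, but the model strongly prefers class 1
    [0.10, 0.90],
    [0.30, 0.70],
    [0.85, 0.15],  # labeled 1, but the model strongly prefers class 0
]
print(find_likely_label_errors(labels, probs))  # [(2, 0, 1), (5, 1, 0)]
```

The point of the per‑class threshold is that a flagged example still goes to a human expert for a final call; the algorithm only decides where that scarce attention is spent.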


The bigger picture

Zoom out, and this acquisition fits a larger pattern: the centre of gravity in AI is shifting from model architecture to data pipelines.

Over the last few years we’ve seen major infrastructure players scoop up specialist AI teams: Databricks bought MosaicML to strengthen its model‑training stack; Snowflake acqui‑hired the Neeva AI search team; a range of cloud vendors have quietly picked up RLHF and evaluation specialists. The logic is the same each time: control the harder‑to‑replicate layers of the stack.

Data‑labeling has traditionally been viewed as commoditised outsourcing – the kind of work farmed out to large workforces in lower‑cost regions. But foundation models changed the economics. When a single model can serve millions of downstream users, small improvements in label quality for niche datasets (say, radiology images or legal documents) can produce outsized value. That’s why we now see sophisticated players like Scale AI, Surge and others building heavy tooling and research capabilities on top of human labour.

Handshake’s move is distinct in two ways. First, its starting point is a network of highly specialised experts – doctors, lawyers, scientists – rather than generic crowdworkers. Second, by internalising label‑auditing research, it can pair control over who labels the data with an increasingly capable system for deciding which labels to trust.

The long‑term trend is toward data‑centric AI: instead of treating data as a static input and models as the main product, companies iterate aggressively on datasets, curation and governance. Cleanlab’s work sits squarely in that philosophy. If Handshake executes well, it could evolve from “where you hire experts to click boxes” into the system that continuously monitors, cleans and documents the data that trains and fine‑tunes some of the world’s most important models.

For the rest of the industry, the message is blunt: if you still see data‑labeling as a low‑margin service, you’re already behind.


The European / regional angle

For European companies building or deploying AI, Handshake’s acquisition underlines a looming dependency question: who controls the quality of the data flowing into your models, and under which jurisdiction?

The EU AI Act, together with GDPR and the Digital Services Act, places heavy emphasis on data governance: traceability of training data, documentation of sources, bias mitigation and human oversight. High‑risk systems – from healthcare to critical infrastructure – will need rigorous evidence that their training and evaluation data is appropriate, representative and documented.

Cleanlab‑style auditing algorithms can help generate that evidence, but Handshake is a US‑based player that primarily serves global AI labs. European enterprises in regulated sectors may be uneasy about shipping sensitive datasets to an external US vendor, especially when they contain health, financial or biometric information. Even with standard contractual clauses and technical safeguards, regulators and works councils in countries like Germany are already sceptical of opaque offshore data flows.

At the same time, there’s a gap – and thus an opportunity – in the European market. There are strong data‑engineering and ML research hubs across Europe, including Berlin, London, Paris and Zurich, but relatively few European companies focused specifically on label auditing, provenance tooling and compliance‑ready data pipelines. Most European AI startups are still fixated on building models or vertical applications.

For EU‑based labeling firms and consultancies, Handshake’s move should be a wake‑up call. Competing purely on low‑cost annotation is a losing game when US incumbents are bundling advanced research, quality tooling and access to global expert networks. The differentiator in Europe is likely to be sovereign, compliant data‑quality infrastructure: EU‑hosted, audit‑friendly, and aligned by design with the AI Act.


Looking ahead

Over the next 12–24 months, expect more consolidation in the data‑quality layer of the AI stack. Tooling for label auditing, dataset versioning, red‑teaming and evaluation will increasingly be bundled into larger platforms – cloud providers, MLOps suites, or major labeling vendors like Handshake.

For Handshake specifically, several questions will determine how impactful this deal becomes:

  1. Platform vs. arms dealer. Today, even rival data‑labeling companies rely on Handshake’s network of experts. As Handshake internalises Cleanlab’s tech, will it keep playing neutral infrastructure, or will it tilt toward competing more aggressively with those middlemen?
  2. Depth of integration. It’s one thing to acqui‑hire a research team; it’s another to bake their algorithms deeply into every workflow, from worker training to client‑facing dashboards and regulatory reporting.
  3. Regulatory alignment. As the EU AI Act moves from text to enforcement, customers will demand not just “high‑quality data” but machine‑readable documentation and audit trails. Whoever offers that out‑of‑the‑box could become the default choice for regulated industries.
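
As an illustration of what “machine‑readable documentation” in the third point could look like, the sketch below builds a small JSON audit record for a labeled dataset. This is an assumption‑heavy example: the schema, the field names and the `build_audit_record` helper are all hypothetical, and neither the EU AI Act nor any vendor mentioned here prescribes this format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LabelAuditRecord:
    # All field names are hypothetical; no regulation prescribes this schema.
    dataset_id: str
    content_sha256: str      # fingerprint of the exact rows that were audited
    num_examples: int
    flagged_indices: tuple   # rows an auditing step marked as suspicious
    audit_method: str
    human_reviewed: bool


def build_audit_record(dataset_id, rows, flagged_indices, audit_method,
                       human_reviewed=False):
    """Hash a canonical JSON form of the rows so the record is tied to
    one specific dataset version, then bundle the audit outcome with it."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return LabelAuditRecord(
        dataset_id=dataset_id,
        content_sha256=digest,
        num_examples=len(rows),
        flagged_indices=tuple(sorted(flagged_indices)),
        audit_method=audit_method,
        human_reviewed=human_reviewed,
    )


def to_json(record):
    """Serialise the record with stable key order for diffing and archiving."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


rows = [{"text": "scan A", "label": 0}, {"text": "scan B", "label": 1}]
record = build_audit_record("demo-dataset", rows, [1], "threshold-based flagging")
print(to_json(record))
```

Because the hash covers the labels themselves, any later relabeling produces a different fingerprint, which is what lets an auditor confirm that a given report describes the data a model was actually trained on.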

On the opportunity side, there’s room for nimble European and global startups to specialise in adjacent niches: synthetic data generation with built‑in quality guarantees, domain‑specific evaluation suites, or on‑premise data‑auditing tools for customers who cannot send data offsite.

Watch also for a cultural shift inside AI labs. As label‑auditing tools become more capable, teams may move from sporadic, one‑off dataset cleanups to continuous monitoring of data quality, similar to how DevOps turned release engineering from a project into a process.
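
A rough sketch of what that continuous monitoring might look like in practice: track the share of flagged labels in each incoming batch and raise an alert when it jumps above the recent average. The class name, window size and tolerance below are hypothetical choices for illustration, not a description of any lab's or vendor's tooling.

```python
from collections import deque


class LabelQualityMonitor:
    """Alert when a batch's flagged-label rate exceeds the rolling
    average of recent batches by more than `tolerance`."""

    def __init__(self, window=5, tolerance=0.05):
        self.history = deque(maxlen=window)  # recent flagged rates
        self.tolerance = tolerance

    def observe(self, num_flagged, num_examples):
        if num_examples <= 0:
            raise ValueError("num_examples must be positive")
        rate = num_flagged / num_examples
        # No baseline exists for the very first batch, so it never alerts.
        if self.history:
            baseline = sum(self.history) / len(self.history)
            alert = rate > baseline + self.tolerance
        else:
            alert = False
        self.history.append(rate)
        return rate, alert


monitor = LabelQualityMonitor(window=3, tolerance=0.05)
for flagged, total in [(2, 100), (3, 100), (12, 100)]:
    rate, alert = monitor.observe(flagged, total)
    print(f"flagged rate {rate:.2f}, alert={alert}")
```

One design choice worth noting: this version adds every batch to the baseline, including the ones that triggered an alert, so a sustained drop in quality gradually becomes the new normal. A stricter variant would exclude alerting batches from the history.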


The bottom line

Handshake buying Cleanlab is not just another small acqui‑hire; it’s a signal that the next AI moat is built from clean, well‑documented data, not just clever architectures and massive GPU clusters. Control of label quality – and the algorithms that measure it – is becoming strategic infrastructure. For European AI builders, the key question is whether they’re comfortable outsourcing that layer to US‑centric platforms, or whether the continent will finally invest in its own data‑quality stack before regulation forces the issue.
