If you run a public API for a cutting-edge large language model, you’re constantly fighting scrapers. But there's a massive difference between a script kiddie trying to build a free chatbot and a multi-billion-dollar Chinese tech conglomerate deploying 25,000 accounts to systematically siphon model capabilities. Anthropic just called out Alibaba for exactly that.
Let's look at the numbers. They’re staggering. According to a recently disclosed letter sent on June 10, 2026, to Senators Tim Scott and Elizabeth Warren, and first reported by Ars Technica, Anthropic uncovered what they call the largest campaign to illicitly extract Claude's capabilities ever measured. The operation ran from April 22nd to June 5th, 2026. During this brief six-week window, accounts affiliated with Alibaba and its AI research lab, Alibaba Qwen, generated over 28.8 million exchanges with Claude.
Let's do the math on that automation. Counting both endpoint days, the campaign ran for 45 days, so 28.8 million exchanges works out to roughly 7.4 requests per second, sustained around the clock for the entire period. You don't hit those kinds of numbers by accident. You don't get there using standard developer keys either. To pull this off, the operators set up roughly 25,000 separate, fraudulent accounts.
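For anyone who wants to check the back-of-the-envelope figure, here is a minimal sketch in Python. It uses only the numbers quoted above and assumes the load was spread evenly, with both the start and end dates counted. The function name `campaign_rates` is just an illustrative helper.

```python
from datetime import date

# Figures quoted from the letter as reported in this article.
TOTAL_EXCHANGES = 28_800_000
ACCOUNTS = 25_000
START, END = date(2026, 4, 22), date(2026, 6, 5)

def campaign_rates(total, accounts, start, end):
    """Return (days, aggregate requests/second, exchanges per account per day)."""
    days = (end - start).days + 1            # count both endpoint days
    per_second = total / (days * 86_400)     # 86,400 seconds in a day
    per_account_per_day = total / accounts / days
    return days, per_second, per_account_per_day

days, per_second, per_account_per_day = campaign_rates(
    TOTAL_EXCHANGES, ACCOUNTS, START, END
)
print(f"{days} days, {per_second:.2f} req/s overall, "
      f"{per_account_per_day:.1f} exchanges per account per day")
# prints: 45 days, 7.41 req/s overall, 25.6 exchanges per account per day
```

The per-account number is the interesting one. Under the even-spread assumption, each account averages only about 26 exchanges a day, which looks like an ordinary developer and would sit far below any plausible per-key rate limit. The volume only becomes visible in aggregate.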
To bypass Anthropic’s defense systems, the operators used sophisticated obfuscation techniques and proxy networks. They rotated residential IP pools to make the requests look like they were coming from disparate, legitimate developers scattered across permitted jurisdictions. They used phone-verification farms to spin up thousands of accounts. In the automation world, we call this a distributed harvesting setup. It is a highly coordinated, capital-intensive infrastructure designed to bypass system protections. Anthropic's letter warns that we’re seeing the rise of a "circumvention economy" specifically built to help Chinese entities bypass API blocks and access restrictions. They aren’t just logging into the web interface; they’re running massive, API-driven scraping pipelines to pull intelligence out of Claude.
The High Stakes of Model Distillation
Why go to all this trouble? The answer lies in the economics of model training: model distillation. It’s the ultimate shortcut in the generative AI race.
Training a frontier model from scratch is a massive gamble. You’ll need hundreds of millions of dollars in compute, a small army of research scientists, and months of runtime on advanced GPU clusters. You’ve got to design the architecture, tune the hyperparameters, and finance the massive energy bills. Worst of all, after all that spending you can still end up with a model that hallucinates or fails to follow instructions properly.
Distillation bypasses almost all of this. Instead of training a model on raw web text and figuring out alignment from scratch, you query an existing frontier model, such as Claude, and use its high-quality responses as the training data for your own model. The resulting "student" model learns to mimic the reasoning patterns, coding logic, and formatting of the superior "teacher" system.
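To make the mechanism concrete, here is a deliberately tiny toy in Python. It is an analogy, not how anyone distills a language model: the "teacher" is a hidden linear function standing in for a black-box model, and the "student" is a straight line fitted by least squares using nothing but the teacher's answers. The names `teacher` and `distill` are made up for illustration. Real distillation fine-tunes a neural network on text outputs.

```python
import random

def teacher(x: float) -> float:
    """Stand-in for a black-box model: callers only ever see its outputs."""
    return 3.0 * x + 2.0

def distill(queries):
    """Fit a linear student y = w*x + b to the teacher's answers (least squares)."""
    answers = [teacher(x) for x in queries]      # step 1: query the teacher
    n = len(queries)
    mean_x = sum(queries) / n
    mean_y = sum(answers) / n
    var_x = sum((x - mean_x) ** 2 for x in queries)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(queries, answers))
    w = cov_xy / var_x                           # step 2: train on its answers
    b = mean_y - w * mean_x
    return w, b

random.seed(0)
queries = [random.uniform(-10, 10) for _ in range(200)]
w, b = distill(queries)
print(f"student: y = {w:.3f}*x + {b:.3f}")
```

The student never sees the teacher's internals or its original training data, yet it recovers the same behavior from query-and-answer pairs alone. That is the property that makes large-scale output harvesting valuable.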
It's incredibly effective. Alibaba's scraping campaign focused specifically on Claude’s most valuable capabilities: agentic reasoning, software engineering, and long-horizon tasks. These are the crown jewels of model intelligence. If you can train your model to write code and perform multi-step agent actions by copying Claude's outputs, you save tens of millions of dollars in R&D costs.
Anthropic points out that these distillation campaigns are widespread. They turn hundreds of billions of dollars in American R&D investment into a direct subsidy for geopolitical competitors. Alibaba isn't alone here. Earlier reports flagged similar tactics from other Chinese AI developers like DeepSeek, Moonshot, and MiniMax, which generated over 16 million exchanges with Claude using 24,000 accounts. OpenAI and Google have documented similar extraction campaigns targeting their systems. As an open-source advocate who builds automation systems, I can see the perverse logic: if you can get the best training data for a fraction of the cost, you do it. But when a Chinese national champion systematically targets a US firm's crown jewels, the line between clever optimization and structural IP theft disappears completely.
Geopolitical Strains: Alibaba’s Dangerous Defensive Stance
This API scraping campaign didn't happen in a vacuum. It went down right as the Trump administration issued clear warnings against cloning American frontier models.
Back in April, Trump accused China of "industrial-scale" AI theft and warned that cloning attempts were unacceptable. Apparently, Alibaba’s leadership didn't care. They kept the scraping pipelines running right through April and May, even though the firm is publicly traded on the New York Stock Exchange and depends heavily on US investors and capital markets.
This defiance puts Alibaba in a very tight spot. The company's already fighting a multi-front legal war with the US government. In late June, Alibaba filed a lawsuit seeking to overturn its blacklisting by the Trump administration, which designated the firm as a company linked to the Chinese military. Alibaba’s lawyers claim the designation has "no basis in fact or law." They argue that the company is governed by an independent board with no military ties. They claim its products, like the Qwen language models, are built for enterprise IT, logistics, and retail, not weapons or espionage.
But Anthropic’s evidence makes that independent posture hard to buy. If Alibaba is actively running a massive, covert distillation campaign to replicate US frontier capabilities, its claims of being a pure enterprise-commerce company look hollow. Western regulators are going to look at the 28.8 million exchanges and ask: who actually ordered this intelligence transfer? As details of the Anthropic letter leaked, Alibaba's stock slid three percent on the NYSE. That's a direct financial cost for playing geopolitical games with API access. As long as Chinese tech firms try to walk the tightrope of relying on Western capital while siphoning Western tech, they’re going to face ruinous regulatory blowback.
Collaborative Firewalls: Anthropic's Legislative Blueprint
So, how do you stop a nation-state scale scraping operation? Standard API rate limits and IP blocks clearly aren't enough.
Anthropic's letter to Scott and Warren contains three main legislative proposals. First, they want Congress to update antitrust laws so that competing AI companies can freely share threat intelligence. Right now, if Anthropic, OpenAI, and Google sit down to share IP addresses, account payment details, and prompt fingerprints of domestic or foreign scraping networks, they risk running afoul of antitrust regulations. That's absurd. In the cybersecurity world, sharing Indicators of Compromise (IoCs) is standard procedure. We need a secure, legal way to share these indicators for AI model abuse without fearing antitrust lawsuits.
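To show what IoC-style sharing could look like in practice, here is a hedged sketch in Python. Nothing in it comes from Anthropic's letter. The consortium key, the indicator formats, and the helper names `fingerprint` and `overlap` are all hypothetical, and the IP addresses are reserved documentation addresses. The idea is that each lab publishes keyed hashes of its indicators instead of raw values, and partners check for matches against their own data.

```python
import hashlib
import hmac

def fingerprint(indicator: str, shared_key: bytes) -> str:
    """Keyed SHA-256 hash of a normalized indicator (trimmed, lowercased)."""
    normalized = indicator.strip().lower().encode("utf-8")
    return hmac.new(shared_key, normalized, hashlib.sha256).hexdigest()

def overlap(mine, partner_hashes, shared_key):
    """Return my raw indicators whose fingerprints appear in a partner's set."""
    return sorted(i for i in mine if fingerprint(i, shared_key) in partner_hashes)

# Hypothetical key agreed between participating labs.
KEY = b"hypothetical-consortium-key"

# Hypothetical indicators seen by two different labs.
lab_a = ["203.0.113.7", "198.51.100.23", "card:4242"]
lab_b = ["198.51.100.23 ", "192.0.2.55", "CARD:4242"]

# Lab B shares only hashes; Lab A checks its own indicators against them.
lab_b_hashes = {fingerprint(i, KEY) for i in lab_b}
print(overlap(lab_a, lab_b_hashes, KEY))
# prints: ['198.51.100.23', 'card:4242']
```

One caveat on the design: hashing low-entropy values like IPv4 addresses is not strong privacy protection, because anyone holding the key can enumerate the whole address space. A real exchange would need legal cover and proper access controls, which is the gap the antitrust proposal is meant to close.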
Second, Anthropic wants tighter export controls on advanced chips. The logic is simple: if you make it harder for Chinese labs to acquire clusters of advanced compute, they won't be able to train frontier-class models, even if they manage to scrape enormous volumes of Claude's outputs. Distillation still requires a significant amount of compute to train on the siphoned outputs. If they don't have the silicon, the scraped logs are just text files collecting dust.
Third, they want direct federal penalties for distillation. Anthropic suggests that foreign companies caught stealing model capabilities should be cut off from accessing US cloud infrastructure, advanced chips, or data centers outside of China.
As someone who works with automation, I think the antitrust sharing proposal is a no-brainer. Chip controls are another story, so let’s be realistic: they’re a leaky bucket. The global supply chain for GPUs is incredibly complex, and shell companies in third-party countries buy and lease compute constantly. The threat of being cut off from US cloud environments and NYSE capital, on the other hand, might actually force companies like Alibaba to think twice before setting up their script engines.
Mutually Assured Vulnerability: Zhou Hongyi's Warning
If you think Anthropic is exaggerating the strategic value of these models, look at what Chinese tech executives are saying. During a cybersecurity conference in Beijing, Zhou Hongyi, the founder of 360 Security Technology, didn't mince words. He called Anthropic's recent 'Mythos' release a 'cyber nuclear weapon.'
What makes this terrifying for Chinese authorities? Only US organizations have access to Mythos Preview. Under 'Project Glasswing,' more than forty US organizations were granted early access to Mythos Preview to strengthen their cyber defenses. Meanwhile, Chinese companies are locked out.
Zhou complained that US organizations can scan Chinese networks for vulnerabilities using Mythos, while China isn't even allowed to catch a glimpse of the model. He warned that this creates a dangerous asymmetrical capability gap. For China, the only way forward is to develop their own Mythos-like model. Zhou argued that they must race to copy these capabilities to achieve a state of mutually assured destruction in cyber warfare. If two superpowers can both scan each other with equal speed, a stalemate is maintained.
Zhou admitted that, for now, Chinese labs are well short of Mythos-level capabilities. Because they cannot yet match the raw compute or the frontier models of the US, they are focusing on agent systems that combine Qwen (Alibaba's model) with custom security datasets. This is where the distillation campaign comes full circle. Alibaba needs Claude's agentic reasoning and software engineering capabilities because those are the exact building blocks required to assemble the security agents Zhou is calling for.
Without aggressive, coordinated action, Anthropic warns that China will match Mythos-level capabilities much sooner than expected. If they do, they will gain advanced cyber capabilities to deploy against the US government and American companies. The margin of safety is shrinking. If the US wants to protect its lead, it has to treat its APIs as critical national security infrastructure, not just commercial endpoints.