ProBackend
robotics
4 hours ago · 9 min read

How AI Coding Agents Are Teaching Robots to Install GPUs and Cut Zip Ties

Teams of AI coding agents use the ENPIRE framework to autonomously train robots for manipulation tasks like GPU insertion and zip tie cutting, achieving near-100% success with up to eight agents working in parallel.

Percy Bell

Imagine walking into your robotics lab on a Monday morning to find that your robotic arms spent the weekend teaching themselves to install GPUs into motherboards and cut zip ties. It sounds like science fiction, the kind of story where the robots have also organized your supply closet by color, but it’s not. It’s the new reality promised by research coming out of NVIDIA’s GEAR lab, in collaboration with Carnegie Mellon University and UC Berkeley.

They’ve built a framework called ENPIRE, and it’s effectively giving autonomous AI coding agents the keys to the kingdom when it comes to training physical hardware. Forget the old days of manually coding every nudge and adjustment for a robot gripper; now, we're talking about handing agents compute resources and a "generous token budget" and simply waiting to see what they produce.

The premise is straightforward: traditional reinforcement learning for robotics is slow, brittle, and frankly, a massive bottleneck. You spend months programming a robot to do one thing, then a slightly different-sized resistor comes along, and you’re back to square one. ENPIRE flips that entirely on its head, turning robotic training into an iterative, agentic process that never really sleeps.

The human-in-the-loop method, for all its nuance, is reaching its limit. When humans try to teach physical manipulation, we hit a ceiling on speed, repeatability, and sheer persistence. Agents, however, can run thousands of micro-experiments in the time it takes a researcher to drink a coffee. The shift here isn't just a technical upgrade; it’s a fundamental change in how we conceive of training. We aren't training the robot anymore; we are training an agent to train the robot, and the agent has effectively unlimited patience and, while the token budget holds, immense compute.

This is not just about making robots faster at picking things up. It’s about creating a general-purpose, self-improving lab environment that could eventually scale to handle the complex, low-volume, high-variety tasks that currently require manual re-tooling in factories around the world. Similar efforts to automate heavy physical engineering are underway with initiatives like Jeff Bezos's Prometheus startup, detailed in Bridging Simulation and Reality: Bezos’s Multi-Billion-Dollar Plan to Speed Up Heavy Design and Engineering. The era of the "always-on" robotic lab is here.

The Mechanics of ENPIRE: A New Kind of Harness

At its heart, ENPIRE is an agentic harness. If you know how modern AI development works, you know that raw LLMs are essentially just brain-in-a-jar scenarios. They need tools, they need memory, and they absolutely need feedback loops to interact with the physical world. Without them, you just have a very eloquent chatbot playing in a sandbox.

ENPIRE wraps around these models to do just that. It provides four distinct modules that cover the lifecycle of robotic training: automatic reset and verification, policy refinement, evaluation across multiple physical robots, and failure analysis. When a robot fails to stick the landing with a component, the agent doesn't just panic and stop. It digs into the logs, ingests relevant research papers, tweaks the training algorithm, and tries again. It’s a self-improving loop that can, in principle, iterate faster than any human engineer could.

The harness was tested with three different AI coding agents: OpenAI’s Codex with GPT-5.5, Anthropic’s Claude Code with Opus 4.7, and Moonshot AI’s Kimi Code with Kimi K2.6. These teams of agents independently developed different algorithmic approaches, tested them in real-world experiments, and then retained whatever changes helped raise the overall success rate over repeated cycles of self-directed testing.
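To make that "keep whatever helps" cycle concrete, here is a minimal sketch in Python. It is not ENPIRE's code: the `Policy` class, the random `propose_change` step, and the simulated `evaluate` function are all hypothetical stand-ins for an agent editing training code and a real robot running trials. The only thing it aims to show is the shape of the loop: propose a change, test it, and keep it only if the measured success rate goes up.

```python
import random
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical stand-in for a trained manipulation policy."""
    skill: float  # chance of succeeding on a single trial, in [0, 1]


def evaluate(policy: Policy, trials: int, rng: random.Random) -> float:
    """Simulate `trials` attempts and return the observed success rate."""
    successes = sum(rng.random() < policy.skill for _ in range(trials))
    return successes / trials


def propose_change(policy: Policy, rng: random.Random) -> Policy:
    """Stand-in for an agent's code edit: a random tweak that may help or hurt."""
    delta = rng.uniform(-0.05, 0.10)
    return Policy(skill=min(1.0, max(0.0, policy.skill + delta)))


def improve(policy: Policy, cycles: int, trials: int, rng: random.Random):
    """Greedy loop: keep a candidate only if it beats the best rate so far."""
    best_rate = evaluate(policy, trials, rng)
    history = [best_rate]
    for _ in range(cycles):
        candidate = propose_change(policy, rng)
        rate = evaluate(candidate, trials, rng)
        if rate > best_rate:  # retain only changes that raise the success rate
            policy, best_rate = candidate, rate
        history.append(best_rate)
    return policy, history


if __name__ == "__main__":
    final, history = improve(Policy(skill=0.5), cycles=30, trials=50,
                             rng=random.Random(0))
    print(f"best observed success rate: {history[-1]:.2f}")
```

One caveat with a greedy loop like this: with few trials per cycle, a lucky run can look like an improvement, so a real setup would presumably re-test promising changes before committing to them.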

Crucially, the harness manages the context—something robots traditionally struggle with. A robot arm in a cell doesn't know it just failed a zip tie task unless it's explicitly told, and even then, knowing it failed isn't the same as knowing why it failed. ENPIRE bridges that gap by connecting the robotic telemetry back into the agentic reasoning loop. It's the difference between a robot simply crashing into a table and an agent recognizing it crashed because it calculated the surface friction incorrectly based on a faulty sensor reading. The agent doesn't just see the failure; it understands the why (or at least proposes a logical algorithmic reason) and acts.

This is a step closer to the dream described by Jim Fan, NVIDIA's director of AI, in a LinkedIn post: a lab that "self-improves tirelessly overnight." The dream is for the humans to just take a holiday while the robots and their AI agents handle the grunt work. It sounds almost arrogant, but looking at the results, the confidence feels more than warranted.

Testing the Limits: From GPU Insertion to Zip Ties

The proof, as they say, is in the physical performance. With ENPIRE running the show, these agents developed strategies that pushed success rates for manipulation tasks to a staggering 99 percent.

The tasks weren't trivial, either. They tackled the benchmark "Push-T" task (pushing a T-shaped block into a target position), but they also moved into more practical tasks: organizing pins in a box, cutting zip ties, and inserting graphics cards into motherboard slots, then unplugging them again to reset for another trial.

What’s truly telling is that in some benchmarks, like pin organization, this autonomous approach actually outperformed the "frontier human-in-the-loop method." That's a massive shift. Humans usually excel at the intuitive nuances of physical manipulation, but when the agents can test thousands of micro-adjustments in parallel, they find efficiencies that we simply wouldn't think to try.

Think about the sheer cognitive space required to "think" like a robot gripper. You aren't just thinking about coordinates; you're thinking about pressure, torque, coefficient of friction, and material deformation. Humans struggle to generalize this knowledge across different tasks, but agents operating under the ENPIRE harness are building a library of failure modes and corrective actions that—once learned—don't disappear. The agent doesn't get frustrated, tired, or bored; it just runs the next iteration, and another, and another, until the success probability reaches the threshold.
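What does "reaching the threshold" look like in numbers? The article doesn't say how success rates were measured, so the following is only an illustration using a standard statistical tool, the Wilson score interval. It shows why an observed 99 out of 100 successes is weaker evidence than it sounds: the 95 percent lower bound on the true success rate is noticeably below 0.99.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success probability.

    z = 1.96 corresponds to a two-sided 95 percent interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


if __name__ == "__main__":
    for successes, trials in [(99, 100), (990, 1000)]:
        bound = wilson_lower_bound(successes, trials)
        print(f"{successes}/{trials}: true rate is at least {bound:.3f} (95%)")
```

The practical upshot is that a stopping rule based on a confidence bound needs many more trials than one based on the raw rate, which is exactly where an automatic reset module earns its keep.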

This is the promise of physical AI: not just replacing human hands, but replacing the human intuition that has historically been the primary ingredient in robotic programming. Once you can offload the training to an agentic process, the bottleneck to automation shifts from 'how do we program this robot' to 'how many robots can we feed valid tasks.'

The Scaling Paradox: More Agents vs. Coordination Complexity

One of the more interesting findings from this research is how team size affects the speed of innovation. You might assume that a single highly capable agent, like Codex with GPT-5.5 or Claude Code, would be enough, but the data suggests it's a bit more nuanced.

In experiments, larger teams of up to eight AI coding agents did indeed reach high success rates faster than smaller four-agent teams or single agents. For instance, the eight-agent team hit 99 percent success on the Push-T task in just two hours, while the single-agent setup dragged on for nearly five hours.
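A quick back-of-the-envelope calculation puts those two numbers in perspective. The sketch below treats "nearly five hours" as five, so the results are rough upper estimates rather than figures from the research.

```python
def speedup(single_agent_hours: float, team_hours: float) -> float:
    """How many times faster the team reached the target than one agent."""
    return single_agent_hours / team_hours


def parallel_efficiency(single_agent_hours: float, team_hours: float,
                        agents: int) -> float:
    """Speedup divided by team size; 1.0 would be perfect linear scaling."""
    return speedup(single_agent_hours, team_hours) / agents


if __name__ == "__main__":
    s = speedup(5.0, 2.0)                     # 2.5x faster
    e = parallel_efficiency(5.0, 2.0, 8)      # 0.3125, about 31 percent
    print(f"speedup: {s:.1f}x, efficiency: {e:.0%}")
```

In other words, eight agents bought roughly a 2.5x speedup, or about a third of what perfect scaling would deliver.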

There’s a clear scaling trend here, but it isn't linear. More agents mean more brains on the problem, but they also add coordination overhead that slows down the physical execution of tasks. The researchers found that larger teams often spent a disproportionate amount of time in "coordination mode," summarizing each other’s ideas and mediating agent-to-agent dynamics instead of keeping the robotic arms busy.

This is a surprisingly human-like trait for an AI system. Just as any manager knows, doubling the team size doesn't necessarily double output; it often increases the time spent in meetings about the work rather than doing it. The agents, in their enthusiasm to collaborate, were creating their own bottleneck.
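The manager analogy has a familiar bit of arithmetic behind it. If every team member can talk to every other one, the number of communication channels grows quadratically with team size. The sketch below assumes that fully connected setup, which the article does not confirm for ENPIRE's agents, so treat it as an intuition pump rather than a model of the actual system.

```python
def pairwise_channels(agents: int) -> int:
    """Number of distinct agent-to-agent pairs in a fully connected team."""
    if agents < 1:
        raise ValueError("need at least one agent")
    return agents * (agents - 1) // 2


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(f"{n} agents -> {pairwise_channels(n)} channels")
    # prints 0, 6, and 28 channels respectively
```

Doubling the team from four to eight more than quadruples the number of possible conversations, which is one plausible reason coordination starts to crowd out robot time.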

Furthermore, these parallel training sessions didn't always make the most efficient use of the compute resources readily available. When all eight agents were trying to decide on the best strategy for the next robot move, they weren't necessarily optimizing for hardware utilization, but for consensus. The challenge for future iterations of ENPIRE isn't just making the agents smarter; it's making them better team players, optimizing for communication efficiency so that the robots aren't just sitting idle waiting for them to reach a conclusion. Interestingly, this challenge of balancing parallel agent coordination is mirrored in structural model inference, as seen in The Paradox of Parallel AI: How Google’s DiffusionGemma Rewrites the Speed Rules, where parallel generation steps must be carefully managed to achieve optimal efficiency.

Economic Bottlenecks: Token Costs and Idle Robots

It’s all fun and games until you look at the economics and the actual utilization rates of the hardware. The human researchers discovered a significant, stubborn bottleneck: the robots often sat idle while the agents did the heavy lifting of reading logs, debating approaches, and writing the next iteration of code.

There's also the burning issue of cost. Faster, parallel training means a massive, rapid token burn. With developers like Anthropic currently looking at pricing models that could hike these costs, this isn't just a research problem; it’s a potential business blocker. Is running an AI agent team for two hours cheaper than paying a human researcher for five? Today, maybe not. Tomorrow, it probably will be.

The cost isn't just the compute time; it's the opportunity cost of the robot being idle while the agents argue. If you have a $100,000 robotic arm sitting unused because the AI deciding its next training regimen is still processing a log file, you're losing money every second.
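To see how the pieces of that comparison fit together, here is a small cost calculator. Every number in the example run is a placeholder chosen for illustration: the token usage, the token price, the robot's service life, and the researcher's hourly rate are assumptions, not figures from the research. Only the $100,000 arm and the two-hour and five-hour durations echo the article.

```python
def token_cost_usd(agents: int, hours: float, tokens_per_agent_hour: float,
                   usd_per_million_tokens: float) -> float:
    """Total token spend for a team of agents over a training run."""
    total_tokens = agents * hours * tokens_per_agent_hour
    return total_tokens / 1_000_000 * usd_per_million_tokens


def idle_cost_usd(robot_price_usd: float, service_life_hours: float,
                  idle_hours: float) -> float:
    """Amortized hardware cost of the hours the robot spent waiting."""
    return robot_price_usd / service_life_hours * idle_hours


def human_cost_usd(hours: float, usd_per_hour: float) -> float:
    """Labor cost of a researcher doing the same work by hand."""
    return hours * usd_per_hour


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    agents_total = (
        token_cost_usd(agents=8, hours=2, tokens_per_agent_hour=2_000_000,
                       usd_per_million_tokens=15.0)
        + idle_cost_usd(robot_price_usd=100_000, service_life_hours=20_000,
                        idle_hours=1.0)
    )
    human_total = human_cost_usd(hours=5, usd_per_hour=100.0)
    print(f"agent team: ${agents_total:,.2f}  vs  human: ${human_total:,.2f}")
```

Plug in your own token prices and labor rates; the point is that the answer swings on inputs that are changing quickly.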

This highlights the critical need for better, more efficient agentic frameworks that can work asynchronously from the physical hardware. The agents need to be able to "think ahead" and pre-calculate the next training steps so the robot isn't waiting. This is likely the next major engineering challenge for the GEAR lab and others working in this space: bridging the gap between agentic reasoning speed and physical hardware motion.
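One common way to decouple a slow thinker from a fast doer is a producer-consumer queue. The sketch below is a generic illustration of that pattern, not ENPIRE's design: the function names, the sleep-based stand-ins for "thinking" and "running a trial," and the buffer size are all hypothetical. The agent thread keeps a small buffer of plans topped up so the robot thread can pull the next one as soon as it finishes.

```python
import queue
import threading
import time


def agent_planner(plans: queue.Queue, num_plans: int, think_seconds: float) -> None:
    """Producer: 'thinks' (sleeps) and queues training plans ahead of time."""
    for i in range(num_plans):
        time.sleep(think_seconds)   # stand-in for reading logs and writing code
        plans.put(f"plan-{i}")      # blocks only if the buffer is already full
    plans.put(None)                 # sentinel: no more plans


def robot_worker(plans: queue.Queue, run_seconds: float, executed: list) -> None:
    """Consumer: executes plans in order as soon as they are available."""
    while True:
        plan = plans.get()
        if plan is None:
            break
        time.sleep(run_seconds)     # stand-in for a physical trial
        executed.append(plan)


def run_pipeline(num_plans: int, think_seconds: float, run_seconds: float,
                 buffer_size: int = 2) -> list:
    """Run planner and robot concurrently; return plans in execution order."""
    plans: queue.Queue = queue.Queue(maxsize=buffer_size)
    executed: list = []
    planner = threading.Thread(target=agent_planner,
                               args=(plans, num_plans, think_seconds))
    robot = threading.Thread(target=robot_worker,
                             args=(plans, run_seconds, executed))
    planner.start()
    robot.start()
    planner.join()
    robot.join()
    return executed


if __name__ == "__main__":
    print(run_pipeline(num_plans=4, think_seconds=0.01, run_seconds=0.02))
```

A buffer only hides latency when planning is, on average, no slower than execution. If the agents take longer to think than the robot takes to act, the arm still ends up waiting, which is the bottleneck the researchers observed.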

As we move toward a future where "self-running robot labs" are commonplace, we need the economics to make sense. Right now, it's a high-spend experiment. Eventually, it has to be a profitable industrial process.

The Future: Your Own Self-Running Robot Lab

The most exciting takeaway isn't just that NVIDIA achieved this; it’s their intent to open-source ENPIRE. They want anyone—from startups to hobbyists—to be able to host a "self-running robot lab at home."

This is, of course, part of a much wider, aggressive strategy by NVIDIA to dominate the physical AI space. They’ve already announced partnerships with Unitree (to provide reference humanoid robots for labs) and have been busily courting massive manufacturers, notably Hyundai, which owns Boston Dynamics, a titan in four-legged and humanoid robotics.

We are watching the infrastructure layer of physical AI be laid down in real time. Whether it's the software in ENPIRE, the humanoid platforms from Unitree, or the manufacturing scale provided by potential partners like Hyundai, the message is clear: the divide between autonomous digital intelligence and physical robotic action is closing, and it’s closing fast.

The infrastructure isn't just the silicon in the GPU or the motor in the arm; it's the autonomous training pipeline that connects them. Whoever controls the ability to train robots autonomously controls the deployment speed of physical AI. By open-sourcing the training harness, NVIDIA is effectively setting the standard for how this training should be done, ensuring that when the industry scales, it does so on their architecture.

It's a bold gamble. But if the progress we've seen from ENPIRE is any indication, the future of physical robotics isn't one where thousands of engineers painstakingly program robots one task at a time. It’s one where a few engineers set up the environment, define the goals, and let the agents turn the lights out while they do the rest. The robotic lab of the future is coming, and it’s being built by agents, for agents.
