By Ian Scheffler
Penn Engineers have developed a new algorithm that allows robots to react to complex physical contact in real time, making it possible for autonomous robots to succeed at previously impossible tasks, like controlling the motion of a sliding object.
The algorithm, known as consensus complementarity control (C3), may prove to be an essential building block of future robots, translating directions from artificial intelligence tools like large language models, or LLMs, into appropriate physical action.
“Your large language model might say, ‘Go chop an onion,’” says Michael Posa, Assistant Professor in Mechanical Engineering and Applied Mechanics (MEAM) and a core faculty member of the General Robotics, Automation, Sensing and Perception (GRASP) Lab. “How do you move your arm to hold the onion in place, to hold the knife, to slice through it in the right way, to reorient it when necessary?”
One of the greatest challenges in robotics is control, a catch-all term for the intelligent use of a robot’s actuators, the components, such as motors or hydraulic systems, that move its limbs. Control of the physical contact that a robot makes with its surroundings is both difficult and essential. “That kind of lower- and mid-level reasoning is really fundamental in getting anything to work in the physical world,” says Posa.
Since the 1980s, experts in artificial intelligence have recognized that, paradoxically, the first skills humans learn — how to manipulate objects and move from one place to another, even in the face of obstacles — are the hardest to teach robots, and vice versa. “Robots work really well until they have to start touching things,” says Posa. “Artificial intelligence machines right now can solve International Mathematical Olympiad-level math problems and beat experts at chess. But they have the physical capabilities of a two- or three-year-old at best.”
In essence, this means that every interaction robots have that involves touching something — picking up an object, moving it somewhere else — must be carefully choreographed. “The key challenge is the contact sequence,” says William Yang, a recent doctoral graduate of Posa’s Dynamic Autonomy and Intelligent Robotics (DAIR) Lab. “Where do you put your hand on the environment? Where do you put your foot on the environment?”
Humans, of course, rarely have to think twice about how they interact with objects. In part, the challenge for robots is that something as simple as picking up a cup actually involves many different choices — from the correct angle of approach to the appropriate amount of force. “Not every one of these choices is so terribly different from the ones around it,” Posa points out. But, until now, no algorithm has allowed robots to assess all those choices and make an appropriate decision in real time.
To solve the problem, the researchers essentially devised a way to help robots “hallucinate” the different possibilities that might arise when making contact with an object. “By imagining the benefits of touching things, you get gradients in your algorithm that correspond to that interaction,” says Posa. “And then you can apply some style of gradient-based algorithm and in the process of solving that problem, the physics gradually becomes more and more accurate over time to where you’re not just imagining, ‘What if I touch it?’ but you’re actually planning to go out and touch it.”
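The general idea Posa describes can be illustrated with a toy example. The sketch below is not C3 itself (which is built on complementarity constraints and a consensus solver); it is a minimal one-dimensional pusher-and-slider written for this article, with all names and parameter values invented. A "softened" contact model lets the optimizer feel a gradient from the box even before the pusher actually touches it, and the contact stiffness is gradually hardened over the solve, mirroring the progression from imagining a touch to planning a real one.

```python
import numpy as np

def softplus(z, beta):
    # Smooth stand-in for the hard contact condition max(0, z):
    # larger beta means stiffer, more realistic contact.
    return np.logaddexp(0.0, beta * z) / beta

def rollout(pusher, beta, x0=0.0, k=50.0, damping=5.0, dt=0.05):
    # Quasi-static 1-D slider: a position-controlled pusher generates
    # a penetration-based contact force that drags the box along.
    x = x0
    for p in pusher:
        force = k * softplus(p - x, beta)
        x += dt * force / damping
    return x

def plan(target=1.0, steps=20, iters=300, lr=0.05, eps=1e-4):
    # Gradient descent on the pusher trajectory. Early on, beta is
    # small, so the box "feels" the pusher before contact -- the
    # imagined touch that gives the optimizer a useful gradient.
    # Annealing beta hardens the physics until the plan must rely
    # on genuine contact.
    pusher = np.zeros(steps)
    for it in range(iters):
        beta = 2.0 + 8.0 * it / iters
        base = (rollout(pusher, beta) - target) ** 2
        grad = np.zeros(steps)
        for i in range(steps):  # finite-difference gradient
            bumped = pusher.copy()
            bumped[i] += eps
            grad[i] = ((rollout(bumped, beta) - target) ** 2 - base) / eps
        pusher -= lr * grad
    return pusher, rollout(pusher, beta)
```

With hard contact, the gradient would be exactly zero whenever the pusher hovers short of the box, leaving the optimizer no signal about whether touching would help; the softened model removes that dead zone, at the price of physics that is only approximately correct until beta is annealed up.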
In the past year, Posa and the DAIR Lab have published a suite of award-winning papers on the topic, most recently one for which Yang served as the lead author, which won the Outstanding Student Paper Award at the 2024 Robotics: Science and Systems conference in the Netherlands. That paper demonstrates how C3 can empower robots to control sliding objects in real time. “Sliding is notoriously hard to control in robotics,” says Yang. “Mathematically, it’s hard, but you also have to rely on object feedback.”
Using C3, however, Yang demonstrated that a robotic arm can safely manipulate a tray like the ones waiters use at restaurants. In videotaped experiments, Yang had the robotic arm pick the tray up and put it down, with and without a coffee cup, and rotate the tray against a wall. “Previous work thought, ‘We just want to avoid sliding,’” Yang says, “but the algorithm includes sliding as a possibility for the robots to consider.”
In the future, Posa and his group hope to make the algorithm even more robust to different situations, such as when the objects a robot handles weigh slightly more or less than anticipated, and to extend the project to more open-ended scenarios that C3 currently cannot handle.
“This is a building block that can go from a pretty simple specification — make this part go over there — and distill that down to the motor torque that the robot is going to need to achieve that,” says Posa. “Going from a very, very complicated, messy world down to the key sets of objects or features or dynamical properties that matter for any given task, that’s the open question we’re interested in.”
These studies were conducted at the University of Pennsylvania School of Engineering and Applied Science and supported by the U.S. National Science Foundation (NSF CAREER FRR-2238480, NSF EFRI-1935294, NSF CMMI-1830218, NSF GRFP DGE-1845298), the Toyota Research Institute and the AI Institute.
Additional co-authors of the papers referred to include Alp Aydinoglu and Wei-Cheng Huang of the GRASP Lab; Adam Wei of the University of Toronto; and Wanxin Jin of Arizona State University.