In this interview, Daniele Pucci, CEO of Generative Bionics, explains why humanoid intelligence should not be centralized in a vision-language “brain.” He outlines a Physical AI approach in which intelligence is distributed across body, materials, and sensorimotor loops.
You describe Generative Bionics’ approach as Physical AI, rather than the more common Embodied AI. Is this mainly a technical distinction, or does it reflect a deeper disagreement with the idea that intelligence can be centralized in a vision-language “brain” and deployed top-down into a body?
Our Human-Centric Physical AI is not a semantic variation of Embodied AI; it reflects a fundamentally different architectural stance inspired by how human intelligence actually works.
From a neuroscience perspective, human intelligence is not centralized. It is commonly structured across three tightly coupled domains: motor intelligence, which governs balance, posture, and movement; cognitive intelligence, responsible for reasoning, planning, and fine manipulation; and interaction intelligence, which enables understanding and acting within physical and social contexts. Crucially, all three have co-evolved with the biomechanics of the human body. This allows fast reflexes and adaptive behaviors to be decentralized and embedded within the peripheral and spinal nervous systems, rather than routed through conscious reasoning.
A simple example is human locomotion. Humans can walk downhill with near-minimal energy expenditure and brain activity by exploiting gravity and body dynamics. In this case, the body itself—through its materials, mass distribution, and morphology—performs much of the computation required for walking. Intelligence is not fully “in the brain”; it is distributed across body and physics.
Many humanoid approaches today attempt to centralize motor, cognitive, and interaction intelligence into a vision-language “brain.” That is not how humans work, and it is also why we believe the term Vision-Language-Action (VLA) is fundamentally misleading for humanoid robots. These systems receive far more than vision and language inputs: motor currents, joint torques, inertial data, tactile signals, temperatures, and internal states are all critical. And if nearby humans wear devices that report heart rate, biomechanical fatigue, or cognitive load, those signals become additional robot inputs that should directly shape behavior.
Reducing this complexity to “VLA” obscures what humanoid intelligence really requires.
At Generative Bionics, we are developing Generative Physical AI methods that explicitly link the materials and physical properties of the robot body—aluminum, composites, structural compliance—to the amount and type of computation required to perform a task, such as walking or manipulation. This enables us to co-design the robot body and the AI model simultaneously, rather than treating the body as a passive shell for a pre-trained brain. When conditioned by human biomechanics, this process naturally produces humanoid morphologies and control strategies reminiscent of human sensorimotor organization—hence the name Generative Bionics.
Our approach integrates motor, cognitive, and interaction intelligence by making the body itself an active computational resource. We combine material science, morphology, dynamics, and distributed sensing so that locomotion, cognition, and interaction emerge from body–environment physics, rather than from stacking more GPUs. Motor intelligence lives in fast sensorimotor loops at the limbs and spine; interaction intelligence—the hardest and least developed today—requires whole-body tactile and force sensing, which most competitors still lack.
Vision-language models absolutely have a role in high-level reasoning. But Physical AI grounds them in a body that learns physics-first: “fragile” means sensing breakage risk, “heavy” means feeling inertia. You cannot deploy top-down intelligence into a passive body and expect safe, adaptive behavior in unstructured, contact-rich environments. The body has to be intelligent from the start.
Many humanoid programs today follow a vision-first strategy. Your robots prioritize full-body tactile sensing. In non-structured industrial environments, what are the fundamental limits of vision-only systems that tactile-first architectures are better suited to overcome?
Vision-first architectures work well in structured settings, but most legacy industrial environments are the opposite of that: poor lighting, occlusions, dust, unexpected objects, humans walking in and out, and tools that are not neatly arranged for the robot. In these scenarios, a vision-only system is fragile; when the scene changes or the camera is blinded, your “intelligence” collapses at the very moment you need robustness.
A tactile-first architecture gives the robot another, often more reliable, channel of truth. When you have full-body skin and distributed force sensing, the robot can feel contact on its arms, torso, and feet, adjust posture, and keep working even if a camera is occluded or a part is slightly misaligned. It is closer to how humans work in tight, cluttered spaces: you do not only look at the world, you lean against it, you feel it pushing back, and you adapt minute by minute.
GEN.01 uses a distributed compute model, pushing processing closer to the limbs and skin. In practical terms, how does this change the latency and stability of perception-to-action loops when balance or contact forces change unexpectedly?
Pushing compute closer to the limbs and the skin is not about compensating for limitations in control algorithms — modern reinforcement learning policies can already stabilize balance and locomotion even with minimal sensing and centralized execution. The real advantage of distributed compute lies in how information is processed, filtered, and acted upon under real-world physical uncertainty.
In a humanoid operating in contact-rich environments, the challenge is not only reacting quickly, but managing extremely high-bandwidth, heterogeneous signals: forces, torques, micro-slips, pressure distributions, vibrations, and transient contacts that are impractical to stream raw to a central model at full fidelity. Near-sensor and limb-level compute allows these signals to be processed, abstracted, and stabilized locally, producing physically meaningful representations for higher-level reasoning.
This architecture improves stability not because the central policy is slow, but because the system remains observable and well-conditioned under unexpected contacts. Local compute can handle rapid, high-frequency physical interactions—such as impact absorption, contact compliance, or force redistribution—while the central model focuses on task-level intent, adaptation, and planning. This separation avoids saturating a single controller with incompatible timescales and reduces sensitivity to noise, delays, or partial failures.
In practice, this leads to perception-to-action loops that are more robust, more scalable, and less brittle, especially when interaction intelligence is required. Balance and locomotion can remain stable even under degraded sensing or compute conditions, while rich tactile and force information still informs higher-level behavior. Rather than a single monolithic brain, the robot behaves more like a biological system: intelligence distributed across body, sensors, and central models, each operating at the timescale where it is most effective.
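The two-timescale split described above can be sketched in a toy example. Everything here is an illustrative assumption, not the GEN.01 design: the loop rates, the smoothing filter, the contact threshold, and the "contact summary" format are invented for the sketch. The point is only that the limb-level node runs at sensor bandwidth and hands the central policy a small, abstracted representation instead of raw samples.

```python
import numpy as np

# Illustrative sketch (not the GEN.01 stack): a limb-level loop runs at
# high rate, filtering raw force samples and compressing them into a
# small "contact summary" that a slower central policy consumes.

LOCAL_HZ = 1000   # near-sensor loop rate (hypothetical)
CENTRAL_HZ = 100  # central policy rate (hypothetical)

class LimbNode:
    """Local compute near the sensor: filtering plus event abstraction."""
    def __init__(self, alpha=0.05, contact_threshold=5.0):
        self.alpha = alpha                  # low-pass smoothing factor
        self.threshold = contact_threshold  # newtons; assumed value
        self.filtered = 0.0

    def step(self, raw_force):
        # Exponential smoothing at full sensor bandwidth.
        self.filtered += self.alpha * (raw_force - self.filtered)
        return self.filtered

    def summary(self):
        # Abstracted, low-bandwidth representation for the central model.
        return {"force": self.filtered,
                "in_contact": self.filtered > self.threshold}

def central_policy(summaries):
    # Task-level decision made from abstracted signals, not raw samples.
    return "stiffen" if any(s["in_contact"] for s in summaries) else "comply"

# Simulate one central tick (10 local ticks): noisy sustained contact on
# one limb, free motion on the other.
rng = np.random.default_rng(0)
limbs = [LimbNode(), LimbNode()]
for _ in range(LOCAL_HZ // CENTRAL_HZ):
    limbs[0].step(20.0 + rng.normal(0, 2.0))  # sustained ~20 N contact
    limbs[1].step(rng.normal(0, 0.5))         # sensor noise, no contact
action = central_policy([limb.summary() for limb in limbs])
print(action)
```

The design choice mirrors the paragraph above: the central policy never saturates on high-frequency data, and a blinded or noisy channel degrades one summary rather than the whole loop.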
Before founding Generative Bionics, you led projects like iRonCub, the world’s first jet-powered humanoid. From a control-theory perspective, what lessons from stabilizing a humanoid in extreme, high-dynamic environments translated most directly into making a ground-based robot more robust?
iRonCub was an extreme stress test for humanoid control, but not only because of multibody dynamics. The real discontinuity came from having to control, in real time, phenomena governed by partial differential equations (PDEs): aerodynamics and thermodynamics, on top of classical rigid-body physics.
In standard humanoids, most uncertainties come from contacts and multibody dynamics, which are already highly nonlinear but still modeled by Ordinary Differential Equations (ODEs). With iRonCub, we had to reason about airflow, thrust–body interaction, turbulent effects, and heat propagation—phenomena described by PDEs that cannot be modeled or estimated online using classical control approaches.
This forced us to develop physics-informed AI models to infer aerodynamic forces and thermodynamic effects in real time, combining CFD, wind-tunnel experiments, and neural networks trained to respect physical constraints. These models, published in Nature Communications Engineering with Stanford and Politecnico di Milano last year, fundamentally changed how we think about perception and control: instead of assuming clean models, we explicitly embed learning and uncertainty into the control loop.
Another key lesson was learning how to design Model Predictive Control architectures that coordinate actuators with radically different bandwidths: fast electric motors and much slower jet turbines, each with delays, saturation, and different failure modes.
Solving this problem taught us how to build controllers that remain stable despite latency, mismatched dynamics, and incomplete models.
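The bandwidth-separation idea behind that lesson can be shown with a deliberately simple allocation rule (the rate and saturation limits below are invented for illustration and have nothing to do with iRonCub's actual turbine or motor parameters): a slow, rate-limited channel tracks the low-frequency part of a force command, while a fast, saturation-limited channel absorbs the transient and then hands authority back.

```python
import numpy as np

# Toy sketch of splitting one force command across two actuators with
# very different bandwidths. All limits are assumed, not identified.

def allocate(f_cmd, f_slow_prev, slow_rate=2.0, fast_limit=3.0, dt=0.01):
    """Split a total force command between a slow and a fast actuator."""
    # Slow actuator (turbine-like): move toward the command, but no
    # faster than slow_rate units per second.
    step = np.clip(f_cmd - f_slow_prev, -slow_rate * dt, slow_rate * dt)
    f_slow = f_slow_prev + step
    # Fast actuator (motor-like): take up the residual, within saturation.
    f_fast = np.clip(f_cmd - f_slow, -fast_limit, fast_limit)
    return f_slow, f_fast

# Track a step command: the slow channel ramps up while the fast channel
# covers the transient, then the fast channel's share decays to zero.
f_slow, history = 0.0, []
for k in range(300):
    f_cmd = 2.0 if k >= 50 else 0.0
    f_slow, f_fast = allocate(f_cmd, f_slow)
    history.append((f_slow, f_fast))

print(round(history[60][1], 2))   # fast channel carrying the transient
print(round(history[-1][1], 2))  # residual once the slow channel catches up
```

A real MPC formulation would optimize both channels jointly under delay and failure models, but even this crude split shows why mismatched timescales should not be forced through a single controller.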
In parallel, we also pushed heavily on reinforcement learning, using algorithms based on Adversarial Motion Priors (AMP) to control the flying humanoid. AMP allowed us to inject structured motion priors into the learning process, producing physically plausible and coordinated whole-body behaviors while still satisfying hard dynamic and control constraints. This was critical for managing transitions between different regimes—contact, near-contact, and flight—without relying on brittle, hand-tuned controllers.
All of this translates directly into more robust ground robots. In Generative Bionics, when a contact changes at the hand or foot, or when friction suddenly drops, we treat it as a global inference-and-control problem, not a local instability. The mindset developed with iRonCub—whole-body control under deep physical uncertainty, including PDE-driven phenomena—is now a core principle for making humanoids reliable in real factories and unstructured environments.
Industrial robotics has long struggled to balance rigid position control with safe, compliant force control. How does your control stack allow GEN.01 to remain precise enough for industrial tasks while still being physically safe for close human collaboration?
Industrial “precision vs. compliance” is only a real dilemma if you treat precision as a single, fixed requirement. In practice, precision depends on the use case: welding along a seam, inserting a connector, carrying a part, or working near humans all demand different accuracy, stiffness, and bandwidth profiles, and often involve only a sub-part of the robot.
A second key point is that what people call “industrial precision” is often achieved with high gearbox ratios (e.g., Harmonic Drives), because they suppress reflected load disturbances and improve positioning. The downside is well known: a high ratio typically hurts backdrivability and intrinsic compliance, and can amplify friction and hysteresis, so naïvely it looks incompatible with safe force control. We have worked extensively on making high-ratio actuation compatible with compliant control by combining physics-informed friction learning and sensorless torque estimation, enabling robust torque and force control without relying on joint torque sensors.
Importantly, compliant force and torque control on high-ratio gearboxes was demonstrated as early as 2016, showing that with the right estimation, identification, and control stack, accurate force tracking is achievable even on stiff legacy-style transmissions.
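The core of sensorless torque estimation can be sketched in a few lines. Every constant below, and the Coulomb-plus-viscous friction model, is a hypothetical stand-in for the physics-informed models identified from data; the sketch only shows the structure: joint torque inferred from motor current minus a friction model, with no dedicated torque sensor.

```python
import math

# Hedged sketch of sensorless joint-torque estimation on a high-ratio
# gearbox. Constants are illustrative, not identified values.

K_T = 0.11    # motor torque constant [Nm/A] (assumed)
RATIO = 100   # gearbox reduction ratio (assumed, harmonic-drive scale)

# Friction parameters, as if identified offline from data (hypothetical):
TAU_COULOMB = 4.0   # Coulomb friction at the joint [Nm]
B_VISCOUS = 0.8     # viscous coefficient [Nm*s/rad]

def friction_torque(joint_velocity):
    """Coulomb + viscous friction: the simplest identifiable model."""
    return (TAU_COULOMB * math.copysign(1.0, joint_velocity)
            + B_VISCOUS * joint_velocity)

def estimated_joint_torque(motor_current, joint_velocity):
    """tau_joint ~= N * k_t * i - tau_friction(q_dot)."""
    motor_torque_at_joint = RATIO * K_T * motor_current
    return motor_torque_at_joint - friction_torque(joint_velocity)

# Example: 1.2 A drawn while the joint moves at 0.5 rad/s.
tau = estimated_joint_torque(1.2, 0.5)
print(round(tau, 2))
```

In practice the friction model is far richer (temperature, hysteresis, load dependence), which is exactly where the physics-informed learning mentioned above comes in.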
For GEN.01, our approach is to avoid a single actuator philosophy everywhere. Depending on the task, and thanks to our Generative Physical AI, we investigate an optimal distribution of actuation technologies and gear ratios across the body. For example, we leverage emerging low- and medium-ratio solutions (e.g., planetary transmissions) where higher backdrivability matters and where RL-based policies benefit from bandwidth and transparency, while preserving precision and safety-critical compliance elsewhere (often the upper body and interaction interfaces) through the estimation, friction-compensation, and whole-body torque-control techniques we have developed over the last years.
So the short answer is: we don’t “pick” between precision and safety. We (1) define task-specific precision targets, (2) make even high-ratio joints force-controllable via physics-informed identification and torque estimation, and (3) co-design the actuation layout so that the robot is precise where it must be and compliant where it should be, enabling close human collaboration without giving up industrial performance.
Generative Bionics emerged from decades of research at the Italian Institute of Technology, including platforms like iCub. How important is that long-term academic continuity in avoiding black-box learning approaches that dominate much of today’s AI-driven robotics?
Generative Bionics is not a garage project; it is the continuation of more than 15 years of research at the Italian Institute of Technology on platforms such as iCub, ergoCub and iRonCub. That long-term continuity matters because it gives us a deep understanding of when learning is necessary—and when it is not.
We do not reject black-box, AI-driven robotics. On the contrary, we consider it one of the valid ways to endow robots with intelligence, when the use case justifies it. For certain tasks, simple and well-understood approaches—such as Model Predictive Control or classical whole-body control—are sufficient, safer, and far more efficient in terms of compute, power consumption, and system complexity.
Because we come from a strong academic background, we know both the theory and the practical limits of these methods. That allows us to deliberately calibrate the role of AI: sometimes learning-based policies are essential, sometimes they add unnecessary cost, opacity, and fragility. The key is matching the intelligence stack to the task and to the architectural price of the robot in terms of sensors, compute, energy, and certification burden.
Our heritage pushes us toward hybrid architectures, where learning is used where models break down or uncertainty dominates, while model-based control remains central for safety-critical and certifiable functions. This balance is particularly important in the European regulatory context, where interpretability, robustness, and compliance are not optional.
In short, long-term academic continuity gives us the freedom to choose—not ideology. We use black-box AI when it adds value, and we avoid it when simpler, more transparent solutions are better engineering.
Your early industrial partners operate in harsh, legacy environments rather than highly standardized factories. Why did you choose to start in these “brownfield” settings, and what do they reveal about where humanoid robots can create real value first?
We chose harsh, brownfield environments because that is where Europe’s real bottlenecks and safety problems are. Many European factories, shipyards, and steel plants are not designed as clean, fully automated labs; they are complex, messy, human-centric spaces with legacy equipment.
If humanoids only work in highly standardized greenfield plants, they are a nice demo but they do not solve the labour shortages and safety issues affecting millions of workers. By starting with specific industrial partners, we are forced to prove that Physical AI can adapt to welding in shipyards or steel plants, to heat, dust, and irregular layouts, not just to perfectly structured test cells. That is where humanoids can create real value first: doing jobs that are necessary, harmful, and hard to staff.
Compared with U.S. and Chinese competitors that can collect massive real-world datasets, European robotics companies often face data scarcity. How do physics-based models, simulation, and physical priors help you reduce dependence on large-scale data and close the sim-to-real gap?
Europe was the cradle of robotics and automation, and as a result it still hosts a dense and unique ecosystem of industrial value chains: from manufacturing and logistics to energy, shipbuilding, and process industries. If these industries are digitally instrumented, they can generate high-quality physical data, not web-scale data, but data that is deeply grounded in real machines, real contacts, and real physics. This gives Europe a very concrete way to play the robotics game.
Data scarcity is real in Europe, but in safety-critical robotics you will never have “internet-scale” data anyway, even in the U.S. or China. Our response is to lean heavily on physics-based models, simulation, and strong physical priors. We start from what we know—dynamics, contacts, thermodynamics, and human-like kinematics—and then we use learning to adapt and refine, rather than to rediscover physics from scratch.
This approach dramatically reduces the amount of real-world data required and makes sim-to-real transfer far more reliable. The robot does not need to learn that gravity exists or that friction matters; those constraints are already embedded in the model. Learning is then used where it is most effective: capturing unmodeled effects, tool-specific behaviors, and interaction patterns with humans and industrial processes.
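A minimal illustration of the "physics prior plus learned residual" idea, using a toy one-dimensional system rather than anything from the company's models: the prior encodes gravity exactly, so a handful of noisy samples suffices to fit only the unmodeled part (here, an invented drag term).

```python
import numpy as np

# Toy sketch: the known model captures gravity for free; learning is
# spent only on the residual the model cannot explain.

rng = np.random.default_rng(1)
g = 9.81

def physics_prior(v):
    # Known physics: gravitational acceleration, no data required.
    return -g * np.ones_like(v)

# "Real" system has an extra unmodeled drag term: a_true = -g - 0.3 * v.
v = rng.uniform(-2.0, 2.0, size=20)            # only 20 samples
a_measured = -g - 0.3 * v + rng.normal(0, 0.01, size=20)

# Fit only the residual between measurement and prior (least squares).
residual = a_measured - physics_prior(v)
coef = np.linalg.lstsq(v[:, None], residual, rcond=None)[0][0]
print(round(coef, 2))  # recovered drag coefficient, close to -0.3
```

A pure black-box learner would have to rediscover gravity from the same 20 samples; the prior turns the data requirement into a residual-sized problem, which is the essence of the sim-to-real argument above.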
In this sense, physics-informed simulation and priors are not a workaround for Europe’s data constraints. They are a strategic advantage. They allow us to turn Europe’s industrial heritage into a scalable source of meaningful data and to build robots that are robust, predictable, and deployable in real industrial environments, rather than optimized for benchmark datasets alone.
Your investor base includes sovereign capital, heavy industry, and semiconductor companies. Beyond funding, how do these partners influence your technology roadmap, particularly around compute architecture, safety, and industrial validation?
Our investors are not just capital; they are a cross-section of the ecosystem we want these robots to serve. CDP Venture Capital and sovereign actors push us to think in terms of European technological sovereignty and long-term industrial impact.
AMD is both an investor through AMD Ventures and our technology partner, providing a full-stack, end-to-end approach to compute architecture. This partnership turns the body into compute: low-latency FPGAs for real-time force and tactile fusion, and edge CPUs/GPUs for simulation and adaptation, all supported by open standards.
Eni Next and Duferco bring us directly into complex, heavy-industry environments — shipbuilding, steel, energy —where validation is unforgiving but incredibly valuable. Tether and QVAC, our fintech partner, signal the future of autonomous machine economics: tokenized services and blockchain coordination across industrial ecosystems. Together, they pull our roadmap toward real industrial validation, robust safety, and scalable compute, rather than pure lab performance.
Europe is introducing some of the world’s strictest AI and safety regulations. Do you see frameworks like the EU AI Act as a constraint, or as a strategic advantage for robotics companies that design safety and explainability directly into hardware and control systems?
I see the EU AI Act and related safety frameworks as a forcing function rather than a brake. If you want to deploy humanoids around people, you will need rigorous safety, transparency, and governance anyway; Europe is just making that explicit earlier.
For companies that try to retrofit safety and explainability after the fact, regulation will feel like a constraint. For us, designing human-first, intrinsically safe systems with clear oversight and documentation is part of the brand and of the engineering philosophy, and there is a profound alignment of values between Generative Bionics and the European Union’s legislative approach.
The EU AI Act’s principles — human oversight, risk-based regulation, transparency, and accountability for high-risk systems — are exactly what we’ve baked into our architecture from day one: interpretable controllers, physics-based safety bounds, tactile skin for continuous interaction monitoring, and human-in-the-loop learning. Our messaging explicitly positions Generative Bionics as “Europe’s answer to global humanoid competition,” aligned with EU goals of technological sovereignty, worker protection, and responsible innovation. We are not just compliant; we want to help shape the standards through policy engagement and technical contributions.
Long term, being “born compliant” with Europe’s strict standards will be a competitive advantage, not only in Europe but globally, in markets that prioritize trust, liability, and public acceptance over pure speed.
You often frame your mission as augmenting human capability rather than replacing workers. In real deployments, how does that philosophy shape which tasks you automate first and how you define success?
Our mission is to amplify human potential; that is our fundamental premise. We don’t believe humans will disappear, nor should they. We chose the verb “amplify” from a simple observation: any number multiplied by zero is still zero, so technology with no humans at the center solves nothing meaningful to us. We keep the human at the core of the system and amplify their potential through Physical AI.
When selecting tasks or environments, our mantra is simple: “avoid solving useless problems.” We ask our industrial partners to identify three specific tasks currently performed by either a human, a machine, or human-machine interaction. We then deliver a technological solution that addresses at least those three tasks, creating clear, measurable value added for their operations.
This approach ensures we’re not chasing lab demos or hypothetical use cases. We’re solving real problems in real factories where humanoids can make work safer, more sustainable, and more productive without removing the human from the equation. Success is measured by safety improvements, reduced injury risk, and the ability of companies to preserve critical expertise while tackling labour shortages.
Looking ahead five years, what would success for Generative Bionics actually look like? Is it scale and volume, or proving that Physical AI can operate safely and reliably in environments where humans should no longer have to work?
In five years, I would consider Generative Bionics successful if we have fleets of humanoids operating thousands of hours in real industrial environments, with a safety record that makes workers and regulators comfortable. Scale and volume matter, but only if they are built on reliability and trust.
I want Physical AI to be a proven concept: robots that have learned from human teachers, that adapt to harsh, unstructured European workplaces, and that demonstrably reduce injuries and labour shortages. If, by then, we are recognized as the European reference point for ethical, design-led humanoid robotics — a company that gave AI a body in a way society can actually live with — that will mean we did our job.
Editor’s Note
This interview highlights a growing split in humanoid robotics between centralized, model-heavy architectures and approaches that co-design intelligence with body, materials, and physics.