Robotics has seen rapid advances in learning algorithms, yet real-world deployment continues to stall. In this interview, Neuracore founder Stephen James explains why the true bottleneck is no longer models or control, but the absence of scalable, cloud-native data infrastructure capable of capturing, preserving, and learning from physical robot data.
1. Your career spans Imperial College, UC Berkeley, and leading applied research at Dyson. Was there a specific moment in industry when you realized robotics was not blocked by algorithms, but by data infrastructure itself, and that this was no longer fixable with incremental tooling?
Stephen James: It wasn’t a single moment; it was a pattern I kept seeing across every environment. During my PhD at Imperial, at Berkeley with Pieter Abbeel, and especially leading the Dyson Robot Learning Lab, I watched talented teams spend 6+ months rebuilding the same data collection, logging, and training pipelines from scratch. The frustrating part was that the infrastructure they were building was nearly identical: it’s the tasks that differ, not the underlying data architecture.
At Dyson, this became acute. We had world-class talent and significant resources, yet we were still hitting the same bottlenecks as early-stage startups. The issue wasn’t our algorithms; sample-efficient RL and imitation learning methods were advancing rapidly. The blocker was that 80% of engineering time was consumed by infrastructure: getting sensor streams synchronized, visualizing what the robot was actually seeing, managing training runs, deploying models to hardware.
The realization was that incremental improvements – better ROS wrappers, slightly nicer visualization tools – wouldn’t solve this. We needed a fundamental rethinking of how robotics data is captured, stored, and used. That’s when I knew the idea that had been germinating for a decade was finally ready: the technology had matured, the market understood the problem, and cloud-native infrastructure could finally deliver what local solutions never could.
2. You’ve described Neuracore as infrastructure that prevents teams from rebuilding the same pipelines repeatedly. Looking back at your work on RLBench and PyRep, do you see Neuracore as the commercial continuation of questions those open-source projects could not fully solve in the physical world?
Stephen James: Absolutely. RLBench and PyRep were about democratizing benchmarking and simulation – giving researchers standardized environments to develop and evaluate algorithms. They solved crucial problems around reproducibility and scalability in simulation. But they also revealed deeper questions we couldn’t answer in that context.
RLBench made it trivial to generate infinite demonstrations through motion planning in simulation. But in the real world? Teams were drowning in data management complexity: how do you log asynchronous sensor streams at native rates, visualize multi-modal data in real time, version your datasets, train across GPUs, and deploy to physical hardware without losing your mind?
Neuracore is the answer to what happens when you take those lessons about scalability, reproducibility, and standardization and apply them to the entire robot learning stack in the physical world. It’s the infrastructure layer that RLBench and PyRep always implicitly assumed would exist but didn’t. Where those projects asked “how do we benchmark algorithms?”, Neuracore asks “how do we make those algorithms deployable on real robots at scale?”
3. Much of modern robotics treats synchronization as a prerequisite for clean data. Neuracore is asynchronous by design. Why is forced synchronization not just inefficient, but fundamentally the wrong way to model how robots perceive and interact with the physical world?
Stephen James: The real problem is that most teams don’t even realize they’re destroying their data. They synchronize at collection time, and by the time they realize what they’ve lost, it’s gone forever.
Here’s what happens: a team sets up ROS, and because the default is to sync everything to a common timestamp, they do exactly that. Camera at 30Hz, joint encoders at 1kHz, force-torque at 2kHz—all downsampled and aligned to 30Hz. They collect weeks of data. Six months later, when their policy struggles with contact-rich manipulation, they realize they need that high-frequency force data. But it’s gone. They threw it away before they knew they needed it.
This is the silent killer. Teams make irreversible decisions about information loss at the beginning of their pipeline, when they have the least context about what will matter for learning.
Neuracore’s philosophy: capture everything at native rates, preserve the raw temporal structure, and only synchronize later when you actually need to. Log asynchronously, synchronize optionally. This prevents teams from accidentally crippling their datasets before they even understand what they need.
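The “log asynchronously, synchronize optionally” idea can be sketched in a few lines of plain Python. Everything here – stream names, rates, and the `AsyncLog` class – is a hypothetical illustration, not a Neuracore API: each stream is appended at its native rate with its own timestamps, and an aligned view is derived on demand without modifying the raw data.

```python
from bisect import bisect_right

class AsyncLog:
    """Append-only per-stream log; no cross-stream synchronization at capture."""
    def __init__(self):
        self.streams = {}  # stream name -> list of (timestamp, value)

    def append(self, stream, t, value):
        self.streams.setdefault(stream, []).append((t, value))

    def latest_before(self, stream, t):
        """Most recent sample at or before time t (zero-order hold).
        Rebuilds the key list per call; kept simple for the sketch."""
        samples = self.streams[stream]
        i = bisect_right([s[0] for s in samples], t)
        return samples[i - 1][1] if i else None

    def aligned_view(self, streams, times):
        """Derive a synchronized view on demand; raw streams stay untouched."""
        return [{s: self.latest_before(s, t) for s in streams} for t in times]

log = AsyncLog()
for k in range(2000):                      # 2kHz force-torque stream
    log.append("force", k / 2000.0, 0.0)
for k in range(30):                        # 30Hz camera stream
    log.append("camera", k / 30.0, f"frame_{k}")

# Synchronize only when needed, e.g. at the camera timestamps:
view = log.aligned_view(["camera", "force"], [k / 30.0 for k in range(30)])
```

The raw 2kHz force stream is never downsampled or discarded; the aligned view is just one of many possible derived products.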
4. Your background combines SLAM, control, and robot learning. How did that shape Neuracore’s decision to preserve high-frequency, heterogeneous sensor streams rather than collapsing them early, and what kinds of learning failures does early synchronization silently introduce?
Stephen James: The most common failure mode I see is that teams synchronize during collection, then wonder why their learned policies don’t work. Here’s a concrete example: a robot learning insertion tasks. The critical moment, when the peg makes contact with the hole, creates a force spike at 2kHz. But if you’ve synchronized everything to your 30Hz camera, you’ve averaged that spike into mush. The policy never learns the sharp, precise force feedback that signals successful alignment. Instead, it learns a vague, smeared version of contact dynamics and develops wobbly, imprecise behaviors.
Or consider dynamic manipulation – catching, throwing, fast pick-and-place. These tasks have critical events that happen at millisecond timescales. Downsample your proprioception from 1kHz to 30Hz and you’ve lost the fine-grained dynamics entirely. The learned policy can’t reproduce fast, precise movements because it never saw them in the training data.
The insidious part is that these failures are silent. Your training loss looks fine. The policy works okay in simple cases. But it hits a ceiling on task complexity, and you don’t know why, because you threw away the data that would have explained it months ago.
Neuracore’s decision to preserve high-frequency, heterogeneous streams comes from watching this pattern repeat across every robotics team I’ve worked with. Different modalities operate at different timescales because they’re measuring different aspects of physics. Collapsing them early doesn’t “clean” your data; it destroys the multi-scale temporal structure that learning-based control actually needs.
Keep the raw streams. Synchronize only when necessary, and only for specific purposes. Let the model discover the temporal relationships from high-fidelity data, rather than imposing your assumptions through premature downsampling.
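The contact-spike example above can be made concrete with synthetic numbers (an illustrative sketch, not real robot data): a millisecond-scale force spike sampled at 2kHz survives in the raw stream, but block-averaging down to a 30Hz camera rate collapses it.

```python
import math

rate_hi = 2000                       # force-torque native rate (Hz)
dt = 1.0 / rate_hi
signal = []
for k in range(rate_hi):             # one second of force readings
    t = k * dt
    # 1 N baseline force plus a sharp ~1 ms contact spike centred at t = 0.5 s
    spike = 40.0 * math.exp(-((t - 0.5) / 0.001) ** 2)
    signal.append(1.0 + spike)

peak_raw = max(signal)               # the spike is clearly visible at 2 kHz

# Naive synchronization: average each 30 Hz camera interval (block averaging)
block = rate_hi // 30                # ~66 high-rate samples per camera frame
downsampled = [sum(signal[i:i + block]) / block
               for i in range(0, rate_hi - block + 1, block)]
peak_down = max(downsampled)         # the spike is smeared almost flat
```

With these illustrative numbers the raw peak is about 41 N, while the block-averaged peak drops to roughly 3 N – the "mush" a policy trained on synchronized data would see.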
5. The industry is moving toward vision-action-language and foundation models for robotics. From an infrastructure perspective, what new requirements do these models impose on data storage, indexing, and retrieval that most existing robotics stacks were never designed to handle?
Stephen James: Foundation models for robotics completely explode the traditional data assumptions.
First, multi-modal alignment: You’re now correlating language instructions, RGB-D streams, proprioception, sometimes audio, each at different rates and scales. You need indexing that can efficiently query “show me all trajectories where the robot grasped a red mug after being told to clean the table.”
Second, scale: Foundation models are data-hungry in ways classical robotics never was. You’re not training on hundreds of demos anymore; you need millions of interactions. Storage and retrieval need to be cloud-native from day one.
Third, semantic search and retrieval: Language grounding means you need to be able to retrieve demonstrations not just by robot state, but by semantic content. “Find examples where the robot recovered from a grasp failure” requires understanding the meaning of trajectories, not just pattern matching.
Fourth, heterogeneous action spaces: Different robots, different embodiments, different control frequencies. Foundation models promise generalization across morphologies, but that means your data infrastructure can’t assume a fixed action dimensionality or control scheme.
Traditional robotics stacks were built for homogeneous, single-task scenarios. They simply cannot scale to what foundation models demand. Neuracore was designed from the ground up for this new paradigm: cloud-native storage, semantic indexing, multi-modal temporal alignment, and the ability to handle terabytes of heterogeneous data from diverse robot platforms.
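As a toy illustration of the semantic-retrieval point – with a hypothetical schema, and simple keyword matching standing in for the learned embeddings a production system would use – episodes can carry a language instruction plus structured event metadata, and queries combine both:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    episode_id: str
    instruction: str                              # language command given to the robot
    events: list = field(default_factory=list)    # e.g. ["grasp", "grasp_failure"]
    embodiment: str = "arm"

class EpisodeIndex:
    def __init__(self):
        self.episodes = []

    def add(self, ep):
        self.episodes.append(ep)

    def query(self, text=None, require_events=None, embodiment=None):
        """Filter episodes by instruction keywords and structured metadata."""
        hits = []
        for ep in self.episodes:
            if text and text.lower() not in ep.instruction.lower():
                continue
            if require_events and not all(e in ep.events for e in require_events):
                continue
            if embodiment and ep.embodiment != embodiment:
                continue
            hits.append(ep.episode_id)
        return hits

index = EpisodeIndex()
index.add(Episode("ep1", "clean the table", ["grasp", "place"]))
index.add(Episode("ep2", "grasp the red mug", ["grasp", "grasp_failure", "regrasp"]))

# "Find examples where the robot recovered from a grasp failure":
recovered = index.query(require_events=["grasp_failure", "regrasp"])
```

The structural point is that retrieval spans both free-form language and event-level metadata; swapping the keyword match for vector similarity over instruction embeddings is the natural production upgrade.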
6. Simulation has been central to your academic work, yet Neuracore places strong emphasis on real-world data. How do you think about the long-term relationship between sim-to-real and real-to-sim, and can high-quality physical data eventually redefine how simulation itself is built and calibrated?
Stephen James: Simulation and real-world data aren’t competitors; they’re complementary. But the relationship is shifting fundamentally.
Historically, sim-to-real was about forward transfer: train in simulation where data is cheap, then pray your domain randomization and dynamics modeling were good enough to transfer to reality. But this is brittle. The reality gap remains the fundamental problem, and no amount of randomization fully captures real-world complexity.
The future is real-to-sim: high-quality physical data becomes the ground truth for building better simulators. Instead of hand-tuning physics parameters and hoping, you use real interaction data to calibrate dynamics models, identify sim-to-real gaps, and continuously improve your simulation fidelity. Think of it as system identification at scale, powered by learning.
Neuracore’s infrastructure enables this bidirectional flow. By capturing high-fidelity, asynchronous physical data, we can create a virtuous cycle: better real-world data infrastructure → more accurate sim calibration → more effective sim-to-real transfer → better physical policies that generate more useful data.
The endgame isn’t choosing between sim and real; it’s using physical data to make simulation actually useful, and using simulation to augment and contextualize physical data.
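The “system identification at scale” idea can be shown in miniature. This sketch uses synthetic data and an assumed linear dynamics model F = m·a + b·v, recovering the simulator parameters m (mass) and b (viscous friction) from logged interaction triples by least squares:

```python
import random

random.seed(0)
true_m, true_b = 2.0, 0.5   # ground-truth physics the simulator should match

# "Real" interaction log: measured velocity, acceleration, and applied force
logs = []
for _ in range(200):
    v = random.uniform(-1.0, 1.0)
    a = random.uniform(-2.0, 2.0)
    f = true_m * a + true_b * v + random.gauss(0.0, 0.01)   # sensor noise
    logs.append((v, a, f))

# Normal equations for the two-parameter least-squares fit of f = m*a + b*v
Saa = sum(a * a for v, a, f in logs)
Svv = sum(v * v for v, a, f in logs)
Sav = sum(a * v for v, a, f in logs)
Saf = sum(a * f for v, a, f in logs)
Svf = sum(v * f for v, a, f in logs)

det = Saa * Svv - Sav * Sav
m_hat = (Saf * Svv - Svf * Sav) / det
b_hat = (Svf * Saa - Saf * Sav) / det
# m_hat and b_hat now calibrate the simulator's dynamics against real data
```

Real calibration works over far richer, nonlinear models, but the flow is the same: logged physical interactions in, updated simulator parameters out.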
7. Many teams rely on fragmented best-of-breed tools for logging, visualization, training, and deployment. Why do you believe this approach fundamentally breaks down in robotics, and what only becomes possible once these stages are treated as a single, continuous system?
Stephen James: The “Frankenstein stack” – ROS bags, standalone visualization tools, custom training scripts, and ad-hoc deployment stitched together – creates artificial boundaries that destroy iteration velocity.
What becomes possible with a unified system:
Continuous learning cycles: Deploy a policy, it encounters a novel situation, that data automatically flows back into your training pipeline with proper versioning and indexing. No manual wrangling.
Temporal debugging: A robot fails in production? You can instantly pull up the exact sensor streams it saw, compare to similar training scenarios, and identify the distribution gap – all in one system.
Progressive deployment: Train a new policy, shadow-deploy it alongside the current one, compare performance in real-time, gradually roll out with confidence. This is trivial in a unified stack, nearly impossible with fragmented tools.
Fleet learning: Multiple robots in production generate data that automatically improves models for the entire fleet. This requires seamless integration from edge deployment back to centralized training infrastructure.
Neuracore isn’t just about convenience – it’s about enabling workflows that are architecturally impossible with fragmented tooling.
8. Neuracore positions itself as infrastructure rather than an AI brain, unlike companies that sell end-to-end robotic intelligence. Why is it strategically important for you not to compete with your customers’ models, and how does that choice shape trust and long-term adoption?
Stephen James: This is existential to our strategy and deeply informed by my time in academia and industry.
Every robotics company believes their secret sauce is their models – their learned policies, their perception algorithms, their task-specific intelligence. They’re right to think this. It’s where their differentiation lives. If we positioned Neuracore as “we’ll build your brain for you,” we’d immediately become a competitor or vendor lock-in risk.
By being pure infrastructure – “we don’t care what models you run, just that you can iterate fast” – we become platform-neutral. A warehouse automation company and a surgical robotics startup can both use Neuracore without worrying we’ll build a competing product or steal their IP. We’re AWS for robots, not a robot company.
This shapes trust in several ways:
Openness: Our training algorithms are open-source. You can inspect, modify, and own your training code. We charge for compute, storage, and deployment infrastructure, not for proprietary model IP. This transparency is critical in an industry rightly cautious about vendor lock-in.
Complementarity: We actively partner with companies building robot-specific intelligence. We’re not trying to own the entire stack, we’re enabling others to build on top of us.
Data ownership: Your data stays yours. We’re infrastructure, not a data aggregator building foundation models on your proprietary datasets. This matters enormously for commercial adoption.
Long-term, this creates network effects without competition: as more teams build on Neuracore, they contribute algorithms, tools, and community knowledge, but they never worry we’ll use that to compete with them. We become the Switzerland of robot learning infrastructure.
9. You offer usage-based pricing across storage, training, and deployment. In a domain where experimentation is expensive and physical iteration is slow, how do you design incentives so teams can iterate aggressively without losing cost predictability or cutting corners on data quality?
Stephen James: We’re actually moving to tier-based pricing, rather than usage-based, precisely to solve this tension. Robotics teams need to iterate aggressively, but they also need predictable budgets. Usage-based pricing created anxiety: teams would hesitate before running another training experiment or logging at higher frequencies because they couldn’t predict the bill.
With tiers, the incentives align correctly:
Iterate without fear: Your tier includes generous storage and training compute. Run as many experiments as you need within your allocation. No mental math about whether the next training run is “worth it.”
Log everything: Storage limits are high enough that teams shouldn’t downsample or prune data to save costs. We want you capturing high-fidelity, asynchronous streams—not making data quality trade-offs for budgetary reasons.
Scale predictably: As you grow from 5 robots to 50, you move up tiers in a predictable way. You can forecast infrastructure costs as part of your scaling roadmap.
Robotics iteration is already expensive because of hardware and physical setup time. Infrastructure costs should be predictable overhead, not a variable that makes teams second-guess technical decisions. When teams know exactly what they’re paying, they experiment more freely, which is exactly what accelerates learning.
10. You’ve launched a free academic program giving universities full access to the platform. Beyond adoption, what role do you want the academic community to play in shaping datasets, benchmarks, and norms for the next generation of robot learning?
Stephen James: Academia is where the future of robotics is being defined right now. If Neuracore becomes the backbone for academic research, we have an opportunity, and responsibility, to shape the field in fundamentally positive ways.
Reproducibility and standards: We want Neuracore to become the default infrastructure for publishing robot learning research. When a paper says “trained on Neuracore,” readers know exactly what the data pipeline, training setup, and evaluation protocol looked like. This addresses the reproducibility crisis in robotics, where “we open-sourced our code” still leaves a dozen ambiguous preprocessing steps.
Shared datasets and benchmarks: RLBench provided 100 sim tasks. Imagine an academic community collectively building the physical equivalent: standardized tasks, shared datasets, leaderboards, all on common infrastructure. Researchers could compare algorithms apples-to-apples without rebuilding pipelines.
Norms around data quality and ethics: By giving academics tools that emphasize high-fidelity logging, we’re subtly shaping norms: don’t downsample prematurely, keep metadata, version your datasets. Similarly, as robotics enters high-stakes domains (medical, eldercare), academics using Neuracore can help establish best practices.
Algorithm innovation: We want Neuracore to be the natural home for releasing new algorithms. Instead of standalone GitHub repos with hard-to-reproduce setups, researchers publish directly into the Neuracore ecosystem. The community benefits, and bleeding-edge methods become immediately accessible to industry.
The academic program isn’t charity; it’s an investment in the intellectual and social infrastructure that will define robotics for the next decade.
11. Physical AI introduces real safety risks once models enter control loops. At the infrastructure layer, what safeguards are essential to ensure deployment velocity does not come at the cost of reliability or physical safety?
Stephen James: This is non-negotiable. My lab at Imperial is specifically called the Safe Whole-body Intelligent Robotics Lab (SWIRL) because I’ve spent a lot of time thinking about this.
At Imperial, our research focuses on the fundamental safety challenges of learned control. We work on safety filters that can wrap around learned policies—mathematical guarantees that constrain what actions a model can take regardless of what it outputs. We’re building benchmarks specifically designed to stress-test safety, not just task completion. And we’re exploring how to integrate more explainable, model-based methods as guardrails around the increasingly opaque end-to-end vision-language-action models.
The reality is that as VLAs become more powerful and more inscrutable, we need complementary safety mechanisms. A transformer policy might be excellent at manipulation, but it’s a black box. We need ways to verify its outputs, constrain its actions to safe manifolds, and have fallback controllers when it ventures into uncertain territory.
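A safety filter of the kind described above can be sketched minimally (illustrative limits and a hypothetical state format, not SWIRL’s actual methods): the learned policy proposes an action, and a wrapper projects it onto a constraint set before it reaches the robot, falling back to a conservative controller outside the verified region.

```python
JOINT_VEL_LIMIT = 1.0         # rad/s, assumed per-joint bound
SAFE_WORKSPACE = (-0.5, 0.5)  # metres, assumed safe end-effector x range

def learned_policy(state):
    """Stand-in for an opaque VLA/transformer policy output."""
    return [2.5, -0.3, 0.8]   # proposed joint velocities (possibly unsafe)

def fallback_controller(state):
    """Conservative baseline: stop."""
    return [0.0, 0.0, 0.0]

def safety_filter(state, action):
    # 1. If the state is outside the region we can reason about, fall back.
    lo, hi = SAFE_WORKSPACE
    if not (lo <= state["ee_x"] <= hi):
        return fallback_controller(state)
    # 2. Otherwise project the action onto the constraint set (clip velocities).
    return [max(-JOINT_VEL_LIMIT, min(JOINT_VEL_LIMIT, a)) for a in action]

state = {"ee_x": 0.2}
safe_action = safety_filter(state, learned_policy(state))
```

Production-grade safety filters replace the clipping step with formal machinery (control barrier functions, reachability analysis), but the architecture is the same: the learned policy never commands the hardware directly.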
The infrastructure layer will play a critical role. The challenge for the field is that we can’t let safety become an excuse for slow iteration. Physical AI needs to move fast, but it needs to move fast safely. That’s the balance my research group is working toward, and it’s the balance Neuracore will need to enable as the platform matures.
12. Many people talk about a future GPT moment for robotics. Do you believe such a moment is possible without a Neuracore-like data layer already in place, or is infrastructure the true gating factor that determines whether general robotic intelligence can scale at all?
Stephen James: I’ll be direct: data infrastructure is the gating factor.
Look at what enabled the GPT moment in language. It wasn’t just transformer architectures or scaling laws; it was the fact that text data was already digitized, indexed, and accessible at massive scale. The internet had been creating a de facto data infrastructure for decades. OpenAI didn’t need to solve “how do we store and version terabytes of text”; that problem was already solved.
Robotics has no such luxury. Physical data is:
● Heterogeneous: Vision, proprioception, force, multiple modalities at different frequencies
● Embodiment-specific: A Franka arm’s data looks nothing like a quadruped’s
● High-dimensional and noisy: Way more complex than tokenized text
● Expensive to collect: You need actual hardware, environments, interactions
The “GPT moment” in robotics requires aggregating and learning from millions of robot hours across diverse tasks and embodiments. You can’t do that with each team logging data to local disks in incompatible formats.
Transformer-based policies work. Diffusion policies work. VLAs show promise. The bottleneck isn’t algorithms; it’s that no one can train these models at the scale required because the data infrastructure simply doesn’t exist.
A Neuracore-like layer – cloud-native, multi-modal, standardized but flexible, enabling massive-scale aggregation – isn’t optional. It’s the prerequisite. Without it, we’ll get incremental improvements on narrow tasks, but not generalist robotic intelligence.
The companies that win robotics won’t be the ones with the best model architecture. They’ll be the ones that solved data infrastructure first. That’s why we built Neuracore, not because it’s interesting infrastructure (though it is), but because it’s the foundational infrastructure upon which the GPT moment for robotics can actually happen.
Robotics’ scaling laws exist. We just haven’t been able to test them yet because we don’t have the data infrastructure to scale. That’s what we’re changing.
Editor’s Note
This interview identifies data infrastructure as the limiting factor in scaling robotic learning.

