In this interview, Julius Kümmerle, CEO and Co-Founder of SafeAD, explains why autonomous driving cannot rely on end-to-end perception alone, and why architectural separation, deterministic safeguards, and regional validation remain essential for production-grade deployment.
You built your academic career on deterministic multi-sensor calibration. Why are you comfortable replacing physical certainty with probabilistic end-to-end perception, and where is the line you refuse to cross?
Deterministic methods work extremely well when a problem can be formulated precisely in mathematical terms. High-precision end-of-line calibration with defined targets in a controlled environment is a perfect example. You can model the geometry, control the conditions, and achieve outstanding precision.
But perception is fundamentally different. It is not just about detecting objects; it is about understanding a scene, interpreting relationships, predicting behavior, and fusing multiple sources of information under uncertainty. The real world is too complex to be fully described with closed-form mathematical formulations. That is where AI becomes essential.
For me, AI does not replace deterministic algorithms. It starts where simple formulations end. In an autonomy stack, both worlds can coexist. AI handles extremely complex, high-dimensional tasks. Deterministic components provide transparency, traceability, and explainability, which are critical for safety.
Many teams are collapsing perception and planning into pixels-to-control systems. Do you believe there is a structural boundary between perception and driving logic that should not be compressed into a single neural model?
There are strong arguments both for and against collapsing everything into a single neural model. It depends very much on the application and the maturity target.
Pixel-to-control systems are attractive because they allow you to build an impressive first demo quickly. You can collect data simply by recording a human driver’s controls together with camera input. The results can look very convincing early on.
The challenge comes later: proving reliability and safety. Without a structural separation, you need enormous amounts of representative driving data and you may have to repeat that effort after every significant model update. That becomes extremely expensive and difficult to scale.
Maintaining a structural boundary between perception and driving logic simplifies safety monitoring and certification significantly. It reduces the required test mileage and allows structured argumentation when updating individual AI models without restarting validation from scratch.
That said, this is not a binary choice. Smart systems can combine both principles. End-to-end approaches can improve driving performance, but architectural separation remains highly valuable for certifiability and controlled deployment. The combination is actually our target system architecture.
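The boundary Kümmerle describes can be sketched as a typed interface: perception emits an environment model, and planning consumes only that model, never raw pixels, so either module can be updated and re-validated on its own. The types and the toy planner below are purely illustrative assumptions, not SafeAD's actual stack:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical contract between perception and planning:
# planning sees only this typed model, never the sensor data.

@dataclass
class TrackedObject:
    object_class: str            # e.g. "car", "pedestrian"
    position: Tuple[float, float]  # (x, y) in ego frame, metres
    velocity: Tuple[float, float]  # (vx, vy) in m/s

@dataclass
class EnvironmentModel:
    timestamp_ns: int
    objects: List[TrackedObject] = field(default_factory=list)

def plan_trajectory(model: EnvironmentModel) -> List[Tuple[str, float]]:
    """Toy planner: brake if any object sits within 10 m directly ahead."""
    for obj in model.objects:
        if 0.0 < obj.position[0] < 10.0 and abs(obj.position[1]) < 2.0:
            return [("brake", 1.0)]
    return [("keep_lane", 0.0)]

model = EnvironmentModel(
    timestamp_ns=0,
    objects=[TrackedObject("car", (8.0, 0.5), (0.0, 0.0))],
)
print(plan_trajectory(model))
```

Because the interface, not the pixel stream, is the contract, a retrained perception model that still honors the `EnvironmentModel` type does not force the planner through a new validation cycle.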
At highway speeds, milliseconds matter. What latency envelope do you consider safe for production deployment, and what did you sacrifice architecturally to stay within it?
Latency is a critical topic. In our system, we generate a full environment model within ~30 milliseconds, and trajectory planning takes less than 5 milliseconds. That is well within the range considered safe for production deployment.
What you need to “sacrifice” architecturally depends heavily on the target compute platform. Achieving this performance on automotive-grade hardware used in driver assistance systems can be challenging. It often requires a dedicated deployment team focused solely on optimization.
The key is harmony between software and hardware architecture. You cannot design them independently. Efficient memory handling, parallelization, and careful model design are just as important as algorithmic performance. Real-world automotive deployment is not just about intelligence. It is about efficiency.
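A minimal sketch of how such a latency envelope might be monitored, treating the figures from the interview (roughly 30 ms for the environment model, under 5 ms for planning) as per-stage budgets. The stage function and timing scheme are illustrative assumptions, not SafeAD's deployment tooling:

```python
import time

# Per-stage deadline budgets, taken from the figures quoted above.
PERCEPTION_BUDGET_S = 0.030
PLANNING_BUDGET_S = 0.005

def run_stage(stage_fn, budget_s):
    """Run one pipeline stage; return its result, runtime, and deadline status."""
    start = time.perf_counter()
    result = stage_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s

def fake_perception():
    time.sleep(0.010)          # stand-in for real inference (~10 ms)
    return "environment_model"

model, elapsed, met_deadline = run_stage(fake_perception, PERCEPTION_BUDGET_S)
print(f"perception: {elapsed * 1000:.1f} ms, deadline met: {met_deadline}")
```

In a production system a missed deadline would feed a degradation strategy rather than a log line, but the budget-per-stage structure is the same.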
What major technical direction did SafeAD deliberately abandon because it could not scale to automotive-grade deployment?
Deployment constraints define not only technological boundaries but business boundaries as well.
As a startup without billion-euro funding, we cannot replace a traditional Tier 1 supplier that delivers complete turnkey systems across the entire stack. That is simply not realistic at our stage.
Instead, we made a deliberate decision to focus on being a high-impact co-development partner and enabler for series production projects. In fact, we are already part of such projects and generate tangible value because we can move much faster than large organizations.
Speed, flexibility, and deep technical focus are our strengths at this early stage. Rather than trying to replicate an entire automotive supply chain, we integrate strategically where we create the most leverage.
Mapless perception promises scalability, but road topology varies dramatically across regions. Have you observed geographic bias in your models, and how do you prevent it from becoming systemic?
Yes, geographic bias is real. You cannot expect a model to correctly recognize signs or markings it has never seen before.
This is not unique to mapless driving. It is a general limitation of current AI systems. Models interpolate very well within the boundaries of their training distribution. Extrapolating beyond those boundaries remains an unsolved challenge whether in autonomous driving, vision-language models, or large language models.
For real products, you must validate the system in the target region. That always requires collecting representative local data. No serious production deployment happens without it. Demos can often skip this step, but production systems cannot.
The key is disciplined regional validation and structured data acquisition before claiming operational readiness.
Transformer-based 3D perception is compute intensive. Is your training infrastructure a defensible strategic asset, or are you structurally dependent on external GPU supply?
We have invested heavily in our own GPU infrastructure. Our servers run training workloads 24/7. For very large production-scale training runs with customers, however, we often use additional infrastructure beyond our internal cluster. That flexibility allows us to scale when required, while maintaining strong internal capabilities for research and development at comparatively low cost.
Level 3 automation assumes human takeover within seconds. Does your system evaluate driver readiness in real time, and what happens if the human fails to respond?
Driver monitoring in Level 3 systems is typically implemented as a parallel function and is not our primary focus.
From a regulatory standpoint, if the human fails to respond to a takeover request, the system must execute a minimal risk maneuver, typically bringing the vehicle to a safe stop in a safe area. This behavior is legally required.
Any production-grade L3 system must guarantee this fallback capability. That is a fundamental safety principle.
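The fallback behavior described here can be sketched as a small state machine: a takeover request starts a clock, and if the driver does not respond within the grace period, the system escalates to a minimal risk maneuver. The grace period and state names below are illustrative assumptions, not regulatory values:

```python
# Hypothetical L3 fallback logic. The 10 s grace period is an
# illustrative assumption, not a value from any regulation.
TAKEOVER_GRACE_S = 10.0

def fallback_state(takeover_requested_at: float, now: float,
                   driver_responded: bool) -> str:
    """Decide the system state after a takeover request was issued."""
    if driver_responded:
        return "MANUAL_CONTROL"
    if now - takeover_requested_at >= TAKEOVER_GRACE_S:
        # Legally required fallback: safe stop in a safe area.
        return "MINIMAL_RISK_MANEUVER"
    return "AWAITING_TAKEOVER"

print(fallback_state(0.0, 5.0, False))   # AWAITING_TAKEOVER
print(fallback_state(0.0, 12.0, False))  # MINIMAL_RISK_MANEUVER
```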
Synthetic data expands edge-case coverage. How do you prevent it from creating overconfidence in scenarios that remain statistically rare in the real world?
It is important to mix edge cases into the development and validation process and to evaluate them strictly against the given requirements. If an edge case is part of the specification, then the system must cope with it according to that requirement, no discussion.
At the same time, you also have very clear requirements for normal, everyday driving scenarios, and no compromise is accepted there. Performance in common situations must remain stable and robust.
So it becomes a matter of weighing and architectural design choices in combination with structured evaluation. The evaluation framework provides the feedback loop for the system design.
Vision-first systems are vulnerable to adversarial perturbations. Do you maintain independent physical-rule guardrails outside the neural network?
Having a few physical-rule checks, at least for monitoring, can make sense. In general, redundancy to your main AI performance path can improve robustness and simplify safety argumentation, which makes the overall system easier to certify.
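One hedged sketch of such a guardrail, using simple kinematic limits as the physical rules. The thresholds and function are illustrative assumptions, not production values:

```python
# Deterministic plausibility monitor sitting outside the neural path:
# a detection whose implied dynamics violate basic physical limits is
# flagged for the safety layer instead of being trusted blindly.
# Both limits below are illustrative assumptions.
MAX_SPEED_MPS = 70.0    # ~250 km/h, generous upper bound for road traffic
MAX_ACCEL_MPS2 = 12.0   # beyond typical road-vehicle capability

def is_plausible(speed_mps: float, accel_mps2: float) -> bool:
    """Flag detections that no physically real road user could produce."""
    return 0.0 <= speed_mps <= MAX_SPEED_MPS and abs(accel_mps2) <= MAX_ACCEL_MPS2

print(is_plausible(30.0, 2.0))   # True
print(is_plausible(120.0, 2.0))  # False
```

Such a check cannot replace the neural perception path, but an adversarial perturbation that makes a tracked object "teleport" will trip a rule like this even when the network is fooled.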
German OEM development cycles move slowly while AI systems evolve quickly. What has been the hardest integration friction between SafeAD’s perception stack and traditional validation pipelines?
The biggest friction is the fundamental difference in development philosophy.
AI development is inherently iterative. You continuously improve models with new data, architectural adjustments, and optimizations. It is a circular process: you train, evaluate, learn, refine, and repeat.
Traditional automotive development, however, is still largely structured around the classical “V-model.” It is built on long, sequential development phases with clearly frozen milestones and tightly controlled change processes. Once a stage is completed, going back is costly and highly regulated.
So the tension is obvious: modern AI development is a loop, while the traditional automotive industry still operates in a very linear mindset.
Every meaningful AI update can potentially trigger re-validation efforts. That creates friction between the need for rapid iteration and the demand for stability, traceability, and formal approval processes.
Perception systems continuously track dynamic actors. How do you preserve behavioral prediction without retaining identity-level information under European privacy constraints?
For driving automation, we are not interested in personal identity. We are interested in behavior.
Behavioral prediction relies on dynamic state estimation (position, velocity, acceleration, heading, interaction context) and the class of the object.
In our systems, actors are represented as abstract dynamic objects. They have state vectors and uncertainty estimates, not personal identities. There is no need to know who a person is, only how an object is moving and how it might interact with the ego vehicle.
If L3 deployment scales, human perception becomes supervisory rather than primary. In that world, what final authority should remain with the human inside the vehicle?
If L3 deployment scales, the human role clearly shifts from primary driver to supervisory backup when the system is active.
However, as long as there is a steering wheel in the car and a trained driver sitting behind it, my view is that the human will always want to retain the authority to intervene, even when the system is operating in L3 mode and even if the L3 system is proven safer than the average human driver.
Editor’s Note
This interview examines the tension between probabilistic AI perception and the structured safety requirements of automotive deployment, where speed of iteration must still coexist with certifiability and control.
