AI-generated voice is moving into real-time agents and creator workflows, where low latency and persistent identity are no longer optional. In this interview, MorVoice founder Mor Monshizadeh explains why speed is a product requirement, how persistent voice identity changes ownership and trust, and why efficiency matters more than model scale in the next phase of voice AI.
1. Your early career was shaped by industrial inventions focused on converting waste into usable energy. That work required extreme efficiency under physical constraints. In today’s AI landscape, where inference cost and energy waste are becoming systemic problems, how directly did that industrial efficiency mindset influence MorVoice’s model architecture and its decision to prioritize speed and low latency over sheer parameter scale?
Mor Monshizadeh: My early career was shaped by industrial invention work where constraints were real and unforgiving—systems focused on converting waste into usable energy and automating waste handling. That kind of work forces a very simple discipline: if you’re not efficient, you don’t scale, and the solution fails in the real world.
That work was also recognized internationally — I was awarded “World’s Best Inventor” in 2013 — and it reinforced a principle I still follow today: under real constraints, efficiency is not optional. It’s the product.
I brought this mindset directly into MorVoice. In the AI voice space, inference cost and wasted compute are becoming systemic issues, so rather than chasing “the biggest model possible,” MorVoice prioritizes efficiency in the full product loop: low-latency generation, fast iteration, and a workflow that lets creators get usable results quickly and repeatedly. Speed is a product requirement—not just a performance metric—because creators work in tight feedback loops (generate → listen → tweak → regenerate).
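To make the feedback-loop point concrete, here is a minimal sketch of that generate → listen → tweak → regenerate cycle from a creator tool's perspective. The endpoint URL, request fields, and latency logging are illustrative assumptions, not MorVoice's actual API.

```typescript
// Sketch of a creator iteration loop against a hypothetical TTS endpoint.
// The URL, request shape, and loop below are assumptions for illustration only.

interface SynthesisRequest {
  text: string;
  voiceId: string;                          // persistent voice identity to render with
  style?: { pace?: number; emotion?: string };
}

async function synthesize(req: SynthesisRequest): Promise<ArrayBuffer> {
  const started = performance.now();
  const res = await fetch("https://api.example-voice.dev/v1/synthesize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Synthesis failed: ${res.status}`);
  const audio = await res.arrayBuffer();
  // Seconds-level latency is the product requirement: surface it so regressions are visible.
  console.log(`Generated in ${(performance.now() - started).toFixed(0)} ms`);
  return audio;
}

// Tight loop: generate, listen, tweak one parameter, regenerate.
async function iterate(text: string, voiceId: string): Promise<ArrayBuffer> {
  let style = { pace: 1.0, emotion: "neutral" };
  let latest: ArrayBuffer = new ArrayBuffer(0);
  for (let take = 1; take <= 3; take++) {
    latest = await synthesize({ text, voiceId, style });
    // In a real tool the creator listens here and decides what to tweak;
    // the small nudge below just shows how cheap regeneration should feel.
    style = { ...style, pace: style.pace + 0.05 };
  }
  return latest;
}
```

The point of the sketch is that every extra second inside `synthesize` multiplies across takes, which is why latency reads as creative friction rather than a background metric.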
2. Mondial AI processes hundreds of thousands of real enterprise phone calls each month in regulated sectors like healthcare and finance. To what extent does MorVoice’s robustness and emotional stability come from this real-world conversational data rather than from curated or studio-grade speech datasets typically used in TTS training?
Mor Monshizadeh: During my time at Mondial AI, I worked in environments where reliability and stability matter—especially when AI is used in real business contexts, including regulated settings. That experience shaped how I evaluate voice systems: they must behave consistently outside a controlled studio demo.
MorVoice’s approach to robustness follows that principle. We aim for outputs that stay emotionally stable and consistent across many iterations and use cases, so creators and developers aren’t fighting random shifts in tone or delivery. I won’t go into proprietary training details, but the guiding standard is simple: voice quality is measured by behavior in real usage, not just curated benchmarks.
3. Unlike most voice AI founders who come from NLP backgrounds, your experience includes C++, game engines, and 3D rendering pipelines. Has that spatial computing background shaped how you think about voice not as a waveform, but as an object with identity, ownership, and persistence inside virtual environments?
Mor Monshizadeh: Most voice AI founders come purely from NLP; my background includes C++, game engines, and 3D rendering pipelines. In real-time systems, objects have identity, ownership rules, and persistence across scenes and sessions.
This strongly shapes MorVoice. I don’t view voice as just a waveform. I view it as a persistent identity object—something that can be created, managed, reused, and (where appropriate) owned and licensed across contexts (content, agents, and virtual environments). That perspective pushes us to design for continuity: users should be able to build a recognizable voice identity and reliably carry it across projects.
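As an editorial illustration of the "persistent identity object" framing, a voice identity might carry ownership, consent, and licensing metadata alongside a reference to the underlying model, so the same object can travel across projects and contexts. The field names below are assumptions, not MorVoice's actual schema.

```typescript
// Illustrative shape for a voice treated as a persistent identity object
// rather than a one-off waveform. All field names are assumptions.

type LicenseScope = "content" | "agents" | "virtual-environments";

interface VoiceIdentity {
  id: string;                        // stable identifier reused across projects
  displayName: string;
  owner: string;                     // account or wallet that controls the identity
  consent: {
    grantedBy: string;               // the person whose voice this represents
    proofUri?: string;               // pointer to a signed consent record, if any
  };
  licenses: Array<{
    licensee: string;
    scope: LicenseScope[];
    expiresAt?: string;              // ISO date; omitted means open-ended
  }>;
  modelRef: string;                  // opaque reference to the underlying voice model
  createdAt: string;
}

// Continuity in practice: the same identity object is handed to every render,
// so a creator's voice stays recognizable across videos, agents, and scenes.
function renderWith(identity: VoiceIdentity, text: string): { voiceId: string; text: string } {
  return { voiceId: identity.id, text };
}
```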
4. The current voice AI market seems split between two strategies: maximal realism for long-form content and high-throughput generation for short, high-frequency use cases. By optimizing MorVoice for seconds-level generation and low latency, are you making a deliberate bet that speed will matter more than perfect fidelity for the next wave of creators and real-time agents?
Mor Monshizadeh: Yes—this is a deliberate bet. The market is splitting between “maximum realism for long-form” and “high-throughput generation for short, frequent use.” I believe the next wave of creators and real-time agents will reward speed and responsiveness more than absolute perfection.
MorVoice optimizes for seconds-level generation and low latency because that's what unlocks iteration. If you make creators wait, you kill momentum. In this category, latency is not just an engineering metric; it is creative friction.
5. MorVoice avoids competing head-on with audiobook-grade platforms and instead targets short-form creators and real-time interaction. Do you see this as a tactical market entry, or as a long-term belief that most AI-generated speech will be consumed in fast, disposable, and iterative contexts rather than in polished, cinematic formats?
Mor Monshizadeh: Both—but primarily long-term belief. Short-form is a practical entry because it’s a daily workflow and the feedback loop is fast. More importantly, I believe most AI-generated speech will be consumed in fast, iterative contexts: short videos, ads, product demos, interactive agents, support flows, and micro-learning.
Cinematic, polished, audiobook-grade production will exist, but it’s not where most daily volume and iteration will live. MorVoice is built for the high-frequency reality.
6. Most AI startups rely on venture capital to fund increasingly expensive model training. Mondial AI has remained bootstrapped, while MorVoice explores tokenization and on-chain economics. Is your Web3 strategy partly an attempt to replace traditional VC financing with community-based capital formation tied directly to usage and ownership?
Mor Monshizadeh: Most AI startups rely on VC to fund increasingly expensive training and infrastructure. I’ve built in bootstrapped conditions before, which forces discipline: cost, utility, and retention must be real from day one.
MorVoice explores tokenization and on-chain economics because I’m interested in aligning incentives between the network, creators, and users—tying value to participation and usage rather than purely to fundraising cycles. I don’t see Web3 as “replacing VC” in a simplistic way; I see it as a tool to design a more community-aligned growth model where ownership and usage can be connected.
7. MorVoice’s voice NFT marketplace revives an idea that failed publicly in earlier projects due to copyright abuse and identity fraud. What has fundamentally changed since those failures, and why do you believe voice ownership can now be enforced technically rather than merely promised philosophically?
Mor Monshizadeh: Earlier “voice ownership” and NFT-style experiments failed publicly for understandable reasons: abuse, copyright problems, and identity fraud. The core failure was lack of enforcement—ownership was claimed, but not technically and operationally upheld.
What’s different now is that enforcement can be treated as a product requirement rather than marketing language:
– Clear consent and permission rules
– Identity and provenance mechanisms
– Abuse detection and review processes
– Platform-level controls that can block monetization when rights aren’t proven
The key is to treat a voice asset not as a collectible, but as a functional license/identity object where usage and monetization are bound to permission and enforceable rules.
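One way to express "enforcement as a product requirement" in code is a monetization gate that refuses to proceed unless consent and provenance are proven and no abuse report is open. The checks and names below are illustrative assumptions, not MorVoice's implementation.

```typescript
// Sketch of a platform-level monetization gate: usage and monetization are bound
// to permission and enforceable rules. All names here are illustrative assumptions.

interface VoiceAsset {
  id: string;
  consentProofUri?: string;       // signed consent from the person whose voice this is
  provenanceVerified: boolean;    // identity / provenance check has passed
  abuseFlags: string[];           // open reports from abuse detection or review
}

type GateResult =
  | { allowed: true }
  | { allowed: false; reason: string };

function canMonetize(asset: VoiceAsset): GateResult {
  if (!asset.consentProofUri) {
    return { allowed: false, reason: "No consent record: rights are not proven." };
  }
  if (!asset.provenanceVerified) {
    return { allowed: false, reason: "Provenance check has not passed." };
  }
  if (asset.abuseFlags.length > 0) {
    return { allowed: false, reason: `Open abuse reports: ${asset.abuseFlags.join(", ")}` };
  }
  return { allowed: true };
}

// Example: an unauthorized, celebrity-like clone is blocked before it can earn anything.
const suspect: VoiceAsset = { id: "v-123", provenanceVerified: false, abuseFlags: ["impersonation"] };
console.log(canMonetize(suspect)); // { allowed: false, reason: "No consent record: rights are not proven." }
```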
8. Decentralization often conflicts with platform responsibility. If a user attempts to mint or monetize an unauthorized or celebrity-like voice, where does MorVoice draw the line between being a neutral infrastructure provider and an accountable gatekeeper?
Mor Monshizadeh: Decentralization is not a shield against responsibility. If a user attempts to mint or monetize an unauthorized or celebrity-like voice, MorVoice draws a clear line: permission and consent are required, and abuse cannot be monetized.
Infrastructure providers still make choices. MorVoice’s stance is: enable legitimate creation, but act as an accountable gatekeeper when identity theft or unauthorized imitation is involved—especially when monetization is on the table. Without that, you don’t get a sustainable ecosystem; you get a fraud marketplace.
9. Recent legal cases suggest that voice identity is increasingly protected under right-of-publicity laws, even if raw audio data itself is not copyrighted. How does MorVoice reconcile this legal reality with its vision of permissionless voice markets?
Mor Monshizadeh: Voice identity is increasingly treated as protected. That legal reality is not optional.
So the reconciliation is straightforward in principle:
– “Permissionless” access to tools and creation workflows
– But “permissioned” use of a person’s identity, especially for monetization
A permissionless market only works long-term if participants trust that the underlying assets are authorized. Otherwise it collapses under legal, reputational, and platform pressure.
10. By placing voice assets on-chain, MorVoice implicitly proposes a future where AI models license voices instead of scraping them. Do you see this as a viable alternative to today’s train-first, litigate-later norm in generative AI, or is this model only realistic for high-value, niche voices?
Mor Monshizadeh: Yes, I see licensing as a viable alternative to the current “train-first, litigate-later” norm—because it’s the only approach that can scale into a regulated future without constant legal conflict.
That said, I’m pragmatic: licensing will likely be adopted first for high-value voices where the economics justify it. Over time, standards and automation can expand it beyond niche cases. The goal is to create infrastructure where models can license voices cleanly rather than scraping and hoping.
11. As AI agents become more autonomous in business and social interactions, do you believe users will prefer agents that speak in their own cloned voice to maximize trust, or will there be a push toward clearly synthetic, branded voices to preserve human-machine boundaries?
Mor Monshizadeh: I expect both trends.
– In private or relationship-based contexts, people may prefer agents that speak in their own voice because it increases trust and continuity.
– In public, commercial, or regulated contexts, there will be pressure toward clearly synthetic or branded voices to preserve boundaries and reduce deception.
The real requirement is clarity: users should know whether they’re hearing a human, a clone, or a brand voice, and consent should be explicit where identity is involved.
12. Looking beyond MorVoice as a product, do you see low-latency voice and enforceable identity as foundational infrastructure for a future agent-driven or spatial web, where avatars without persistent voice identities become functionally invisible?
Mor Monshizadeh: Yes. As agents become more autonomous, voice becomes a primary interface. If the voice layer is slow, interactions feel broken. If identity is not persistent, agents and avatars become interchangeable and untrustworthy.
I see low-latency voice and enforceable identity as foundational infrastructure: the layer that makes agents socially functional and economically viable. In a spatial/agent-driven web, avatars without persistent voice identity won’t disappear visually—but they’ll be “invisible” in terms of recognition and trust.
Editor’s Note
This interview explores a structural shift in voice AI. As agents become real-time and persistent, speed and identity move from optimization goals to foundational requirements.

