Bessemer’s Take on Voice AI Expanding Beyond Commands


Voice AI is now ripe for applications that do more than simply process spoken commands; it enables comprehensive, end-to-end services. Bessemer Venture Partners recently put forward a compelling thesis highlighting Voice AI’s potential to revolutionize customer service, sales, and operations across industries. From transforming customer satisfaction to creating new market opportunities, Voice AI stands ready to overhaul how companies handle complex customer interactions.

Most customer service systems are still built on outdated Interactive Voice Response (IVR) technology. IVR, designed to handle calls with preset commands, is rigid and fails to meet today’s demands for fluid, natural conversations. The next wave of innovation in voice technology — driven by developments in Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and the latest Speech-to-Speech (STS) models — removes these limitations.

By recognizing and responding to human-like conversations, Voice AI will meet and anticipate customer needs faster and more naturally than any prior iteration of automated customer service. Breakthroughs like OpenAI’s Whisper and Google’s Gemini 1.5 are propelling Voice AI into an era where complex customer demands, regional dialects, and high-demand spikes are handled effortlessly.

The most anticipated paradigm shift in the space comes from Google’s NotebookLM. It lets companies convert research and customer insights into natural-sounding audio, creating “podcast-like” experiences. Bessemer, for example, used NotebookLM to generate a conversational audio version of its entire thesis. The result is nearly indistinguishable from a real recording.

The Value of Voice AI in High-Touch Industries

Industries like healthcare, insurance, and logistics rely heavily on phone communication to handle personalized, often complex inquiries. Traditional call centers struggle under high volumes, and small businesses miss 62% of their calls on average, by Bessemer’s estimate, which translates into lost business.

Even larger enterprises face constraints on scalability. Voice AI bridges this gap. It can handle customer calls, schedule appointments, provide quotes, and even process payments without human intervention, turning what might have been missed opportunities into seamless customer interactions.

Voice AI’s Technology Stack: Moving Beyond Cascading Architectures

Traditional voice AI applications use a cascading approach — converting speech to text, processing it, and then converting it back to speech. However, this method introduces latency and limits conversational flow. Voice-native STS models now process raw audio directly, allowing response times as low as 300 milliseconds — close to human reaction speed. These models also interpret context, tone, and emotion, creating smoother, more natural interactions.
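To make the latency argument concrete, here is a minimal Python sketch of why a cascade tends to be slower: its stages run one after another, so their delays add up. The per-stage numbers for the cascade are hypothetical placeholders chosen for illustration, not measurements; only the 300 ms speech-to-speech figure comes from the thesis described above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: int  # illustrative value, not a benchmark


def total_latency_ms(stages: Sequence[Stage]) -> int:
    """Sequential stages each wait for the previous one, so latencies add."""
    return sum(stage.latency_ms for stage in stages)


# Cascading pipeline: speech -> text -> response text -> speech.
# All three latencies below are assumed values for illustration.
CASCADE = [
    Stage("ASR (speech to text)", 300),
    Stage("LLM (text processing)", 500),
    Stage("TTS (text to speech)", 200),
]

# Voice-native pipeline: a single model takes audio in and emits audio out.
# 300 ms is the low-end response time quoted in the article.
SPEECH_TO_SPEECH = [Stage("STS (audio in, audio out)", 300)]

if __name__ == "__main__":
    for label, stages in (("cascade", CASCADE), ("speech-to-speech", SPEECH_TO_SPEECH)):
        print(f"{label}: {total_latency_ms(stages)} ms")
```

Real systems often stream partial results between stages, which narrows the gap, so treat this as a simplified upper bound on the cascade rather than a description of any particular product.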

Advanced voice-native models like Kyutai’s Moshi and Hume’s Empathetic Voice Interface add layers of expressiveness and emotional awareness that were previously unattainable, helping Voice AI feel less automated and more like a responsive assistant.

New Opportunities in the Voice AI Landscape

Voice AI is now ripe for applications that enable full-service capabilities across industries:

  • Transcription and Note-Taking: Automatically capturing conversation insights, especially valuable in healthcare or sales.
  • Inbound and Outbound Calling: Automating tasks like appointment bookings or initial screenings, allowing human agents to focus on high-value interactions.
  • Negotiation and Claims Resolution: Empowering voice agents to manage complex negotiations in real time, with LLMs analyzing extensive data in seconds.
  • Training and Sales Enablement: Providing virtual training for sales or customer service teams, simulating real-world scenarios.

StartupHub.ai has been covering Voice AI for years, tracking the evolution of key players who shaped this field long before the recent surge in Generative AI. Aiola released its open-source speech recognition model, Medusa-Whisper, earlier this year, with fast language understanding and over 95% accuracy. Accessibility-driven Voiceitt creates solutions for individuals with speech impairments, making voice technology more inclusive. Tenyx harnessed the power of LLMs by fine-tuning its own 7B model, ultimately leading to its acquisition by Salesforce. Eleven Labs and Play.ht are also pioneering voice synthesis and emotional understanding, pushing boundaries to create more human-like AI voices.

Decagon also plans to introduce a voice AI modality in the coming months, having just raised $65 million in Series B funding.

Overcoming the Challenges in Enterprise Voice AI Adoption

Despite its promise, Voice AI faces challenges in gaining widespread adoption. For Voice AI providers, ensuring reliability, low latency, and integration with existing systems is paramount. Metrics like customer satisfaction scores, churn rates, self-serve resolution rates, and call termination data are essential indicators of performance. High churn rates often reveal that a voice agent’s reliability is below expectations.
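As a rough illustration of how a team might track some of these indicators, the Python sketch below summarizes a batch of call logs. The `CallRecord` fields and the `summarize` helper are hypothetical names invented for this example, not part of any vendor's API. Churn is left out because it is measured per customer over time rather than per call.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CallRecord:
    """One handled call. Field names are assumptions made for this sketch."""
    resolved_without_human: bool   # the voice agent finished the task on its own
    ended_early_by_caller: bool    # the caller hung up before resolution
    csat: Optional[int] = None     # 1-5 survey score, None if no survey was answered


def summarize(calls: Sequence[CallRecord]) -> dict:
    """Compute per-call performance indicators for a batch of calls."""
    if not calls:
        raise ValueError("need at least one call to summarize")
    total = len(calls)
    scores = [call.csat for call in calls if call.csat is not None]
    return {
        "self_serve_resolution_rate": sum(c.resolved_without_human for c in calls) / total,
        "early_termination_rate": sum(c.ended_early_by_caller for c in calls) / total,
        # Mean is taken only over calls where the customer answered the survey.
        "mean_csat": sum(scores) / len(scores) if scores else None,
    }


if __name__ == "__main__":
    sample = [
        CallRecord(True, False, 5),
        CallRecord(True, False, 4),
        CallRecord(True, False),
        CallRecord(False, True, 2),
    ]
    print(summarize(sample))
```

Keeping the satisfaction average separate from the resolution and termination rates matters because survey responses are usually sparse, and mixing unanswered surveys into the mean would distort it.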

Andreessen Horowitz (a16z) also highlighted Voice AI’s potential in May of this year, in a thesis with a broader focus on consumer applications. The firm emphasized the potential for full-stack Voice AI providers that seamlessly integrate ASR, LLM processing, and TTS capabilities into a single solution, as well as expansion into consumer-focused roles such as virtual therapy and coaching.

As Voice AI evolves, both firms agree that the real challenge lies in achieving reliable, scalable, and deeply integrated solutions. The winners in this space will deliver high-quality, low-latency experiences that enhance customer satisfaction and create new business opportunities. With robust engineering, industry-specific integrations, and attention to customer pain points, Voice AI is on the cusp of making the “please hold” experience a relic of the past.
