ZOO Digital, a global media localization leader, has embraced AI to solve one of the most time-consuming tasks in dubbing and content adaptation: speaker diarization. With the help of AWS and the WhisperX model, they’ve significantly cut down processing time and manual effort.
The Challenge: Manual Diarization Bottlenecks
In traditional workflows, assigning speaker labels in video or audio content is manual and slow. For a 30-minute episode, diarization alone can take up to 3 hours. This hampers ZOO Digital’s mission to deliver fast, high-quality localized content for the world’s top entertainment brands.
The AI-Powered Solution
ZOO Digital partnered with AWS’s Prototyping team to develop an automated workflow using Amazon SageMaker and the WhisperX model. By integrating transcription, alignment, and speaker diarization models into a single pipeline, they achieved near real-time processing of complex media files.
- WhisperX handles transcription and time-aligned output
- pyannote is used for accurate speaker diarization
- FFmpeg handles audio extraction and conversion, while Wav2Vec2 sharpens timestamp precision during alignment
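To make the merge step concrete, here is a minimal sketch of how time-aligned transcript segments might be combined with diarization turns. It is not ZOO Digital's or WhisperX's actual code. The `Segment` and `Turn` classes and the `assign_speakers` function are hypothetical names introduced for illustration, and the rule used (give each segment the speaker whose turns overlap it the longest) is one common, simple heuristic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """A transcribed span of speech (hypothetical type); times in seconds."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A diarization turn (hypothetical type): one speaker from start to end."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the longest.

    Segments that overlap no turn (music, silence) keep speaker=None.
    """
    labeled = []
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        best = max(totals, key=totals.get) if totals else None
        labeled.append(Segment(seg.start, seg.end, seg.text, best))
    return labeled


# Illustrative data only; not taken from the case study.
segments = [
    Segment(0.0, 2.0, "Hello there."),
    Segment(2.0, 5.0, "Hi, how are you?"),
    Segment(9.0, 10.0, "(music)"),
]
turns = [Turn(0.0, 2.2, "SPEAKER_00"), Turn(2.2, 5.0, "SPEAKER_01")]

for seg in assign_speakers(segments, turns):
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {seg.text}")
```

A production pipeline would typically do this at the word level rather than the segment level, which is where precise alignment timestamps matter most.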
Deployment at Scale
The models were deployed using asynchronous endpoints on SageMaker, allowing ZOO to process large video files efficiently and cost-effectively. With auto-scaling enabled, the solution adapts to demand and remains scalable across thousands of localization projects.
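As a rough sketch of what this deployment pattern can look like from the client side, the snippet below builds the parameters for an asynchronous invocation and for registering the endpoint with Application Auto Scaling. The endpoint name, S3 URI, variant name, and capacity limits are all assumptions made for illustration; the case study does not publish ZOO's actual configuration. The helper function names are hypothetical as well.

```python
from typing import Any, Dict


def build_async_invocation(endpoint_name: str, input_s3_uri: str) -> Dict[str, Any]:
    """Parameters for the SageMaker runtime InvokeEndpointAsync call.

    The request payload is read from S3 rather than sent inline, which is
    what makes large media files practical with asynchronous endpoints.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": "application/json",  # assumed payload format
        "InvocationTimeoutSeconds": 3600,   # assumed per-request time limit
    }


def build_scalable_target(
    endpoint_name: str,
    variant_name: str = "AllTraffic",  # assumed variant name
    min_instances: int = 0,            # assumed: scale to zero when idle
    max_instances: int = 4,            # assumed upper bound
) -> Dict[str, Any]:
    """Parameters for Application Auto Scaling's RegisterScalableTarget call."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def submit_job(endpoint_name: str, input_s3_uri: str) -> str:
    """Queue one request and return the S3 location where the result will appear.

    Requires boto3 and AWS credentials, so it is not called in this sketch.
    """
    import boto3  # imported lazily so the builders above run without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        **build_async_invocation(endpoint_name, input_s3_uri)
    )
    return response["OutputLocation"]


# Hypothetical names, for illustration only.
print(build_async_invocation("whisperx-diarization", "s3://example-bucket/jobs/ep01.json"))
print(build_scalable_target("whisperx-diarization"))
```

Because the call returns as soon as the request is queued, the caller learns only where the output will be written. Results are then picked up from S3, typically by polling or through a notification.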
Results and Impact
In internal tests, ZOO reduced diarization time from hours to under 30 minutes per episode. This unlocked major productivity gains across teams and enabled faster turnaround for content globalization.
While accuracy varied between media types (e.g. dramas vs. documentaries), the improvements have already shown a compelling return on investment, especially for high-volume streaming content.
Conclusion
The collaboration between AWS and ZOO Digital demonstrates how AI, when strategically deployed, can streamline a traditionally manual and costly workflow. This case offers a blueprint for other media companies looking to leverage machine learning to accelerate localization and reduce operational overhead.
For the full AWS case study, visit: AWS Blog