Businesses generate massive amounts of speech data every day, from customer calls to product demos to sales pitches. This data can transform your business by improving customer satisfaction, helping you prioritize product improvements and streamline business processes. While AI models have improved in the past few months, connecting speech data to these models in a scalable and governed way can be a challenge, and can limit the ability of customers to gain insights at scale.
Today, we are excited to announce the preview of Vertex AI transcription models in BigQuery. This new capability can make it easy to transcribe speech files and combine them with other structured data to build analytics and AI use cases — all through the simplicity and power of SQL, while providing built-in security and governance. Using Vertex AI capabilities, you can also tune transcription models to your data and use them from BigQuery.
Previously, customers built separate AI pipelines for transcription of speech data for developing analytics. These pipelines were siloed from BigQuery, and customers wrote custom infrastructure to bring the transcribed data to BigQuery for analysis. This helped to increase time to value, made governance challenging, and required teams to manage multiple systems for a given use case.
An integrated, governed data-to-AI experience
Google Cloud’s Speech to Text V2 API offers customers a variety of features to make transcription easy and efficient. One of these features is the ability to choose a specific domain model for transcription. This means that you can choose a model that is optimized for the type of audio you are transcribing, such as customer service calls, medical recordings, or universal speech. In addition to choosing a specialized model, you also have the flexibility to tune the model for your own data using model adaptation. This can allow you to improve the accuracy of transcriptions for your specific use case.
Once you’ve chosen a model, you can create object tables in BigQuery that map to the speech files stored in Cloud Storage. Object tables provide fine-grained access control, so users can only generate transcriptions for the speech files for which they are given access. Administrators can define row-level access policies on object tables and secure access to the underlying objects.
To generate transcriptions, simply register your off-the-shelf or adapted transcription model in BigQuery and invoke it over the object table using SQL. The transcriptions are returned as a text column in the BigQuery table. This process makes it easy to transcribe large volumes of audio data without having to worry about the underlying infrastructure. Additionally, the fine-grained access control provided by object tables ensures that customer data is secure.
Here is an example of how to use the Speech to Text V2 API with BigQuery: