
Spoken Language Recognition on Mozilla Common Voice — Audio Transformations | by Sergey Vilov | Aug, 2023


Photo by Kelly Sikkema on Unsplash

This is the third article on spoken language recognition based on the Mozilla Common Voice dataset. In Part I, we discussed data selection and preprocessing, and in Part II we analysed the performance of several neural network classifiers.

The final model achieved 92% accuracy and 97% pairwise accuracy. Since this model suffers from somewhat high variance, the accuracy could probably be improved by adding more data. One very common way to get additional data is to synthesize it by performing various transformations on the available dataset.

In this article, we will consider five popular transformations for audio data augmentation: adding noise, changing speed, changing pitch, time masking, and cut & splice.

The tutorial notebook can be found here.

For illustration purposes, we will use the sample common_voice_en_100040 from the Mozilla Common Voice (MCV) dataset. It is the sentence "The burning fire had been extinguished."

import librosa as lr
import IPython

signal, sr = lr.load('./transformed/common_voice_en_100040.wav', res_type='kaiser_fast') # load signal

IPython.display.Audio(signal, rate=sr)

Original sample common_voice_en_100040 from MCV.
Original signal waveform (image by the author)

Adding noise is the simplest audio augmentation. The amount of noise is characterised by the signal-to-noise ratio (SNR), here the ratio between the maximal signal amplitude and the standard deviation of the noise. We will generate several noise levels, defined via SNR, and see how they change the signal.

import numpy as np

SNRs = (5, 10, 100, 1000) # signal-to-noise ratio: max amplitude over noise std

noisy_signal = {}

for snr in SNRs:
    noise_std = max(abs(signal))/snr # noise standard deviation for this SNR
    noise = noise_std*np.random.randn(len(signal)) # generate Gaussian noise with the given std
    noisy_signal[snr] = signal + noise

IPython.display.display(IPython.display.Audio(noisy_signal[5], rate=sr))
IPython.display.display(IPython.display.Audio(noisy_signal[1000], rate=sr))

Signals obtained by superimposing noise with SNR=5 and SNR=1000 on the original MCV sample common_voice_en_100040 (generated by the author).
Signal waveform for several noise levels (image by the author)

So, SNR=1000 sounds almost like the unperturbed audio, while at SNR=5 one can only distinguish the strongest parts of the signal. In practice, the SNR level is a hyperparameter that depends on the dataset and the chosen classifier.
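For reuse across a whole dataset, the snippet above can be wrapped in a small helper (a sketch; the function name and the optional seeded generator are our additions, not part of the original notebook):

```python
import numpy as np

def add_noise(signal, snr, rng=None):
    """Superimpose Gaussian noise at a given SNR (max amplitude / noise std)."""
    if rng is None:
        rng = np.random.default_rng()
    noise_std = np.max(np.abs(signal)) / snr  # noise std implied by the SNR
    return signal + noise_std * rng.standard_normal(len(signal))
```

Passing a seeded `np.random.default_rng` makes the augmentation reproducible between runs.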

The simplest way to change the speed is just to pretend that the signal has a different sample rate. However, this will also change the pitch (how low/high in frequency the audio sounds). Increasing the sampling rate makes the voice sound higher. To illustrate this, we can "increase" the sampling rate for our example by a factor of 1.5:

IPython.display.Audio(signal, rate=sr*1.5)
Signal obtained by using a false sampling rate for the original MCV sample common_voice_en_100040 (generated by the author).

Changing the speed without affecting the pitch is harder. One needs to use the phase vocoder (PV) algorithm. In brief, the input signal is first split into overlapping frames. Then, the spectrum within each frame is computed by applying the fast Fourier transform (FFT). The playing speed is then modified by resynthesizing the frames at a different rate. Since the frequency content of each frame is not affected, the pitch stays the same. The PV interpolates between the frames and uses the phase information to achieve smoothness.
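To make these steps concrete, here is a minimal NumPy sketch of a phase-vocoder time stretch (an illustrative toy, not the stretch_wo_loop implementation used below; it assumes a mono signal longer than one FFT frame):

```python
import numpy as np

def pv_time_stretch(x, rate, n_fft=1024, hop=256):
    """Toy phase-vocoder time stretch: rate > 1 speeds up, rate < 1 slows down."""
    window = np.hanning(n_fft)
    # 1. Split into overlapping frames and take the FFT of each (the STFT).
    n_frames = 1 + (len(x) - n_fft) // hop
    stft = np.array([np.fft.rfft(window * x[i*hop:i*hop + n_fft])
                     for i in range(n_frames)])
    # 2. Read frames at a different rate (fractional frame positions).
    time_steps = np.arange(0, n_frames - 1, rate)
    # Expected phase advance per hop for each frequency bin.
    bin_phase_adv = 2*np.pi*hop*np.arange(n_fft//2 + 1)/n_fft
    out = np.zeros(len(time_steps)*hop + n_fft)
    phase = np.angle(stft[0])
    for k, t in enumerate(time_steps):
        i = int(t)
        frac = t - i
        # Interpolate magnitude between neighbouring frames.
        mag = (1 - frac)*np.abs(stft[i]) + frac*np.abs(stft[i + 1])
        # 3. Resynthesize with accumulated phase so frames stay coherent.
        frame = np.fft.irfft(mag*np.exp(1j*phase), n_fft)
        out[k*hop:k*hop + n_fft] += window*frame
        # Phase deviation from the expected advance, wrapped to [-pi, pi].
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - bin_phase_adv
        dphi -= 2*np.pi*np.round(dphi/(2*np.pi))
        phase += bin_phase_adv + dphi
    return out
```

The output length scales as 1/rate, while per-bin frequency content, and hence pitch, is preserved.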

For our experiments, we will use the stretch_wo_loop time-stretching function from this PV implementation.

stretching_factor = 1.3

signal_stretched = stretch_wo_loop(signal, stretching_factor)
IPython.display.Audio(signal_stretched, rate=sr)

Signal obtained by varying the speed of the original MCV sample common_voice_en_100040 (generated by the author).
Signal waveform after speed increase (image by the author)

So, the duration of the signal decreased since we increased the speed. However, one can hear that the pitch has not changed. Note that when the stretching factor is substantial, the phase interpolation between frames might not work well. As a result, echo artefacts may appear in the transformed audio.

To change the pitch without affecting the speed, we can use the same PV time stretch but pretend that the signal has a different sampling rate, such that the total duration of the signal stays the same:

IPython.display.Audio(signal_stretched, rate=sr/stretching_factor)
Signal obtained by varying the pitch of the original MCV sample common_voice_en_100040 (generated by the author).

Why do we even bother with this PV when librosa already has time_stretch and pitch_shift functions? Well, those functions transform the signal back to the time domain. If you need to compute embeddings afterwards, you will lose time on redundant Fourier transforms. On the other hand, it is easy to modify the stretch_wo_loop function so that it yields Fourier output without taking the inverse transform. One could probably also dig into the librosa code to achieve similar results.

These two transformations were originally proposed in the frequency domain (Park et al. 2019). The idea was to save time on FFT by using precomputed spectra for audio augmentations. For simplicity, we will demonstrate how these transformations work in the time domain. The listed operations can easily be transferred to the frequency domain by replacing the time axis with frame indices.

Time masking

The idea of time masking is to cover up a random region of the signal. The neural network then has fewer chances to learn signal-specific temporal variations that do not generalize.

max_mask_length = 0.3 # maximum mask duration, fraction of signal length

L = len(signal)

mask_length = int(L*np.random.rand()*max_mask_length) # randomly choose mask length
mask_start = int((L-mask_length)*np.random.rand()) # randomly choose mask position

masked_signal = signal.copy()
masked_signal[mask_start:mask_start+mask_length] = 0

IPython.display.Audio(masked_signal, rate=sr)

Signal obtained by applying the time mask transformation to the original MCV sample common_voice_en_100040 (generated by the author).
Signal waveform after time masking (the masked region is indicated in orange) (image by the author)
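As noted earlier, the same operation transfers to the frequency domain by masking frame indices of a precomputed spectrogram instead of samples. A minimal sketch (the function name is our own; spec is assumed to have shape (n_freq_bins, n_frames)):

```python
import numpy as np

def mask_spectrogram_time(spec, max_mask_frac=0.3, rng=None):
    """Time masking on a precomputed spectrogram: zero a random run of frames."""
    if rng is None:
        rng = np.random.default_rng()
    n_frames = spec.shape[1]
    mask_length = int(n_frames * rng.random() * max_mask_frac)  # mask length in frames
    mask_start = int((n_frames - mask_length) * rng.random())   # mask position
    out = spec.copy()
    out[:, mask_start:mask_start + mask_length] = 0
    return out
```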

Cut & splice

The idea is to replace a randomly chosen region of the signal with a random fragment from another signal that has the same label. The implementation is almost the same as for time masking, except that a piece of another signal is placed instead of the mask.

other_signal, sr = lr.load('./common_voice_en_100038.wav', res_type='kaiser_fast') # load second signal

max_fragment_length = 0.3 # maximum fragment duration, fraction of signal length

L = min(len(signal), len(other_signal))

mask_length = int(L*np.random.rand()*max_fragment_length) # randomly choose fragment length
mask_start = int((L-mask_length)*np.random.rand()) # randomly choose fragment position

synth_signal = signal.copy()
synth_signal[mask_start:mask_start+mask_length] = other_signal[mask_start:mask_start+mask_length]

IPython.display.Audio(synth_signal, rate=sr)

Synthetic signal obtained by applying the cut & splice transformation to the original MCV sample common_voice_en_100040 (generated by the author).
Signal waveform after the cut & splice transformation (the inserted fragment from the other signal is indicated in orange) (image by the author)
