OpenAI Releases ‘Whisper’ - Automatic Speech Recognition Tool

September 23, 2022 By Jozeph P


OpenAI, an artificial intelligence (AI) research organization whose stated mission is to develop and direct AI to benefit humanity as a whole, has released Whisper, an automatic speech recognition (ASR) system. According to OpenAI, Whisper enables 'robust' transcription in multiple languages and can also translate speech in those languages into English. ASR has long been a hard problem for AI and machine learning, and Whisper is OpenAI's step toward addressing it. OpenAI trained Whisper on 680,000 hours of audio and matching transcripts collected from the web, covering 98 languages. The models and inference code are open source and can be used to build applications and to further research into making speech processing more reliable.


CLIP, an open source computer vision model released by OpenAI in January 2021, arguably ignited the recent era of rapidly progressing image synthesis technology such as DALL-E 2 and Stable Diffusion. Whisper is described by OpenAI as an encoder-decoder transformer, a type of neural network that learns associations from its input data and uses them to produce the model's output. OpenAI gives this overview of how Whisper operates: 'Input audio is divided into 30-second chunks, converted to a log-Mel spectrogram, and then passed through an encoder. A decoder is trained to predict the corresponding text caption, and special tokens are mixed in to direct the single model to perform tasks like language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.'
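The first step in that overview, dividing audio into 30-second chunks, can be illustrated with a small sketch. This is not OpenAI's code: the function name `split_into_chunks`, the 16 kHz sample rate, and the choice to zero-pad the final chunk are assumptions made here for illustration; only the 30-second chunk length comes from the description above.

```python
from typing import List

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono audio samples
CHUNK_SECONDS = 30     # from OpenAI's overview: 30-second chunks


def split_into_chunks(
    samples: List[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
) -> List[List[float]]:
    """Split raw audio samples into fixed-length chunks.

    Hypothetical helper: the last chunk is zero-padded so every chunk
    has the same length, which a fixed-size encoder input would need.
    """
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        # Pad the final, shorter chunk with silence (zeros).
        chunk = chunk + [0.0] * (chunk_len - len(chunk))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # 45 seconds of constant-valued dummy audio yields two 30-second chunks.
    audio = [0.1] * (SAMPLE_RATE * 45)
    for i, chunk in enumerate(split_into_chunks(audio)):
        print(i, len(chunk))
```

In a real pipeline each chunk would then be converted to a log-Mel spectrogram before reaching the encoder; that step is left out here to keep the sketch short.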

Approximately one-third of Whisper's audio dataset is non-English, and for that audio the model was alternately given the task of transcribing in the original language or translating to English. The researchers claim that this approach is particularly effective for learning speech-to-text translation and that Whisper, evaluated zero-shot, outperforms the supervised state of the art (SOTA) on CoVoST2 to-English translation. The OpenAI researchers also hope that Whisper's high accuracy and ease of use will enable developers to incorporate voice interfaces into a broader range of applications.
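The choice between transcribing and translating is signalled to the single model through special tokens. The sketch below shows one way such a task prefix could be assembled. The token spellings follow the format described in OpenAI's Whisper paper as best recalled here, and the helper `build_decoder_prefix` is hypothetical; treat both as assumptions rather than the library's actual interface.

```python
from typing import List

# Assumption: the two task names mirror the tasks described in the article.
VALID_TASKS = ("transcribe", "translate")


def build_decoder_prefix(
    language: str, task: str, timestamps: bool = False
) -> List[str]:
    """Assemble the special tokens that tell the decoder what to do.

    Hypothetical helper for illustration only.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    prefix = [
        "<|startoftranscript|>",
        f"<|{language}|>",   # language of the input audio, e.g. "fr"
        f"<|{task}|>",       # same-language transcription or to-English translation
    ]
    if not timestamps:
        prefix.append("<|notimestamps|>")
    return prefix


if __name__ == "__main__":
    # French audio, transcribed in French.
    print(build_decoder_prefix("fr", "transcribe"))
    # The same audio, translated to English.
    print(build_decoder_prefix("fr", "translate"))
```

The point of the design is that one model handles both tasks: only the prefix changes, not the network.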

By Jozeph P

Journalism explorer, tech enthusiast. Love to read and write.
