During my time at Weights & Biases I produced Gradient Dissent, an interview podcast about machine learning in the real world.

We found that transcribing episodes mattered: adding closed captions to episode videos increased view time, and being able to create show notes with full transcripts improved discoverability and indexing. Unfortunately, transcribing audio is generally pretty tedious.

So I was very excited when OpenAI released Whisper in September 2022! Whisper is an open-source deep learning model for automatic speech recognition that transcribes and captions audio and video files remarkably quickly and well.

Although I ultimately found that Whisper wasn’t quite accurate enough, it was still a lot of fun to play around with! I wrote this guide to document my experience and share it with other content creators.