Local audio transcription with OpenAI Whisper. No API keys, no cloud, no data leaving your machine.
You have an audio file you want transcribed. The usual options:
- OpenAI / AssemblyAI / Deepgram APIs - fast, accurate, but your audio goes to a server.
- macOS dictation / Otter.ai - also cloud.
- This project - Whisper running entirely on your machine. Sensitive audio (interviews, medical notes, legal recordings) never leaves it.

Requirements:
- Python 3.8+
- FFmpeg (for audio decoding)
- ~1-10 GB disk space depending on which Whisper model you load

Install the Python packages:

    pip install openai-whisper ffmpeg-python torchaudio

Install FFmpeg if you don't have it:

    # macOS
    brew install ffmpeg
    # Ubuntu / Debian
    sudo apt install ffmpeg
    # Windows (with Chocolatey)
    choco install ffmpeg

Verify:

    ffmpeg -version

Edit text-to-speech.py to point at your audio file:

    audio_path = "/path/to/your/audio.wav"

Then run:

    python text-to-speech.py

Output: transcript printed to terminal and saved as a .txt file alongside the audio.
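
The contents of text-to-speech.py are not reproduced here, so the following is only a minimal sketch of what such a script might do, assuming it uses the `whisper.load_model` and `model.transcribe` calls from the openai-whisper package. The helper names `transcript_path` and `transcribe_file` are hypothetical and chosen for illustration.

```python
from pathlib import Path


def transcript_path(audio_path: str) -> Path:
    """Return the .txt path that sits next to the audio file."""
    return Path(audio_path).with_suffix(".txt")


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one file locally and save the text beside it (sketch)."""
    # Imported lazily so the helper above works without Whisper installed.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)  # dict that includes a "text" key
    text = result["text"].strip()
    transcript_path(audio_path).write_text(text, encoding="utf-8")
    return text


# Example call (the path is a placeholder):
# print(transcribe_file("/path/to/your/audio.wav"))
```

Keeping the output path in its own small function makes the "saved alongside the audio" behavior easy to check without loading a model.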

| Model | Speed | Accuracy | RAM |
|---|---|---|---|
| tiny | Fastest | Lowest | ~1 GB |
| base | Fast | Good | ~1 GB |
| small | Medium | Better | ~2 GB |
| medium | Slow | Better | ~5 GB |
| large | Slowest | Best | ~10 GB |
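
Change the model in the script:

    model = whisper.load_model("small")  # or "tiny", "base", "medium", "large"

The first time you load a model, Whisper downloads it from the OpenAI CDN and caches it locally (by default under ~/.cache/whisper), so later runs need no network access.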
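
If you are unsure which model fits your machine, one option is to pick the largest model whose approximate memory need is within what you have free. The sketch below uses the rough RAM figures from the model table; `MODEL_MEMORY_GB` and `pick_model` are hypothetical names, and the numbers are estimates rather than guarantees.

```python
# Approximate memory needs in GB, taken from the model table (estimates only).
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_gb: float) -> str:
    """Return the largest model whose estimated need fits in available_gb."""
    chosen = "tiny"  # the smallest model is the fallback
    # The dict is ordered from smallest to largest model, so the last
    # entry that fits is the largest one that fits.
    for name, needed_gb in MODEL_MEMORY_GB.items():
        if needed_gb <= available_gb:
            chosen = name
    return chosen


# Example (assumes about 8 GB is free):
# import whisper
# model = whisper.load_model(pick_model(8))
```

With 8 GB available this selects "medium"; with less than 1 GB it still returns "tiny", which may or may not load on a very constrained machine.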

Supported formats: anything FFmpeg can decode, including WAV, MP3, M4A, FLAC, OGG, AAC, and WMA.
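
If a particular file refuses to load, converting it to a plain WAV with FFmpeg first can help show whether the problem is the file or the setup. This is a hedged sketch, not something the script is known to do: `build_convert_cmd` and `convert_to_wav` are hypothetical helpers, and 16 kHz mono is an assumption chosen as a common speech-recognition format.

```python
import subprocess
from typing import List


def build_convert_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that writes src as 16 kHz mono audio to dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input file in any format ffmpeg can decode
        "-ac", "1",      # downmix to one channel
        "-ar", "16000",  # resample to 16 kHz
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_convert_cmd(src, dst), check=True)


# Example call (paths are placeholders):
# convert_to_wav("/path/to/your/audio.m4a", "/path/to/your/audio.wav")
```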

Troubleshooting:
- FFmpeg errors - verify the install with ffmpeg -version. On Windows, make sure the folder that contains ffmpeg.exe (for example C:\ffmpeg\bin for a manual install) is in your PATH.
- FileNotFoundError - use an absolute path to the audio file, and check for typos in the filename. This error can also mean that ffmpeg itself is not on your PATH.
- Model download fails - the first run needs internet access to fetch the model. Check disk space too (model files range from ~75 MB for tiny to ~3 GB for large).
- Poor accuracy - try a larger model. Accuracy on noisy or accented audio improves substantially with model size, and the large model handles non-English audio much better than base.
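
The first two checks above can be run before loading a model. The following is a small sketch under stated assumptions: `preflight` is a hypothetical helper, not part of Whisper or of the script, and it only covers the FFmpeg and file-path cases.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(audio_path: str) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    # Whisper shells out to ffmpeg, so it must be discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    path = Path(audio_path)
    if not path.is_absolute():
        problems.append(f"not an absolute path: {audio_path}")
    elif not path.is_file():
        problems.append(f"audio file not found: {audio_path}")
    return problems


# Example call (the path is a placeholder):
# for problem in preflight("/path/to/your/audio.wav"):
#     print("Problem:", problem)
```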

License: MIT