Weird Wednesday: The Sorceries of Python and AI combined

So, I first became aware of Whisper, a speech-recognition model from OpenAI designed for transcription, translation, and subtitles, a couple of years back when I was writing Wolf’s Trail. Whisper was then the “backend” of a free website where I could upload my audio files and get a text transcription back. That website went sideways around the time I started work on the sequel, Undead Flight, so although I did a little dictation on that book (speech to text in Word, cleanup by Claude AI, additional reworking by me), I wasn’t able to dictate on the road very much.

Then I found out that I could run Whisper on my own computer through Python. I downloaded PyTorch, downloaded Whisper, and then realized I had no idea how to work with Python. I abandoned the idea for 7 or 8 months, then took an online course in Python on a whim, fiddled around trying to install some other stuff Whisper depended on that I didn’t have, and then, after visiting about half a dozen “whisper in python” tutorials and asking Claude AI for help on the “write to text file” part, I came up with the following. Lines following a # sign are comments rather than part of the code.

# This is the initial test to make sure Python can see the audio file.
# Be sure to change the example file name and location to match the thing you want transcribed.

# Please note that forward slashes (/) work for file locations in Python, even on Windows. Backslashes (\) have to be doubled or written in a raw string.

from pathlib import Path
file_path = Path('Drive:/Users/Username/Location/Filename.mp3')
# There is no need to open the file here; Whisper reads it from the path later.
if file_path.exists():
    print("File found!")
else:
    print("File does not exist, please check the name and location.")
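On the forward-slash point, here is a small sketch of how pathlib treats the two spellings of a Windows location. The drive letter, folders, and file name are made up for illustration; PureWindowsPath is used only so the comparison behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical location, for illustration only.
with_forward = PureWindowsPath("C:/Users/Username/Audio/chapter01.mp3")
# A raw string (r"...") keeps backslashes from being read as escape codes.
with_backslash = PureWindowsPath(r"C:\Users\Username\Audio\chapter01.mp3")

print(with_forward == with_backslash)  # both spellings name the same path
print(with_forward.name)    # file name with extension
print(with_forward.stem)    # file name without extension
print(with_forward.suffix)  # extension only
```

The .stem piece is the part that gets reused later to name the text file.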

# If file test is successful, continue with the operation.

import whisper
model = whisper.load_model('medium')

# I haven’t seen that Whisper’s 'large' model is appreciably better in English than its 'medium' model, but YMMV.

# You may get a lecture (a warning message) on using torch.load with weights_only=False. It should be safe to ignore as long as the model file is the one Whisper downloaded for you.
# You may fire the rest of the code when ready.
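If the lecture gets tiresome, one possible way to quiet it is sketched below. It assumes the message is raised as a FutureWarning, which is how PyTorch has reported it in the versions I know of; if your version uses a different category, this filter won't catch it. The helper name load_without_future_warnings is my own invention, not part of Whisper or PyTorch.

```python
import warnings

def load_without_future_warnings(loader, *args, **kwargs):
    """Call loader(...) while hiding FutureWarning messages only (hypothetical helper)."""
    with warnings.catch_warnings():
        # Other warning categories still get through.
        warnings.simplefilter("ignore", category=FutureWarning)
        return loader(*args, **kwargs)

# Intended use (assumes whisper is installed):
# import whisper
# model = load_without_future_warnings(whisper.load_model, "medium")
```

The filter only applies inside the helper, so warnings from the rest of the script still show up as usual.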

output_dir = Path('Drive:/Users/Username/Location/')
result = model.transcribe(str(file_path), language='en', verbose=True, fp16=False)

# verbose=True means that Whisper will show you what it’s hearing as it interprets it, with timestamps.
# fp16 (half-precision floating point) is considered a more efficient way for models to run inference, but my computer kept failing at it and falling back to fp32 (which is what Whisper does when it runs on a CPU rather than a GPU), so I added fp16=False to make the computer stop trying fp16.
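Rather than hard-coding fp16=False, the choice could follow whether a GPU is present. This is a rough sketch, not something I have benchmarked: transcribe_options is a hypothetical helper, and the commented lines assume torch and whisper are installed and that model and file_path exist as in the code above.

```python
def transcribe_options(cuda_available: bool) -> dict:
    """Build keyword arguments for model.transcribe (hypothetical helper)."""
    return {
        "language": "en",
        "verbose": True,
        # Half precision only pays off on a GPU; on a CPU, ask for fp32 up front.
        "fp16": cuda_available,
    }

# Intended use (assumes torch and whisper are installed):
# import torch
# options = transcribe_options(torch.cuda.is_available())
# result = model.transcribe(str(file_path), **options)
```

On a machine without a CUDA GPU this ends up with the same settings as the line above, just without having to remember to change it if the hardware changes.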

output_path = output_dir / f"{file_path.stem}.txt"

# This line tells Python where to save the transcription and under what name.

output_path.write_text(result["text"], encoding='utf-8')
print(f"Transcription saved to: {output_path}")
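The result also carries a "segments" list, where each entry has "start" and "end" times in seconds plus its own "text". If you want the timestamps in the saved file too, something like the sketch below might do. The two helper functions are my own, and the sample data is made up in the same shape so the example runs without Whisper.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def segments_to_lines(segments) -> str:
    """Build one '[start - end] text' line per segment."""
    lines = []
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return "\n".join(lines)

# Made-up data in the same shape as result["segments"].
sample = [
    {"start": 0.0, "end": 4.2, "text": " The wolf crossed the ridge."},
    {"start": 4.2, "end": 65.0, "text": " Nobody followed."},
]
print(segments_to_lines(sample))
```

With a real transcription, segments_to_lines(result["segments"]) could be passed to write_text in place of result["text"].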
