
Use your voice to interact with AI

by Connor Adams · 5 min read

TL;DR

Typing is a bottleneck for describing what you want your AI to do. You can speak faster than you can type. Just say what you’re thinking out loud, with all the ums and buts. Don’t worry about spelling mistakes and bad transcriptions - the language model will work it out. You can transcribe your voice in real time using free tools that work offline by running local models on your devices.

ok so I need you to refactor the auth middleware, right now it hits the database on every single request to check the JWT and thats just way too many calls, can you make it verify the token signature first and only do a DB lookup if its actually valid

What’s the problem with typing?

You probably think before you write and type. It’s also likely you will correct spelling and grammar mistakes. You might try to remove superfluous words or sentences, and iterate on the structure in order to distil the key information for the reader. Not only does this help the reader, but it probably helps you think too - and that’s a great thing!
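To put rough numbers on the bottleneck, here’s a back-of-envelope sketch. The speeds are assumptions (commonly cited averages of ~40 words per minute typed vs ~150 spoken) - your mileage will vary:

```python
# Back-of-envelope: how long does it take to convey a 200-word prompt?
TYPING_WPM = 40     # assumed average typing speed (words per minute)
SPEAKING_WPM = 150  # assumed average speaking speed

def minutes_to_convey(words: int, wpm: float) -> float:
    """Time in minutes to get `words` words out at `wpm` words per minute."""
    return words / wpm

prompt_words = 200
typed = minutes_to_convey(prompt_words, TYPING_WPM)
spoken = minutes_to_convey(prompt_words, SPEAKING_WPM)
print(f"typed: {typed:.1f} min, spoken: {spoken:.1f} min")
# typed: 5.0 min, spoken: 1.3 min
```

Even with generous typing speed, speech gets the same prompt out several times faster - and that’s before you count the words you’d have trimmed while editing.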

But when it comes to LLMs, not only are they inanimate objects (think Harry smashing his phone in In Bruges) that seldom complain and are tolerant of nonsense, they are also literally language models that are incredibly good at understanding poorly constructed text with bad spelling and grammar.

But how do I adjust my speech?

Don’t do anything special per se, just verbalise all your thoughts as you think them. There’s something quite liberating about just blurting out all your thoughts as they enter your brain. If you lose your thread, just keep going - the LLM will piece it together.

And although it won’t be quite as clear and concise as when you type your thoughts out with careful consideration, it doesn’t matter to the LLM, and you’ll convey far more just by letting your unfiltered thoughts flow.

This is not to say that you shouldn’t think about and consider the position of your reader, the LLM (more on that in another post). But rather, don’t worry about editing your speech - just allow your thoughts to be transcribed, warts n’ all.

But it won’t understand my bollocks

Don’t worry, it will. Here’s an excerpt from one of my prompts that I dictated.

I've got all this Tmuck shit to open um open various pop-ups and stuff. Is there a way I can just open up um like Finder or my local file browser? I've got the obviously I've got editor, that's all black. POSIC shit, but it'd be cool if I could just open up the directory on my system. them so macOS Finder, ChromeOS, it'll be there, whatever file thing, whether it is, that'd be that'd be cool. 'Cause sometimes I just want to open it up and be able to like drag and drop files from a native file manager into a browser or something.

Reading it back now, even I’m struggling to understand some of the wonky transcription from my local speech-to-text model 😅

But the LLM worked it out just fine, giving me a tmux binding to open the directory of the current pane in the native file manager. These LLMs have ingested something like tens of thousands of lifetimes’ worth of human text during pre-training. They’ve seen every misspelling, every dialect, every bit of dodgy grammar the interwebs has to offer. So yours shouldn’t be any trouble for them 😉

What do I need to start dictating?

I recommend Handy, a free and open-source dictation app for macOS, Linux and Windows that works entirely offline. It runs Whisper or Parakeet models locally on your device - no data leaves your machine. Choose your hotkey, then press it to dictate into any text input on your computer. Noice.

What I use: on my Mac I use MacWhisper, only because I got it before I knew about Handy. On my Chromebook (yeah, I dev on a Chromebook, come at me bro - it runs Linux don’t you know) I use the built-in dictation. And on my Android phone I use the voice typing in Gboard, which also runs a local model.

There are plenty of Wispr Flow fans out there and I’m sure it’s great, but I can’t justify the recurring cost when free local tools do the job. If you want something polished that just works and you’re in the Apple ecosystem, it might be worth a look. If you’re really into Wispr Flow and you want to hard sell it to me, I’m open to changing my mind and giving it a try.

But what about the office?

This is a fair concern. I can get a bit self-conscious sometimes in a quiet office when I’m whispering sweet nothings into my AI’s ear via a gooseneck microphone.

Maybe you don’t care and you just natter away in your open plan office - fair play, more power to you. If not, you can start by trying it at home. Then when you start missing it in the office, sneak off to quieter parts of the building and eventually buy yourself a microphone you can stick next to your mouth and whisper into. This isn’t a joke, you can literally whisper to it and it’ll understand.

Final words

AI is moving fast but human habits are slower to change. Switching from typing to dictating takes conscious effort. You don’t have to commit to becoming a “dictator” - just give it a go in earnest and you’ll probably never look back.