๐Ÿ”ง Builder Guideโฑ ~15 min read๐Ÿ’ฐ Cost: $0

Voice to Prompt in 3 Hours

Built it for $0. Saves me 50 minutes a day.

I was sick of typing long AI prompts every day. So I stopped typing. Here's exactly how I built a hands-free voice system using free APIs โ€” and how you can too.

1The Problem2The Stack3The Build4Results5Replicate It
Start Reading โ†“

๐Ÿ’ก If you're still typing every prompt by hand, you're wasting at least 40 minutes a day. This guide fixes that.

WisprFlow went paid. I'd been using it to dictate AI prompts hands-free โ€” and suddenly it was $12/month. I was about to close the tab and pay it, when I thought: how hard can this actually be?

I was spending 30โ€“40 minutes a day just typing prompts. Not thinking. Not building. Just typing the same kinds of instructions into Claude, over and over. That's 4+ hours a week on pure friction.

๐Ÿ’ก

The Decision Moment

I Googled "free Whisper API" and found Groq had Whisper V3 Turbo free-tier. That was the moment I knew I was building this instead of paying.

What I Actually Needed to Build

  • Press a keyboard shortcut โ†’ start recording
  • Release โ†’ send audio to Whisper for transcription
  • Clean up the raw transcript (remove filler words)
  • Convert spoken intent into a structured AI prompt
  • Paste it into whatever app I'm in
๐Ÿ”–
That's it. Four steps. Three APIs. Zero dollars.

The Full Stack

ToolWhat It DoesCost
Groq (Whisper V3 Turbo)Speech-to-text transcriptionFree tier
GPT OSS 120B (via OpenRouter)Cleans raw transcript, removes filler wordsFree tier
Llama 4 ScoutConverts cleaned text โ†’ structured promptFree tier
Python + xdotoolKeyboard shortcut + clipboard paste on Linux$0

The total infrastructure cost: $0. All three AI models have free tiers generous enough to handle 50โ€“100 prompts a day without hitting limits.

โš ๏ธ

The One Honest Caveat

Free tiers have rate limits. If you're doing 200+ prompts/day, you'll hit them. For most builders, the free tier is more than enough.
01
Hour 0โ€“0.5

Set Up Groq API

Created account, grabbed API key, tested Whisper V3 Turbo with a 10-second clip. Worked first try. Shocked.

02
Hour 0.5โ€“1

Built the Recording Script

Python script using sounddevice. Keyboard shortcut with xdotool. Hold to record, release to transcribe. Raw transcript back in ~1.2 seconds.

03
Hour 1โ€“2

Added the Cleaning Layer

Raw Whisper output was full of 'um', half sentences, repeated words. Piped it through GPT OSS 120B: 'Clean this transcript, preserve meaning, remove filler.' Night and day difference.

04
Hour 2โ€“2.5

Added the Prompt Layer

The real unlock. Llama 4 Scout converts spoken intent into a structured AI prompt. 'Make this shorter' becomes a proper Claude instruction. This is where it got powerful.

05
Hour 2.5โ€“3

Clipboard + Auto-paste

xdotool pastes the result wherever my cursor is. Works in Claude, Cursor, browser, anywhere. Done.

The Part Where It Almost Broke

The keyboard shortcut detection was flaky on Wayland. Spent 45 minutes debugging. Switched from pynput to xdotool + a background listener. Fixed. That's the messiest 45 minutes of the build โ€” everything else was clean.

๐Ÿ’ฌ

Raw โ†’ Clean โ†’ Prompt

Raw: "um make this uh make this like shorter and like more punchy"
Clean: "Make this shorter and more punchy"
Prompt: "Rewrite the following text to be more concise and punchy. Remove filler words and unnecessary phrases. Preserve the core meaning."
Before (Typing)
โ˜… RecommendedAfter (Voice System)
Time per prompt
45โ€“90 seconds
8โ€“12 seconds
Daily prompt time
40โ€“50 minutes
6โ€“8 minutes
Prompt quality
Varies โ€” I rush when tired
Consistent โ€” the cleaning layer fixes it
Mental friction
High โ€” switching context to type
Near zero โ€” just talk
Cost
$12/month (WisprFlow)
$0
โœ…

The Real Number

50โ€“60 minutes saved per day. That's 6+ hours a week. In a month, that's an extra full workday back. For $0.

Prerequisites

  • Python 3.10+ installed
  • Free Groq account (groq.com) โ€” takes 2 minutes
  • Free OpenRouter account for GPT OSS 120B and Llama 4 Scout
  • Linux (xdotool) or macOS (use pbpaste + Automator instead)
1

Install deps: pip install groq sounddevice numpy openai

2

Set env vars: GROQ_API_KEY and OPENROUTER_API_KEY in your .bashrc

3

Create the Python script (link below) โ€” 80 lines total

4

Add a keyboard shortcut: Settings โ†’ Keyboard โ†’ Custom Shortcut โ†’ run the script

5

Test: hold shortcut, say something, release โ€” see the structured prompt paste

๐Ÿ’ก

The Script

I'll share the full script in my next Threads post. Follow @utkarsh.gen to get it when it drops.

Build It This Weekend

3 hours. $0. 50 minutes saved daily.

Hour 1

Set up Groq account and test Whisper V3 Turbo with a sample clip.

โฑ 30 min๐Ÿ“„ Working transcription
Hour 2

Build the recording script + keyboard shortcut. Test basic record โ†’ transcribe flow.

โฑ 60 min๐Ÿ“„ Hands-free transcription
Hour 3

Add cleaning layer (GPT OSS 120B) + prompt conversion layer (Llama 4 Scout). Test end-to-end.

โฑ 60 min๐Ÿ“„ Full voice-to-prompt system live

I build things like this every week.

Follow on Threads for real builds, free tools, and the messy parts nobody shows.

Follow @utkarsh.gen โ†’