🔧 Builder Guide⏱ ~15 min read💰 Cost: $0

Voice to Prompt in 3 Hours

Built it for $0. Saves me 50 minutes a day.

I was sick of typing long AI prompts every day. So I stopped typing. Here's exactly how I built a hands-free voice system using free APIs — and how you can too.

1The Problem 2The Stack 3The Build 4Results 5Replicate It

Start Reading ↓

💡 If you're still typing every prompt by hand, you're wasting at least 40 minutes a day. This guide fixes that.

WisprFlow went paid. I'd been using it to dictate AI prompts hands-free — and suddenly it was $12/month. I was about to close the tab and pay it, when I thought: how hard can this actually be?

I was spending 30–40 minutes a day just typing prompts. Not thinking. Not building. Just typing the same kinds of instructions into Claude, over and over. That's 4+ hours a week on pure friction.

💡

The Decision Moment

I Googled "free Whisper API" and found Groq had Whisper V3 Turbo free-tier. That was the moment I knew I was building this instead of paying.

What I Actually Needed to Build

Press a keyboard shortcut → start recording
Release → send audio to Whisper for transcription
Clean up the raw transcript (remove filler words)
Convert spoken intent into a structured AI prompt
Paste it into whatever app I'm in

🔖

That's it. Four steps. Three APIs. Zero dollars.

The Full Stack

Tool	What It Does	Cost
Groq (Whisper V3 Turbo)	Speech-to-text transcription	Free tier
GPT OSS 120B (via OpenRouter)	Cleans raw transcript, removes filler words	Free tier
Llama 4 Scout	Converts cleaned text → structured prompt	Free tier
Python + xdotool	Keyboard shortcut + clipboard paste on Linux	$0

The total infrastructure cost: $0. All three AI models have free tiers generous enough to handle 50–100 prompts a day without hitting limits.

⚠️

The One Honest Caveat

Free tiers have rate limits. If you're doing 200+ prompts/day, you'll hit them. For most builders, the free tier is more than enough.

Hour 0–0.5

Set Up Groq API

Created account, grabbed API key, tested Whisper V3 Turbo with a 10-second clip. Worked first try. Shocked.

Hour 0.5–1

Built the Recording Script

Python script using sounddevice. Keyboard shortcut with xdotool. Hold to record, release to transcribe. Raw transcript back in ~1.2 seconds.

Hour 1–2

Added the Cleaning Layer

Raw Whisper output was full of 'um', half sentences, repeated words. Piped it through GPT OSS 120B: 'Clean this transcript, preserve meaning, remove filler.' Night and day difference.

Hour 2–2.5

Added the Prompt Layer

The real unlock. Llama 4 Scout converts spoken intent into a structured AI prompt. 'Make this shorter' becomes a proper Claude instruction. This is where it got powerful.

Hour 2.5–3

Clipboard + Auto-paste

xdotool pastes the result wherever my cursor is. Works in Claude, Cursor, browser, anywhere. Done.

The Part Where It Almost Broke

The keyboard shortcut detection was flaky on Wayland. Spent 45 minutes debugging. Switched from pynput to xdotool + a background listener. Fixed. That's the messiest 45 minutes of the build — everything else was clean.

💬

Raw → Clean → Prompt

Raw: "um make this uh make this like shorter and like more punchy"
Clean: "Make this shorter and more punchy"
Prompt: "Rewrite the following text to be more concise and punchy. Remove filler words and unnecessary phrases. Preserve the core meaning."

Before (Typing)

★ RecommendedAfter (Voice System)

Time per prompt

45–90 seconds

8–12 seconds

Daily prompt time

40–50 minutes

6–8 minutes

Prompt quality

Varies — I rush when tired

Consistent — the cleaning layer fixes it

Mental friction

High — switching context to type

Near zero — just talk

Cost

$12/month (WisprFlow)

✅

The Real Number

50–60 minutes saved per day. That's 6+ hours a week. In a month, that's an extra full workday back. For $0.

Prerequisites

Python 3.10+ installed
Free Groq account (groq.com) — takes 2 minutes
Free OpenRouter account for GPT OSS 120B and Llama 4 Scout
Linux (xdotool) or macOS (use pbpaste + Automator instead)

Install deps: pip install groq sounddevice numpy openai

Set env vars: GROQ_API_KEY and OPENROUTER_API_KEY in your .bashrc

Create the Python script (link below) — 80 lines total

Add a keyboard shortcut: Settings → Keyboard → Custom Shortcut → run the script

Test: hold shortcut, say something, release — see the structured prompt paste

💡

The Script

I'll share the full script in my next Threads post. Follow @utkarsh.gen to get it when it drops.

Build It This Weekend

3 hours. $0. 50 minutes saved daily.

Hour 1

Set up Groq account and test Whisper V3 Turbo with a sample clip.

⏱ 30 min📄 Working transcription

Hour 2

Build the recording script + keyboard shortcut. Test basic record → transcribe flow.

⏱ 60 min📄 Hands-free transcription

Hour 3

Add cleaning layer (GPT OSS 120B) + prompt conversion layer (Llama 4 Scout). Test end-to-end.

⏱ 60 min📄 Full voice-to-prompt system live

I build things like this every week.

Follow on Threads for real builds, free tools, and the messy parts nobody shows.

Follow @utkarsh.gen →