Now available for macOS

Yes, another voice-to-text app.
Here's why this one exists.

~500ms from shortcut to formatted text. Same loop, two backends:

  • Local mode (WhisperKit). Same formatting quality as cloud.
  • Cloud mode (ElevenLabs streaming, Groq as fallback). Audio not stored, not used for training.

Either way:

  • No screenshots. The active app is detected via macOS APIs.
  • 100+ languages in cloud, 14+ in local. Formatting works in all of them, with either
    • Native transcription (cloud or local), or
    • Translation to English (cloud only, Pro)

One shortcut. Talk. Done. Feel the difference instantly.
↓ Download for Mac ↓ Download for Windows

Free forever. Plus 7 days of Pro on us, no card.

“A five-minute chore now takes about ten seconds.” - Eric

~500ms end-to-end · macOS 13+ · Apple Silicon & Intel · 2,000 words/week free

🛡 No screenshots. Ever.
💻 Local matches cloud quality
🌍 100+ languages, all formatted
~500ms end-to-end

Every app makes you choose

"Great formatting OR privacy. Pick one."

Cloud apps (Wispr Flow, Aqua Voice) format well. The cost: some capture your screen every few seconds for "context," and local mode is either missing or noticeably worse than cloud.

Local apps (SuperWhisper, TalkFlowy) keep your audio on-device. The cost: raw transcripts you punctuate and reformat by hand. Local quality lags cloud.

Shoute resolves the tradeoff. Cloud mode streams audio to ElevenLabs for sub-second transcription. Local mode runs WhisperKit on-device, with the same formatting model. Both produce identical context-aware output. Neither one ever reads your screen.

Three things no competitor does together

Lots of apps do one of these. None do all three.

🛡

Privacy without compromise

No screen-capture permission requested - we don't take screenshots. Cloud mode streams audio to ElevenLabs (Groq as fallback) and stores nothing. Local mode keeps audio on your Mac, period. You pick which one runs.

Local model. Cloud-grade output.

Most "local modes" are an obvious downgrade. Ours runs WhisperKit on Apple Silicon and tunes the formatting model for dictation, not generic chat. Same clean punctuation, same context-aware structure, same ~500ms loop as cloud. Free tier lets you A/B both.

🎯

Formats to the app you're in

Same words, different output. Casual in Slack. Greeting and sign-off in Mail. Checkboxes in Reminders. Paragraph in Notes. Shoute reads the active app's name through the macOS Accessibility API - no screen capture, no settings to toggle.

Same voice. Different format.

Same dictation style, four destinations, four formats. No setting toggled. No screen captured.

💬 Slack Casual
You said

"hey can you push the standup to 3 today um something came up with the client"

Shoute output

Hey, can you push the standup to 3 today? Something came up with the client.

Mail Formal
You said

"hey sarah thanks for the proposal let's schedule a call this week to go over next steps does thursday afternoon work"

Shoute output

Hi Sarah,

Thanks for sending over the proposal. I'd like to schedule a call this week to discuss next steps. Does Thursday afternoon work for you?

Best regards

Reminders Checklist
You said

"pick up dry cleaning get almond milk call the dentist about tuesday and order avi's birthday present"

Shoute output

Pick up dry cleaning
Get almond milk
Call the dentist about Tuesday
Order Avi's birthday present

📝 Notes / Docs Paragraph
You said

"the main issue with the current approach is that we're triggering the photo evaluation too early um users haven't uploaded enough photos yet so the results aren't meaningful"

Shoute output

The main issue with the current approach is that we're triggering the photo evaluation too early. Users haven't uploaded enough photos yet, so the results aren't meaningful.

Speak your language. Get formatted text.

Most apps add multilingual transcription, then only format well in English. Shoute's formatting works in every language it transcribes. Dictate in Tamil, get a properly structured email in Mail. Spanish in Slack? Punctuated and casual. Or set the output to English and Shoute translates as it formats: speak any supported language, paste polished English (Pro, cloud-only).

🇺🇸 English
🇨🇳 Chinese
🇮🇳 Hindi
🇪🇸 Spanish
🇸🇦 Arabic
🇫🇷 French
🇵🇹 Portuguese
🇷🇺 Russian
🇯🇵 Japanese
🇩🇪 German
🇰🇷 Korean
🇮🇳 Tamil
+ 88 more, auto-detected
🇪🇸 Spanish Slack
You said

"oye puedes mover la reunión a las tres de la tarde es que me surgió algo con el cliente"

Shoute output

Oye, ¿puedes mover la reunión a las 3 de la tarde? Me surgió algo con el cliente.

🇮🇳 Tamil Mail
You said

"vanakkam sir report ready aayiduchi naalaikku meeting la discuss pannalaam"

Shoute output

வணக்கம் Sir,

Report தயாராகிவிட்டது. நாளைக்கு meeting-ல் discuss பண்ணலாம்.

நன்றி

🇩🇪 German Notes
You said

"das hauptproblem ist dass wir die auswertung zu früh starten ähm die nutzer haben noch nicht genug daten hochgeladen"

Shoute output

Das Hauptproblem ist, dass wir die Auswertung zu früh starten. Die Nutzer haben noch nicht genug Daten hochgeladen.

🇯🇵 Japanese Slack
You said

"sumimasen kyou no meeting san ji ni henkou dekimasuka chotto kyaku no ken de"

Shoute output

すみません、今日のミーティング3時に変更できますか?ちょっと客の件で。

🇮🇳 Tamil → English Slack · Pro
You said

"deployment 3 maniku finish aagum, after that we can start the demo"

Shoute output (translated)

Deployment will finish at 3. After that we can start the demo.

Multilingual support in most apps stops at raw transcription - the formatting intelligence is English-only. Shoute formats every language it transcribes. Checklist in Reminders, formal in Mail, casual in Slack, no matter which language you spoke it in. Need English out? Flip one toggle and Shoute translates while it formats (Pro, cloud-only).

What "privacy-first" actually looks like

Every voice app calls itself "privacy-first." Here's what most of them do vs. what Shoute does.

How most voice apps work

Screenshots, no real local option

  • Vague audio retention policies; some train models on your voice data
  • Screen captured every few seconds for "context awareness" (Wispr Flow does this)
  • Local mode exists but ships an obviously worse formatter
  • Transcription content fed into product analytics
  • "Your data may improve our models" - opted in by default

How Shoute works

Private by architecture

  • Two modes, your pick. Cloud streams audio to ElevenLabs (Groq as fallback). Local runs WhisperKit on-device, zero network calls.
  • No screenshots. Ever. Active app name comes from the macOS Accessibility API, not pixels on your screen.
  • Local output matches cloud - same formatting model, same ~500ms loop. The free tier lets you compare both.
  • Cloud audio is never stored, never logged, never used for training. The transcription provider sees the stream once and discards it.
  • Forward Alpha is a two-person indie studio. No VC, no investor pressure to harvest data.

Three steps. One shortcut.

No app to switch to. No copy, no paste. Text just appears.

1

Press one shortcut

From any app, any text field. No window to bring forward, no field to focus.

⌥ Option + Space

2

Speak naturally

Ramble. Use filler words. Change your mind mid-sentence. The formatter strips the "ums" and the false starts before you see anything.

3

Text appears at your cursor

Formatted for the app you were in: casual in Slack, structured in Mail, checkbox list in Reminders. Typically ~500ms from release to text on screen.

How Shoute stacks up - no spin

We respect every product on this list. Here's the honest read - including where they're still ahead of us.

App          | No Screenshots      | Local = Cloud  | Smart Format    | Languages | Price
Shoute       | ✓ Yes               | ✓ Yes          | Per-app context | 100+      | $5.83/mo
Wispr Flow   | ✗ Takes screenshots | Cloud only     | Context-aware   | 100+      | $15/mo
Aqua Voice   | Unknown             | Cloud only     | Prose polish    | Multi     | $8-10/mo
SuperWhisper | ✓ Yes               | Local is worse | Basic           | Multi     | $249 lifetime
TalkFlowy    | ✓ Yes               | Local only     | Raw transcript  | 50+       | One-time
Sayline      | ✓ Yes               | Local only     | Grammar only    | Multi     | One-time

Start free. Upgrade when you're hooked.

No credit card. No signup wall. Free tier is 2,000 words a week - enough to know within a day whether voice-to-text changes how you work.

Free
Get started, no card required
$0
Free forever
  • Cloud-powered transcription
  • AI smart formatting
  • Works in every app
  • Audio never stored or used for training
  • 2,000 words / week
  • 1 device

You'll start with Shoute Pro free for 7 days

Download Free

Local
100% offline, pay once
$49.99
One-time purchase, yours forever
  • On-device transcription only
  • Nothing leaves your computer
  • Works fully offline
  • Apple Silicon optimized
  • All future updates
  • 2 devices

Questions you're probably asking

Is the local model really as good as cloud?
For dictation formatting on Apple Silicon, yes. We run WhisperKit for transcription and a formatting model tuned for dictation - not a general-purpose LLM crammed into a small footprint. Output and ~500ms latency match cloud. Don't take our word for it - the free tier lets you A/B both modes.
What do you mean "no screenshots"? Why would a voice app take screenshots?
Some voice-to-text apps capture your screen periodically to understand what you're working on - this is how they provide "context-aware" formatting. Shoute takes a different approach: we detect the frontmost app name (e.g., "Mail" or "Slack") through the macOS Accessibility API. Same formatting intelligence, zero screen capture.
How does context-aware formatting work?
When you trigger Shoute, it checks which app is active. Slack? Output is casual - relaxed greeting, no sign-off. Mail? Proper email structure with greeting and closing. Reminders? Checklist format. Notes? Clean paragraphs. The AI formatting model adjusts its output based on where your text will land.
What languages are supported?
100+ languages in cloud mode (14+ in local mode), and the formatting intelligence works across all of them. You can dictate in Tamil, Spanish, German, Japanese, Hindi, or Arabic and get properly formatted output - not just raw transcription. You can even switch languages mid-conversation.
What happens if you shut down? Does the app stop working?
The Local plan runs entirely on-device, so it keeps working regardless. Cloud features need our servers - the local option exists precisely so you're never locked in. We're Forward Alpha, a two-person studio that uses Shoute every day; this isn't a launch-and-pivot play.
I can just use Apple's built-in Dictation. Why pay?
Apple Dictation times out after 60 seconds, doesn't structure anything, can't tell the difference between a Slack message and an email, and gives you one continuous block of text with, at most, basic punctuation. Try dictating a grocery list - you'll get a single run-on sentence. Shoute gives you a checklist. That's the gap.
Who's behind this?
Forward Alpha - a two-person indie studio. We build tools we want to use ourselves. No VC funding, no investor pressure to harvest your data, no growth-at-all-costs playbook. Just a product we're proud of and use every single day.

Quiet wins, in their words

No incentives, no scripts. Just what people told us after switching.

Try it free. The difference is obvious on the first dictation.

2,000 words a week, free. No credit card. No signup wall on the first session.

Mac

Free · ~41 MB

macOS 13 Ventura or later
Universal (Apple Silicon & Intel)

Download for Mac

Windows

Free · ~110 MB

Windows 10 & 11
64-bit (x64)

Download for Windows