~500ms from shortcut to formatted text. Same loop, two backends.
Free forever. Plus 7 days of Pro on us, no card.
~500ms end-to-end · macOS 13+ · Apple Silicon & Intel · 2,000 words/week free
Cloud apps (Wispr Flow, Aqua Voice) format well. The cost: some screenshot your screen every few seconds for "context," and the local mode is either missing or noticeably worse than cloud.
Local apps (SuperWhisper, TalkFlowy) keep your audio on-device. The cost: raw transcripts you punctuate and reformat by hand. Local quality lags cloud.
Shoute resolves the tradeoff. Cloud mode streams audio to ElevenLabs for sub-second transcription. Local mode runs WhisperKit on-device, with the same formatting model. Both produce identical context-aware output. Neither one ever reads your screen.
Lots of apps do one of these. None do all three.
No screen-capture permission requested - we don't take screenshots. Cloud mode streams audio to ElevenLabs (Groq as fallback) and stores nothing. Local mode keeps audio on your Mac, period. You pick which one runs.
Most "local modes" are an obvious downgrade. Ours runs WhisperKit on Apple Silicon and tunes the formatting model for dictation, not generic chat. Same clean punctuation, same context-aware structure, same ~500ms loop as cloud. Free tier lets you A/B both.
Same words, different output. Casual in Slack. Greeting and sign-off in Mail. Checkboxes in Reminders. Paragraph in Notes. Shoute reads the active app's name through the macOS Accessibility API - no screen capture, no settings to toggle.
Same dictation style, four destinations, four formats. No setting toggled. No screen captured.
"hey can you push the standup to 3 today um something came up with the client"
Hey, can you push the standup to 3 today? Something came up with the client.
"hey sarah thanks for the proposal let's schedule a call this week to go over next steps does thursday afternoon work"
Hi Sarah,
Thanks for sending over the proposal. I'd like to schedule a call this week to discuss next steps. Does Thursday afternoon work for you?
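Best regards
"pick up dry cleaning get almond milk call the dentist about tuesday and order avi's birthday present"
☐ Pick up dry cleaning
☐ Get almond milk
☐ Call the dentist about Tuesday
☐ Order Avi's birthday present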
"the main issue with the current approach is that we're triggering the photo evaluation too early um users haven't uploaded enough photos yet so the results aren't meaningful"
The main issue with the current approach is that we're triggering the photo evaluation too early. Users haven't uploaded enough photos yet, so the results aren't meaningful.
Most apps add multilingual transcription, then only format well in English. Shoute's formatting works in every language it transcribes. Dictate in Tamil, get a properly formatted email in Mail. Spanish in Slack? Punctuated and casual. Or set the output to English and Shoute translates as it formats: speak any language, paste polished English (Pro, cloud-only).
"oye puedes mover la reunión a las tres de la tarde es que me surgió algo con el cliente"
Oye, ¿puedes mover la reunión a las 3 de la tarde? Me surgió algo con el cliente.
"vanakkam sir report ready aayiduchi naalaikku meeting la discuss pannalaam"
வணக்கம் Sir,
Report தயாராகிவிட்டது. நாளைக்கு meeting-ல் discuss பண்ணலாம்.
நன்றி
"das hauptproblem ist dass wir die auswertung zu früh starten ähm die nutzer haben noch nicht genug daten hochgeladen"
Das Hauptproblem ist, dass wir die Auswertung zu früh starten. Die Nutzer haben noch nicht genug Daten hochgeladen.
"sumimasen kyou no meeting san ji ni henkou dekimasuka chotto kyaku no ken de"
すみません、今日のミーティング3時に変更できますか?ちょっと客の件で。
"deployment 3 maniku finish aagum, after that we can start the demo"
Deployment will finish at 3. After that we can start the demo.
Multilingual support in most apps stops at raw transcription - the formatting intelligence is English-only. Shoute formats every language it transcribes. Checklist in Reminders, formal in Mail, casual in Slack, no matter which language you spoke it in. Need English out? Flip one toggle and Shoute translates while it formats (Pro, cloud-only).
Every voice app calls itself "privacy-first." Here's what theirs do vs. what ours does.
No app to switch to. No copy, no paste. Text just appears.
From any app, any text field. No window to bring forward, no field to focus.
Ramble. Use filler words. Change your mind mid-sentence. The formatter strips the "ums" and the false starts before you see anything.
Formatted for the app you were in: casual in Slack, structured in Mail, checkbox list in Reminders. Typically ~500ms from release to text on screen.
We respect every product on this list. Here's the honest read - including where they're still ahead of us.
| App | No Screenshots | Local Matches Cloud | Smart Formatting | Languages | Price |
|---|---|---|---|---|---|
| Shoute | ✓ Yes | ✓ Yes | Per-app context | 100+ | $5.83/mo |
| Wispr Flow | ✗ Takes screenshots | Cloud only | Context-aware | 100+ | $15/mo |
| Aqua Voice | Unknown | Cloud only | Prose polish | Multi | $8-10/mo |
| SuperWhisper | ✓ Yes | Local is worse | Basic | Multi | $249 lifetime |
| TalkFlowy | ✓ Yes | Local only | Raw transcript | 50+ | One-time |
| Sayline | ✓ Yes | Local only | Grammar only | Multi | One-time |
No credit card. No signup wall. Free tier is 2,000 words a week - enough to know within a day whether voice-to-text changes how you work.
You'll start with Shoute Pro free for 7 days
Download Free
No incentives, no scripts. Just what people told us after switching.
2,000 words a week, free. No credit card. No signup wall on the first session.