v0.1.0 · Open source · MIT · Linux · Windows · macOS soon

Local Whisper. Cloud models. One interface.

A local-first desktop app for dictation. Switch between free local models and paid cloud providers in one click. Your transcribed text is typed straight at your cursor, in any app.

Download free · MIT View on GitHub ★ star it

Screenshot: Smart Voice Flow main window with live streaming transcription and partial results. Red indicator = recording.

Free and private by default

Run faster-whisper on your own machine. No audio leaves your computer. Eight model sizes from tiny (40MB) to large-v3 (3GB). Works offline.

No account. No cloud. No leak.
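
Curious what local transcription looks like under the hood? Here is a minimal sketch that calls the faster-whisper library directly. It is not Smart Voice Flow's own code: the `int8` compute type, the `tiny` default, and the helper names `join_segments` and `transcribe_locally` are assumptions for illustration.

```python
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the .text of each segment into one trimmed string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe_locally(path: str, model_size: str = "tiny") -> str:
    """Transcribe an audio file on this machine with faster-whisper.

    Requires `pip install faster-whisper`. Model weights are downloaded on
    first use; after that the call needs no network connection.
    """
    # Imported here so join_segments stays usable without the package.
    from faster_whisper import WhisperModel

    # int8 on CPU is an assumption, chosen to keep memory use low.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, info = model.transcribe(path, beam_size=5)
    print(f"Detected language: {info.language}")
    return join_segments(segments)
```

With the package installed, `transcribe_locally("meeting.wav")` would return the text of a (placeholder) file named `meeting.wav`. Larger model sizes trade speed for accuracy.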

Cloud accuracy on demand

OpenAI gpt-4o-transcribe, Deepgram Nova-3, AssemblyAI Universal, OpenRouter, Speaches. Bring your own API key. Switch in one click.

Local for drafts. Cloud for final takes.

A profile for every context

German podcast, English code comments, mixed-language meeting notes — each profile has its own model, post-processing, and hotkeys. Switch on the fly.

One app. Every use case.

Know what you spend

Every cloud call is logged with tokens, duration, and cost. Export to CSV. Set per-profile budgets. Never get surprised by a bill.

Receipts for every second.
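
Once the log is exported, a few lines of Python are enough to check spend against a budget. The sketch below is illustrative only: the column names `profile`, `duration_s`, and `cost_eur`, the sample rows, and the budget figure are assumptions, and the real export layout may differ.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def spend_by_profile(csv_text: str) -> dict[str, Decimal]:
    """Sum the cost column per profile from an exported usage CSV."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Decimal avoids float rounding drift when adding many small amounts.
        totals[row["profile"]] += Decimal(row["cost_eur"])
    return dict(totals)


def over_budget(totals: dict[str, Decimal], budgets: dict[str, Decimal]) -> list[str]:
    """Return the profiles whose total spend exceeds their budget."""
    return sorted(
        name for name, spent in totals.items()
        if name in budgets and spent > budgets[name]
    )


# Hypothetical export contents, made up for this example.
SAMPLE = """profile,duration_s,cost_eur
podcast-final,120,0.012
meeting-notes,300,0.030
podcast-final,600,0.060
"""

totals = spend_by_profile(SAMPLE)
print(totals)
print(over_budget(totals, {"podcast-final": Decimal("0.05")}))
```

Profiles without an entry in `budgets` are simply skipped, so local-only profiles never show up as over budget.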

Pick a profile

Choose your language, model, and post-processing in one click. Presets for podcasting, code comments, email, and notes ship by default.

Hold your hotkey and speak

A floating indicator shows it's listening. Pause, think, keep going. Live partial results appear so you can course-correct mid-sentence.

Release. Text lands at your cursor.

Transcription is typed directly into whatever has focus — VS Code, Notion, your CMS, a terminal, an email draft. No copy-paste.

Live dictation with VAD
Profile library
Per-profile detail
Profile basics
ASR engine settings
Post-processing
Hotkey binding
Advanced controls
Local Whisper models
Cloud providers
API keys
Cost tracking
Usage stats
General settings
Text injection
Built-in HTTP API
Advanced
System doctor
Error reporter (with secret-scrubbing)
Getting-started help
API access docs

/ for developers

Use Smart Voice Flow from anywhere.

A local HTTP API so other apps can tap into your configured models. One endpoint, any ASR engine, unified response shape.

Localhost-only by default — no public surface
Bring-your-own-key respected per profile
Streaming + non-streaming response modes
CLI companion for scripts and pipelines

Full API reference in the docs

$ curl -X POST http://localhost:8123/transcribe \
    -F "audio=@meeting.wav" \
    -F "profile=meeting-notes"

> {
    "text": "Kickoff on Monday, agenda attached...",
    "profile": "meeting-notes",
    "model": "faster-whisper/large-v3",
    "duration_ms": 2480,
    "cost_eur": 0.00
  }
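
The same request can be made from a script. Below is a minimal sketch using only the Python standard library. The URL and the `audio` and `profile` form fields are taken from the curl example above; the helper names `build_multipart` and `transcribe` and the 60-second timeout are assumptions, not part of the documented API.

```python
import json
import os
import urllib.request
import uuid

API_URL = "http://localhost:8123/transcribe"  # endpoint from the curl example


def build_multipart(fields: dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, profile: str, url: str = API_URL) -> dict:
    """POST an audio file to the local API and return the parsed JSON reply."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"profile": profile}, "audio", os.path.basename(path), audio
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

With the app running, `transcribe("meeting.wav", "meeting-notes")["text"]` should give the transcript, assuming the reply has the shape shown in the example response.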

"I dictate podcast scripts in German with local Whisper, then switch to the cloud profile for final polish. Twenty minutes saved per episode, every episode."

M. K., podcaster · Berlin

"Replaced three separate SDKs with one local API. Cost tracking alone earned it a permanent spot in my dotfiles."

T. S., backend engineer

"Teaching online means switching languages mid-session. Profiles make that trivial. Also: actually private. No upload."

A. H., language tutor

"The floating indicator is what sold me. No guessing whether it's listening. And the open-source license means I can ship it to my team without procurement drama."

J. L., product designer

Used Smart Voice Flow? Send a short note — honest feedback is how this project grows.

Send a review

Linux

.deb: Debian · Ubuntu
.AppImage: any distro · portable

X11 and Wayland. Python 3.11+ recommended.

Windows

.exe installer: Windows 10 · 11

Signed binary. UAC-aware installer.

macOS

Coming soon: Apple Silicon + Intel

On macOS? Help us test so we can ship it sooner.

Or build from source. Open source · MIT · free forever

Is Smart Voice Flow free?

Yes. Local Whisper is completely free. Cloud models use your own API keys and you pay the provider directly.

What operating systems are supported?

Linux (X11 and Wayland) and Windows at launch. macOS support is in progress.

Does my audio leave my computer?

With local Whisper: no. Audio stays on your machine. With cloud models: yes — audio is sent to the provider you chose (OpenAI, Deepgram, etc.) under their terms.

Which cloud providers are supported?

OpenAI (gpt-4o-transcribe), Deepgram (Nova-3), AssemblyAI (Universal), OpenRouter, and self-hosted Speaches.

How accurate is local Whisper compared to cloud?

Running large-v3 locally comes very close to cloud accuracy, especially on clean audio. For noisy or niche-vocabulary audio, cloud models still win on average. You can A/B test both in the app.

Does it work offline?

Local models work fully offline. Cloud calls need a connection.

Can I use it for live captioning?

Yes — streaming transcription with partial results is supported. Pair it with OBS or any window capture.

Is there an API?

Yes. Smart Voice Flow runs a local HTTP server you can call from other apps. See the API reference.

Is it really open source?

Yes. MIT license. All code on GitHub. Contributions welcome.

Where do I report a bug?

GitHub Issues. Template provided.

Four pillars. No compromise.

Three steps. Zero friction.

Every screen. Every setting.

Built for real workflows.

Install in under a minute.

Things people ask.