How On-Device Dictation Works

"On-device" is a phrase that gets used loosely. This page explains exactly what it means for Voibe β€” what runs where, what touches the network, and why the architecture is the way it is.

For the high-level overview of what this means for your data, read Privacy Overview first. This page is the technical companion.

🔄

The pipeline, end to end

When you press the hotkey and speak, here's what happens, and where each step runs.

1. You press the hotkey

Voibe opens the macOS audio input stream. Nothing was listening before this moment. Runs on your Mac.

2. Audio is buffered in memory

Samples are held in RAM as you speak. They are never written to a file on disk and never sent to the network. Runs on your Mac.

3. The transcription model runs

Voibe runs a speech recognition model on Apple Silicon's Neural Engine. The model weights ship with the app; they are not fetched from a server. The audio buffer is the only input. Runs on your Mac, on the Neural Engine.

4. Text is post-processed

Smart Formatting cleans up filler words, applies punctuation, expands Memory shortcuts, and resolves Developer Mode references, all locally. Runs on your Mac.

5. Text is inserted into your focused app

Voibe uses the macOS accessibility API to paste the transcript into the field you're typing in. The transcript also goes into your local history so you can retrieve it later. Runs on your Mac.

6. The audio buffer is dropped

Voibe releases the in-memory audio. There's no file to delete because no file was ever written. Runs on your Mac.

Every step is local. There is no point in this pipeline where audio, transcripts, or workspace context are sent to our servers or to anyone else's.
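The data flow above can be pictured with a small Python sketch. This is an illustration, not Voibe's source code: `AudioBuffer`, `dictate`, and the `transcribe` and `post_process` callables are hypothetical names, and the real app runs a speech model on the Neural Engine rather than a Python function. The sketch only shows the shape of the pipeline: samples live in memory, the buffer is the only input to transcription, and the buffer is cleared when the dictation ends.

```python
from typing import Callable, List


class AudioBuffer:
    """Hypothetical in-memory sample buffer. Nothing here opens a file or a socket."""

    def __init__(self) -> None:
        self._samples: List[float] = []

    def append(self, chunk: List[float]) -> None:
        # Step 2: samples accumulate in RAM while the user speaks.
        self._samples.extend(chunk)

    def samples(self) -> List[float]:
        return list(self._samples)

    def drop(self) -> None:
        # Step 6: release the audio. There is no file to delete.
        self._samples.clear()

    def __len__(self) -> int:
        return len(self._samples)


def dictate(
    chunks: List[List[float]],
    transcribe: Callable[[List[float]], str],
    post_process: Callable[[str], str],
    history: List[str],
) -> str:
    """Run one dictation: buffer, transcribe, post-process, record, drop."""
    buffer = AudioBuffer()
    try:
        for chunk in chunks:
            buffer.append(chunk)
        raw = transcribe(buffer.samples())  # Step 3: the buffer is the only input.
        text = post_process(raw)            # Step 4: local clean-up.
        history.append(text)                # Step 5: local history entry.
        return text
    finally:
        buffer.drop()                       # Step 6: runs even if a step fails.
```

Putting the drop in a `finally` block is a design choice in this sketch: the audio is released whether or not transcription succeeds.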


⚑

Why Apple Silicon's Neural Engine

Running a speech recognition model on a CPU is possible but slow. Running it on a GPU is faster but power-hungry. Apple Silicon (M1 and later) ships with a third option, the Neural Engine, purpose-built for the kind of dense matrix math that neural networks do.

The Neural Engine is what lets Voibe transcribe in real time without warming up your fans and without phoning home to a GPU farm. It's fast enough to feel instant, and efficient enough that you can use it on battery. That combination (speed, efficiency, and silicon purpose-built for this workload) is what makes on-device dictation practical.

For more on hardware support and why Intel Macs aren't supported, see System Requirements.


What "on-device" actually means

"On-device" is marketed loosely. It can mean any of these:

Flavor | What actually happens
Hybrid (some "on-device" tools) | A small model runs locally for wake-word detection. The actual transcription is sent to a server.
"On-device but logged" | Transcription happens locally, but the text is uploaded for "improving the model" or analytics.
Voibe | The audio capture, transcription, post-processing, and text insertion all happen on your Mac. Nothing about the dictation transits the network.

When we say "on-device," we mean the third row. That's why Voibe works with Wi-Fi off, and why the privacy guarantees in the Privacy Policy rest on the architecture rather than on promises alone.


🌐

What touches the network

Voibe is a paid Mac app, so a few things do go over the network. None of them involve your dictated content.

  • Licence checks. When you sign in or activate a Mac, Voibe contacts our servers to validate your licence.
  • Usage analytics. Counts of dictations and feature usage. This data is not used to identify you and never includes transcribed content.
  • Crash reports. When Voibe crashes, technical diagnostic data is sent to help us fix it. Audio, transcripts, and Memory entries are not included.
  • App updates. Voibe checks for new versions and downloads update binaries from our distribution endpoint.

The full list of subprocessors and what each one handles is on the Security page.
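To make "counts, not content" concrete, here is a hedged sketch of what a content-free usage record could look like. The class names, the field names (`dictation_count`, `feature_counts`), and the feature labels are assumptions made for illustration, not Voibe's actual telemetry schema. The point is structural: the recorder never receives a transcript, so there is no text it could send.

```python
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class UsageEvent:
    """Hypothetical analytics record: integers and feature labels only."""
    dictation_count: int = 0
    feature_counts: Dict[str, int] = field(default_factory=dict)


class UsageCounter:
    """Counts dictations and feature usage without ever seeing transcript text."""

    def __init__(self) -> None:
        self._dictations = 0
        self._features: Counter = Counter()

    def record_dictation(self, features_used: List[str]) -> None:
        # Only the fact that a dictation happened, and which features ran, is recorded.
        self._dictations += 1
        self._features.update(features_used)

    def snapshot(self) -> dict:
        # The payload that would be queued for upload in this sketch.
        return asdict(UsageEvent(self._dictations, dict(self._features)))
```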


✈️

Working offline

Once Voibe is installed and your licence is activated, dictation works without an internet connection. On a flight, in a basement, on a train through a tunnel: the Neural Engine doesn't need a network.

A few things still need the network when you eventually reconnect:

  • Periodic licence revalidation (Voibe gives you a generous offline window).
  • App updates and any usage analytics queued while offline.
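An offline window like the one mentioned in the list above usually comes down to one timestamp comparison. The sketch below is a generic illustration under stated assumptions: `within_offline_window` is a hypothetical function, and the window length is a parameter because this page does not state Voibe's actual value.

```python
from datetime import datetime, timedelta, timezone


def within_offline_window(
    last_validated: datetime,
    now: datetime,
    offline_window: timedelta,
) -> bool:
    """True while the last successful licence check is recent enough.

    `offline_window` is a placeholder parameter, not a documented Voibe setting.
    """
    return now - last_validated <= offline_window
```

A caller would store the time of the last successful licence check and compare it against the current time, for example `within_offline_window(last_check, datetime.now(timezone.utc), some_window)`.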

Dictation itself? Always local.


❓

Technical FAQ

Is the model based on Whisper?

Voibe uses a Whisper-family speech recognition model adapted to run efficiently on Apple Silicon's Neural Engine. The model weights ship with the app.

Where do the model weights live?

Inside the Voibe app bundle on your Mac. They are downloaded once when you install the app and updated as part of normal app updates. They are not streamed.

Can I see what Voibe is doing on my Mac?

Yes. Tools like Little Snitch, LuLu, or macOS's Activity Monitor will show you what network traffic Voibe makes. You'll see licence checks, telemetry, and updates, never an audio stream.
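If you'd rather script the check than watch a GUI, here is a minimal Python sketch built on the standard `lsof -i -n -P` command, which lists open network connections. It assumes the process appears under the command name "Voibe" (check Activity Monitor for the real name on your Mac); `remote_endpoints` and `current_endpoints` are hypothetical helper names.

```python
import subprocess
from typing import List


def remote_endpoints(lsof_output: str, command: str) -> List[str]:
    """Return the remote host:port of each connection owned by `command`."""
    endpoints = []
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # lsof columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        if len(fields) < 9 or fields[0] != command:
            continue
        name = fields[8]
        if "->" in name:  # "local->remote" marks a connected socket
            endpoints.append(name.split("->", 1)[1])
    return endpoints


def current_endpoints(command: str = "Voibe") -> List[str]:
    # Assumption: the process name is "Voibe". Adjust if yours differs.
    result = subprocess.run(
        ["lsof", "-i", "-n", "-P"],
        capture_output=True,
        text=True,
        check=False,
    )
    return remote_endpoints(result.stdout, command)
```

Run `current_endpoints()` while dictating and compare the result with what you see when the app is idle.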

Does enabling Developer Mode change this?

No. Developer Mode adds a local scan of file and folder names in your active editor workspace so Voibe can resolve identifiers when you say them. The scan results stay on your Mac.
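As a rough picture of what a names-only scan can do, the sketch below walks a workspace, indexes file and folder names by a normalised spoken form, and looks up a dictated phrase. It is an assumption-heavy illustration: `spoken_key`, `build_index`, and `resolve` are hypothetical, and Voibe's actual matching logic is not described on this page. Note that the sketch reads names only and never opens a file.

```python
import os
import re
from typing import Dict


def spoken_key(name: str) -> str:
    """Normalise a name or phrase: split camelCase, drop punctuation, lowercase."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    words = re.findall(r"[A-Za-z0-9]+", spaced)
    return " ".join(word.lower() for word in words)


def build_index(workspace: str) -> Dict[str, str]:
    """Map spoken forms to real file and folder names found under `workspace`."""
    index: Dict[str, str] = {}
    for _root, dirs, files in os.walk(workspace):
        for name in dirs + files:
            # Index the full name and the name without its extension.
            index.setdefault(spoken_key(name), name)
            stem = os.path.splitext(name)[0]
            index.setdefault(spoken_key(stem), name)
    return index


def resolve(phrase: str, index: Dict[str, str]) -> str:
    """Return the matching identifier, or the phrase unchanged if nothing matches."""
    return index.get(spoken_key(phrase), phrase)
```

In this sketch, saying "user service" in a workspace that contains `user_service.py` resolves to that file name, and a phrase with no match is passed through untouched.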

Does Voibe use any cloud AI for post-processing?

No. Smart Formatting, Memory expansion, and Developer Mode resolution all run locally. No part of the dictation pipeline calls a cloud LLM.
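Nothing in this kind of post-processing needs a server. The following sketch shows a purely local pass that strips filler words, expands shortcuts, and adds basic punctuation. Everything in it is hypothetical: the filler list, the idea that a Memory shortcut maps a phrase to replacement text, and the function names are assumptions for illustration, and real Smart Formatting is certainly more sophisticated.

```python
import re
from typing import Dict

# Illustrative filler list, not Voibe's.
FILLERS = ("um", "uh", "er")


def remove_fillers(text: str) -> str:
    """Delete standalone filler words plus a trailing comma and whitespace."""
    pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE).strip()


def expand_memory(text: str, memory: Dict[str, str]) -> str:
    """Replace each shortcut phrase with its stored expansion."""
    for shortcut, expansion in memory.items():
        pattern = r"\b" + re.escape(shortcut) + r"\b"
        # A function replacement keeps backslashes in the expansion literal.
        text = re.sub(pattern, lambda _m: expansion, text, flags=re.IGNORECASE)
    return text


def punctuate(text: str) -> str:
    """Capitalise the first letter and make sure the text ends with punctuation."""
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".?!" else text + "."


def smart_format(raw: str, memory: Dict[str, str]) -> str:
    return punctuate(expand_memory(remove_fillers(raw), memory))
```

For example, `smart_format("um so uh send it to my work email", {"my work email": "jane@example.com"})` returns `"So send it to jane@example.com."`.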

Still stuck? Email hi@getvoibe.com and we'll help you out.