Best Local Whisper Model for Superwhisper (2026): Tiny vs Base vs Small vs Medium vs Large-v3
How to pick the best local Whisper model in Superwhisper in 2026. Tiny, base, small, medium, large-v3, and large-v3-turbo compared on speed, accuracy, RAM, and best-fit use cases for Mac.
For most Superwhisper users on an Apple Silicon Mac in 2026, the best local Whisper model is large-v3-turbo. It delivers accuracy within a few percentage points of the full large-v3 model at roughly four times the speed on comparable hardware, and it fits comfortably in 8 GB of unified memory. Use small on Intel Macs or when battery life matters. Use the full large-v3 only when you need the absolute highest accuracy on multilingual or noisy audio and can tolerate the slower transcription time.
This guide walks through how to pick the right Superwhisper local Whisper model for your hardware and use case. Every recommendation is grounded in published Whisper benchmarks and the model specs Superwhisper exposes in its settings. If you want a dictation app that picks the right model automatically without this setup work, Voibe selects the optimal on-device Whisper model for your Mac and costs $99 lifetime — 60% less than Superwhisper's $249.99 lifetime.
Key Takeaways: Which Superwhisper Local Model Fits You
| If your Mac is… | If your use case is… | Pick this model | Expected speed |
|---|---|---|---|
| Apple Silicon (M1–M4), 8 GB+ | Everyday dictation in English | large-v3-turbo | ~4x real-time |
| Apple Silicon (M1–M4), 8 GB+ | Multilingual or noisy audio | large-v3 (full) | ~1x real-time |
| Apple Silicon, 8 GB | Battery-sensitive / low-latency | small | ~6x real-time |
| Intel Mac | General dictation | small | ~2-4x real-time |
| Apple Silicon | Quick notes & short messages | base | ~16x real-time |
| Any Mac | Trial / free tier | tiny | ~32x real-time |
Disclosure: Voibe is our product and ships with automatic Whisper model selection. This guide covers Superwhisper's model choices fairly and grounds every speed claim in published Whisper benchmarks.
Key Takeaway
For most Apple Silicon users on Superwhisper in 2026, large-v3-turbo is the sweet spot: near-large accuracy at ~4x real-time speed. Step down to small on Intel Macs or for battery-sensitive work.
The Whisper Models Available in Superwhisper
Superwhisper exposes every public OpenAI Whisper model in its settings, letting you pick per-mode which model runs for each workflow. The table below lists all options with their parameter counts, disk sizes, and expected speed on Apple Silicon.
| Model | Parameters | Disk Size | Speed on M1 | Free tier? | Best for |
|---|---|---|---|---|---|
| tiny | 39 million | ~75 MB | ~32x real-time | ✅ Yes | Low-power devices, quick drafts |
| base | 74 million | ~142 MB | ~16x real-time | ✅ Yes | Casual dictation, short messages |
| small | 244 million | ~461 MB | ~6x real-time | ❌ Pro only | Daily use — accuracy/speed balance |
| medium | 769 million | ~1.5 GB | ~2x real-time | ❌ Pro only | Professional dictation |
| large-v3 | 1.55 billion | ~2.9 GB | ~1x real-time | ❌ Pro only | Maximum accuracy, multilingual |
| large-v3-turbo | 809 million | ~1.6 GB | ~4x real-time | ❌ Pro only | Near-large accuracy, optimized speed |
Two important notes:
- Superwhisper's free tier limits you to the two smallest local models (tiny and base). To unlock small, medium, large-v3, and large-v3-turbo, you need Superwhisper Pro ($8.49/month, $84.99/year, or $249.99 lifetime). See our Superwhisper pricing guide for the full tier breakdown.
- Real-time multiplier refers to transcription speed relative to audio duration. For example, "4x real-time" means 60 seconds of audio transcribes in roughly 15 seconds. Speeds are approximate and scale with chip generation — M3 and M4 are faster than M1 across every model.
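The arithmetic behind the real-time multiplier is simple enough to sketch in a few lines of Python. The function name is ours, and the figures in the comment come from the approximate table above:

```python
def transcription_seconds(audio_seconds: float, rt_multiplier: float) -> float:
    """Wall-clock transcription time implied by a real-time multiplier.

    '4x real-time' means audio transcribes in a quarter of its duration.
    Multipliers are approximate and vary by chip generation.
    """
    return audio_seconds / rt_multiplier

# 60 s of audio at 4x real-time -> ~15 s; at 1x -> the full 60 s.
```

So a one-minute voice memo takes about 15 seconds on large-v3-turbo (~4x) but the full minute on large-v3 (~1x) on the same M1.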
For the technical background on how Whisper works under the hood, see our How Whisper Works guide.
Step 1: Check Your Mac's RAM and Chip Generation
The first constraint on Whisper model choice is your Mac's unified memory and chip generation. Apple Silicon shares memory between CPU, GPU, and Neural Engine, which means the Whisper model plus the rest of your running apps compete for the same pool.
- Open the Apple menu at the top-left of the screen.
- Click 'About This Mac'. The panel shows your chip (for example, 'Apple M2 Pro'), memory (for example, '16 GB'), and macOS version.
- Identify your chip family: M1, M1 Pro/Max/Ultra, M2, M2 Pro/Max/Ultra, M3, M3 Pro/Max/Ultra, or M4. All Apple Silicon chips run every Whisper model; newer chips are faster.
- Note your unified memory: 8 GB, 16 GB, 24 GB, 32 GB, 64 GB, or higher.
Use the guidance below to map memory to model size:
- 8 GB unified memory: large-v3-turbo (1.6 GB model) is the largest you should comfortably run alongside a browser, Slack, and your IDE. large-v3 (2.9 GB) works but can cause memory pressure.
- 16 GB unified memory: Any model, including full large-v3. Most users choose large-v3-turbo for the speed benefit with minimal accuracy trade-off.
- 24 GB or more: No memory constraints. Pick the model that matches your accuracy needs, not your hardware limit.
- Intel Mac (any RAM): Whisper models run but without Neural Engine acceleration. Stick to small or base unless you are willing to wait longer for transcription.
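The mapping above can be encoded as a small decision function. This is just this guide's recommendations written as Python for clarity; the function name and thresholds are ours, and Superwhisper itself runs nothing like this:

```python
def pick_model(memory_gb: int, *, intel: bool = False,
               multilingual: bool = False) -> str:
    """Advisory memory-to-model mapping, mirroring the list above."""
    if intel:
        return "small"           # no Neural Engine; larger models crawl
    if multilingual and memory_gb >= 16:
        return "large-v3"        # full model for non-English / code-switching
    return "large-v3-turbo"      # default sweet spot on Apple Silicon
```

For example, `pick_model(8)` returns `"large-v3-turbo"`, while `pick_model(16, multilingual=True)` steps up to the full `"large-v3"`.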
Tip
If you are running large-v3 and notice the beachball during transcription, switch to large-v3-turbo. It uses roughly half the memory and is several times faster with minimal accuracy loss on English audio.
Step 2: Identify Your Primary Use Case
Hardware sets the ceiling; use case sets the floor. Match your typical dictation pattern to the smallest model that handles it well — larger is not always better if you are dictating a 5-word Slack message and waiting 2 seconds for it to appear.
- Short messages and notes (under 30 seconds, general English): base or small. Speed matters more than the last few percentage points of accuracy for short content.
- Long-form dictation (emails, prose, articles in English): large-v3-turbo. The accuracy gain over small on multi-sentence output is noticeable, and the speed is still good enough for interactive use.
- Code comments and technical identifiers: large-v3 or large-v3-turbo. Technical vocabulary benefits disproportionately from the larger model's broader training distribution.
- Multilingual dictation (non-English or code-switching): large-v3. Large-v3 was trained on more multilingual data than any other Whisper release and handles mixed-language input better than any smaller model.
- Medical, legal, or technical dictation with domain terminology: large-v3. Pair with Superwhisper's custom vocabulary for your domain terms.
- Noisy audio (cafes, open offices, in-transit): large-v3. Smaller models degrade faster on noisy input than the large model, which was explicitly trained on noisier samples.
Step 3: Balance Accuracy Against Latency for Your Workflow
Whisper's reported word error rate (WER) numbers give a useful accuracy baseline, but latency matters as much as accuracy for interactive dictation. Published Whisper benchmarks show large-v3 at approximately 2.7% WER on clean English audio and 7.88% WER on mixed real-world recordings. Smaller models trade accuracy for speed — base typically runs several percentage points higher in WER than small, and small runs several points higher than medium.
- Measure how long you wait after releasing the dictation hotkey. If it takes longer than 2 seconds for text to appear on a short phrase, the model you chose is probably too large for your hardware.
- Count the corrections you make per paragraph. If you are fixing more than one transcription error every 3-4 sentences on general English, you are probably running a model that is too small.
- Compare output from two model sizes on the same recording. Superwhisper lets you set different modes with different models — create a 'test-large' mode and a 'test-small' mode, dictate the same 60-second passage into both, and compare.
- Check accuracy on your specific vocabulary. Generic WER numbers don't capture how well a model handles your industry's terminology. Test with a realistic sample from your actual work.
- Decide based on your tolerance for corrections. For throwaway messages, base or small is usually fine. For published writing or professional deliverables, large-v3-turbo or large-v3 is worth the latency cost.
The practical rule: pick the smallest model that hits your accuracy floor, not the largest your Mac can run. Bigger models use more RAM, drain battery faster, and warm up your Mac more — costs that compound across an entire work day.
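To make the accuracy floor concrete, here is a back-of-envelope estimate of how WER translates into manual fixes. The 150 wpm speaking rate is an assumption, and the WER figures in the comments are the illustrative published numbers cited above:

```python
def corrections_per_minute(wer: float, words_per_minute: int = 150) -> float:
    """Rough word-level errors to fix per minute of speech at a given WER.

    Assumes every word error needs a manual correction; treat the result
    as an order-of-magnitude estimate, not a measurement.
    """
    return wer * words_per_minute

# At ~2.7% WER (clean English, large-v3) and 150 wpm: ~4 errors/minute.
# At ~8% WER (mixed real-world audio): ~12 errors/minute.
```

Even a few percentage points of WER compound quickly over a long dictation session, which is why the accuracy floor matters more for long-form writing than for throwaway messages.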
Step 4: Set Up Per-Mode Model Overrides in Superwhisper
Superwhisper's mode system is its strongest feature for model selection — you can assign different Whisper models to different apps or workflows. Take advantage of this instead of picking one global model.
- Open Superwhisper Settings from the menu bar icon.
- Navigate to Modes in the sidebar.
- Create or select a mode for a specific app or workflow (e.g., 'Slack', 'Email', 'Code Comments', 'Medical Notes').
- Set the Whisper model for that mode independently. For example: tiny for Slack (speed matters, short messages), large-v3-turbo for email (accuracy matters, longer content), large-v3 for medical notes (highest accuracy on specialized vocabulary).
- Assign the mode to the target app. Superwhisper will activate that mode automatically when you dictate into that app.
- Save and test each mode with a representative sample from your real work.
A reasonable per-mode setup for an Apple Silicon user with 16 GB of RAM:
| Mode | Target apps | Model | Why this model |
|---|---|---|---|
| Quick messages | Slack, iMessage, Discord | base | Short content; speed beats accuracy |
| Email & writing | Mail, Gmail, Docs, Notion | large-v3-turbo | Longer content; accuracy worth the wait |
| Code comments | VS Code, Cursor, Xcode | large-v3 | Technical identifiers benefit from full model |
| Meeting notes | Bear, Obsidian, Notes | large-v3-turbo | Multi-speaker content needs higher accuracy |
| Multilingual | Any app, non-English dictation | large-v3 | Best multilingual accuracy |
This multi-mode setup is Superwhisper's main differentiator versus simpler tools. If you would rather not manage this complexity, Voibe picks the right model automatically based on your Mac and discards audio after transcription, all for $99 lifetime.
Tips for Better Results With Superwhisper Local Models
- Start with large-v3-turbo as your default. If you do not want to think about model selection, large-v3-turbo is the best single choice for most Apple Silicon users. Adjust from there if you hit specific constraints.
- Keep your macOS up to date. Core ML and Accelerate framework improvements in new macOS releases improve Whisper performance across all model sizes without any change to Superwhisper itself.
- Close memory-heavy apps before dictating long passages. Chrome with many tabs, virtualization software, and video editors all compete for unified memory. Closing them improves transcription consistency on 8 GB Macs.
- Use a wired microphone or AirPods Pro over the built-in mic. Input quality has more effect on accuracy than the last model size jump for most users.
- Add domain terms to Superwhisper's custom vocabulary. Medical terms, legal citations, and programming identifiers that Whisper misses can be taught. This is often more impactful than switching to a larger model.
- Turn off cloud LLM post-processing for speed-sensitive modes. If you are using Superwhisper's BYOK cloud rewriting, the round-trip to OpenAI or Anthropic adds latency on top of the local Whisper time. Keep cloud post-processing for long-form writing modes only.
- Do not assume 'larger is always better'. For English dictation on a quiet home office mic, small or medium often produces output indistinguishable from large-v3. Test before you settle.
Troubleshooting Common Model Problems
Why is transcription slow when I use large-v3?
The full large-v3 model is 2.9 GB and runs at approximately 1x real-time on M1. If transcription feels sluggish, switch to large-v3-turbo (1.6 GB, ~4x real-time) for most English dictation. Latency also suffers when memory is tight — on 8 GB Macs, closing browser tabs and other apps before a long dictation session helps. Intel Macs will always be slower on the larger models because they lack the Neural Engine and unified memory architecture.
Why does the smaller model miss technical terms?
Smaller Whisper models were trained on the same data as the larger ones, but with less capacity to memorize the long tail of specialized vocabulary. Technical identifiers, medical terms, and proper nouns are disproportionately affected. Two fixes: step up to large-v3-turbo or large-v3 for those specific modes, or add the terms to Superwhisper's custom vocabulary so the model receives them as hints.
Why does non-English dictation quality drop in Superwhisper?
First, verify you are using large-v3 for multilingual content — smaller models have weaker non-English performance. Second, check whether your mode has cloud LLM post-processing enabled. Multiple Superwhisper users have reported that cloud LLM post-processing corrupts or auto-translates non-English output (German being auto-translated to English is a commonly cited example). Disable cloud post-processing for non-English modes and compare the raw Whisper output first.
Why does the same model produce different accuracy on different machines?
Whisper model weights are identical across machines, but inference quality can vary slightly based on the runtime (whisper.cpp vs MLX vs Core ML), the chip generation, and the audio preprocessing pipeline. Superwhisper uses a consistent runtime internally, so differences you see between your M1 Mac and your M3 Mac are usually latency differences, not accuracy differences — the transcript itself should be the same for the same model and same audio.
I picked large-v3 and my laptop gets hot. What should I do?
The full large-v3 model pushes the Neural Engine and unified memory hard. On laptops in thermal-constrained environments (on your lap, direct sunlight, during summer), sustained large-v3 use can trigger thermal throttling. Switching to large-v3-turbo typically cuts power draw substantially with minimal accuracy impact. If battery life is your primary concern, drop to small while on battery and return to larger models when plugged in.
Tools That Make This Easier
Selecting the right Whisper model is real work. If you would rather not do it yourself, there are Mac dictation apps that make different trade-offs.
Voibe: Automatic Model Selection
Voibe picks the optimal on-device Whisper model for your Mac automatically. No mode system, no tier-gated models, no manual tuning. The app detects your Apple Silicon chip and available memory, then runs the largest Whisper model your hardware can support without latency impact. Audio is discarded after transcription by default — there are no recordings saved to disk. At $99 lifetime, Voibe is 60% cheaper than Superwhisper's $249.99 lifetime ($150 saved). See our Wispr Flow vs Superwhisper comparison for how Voibe positions against the main alternatives.
MacWhisper: File Transcription With Model Choice
MacWhisper is a dedicated file transcription tool (not real-time dictation). It also exposes Whisper model selection per-transcription and is good for batch processing recorded audio. See our MacWhisper vs Superwhisper comparison for the full breakdown.
VoiceInk: Open-Source Model Flexibility
VoiceInk is open-source and gives you the same Whisper model menu as Superwhisper at a much lower lifetime price. See our VoiceInk review for feature details and trade-offs.
Apple Dictation: No Model Selection, No Cost
For casual short-form dictation, Apple's built-in dictation is free and uses its own on-device model on Apple Silicon. There is no way to configure the model, but there is also nothing to configure. See our Apple Dictation privacy breakdown for the details.
Superwhisper Local Model FAQ
Model Choice Basics
Which Whisper model does Superwhisper recommend by default?
Superwhisper does not impose a default model — you choose one when you create a mode. For most Apple Silicon users in 2026, large-v3-turbo is the best starting default because it delivers accuracy close to the full large-v3 model at roughly 4x the speed and fits comfortably in 8 GB of memory.
What is the difference between large-v3 and large-v3-turbo?
large-v3 has 1.55 billion parameters, a 2.9 GB disk footprint, and runs at approximately 1x real-time on M1. large-v3-turbo has 809 million parameters, a 1.6 GB disk footprint, and runs at approximately 4x real-time on M1, roughly four times faster than the full large-v3, with minimal accuracy trade-off on English audio per published Whisper benchmarks.
Can I run Superwhisper offline with local models?
Yes. Every local Whisper model in Superwhisper runs fully on-device with no internet connection required for transcription. Internet is only required if you enable cloud LLM post-processing modes that route transcripts through external services like OpenAI or Anthropic.
Hardware and Performance
How much RAM do I need for large-v3?
The full large-v3 model is 2.9 GB on disk and requires roughly 3-4 GB of active memory during inference. On an 8 GB Apple Silicon Mac, it works but competes for memory with other apps. On 16 GB or more, it runs comfortably. For 8 GB Macs, large-v3-turbo (1.6 GB) is a better fit.
Can I run large-v3 on an Intel Mac?
Yes, but performance will be noticeably slower because Intel Macs lack the Neural Engine and unified memory architecture that Apple Silicon provides. For Intel Macs, small or base are more practical defaults. If dictation is central to your workflow, this is a good reason to plan an Apple Silicon upgrade.
Why is my M1 slower than the benchmarks suggest?
Published Whisper benchmarks assume single-model inference on an otherwise idle system. In real use, your browser, Slack, Zoom, and other apps compete for Neural Engine cycles and unified memory. Close memory-heavy apps before sustained dictation or pick a smaller model to preserve responsiveness.
Accuracy and Use Cases
Does the model choice affect multilingual accuracy?
Yes, substantially. large-v3 has the strongest multilingual performance of any Whisper model and handles 99 languages with meaningfully better accuracy than smaller variants. For non-English or code-switching dictation, use large-v3 rather than large-v3-turbo or smaller models.
Is medium enough for professional dictation?
Medium (769M parameters) is suitable for professional English dictation in quiet environments with general vocabulary. For specialized fields (medical, legal, technical identifiers) or noisy audio, step up to large-v3-turbo or large-v3. Medium remains a solid fallback on an original 8 GB M1 when large-v3-turbo causes memory pressure under a heavy app load.
Does Whisper large-v3 hallucinate text?
Whisper models can occasionally hallucinate short phrases during silent or near-silent audio segments — this is a known limitation of the architecture across all sizes. Larger models hallucinate less often than smaller ones. Superwhisper applies voice-activity detection to reduce hallucination triggers, but it does not eliminate the risk. Review critical transcripts before relying on them.
Tiers and Pricing
Can I use large-v3 on Superwhisper's free tier?
No. Superwhisper's free tier is limited to the two smallest local models (tiny and base). To access small, medium, large-v3, and large-v3-turbo you need Superwhisper Pro ($8.49/month, $84.99/year, or $249.99 lifetime). For the full tier breakdown, see our Superwhisper pricing guide.
Is there a cheaper way to access large Whisper models on Mac?
Yes. Voibe runs on-device Whisper and auto-selects the appropriate model for your hardware at $99 lifetime — 60% cheaper than Superwhisper's $249.99 lifetime. VoiceInk is another budget option at $39.99 one-time but lacks the automatic selection. See our best offline dictation apps guide for the ranked comparison.
Conclusion: Start With large-v3-turbo, Adjust From There
If you take one recommendation from this guide: on an Apple Silicon Mac with 8 GB or more of unified memory, start with large-v3-turbo in Superwhisper and only deviate when you have a specific reason. Go up to full large-v3 for multilingual work or maximum-accuracy deliverables. Drop to small for battery-sensitive use, Intel Macs, or latency-critical short content. Reserve tiny and base for quick messages where the last few percentage points of accuracy truly do not matter.
Superwhisper's per-mode model override system is powerful if you are willing to set it up. If model selection is friction rather than fun — and for most users it is — consider a dictation app that makes the decision for you. Voibe picks the right on-device Whisper model automatically, discards audio after transcription, and costs $99 lifetime. For the broader Mac dictation landscape, see the Mac dictation guide and the Mac dictation pricing hub.
Ready to type 3x faster?
Voibe is the fastest, most private dictation app for Mac. Try it today.
Related Articles
9 Best Handy Alternatives in 2026 (Free and Paid)
Compare the best Handy alternatives for Mac, Windows, and Linux dictation in 2026. Voibe, Wispr Flow, Superwhisper, VoiceInk, Apple Dictation and more — reviewed with pricing and features.
Superwhisper Platforms 2026: Mac, Windows, iOS & Android Status
Superwhisper platform support in 2026: full Mac support, Windows and iOS with caveats, no Android yet (198 votes pending). Plus Linux, iPad, watchOS, and Chrome status.
7 Best Dictation Software for Writers (2026)
Compare the 7 best dictation tools for writers in 2026. Covers offline and cloud options, pricing from free to $699, and which tool fits your writing workflow.
