
AI Tool Privacy Tracker

What every major AI tool actually does with your data. Training behavior, retention, and on-device support — verified against primary sources, with separate rows for consumer and business tiers because the answers often differ.

Last updated April 27, 2026 · 15 tools tracked · Next review May 27, 2026

Recent Changes

Dated policy shifts that changed what a tool does with your data. Each entry is linked to a primary source.

  1. GitHub Copilot

    GitHub began using Free / Pro / Pro+ user interaction data, including code snippets, to train AI models by default. Existing opt-outs are honored. Business and Enterprise are unaffected.

    Source: github.blog/news-insights
  2. OpenAI / ChatGPT

    OpenAI's obligation to indefinitely retain consumer ChatGPT and API content, imposed under the NYT litigation order, ended. Standard 30-day retention practices resumed. Limited April–September 2025 data is still preserved under the order.

    Source: openai.com
  3. Anthropic / Claude

    Anthropic shifted consumer Claude (Free, Pro, Max) from "not used for training" to a user-choice model. Users who opt in have data retained up to 5 years (vs. previous 30 days). Existing-user choice deadline: October 8, 2025.

    Source: anthropic.com/news

How this tracker is maintained

  • Each cell is verified against the vendor's own privacy policy, terms of service, or technical documentation. We don't paraphrase what the policy "probably" says — only what it actually states.
  • We separate consumer and business / API tiers because they operate under different contracts. The tier filter on each table reflects this; conflating them is the most common error in third-party comparisons.
  • "Trains on your data?" answers what happens by default at sign-up. The note under each chip describes the toggle, if one exists.
  • The matrix is reviewed on a roughly monthly cadence and updated immediately whenever a vendor announces a policy change. Each row carries its own Last verified date.
  • Errors get fixed fast: email hi@getvoibe.com with a primary-source link and we'll update the row.

Spotted an error? hi@getvoibe.com. Include the cell and a primary-source link and we'll update on the next pass.

Legend:

  • Yes (default): Trains by default with no opt-out path
  • Yes (opt-out): Trains by default; user can disable in settings
  • User choice: User must actively choose during signup or in settings
  • No: Does not train on user data, period

AI Assistants

Chatbots and search assistants. Consumer tiers vary the most — check whether your account is logged-in vs. logged-out, and whether you've reviewed your data settings since the last policy change.

Columns: Tool · Plan tier · Data collected · Trains on your data? · Retention · On-device · Last verified · Source
ChatGPT · Free / Plus / Pro
Data collected: Prompts, outputs, uploaded files, usage, IP, device info, account info
Trains on your data? Yes (opt-out). Off via Settings → Data Controls → "Improve the model for everyone." Temporary Chat is never used for training.
Retention: 30 days after deletion. April–September 2025 data preserved due to NYT order; standard practice resumed Sept 26, 2025.
On-device: No
Last verified: Apr 27, 2026
Claude · Free / Pro / Max
Data collected: Chats, coding sessions (when using Claude Code with consumer accounts), feedback (thumbs)
Trains on your data? User choice. Active choice required during signup or in Privacy Settings ("You can help improve Claude"). Off by default for users who decline. Policy changed August 2025.
Retention: 30 days if declined; 5 years if enabled. Flagged conversations: 2–7 years for trust & safety.
On-device: No
Last verified: Apr 27, 2026
Gemini · Free / Gemini Advanced
Data collected: Chats, files, photos, videos, screen content, account info, IP, device info
Trains on your data? Yes (opt-out). Off via "Gemini Apps Activity" → Off. Even when off, future chats are kept for 72 hours so Gemini can respond and process feedback.
Retention: 18 months default (adjustable to 3 months / 36 months / never). Human-reviewed chats retained up to 3 years (disconnected from account).
On-device: No
Last verified: Apr 27, 2026
Perplexity · Free / Pro / Max
Data collected: Queries, prompts, AI responses, usage, device info
Trains on your data? Yes (opt-out). Off via Account Settings → Preferences → "AI Data Retention." Logged-out users are trained on by default with no opt-out path.
Retention: Threads kept until manually deleted. Account deletion processed within 30 days.
On-device: No (Comet browser stores some data locally — separate policy)
Last verified: Apr 27, 2026

AI Coding Tools

IDE assistants and agents. Consumer defaults shifted in April 2026 (GitHub Copilot now trains on consumer interaction data by default). Most tools offer a Privacy / Zero-Data-Retention mode that flips the answer; check whether yours is on.

Columns: Tool · Plan tier · Data collected · Trains on your data? · Retention · On-device · Last verified · Source
Cursor · Individual default (Privacy Mode OFF)
Data collected: Code, prompts, editor actions, code snippets
Trains on your data? Yes (default). Default for individual accounts ("Share Data" on). Used to improve Cursor's models. Toggle Privacy Mode ON to opt out — code never trained on, plaintext discarded after request.
Retention: Stored indefinitely (Share Data ON). Privacy Mode ON: plaintext discarded after request; cached files encrypted with client-generated keys.
On-device: No (configurable to use local Ollama / LM Studio models, which bypass Privacy Mode entirely)
Last verified: Apr 27, 2026
GitHub Copilot · Free / Pro / Pro+
Data collected: Inputs, outputs, code snippets, associated context
Trains on your data? Yes (opt-out). Policy changed April 24, 2026: GitHub now trains on consumer interaction data by default. Existing opt-outs honored. Toggle in Settings → Privacy.
Retention: User Engagement Data: 2 years. Coding Agent session logs: lifetime of account. Private repo code at rest is NOT used for training; in-flight interaction data IS.
On-device: No
Last verified: Apr 27, 2026
Windsurf (Codeium) · Individual (no ZDR, default)
Data collected: Logs may contain code snippets and user trajectories
Trains on your data? Yes (opt-out). ZDR is opt-in for individuals — toggle in profile to enable. With ZDR ON, code submitted is never trained on.
Retention: With ZDR ON: in-memory for request lifetime, plus minutes-to-hours for prompt caching. Without ZDR: logs may persist.
On-device: No
Last verified: Apr 27, 2026
Cline · Open-source extension (BYOK)
Data collected: Cline operates no model server. Code goes only to your configured API provider (Anthropic, OpenAI, Bedrock, Gemini, etc.) and is governed by that provider's terms.
Trains on your data? No (by Cline). Cline's stated principle: "Code never leaves your machine" toward Cline servers. Anonymous telemetry (features used, task completion) is opt-out via the "Cline Telemetry" setting. Code, file contents, command arguments, and conversation content are not collected by telemetry.
Retention: Cline retains nothing about your code. Provider retention applies (e.g., Anthropic API ZDR, OpenAI API 30 days).
On-device: Partial — extension runs locally; inference happens at your chosen provider, or fully on-device if you configure Ollama / LM Studio.
Last verified: Apr 27, 2026

Voice & Dictation

Speech-to-text and dictation tools. Voice input is uniquely sensitive — audio carries identity, biometric data, and ambient context — so the on-device column matters more than for text-only tools.

Columns: Tool · Plan tier · Data collected · Trains on your data? · Retention · On-device · Last verified · Source
Voibe · All plans
Data collected: No audio or transcription leaves the device. Account holders: email (account auth) plus non-identifying usage analytics; crash reports exclude dictated content.
Trains on your data? No. "The Voibe application processes your voice entirely on your device. No audio is transmitted to our servers at any point."
Retention: Audio: not transmitted, not retained. Account email: kept while account is active.
On-device: Yes (only mode) — Whisper models running on Apple Silicon Neural Engine
Last verified: Apr 27, 2026
Wispr Flow · Free (Privacy Mode OFF, default)
Data collected: Audio, transcripts, edits, optional Context Awareness (screen content from active app)
Trains on your data? Yes (opt-in). After 2024 community backlash, training is now off by default and requires opt-in. Audio retained indefinitely; 30 days for data passed to third-party LLMs (OpenAI, Meta).
Retention: Indefinite for retained dictation data; 30 days for third-party LLM passthrough.
On-device: No — transcription always happens in the cloud, even in Privacy Mode (zero-retention cloud, not local).
Last verified: Apr 27, 2026
Superwhisper · On-device modes (Fast / Nano / Standard / Parakeet — Free + Pro)
Data collected: None — audio processed locally and never transmitted
Trains on your data? No. "Your data is not retained on Superwhisper servers" and "not used for training AI models or any other machine learning purposes." Audio recordings are saved to local disk by default — opt out in settings.
Retention: N/A on servers. Local recordings persist until the user deletes them.
On-device: Yes
Last verified: Apr 27, 2026
Superwhisper · Cloud modes (Ultra transcription / Super Mode LLMs — Pro)
Data collected: Audio sent to Superwhisper's proxy infrastructure
Trains on your data? No (per vendor). Superwhisper says cloud audio is proxied through their infrastructure, third-party providers don't see user account or content, and there is no training or retention. The public privacy policy does not currently distinguish cloud-mode handling from on-device modes — verify the latest with the vendor before sensitive use.
Retention: Stated as not retained on servers; not separately documented for cloud modes.
On-device: No
Last verified: Apr 27, 2026
MacWhisper · Pro (Gumroad) / Whisper Transcription (App Store)
Data collected: On-device modes: none transmitted. App Store version discloses "Usage Data" and "Product Interaction" as Data Not Linked to You. Cloud Assistant or BYOK (OpenAI / ElevenLabs) features send audio to those providers under their terms.
Trains on your data? No (by MacWhisper). MacWhisper does not train its own models on user audio. Cloud Assistant and BYOK integrations inherit the chosen provider's terms (e.g., OpenAI Whisper API, Anthropic / ElevenLabs).
Retention: On-device transcription: not retained. Cloud Assistant / BYOK: per third-party provider's terms.
On-device: Yes (primary mode) — local Whisper models plus Apple Foundation Models for AI features. Cloud Assistant is opt-in for higher-quality transcription.
Last verified: Apr 27, 2026
Aqua Voice · Free / Pro / iOS Pro
Data collected: Audio inputs, technical data (IP, browser, OS, performance metrics), session metadata. With Privacy Mode disabled, "we may securely store transcript data on our servers."
Trains on your data? Yes (opt-out). Privacy Mode toggle stops transcript storage on Aqua Voice servers; with it enabled, "transcript data is not collected," though session metadata may still be. The privacy policy does not explicitly state whether stored transcript data is used for AI training. SOC 2 Type II certified by Advantage Partners. No HIPAA BAA publicly advertised.
Retention: With Privacy Mode disabled: not specified in policy. With Privacy Mode enabled: transcripts not stored; session metadata (timestamps, device type, performance metrics) may be retained.
On-device: No — cloud transcription
Last verified: Apr 27, 2026
Typeless · Free / Pro
Data collected: Audio plus limited contextual information, processed on Typeless's cloud servers. Subprocessors include third-party LLM providers, analytics, and cloud infrastructure.
Trains on your data? No (per vendor). Privacy policy: "Your data is never used to train these services and is configured for zero retention by the providers." Note: the November 2025 reverse-engineering analysis documented in our Typeless privacy issues investigation reported collection beyond what the public policy describes — verify against the current policy and subprocessor list before sensitive use.
Retention: Per privacy policy, audio + contextual information are "processed in real time on our cloud servers and immediately discarded once the result is returned to your device."
On-device: No — cloud-processed in real time
Last verified: Apr 27, 2026
Apple Dictation · macOS / iOS (Apple Silicon, supported languages)
Data collected: Audio inputs, plus contextual data (contacts, app names, etc.) when sent to servers
Trains on your data? Opt-in only. "Improve Siri & Dictation" must be enabled. Default at setup is to be asked.
Retention: If opted in: audio + transcripts kept under a rotating random ID for up to 6 months, dissociated and kept up to 2 years for improvement; reviewed subset retained beyond 2 years. If opted out: not retained for improvement.
On-device: Yes (partially) — most languages on Apple Silicon process locally for general text fields (Notes, Mail, Messages). Server fallback applies to unsupported languages, search-box dictation, and some third-party Speech Recognition API uses.
Last verified: Apr 27, 2026

Privacy Policy Quick Read: Does Each AI Tool Train on Your Data?

For each of the 15 tools in the matrix above, here is what the vendor's own privacy policy says about training, retention, and on-device support — quoted verbatim where the policy text supports a clean citation. Each entry links to the primary source we verified against.

AI Assistants

Does ChatGPT train on my data?

Yes, by default — opt-out available.

ChatGPT's consumer plans (Free, Plus, Pro) train on user prompts, outputs, and uploaded files by default. To opt out, navigate to Settings → Data Controls and disable "Improve the model for everyone." Conversations are retained for 30 days after deletion. Temporary Chat is never used for training. ChatGPT Team, Enterprise, and API plans are explicitly excluded from training under OpenAI's enterprise terms — API users can optionally opt in via Playground feedback. Limited April–September 2025 data is preserved due to the NYT litigation order; OpenAI's standard 30-day retention practices resumed September 26, 2025.

Primary source: OpenAI privacy policy

Does Claude train on my data?

User choice required (since Aug 2025).

As of August 28, 2025, Anthropic shifted Claude's consumer plans (Free, Pro, Max) from "not used for training" to a user-choice model. New users must actively choose during signup whether to share data for training; existing users had until October 8, 2025. Users who opt in have their data retained for up to 5 years; users who decline keep the previous 30-day retention window. Flagged conversations are retained 2–7 years for trust & safety review. Claude for Work, the Claude API, Amazon Bedrock, and Google Vertex AI are all contractually excluded from training under Anthropic's Commercial Terms.

Primary source: Anthropic Aug 2025 update

Does Gemini train on my data?

Yes, by default — opt-out via "Gemini Apps Activity."

Free Gemini and Gemini Advanced (consumer) train on user conversations by default. Per Google's documentation, when Gemini Apps Activity is on, "Google uses your activity to provide, develop, and improve its services (including training generative AI models)." To opt out, set Apps Activity to OFF — but even when off, future chats are saved for 72 hours so Gemini can respond and process feedback. Default retention is 18 months, adjustable to 3 months, 36 months, or never. Human-reviewed conversations are kept up to 3 years (disconnected from your Google Account). Vertex AI customer data is contractually excluded from training: "Google won't use your data to train or fine-tune any AI/ML models without your prior permission or instruction."

Primary source: Gemini Apps Activity controls

Does Perplexity train on my data?

Yes, by default — opt-out for logged-in users only.

Perplexity trains on user queries, prompts, and AI responses by default for Free, Pro, and Max plans. The "AI Data Retention" toggle in Account Settings → Preferences disables this. Logged-out users are trained on by default with no opt-out path — sign in to gain control. Threads are retained until manually deleted; account deletion is processed within 30 days. The Sonar API offers Zero Data Retention with prompts and responses never stored. Third-party providers (OpenAI, Anthropic) are contractually prohibited from training on Perplexity's API data. Enterprise file uploads are deleted after 7 days.

Primary source: Perplexity data collection policy

AI Coding Tools

Does Cursor train on my code?

Yes, by default for individuals — Privacy Mode opt-out.

For individual accounts, Cursor's "Share Data" mode is enabled by default, sending code, prompts, editor actions, and code snippets to Cursor for model improvement. Toggling Privacy Mode ON prevents training and discards plaintext after each request — cached files are encrypted with client-generated keys, with the encryption keys existing on Cursor's servers only for the duration of each request. Team and Enterprise accounts default to Privacy Mode ON, with zero-data-retention agreements with OpenAI, Anthropic, Google, xAI, Fireworks, Baseten, and Together. The strictest tier, Privacy Mode (Legacy), guarantees no code is stored at all, by Cursor or any third party. Cursor can also be configured to use local Ollama or LM Studio models, which bypass Privacy Mode entirely.

Primary source: Cursor data use

Does GitHub Copilot train on my code?

Yes, by default for consumer plans — opt-out as of April 24, 2026.

On April 24, 2026, GitHub began using Free, Pro, and Pro+ user interaction data — including code snippets — to train AI models by default. Existing opt-outs are honored. To disable training going forward, go to Settings → Privacy. User Engagement Data is retained for 2 years; Coding Agent session logs persist for the lifetime of the account. Private repository code at rest is NOT used for training, but in-flight interaction data IS. Business and Enterprise plans are explicitly prohibited from being used for training under GitHub's agreements: subscription Prompts and Suggestions are retained 28 days, and User Engagement Data 2 years.

Primary source: April 2026 policy change

Does Windsurf (Codeium) train on my code?

Yes for individuals by default — Zero Data Retention opt-in available.

Windsurf (formerly Codeium) trains on individual user code by default — without zero-data-retention enabled, logs may contain code snippets and user trajectories. Individuals can toggle ZDR on in their profile to prevent training; with ZDR on, "the code data submitted by zero-data retention mode users will never be trained on," code is never serialized in plaintext on Windsurf's servers, and is held only in-memory for the request lifetime (plus minutes-to-hours for prompt caching). Teams and Enterprise plans default to ZDR ON. The Enterprise Self-hosted tier deploys via Docker Compose or Helm Charts inside the customer's firewall — no traffic leaves customer infrastructure.

Primary source: Windsurf security

Does Cline train on my code?

No — Cline operates no model server. Privacy depends on your chosen API provider.

Cline is an open-source VS Code extension that operates no model server of its own. User code is sent only to whichever API provider you configure (Anthropic, OpenAI, AWS Bedrock, Google Gemini, Cerebras, Groq, etc.) and is governed by that provider's terms. Cline's stated principle: "Code never leaves your machine" toward Cline servers. Anonymous telemetry (features used, task completion rates) is collected but can be disabled via the Cline Telemetry setting. Code, file contents, command arguments, and conversation content are explicitly NOT collected by telemetry. For fully on-device use, configure Cline with a local Ollama or LM Studio model.

Primary source: Cline telemetry docs
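Because Cline is BYOK, "on-device" reduces to where the endpoint lives. As a minimal sketch (not Cline's actual configuration format — the model name `llama3` and the default port 11434 are assumptions about a local setup), this is the shape of a request to a locally served Ollama model:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    # Shape of a non-streaming generate request to a local Ollama server;
    # nothing leaves the machine unless OLLAMA_URL points somewhere else.
    return {"model": model, "prompt": prompt, "stream": False}


payload = build_payload("llama3", "Explain this regex: ^[a-z]+$")

# Uncomment once an Ollama server is running locally:
# req = request.Request(OLLAMA_URL, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# print(json.loads(request.urlopen(req).read())["response"])
```

The same endpoint swap is how the on-device answer flips for any BYOK tool: point the client at localhost and the remote provider's retention policy stops applying to your code.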

Voice & Dictation

Does Voibe train on my voice data?

No — audio never leaves the device.

Voibe processes audio entirely on your Mac using OpenAI Whisper models running on Apple Silicon's Neural Engine. Per Voibe's privacy policy: "The Voibe application processes your voice entirely on your device. No audio is transmitted to our servers at any point" and "Your dictated content never leaves your Mac and we have no access to it." Because audio never crosses the network, there is no training to opt out of. Account holders provide an email (for authentication) and non-identifying usage analytics; crash reports exclude dictated content. The Free plan does not require an account at all.

Primary source: Voibe privacy policy

Does Wispr Flow train on my voice data?

Off by default since 2024 backlash — opt-in for training.

After 2024 community backlash, Wispr Flow shifted training to opt-in. Privacy Mode is OFF by default for Free users, meaning audio, transcripts, edits, and optional Context Awareness (screenshots of the active app's screen) are retained indefinitely. Data passed to third-party LLM providers (OpenAI, Meta) is retained for 30 days. Enterprise plans default to Privacy Mode ON with zero data retention by Wispr or any third party — audio is processed and immediately discarded after transcription. A Business Associate Agreement is available for Enterprise; once signed, Privacy Mode locks irreversibly. Transcription always happens in the cloud; even Privacy Mode is "zero-retention cloud," not local processing.

Primary source: Wispr Flow privacy policy

Does Superwhisper train on my voice data?

No — verbatim from policy.

Superwhisper's privacy policy states explicitly: "Your data is not retained on Superwhisper servers" and "not used for training AI models or any other machine learning purposes." On-device modes (Fast, Nano, Standard Whisper, Parakeet — available on the Free plan and within Pro) process audio entirely locally; nothing is transmitted. Cloud modes (Ultra transcription, Super Mode LLMs — Pro tier) proxy audio through Superwhisper's infrastructure with no retention. One caveat: audio recordings are saved to local disk by default. Opt out in settings if local audio retention is a concern. Note: the privacy policy does not currently distinguish between on-device and cloud modes — verify cloud-mode specifics with the vendor before sensitive use.

Primary source: Superwhisper privacy

Does MacWhisper train on my voice data?

No — primarily on-device, with optional cloud + BYOK paths.

MacWhisper does not train its own models on user audio. The on-device transcription path uses local Whisper models that you can download for offline use; Apple Foundation Models also run on-device for AI features. MacWhisper's optional "Assistant" cloud transcription service and BYOK integrations (OpenAI Whisper API, ElevenLabs) inherit those providers' terms when used. The App Store version's privacy disclosure shows only "Usage Data" and "Product Interaction" as Data Not Linked to You. There is no separate enterprise tier; the data-handling architecture is identical for individuals and bulk-licensing customers.

Primary source: App Store listing

Does Aqua Voice train on my voice data?

Privacy policy does not explicitly address training — opt-out via Privacy Mode.

Aqua Voice's privacy policy does not explicitly state whether stored data is used for AI training. With Privacy Mode disabled, "we may securely store transcript data on our servers"; with Privacy Mode enabled, "transcript data is not collected" though session metadata (timestamps, device type, performance metrics) may still be. Aqua Voice is SOC 2 Type II certified by Advantage Partners. Teams and Enterprise plans support an org-wide Privacy Mode that applies the same protections across an entire organization. No HIPAA Business Associate Agreement is publicly advertised. Audio is cloud-processed; there is no on-device option.

Primary source: Aqua Voice privacy policy

Does Typeless train on my voice data?

No, per the published privacy policy — but verify the architecture.

Typeless's privacy policy states: "Your data is never used to train these services and is configured for zero retention by the providers." Audio plus contextual information is "processed in real time on our cloud servers and immediately discarded once the result is returned to your device." Free and Pro tiers receive the same data-handling treatment. However, a November 2025 reverse-engineering analysis (covered in our Typeless privacy issues investigation) reported collection beyond what the published policy describes — including URL capture, window-title metadata via the macOS accessibility API, and broad permission requests. Verify the current subprocessor list at trust.typeless.com/subprocessors before relying on Typeless for sensitive content.

Primary source: Typeless privacy policy

Does Apple Dictation train on my voice data?

Only if you opt in via "Improve Siri & Dictation."

Apple Dictation only uses your audio to improve its models if you have explicitly enabled "Improve Siri & Dictation" — the default at setup is to be asked. If opted in, audio and transcripts are retained under a rotating random ID for up to 6 months, then dissociated and kept for up to 2 years for improvement; a reviewed subset is retained beyond 2 years. If opted out, recordings are not retained for improvement. On Apple Silicon Macs running modern macOS or iOS, most languages process locally for general text fields (Notes, Mail, Messages). Server-side fallback applies to unsupported languages, search-box dictation, and some third-party Speech Recognition API uses. Apple does not sign a Business Associate Agreement for consumer Dictation, so it is not HIPAA-compliant.

Primary source: Ask Siri & Dictation policy

Frequently Asked Questions

What does "on-device" actually mean?
On-device means a tool can complete its core workflow without sending your input to a vendor's servers. For dictation, that means audio is captured, transcribed, and discarded entirely on your computer — nothing leaves the machine. Most AI assistants and coding tools are not on-device by default: they transmit your prompts and code to a cloud model, even if the vendor doesn't retain or train on it. "Partially on-device" means parts of the workflow are local but specific cases (unsupported languages, agentic operations, large models) fall back to the cloud. Apple Dictation and Cline (when paired with a local Ollama or LM Studio model) are examples of partially on-device tools.
Why does the same tool show different answers for Free vs Business?
Consumer and business tiers operate under separate contracts. Most major AI vendors train on consumer data (or did until very recently) and explicitly exclude business / API / enterprise data from training under their commercial agreements. The two tiers can use the same underlying model but with different data-handling guarantees. Conflating the two is the most common error in third-party comparison articles. The tier filter at the top of each table separates them so you can answer either question independently.
Can a tool "unlearn" my data after training?
Practically, no. Once a model has been trained on a piece of data, the parameters reflect that training and cannot be cleanly reverted on a per-record basis. Vendors offering deletion typically delete the conversation record but cannot remove its influence on the model that has already absorbed it. This is why the relevant question is "will it be used for training in the first place," not "can I delete it later." Pages like this one focus on the training question because the deletion question rarely changes the outcome for already-trained models.
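The asymmetry can be shown with a toy model (pure illustration, not any vendor's pipeline): fit a one-parameter model by gradient descent, then "delete" a record from storage and observe that the already-trained parameter still reflects it; only retraining from scratch produces a different model.

```python
def train(records, lr=0.05, steps=200):
    """Fit y ~ w*x by plain gradient descent over (x, y) pairs."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in records) / len(records)
        w -= lr * grad
    return w

records = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
w_trained = train(records)      # the parameter now encodes all three records

del records[-1]                 # "delete" the last record from storage...
w_after_delete = w_trained      # ...the trained parameter is unchanged

w_retrained = train(records)    # only full retraining without it differs
print(w_trained != w_retrained)  # prints True
```

This is why vendor deletion controls remove the conversation record but not its influence: the parameter update has already happened by the time you ask.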
How often is this updated?
Each row carries a Last verified date. We re-check every cell against its primary source on a roughly monthly cadence, plus immediately whenever a vendor announces a policy change. The Recent Changes timeline at the top of the page lists every dated change we have logged. If you find an outdated cell or a missing change, email hi@getvoibe.com — we'll update and credit the report.
Why aren't there ratings, scores, or rankings?
Privacy posture is multi-dimensional. A score conflates three independent decisions a buyer needs to make: "do you train on my data," "how long do you keep it," and "can it run on my device." A tool with great training policy and weak retention isn't strictly better or worse than the inverse. The matrix lets you weight those dimensions yourself. The page is a reference, not a verdict.
How do I report an error or a missing tool?
Email hi@getvoibe.com with the tool name, the cell you think is wrong, and a primary-source link. We'll verify and update on the next pass. Tool requests are welcome, but to be added we need a vendor-published privacy or data-handling page that we can cite — marketing claims aren't enough.

This tracker is maintained by the team at Voibe. We built it because privacy is the central design constraint of our product, and we kept being asked these questions. Voibe is one of the tools listed — the methodology is the same for every row.

Related reading on this site: Is Wispr Flow safe? · Typeless privacy issues · Apple Dictation privacy · Voice data privacy · Cloud vs local dictation · HIPAA dictation.