
AI Hallucinations in Law Firms: What Lawyers Must Know (2026)

After Sullivan & Cromwell's April 2026 apology, documented AI-hallucination cases top 1,348 worldwide. What law firms must know: cases, risks, verification.

AI Hallucinations in Law Firms: The 2026 State of Play

TL;DR: AI hallucinations in law firms are fabricated case citations, false quotations, and misrepresented authorities generated by AI tools that attorneys file without verification. The April 2026 apology from Sullivan & Cromwell to Chief Judge Martin Glenn — for an emergency motion with roughly 28 erroneous citations in the Prince Global Holdings Chapter 15 bankruptcy — is the highest-profile of the 1,348 worldwide cases documented in the Damien Charlotin AI Hallucination Cases Database, 915 of them in US courts. Since Mata v. Avianca first sanctioned ChatGPT-fabricated citations in June 2023, at least eight appellate and trial rulings have imposed fines, referrals, and suspensions; a single day in March 2026 produced 17 separate court decisions noting suspected hallucinations. Rule 11, ABA Formal Opinion 512, and a growing set of judicial standing orders all point to the same duty: verify every citation before filing, whether the draft came from a junior associate, a research tool, or a generative AI.

The operational fix is not to ban AI. It is to tier the tools by confidentiality risk, adopt a formal citation verification protocol, and treat AI-assisted work under the same Rule 11 and Opinion 512 obligations that apply to any other drafting channel. This article catalogues the landmark cases, the hallucination mechanics, the ethics framework, and a practical Three-Layer Verification Protocol law firms can adopt immediately.

Key Takeaway

AI hallucinations in law firms are now a documented pattern, not an isolated failure: 1,348 worldwide cases, 17 decided in a single day, and sanctions ranging from a $5,000 Rule 11 fine in Mata v. Avianca to a roughly $31,000 fee order against two AmLaw firms in Lacey v. State Farm.

Key Takeaways: AI Hallucinations at a Glance

| Dimension | 2026 State of Play | Why It Matters |
| --- | --- | --- |
| Documented cases | 1,348 worldwide; 915 US (Charlotin tracker, Apr 24, 2026) | Pattern, not anomaly — and the tracker undercounts |
| Highest-profile incident | Sullivan & Cromwell apology to Chief Judge Glenn (Apr 18, 2026) | Top-tier firms are not immune |
| Landmark sanction | Mata v. Avianca, $5,000 Rule 11 fine (S.D.N.Y., Jun 22, 2023) | Established subjective bad faith analysis for AI misuse |
| Largest fee award to date | Lacey v. State Farm, ~$31,000 against Ellis George + K&L Gates (May 6, 2025) | Enterprise legal AI tools (CoCounsel, Westlaw Precision) implicated |
| Governing ABA guidance | Formal Opinion 512 (Jul 29, 2024) | Binds competence, confidentiality, candor, supervision, and fees |
| Legal AI hallucination rates | Lexis+ AI ~17%, Westlaw AI ~33% (Stanford RegLab, 2024) | Even purpose-built legal tools require human verification |
| General LLM hallucination rates | GPT-4 ~58%, Llama 2 ~88% on random federal case questions | Consumer chatbots are never citation-ready |

Disclosure: Voibe is our product. This article is educational first; the analysis applies regardless of which vendor a law firm chooses for its AI stack.

The Sullivan & Cromwell Incident: What Happened in April 2026

The Sullivan & Cromwell incident began with an emergency motion filed in early April 2026 in the Chapter 15 proceeding In re Prince Global Holdings Limited and Paul Pretlove, docket 1:26-bk-10769, before Chief Judge Martin Glenn of the U.S. Bankruptcy Court for the Southern District of New York. The case concerns the wind-down of a Cambodian conglomerate tied to the Chen Zhi / Prince Group crypto-scam investigation. Sullivan & Cromwell represents the petitioners; Boies Schiller Flexner represents Chen Zhi.

Boies Schiller flagged errors in Sullivan & Cromwell's motion. On April 18, 2026, Andrew Dietderich, co-head of Sullivan & Cromwell's global restructuring group, filed an apology letter to Chief Judge Glenn. Dietderich wrote: "The inaccuracies and errors in the Motion include artificial intelligence ('AI') 'hallucinations.' 'Hallucinations' are instances in which artificial intelligence tools fabricate case citations, misquote authorities, or generate non-existent legal sources." He acknowledged that firm policies "were not followed in connection with the preparation of the Motion" and that the firm's "review process did not identify the inaccurate citations generated by AI." A three-page Schedule A of corrections was attached.

The reported error count varies across coverage. Bloomberg Law identified 28 erroneous citations. Above the Law described approximately 40 corrections. The errors included fabricated case citations, misquotations of the Bankruptcy Code, misdescribed authorities, and at least one case that did not exist. The specific AI tool used was not disclosed. As of coverage through April 23, 2026, Judge Glenn had not imposed sanctions; a status hearing was scheduled.

Two details make the incident especially notable. First, Sullivan & Cromwell reportedly advises OpenAI on the safe and ethical deployment of AI — a detail David Lat flagged in his Substack. Second, senior-partner restructuring billing rates at the firm are reported at approximately $2,000 per hour. The reputational story is not that AI caused the error; it is that an AmLaw 100 firm with AI-advisory credentials, senior restructuring partners, and premium billing rates still failed the Rule 11 citation check.

Warning

Sullivan & Cromwell's apology letter matters less for what it admitted than for what it conceded by implication: firm policies existed, the policies were not followed, and the review process did not catch AI-generated fake citations before filing. That is the exact failure pattern every firm's AI governance should be designed to prevent.

The Broader Pattern: 1,348 Documented Cases and 17 in a Single Day

The broader pattern shows AI hallucinations in legal filings are a scaling problem, not a novelty. The Damien Charlotin AI Hallucination Cases Database catalogued 1,348 worldwide cases as of April 24, 2026, with 915 from US courts. The growth curve is steep: the tracker recorded 87 cases on May 18, 2025; 486 cases on October 28, 2025; and 1,348 cases by April 2026. Reported incidents have grown from roughly two per week in early 2025 to two to three per day by late 2025.

Eugene Volokh documented 17 US court decisions in a single day — March 31, 2026 — noting suspected AI hallucinations in filings. His accompanying commentary identifies three reasons the real count is materially higher than the tracker: many hallucinations are never spotted by opposing counsel or the court; many that are spotted do not generate a published decision; and the majority of state trial court decisions are not indexed on Westlaw or Lexis, so they do not surface in searches at all.

The participant mix matters for firm-level risk analysis. Across the 1,348 documented cases, the tracker identifies 804 pro se litigants, 511 licensed lawyers, and 33 other professionals — judges, prosecutors, paralegals. In other words, approximately 40 percent of documented hallucination cases involve trained lawyers. The hallucination types are distributed roughly as follows (categories overlap, so the counts exceed the case total): 1,123 involve fabricated content; 356 involve false quotes; 542 involve misrepresented material; 30 involve outdated advice.

A Chronology of Landmark AI Hallucination Cases (2023–2026)

A chronology of landmark AI hallucination cases shows consistent sanctions patterns across jurisdictions, tools, and attorney seniority. The list below is not exhaustive; it captures the rulings most frequently cited in subsequent opinions and in law-firm AI policy documents.

  • Mata v. Avianca, Inc. (S.D.N.Y. Jun 22, 2023) — Judge P. Kevin Castel imposed $5,000 in Rule 11 sanctions jointly and severally against Steven A. Schwartz, Peter LoDuca, and Levidow, Levidow & Oberman. Schwartz cited six fabricated cases produced by ChatGPT, including Varghese v. China Southern Airlines Co., 925 F.3d 1339 (11th Cir. 2019). When questioned, ChatGPT told Schwartz the cases could be found on Westlaw and LexisNexis. Castel found "subjective bad faith" driven primarily by the attorneys' response after the fakes were identified. Docket; Seyfarth summary.
  • People v. Crabill (Colorado, Nov 22, 2023) — Colorado's Presiding Disciplinary Judge approved a one-year-and-one-day suspension (90 days served) of Zachariah C. Crabill for filing ChatGPT-generated fake citations and then blaming a legal intern. Characterized as the first US attorney-discipline ruling implicating AI misuse. Volokh Conspiracy summary.
  • United States v. Cohen (S.D.N.Y. Mar 2024) — Michael Cohen used Google Bard to research supervised-release precedents and forwarded three fabricated Second Circuit citations to his then-lawyer David M. Schwartz, who filed them without verification. Judge Jesse M. Furman declined to impose sanctions but called the episode "embarrassing and certainly negligent" in a 13-page ruling. Legal Dive coverage.
  • Park v. Kim (2d Cir. Jan 30, 2024) — The Second Circuit referred attorney Jae S. Lee of JSL Law Offices to its Grievance Panel after she cited a non-existent case in a reply brief, admitted ChatGPT use, and made no independent inquiry. The panel held her conduct fell "well below the basic obligations of counsel." Opinion.
  • Kruse v. Karlen (Mo. Ct. App. E.D., Feb 13, 2024) — Pro se appellant sanctioned $10,000 after 22 of 24 case citations were fabricated. First Missouri appellate opinion sanctioning AI-generated fake citations. Missouri Independent.
  • Wadsworth v. Walmart Inc. (D. Wyo. Feb 2025) — Judge Kelly H. Rankin sanctioned Morgan & Morgan attorneys Rudwin Ayala ($3,000), T. Michael Morgan ($1,000), and Taly Goody ($1,000) after Ayala used the firm's internal AI tool "MX2.law" to add case law to motions in limine, producing eight fabricated cases. Rankin: "A finding of subjective bad faith is not required to impose sanctions." Ayala's pro hac vice admission was withdrawn. LawNext.
  • Lacey v. State Farm (C.D. Cal., May 6, 2025) — Special Master Michael R. Wilner imposed approximately $31,000 in fees against Ellis George LLP and K&L Gates LLP, jointly and severally, after approximately 9 of 27 citations in a supplemental brief were wrong and at least 2 cases did not exist. Tools used included CoCounsel, Westlaw Precision, and Google Gemini. Wilner wrote that AI use "affirmatively misled me" and noted that K&L Gates is among the largest US firms by headcount. ABA Journal.
  • Noland v. Land of the Free, L.P. (Cal. Ct. App. 2025) — First California Court of Appeal opinion sanctioning AI hallucinations; $10,000 fine after 21 of 23 case quotations were fabricated. Daily Journal.
  • Coomer v. Lindell (D. Colo. Jul 2025 + Apr 2026) — U.S. District Judge Nina Y. Wang sanctioned MyPillow attorneys Christopher I. Kachouroff and Jennifer T. DeMaster $3,000 each for approximately 30 defective citations in post-trial filings, generated by Microsoft Copilot, Google Gemini, and X's Grok. In April 2026 Wang issued an order to show cause proposing additional sanctions after the attorneys continued citing non-existent cases. NPR; Colorado Politics.
  • Sullivan & Cromwell / Prince Global Holdings (Bankr. S.D.N.Y. Apr 2026) — Firm self-reported roughly 28–40 AI hallucinations after Boies Schiller flagged the errors. No sanctions as of April 2026 coverage.

The pattern across rulings: courts do not draw sharp lines between pro se litigants, solo practitioners, mid-market firms, AmLaw 100 partners, or specialized legal AI tools. Rule 11 and the inherent sanctions power apply uniformly. Size of firm and sophistication of tool have no doctrinal relevance.

Why LLMs Fabricate Legal Citations

Why LLMs fabricate legal citations comes down to how large language models work; it is not a bug that will be patched. LLMs are autoregressive next-token predictors trained on web-scale text. They generate plausible-sounding output by extending statistical patterns from their training data. They are not databases, and they do not retrieve verified case law from a primary source unless they are explicitly wired to do so.

Two structural features make legal citations especially vulnerable. First, the citation format itself — Party v. Party, volume reporter page, circuit, year — is highly pattern-regular, so the model can generate a citation that looks real without ever having seen the underlying opinion. Second, general-purpose LLMs were not trained on the full Westlaw or Lexis corpora (which are paywalled and license-restricted), so they have seen enough case-law snippets to mimic citation form but not enough to reproduce the full body of accurate citations.
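
To make the pattern-regularity point concrete, here is a minimal Python sketch. It is illustrative only: the first test string is the fabricated citation ChatGPT produced in Mata v. Avianca, the second is a placeholder invented for this example, and the regex is a rough approximation of one common citation format. A surface-level format check accepts both, which is exactly why a citation that merely looks real tells you nothing about whether the opinion exists.

```python
import re

# Rough pattern for a common federal reporter citation:
#   "Party v. Party, <volume> <reporter> <page> (<court> <year>)"
# Illustrative only; real Bluebook formats are far more varied.
CITATION_PATTERN = re.compile(
    r".+ v\. .+, \d+ (?:F\.[23]d|F\. Supp\. [23]d|U\.S\.) \d+ \(.*\d{4}\)"
)

# First string: the fabricated citation from Mata v. Avianca.
# Second string: a placeholder invented for this example.
candidates = [
    "Varghese v. China Southern Airlines Co., 925 F.3d 1339 (11th Cir. 2019)",
    "Example v. Placeholder, 123 F.3d 456 (9th Cir. 1997)",
]

for cite in candidates:
    # Both match: the format is pattern-regular, so "looks like a citation"
    # carries no information about whether the opinion actually exists.
    print(f"format-valid: {bool(CITATION_PATTERN.fullmatch(cite))}  {cite}")
```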

The Stanford RegLab 2024 study by Varun Magesh and coauthors tested this empirically. After Stanford's redo methodology, the reported hallucination rates for purpose-built legal AI research tools were approximately:

  • Lexis+ AI: ~17 percent hallucination rate
  • Westlaw AI-Assisted Research: ~33 percent hallucination rate (roughly double Lexis+ AI)
  • Ask Practical Law AI (Thomson Reuters): refused more than 60 percent of queries and had lower accuracy on the remainder

These numbers are for tools built with retrieval-augmented generation — the architecture specifically designed to ground LLM output in verified sources. General-purpose LLMs perform much worse. A companion Dahl et al. (2024) study found GPT-4 hallucinated on approximately 58 percent of random federal case questions and Llama 2 on approximately 88 percent.

The operational implication is simple: no AI tool on the market — including the legal-specific ones — produces citation-ready output without human verification. The Sullivan & Cromwell incident, the Lacey v. State Farm order, and the Stanford study all point in the same direction. Treat every AI-generated citation as unverified until a human has opened the opinion in Westlaw or Lexis and confirmed the case, the quote, and the subsequent treatment.

The Five Law-Firm AI Risk Categories

The Five Law-Firm AI Risk Categories organize the failure modes that appear across documented sanctions and ethics opinions. Each category corresponds to a distinct duty under Rule 11 and ABA Opinion 512; each has appeared in at least one sanctions ruling.

  1. Fabricated citations. The AI generates a case, statute, or regulation that does not exist. Dominant failure mode in Mata v. Avianca, Park v. Kim, Kruse v. Karlen, and Wadsworth v. Walmart. Governed by Rule 11 (reasonable inquiry) and ABA Model Rule 3.3 (candor to tribunal).
  2. Misquoted or mischaracterized authorities. The case exists, but the quoted passage is not in the opinion, or the opinion does not stand for the proposition cited. Central to Lacey v. State Farm and the Sullivan & Cromwell incident. Governed by Rule 11 and Model Rule 3.3. The Stanford RegLab study calls this "misgrounded" output.
  3. Outdated or overruled precedent. The case existed and said what is quoted, but has been overruled, reversed, or substantially narrowed. This category is under-spotted because the citation appears correct on a surface reading. ABA Opinion 512 flags currency explicitly as part of competent AI use.
  4. Confidentiality leakage. Privileged or confidential client information is entered into a consumer AI tool whose terms permit training or broad disclosure. Governed by ABA Model Rule 1.6 and Opinion 512's informed-consent requirement. Related but distinct from hallucination; often paired with it because the same workflow produces both failures.
  5. Privilege waiver by third-party disclosure. Client chats with a public AI tool about case strategy, producing documents that are not privileged when later discovered. This is the failure mode US v. Heppner crystallized. Governed by federal common-law privilege and Model Rule 1.6.

Firms building post-Sullivan & Cromwell AI policies should map each approved workflow against all five categories. A tool that avoids hallucinations by using retrieval-augmented generation can still fail the confidentiality test if its terms permit training. A tool that handles confidentiality well can still return mischaracterized authorities. Safety is the product of tool selection, workflow design, and human verification — not any single vendor claim.
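
One way a firm might operationalize that mapping is sketched below. The tool names and attribute values are hypothetical, and the tiering logic is an assumption about how a firm could encode its own policy, not a statement of any vendor's guarantees. The point of the design is that the weakest dimension controls what work the tool may touch, and human citation verification is required in every tier.

```python
from dataclasses import dataclass

@dataclass
class ToolProfile:
    # Hypothetical attributes; real assessments come from vendor terms and testing.
    name: str
    uses_rag: bool               # retrieval-augmented generation (bears on categories 1-3)
    no_training_on_inputs: bool  # contractual no-training commitment (category 4)
    on_device: bool              # data never leaves the machine (categories 4-5)

def allowed_work(tool: ToolProfile) -> str:
    """Illustrative tiering: the weakest dimension controls, not the strongest."""
    if tool.on_device:
        return "privileged content (citations still require human verification)"
    if tool.no_training_on_inputs:
        return "sensitive but non-privileged work, with human citation verification"
    return "non-privileged research only, with human citation verification"

# Placeholder examples, not assessments of real products.
for tool in [
    ToolProfile("GenericLegalRAG", uses_rag=True, no_training_on_inputs=True, on_device=False),
    ToolProfile("ConsumerChatbot", uses_rag=False, no_training_on_inputs=False, on_device=False),
]:
    print(f"{tool.name}: {allowed_work(tool)}")
```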

ABA Formal Opinion 512 and What It Requires

ABA Formal Opinion 512, issued July 29, 2024, is the first formal ABA ethics guidance on generative AI tools. It does not create new rules; it applies existing Model Rules to AI. Five provisions bear directly on hallucination risk. The full text is available as a PDF from the ABA.

  • Model Rule 1.1 (Competence). Lawyers must understand the benefits and risks of any GAI tool they use and maintain a reasonable and current understanding of its specific capabilities and limitations, including reliability, accuracy, completeness, and bias. Competence requires periodic re-evaluation and forbids reliance on AI output without independent verification or review.
  • Model Rule 3.3 (Candor to Tribunal). Lawyers must review AI outputs, including analysis and citations to authority, and correct errors before filing. Hallucinated citations that are filed unverified are false statements of law under Rule 3.3(a)(1) regardless of whether the lawyer knew they were false at the time of filing.
  • Model Rule 1.6 (Confidentiality). Lawyers must avoid inputting confidential client information into self-learning GAI tools without informed client consent. Opinion 512 specifically rejects boilerplate engagement-letter consent as insufficient; informed consent requires specific disclosure of the tool, the data flow, and the risks.
  • Model Rules 5.1 and 5.3 (Supervision). Managerial and supervisory lawyers must establish clear firm policies governing GAI use and supervise both lawyers and non-lawyers to ensure compliance. A firm-level policy is a precondition; individual-lawyer discretion is not enough.
  • Model Rule 1.5 (Fees). Lawyers generally may not bill hourly clients for time saved by GAI efficiencies that were not actually expended, and must disclose GAI-related costs when charged as expenses.

Opinion 512's operational center of gravity is verification. The Opinion explicitly states that GAI "cannot solely substitute for a lawyer's competent legal work" and that required verification is "factually specific" and depends on the tool and the task. The Sullivan & Cromwell incident — where firm policies existed but were not followed — is a Rule 5.1 supervision failure as much as a Rule 3.3 candor failure.

Judicial Standing Orders: Who Requires AI Disclosure

Judicial standing orders on AI predate ABA Opinion 512 and are spreading across federal courts. Two orders are cited most frequently in subsequent policies and law-review articles.

Judge Brantley Starr (Northern District of Texas) issued the first US federal AI certification order on May 30, 2023. His "Mandatory Certification Regarding Generative Artificial Intelligence" requires attorneys to file a certificate attesting either that no portion of the filing was drafted by GAI (he names ChatGPT, Harvey.AI, and Google Bard), or that any AI-drafted language was checked for accuracy using print reporters or traditional legal databases by a human being. Failure to file the certificate results in the filing being struck.

Judge Michael M. Baylson (Eastern District of Pennsylvania) issued a broader Standing Order re AI on June 6, 2023. Baylson's order covers all AI — not only generative AI — and applies to every complaint, answer, motion, brief, and other paper. Attorneys and pro se litigants must both disclose AI use and certify that every citation has been verified as accurate.

Standing orders addressing AI have since been entered in courts across multiple circuits. There is no uniform federal rule. Firms filing in federal court should maintain a standing-orders registry — Starr and Baylson are the foundational examples, but local rules, individual-judge standing orders, and even court-wide administrative orders may impose additional disclosure or certification duties. The operational implication is that compliance checking is now per-judge as well as per-circuit.
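
A standing-orders registry can be as simple as a shared table or a small script. The sketch below is illustrative; the two entries summarize the orders discussed above, and the dates, scope, and wording should be confirmed against the courts' own postings before anyone relies on them.

```python
from dataclasses import dataclass

@dataclass
class StandingOrderEntry:
    court: str
    judge: str
    issued: str        # date of the order
    scope: str         # which filings it covers
    requirement: str   # what must be disclosed or certified

# Entries summarize the two orders described above; confirm details
# against the courts' own postings before relying on them.
REGISTRY = [
    StandingOrderEntry(
        court="N.D. Tex.", judge="Brantley Starr", issued="2023-05-30",
        scope="all filings before Judge Starr",
        requirement="certify no GAI drafting, or that AI-drafted language was human-verified",
    ),
    StandingOrderEntry(
        court="E.D. Pa.", judge="Michael M. Baylson", issued="2023-06-06",
        scope="every complaint, answer, motion, brief, and other paper",
        requirement="disclose AI use and certify every citation has been verified",
    ),
]

def orders_for(court: str) -> list[StandingOrderEntry]:
    """Return whatever the firm has registered for a given court."""
    return [entry for entry in REGISTRY if entry.court == court]
```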

The Three-Layer Verification Protocol for AI-Assisted Filings

The Three-Layer Verification Protocol is a citation-checking framework derived from the Rule 11 duties and ABA Opinion 512 requirements that appear in every documented AI hallucination sanction. It is designed to be executed by a human before any AI-assisted filing is signed, and to catch all three citation-quality failure modes from the Five Risk Categories.

Layer 1 — Existence. Open Westlaw, Lexis, or the court's official docket system. Search for the case by name and citation. Confirm that every cited case, statute, regulation, and secondary source actually exists. This layer catches fabricated citations — the Mata v. Avianca, Park v. Kim, Kruse v. Karlen, and Wadsworth v. Walmart failure mode. Do not rely on the AI tool's own claim that the citation is correct. Mata v. Avianca attorney Steven Schwartz asked ChatGPT whether the cases were real; it said yes. They were not.

Layer 2 — Accuracy. Pull the full opinion. Read the pinpoint cite. Verify that every quoted passage appears in the opinion exactly as quoted, and that every proposition attributed to the case is actually supported by the opinion's reasoning or holding. This layer catches misquoted and mischaracterized authorities — the Lacey v. State Farm and Sullivan & Cromwell failure mode, which the Stanford RegLab study calls "misgrounded" output. This is the most time-consuming layer and the most frequently skipped.

Layer 3 — Currency. Run Shepard's or KeyCite on every cited authority. Confirm that the authority has not been overruled, reversed, substantially narrowed, or distinguished in a way that destroys its value. This layer catches outdated precedent — a failure mode that is easy to miss because the citation appears correct on a surface reading. LLMs with training-data cutoffs are particularly weak here; purpose-built legal AI with RAG is better but still not reliable per the Stanford findings.

Each layer must be completed by a human who has access to a verified legal research database. The protocol does not trust the AI to verify its own output. It does not trust another AI to verify the first AI. It requires human access to primary sources because that is what Rule 11 requires and what every sanctions order to date has enforced.
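
A firm could codify the protocol as a pre-filing checklist that refuses to mark a document ready until a named human has signed off on all three layers for every citation. The sketch below is an illustrative data model, not a real tool; the caption and citation strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CitationCheck:
    citation: str
    exists_verified_by: Optional[str] = None    # Layer 1: opened in Westlaw/Lexis/docket
    accuracy_verified_by: Optional[str] = None  # Layer 2: quote and proposition confirmed
    currency_verified_by: Optional[str] = None  # Layer 3: Shepard's/KeyCite run

    def fully_verified(self) -> bool:
        # Every layer needs a named human reviewer; AI self-attestation does not count.
        return all([self.exists_verified_by,
                    self.accuracy_verified_by,
                    self.currency_verified_by])

@dataclass
class Filing:
    caption: str
    citations: list[CitationCheck] = field(default_factory=list)

    def ready_to_sign(self) -> bool:
        return bool(self.citations) and all(c.fully_verified() for c in self.citations)

# Usage sketch: the filing stays blocked until each layer is signed off by name.
filing = Filing("Hypothetical Motion", [
    CitationCheck("Example v. Placeholder, 123 F.3d 456 (9th Cir. 1997)"),
])
assert not filing.ready_to_sign()
filing.citations[0].exists_verified_by = "A. Associate"
filing.citations[0].accuracy_verified_by = "A. Associate"
filing.citations[0].currency_verified_by = "A. Associate"
assert filing.ready_to_sign()
```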

Where On-Device Dictation Fits: The Architectural Through-Line

Where on-device dictation fits in the post-Sullivan & Cromwell AI risk picture is a question most hallucination commentary skips. The direct answer: on-device dictation does not address hallucination risk — speech-to-text tools transcribe audio, they do not fabricate case citations — but it does address the architectural concern that runs through the entire cases list and that the US v. Heppner ruling crystallized on the privilege side: privileged legal content should not leave the lawyer's machine if it does not have to.

The risk categories map cleanly. Hallucination risk (categories 1–3 from the framework above) sits in the LLM layer and is resolved by the Three-Layer Verification Protocol. Confidentiality and privilege risk (categories 4–5) sit in the data-transmission layer and are resolved by tool-tier selection. Cloud dictation tools transmit audio of whatever a lawyer says into the microphone — including privileged memos, case strategy, deposition prep, and client calls — to vendor servers under terms that typically permit training, subprocessor sharing, and disclosure to law enforcement. The Heppner third-party disclosure analysis applies directly.

On-device dictation is the architectural answer to the voice portion of this workflow. Tools that run speech recognition locally on the lawyer's own Mac process audio in memory and discard it immediately. No audio leaves the device. No transcript leaves the device. No vendor terms of service are implicated.

Voibe is Voibe Inc.'s on-device dictation app for Mac. It runs OpenAI's Whisper models locally on Apple Silicon (M1 through M4) at $9.90/mo, $89.10/yr, or $198 lifetime. Audio is processed on the lawyer's own chip; the transcript appears wherever the cursor is; the audio is discarded. For firms rebuilding their AI stack in response to Sullivan & Cromwell and Heppner, dictation is the easiest architectural swap — it eliminates a third-party disclosure vector without changing how the lawyer works. For deeper analysis, see our guide on dictation software for lawyers, our analysis of Rev.com alternatives for lawyers (which covers the human-transcriber privilege exposure specifically), our explainer on cloud vs local dictation, and our coverage of voice data privacy. For medical-legal matters, see the HIPAA dictation guide.

Tip

The architectural principle: every AI category that touches privileged legal content should be evaluated for data-transmission risk as well as output-quality risk. Hallucinations are output-quality. Dictation is data-transmission. Both matter; they are solved by different controls.

Frequently Asked Questions About AI Hallucinations in Law Firms

The Sullivan & Cromwell Incident

Q: Which AI tool did Sullivan & Cromwell use? The specific AI tool was not disclosed in Dietderich's apology letter or in subsequent coverage. Inferences in the legal press have not been confirmed by the firm.

Q: Has Sullivan & Cromwell been sanctioned? As of April 2026 coverage, Chief Judge Glenn had not imposed sanctions; a status hearing was scheduled. The firm self-reported after Boies Schiller flagged the errors.

Q: Was this really an AI problem or a supervision problem? Both. Dietderich's letter acknowledged that firm policies were not followed and that the firm's review process did not catch the AI-generated errors before filing. Opinion 512 supervision duties under Model Rules 5.1 and 5.3 bear on the latter as squarely as candor duties bear on the former.

Sanctions and Professional Responsibility

Q: What is the average sanction in an AI hallucination case? There is no formal average across the documented 915 US cases, and the Charlotin tracker does not publish one. Among the landmark rulings, monetary sanctions cluster in the $1,000–$10,000 range per attorney, with outlier orders in the $30,000+ range (Lacey v. State Farm, Sixth Circuit 2025 ruling reported by the ABA Journal). Non-monetary consequences include grievance-panel referrals (Park v. Kim), pro hac vice withdrawal (Wadsworth v. Walmart), and suspension (People v. Crabill).

Q: Can the client be sanctioned for the lawyer's AI error? Generally no. Rule 11 sanctions attach to signers of filings, who are lawyers of record or pro se litigants. But client-initiated AI use raises the privilege issue separately; see US v. Heppner.

Q: Does Rule 11 apply to AI tools the same way it applies to human-drafted filings? Yes. The duty of reasonable inquiry under Rule 11(b) applies to every representation in a filed paper, regardless of how it was drafted. Judge Rankin in Wadsworth v. Walmart stated that a finding of subjective bad faith is not required. Judge Castel in Mata v. Avianca found subjective bad faith on the cover-up but would have imposed sanctions in any event under Rule 11's objective prong.

Firm Policy and Verification

Q: Does using legal-specific AI like CoCounsel or Westlaw Precision AI provide a safe harbor? No. Lacey v. State Farm imposed approximately $31,000 in fees on two firms whose filings used CoCounsel and Westlaw Precision alongside Google Gemini. The Stanford RegLab study found hallucination rates of approximately 33 percent for Westlaw AI-Assisted Research and approximately 17 percent for Lexis+ AI. Legal-specific tools reduce the error rate but do not eliminate it. Verification is still required.

Q: What is the minimum AI policy a firm should have in place? At minimum: (1) a tiering of approved tools by confidentiality risk; (2) a human citation verification protocol mapped to Rule 11 duties; (3) informed-consent language for client matters using AI; (4) training on ABA Opinion 512; (5) supervisory responsibility assignment under Model Rules 5.1 and 5.3. Opinion 512 does not require a specific format, but the substantive coverage is what supervisory lawyers will be judged on.

Q: Do judicial standing orders on AI apply to all filings or only to specific case types? Judge Starr's and Judge Baylson's standing orders apply to all filings before those judges. Firms filing in federal court should maintain a judge-level standing orders registry because local practice varies. Some orders apply court-wide; others are per-judge.

Adjacent AI Risks for Lawyers

Q: How does the hallucination story relate to US v. Heppner on privilege? Hallucination and privilege are different failure modes in overlapping workflows. Hallucination is about the accuracy of AI output and is governed by Rule 11 and Model Rule 3.3. Privilege is about third-party disclosure and is governed by federal common law and Model Rule 1.6. A single consumer-chatbot workflow can produce both failures simultaneously — fabricated citations plus waived privilege. For a deep dive on the privilege side, see AI and attorney-client privilege after US v. Heppner.

Q: Are meeting note-taker tools like Otter or Fireflies a hallucination or a privilege problem? Primarily a privilege problem. They transcribe human speech rather than generate case citations, so they are generally not a hallucination source. They transmit audio of potentially privileged calls to vendor servers, which is the Heppner concern. For privileged calls, either use on-device transcription or an enterprise tier with appropriate contractual confidentiality.

Q: Is cloud dictation software safe for law firms to use? Cloud-based dictation can be safe for non-privileged work if vendor terms include appropriate contractual confidentiality, no-training commitments, and data-handling terms consistent with Opinion 512. For privileged content, on-device dictation is the lower-risk architectural choice because no audio or transcript leaves the lawyer's machine. See our dictation software for lawyers guide and cloud vs local dictation analysis for tool-by-tool breakdowns.

Conclusion: The Post-Sullivan & Cromwell Playbook for Law Firms

The post-Sullivan & Cromwell playbook for law firms is not an AI moratorium. It is a disciplined application of rules that already exist. Rule 11 has always required reasonable inquiry into every citation. ABA Opinion 512 applies the existing Model Rules to AI tools. Judicial standing orders from Judge Starr and Judge Baylson specify what the duty looks like in practice. The 1,348 documented cases catalogued by the Charlotin tracker, the 17-in-a-day spike on March 31, 2026, and the Sullivan & Cromwell apology letter all show the same thing: firms that have not operationalized these duties will be caught, whether they are solo practitioners or AmLaw 100 partners.

Three operational moves define the playbook. First, adopt the Three-Layer Verification Protocol — existence, accuracy, currency — as a non-negotiable step in every AI-assisted filing. Second, tier AI tools by confidentiality risk and map the tier to the work. Consumer chatbots for non-privileged research. Enterprise tools with contractual confidentiality for sensitive but non-privileged work. On-device or locally hosted tools for anything that touches privileged content — including, importantly, voice workflows like dictation, transcription, and meeting capture. Third, document the workflow. Save prompts. Record which tool was used on which task. Update engagement letters and intake protocols. When Rule 5.1 supervision duties are tested, this documentation is what a firm will be judged on.
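
For the documentation step, even a flat append-only log is enough. The sketch below is illustrative; the field names are assumptions, not a standard. The point is that every AI-assisted task leaves a record of the tool, the prompt, and the human reviewer.

```python
import datetime
import json

def log_ai_use(matter_id: str, tool: str, task: str, prompt: str, reviewer: str) -> str:
    """One JSON line per AI-assisted task, appended to a matter-level log."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "matter_id": matter_id,
        "tool": tool,
        "task": task,
        "prompt": prompt,            # save the prompt verbatim
        "human_reviewer": reviewer,  # who verified the output before use
    }
    return json.dumps(entry)

# Example with placeholder values.
print(log_ai_use("2026-0001", "GenericLegalRAG", "motion research",
                 "Summarize standing doctrine for ...", "A. Associate"))
```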

For the voice portion of that stack specifically, Voibe is a Mac dictation app that runs OpenAI's Whisper models on-device on Apple Silicon. Audio is processed locally and discarded; no transcript leaves the lawyer's Mac. Try Voibe for free, or read the deeper guides on dictation software for lawyers, AI and attorney-client privilege after Heppner, cloud vs local dictation, and why offline dictation matters.

The Sullivan & Cromwell story will not be the last of its kind. The next hallucination incident at an AmLaw 100 firm will probably not come with an apology letter before sanctions attach. The firms best positioned for that next ruling are the ones that treat every AI-generated citation as unverified until a human has opened the opinion and confirmed the case, the quote, and the subsequent treatment — and that keep privileged legal content off third-party servers wherever the architecture allows.

Ready to type 3x faster?

Voibe is the fastest, most private dictation app for Mac. Try it today.