Contribution · Application — Legal

AI in E-Discovery and Document Review

E-discovery in large litigation can cost millions reviewing gigabytes of emails, Slack, and documents. The 2010s answer was TAR 1.0 (SVMs) and TAR 2.0 (continuous active learning). In 2026, the state of the art is TAR 3.0: LLMs that classify responsiveness, flag privilege, and produce investigation summaries with citations. Courts accept AI review when sampling validates recall at stipulated thresholds — the legal framework hasn't changed, just the tooling.

Application facts

Domain: Legal
Subdomain: E-Discovery
Example stack: Claude Opus 4.7 (1M context) or on-prem Llama 4 for reasoning · Relativity or Everlaw review platforms with LLM plug-ins · Nuix or Reveal for processing EDRM XML ingest · pgvector / OpenSearch for concept search · Custom privilege classifier fine-tuned on seed set

Data & infrastructure needs

Collected ESI — email (PST, MBOX), chat (Slack, Teams), docs
Privileged attorney / keyword list
Issue and responsiveness coding schema
Historical coded seed set for calibration
Protective order and clawback agreement terms

Risks & considerations

Privilege waiver — inadvertent production of privileged material
Recall below stipulated threshold breaching discovery order
Data residency and cross-border transfer violations
Prompt injection via adversarial email content
Cost-shifting disputes if AI review methodology is challenged

Frequently asked questions

Is AI document review admissible in court?

Yes — courts have accepted TAR since Da Silva Moore (2012) and the principle extends to LLM-based review, provided the protocol is documented, defensible, and statistically validated. The Sedona Conference and EDRM publish current best-practice protocols adopted by federal courts.

Which model is best for e-discovery?

In 2026, long-context models (Claude Opus 4.7 at 1M tokens, GPT-5) handle multi-document threads natively. For the highest-sensitivity matters, many firms deploy on-prem open-weight models (Llama 4) to avoid data egress. Accuracy on narrow issue-coding often requires fine-tuning on case-specific seeds.

What are the biggest risks?

Privilege leakage (waives attorney-client privilege irreversibly), over-broad production breaching protective orders, data residency violations, and bias in deduplication or threading. Mitigation: privileged-term dictionaries, clawback protocols under FRE 502(d), and rigorous privilege sampling.

Sources

The Sedona Conference — Commentary on TAR — accessed 2026-04-20
EDRM — E-Discovery Reference Model — accessed 2026-04-20
Federal Rules of Civil Procedure — Rule 26 — accessed 2026-04-20