
Microsoft Presidio

Presidio ships two main engines, Analyzer and Anonymizer. Analyzer detects entities such as EMAIL_ADDRESS, CREDIT_CARD, PHONE_NUMBER and IN_AADHAAR, plus custom patterns you define. Anonymizer then transforms each detected span with an operator such as replace, redact, mask, hash or encrypt (reversible AES encryption, not format-preserving). It is a common choice for scrubbing PII from prompts, logs and training data.

Framework facts

Category
orchestration
Language
Python
License
MIT
Repository
https://github.com/microsoft/presidio

Install

pip install presidio_analyzer presidio_anonymizer
python -m spacy download en_core_web_lg

Quickstart

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = 'Call Priya at +91-98-1234-5678 or [email protected].'
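# Detect PII spans (loads the spaCy model installed above)
analyzer = AnalyzerEngine()
results = analyzer.analyze(text=text, language='en')
# Default operator: replace each detected span with its entity type, e.g. <PERSON>
print(AnonymizerEngine().anonymize(text=text, analyzer_results=results).text)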

Alternatives

  • LLM Guard — LLM-specific security scanners
  • Amazon Comprehend (PII detection)
  • Scrubadub — pure Python scrubber
  • PIIcatcher — structured data focus

Frequently asked questions

Does Presidio detect region-specific PII like Aadhaar or PAN?

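Yes. Presidio's predefined recognisers include Indian entities such as IN_AADHAAR, IN_PAN and IN_VEHICLE_REGISTRATION, among others (some country-specific recognisers may need to be enabled in the registry or language configuration), and custom regex patterns are straightforward to add.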

Can Presidio run inside an air-gapped network?

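Yes. Presidio is pure Python and runs its NER models locally: spaCy by default, with stanza or Hugging Face transformers models as alternatives. Install the packages and model weights from an internal mirror ahead of time, since a missing spaCy model may otherwise trigger a download attempt at startup.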
