Fix missing Presidio recognizers for URL, US_SSN, CRYPTO, etc. (#69)
author Stefan Gasser <redacted>
Mon, 9 Feb 2026 08:05:13 +0000 (09:05 +0100)
committer GitHub <redacted>
Mon, 9 Feb 2026 08:05:13 +0000 (09:05 +0100)
commit 5871fa526a7a63c4c91af2d598736ef0cb6e9236
tree b5ecb0375b61094e2b39ee69dbb19e6103019210
parent 2d21638ce575f53a6fb5b64951f4f389ddbd78c7

The config generator included only 6 recognizers and omitted standard ones
such as UrlRecognizer, UsSsnRecognizer, and CryptoRecognizer. As a result,
detection failed when users enabled the corresponding entity types.

Changes:
- Add GLOBAL_RECOGNIZERS for pattern-based detection (7 recognizers)
- Add LANGUAGE_RECOGNIZERS for language-specific detection
- Only load language-specific recognizers when that language is configured
  - EN: US + UK recognizers (8)
  - ES: Spanish NIF/NIE (2)
  - IT: Italian documents (5)
  - PL: Polish PESEL (1)
  - KO: Korean RRN (1)
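
The selection rule described above could be sketched roughly as follows. This is a minimal illustration, not the code in generate-configs.py: the helper name `select_recognizers` is hypothetical, and the lists are a deliberately short subset of Presidio recognizer class names, so they do not match the counts given in the list above.

```python
# Illustrative subset only; the actual lists in the commit are longer.
GLOBAL_RECOGNIZERS = ["UrlRecognizer", "CryptoRecognizer", "EmailRecognizer"]

# Language-specific recognizers, keyed by language code (subset, assumed layout).
LANGUAGE_RECOGNIZERS = {
    "en": ["UsSsnRecognizer", "NhsRecognizer"],
    "es": ["EsNifRecognizer", "EsNieRecognizer"],
    "pl": ["PlPeselRecognizer"],
}


def select_recognizers(languages):
    """Return recognizer names for the configured languages (hypothetical helper)."""
    # Pattern-based recognizers are always included.
    names = list(GLOBAL_RECOGNIZERS)
    for lang in languages:
        # Language-specific recognizers are added only for configured languages.
        names.extend(LANGUAGE_RECOGNIZERS.get(lang.lower(), []))
    return names
```

With only `pl` configured, this sketch returns the global recognizers plus PlPeselRecognizer, and no US, UK, or Spanish recognizers are loaded.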

Fixes #67
docker/presidio/generate-configs.py