Overview
- OpenAI released the open-weight Privacy Filter on Wednesday under the Apache 2.0 license, with downloads on GitHub and Hugging Face.
- The tool detects and masks eight types of sensitive information: private names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets such as passwords or API keys.
- By running locally, the model keeps unfiltered text on device so teams can clean emails, logs, or code before any chatbot or cloud service sees it.
- The 1.5-billion-parameter token-classification model labels text in a single pass, supports up to 128,000 tokens of context, and favors context-aware decisions over simple pattern matching.
- OpenAI reports 96% F1 on the PII-Masking-300k benchmark, rising to 97.43% after annotation fixes, but the company warns that the tool is not a compliance guarantee and calls for human review in high-risk domains.
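
To make the detect-then-mask step concrete, here is a minimal Python sketch of what happens after a token classifier has labeled a piece of text. It does not call Privacy Filter itself. The `Span` type, the `mask_spans` helper, and the label names `NAME` and `EMAIL` are hypothetical, and the spans are written by hand to stand in for model output, whose actual format is not described here.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical category name, e.g. "EMAIL"


def mask_spans(text: str, spans: list[Span]) -> str:
    """Replace each labeled span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # overlapping span: keep the earlier one, skip this one
        pieces.append(text[cursor:span.start])  # untouched text before the span
        pieces.append(f"[{span.label}]")        # placeholder for the sensitive text
        cursor = span.end
    pieces.append(text[cursor:])  # remainder after the last span
    return "".join(pieces)


# Hand-written spans standing in for classifier output (an assumption).
text = "Contact Jane Doe at jane@example.com."
spans = [Span(8, 16, "NAME"), Span(20, 36, "EMAIL")]
print(mask_spans(text, spans))  # Contact [NAME] at [EMAIL].
```

Because masking works only on character offsets, a local pipeline built this way could clean the text before it is sent anywhere, and only the placeholder version would leave the device.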