PIISA

PIISA (Personally Identifiable Information Standard Architecture) is a set of tools to detect and remediate PII in large-scale language data. It builds on established tools such as the 🤗 Transformers library, spaCy, regular expressions, Faker and Presidio, so that data privacy can be managed in accordance with your privacy policies.

Important links:

  1. PIISA API docs
  2. Blog
  3. LinkedIn

This demo uses the multilingual WikiNEural model from Babelscape.

▵ We welcome feedback and suggestions; please open a new thread in the Discussions tab ▵

Policies are defined as follows:

  1. Annotate - replace each PII instance with a <TYPE:VALUE> string, i.e. include both the PII type and its original value
  2. Redact - replace each PII instance with a generic <PII> string
  3. Placeholder - replace each PII instance with a fixed prototypical value for its type
  4. Synthetic - replace each PII instance with generated synthetic data
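
The four policies above can be illustrated with a minimal Python sketch. This is not the PIISA API: the `PiiSpan` class, the `transform` function, the `span_of` helper and the small value tables are all hypothetical names introduced here for illustration, and the synthetic policy picks from a tiny hard-coded pool rather than calling Faker. It assumes that detection has already produced non-overlapping character spans.

```python
import random
from dataclasses import dataclass


@dataclass
class PiiSpan:
    """A detected PII instance (hypothetical structure, not the PIISA one)."""
    type: str   # e.g. "PERSON", "EMAIL"
    start: int  # index of the first character of the span
    end: int    # index one past the last character of the span


# Illustrative value tables; a real system would use richer sources.
PLACEHOLDERS = {"PERSON": "John Doe", "EMAIL": "user@example.com"}
SYNTHETIC_POOL = {
    "PERSON": ["Alice Smith", "Bob Jones"],
    "EMAIL": ["a.smith@example.org", "bjones@example.net"],
}


def span_of(text: str, value: str, pii_type: str) -> PiiSpan:
    """Build a span for the first occurrence of `value` in `text`."""
    start = text.index(value)
    return PiiSpan(pii_type, start, start + len(value))


def transform(text, spans, policy, rng=None):
    """Apply one of the four policies to every span (assumed non-overlapping)."""
    rng = rng or random.Random()
    pieces, pos = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        value = text[span.start:span.end]
        if policy == "annotate":
            replacement = f"<{span.type}:{value}>"
        elif policy == "redact":
            replacement = "<PII>"
        elif policy == "placeholder":
            # Fall back to the generic marker for unknown types.
            replacement = PLACEHOLDERS.get(span.type, "<PII>")
        elif policy == "synthetic":
            pool = SYNTHETIC_POOL.get(span.type)
            replacement = rng.choice(pool) if pool else "<PII>"
        else:
            raise ValueError(f"unknown policy: {policy}")
        pieces.append(text[pos:span.start])  # untouched text before the span
        pieces.append(replacement)
        pos = span.end
    pieces.append(text[pos:])  # untouched text after the last span
    return "".join(pieces)


text = "Contact Jane Roe at jane@roe.org."
spans = [
    span_of(text, "Jane Roe", "PERSON"),
    span_of(text, "jane@roe.org", "EMAIL"),
]
for policy in ("annotate", "redact", "placeholder", "synthetic"):
    print(f"{policy:12s} {transform(text, spans, policy)}")
```

Rebuilding the string from left to right, rather than replacing in place, keeps the span offsets valid even when a replacement differs in length from the original value.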

For more information on the transformation policies, please refer to the guide here