COMPRISE NLP TOOLS



Text Transformer

An open-source tool for text de-identification

Many documents contain private information. When these documents are shared with others, there is a risk that this information may fall into the wrong hands. The privacy of the people involved is worth protecting, but doing so entirely by hand is cumbersome and costly.

The COMPRISE Text Transformer is an open-source tool that helps protect private information in text. It automatically identifies different types of potentially private information (names, organizations, locations, etc.) and offers different strategies to cope with them.

The COMPRISE Text Transformer lets you transform sentences containing privacy-critical information, such as personal names, locations, organization names, and time information, into similar-looking but neutral variants:

Original: Mrs. Johnson from Seattle, WA, has worked for Boeing for 29 years.
Redact: Mrs.   from  , has worked for   for   years.
Word-by-word replacement: Mrs. Smith from Boston, Miami, has worked for SAP for 4 years.
Full entity replacement: Mrs. Vernon from Frankfurt, has worked for Tesla for 15 years.
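
The three strategies shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Text Transformer's actual code or API: the `Span` type, the `POOLS` replacement lists, the `find_span` helper, and the function names are all assumptions made for this example, and the entity spans are supplied by hand rather than detected automatically.

```python
import random
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """A detected entity: character offsets plus an entity label."""
    start: int
    end: int
    label: str


# Hypothetical replacement pools; a real tool would use far larger lists.
POOLS = {
    "PERSON": ["Smith", "Vernon", "Miller"],
    "LOCATION": ["Boston", "Miami", "Frankfurt"],
    "ORGANIZATION": ["SAP", "Tesla"],
    "TIME": ["4", "15"],
}


def find_span(text: str, surface: str, label: str) -> Span:
    """Helper for this example: locate the first occurrence of an entity string."""
    start = text.index(surface)
    return Span(start, start + len(surface), label)


def _rewrite(text: str, spans: List[Span], fn: Callable[[str, str], str]) -> str:
    """Rebuild the text, passing each entity's surface form and label to fn."""
    out, pos = [], 0
    for span in sorted(spans):  # sorted by start offset
        out.append(text[pos:span.start])
        out.append(fn(text[span.start:span.end], span.label))
        pos = span.end
    out.append(text[pos:])
    return "".join(out)


def redact(text: str, spans: List[Span], mask: str = "") -> str:
    """Replace every entity with a fixed mask (empty by default)."""
    return _rewrite(text, spans, lambda surface, label: mask)


def replace_word_by_word(text: str, spans: List[Span], rng: random.Random) -> str:
    """Replace each word inside an entity independently, keeping punctuation."""
    def fn(surface: str, label: str) -> str:
        return re.sub(r"\w+", lambda m: rng.choice(POOLS[label]), surface)
    return _rewrite(text, spans, fn)


def replace_full_entity(text: str, spans: List[Span], rng: random.Random) -> str:
    """Replace each entity as a whole with a single substitute of the same type."""
    return _rewrite(text, spans, lambda surface, label: rng.choice(POOLS[label]))


text = "Mrs. Johnson from Seattle, WA, has worked for Boeing for 29 years."
spans = [
    find_span(text, "Johnson", "PERSON"),
    find_span(text, "Seattle, WA", "LOCATION"),
    find_span(text, "Boeing", "ORGANIZATION"),
    find_span(text, "29", "TIME"),
]
rng = random.Random(0)  # seeded so repeated runs give the same substitutes

print(redact(text, spans, mask="[X]"))
print(replace_word_by_word(text, spans, rng))
print(replace_full_entity(text, spans, rng))
```

In this sketch, word-by-word replacement turns the two-word location "Seattle, WA" into two independently chosen place names, while full entity replacement substitutes a single place name for the whole span, which matches the difference visible in the example sentences above.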

Benefits & Features

The COMPRISE Text Transformer is a great tool to help with large-scale de-identification of text documents.

  • Runs on any platform that supports Python 3: Windows, macOS, Linux
  • Three transformation strategies to choose from
  • Freely available

Additional information can be found in one of our research papers.

Get it now!

Download the COMPRISE Text Transformer.


Weakly Supervised Learning for NLU

An open-source tool for machine learning that requires little to no labeling

Supervised machine learning methods based on neural networks dominate many Natural Language Understanding (NLU) tasks. However, they often require large quantities of labeled examples to train, and high-quality labeling is a cumbersome and costly effort.

The COMPRISE Weakly Supervised Learning for NLU library is an open-source tool that lets you perform tasks such as Named Entity Recognition with significantly lower labeling requirements and without a major loss in performance.
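
To give a rough sense of what "lower labeling requirements" can mean in practice, the sketch below shows one common form of weak supervision: producing noisy token labels automatically from a lookup list instead of annotating by hand. It is a generic illustration only and does not claim to reflect how the COMPRISE library works; the `GAZETTEER` entries and the `weak_label` function are hypothetical.

```python
from typing import Dict, List

# Hypothetical lookup list mapping known names to entity labels.
GAZETTEER: Dict[str, str] = {
    "johnson": "PER",
    "seattle": "LOC",
    "boeing": "ORG",
}


def weak_label(tokens: List[str]) -> List[str]:
    """Assign a noisy label to each token by gazetteer lookup; 'O' means no entity."""
    return [GAZETTEER.get(tok.lower().strip(".,"), "O") for tok in tokens]


tokens = "Mrs. Johnson from Seattle has worked for Boeing".split()
labels = weak_label(tokens)
for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```

Labels obtained this way are imperfect (names missing from the list are left unlabeled, and ambiguous words may be mislabeled), which is why such data is usually treated as noisy training material rather than as ground truth.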

Benefits & Features

The COMPRISE Weakly Supervised Learning for NLU library is a great tool to help with large-scale classification tasks.

  • Runs on any platform that supports Python 3: Windows, macOS, Linux
  • Contains a general neural network-based approach applicable to many supervised NLU tasks
  • Contains a specialized approach for Named Entity Recognition that requires no training at all (“zero-shot”)
  • Freely available

Additional information can be found in two of our public reports (here and here).

Get it now!

Download the COMPRISE Weakly Supervised Learning for NLU library.


Contact: thomas.kleinbauer@lsv.uni-saarland.de