Consulting and Hugging Face Transformers: A Match Made in Heaven

Matthew Cintron and Thomas Dunlap, Specialist Leaders

Whether you work in the Federal sector, the private sector, or any sector, you have to deal with plenty of text. Emails, contracts, documentation, research papers, and more make up a tsunami of words that need your attention, your time, and, of course, your money (per the age-old equation: Time = Money).


What if there were multiple state-of-the-art Natural Language Processing (NLP) pipelines at your disposal that could take care of the majority of pre-processing, training, and prediction steps? “Surely you would need a legion of Ph.D.s,” you say? Not at all, if you know about Hugging Face and their Transformers pipelines.

At Attain, we have used Hugging Face Sentiment Analysis Pipelines, fine-tuned to score fraudulent words and phrases, in fraud and discrimination investigations. These investigations would normally take a team about four weeks to manually collect and analyze scattered internet sources and human contacts. The AI investigation app now reaches the same outcome in about four minutes of processing, roughly a 10,000-fold reduction in investigation time!
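For the skeptics, here is the back-of-the-envelope arithmetic behind that speedup:

```python
# Four weeks of manual work vs. four minutes of automated processing
minutes_per_week = 7 * 24 * 60             # 10,080 minutes in a week
manual_minutes = 4 * minutes_per_week      # 40,320 minutes of manual effort
automated_minutes = 4                      # the AI app's processing time
speedup = manual_minutes / automated_minutes
print(f"{speedup:,.0f}x faster")           # roughly 10,000x
```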

We have also created automated meeting notes by feeding Microsoft Stream transcriptions through the Summarization Pipeline. Additionally, we built a legal clause classifier that judges the acceptability of contract clauses via a fine-tuned Sentence Classification Pipeline, and we extracted keyword metadata from documents using the Named Entity Recognition Pipeline. The list of possible uses goes on and on!
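To give a flavor of how little code those use cases require, here is a minimal sketch using the pipelines' default models (the transcript and document text below are invented for illustration):

```python
from transformers import pipeline

# Summarization: condense a meeting transcript into notes
summarizer = pipeline("summarization")
transcript = (
    "The project kickoff covered the migration timeline, staffing, "
    "and the budget review scheduled for next quarter. The team agreed "
    "to deliver the first milestone by the end of March."
)
notes = summarizer(transcript, max_length=40, min_length=10)
print(notes[0]["summary_text"])

# Named Entity Recognition: pull keyword metadata out of a document
ner = pipeline("ner", aggregation_strategy="simple")
entities = ner("Attain partnered with Microsoft on the Stream transcription project.")
for ent in entities:
    print(ent["entity_group"], ent["word"])
```

The default models download automatically on first use; for production you would typically pin a specific model checkpoint instead.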

What NLP Tasks Can Hugging Face Pipelines Perform?

You might as well ask what tasks Hugging Face Pipelines cannot perform. Here is an abridged list of what they can do:

  • Summarization:  Get the most important parts of large amounts of text.
  • Conversational:  Provide a context and have a conversation!
  • Sentence Classification:  e.g., Sentiment Analysis!  Input text and output a class value.
  • Feature Extraction:  Extract embeddings for use in other ML tasks.
  • Token Classification:  e.g., Named Entity Recognition.
  • Question Answering:  Answer questions based on a provided context.
  • Zero Shot Classification:  Provide topics and text, and score how likely the text is about each topic.
  • Translation:  Language translation.
  • Text Generation:  Generate the most likely continuations of short phrases.
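All of these tasks share the same one-line entry point: `pipeline()`. A minimal sketch of two of them, using the default models and invented example sentences:

```python
from transformers import pipeline

# Sentiment analysis: text in, class label and confidence out
classifier = pipeline("sentiment-analysis")
result = classifier("Hugging Face pipelines saved our team weeks of work!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Zero-shot classification: score how likely the text is about each topic
zero_shot = pipeline("zero-shot-classification")
scored = zero_shot(
    "This contract clause limits liability for data breaches.",
    candidate_labels=["legal", "sports", "cooking"],
)
print(scored["labels"][0])  # the most likely topic
```

Swapping tasks is just a matter of changing the string passed to `pipeline()`.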

Pre-Trained by Ph.D.s

Normally, you would need large amounts of text data to train a neural-network NLP model, but BERT comes pre-trained. Imagine traditional NLP models as an average person trying to learn to cook a perfect steak, and pre-trained Transformer models as a professional chef learning the same new recipe. The chef will need far less instruction and repetition to get the desired result than someone with no training. In fact, the chef probably already cooks something close to a perfect steak! The difference between using a pre-trained model and training a neural network from scratch is that you can take advantage of the pre-trained model’s “skills,” taught by the “masters” at Google, Facebook, et al., without having to provide the same training yourself.

For example, the BERT BASE model was trained on 4 Cloud TPUs for 4 days on 16GB (3.3 billion words) of Wikipedia and BookCorpus data, and BERT LARGE was trained on 16 Cloud TPUs. These are time and financial costs your company no longer has to incur. This pre-training creates a model with a semantic understanding of the English language (and many other languages). Because the model already has this understanding, it can be “adjusted” to a specific NLP task with far less data than would traditionally be necessary to create a usable model. Basically, someone else bakes the cake, frosts it, decorates it, and writes “Happy Birthday, Grandma!” on it, and you just light the candles and collect all the praise and credit that Gram has to offer!
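Lighting the candles looks like this in practice: download the pretrained weights, attach a task-specific head, and fine-tune on your (much smaller) labeled dataset. A sketch, assuming a hypothetical two-class task such as acceptable vs. unacceptable contract clauses:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Download BERT's pretrained weights -- the part Google already paid for
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # e.g. acceptable vs. unacceptable clauses
)

# Tokenize one example; fine-tuning would batch these through the Trainer API
inputs = tokenizer("This clause is acceptable.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]) -- one score per class
```

From here, a few hundred labeled examples are often enough to adapt the model, versus the billions of words the pre-training consumed.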

I’m Intrigued by These Hugging Face Pipelines, But Not Convinced…

Fair enough. If you need more convincing, click here to read an article written by a Hugging Face Summarization Pipeline: one that I did not train, barely pre- and post-processed, and only lightly edited after the fact. I scraped text from the top Google Search results and fed it through the summarizer. How much time did I spend training the models to do this task reasonably well? Zero. How many rows of data did I need to train the model? Zero. How much did I need to understand about multi-head self-attention (queries, keys, values) or the Transformer architecture? Guess. Did you guess “zero”? Because it’s zero.