Text Analytics Spells “Big Savings”

Text analytics and natural language processing are extremely powerful concepts that are increasingly within organizations’ grasp. Many of the concepts for mining text to extract new information have existed since the mid-1980s, but with the rise of the data scientist, the barrier to entry has been dramatically lowered. Before we talk about how text analytics might be useful to your organization, let’s establish a quick baseline of understanding.

What is Text Analytics?

Text analytics is roughly synonymous with text mining and text data mining. Technically it is not related to biblio-wizardry or vocabu-sorcery, but I’d still like to think there’s some magic left in the world. The whole idea behind text analytics is taking a body of text and extracting valuable, discrete, or new information. Think about your business, then think about how much of a paper trail there is: emails, contracts, invoices, industry publications, etc. Most organizations have an absolute mountain of text information that is likely providing little value right now beyond its original intended purpose.

(See more about turning data into insights: /data-activation-when-your-data-hands-you-lemons/)

What about Natural Language Processing?

Natural language processing (NLP) is a subset of text analytics that deals with aspects of language such as identifying parts of speech, disambiguation, sentiment analysis, and the other vagaries of human language that computers will soon understand better than we do. I’m afraid, though, that no amount of context clues can help me understand modern slang (https://thoughtcatalog.com/january-nelson/2018/09/millennial-slang/). I used to be cool, but now I’m just a data geek.

Text Analytics and Machine Learning

As you’d expect in the new frontier of data jiggery, there are quite a few different approaches to text analytics. Some of the more interesting approaches use machine learning to train a model on an existing corpus of text and then apply that model to related text. Perhaps we’re looking to extract entities by identifying law firm names in a body of legal documents. Maybe we’re trying to measure a customer’s sentiment during a customer service call by identifying speech patterns and word choice. Maybe we’re trying to determine whether two historical works were actually written by the same author, or have just been attributed to the same person. These are exciting use cases, and I doubt you have to think hard before you come up with something applicable to your own organization.

A Real World Example

UDig is working with an association that publishes scholarly articles. They asked us to improve their ability to use the abstracts of the works to automatically match new content with specific peer reviewers. A high-level explanation of our approach to tackling the challenge follows.

First, we take the massive corpus of abstracts and do some simple pre-processing. We do things like remove stop words (“the”, “and”, etc.) and stem words (e.g., change “monitoring” to “monitor”). Next, we calculate a metric called TF-IDF. TF-IDF (which stands for “term frequency–inverse document frequency”) essentially counts the appearances of a particular word in a document and then penalizes the “score” for that word if it appears in many different documents. For example, the word “the” (if it weren’t already removed by our stop word elimination) would appear quite frequently in a single document, but because it appears in every document, it gets penalized until it counts for nothing. Conversely, if one article happens to be about “biblio-wizardry”, and only two other documents contain the term “biblio-wizardry”, we can start to assume those texts might be related, particularly as we assess other common terms across the documents.
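To make the steps above concrete, here is a minimal sketch in plain Python. It is not UDig's actual implementation: the stop word list, the `naive_stem` helper, and the sample documents are all hypothetical, and the stemmer is a toy that only strips a few suffixes (a real project would more likely use an established stemmer and a larger stop word list). The weighting shown is one common TF-IDF variant, term frequency times the log of inverse document frequency.

```python
import math
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "a", "of", "is", "in", "to", "about"}


def naive_stem(word):
    """Toy stemmer: strip a few common suffixes, keeping at least 3 letters."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, trim punctuation, drop stop words, then stem."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return [naive_stem(t) for t in tokens if t and t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append(
            {
                term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Hypothetical mini-corpus standing in for the abstracts.
docs = [
    "The wizard is monitoring the library",
    "The library and the archive",
    "Monitoring the library spells",
]
scores = tf_idf(docs)
```

In this toy corpus, “library” appears in every document, so its inverse document frequency is log(1) and its score is zero, while “wizard” appears in only one document and scores highest there.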

In this case, ranking scholarly articles using TF-IDF gives us a pretty good idea of when two documents are related and when they have little to do with each other. From there, we can take these terms and marry them up with peer reviewers. If we discover that one person has a penchant for reviewing articles about “biblio-wizardry” but never touches the (frankly more profane) “vocabu-sorcery”, we know how to route new abstracts as they come in by applying the same technique.
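One way to sketch the routing step is to build a TF-IDF profile for each reviewer from the abstracts they have already reviewed, then rank reviewers by cosine similarity to a new abstract. Cosine similarity is an assumption here, since the post does not say which similarity measure is used, and the reviewer names, history text, and function names below are hypothetical. Stop word removal and stemming are omitted to keep the example short.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation trimmed."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def build_idf(docs):
    """Inverse document frequency for every term seen in docs."""
    doc_freq = Counter(term for d in docs for term in set(tokens(d)))
    return {term: math.log(len(docs) / df) for term, df in doc_freq.items()}


def vectorize(text, idf):
    """TF-IDF vector as a dict; terms unknown to idf are ignored."""
    counts = Counter(tokens(text))
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def route(new_abstract, reviewer_history):
    """Return reviewer names ordered from best to worst match."""
    idf = build_idf(list(reviewer_history.values()))
    profiles = {name: vectorize(text, idf) for name, text in reviewer_history.items()}
    query = vectorize(new_abstract, idf)
    return sorted(profiles, key=lambda name: cosine(query, profiles[name]), reverse=True)


# Hypothetical review history: all abstracts each person has reviewed, joined.
history = {
    "reviewer_a": "biblio-wizardry in medieval libraries biblio-wizardry catalog spells",
    "reviewer_b": "vocabu-sorcery and dictionary hexes vocabu-sorcery in schools",
}
ranking = route("A survey of biblio-wizardry catalog methods", history)
```

Because the new abstract shares “biblio-wizardry” and “catalog” only with the first reviewer's history, that reviewer ranks first. With a realistic corpus, the profiles would be built from many abstracts per reviewer rather than a single joined string.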

How achievable is this?

The possibilities for text analytics are endless. While it can be challenging to extract the information, and no two text analytics projects look the same, I believe there is an absolute treasure trove of value to be discovered. From automating discrete data identification to gaining a more holistic view of your customers, text analytics is worth investigating.

 

 
