Text Analytics Spells “Big Savings”

Text analytics and natural language processing are extremely powerful concepts that are increasingly within organizations’ grasp. Many of the concepts for mining text to extract new information have existed since the mid-1980s, but with the rise of the data scientist the barrier to entry has been dramatically lowered. Before we talk about how text analytics might be useful to your organization, let’s establish a quick baseline of understanding.

What is Text Analytics?

Text analytics is roughly synonymous with text mining and text data mining. Technically it is not related to biblio-wizardry or vocabu-sorcery, but I’d still like to think there’s some magic left in the world. The whole idea behind text analytics is taking a body of text and extracting valuable, discrete, or new information. Think about your business, then think about how much of a paper trail there is: emails, contracts, invoices, industry publications, etc. Most organizations have an absolute mountain of text information that is likely providing little value right now beyond its original intended purpose.

(See more about turning data into insights: /data-activation-when-your-data-hands-you-lemons/)

What about Natural Language Processing?

Natural Language Processing (NLP) is a subset of text analytics that deals with aspects of language such as identifying parts of speech, disambiguation, and sentiment analysis, along with the other vagaries of human language that computers will soon be better at understanding than we are. That said, I’m afraid no amount of context clues can help me understand modern slang (https://thoughtcatalog.com/january-nelson/2018/09/millennial-slang/). I used to be cool, but now I’m just a data geek.

Text Analytics and Machine Learning

As you’d expect in the new frontier of data jiggery, there are quite a few different approaches to text analytics. Some of the more interesting approaches use machine learning to train a model on an existing corpus of text and apply that model to related text. Perhaps we’re looking to extract entities by identifying law firm names in a body of legal documents. Maybe we’re trying to measure a customer’s sentiment during a customer service call by identifying speech patterns and word choice. Maybe we’re trying to determine whether two historical works were actually written by the same author, or whether they’ve just been attributed to the same person. These are exciting use cases, and I doubt you have to think hard before you come up with something applicable to your own organization.

A Real World Example

UDig is working with an association that publishes scholarly articles. Their ask is to improve their ability to use the abstracts of the works to automatically match new content with specific peer reviewers. A high-level explanation of our approach to tackling the challenge follows.

First, we take the massive corpus of abstracts and do some simple pre-processing. We remove stop words (“the”, “and”, etc.) and stem words (e.g., change “monitoring” to “monitor”). Next, we calculate a metric called TF-IDF. TF-IDF (which stands for “term frequency–inverse document frequency”) essentially counts the appearances of a particular word in a document and then penalizes the word’s “score” if it appears in many different documents. For example, the word “the” (if it weren’t already removed by our stop word elimination) would appear quite frequently in a single document, but because it also appears in virtually every other document, it gets penalized to count for next to nothing. Conversely, if one article happens to be about “biblio-wizardry”, and only two other documents contain the term “biblio-wizardry”, we can start to assume those texts might be related, particularly as we assess other common terms across the documents.
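To make the pre-processing and scoring steps a little more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the code from the project: the stop-word list is deliberately tiny, `stem` is a crude suffix stripper standing in for a real stemmer, the function names and sample abstracts are made up, and the TF-IDF formula is the plain textbook variant (term share of the document multiplied by the log of total documents over documents containing the term).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "a", "of", "in", "on", "to", "is", "for"}


def stem(word):
    """Crude suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: score} dictionary per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical abstracts, invented for this example.
abstracts = [
    "Monitoring biblio-wizardry in the archive",
    "The archive of vocabu-sorcery",
    "Monitoring the archive",
]
scores = tf_idf(abstracts)
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this toy corpus “archive” shows up in every abstract, so its score in the first abstract drops to zero, while “biblio-wizardry”, which appears in only one, ends up as that abstract’s highest-scoring term.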

In this case, ranking scholarly articles using TF-IDF gives us a pretty good idea of when two documents are related and when they have little to do with each other. From there, we can take these terms and marry them up with peer reviewers. If we discover that one person has a penchant for reviewing articles about “biblio-wizardry” but never touches the (frankly more profane) “vocabu-sorcery”, we know how to route new abstracts as they come in by applying the same technique.
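One way the routing step could look is sketched below. It assumes each abstract has already been turned into a dictionary of TF-IDF weights, builds a profile per reviewer by summing the vectors of abstracts they have reviewed, and ranks reviewers by cosine similarity to a new abstract. Cosine similarity is my choice for this illustration; the reviewer names, weights, and function names are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reviewer_profiles(history):
    """Sum the TF-IDF vectors of every abstract each reviewer has handled."""
    profiles = defaultdict(lambda: defaultdict(float))
    for reviewer, vector in history:
        for term, weight in vector.items():
            profiles[reviewer][term] += weight
    return profiles


def rank_reviewers(new_vector, profiles):
    """Return (reviewer, similarity) pairs, best match first."""
    scored = [(reviewer, cosine(new_vector, profile))
              for reviewer, profile in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical review history: (reviewer, TF-IDF vector of the abstract).
history = [
    ("Reviewer A", {"biblio-wizardry": 0.37, "monitor": 0.14}),
    ("Reviewer A", {"biblio-wizardry": 0.25, "spell": 0.10}),
    ("Reviewer B", {"vocabu-sorcery": 0.40, "lexicon": 0.20}),
]
profiles = reviewer_profiles(history)

# Hypothetical TF-IDF vector for a newly submitted abstract.
new_abstract = {"biblio-wizardry": 0.30, "lexicon": 0.05}
ranking = rank_reviewers(new_abstract, profiles)
for reviewer, similarity in ranking:
    print(f"{reviewer}: {similarity:.2f}")
```

Here the new abstract leans heavily on “biblio-wizardry”, so Reviewer A comes out well ahead of Reviewer B. Summing vectors is the simplest possible profile; averaging or weighting recent reviews more heavily would be reasonable alternatives.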

How achievable is this?

The possibilities for text analytics are endless. While it can be challenging to extract the information and no two text analytics projects look the same, I believe there is an absolute treasure trove of value to be discovered. From automating discrete data identification to gaining a more holistic view of your customers, text analytics is worth investigating.

