AI for Organizing – Categorizing and Managing Data at Scale

So far in this series on Artificial Intelligence (Artificial Intelligence: Monster or Mentor? and AI for Summarization – Enabling Human-Consumable Information), we have looked at several key ways in which AI advances can improve human productivity in organizations. The previous article dove into Distillation: automating the path to value. In this article, we’ll look at the next common approach: Categorization.

Categorization applies AI approaches to automate the labeling and organization of large data volumes, so that data can be routed, processed, and interpreted in the right way. Imagine an enormous coin sorter that takes dump truck loads of coins (a mixture of currencies from across the globe) and produces nicely sorted buckets of quarters, nickels, etc. for each currency. This is a poster-child example of categorization – the categories are well understood (we know up front all the possible “buckets” that coins can land in), and the sorter sorts accurately into categories. In many real-life applications, and especially when we are categorizing large volumes of data, we aren’t this lucky. We (a) might not know what the “buckets” should be, and we (b) often make mistakes in categorization.

Topic Modelling is a great example of a machine learning approach to this challenge. 

Imagine you had every article ever written in the New York Times … but you didn’t know which section each article came from (Business? Sports? Opinions?). Topic Modelling methods, such as Latent Dirichlet Allocation (LDA), can learn and infer naturally occurring “buckets” of articles or, as we call them in this case, topics.

This is a standard “hello world” example, but the approach is immensely useful in business. For example, these techniques can enable large-scale, minimally supervised categorization of inbound customer emails. Many organizations receive more email than their sorting/routing teams can handle, and in today’s instant social media world, it’s crucial not to drop the ball on important customer communication. Applied to this problem, the algorithms can learn and identify the inherent structure in the correspondence and, with minimal human intervention, help route large volumes of email to the correct teams.
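To make the topic modelling idea a little more concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. The toy headlines, the function name lda_gibbs, and the hyperparameter values (alpha, beta, iterations) are illustrative assumptions rather than anything from a real system, and a production project would more likely use an established library implementation.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (theta, top_words): per-document topic mixtures and the
    most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for word in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample its topic given all the other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [word for word, count in topic_word[k].most_common(3) if count > 0]
        for k in range(num_topics)
    ]
    return theta, top_words


# Hypothetical, already-tokenized headlines with no section labels attached.
headlines = [
    "team wins game in final season match".split(),
    "coach praises team after game".split(),
    "market stocks fall as bank shares drop".split(),
    "bank profits lift market and stocks".split(),
]
theta, top_words = lda_gibbs(headlines, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Note that the algorithm only returns numbered topics with their most common words. Deciding that one of them means “Sports” is still left to a person.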

Optimizing outbound customer communication is another good example of how categorization can have a large impact on business. Customer outreach and marketing campaigns are frequently plagued by low conversion rates: too few customers click on or respond to the emails, ads, and other messages they are sent. There are many factors at play, but a significant one is how well the message is tailored to each customer’s interests and needs. A generic listing of items on sale doesn’t create the same interest as a customized list based on the customer’s interests.

This approach, often called market segmentation, enables organizations to identify groupings of their customers with shared interests and then customize email communications for each segment. Market segmentation leads to increased conversion rates and a better experience for the customer. 
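One common way to find such segments is clustering. The article doesn’t name a specific algorithm, so the sketch below assumes k-means with a simple farthest-point initialization, written in plain Python. The customer names, the three spending features, and the choice of three segments are all made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Farthest-point initialization: start from the first point, then keep
    # adding the point that is farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical purchase counts per customer: (outdoor gear, books, home goods).
customers = {
    "ana": (9, 1, 0),
    "ben": (8, 2, 1),
    "cat": (1, 9, 0),
    "dan": (0, 8, 2),
    "eve": (1, 0, 9),
    "fay": (2, 1, 8),
}
names = list(customers)
labels, centroids = kmeans([customers[name] for name in names], k=3)

# Group customer names by the segment they were assigned to.
segments = {}
for name, label in zip(names, labels):
    segments.setdefault(label, []).append(name)
print(segments)
```

Each resulting segment can then receive its own tailored email, for example a list of items on sale weighted towards the category that segment buys most.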

Categorization has numerous other applications, but its impact is frequently greatest when a human is in the loop, as they gain most from having some inherent structure or organization (categories, if you will) applied to the enormous data volumes they’re trying to understand. Typically, this doesn’t solve the whole problem (e.g., in the newspaper article case, someone still needs to say these articles are “Sports”) but it makes the human drastically more efficient and productive.

The next article will wrap up this series, and we’ll take a look at how AI techniques for Prediction can enable humans to more efficiently find the figurative “needles in the haystack”.


Roy Wilds is the Chief Data Scientist at PHEMI Systems, a big data warehouse solutions company.

AI for Summarization – Enabling Human-Consumable Information

In the first post in this series, Artificial Intelligence: Monster or Mentor?, we saw that there are several key ways in which AI advances can improve human productivity in organizations. In this article, we’ll look at the first: Distillation.

Distillation is applying AI approaches to automate making large data volumes interpretable. Just like miners distill tons of raw ore into ounces of gold using machines, the goal is to automate the identification of value in big data. Here, we’ll focus specifically on how Distillation can be applied to the business problem of customer experience.

Companies interact with their customers in more and more ways, across ever-increasing numbers of service channels: call centers, web-chat, email, automated chat-bots, social media—the list goes on. A growing challenge is to understand your customer’s experience, even as they traverse this massive web of communications and interactions. Being able to distill answers to simple questions like the following can deliver enormous business value.

  • Why are they contacting us?
  • How can we most effectively interact in order to reduce service channel costs? 
  • What can we do to make this a positive interaction?
  • Where/when should we intercede in the future to pre-empt the need for contacting us?

A brute-force, manual analysis of the raw interactions is just not possible. It’s true that a lot of insight can be gathered from analyzing certain specific interaction data. The challenge is knowing which needle in the haystack to focus on … or in this case which needle in the stack of needles to focus on.

Through combinations of network analysis, temporal pattern mining, and interactive analysis, it’s now possible to leverage AI-assisted technologies that enable humans to answer business-oriented questions like those above to identify service optimizations and cost reductions, and deliver a better customer experience at the same time.

For instance, as customers traverse a business’s service channels, network analysis metrics like betweenness centrality[1] can identify “choke points” that customers are commonly funneled through. Such metrics let analysts focus their search and expose important interaction steps that can be optimized. As an example, these kinds of metrics can identify important patterns, such as cases where automated emails are key points of customer engagement. You might discover that having meaningful, customized emails instead of generic one-size-fits-all communications results in far-reaching impacts in customer interactions, because of the network effects such choke points create.
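As a rough illustration of how a choke point shows up in the numbers, the sketch below computes unnormalized betweenness centrality on a tiny directed graph of customer journey steps, using Brandes’ algorithm in plain Python. The channel names and transitions are hypothetical, and graph libraries such as NetworkX provide a ready-made betweenness centrality function that would be the more practical choice on real data.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness for a directed, unweighted graph.

    graph maps each node to a list of distinct successor nodes.
    """
    nodes = set(graph) | {w for successors in graph.values() for w in successors}
    centrality = {v: 0.0 for v in nodes}

    for source in nodes:
        # Breadth-first search from source, counting shortest paths.
        order = []                          # nodes in order of discovery
        predecessors = {v: [] for v in nodes}
        path_count = {v: 0 for v in nodes}  # number of shortest paths from source
        distance = {v: -1 for v in nodes}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, []):
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    return centrality


# Hypothetical transitions between service channels observed in customer journeys.
journey = {
    "web_chat": ["automated_email"],
    "call_center": ["automated_email"],
    "social_media": ["automated_email"],
    "automated_email": ["account_portal", "agent_callback"],
}
scores = betweenness_centrality(journey)
choke_point = max(scores, key=scores.get)
print(choke_point, scores[choke_point])
```

In this toy graph every route from an inbound channel to a follow-up step passes through the automated email, so it is the only node with a non-zero score. That is the kind of signal that tells an analyst where to look first.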

That’s Distillation for communication patterns. But beyond communication pattern analysis, AI approaches based on NLU (Natural Language Understanding) offer insight into the communications themselves. AI based on NLU provides opportunities to distill, and quantify, the meaningful aspects of natural language interactions (emails, call transcriptions, etc.) associated with the customer journey. As Narrative Science observes:

Until the last few years, NLP has been the more dynamic research area; the focus was on getting more data into the computer (e.g. teaching the machine how to “read” an email and determine if it’s likely to be spam). The problem has now flipped. Our computers have access to vast repositories of data, and the problem is trying to get actual value and insights back out from all that data.[2]

That’s a brief look at Distillation. The next article will look at Categorization applications: the way data tends to percolate through organizations by moving from one bucket into the next, being enriched, processed, and acted upon along the way. See you next time!


Artificial Intelligence: Monster or Mentor?

Artificial Intelligence (AI) is everywhere these days. It’s heralded as both the greatest thing since sliced bread (freeing us from driving cars, diagnosing diseases better, and so on) and the worst thing imaginable (displacing millions of jobs, and a step towards the inevitable AI domination of humans).

Lost in this hyperbole are the many simple, yet effective, enabling innovations that AI makes possible. Just like we rely on machines in the physical world to excavate holes for buildings or transport people or cargo long distances, we increasingly rely on machine algorithms such as machine learning (ML) models in the online, networked world. These innovations enable us to keep our email from overflowing with spam and to index and catalog enormous volumes of text for simple and fast retrieval, along with a wide range of other efficiencies. A recent article in Harvard Business Review[1] touched on this, highlighting the risks of large – yet ambiguous – AI projects compared to the measured possibilities businesses can undertake.

Artificial intelligence is a hot topic right now. Driven by a fear of losing out, companies in many industries have announced AI-focused initiatives. Unfortunately, most of these efforts will fail. They will fail not because AI is all hype, but because companies are approaching AI-driven innovation incorrectly. And this isn’t the first time companies have made this kind of mistake.

In this blog series we’re going to dive deeper into several exciting examples where AI enables human workers to function at a far greater level of productivity than they would otherwise. The productivity gain is realized through three main mechanisms, which often overlap:

  • Distillation — The ultimate summarizer. Crawling and analyzing enormous volumes of text, numbers, and data to generate a human-consumable concise summary.
  • Categorization — The ultimate sorter and router. Finding global patterns in enormous datasets to allow you to organize data at large scales.
  • Prediction — The ultimate assistant. Learning from human behavior and feedback to replicate and automate common tasks. 

These mechanisms are the core conceptual elements of many AI applications, and they aren’t new. However, here we’re going to emphasize the machine-human interaction they involve.

All too often, we data scientists and engineers get lost in the technical details of our algorithms and code, forgetting about the human that is intended to benefit from the work we immerse ourselves in. And I’m not just talking about ignoring the end-user interface. When we scientists and engineers focus on how end-users can benefit from AI capabilities throughout the process, as viewed through the lens of distilling, categorizing, and predicting, we can genuinely help make people more productive.

The three case studies we’re going to focus on will touch on each of these mechanisms in turn.

In the next article, we’ll take a look at the first mechanism, distillation. We’ll take as our example the customer journey challenge. We’ll explore how, through combinations of network analysis, temporal pattern mining, and interactive analysis, we can build AI-assisted technologies that enable humans to answer questions about the customer journey, identify service optimizations and cost reductions, and deliver a better customer experience.

Subsequent articles will touch on:

  • Categorization – exploring how email triage can enable fixed-resource teams to keep pace with growing volumes of email traffic
  • Prediction – reviewing an example of how AI-assisted medical diagnosis can enhance medical care and accuracy 
