The trouble with sentiment analysis

Two things spurred me to write this post. First, I’d given the same advice three times which, according to David Robinson’s rule, meant it was time. And second, I came across this news story on a startup that claims it can detect student emotions over Zoom. With those things in mind, here is my very simple guidance on sentiment analysis:

You should almost never do sentiment analysis.


A picture of a stop sign against a blue sky. There are two wind turbines in the background. Photo by lamoix, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

Thanks for reading, hope that cleared things up. 🙂 In all seriousness, though, the places where it makes sense for a data scientist or NLP practitioner working in industry to use sentiment analysis are vanishingly rare. First, because it doesn’t work very well and second, because even when it does work it’s usually measuring the wrong thing.

What do I mean when I say that sentiment analysis doesn’t work very well?

Let’s consider the most common approach. You have a list of words that are “positive” and a list of words that are “negative” (or, more rarely, a list of words associated with different emotions) and you count how many words from each list appear in your target text. If there are more positive words you assign a positive sentiment, if there are more negative words a negative one, and if they are equal or (much more likely) none of the words on either list show up you assign a neutral sentiment. Of course, there are a variety of other, more sophisticated approaches, but this is the most common one.
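
To make that concrete, here is a minimal Python sketch of the word-counting approach. The two word lists and the function name `lexicon_sentiment` are made up for this example (real lexicons are far larger), and the tokenization is deliberately naive.

```python
import re

# Hypothetical toy word lists, purely for illustration.
POSITIVE = {"good", "great", "lovely", "helpful", "love"}
NEGATIVE = {"bad", "late", "terrible", "awful", "hate"}


def lexicon_sentiment(text: str) -> str:
    """Label text by counting words that appear on each list."""
    # Lowercase and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Ties and texts with no listed words both end up here.
    return "neutral"


print(lexicon_sentiment("The food was great"))      # positive
print(lexicon_sentiment("The meeting is at noon"))  # neutral
```

The second sentence comes back neutral simply because none of its words are on either list, which is the most likely outcome for short texts.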

As you can see, there’s a lot you’ll miss using this approach. The lists may not have been constructed with the producers of your target text in mind, for example. A list from five years ago may have “lit” as a word with positive sentiment, but what about “based”? If you’re attempting to characterize young internet users you’ll need to be very careful with your sentiment lists.

And a word-based approach cannot account for context. “Our flight was late but Cindy at O’Hare managed to change our connection”, for example, may have gotten a negative sentiment assigned to it due to “late” (assuming you’re working within the transportation domain) but in context this is actually a pretty positive review.

Plus, of course, sarcasm will be completely missed by such an approach. If someone says “lovely weather outside” in the middle of a tornado warning then you, as a human, know that they probably aren’t very happy about the weather. Which ties in to my next point: the question of what you’re measuring.

The specific thing you’re attempting to measure using sentiment analysis is the sentiment expressed in the text, but often you’ll see folks (generally tacitly) make a leap to assuming that what you’re measuring is how someone actually felt when they were writing the text. That’s pretty clearly not the case: I can write “I’m absolutely livid” and “I feel ecstatic joy” without experiencing those emotions and I’d expect most sentiment analysis tools would give those statements very strong negative and positive sentiments, respectively.

This is important because, generally, people tend to care more about what people are feeling than what they’re expressing in text. (A good counterexample would be a digital humanities project showing the sentiment of different passages in a novel.) And figuring out someone’s emotions from text is much, much more difficult and in most cases completely, utterly impossible. And speaking about what people are feeling….

Sentiment isn’t generally actually that useful

So given that sentiment analysis of text isn’t likely to tell you what people are feeling with any fidelity, where would you want to use it? A great question, and one that I think folks should ask more often. Usually when I see it being used, it’s in a case where there’s another, actually more useful, thing you want to know. Let’s look at some examples.

  • In a chatbot to know if the conversation is going well
    • Since most words are neutral and most turns are pretty short, you’re pretty unlikely to get helpful information unless things have already gone very, very wrong (as Zeerak Waseem points out, you can look for swearing). A far better thing to measure would be patterns that you shouldn’t see in efficient conversations, like lots of repetition or slight rephrasing.
  • To determine if reviews are good or bad
    • This one in particular I find baffling: most reviews are associated with a star rating, which is a clear measure directly from the person. A better use of time would probably be to do topic modelling or some other sort of unsupervised text analysis to see if there are persistent issues that should be addressed.
  • To predict customer churn based on call center logs
    • If you have the raw input text and the labelled outcomes (churned or not) then I’d just build a classifier directly on the raw text. You’re also likely to get more mileage out of other metadata features, like how often they’ve contacted support, number of support tickets filed or something similar. Someone can be very polite to a customer service rep and still churn because the product just doesn’t meet their needs.
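
To make the chatbot suggestion above a little more concrete, here is a minimal sketch of counting near-repeated user turns with Python’s standard `difflib` module. The function name `count_near_repeats` and the 0.8 similarity threshold are assumptions for illustration, not tuned values; you’d want to pick a threshold by looking at your own conversations.

```python
from difflib import SequenceMatcher


def count_near_repeats(user_turns, threshold=0.8):
    """Count user turns that closely resemble an earlier turn."""
    repeats = 0
    for i, turn in enumerate(user_turns):
        for earlier in user_turns[:i]:
            # ratio() is a character-level similarity between 0 and 1.
            similarity = SequenceMatcher(None, earlier.lower(), turn.lower()).ratio()
            if similarity >= threshold:
                repeats += 1
                break  # count each turn at most once
    return repeats


turns = [
    "I want to reset my password",
    "I want to reset my password please",
    "thanks, bye",
]
print(count_near_repeats(turns))  # 1
```

A high count relative to the length of the conversation suggests the user is having to repeat or rephrase themselves, whatever the sentiment of the individual words.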

In all of these cases “what is the sentiment being expressed in the text” just isn’t the useful question. Sure, it’s quick enough to do a sentiment analysis that you might add it as a new feature to just see if it adds anything… but I’d say your time would be better spent elsewhere.

So why do people still use sentiment analysis?

Great question. Probably one reason is that it’s often used as an example in teaching. It has a pretty simple intuition, there are lots of existing tools for it and it can help students develop intuitions about corpus methods. That means that a *lot* of people have done sentiment analysis already at some point, and it’s much simpler to use a method you are already familiar with. (I get it, I do it too.)

Another, more pernicious reason is that it’s harder to define a new problem (for which a tool or measure might not exist) than it is to redefine it as an existing one where a simple-to-use tool is already available. Even if that redefinition means that the work is no longer actually all that useful.

So, with all that said, the next time you’re thinking that you need to do sentiment analysis I’d encourage you to spend some time really considering if you’re sure before you decide to dive in.
