Language is tricky, because the same words can mean different things to different people, and because it includes things like irony and sarcasm. In many respects language is a bit like going out for a drive. You see lots of road signs. Some are purely informative, like road names and house numbers; these would be the factual, dispassionate elements of language. Others are more instructive, like the signs telling you which road to take to the next town; these would be the more compelling elements of language, telling you something you might not know. Finally there are the warning signs: slow down, sharp corner ahead. These would be the shock-and-awe elements of speech, the sort of language which grabs your attention.
For example, some might lend gravity to their words by swearing; some readers might be swayed by that while others are turned off. As a media research consultant, one of my trickiest tasks is consistently assuming the role of the average reader or viewer and scoring coverage accordingly. Many enterprise monitoring and evaluation programs offer an automatic sentiment checker, and it has been a theme of mine to highlight their inaccuracy; many are no better than tossing a coin! My preferred course of action is to take a sample of the coverage, normally 25% of it or 400 clips, whichever is bigger, and sentiment-score these by hand. This gets you to within roughly a 4% margin of error, a pretty good qualitative measure.
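If you want a quick sanity check on that sample size, the standard margin-of-error formula for a proportion is enough. Here is a minimal sketch in Python; the function name and the 95% confidence level are my own illustrative choices, not part of any monitoring tool:

```python
import math

def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sentiment proportion
    estimated from a simple random sample of clips."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Worst case (a 50/50 split) on a 400-clip sample is roughly +/-5%;
# with a less even split (say 25% negative) it tightens to about 4%.
print(f"{margin_of_error(400):.1%}")        # ~4.9%
print(f"{margin_of_error(400, 0.25):.1%}")  # ~4.2%
```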
You also need to understand that not all sentiment checkers are the same, and if you must use one, consider SentiStrength. I have written in the past about SentiStrength, from the computer geeks at the University of Wolverhampton (I have no commercial interest with them). While it lacks the sexy interface and graphed results of others, it has the strongest behind-the-scenes processes and is about the most accurate one I have tried. It is also fairly transparent in how it goes about it, scoring from a list of over 2,000 emotive words and recording a total for positive and negative based on their strength of meaning. For example, ‘collapse’ has a score of -1, ‘failure’ has a score of -2 and ‘horrible’ has a score of -3. It has the obvious limitations of not understanding context, the degree of relevance of the usage, or semantic factors like irony and sarcasm, but when you choose this option that is the trade-off.
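To make that mechanism concrete, here is a minimal sketch of lexicon-style scoring in that spirit. The tiny word list and the scoring loop are illustrative only; this is not SentiStrength’s actual code or lexicon:

```python
# Illustrative lexicon-based scoring: each emotive word carries a signed
# strength, and a text gets separate positive and negative totals.
# The word scores below are examples only.
LEXICON = {
    "collapse": -1,
    "failure": -2,
    "horrible": -3,
    "good": 2,
    "excellent": 3,
}

def score_text(text: str) -> tuple[int, int]:
    """Return (positive_total, negative_total) for a piece of text."""
    positive, negative = 0, 0
    for word in text.lower().split():
        word = word.strip(".,!?'\"")
        strength = LEXICON.get(word, 0)
        if strength > 0:
            positive += strength
        elif strength < 0:
            negative += strength
    return positive, negative

print(score_text("A horrible failure, but an excellent recovery"))  # (3, -5)
```

A lookup like this makes the trade-off obvious: it is fast and transparent, but it has no idea whether ‘collapse’ refers to your client or to a rival, or whether the writer was being sarcastic.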
SentiStrength has been around for a few years (you might wonder how long when the accompanying documentation references testing done using MySpace!), but it does seem an effective scoring mechanism. I parsed the social coverage on BHS and Sir Philip Green for the past couple of weeks. An enterprise sentiment tool scored 21% of the coverage as negative, which is surely too optimistic. Running the same coverage through SentiStrength registered a negative result for 55% of the coverage: a far more plausible figure, though possibly still a bit generous.
If you take only one thing away from this, remember that automated sentiment scoring comes with a massive health warning. Too many decisions are made on the basis of inaccurate computer scoring, and I really can’t overstate that point. If you want to track sentiment with confidence you need to be very careful about how you go about it: check a selection of your automatically scored items, and be prepared either to re-verify a sample by hand or to bring in specialist help.
Thank you for your interest and please don’t hesitate to leave a comment or sign up for future updates.