Improving Sentiment Classification in Natural Language Processing


Sentiment analysis evaluates a written statement to determine whether it conveys a positive, negative, or neutral opinion about a topic. Such statements provide valuable insight into customers' likes and dislikes regarding a particular product or service. Consequently, modern market research projects often include analysis of written customer sentiment, especially sentiment expressed in social media postings.

One social media channel commonly monitored for sentiment trends is Twitter. It is an appealing medium for market research because statements are concise and to the point, and are often posted in real time, while the author is dealing with a circumstance that evokes an impassioned opinion and lays bare their true feelings.

However, Twitter also carries an overwhelming volume of noise that can obscure the sentiment signal of interest. This noise takes the form of expressed sentiment that is irrelevant because it is aimed at topics other than the one the market researcher is focused on. Consequently, commercial off-the-shelf (COTS) sentiment classification software, while good at detecting positive or negative sentiment in general, is much less accurate when required to detect sentiment pertaining to a specific topic, such as customer satisfaction.


To remove this noise, AlgoTactica has developed a classifier that focuses exclusively on detecting sentiment related to customer satisfaction. It is designed to reject every other type of sentiment, even when that sentiment is strongly positive or negative, by classifying it as neutral whenever it does not reference a customer satisfaction topic. During testing, this highly focused algorithm produced far better results than a commercially pretrained COTS classifier designed to detect general sentiment.
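The labelling idea described above can be sketched in a few lines: off-topic sentiment is mapped to "neutral" so the model learns to reject it. The example tweets and the simple TF-IDF pipeline below are illustrative assumptions, not AlgoTactica's actual model or data.

```python
# Sketch of the three-class labelling scheme: on-topic sentiment keeps its
# polarity, while off-topic sentiment (however strong) is labelled neutral.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "Support resolved my billing issue in minutes, thank you!",  # on-topic, positive
    "Waited three days for a reply and still no refund.",        # on-topic, negative
    "I love this sunny weather today!",                          # positive, but off-topic
    "Traffic this morning was awful.",                           # negative, but off-topic
]
labels = ["positive", "negative", "neutral", "neutral"]

# A toy stand-in for the real classifier; any text model could fill this role.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)
print(model.predict(["The agent was rude and unhelpful."]))
```

With only four training examples the prediction itself is meaningless; the point is the label design, under which a general-purpose classifier would wrongly score the weather and traffic tweets as positive and negative.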


For this study, a training corpus of 12,400 Tweets was constructed in which customer service sentiment was labelled as positive or negative and all other sentiment was labelled as neutral. Our classifier was subjected to 80 randomized training runs, with each successive run sampling its training data from a larger fraction of the corpus. After each run, the classifier was tested on a sample of new data drawn from the remainder of the corpus, on which it had not been trained. A COTS classifier was evaluated on the same held-out test sets, and the results were compared. The accompanying graphs show that when trained on only 20% of the corpus (2,480 Tweets), our classifier already yielded better out-of-sample test results than the COTS version, and its scores improved further as the training fraction increased.
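The evaluation protocol above can be sketched as repeated randomized splits at an increasing training fraction, scoring on the held-out remainder each time. The synthetic data and the single toy model here are assumptions standing in for the real 12,400-tweet corpus and the two classifiers under comparison.

```python
# Hedged sketch of the repeated-split protocol: for each training fraction,
# run several randomized train/test splits and average an out-of-sample score.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the labelled tweet corpus.
X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=5, random_state=0)

for frac in (0.2, 0.4, 0.6, 0.8):
    scores = []
    for run in range(20):  # the study used 80 randomized runs
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=frac, random_state=run, stratify=y)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        # Score only on the remainder the model never saw during training.
        scores.append(f1_score(y_te, model.predict(X_te), average="macro"))
    print(f"train fraction {frac:.0%}: mean F1 = {np.mean(scores):.3f}")
```

In the actual study, the COTS classifier would be scored on each of the same held-out test sets so the two result curves are directly comparable.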

Two metrics used to assess classifier performance are F-measure and AUC; for both, a higher score means better performance. The Classifier F-Measure diagram indicates that our classifier consistently outperformed the COTS version by a wide margin in accurately identifying customer service sentiment. The COTS classifier faltered because it frequently labelled sentiment as positive or negative when it should have been labelled neutral, since that sentiment was unrelated to customer service. The Positive and Negative Sentiment AUC graphs reveal similar results: for both types of sentiment, our classifier consistently scored much higher than the COTS classifier at identifying only the sentiment related to customer service.
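For readers unfamiliar with the two metrics, the sketch below computes them with scikit-learn on made-up labels and scores; none of these numbers come from the study itself. F-measure is the harmonic mean of precision and recall for a class, while AUC ranks a per-tweet score rather than a hard label.

```python
# Illustrative F-measure and AUC computation on invented predictions.
from sklearn.metrics import f1_score, roc_auc_score

y_true = ["positive", "neutral", "negative", "positive", "neutral", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "neutral", "neutral"]

# F-measure for the positive class only (harmonic mean of precision and recall).
f1_pos = f1_score(y_true, y_pred, labels=["positive"], average="macro")

# AUC needs a ranking score per tweet; here, an assumed probability that each
# tweet expresses positive customer-service sentiment.
y_true_bin = [1 if t == "positive" else 0 for t in y_true]
p_positive = [0.9, 0.6, 0.2, 0.8, 0.3, 0.1]
auc_pos = roc_auc_score(y_true_bin, p_positive)

print(f"F1(positive) = {f1_pos:.2f}, AUC(positive) = {auc_pos:.2f}")
```

The COTS failure mode described above shows up directly in these metrics: every off-topic tweet scored as positive is a false positive, which drags down precision and hence the F-measure for the positive class.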
