Sentiment Analysis Using Python

Suvrat Arora 22 May, 2024 • 11 min read

Introduction

In today’s digital age, platforms like Twitter, Goodreads, and Amazon overflow with people’s opinions, making it crucial for organizations to extract insights from this massive volume of data. Sentiment Analysis in Python offers a powerful solution to this challenge. This technique, a subset of Natural Language Processing (NLP), involves classifying texts into sentiments such as positive, negative, or neutral. By employing various Python libraries and models, analysts can automate this process efficiently. Let’s delve into how to perform sentiment analysis in Python and explore some examples of its application.

Learning Outcomes

  • Gain insights into different approaches for sentiment analysis in Python, such as TextBlob, VADER, and machine learning-based models.
  • Discover how to preprocess text data for sentiment analysis, including cleaning, tokenization, and feature extraction.
  • Implement sentiment analysis on real-world datasets to classify text into positive, negative, or neutral sentiments.
  • Evaluate the performance of sentiment analysis models using appropriate metrics.
  • Explore advanced sentiment analysis techniques using deep learning models like LSTM and transformer-based models.
  • Apply sentiment analysis to practical use cases such as social media monitoring, product/service analysis, and stock price prediction.
  • Understand the limitations and challenges associated with sentiment analysis in Python.
  • Develop proficiency in Python for sentiment analysis applications across various domains.

This article was published as a part of the Data Science Blogathon.

What is Sentiment Analysis?

Sentiment Analysis is a use case of Natural Language Processing (NLP) and falls under the category of text classification. To put it simply, Sentiment Analysis involves classifying a text into sentiments such as positive or negative, or happy, sad, or neutral. Thus, the ultimate goal of sentiment analysis is to decipher the underlying mood, emotion, or sentiment of a text. This is also referred to as Opinion Mining.

Let us look at how a quick Google search defines Sentiment Analysis:

[Image: Google search result defining sentiment analysis]

How Does Sentiment Analysis Work? 

Sentiment analysis in Python typically works by employing natural language processing (NLP) techniques to analyze and understand the sentiment expressed in text. The process involves several steps:

  • Text Preprocessing: The text is cleaned by removing irrelevant information, such as special characters, punctuation, and stopwords.
  • Tokenization: The text is divided into individual words or tokens to facilitate analysis.
  • Feature Extraction: Relevant features, such as words, n-grams, or parts of speech, are extracted from the text.
  • Sentiment Classification: Machine learning algorithms or pre-trained models classify the sentiment of each text instance. This is done either through supervised learning, where models are trained on labeled data, or with pre-trained models that have learned sentiment patterns from large datasets.
  • Post-processing: The sentiment analysis results may undergo additional processing, such as aggregating sentiment scores or applying threshold rules to classify sentiments as positive, negative, or neutral.
  • Evaluation: The performance of the sentiment analysis model is assessed using evaluation metrics, such as accuracy, precision, recall, or F1 score.
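
The first three steps above can be sketched with the standard library alone. This is a minimal illustration, not a production pipeline: the stopword list is a tiny hand-picked assumption, and the function names (preprocess, tokenize, extract_features) are hypothetical helpers introduced only for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects usually
# rely on a fuller list such as the one shipped with NLTK).
STOPWORDS = {"the", "a", "an", "is", "was", "it", "and", "this", "to", "of"}


def preprocess(text):
    """Lowercase the text and replace punctuation/special characters with spaces."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on whitespace and drop stopwords."""
    return [tok for tok in text.split() if tok not in STOPWORDS]


def extract_features(tokens, n=2):
    """Count unigrams plus n-grams (bigrams by default)."""
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(tokens + ngrams)


review = "The battery life is great, and the screen is GREAT!"
tokens = tokenize(preprocess(review))
features = extract_features(tokens)
print(tokens)
print(features.most_common(3))
```

The resulting counts are the kind of numerical features that a classifier consumes in the classification step.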

Types of Sentiment Analysis

Various types of sentiment analysis can be performed, depending on the specific focus and objective of the analysis. Some common types include:

  • Document-Level Sentiment Analysis: This type of analysis determines the overall sentiment expressed in a document, such as a review or an article. It aims to classify the entire text as positive, negative, or neutral.
  • Sentence-Level Sentiment Analysis: Here, the sentiment of each sentence within a document is analyzed. This type provides a more granular understanding of the sentiment expressed in different text parts.
  • Aspect-Based Sentiment Analysis: This approach focuses on identifying and extracting the sentiment associated with specific aspects or entities mentioned in the text. For example, in a product review, the sentiment towards different features of the product (e.g., performance, design, usability) can be analyzed separately.
  • Entity-Level Sentiment Analysis: This type of analysis identifies the sentiment expressed towards specific entities or targets mentioned in the text, such as people, companies, or products. It helps understand the sentiment associated with different entities within the same document.
  • Comparative Sentiment Analysis: This approach involves comparing the sentiment between different entities or aspects mentioned in the text. It aims to identify the relative sentiment or preferences expressed towards various entities or features.
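
To make the difference between document-level and sentence-level analysis concrete, here is a small sketch. It assumes a toy word lexicon with made-up scores and a naive sentence splitter; neither reflects how a real library scores text.

```python
import re

# Toy polarity lexicon (hypothetical values chosen for illustration only).
LEXICON = {"great": 1.0, "love": 1.0, "good": 0.5, "slow": -0.5, "terrible": -1.0}


def score(text):
    """Sum the lexicon scores of the words found in the text."""
    return sum(LEXICON.get(word, 0.0) for word in re.findall(r"[a-z']+", text.lower()))


def label(value):
    """Map a numeric score to a sentiment label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def document_level(document):
    """One label for the whole document."""
    return label(score(document))


def sentence_level(document):
    """One label per sentence (naive split on ., ! and ?)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(s, label(score(s))) for s in sentences]


review = "The camera is great. The battery is terrible. Delivery was on time."
print("document:", document_level(review))
for sentence, sentiment in sentence_level(review):
    print(sentiment, "-", sentence)
```

In this toy review the positive and negative sentences cancel out at the document level, while the sentence-level view keeps them apart.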

Gaining Insights and Making Decisions with Sentiment Analysis

Sentiment analysis is a valuable tool for organizations to understand customer sentiment and make informed decisions. For example, a perfume company selling online can analyze customer reviews to identify which fragrances are popular and offer discounts on the unpopular ones. However, with a large number of fragrances and reviews, reading through all of them manually is impractical.

Instead, you simply gather all the reviews in one place and apply sentiment analysis to them. The following is a schematic representation of sentiment analysis on the reviews of three fragrances of perfumes: Lavender, Rose, and Lemon. (Please note that these reviews might contain incorrect spelling, grammar, and punctuation, as is common in real-world scenarios.)

[Image: schematic of sentiment analysis applied to reviews of the Lavender, Rose, and Lemon fragrances]

From these results, we can clearly see that:

  • Fragrance-1 (Lavender) has highly positive reviews from customers, which suggests your company could raise its price given its popularity.
  • Fragrance-2 (Rose) has a neutral outlook among customers, which suggests your company should leave its price unchanged.
  • Fragrance-3 (Lemon) has an overall negative sentiment associated with it, so your company should consider offering a discount on it to balance the scales.

This was just a simple example of how sentiment analysis can help you gain insights into your products/services and help your organization make decisions.
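
The decision logic of the fragrance example can be sketched in a few lines. The review labels below are hypothetical stand-ins for the output of a sentiment model, and the 0.25 threshold is an arbitrary assumption rather than a recommended value.

```python
from collections import Counter

# Hypothetical per-review labels, as a sentiment model might produce them.
review_labels = {
    "Lavender": ["positive", "positive", "positive", "neutral"],
    "Rose": ["positive", "neutral", "negative", "neutral"],
    "Lemon": ["negative", "negative", "neutral", "negative"],
}


def overall_sentiment(labels):
    """Net score: share of positive reviews minus share of negative reviews."""
    counts = Counter(labels)
    return (counts["positive"] - counts["negative"]) / len(labels)


def pricing_action(net_score, threshold=0.25):
    """Map the net score to an action (the threshold is an assumption)."""
    if net_score > threshold:
        return "consider raising the price"
    if net_score < -threshold:
        return "consider offering a discount"
    return "keep the price unchanged"


for fragrance, labels in review_labels.items():
    net = overall_sentiment(labels)
    print(f"{fragrance}: net sentiment {net:+.2f} -> {pricing_action(net)}")
```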

Sentiment Analysis Use Cases

We just saw how sentiment analysis can empower organizations with insights that can help them make data-driven decisions. Now, let’s peep into some more use cases of sentiment analysis:

  • Social Media Monitoring for Brand Management: Brands can use sentiment analysis to gauge how the public perceives them. For example, a company can gather all Tweets that mention or tag it and perform sentiment analysis on them to learn its public image.
  • Product/Service Analysis: Brands and organizations can perform sentiment analysis on customer reviews to see how well a product or service is doing in the market and make future decisions accordingly.
  • Stock Price Prediction: Predicting whether a company's stock will go up or down is crucial for investors. One signal comes from performing sentiment analysis on the headlines of news articles that mention the company's name. If the headlines about a particular organization are predominantly positive, its stock price may be expected to rise, and vice versa, although sentiment is only one of many factors.

Ways to Perform Sentiment Analysis in Python

Python is one of the most powerful tools for data science tasks, and it offers a multitude of ways to perform sentiment analysis. The most popular ones are listed here:

  • Using TextBlob
  • Using VADER
  • Using Bag of Words Vectorization-based Models
  • Using LSTM-based Models
  • Using Transformer-based Models

Let’s dive deep into them one by one.

Note: To demonstrate methods 3 and 4 (Using Bag of Words Vectorization-based Models and Using LSTM-based Models), a labelled sentiment analysis dataset has been used. It comprises more than 5,000 text samples labelled as positive, negative, or neutral. The dataset is available under a Creative Commons license.

Using TextBlob

TextBlob is a Python library for Natural Language Processing. Using TextBlob for sentiment analysis is quite simple. It takes text as input and returns polarity and subjectivity as outputs.

  • Polarity determines the sentiment of the text. Its values lie in [-1,1], where -1 denotes a highly negative sentiment and 1 denotes a highly positive sentiment.
  • Subjectivity determines whether a text input is factual information or a personal opinion. Its values lie in [0,1], where a value closer to 0 denotes a piece of factual information and a value closer to 1 denotes a personal opinion.

Step1: Installation

pip install textblob

Step2: Importing TextBlob

from textblob import TextBlob

Step3: Code Implementation for Sentiment Analysis Using TextBlob

Writing code for sentiment analysis using TextBlob is fairly simple. Just import the TextBlob class, create a TextBlob object from the text to be analyzed, and read its sentiment attributes as follows:

from textblob import TextBlob

text_1 = "The movie was so awesome."
text_2 = "The food here tastes terrible."

#Determining the Polarity 
p_1 = TextBlob(text_1).sentiment.polarity
p_2 = TextBlob(text_2).sentiment.polarity

#Determining the Subjectivity
s_1 = TextBlob(text_1).sentiment.subjectivity
s_2 = TextBlob(text_2).sentiment.subjectivity

print("Polarity of Text 1 is", p_1)
print("Polarity of Text 2 is", p_2)
print("Subjectivity of Text 1 is", s_1)
print("Subjectivity of Text 2 is", s_2)

Output

Polarity of Text 1 is 1.0
Polarity of Text 2 is -1.0
Subjectivity of Text 1 is 1.0
Subjectivity of Text 2 is 1.0

Using VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analyzer that is specifically attuned to sentiments expressed in social media. Just like TextBlob, its usage in Python is pretty simple, as the code example below shows.

Step1: Installation

pip install vaderSentiment

Step2: Importing SentimentIntensityAnalyzer class from Vader

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

Step3: Code for Sentiment Analysis Using Vader

First, we create an object of the SentimentIntensityAnalyzer class; then we pass the text to the object's polarity_scores() method as follows:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sentiment = SentimentIntensityAnalyzer()
text_1 = "The book was a perfect balance between writing style and plot."
text_2 = "The pizza tastes terrible."
sent_1 = sentiment.polarity_scores(text_1)
sent_2 = sentiment.polarity_scores(text_2)
print("Sentiment of text 1:", sent_1)
print("Sentiment of text 2:", sent_2)

Output:

Sentiment of text 1: {'neg': 0.0, 'neu': 0.73, 'pos': 0.27, 'compound': 0.5719} 
Sentiment of text 2: {'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}

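As we can see, the polarity_scores() method returns a dictionary of sentiment scores for the text: the proportions of negative (neg), neutral (neu), and positive (pos) content, plus a normalized overall compound score between -1 and 1.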
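
To turn the compound score into a single label, a threshold rule is commonly applied. The sketch below uses cut-offs of ±0.05, the convention suggested in VADER's documentation; the function name label_from_compound is a hypothetical helper, and the thresholds can be tuned for your data.

```python
def label_from_compound(compound, threshold=0.05):
    """Turn a VADER compound score in [-1, 1] into a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dictionaries copied from the output above.
sent_1 = {'neg': 0.0, 'neu': 0.73, 'pos': 0.27, 'compound': 0.5719}
sent_2 = {'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}

print("Text 1:", label_from_compound(sent_1["compound"]))
print("Text 2:", label_from_compound(sent_2["compound"]))
```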

Using Bag of Words Vectorization-Based Models

In the two approaches discussed so far, i.e. TextBlob and VADER, we have simply used Python libraries to perform sentiment analysis. Now we'll discuss an approach wherein we train our own model for the task. The steps involved in performing sentiment analysis using the Bag of Words Vectorization method are as follows:

  • Pre-Process the text of training data (Text pre-processing involves Normalization, Tokenization, Stopwords Removal, and Stemming/Lemmatization.)
  • Create a Bag of Words for the pre-processed text data using the Count Vectorization or TF-IDF Vectorization approach.
  • Train a suitable classification model on the processed data for sentiment classification.
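
To show what the vectorization step produces, here is a from-scratch sketch of count vectorization and a plain TF-IDF weighting on three made-up documents. It is only an illustration: scikit-learn's TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its numbers will differ from the plain tf * log(N / df) used here.

```python
import math
from collections import Counter

# Three made-up, already pre-processed documents.
docs = [
    "good service good price",
    "bad service",
    "good product",
]

tokenized = [doc.split() for doc in docs]
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# Count vectorization: one row per document, one column per vocabulary word.
count_matrix = [[Counter(tokens)[word] for word in vocabulary] for tokens in tokenized]

# Plain TF-IDF: term count multiplied by log(N / document frequency).
n_docs = len(docs)
doc_freq = {word: sum(word in tokens for tokens in tokenized) for word in vocabulary}
tfidf_matrix = [
    [count * math.log(n_docs / doc_freq[word]) for count, word in zip(row, vocabulary)]
    for row in count_matrix
]

print(vocabulary)
for counts, weights in zip(count_matrix, tfidf_matrix):
    print(counts, [round(w, 3) for w in weights])
```

Words that appear in many documents (such as "good" and "service" here) receive a lower IDF weight than words that appear in only one.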

Code for Sentiment Analysis using Bag of Words Vectorization Approach:

To build a sentiment analysis model using the BOW Vectorization approach, we need a labeled dataset. As stated earlier, the dataset used for this demonstration has been obtained from Kaggle. We use sklearn's CountVectorizer to create the BOW, then train a Multinomial Naive Bayes classifier and report its accuracy on the held-out test split.

The dataset can be obtained from Kaggle.

#Loading the Dataset
import pandas as pd
data = pd.read_csv('Finance_data.csv')
#Pre-Processing and Bag of Words Vectorization using Count Vectorizer
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(stop_words='english',ngram_range = (1,1),tokenizer = token.tokenize)
text_counts = cv.fit_transform(data['sentences'])
#Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(text_counts, data['feedback'], test_size=0.25, random_state=5)
#Training the model
from sklearn.naive_bayes import MultinomialNB
MNB = MultinomialNB()
MNB.fit(X_train, Y_train)
#Calculating the accuracy score of the model (true labels first, then predictions)
from sklearn import metrics
predicted = MNB.predict(X_test)
accuracy_score = metrics.accuracy_score(Y_test, predicted)
print("Accuracy Score: ",accuracy_score)

Output:

Accuracy Score:  0.9111675126903553

The trained classifier can be used to predict the sentiment of any given text input.
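
To illustrate what the classifier does when it predicts a label for new text, here is a from-scratch sketch of Multinomial Naive Bayes with add-one (Laplace) smoothing. The class TinyMultinomialNB and the four training sentences are hypothetical and exist only for this example; in practice you would keep using sklearn's MultinomialNB as above.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(class) + sum of log P(word | class)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore unseen words, as a fitted vectorizer would
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training sentences, for illustration only.
train_texts = [
    "profit rose sharply this quarter",
    "strong growth in sales",
    "loss widened and sales fell",
    "weak demand hurt profit",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("sales growth was strong"))
print(model.predict("demand fell"))
```

With the sklearn objects from the code above, the equivalent step would be to transform the new text with the fitted vectorizer and pass the result to the classifier's predict method.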

Using LSTM-Based Models

Though we were able to obtain a decent accuracy score with the Bag of Words Vectorization method, it ignores word order and might fail to yield the same results when dealing with larger datasets. This motivates the use of deep learning-based models for sentiment analysis.

For NLP tasks, we generally use RNN-based models since they are designed to deal with sequential data. Here, we'll train an LSTM (Long Short Term Memory) model using TensorFlow with Keras. The steps to perform sentiment analysis using LSTM-based models are as follows:

  • Pre-process the text of the training data (text pre-processing involves Normalization, Tokenization, Stopwords Removal, and Stemming/Lemmatization).
  • Import Tokenizer from keras.preprocessing.text, create an instance, and fit it to the entire training text. Convert the text to integer sequences using texts_to_sequences() and pad them to equal length. Text cannot be fed to the model directly, so it must first be converted to a numerical representation; the Embedding layer of the model then maps these integer sequences to dense vectors (embeddings).
  • Build the model using TensorFlow with Keras, including embedding (input), LSTM, and dense layers. Adjust the dropout rates and other hyperparameters to improve accuracy. In the inner layers, we use ReLU or LeakyReLU activation functions to avoid vanishing gradient problems, while in the output layer, we use Softmax or Sigmoid activation functions.
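
The tokenizer and padding step described above can be sketched in plain Python. This is a simplified imitation of what the Keras Tokenizer and pad_sequences do (frequency-ranked indices starting at 1, zeros added on the left), not their actual implementation; the helper names are hypothetical.

```python
from collections import Counter


def build_word_index(texts):
    """Rank words by frequency; index 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_ids(texts, word_index, num_words=None):
    """Replace each word with its integer index, dropping unknown or rare words."""
    sequences = []
    for text in texts:
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        if num_words is not None:
            ids = [i for i in ids if i < num_words]
        sequences.append(ids)
    return sequences


def pad_left(sequences, pad_value=0):
    """Left-pad every sequence with zeros to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [[pad_value] * (max_len - len(seq)) + seq for seq in sequences]


# Made-up sentences for illustration.
texts = ["profit rose", "profit fell sharply", "sales fell"]
word_index = build_word_index(texts)
padded = pad_left(texts_to_ids(texts, word_index))
print(word_index)
print(padded)
```

Each row of the padded matrix has the same length, which is what allows the sequences to be batched and passed to an Embedding layer.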

Code for Sentiment Analysis Using LSTM-based Model

Here, we have used the same dataset as we used in the case of the BOW approach. A training accuracy of 0.90 was obtained.

#Importing necessary libraries
#(keras.preprocessing.text.Tokenizer is a legacy utility; this assumes a Keras version that still ships it)
import nltk
import pandas as pd
from textblob import Word
from nltk.corpus import stopwords
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Embedding, LSTM, LeakyReLU, SpatialDropout1D
from sklearn.model_selection import train_test_split
#Downloading the NLTK resources needed for stop words and lemmatization
nltk.download('stopwords')
nltk.download('wordnet')
#Loading the dataset
data = pd.read_csv('Finance_data.csv')
#Pre-Processing the text
def cleaning(df, stop_words):
    # Lowercasing
    df['sentences'] = df['sentences'].apply(lambda x: ' '.join(w.lower() for w in x.split()))
    # Removing the digits/numbers
    df['sentences'] = df['sentences'].str.replace(r'\d+', '', regex=True)
    # Removing stop words
    df['sentences'] = df['sentences'].apply(lambda x: ' '.join(w for w in x.split() if w not in stop_words))
    # Lemmatization
    df['sentences'] = df['sentences'].apply(lambda x: ' '.join([Word(w).lemmatize() for w in x.split()]))
    return df
stop_words = set(stopwords.words('english'))
data_cleaned = cleaning(data, stop_words)
#Converting the text to padded integer sequences using the tokenizer
tokenizer = Tokenizer(num_words=500, split=' ')
tokenizer.fit_on_texts(data_cleaned['sentences'].values)
X = tokenizer.texts_to_sequences(data_cleaned['sentences'].values)
X = pad_sequences(X)
#One-hot encoding the labels (the 'feedback' column, as in the BOW example)
y = pd.get_dummies(data_cleaned['feedback']).values.astype('float32')
#Splitting the data (the original split was not shown; these parameters mirror the BOW example)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
#Model Building
model = Sequential()
model.add(Embedding(500, 120, input_length = X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(704, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(352))
model.add(LeakyReLU())
model.add(Dense(3, activation='softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics = ['accuracy'])
model.summary()
#Model Training
model.fit(X_train, y_train, epochs = 20, batch_size=32, verbose =1)
#Model Testing
model.evaluate(X_test,y_test)
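
Accuracy alone can hide how a model behaves on each class, which is why precision, recall, and F1 score were listed among the evaluation metrics earlier. The sketch below computes them from scratch on hypothetical labels; with real predictions, sklearn's classification_report gives the same kind of summary.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from two label lists."""
    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = (precision, recall, f1)
    return results


# Hypothetical true and predicted labels for six test sentences.
y_true = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
for label, (precision, recall, f1) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```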

Using Transformer-Based Models

Transformer-based models are among the most advanced techniques in Natural Language Processing. The original Transformer follows an encoder-decoder architecture and relies on self-attention to yield impressive results. Though one can always build a transformer model from scratch, it is quite a tedious task. Thus, we can use pre-trained transformer models available on Hugging Face. Hugging Face is an open-source AI community that offers a multitude of pre-trained models for NLP applications. You can use these models as they are or fine-tune them for specific tasks.
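
To give a feel for the self-attention idea mentioned above, here is a heavily simplified sketch of scaled dot-product attention in plain Python. It assumes the queries, keys, and values are the token vectors themselves; real transformer layers apply learned projection matrices and multiple attention heads, which are omitted here. The token vectors are made-up values.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)])
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up 2-dimensional token vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1 and shows how strongly one token attends to every token in the sequence, including itself.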

Step1: Installation

pip install transformers

Step2: Importing the transformers library

import transformers

Step3: Code for Sentiment Analysis Using Transformer-based Models

To perform any task using transformers, we first import the pipeline function from transformers. Calling pipeline() with the task to be performed as an argument (i.e. sentiment analysis in our case) returns a pipeline object. We can also specify the model that we want to use for the task. Here, since we have not specified a model, the distilbert-base-uncased-finetuned-sst-2-english model is used by default for sentiment analysis. You can check out the list of available tasks and models on the Hugging Face website.

from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["It was the best of times.", "It was the worst of times."]
print(sentiment_pipeline(data))

Output (exact scores may vary slightly with the model and library version)

[{'label': 'POSITIVE', 'score': 0.999457061290741},  {'label': 'NEGATIVE', 'score': 0.9987301230430603}]
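
The pipeline returns one dictionary per input text. A small post-processing sketch, assuming the POSITIVE/NEGATIVE labels shown above, converts these into signed scores that are easier to aggregate. The helper names and the 0.9 confidence cut-off are hypothetical choices for illustration.

```python
def to_signed_score(result):
    """Convert a {'label', 'score'} dict into a value in [-1, 1]."""
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]


def summarize(results, min_confidence=0.9):
    """Average the signed scores, skipping low-confidence predictions."""
    confident = [r for r in results if r["score"] >= min_confidence]
    if not confident:
        return 0.0
    return sum(to_signed_score(r) for r in confident) / len(confident)


# Results copied from the output above.
results = [
    {'label': 'POSITIVE', 'score': 0.999457061290741},
    {'label': 'NEGATIVE', 'score': 0.9987301230430603},
]
print([round(to_signed_score(r), 3) for r in results])
print(round(summarize(results), 3))
```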

Conclusion

Sentiment analysis in Python offers powerful tools and methodologies to extract insights from textual data across diverse applications. Through this article, we have explored various approaches such as TextBlob, VADER, and machine learning-based models for sentiment analysis. We have learned how to preprocess text data, extract features, and train models to classify sentiments as positive, negative, or neutral. Additionally, we looked at advanced techniques including LSTM and transformer-based models, highlighting their capabilities in handling complex language patterns.

These methods enable organizations to monitor brand perception, analyze customer feedback, and even predict market trends based on sentiment. As sentiment analysis continues to evolve with advancements in natural language processing, mastering these techniques in Python will prove invaluable for making data-driven decisions in today’s digital age.

Key Takeaways

  • Python provides a versatile environment for performing sentiment analysis tasks due to its rich ecosystem of libraries and frameworks.
  • We explored multiple approaches including TextBlob, VADER, Bag of Words, LSTM, and Transformer-based models to analyze sentiment in textual data.
  • The process involves text preprocessing, tokenization, feature extraction, and applying machine learning or deep learning models to classify sentiments.
  • We applied these methods to real-world examples like customer reviews and social media data to classify sentiments as positive, negative, or neutral.
  • Sentiment analysis helps organizations monitor brand perception, analyze customer feedback, and make data-driven decisions.
  • With advancements in natural language processing, sentiment analysis in Python continues to evolve, offering more accurate and sophisticated methods for understanding textual sentiment.

Frequently Asked Questions

Q1. What do you mean by sentiment analysis?

A. Sentiment analysis means extracting and determining a text’s sentiment or emotional tone, such as positive, negative, or neutral.

Q2. What is sentiment analysis, with an example?

A. Sentiment analysis can be applied to social media posts, customer reviews, or news articles. For example, one can analyze Twitter data to determine the overall sentiment towards a particular product, or track customer sentiment in online reviews.

Q3. What are the two types of sentiment analysis?

A. Two common types of sentiment analysis are (1) Document-level sentiment analysis, which analyzes the sentiment of an entire document, and (2) Sentence-level sentiment analysis, which focuses on analyzing the sentiment of individual sentences within a document.

Q4. What is sentiment analysis and its types?

A. Sentiment analysis is the task of analyzing and classifying the sentiment expressed in text. It can be categorized into document-level and sentence-level sentiment analysis, where the former analyzes the sentiment of a whole document and the latter focuses on the sentiment of individual sentences. Aspect-based, entity-level, and comparative sentiment analysis are further types.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
