YouTube Movie Review Videos and Comments Analysis using NLP


YouTube has over 1 billion active users. It has become the go-to video library for everything from music videos and lifestyle vlogs to how-to tutorials. Its user base is projected to grow by 4.9% year-on-year, it is the second-largest search engine, and nearly 95% of the global internet population uses it. People rely on its videos for reviews and opinions before buying a product or watching a movie, which makes YouTube an indispensable place for marketers to find influencers.

Certain channels have built a massive following and large subscriber bases, so their videos reach a wide audience and their creators are regarded as influencers. Top influencers often draw millions of views per video. When movie marketers partner with them, these influencers can persuade their loyal viewers to watch the movie, recommend it to friends, and spread the word, which can boost the movie's popularity dramatically.

As on other social media platforms, fans and followers of YouTube influencers expect them to be honest and authentic. One reason review videos perform so well is that viewers are willing to put up with some movie promotion if the review helps them decide. People want to know what to watch or buy and whether it is worth their money and time.

Analytics makes it possible to gauge the reactions and opinions of viewers and YouTube influencers within minutes. By leveraging NLP, we can analyze review videos and their comments to quickly learn what the influencer and the audience think of a specific film.

Tracking Movie Exposure Using Review and Comment Analysis 

One of the most important questions after a movie is released is what the general public thinks of it. The answer can be found in the film's reviews and in comments on YouTube. Sentiment analysis lets us gauge people's opinions of the movie at scale, and which influencers choose to review it can also hint at whether it is good.

Converting video reviews into data for analysis

The audio of a YouTube review video is extracted and transcribed into text, so that NLP techniques can detect the emotions and opinions it expresses.

The following are some of the existing NLP capabilities and applications:

  • Sentiment analysis – extract emotions and moods from the text. 
  • Topic Modeling – discover and summarize the main themes in large collections of text.
  • Building N-grams – count sequences of N consecutive words to surface common phrases and estimate how likely one word is to follow another.
  • Speech to Text – transform speech into textual data.
  • Summarization – analyze large volumes of text and condense them into short summaries.

Sentiment Analysis

Sentiment analysis is one way to understand an audience. It identifies and categorizes the opinion expressed in a piece of text, such as a review or a comment, as positive, negative, or neutral toward the movie. Aggregated over many reviews and comments, these labels indicate whether the movie is doing well in the market.
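
To make the idea concrete, here is a minimal lexicon-based sketch that labels comments as positive, negative, or neutral. It is not the method used in this project: real systems typically rely on a trained model or an established lexicon such as NLTK's VADER, and the tiny `LEXICON`, the negation rule, and the `threshold` value below are hypothetical choices for illustration only.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (a stand-in for a real lexicon such as VADER's)
LEXICON = {
    "great": 2.0, "amazing": 3.0, "good": 1.5, "love": 2.5, "fun": 1.5,
    "bad": -1.5, "boring": -2.0, "awful": -3.0, "hate": -2.5, "waste": -2.0,
}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text: str) -> float:
    """Sum word polarities, flipping the sign of a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

def classify(text: str, threshold: float = 0.5) -> str:
    """Map the summed score to a positive / negative / neutral label."""
    score = sentiment_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

comments = [
    "Amazing movie, loved the soundtrack!",
    "Not good. A waste of two hours.",
    "Saw it last night.",
]
for comment in comments:
    print(classify(comment), "|", comment)
```

Counting the labels across all comments on a review video gives a rough positive/negative split for the movie.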

Topic Modeling

Topic modeling discovers the themes (topics) in a collection of documents from patterns in how words occur together. This is useful because skimming a handful of topics, each described by its most characteristic words, is far quicker than reading every review or comment. Many models can do this; one of the most widely used is LDA.

Latent Dirichlet Allocation (LDA) treats each document as a mixture of topics and each topic as a distribution over words, and infers both from the word counts across a set of documents. GridSearchCV (from scikit-learn) helps determine the optimal number of topics and other model parameters.
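
The sketch below shows what LDA actually does by fitting it from scratch with collapsed Gibbs sampling, one standard inference method for LDA. It is an illustration, not this project's code: the priors `alpha` and `beta`, the iteration count, and the toy documents are assumptions. With scikit-learn, the route the text describes would instead wrap `LatentDirichletAllocation` in `GridSearchCV` and search over `n_components` (and, for example, `learning_decay`); here the number of topics is simply fixed by hand.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []
    # Start from a random topic for every token
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(topic t | everything else) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic

def top_words(topic_word, n=3):
    """Most frequent words in each topic, skipping zero counts."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]

# Toy pre-tokenized comments (hypothetical data)
docs = [
    "acting cast performance acting".split(),
    "cast performance acting brilliant".split(),
    "soundtrack music score soundtrack".split(),
    "music score soundtrack loud".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

In practice the comments would first be lowercased and stripped of stop words before being passed in.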

Building an N-gram
  • A unigram is a single word; counting unigrams gives each word's frequency in the text.
  • A bigram is a sequence of two consecutive words; counting bigrams gives the frequency of each word pair.
  • A trigram is a sequence of three consecutive words; counting trigrams gives the frequency of each three-word phrase.
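
A minimal sketch of N-gram counting over comments follows, in plain Python. The regex tokenizer, the sample comments, and the helper names are illustrative assumptions. `bigram_probability` shows how the counts turn into the word probabilities mentioned earlier, using the simple estimate P(second | first) = count(first, second) / count(first).

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """All sequences of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(texts, n):
    """Count n-grams per text, so no n-gram spans two different comments."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts

def bigram_probability(unigrams, bigrams, first, second):
    """Estimate P(second | first) as count(first, second) / count(first)."""
    denom = unigrams[(first,)]
    return bigrams[(first, second)] / denom if denom else 0.0

# Hypothetical comments
comments = [
    "The plot twist was great",
    "Great acting, and the plot twist was amazing",
    "The plot was slow",
]
unigrams = ngram_counts(comments, 1)
bigrams = ngram_counts(comments, 2)
trigrams = ngram_counts(comments, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(2))
print(bigram_probability(unigrams, bigrams, "plot", "twist"))
```

Here "the plot" appears in all three comments, and "twist" follows "plot" in two of its three occurrences, so the estimated probability is 2/3.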

Speech to Text

Speech is the most common means of communication around the world, and speech recognition is an essential feature of applications such as home automation and artificial intelligence. The SpeechRecognition library in Python can listen to spoken words, identify them, and convert them into text. Its recognize_google() method sends audio, for example a recording loaded from a .wav file, to the Google Web Speech API and returns the transcript.
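
A sketch of transcribing a review's audio track with the SpeechRecognition library (`pip install SpeechRecognition`) is shown below. It assumes the audio has already been extracted from the video into a WAV file. Splitting the recording into `chunk_seconds` pieces is an assumption on our part, since the free Google Web Speech endpoint is intended for short clips; the function name and defaults are hypothetical.

```python
def transcribe_wav(path, chunk_seconds=60, language="en-US"):
    """Transcribe a WAV file chunk by chunk with the Google Web Speech API."""
    import speech_recognition as sr  # imported lazily so the module loads without it

    recognizer = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        while True:
            # Each record() call continues from where the previous one stopped
            audio = recognizer.record(source, duration=chunk_seconds)
            if not audio.frame_data:  # nothing left to read: end of file
                break
            try:
                pieces.append(recognizer.recognize_google(audio, language=language))
            except sr.UnknownValueError:
                # No intelligible speech in this chunk (e.g. a music-only segment)
                continue
    return " ".join(pieces)

if __name__ == "__main__":
    import sys
    print(transcribe_wav(sys.argv[1]))
```

Network or quota problems surface as `sr.RequestError`, which this sketch deliberately lets propagate to the caller.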

Fig 1. Speech recognition

Summarization

Text summarization condenses a large document into a short text that retains its most important information. BERT (Bidirectional Encoder Representations from Transformers) offers a more advanced approach than word-frequency methods: its contextual embeddings capture the meaning of each sentence, which can be used to select the sentences that best represent the whole document.

Fig 2. Overview architecture of BERT Summarization
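
The sketch below illustrates the extractive idea: embed each sentence, then keep the sentences closest to the centre of the whole transcript. To stay self-contained it swaps BERT embeddings for plain bag-of-words vectors, so it is a simplified stand-in, not a BERT summarizer. The helper names, regexes, and sample transcript are assumptions.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def embed(sentence):
    """Bag-of-words vector; a stand-in for a BERT sentence embedding."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def summarize(text, num_sentences=2):
    """Keep the sentences most similar to the document centroid, in original order."""
    sentences = split_sentences(text)
    vectors = [embed(s) for s in sentences]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    ranked = sorted(range(len(sentences)), key=lambda i: cosine(vectors[i], centroid), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)

transcript = ("The movie has stunning visuals. The visuals and the music carry the movie. "
              "I ate popcorn. The music is the best part of the movie.")
print(summarize(transcript, num_sentences=2))
```

Replacing `embed` with a real sentence-embedding model, and removing stop words, would bring this closer to the BERT-based approach.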

Converting comments into data for analysis

Movie review videos and their comments are collected from YouTube with the youtube-search-python library, which returns results such as video IDs, channel information, and playlist details. It can also retrieve data from a specific channel or playlist. Bigram and trigram counts then reveal the most frequent word sequences in the comments.
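
The collection step might look roughly like the sketch below, which uses the `VideosSearch` and `Comments` classes of youtube-search-python (`pip install youtube-search-python`). The result field names (`"result"`, `"id"`, `"title"`, `"channel"`, `"content"`) follow the library's README as we understand it and should be checked against the installed version. The function name and the `max_comments` cap are hypothetical.

```python
def fetch_review_comments(query, max_videos=5, max_comments=100):
    """Search YouTube for review videos and gather comments for each (sketch)."""
    from youtubesearchpython import Comments, VideosSearch  # imported lazily

    videos = VideosSearch(query, limit=max_videos).result()["result"]
    dataset = []
    for video in videos:
        fetcher = Comments(video["id"])
        # Page through comments until the cap is reached or none are left
        while fetcher.hasMoreComments and len(fetcher.comments["result"]) < max_comments:
            fetcher.getNextComments()
        dataset.append({
            "video_id": video["id"],
            "title": video["title"],
            "channel": video["channel"]["name"],
            "comments": [c["content"] for c in fetcher.comments["result"][:max_comments]],
        })
    return dataset
```

The `comments` lists it returns are plain strings, ready for the bigram and trigram counting described earlier.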

Statistical analysis

While the NLP models give an overall picture, YouTube statistics such as view count, subscriber count, and the numbers of likes and dislikes help show how the audience feels about each video.
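
As a sketch of how these statistics can be combined, the code below computes a like ratio, an engagement rate, and views per subscriber for each video, then ranks videos by engagement. These particular formulas are common heuristics chosen for illustration, not metrics defined by this project, and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VideoStats:
    title: str
    views: int
    likes: int
    dislikes: int
    comments: int
    subscribers: int

    def like_ratio(self) -> float:
        """Fraction of ratings that are likes (0.0 if the video has no ratings)."""
        ratings = self.likes + self.dislikes
        return self.likes / ratings if ratings else 0.0

    def engagement_rate(self) -> float:
        """Interactions (likes, dislikes, comments) per view."""
        interactions = self.likes + self.dislikes + self.comments
        return interactions / self.views if self.views else 0.0

    def views_per_subscriber(self) -> float:
        """How far a video travels relative to the channel's subscriber base."""
        return self.views / self.subscribers if self.subscribers else 0.0

# Hypothetical numbers for two review videos, for illustration only
videos = [
    VideoStats("Review on channel A", views=200_000, likes=9_000, dislikes=1_000,
               comments=2_000, subscribers=1_000_000),
    VideoStats("Review on channel B", views=50_000, likes=4_500, dislikes=500,
               comments=1_000, subscribers=100_000),
]

for v in sorted(videos, key=VideoStats.engagement_rate, reverse=True):
    print(f"{v.title}: like ratio {v.like_ratio():.2f}, "
          f"engagement {v.engagement_rate():.1%}, views/subscriber {v.views_per_subscriber():.2f}")
```

In this made-up example both videos have the same like ratio, but the smaller channel's review engages its viewers twice as strongly (12% vs 6%), which raw subscriber counts alone would hide.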


This project can help determine whether the audience is interested in the movie. Because the analyzed review videos are selected from top influencers based on subscriber and view statistics, the results also point to influencers through whom the movie can be marketed. Used the right way, this analysis can help improve the movie's performance and increase its revenue.
