Video Analytics: The “WHAT, WHY, WHERE & HOW”

Last Updated on March 4, 2020

A Prelude

Video analytics uses computer vision to process video streams in near real time, feeding them as input to deep learning algorithms such as Convolutional Neural Networks (CNNs), including object detectors like You Only Look Once (YOLOv3). These algorithms output information about temporal and spatial events, attributes or patterns within the feeds. This information (patterns and detections) can be used to derive highly useful business insights.

The Need for Video Analytics

Businesses, big or small, are sitting on an enormous amount of data. According to an article published by the World Economic Forum, it was estimated that around 44 zettabytes of data would be produced by 2020. If businesses can decipher the patterns and insights in this data, they can optimize their strategies, better understand customer behaviour and provide superior customer satisfaction, leading to greater business returns.

Significant advances in technology and automation have led to quicker turnaround times, greater accuracy and, in turn, a higher ROI. Adopting these emerging technologies will also propel a business ahead of its competitors in the technology space. Video analytics brings a structured approach to video content, delivering segregated information to the relevant users based on their searches.

The global video analytics market was valued at USD 2,488.5 million in 2018 and is projected to reach USD 11,965.6 million by the end of 2026, exhibiting a CAGR of 22.67% during the forecast period (2018–2026), which highlights the growing scope of video analytics.
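
A growth rate like the one quoted above is normally computed with the compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The minimal sketch below recomputes it from the two market values, assuming an eight-year span from 2018 to 2026; under that assumption it comes out at roughly 21.7%, a little below the quoted 22.67%, so the report may use a different base period and the exact rate is best read as the report's own figure.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Market values quoted in the text, in USD million; 8 years is an assumed span.
    rate = cagr(2488.5, 11965.6, 2026 - 2018)
    print(f"Implied CAGR 2018-2026: {rate:.2%}")
```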

Various Use Case Scenarios –

Domain Applications:

Entertainment
- Audience engagement with streaming and video content
- Automating and streamlining the sharing of video content
- Content consumption behaviour of the audience

Healthcare
- Medical imaging – a system capable of detecting bone fractures or certain kinds of cancerous cells in images from medical imaging equipment.

Retail
- Image segmentation – a system capable of labelling each pixel in the input image with its corresponding object class.
- Object counting – a system capable of maintaining a count of each class of object identified in the observed scene.
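
Object counting is usually a thin layer on top of an object detector: tally the detections per class and ignore low-confidence ones. A minimal sketch, assuming a hypothetical `Detection` record with a class label and a confidence score (the format and the 0.5 threshold are illustrative, not from any particular library):

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Detection(NamedTuple):
    """One detector output for a frame (hypothetical format for illustration)."""
    label: str
    score: float  # confidence in [0, 1]


def count_objects(detections: Iterable[Detection], min_score: float = 0.5) -> Counter:
    """Count detections per class, skipping those below min_score."""
    return Counter(d.label for d in detections if d.score >= min_score)


if __name__ == "__main__":
    frame = [
        Detection("person", 0.91),
        Detection("person", 0.78),
        Detection("shopping_cart", 0.66),
        Detection("person", 0.32),  # below the threshold, so not counted
    ]
    print(count_objects(frame))
```

This counts objects per frame; counting unique shoppers across a whole video additionally needs object tracking so the same person is not counted in every frame.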

Automotive & transport
- Object tracking – a system capable of locating a moving object over time in the video feed.
- License plate reading (OCR) – a system capable of capturing a vehicle's registration plate and automatically extracting the registration number.
- Self-driving cars – autonomous vehicles capable of sensing their environment and moving safely with little or no human input.
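
Object tracking is often built on top of per-frame detections by associating each new detection with an existing track. The sketch below is one simple technique, a greedy nearest-centroid tracker, assuming the detector supplies one (x, y) centroid per object per frame; the `max_distance` and `max_missed` values are illustrative assumptions, and production trackers typically add motion models (e.g. Kalman filters) and appearance features.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class CentroidTracker:
    """Greedy nearest-centroid tracker that keeps a stable ID per object."""

    def __init__(self, max_distance: float = 50.0, max_missed: int = 5):
        self.max_distance = max_distance  # largest pixel jump allowed between frames
        self.max_missed = max_missed      # frames an object may vanish before its ID is dropped
        self.next_id = 0
        self.tracks: Dict[int, Point] = {}
        self.missed: Dict[int, int] = {}

    def update(self, centroids: List[Point]) -> Dict[int, Point]:
        unmatched = list(range(len(centroids)))
        # Consider every (track, detection) pair, closest pairs first.
        pairs = sorted(
            (math.dist(pos, centroids[j]), tid, j)
            for tid, pos in self.tracks.items()
            for j in range(len(centroids))
        )
        used_tracks = set()
        for dist, tid, j in pairs:
            if dist > self.max_distance:
                break  # all remaining pairs are even farther apart
            if tid in used_tracks or j not in unmatched:
                continue
            self.tracks[tid] = centroids[j]
            self.missed[tid] = 0
            used_tracks.add(tid)
            unmatched.remove(j)
        # Age tracks with no detection this frame and drop stale ones.
        for tid in list(self.tracks):
            if tid not in used_tracks:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.tracks[tid], self.missed[tid]
        # Leftover detections start new tracks.
        for j in unmatched:
            self.tracks[self.next_id] = centroids[j]
            self.missed[self.next_id] = 0
            self.next_id += 1
        # Includes briefly missing tracks at their last known position.
        return dict(self.tracks)
```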

Home automation
- Motion detection – a system capable of determining the presence of relevant motion in the observed scene.
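
One of the simplest motion detection techniques is frame differencing: compare consecutive grayscale frames and report motion when enough pixels changed noticeably. A minimal NumPy sketch, assuming grayscale `uint8` frames of equal size; both thresholds are illustrative assumptions that would need tuning per camera.

```python
import numpy as np


def detect_motion(prev_frame: np.ndarray, frame: np.ndarray,
                  pixel_threshold: int = 25, min_changed_fraction: float = 0.01) -> bool:
    """Frame-differencing motion detector for grayscale uint8 frames.

    A pixel counts as changed when its absolute intensity difference exceeds
    pixel_threshold; motion is reported when at least min_changed_fraction
    of the frame changed, which filters out sensor noise and tiny flickers.
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed_fraction = np.count_nonzero(diff > pixel_threshold) / diff.size
    return bool(changed_fraction >= min_changed_fraction)
```

In practice frames are usually blurred first, and background subtraction (for example OpenCV's `cv2.createBackgroundSubtractorMOG2`) copes better with gradual lighting changes than plain differencing.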

Security and public safety
- Facial recognition – a system capable of identifying a person in the input video feed.
- Object detection – a system capable of determining the presence of objects of a given class and their locations in the observed scene.
- Smoke/flame detection – a system capable of detecting flame and smoke by characteristics such as color, flicker rate, shape, pattern and direction of movement.
- Tamper detection – a system capable of determining whether the video feed or camera has been tampered with.
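
Tamper detection can start from a few simple image statistics compared against a reference frame of the normal scene. The heuristic sketch below flags three common cases: a covered or blacked-out lens (an almost uniform image), a sudden global brightness jump, and a camera that has been redirected (most pixels differ from the reference). The function name, return strings and every threshold are hypothetical choices for illustration, assuming grayscale `uint8` frames.

```python
from typing import Optional

import numpy as np


def tamper_check(reference: np.ndarray, frame: np.ndarray,
                 min_std: float = 8.0, max_mean_shift: float = 60.0,
                 pixel_threshold: int = 25, max_changed_fraction: float = 0.6) -> Optional[str]:
    """Heuristic camera-tamper check on grayscale uint8 frames.

    Returns a reason string, or None if the frame looks normal.
    """
    f = frame.astype(np.float32)
    r = reference.astype(np.float32)
    # A covered, sprayed or blacked-out lens gives an almost uniform image.
    if f.std() < min_std:
        return "covered_or_blank"
    # A sudden global brightness jump suggests a flashlight or lights being cut.
    if abs(f.mean() - r.mean()) > max_mean_shift:
        return "brightness_shift"
    # If most pixels differ strongly from the reference, the camera was probably moved.
    changed = np.count_nonzero(np.abs(f - r) > pixel_threshold) / f.size
    if changed > max_changed_fraction:
        return "scene_changed"
    return None
```

A real deployment would require the condition to persist over several frames and would refresh the reference periodically, so that crowds or normal lighting changes do not raise false alarms.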

Implementation –

Faster R-CNN (Faster Region-based Convolutional Neural Network)

Faster R-CNN solved a fundamental problem of its predecessors, R-CNN and Fast R-CNN: both relied on selective search, a fixed region-proposal algorithm that did not learn from data. Faster R-CNN replaced selective search with a Region Proposal Network (RPN), which is trained jointly with the main detection network.
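
The RPN slides over the backbone's feature map and, at every position, scores a fixed set of reference boxes called anchors for "objectness" while predicting offsets that refine them. The original paper uses 3 scales (box areas of 128², 256² and 512² pixels) and 3 aspect ratios (1:2, 1:1, 2:1), giving 9 anchors per position, with a VGG-16 backbone of stride 16. The sketch below only generates those anchors; centring each anchor on its cell's midpoint is an assumed convention, since implementations differ slightly.

```python
import itertools
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def generate_anchors(feat_h: int, feat_w: int, stride: int = 16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """Anchor boxes an RPN scores at every feature-map position.

    ratio is height / width; width and height are chosen so each box's
    area equals scale ** 2.
    """
    anchors = []
    for y, x in itertools.product(range(feat_h), range(feat_w)):
        # Centre of this feature-map cell, mapped back to image coordinates.
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for scale, ratio in itertools.product(scales, ratios):
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

For a feature map of size H x W this yields H x W x 9 anchors; during training, each anchor is labelled positive or negative by its overlap with ground-truth boxes.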

YOLOv3 (You Only Look Once)

YOLOv3 is a state-of-the-art object detection network known for both its speed and its accuracy. It uses a single CNN that predicts bounding boxes, and the class probabilities for those boxes, in one pass over the image.
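
To make "a single CNN predicts the bounding boxes" concrete: YOLOv3 splits the image into grids at three scales, and for each grid cell and each of three anchor priors it outputs raw values (tx, ty, tw, th), an objectness logit and one logit per class. Per the YOLOv3 paper, the box centre is the sigmoid of tx, ty plus the cell index, the size is the anchor prior scaled by exp(tw), exp(th), and classes use independent logistic (sigmoid) scores rather than a softmax. A minimal sketch of that decoding for a single prediction; the function name and argument layout are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_yolov3_box(t: Sequence[float], cell_x: int, cell_y: int,
                      anchor_w: float, anchor_h: float, stride: int,
                      num_classes: int) -> Tuple[Tuple[float, float, float, float], List[float]]:
    """Turn one raw YOLOv3 prediction into (cx, cy, w, h) and per-class scores.

    t = (tx, ty, tw, th, to, c_1 .. c_n): raw outputs for one anchor in one
    grid cell. Anchor sizes are in input-image pixels; stride is the
    number of image pixels per grid cell at this scale.
    """
    tx, ty, tw, th, to = t[:5]
    class_logits = t[5:5 + num_classes]
    # Centre: offset squashed into (0, 1) within the cell, plus the cell index.
    bx = (sigmoid(tx) + cell_x) * stride
    by = (sigmoid(ty) + cell_y) * stride
    # Size: the anchor prior scaled by the exponential of the predicted offset.
    bw = anchor_w * math.exp(tw)
    bh = anchor_h * math.exp(th)
    objectness = sigmoid(to)
    # Independent logistic class scores, weighted by objectness.
    scores = [objectness * sigmoid(c) for c in class_logits]
    return (bx, by, bw, bh), scores
```

For example, with the stride-32 grid and the COCO anchor prior of 116 x 90 pixels, an all-zero prediction in cell (3, 4) decodes to a 116 x 90 box centred at (112, 144) with every class score equal to 0.25.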

Ways to implement YOLO –

- Train a network from scratch when you have a specific use case and plenty of training data for the object detection task at hand.
- Use a pre-trained model for a quick and dirty implementation when you need to detect general objects such as people, cars, etc. in the input image.
- Use cloud providers like Google Cloud Platform (GCP), which offer powerful pre-trained models that assign labels to images and classify them into millions of predefined categories.
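
Whichever of these routes is taken, a detector such as YOLOv3 emits many overlapping candidate boxes for the same object, and these are usually pruned with non-maximum suppression (NMS): keep the highest-scoring box, discard boxes that overlap it too much (measured by intersection over union, IoU), and repeat. A minimal pure-Python sketch; the 0.45 default threshold is an assumed, commonly used value rather than a fixed rule.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.45) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the boxes kept."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

NMS is normally run separately for each class, so that, for instance, a person standing next to a bicycle does not suppress the bicycle's box.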

The figure below shows emotion detection using the Cloud Vision API, which rates each detected face against predefined emotion categories such as joy and sorrow.
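
A minimal sketch of calling Cloud Vision face detection from Python, assuming the `google-cloud-vision` client library (version 2 or later) and configured credentials. The API reports a likelihood level (UNKNOWN = 0 through VERY_LIKELY = 5) for joy, sorrow, anger and surprise on each face rather than a single label; the `dominant_emotion` helper and its "neutral" fallback are hypothetical, added here to turn those levels into one category per face.

```python
from typing import Dict, List


def dominant_emotion(likelihoods: Dict[str, int], min_level: int = 3) -> str:
    """Pick the emotion with the highest likelihood level.

    Hypothetical rule: if no emotion reaches min_level (3 = POSSIBLE),
    the face is labelled 'neutral'.
    """
    emotion, level = max(likelihoods.items(), key=lambda kv: kv[1])
    return emotion if level >= min_level else "neutral"


def detect_emotions(image_path: str) -> List[str]:
    """Label each face in a local image using Cloud Vision face detection."""
    # Imported lazily so dominant_emotion works without the client installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.face_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    labels = []
    for face in response.face_annotations:
        levels = {
            "joy": int(face.joy_likelihood),
            "sorrow": int(face.sorrow_likelihood),
            "anger": int(face.anger_likelihood),
            "surprise": int(face.surprise_likelihood),
        }
        labels.append(dominant_emotion(levels))
    return labels
```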

Video Analytics – A Game Changer

Video analytics applications in real-life scenarios such as facial recognition provide a unique opportunity to identify individuals in real time by verifying distinct characteristics against a number of private and public databases. In retail, the same technology can identify high-profile shoppers entering a department store. Video analytics can also reduce labor costs: with a computer on the job, fewer personnel are needed to do work that would otherwise be done manually. With new forms of video constantly evolving, video analytics has never been more relevant than it is now, and its relevance will only increase. As the volume of video content published every day grows, so will the demand for video content analysis solutions and the role of the video analytics companies that provide them.