Optimize Experimentation Cost by Enabling Faster Access to Historical Data for Researchers
Expected business impact of $3 million/year for a leading global oil-and-natural-gas company using the Chemical Synthesis Miner solution, which enables easier and quicker access to the centralized repository.
The Problem
-
R&D scientists had to manually locate the documents needed for
their experiments, a cumbersome task in the initial phase of
research.
-
R&D scientists often inadvertently repeated experiments that had
already been conducted because they could not access the related
documents.
The Before state
-
Material recipes existed only as scanned PDFs containing printed
and handwritten steps. These needed to be digitized into a
structured format for storage and retrieval.
The LatentView Solution
-
The Chemical Synthesis Miner (CSM) solution used the following
approach to overcome these challenges:
-
In-scope and out-of-scope nuances in the digitization process were
defined clearly.
-
OCR packages were thoroughly evaluated to find the combination that
gave the best results for both handwritten and printed text in
this use case.
-
Custom-trained NLP models were used for Named Entity Recognition
and Dependency Parsing.
-
A tailored dictionary was created to identify commonly used
chemical terminology.
-
Progress was reviewed and ratified through frequent iterations
with the stakeholders.
-
CSM (Chemical Synthesis Miner) was the first project at LatentView
(LV) to use a graph database and to design an end-to-end pipeline
that translates an unstructured PDF document containing printed and
handwritten text, identifies entities and their relationships,
stores them in a graph database, and retrieves information through
a web application and chatbot. It was also one of the first
applications to be hosted in an Azure environment. The innovation
in this project gained significant traction within LV and helped
us win more projects.
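The entity-extraction and graph-storage steps described above can be illustrated with a minimal sketch. It is not the CSM implementation: a plain dictionary lookup stands in for the custom-trained NLP models, an in-memory adjacency map stands in for the graph database, and every name in it (CHEMICAL_TERMS, tag_entities, RecipeGraph, the recipe IDs, the USES and UNDERGOES relations) is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary of chemical terminology (illustrative only).
CHEMICAL_TERMS = {
    "ethanol": "CHEMICAL",
    "sodium hydroxide": "CHEMICAL",
    "stir": "ACTION",
    "heat": "ACTION",
}


def tag_entities(step):
    """Return (term, label) pairs found in one recipe step, in text order."""
    text = step.lower()
    found = []
    for term, label in CHEMICAL_TERMS.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((match.start(), term, label))
    return [(term, label) for _, term, label in sorted(found)]


class RecipeGraph:
    """Toy in-memory stand-in for a graph database (assumption)."""

    def __init__(self):
        # node -> set of (relation, target node)
        self.edges = defaultdict(set)

    def add_step(self, recipe_id, step):
        """Tag one step and record recipe-chemical and chemical-action edges."""
        entities = tag_entities(step)
        actions = [term for term, label in entities if label == "ACTION"]
        chemicals = [term for term, label in entities if label == "CHEMICAL"]
        for chemical in chemicals:
            self.edges[recipe_id].add(("USES", chemical))
            for action in actions:
                self.edges[chemical].add(("UNDERGOES", action))

    def recipes_using(self, chemical):
        """Return the IDs of all recipes that use the given chemical."""
        return sorted(
            node for node, rels in self.edges.items()
            if ("USES", chemical) in rels
        )


graph = RecipeGraph()
graph.add_step("recipe-001", "Heat the ethanol and stir for 10 minutes.")
graph.add_step("recipe-002", "Add sodium hydroxide, then stir.")
print(graph.recipes_using("ethanol"))
```

A lookup such as recipes_using is the kind of query a web application or chatbot front end could issue against the stored graph to surface earlier experiments that involved a given chemical.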
The After state
-
As indicated by the client, the expected business impact was on
the order of $3 million/year.
-
Using LV’s solution, the client can now access the centralized
repository, and weeks of effort by the research scientists are cut
down to a matter of minutes or even seconds.
-
Using this solution, the client can now use time, materials,
money, and human resources more efficiently.