ML Leaders vs. Laggards
When it comes to the use of Machine Learning (ML), there is a major difference between leaders and laggards: leaders have mastered the last-mile integration of ML models into production, while laggards have not.
In large enterprises, fewer than 40% of the models that are developed get deployed (some estimates put the figure at 10% or 20%). Many data scientists never get to see how their hard work is used in the real world. Moreover, the models that are deployed are often only loosely monitored and not refreshed in a timely fashion.
Delays in getting models to production mean lost opportunities to improve operations (e.g., demand planning), enhance customer experience (e.g., personalization), and identify new avenues for growth (e.g., segmentation). ML models also tend to become less effective over time as the data they see drifts away from the data they were trained on, and this wreaks havoc on the business process the model is supposed to improve or optimize. In addition, a lack of governance over the models leads to issues with trust and transparency.
This situation is likely to get worse as tools such as AutoML lower the barriers to creating models: “citizen” data scientists can now develop new models rapidly, only to see them wait long periods to be productionized.
Machine Learning Operations (MLOps) is an emerging discipline that bridges the gap between the development and the operation of ML models.
What is MLOps?
MLOps is a set of practices that combines model development, deployment, monitoring, and continuous improvement. Its basic guiding principles are the same as those of DevOps, its counterpart in software development from which it borrows heavily: fast flow from development to operations, fast feedback from usage back to the business and data science teams, and continuous learning and improvement.
The following graphic shows the steps in a typical MLOps workflow:
As shown in the graphic, the MLOps methodology spans development, testing models in a pre-release environment, and post-release monitoring. Testing involves releasing the model into a guarded perimeter of the real world, such as a small share of users or traffic, to measure its performance in that limited setting.
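One common way to build such a guarded perimeter is a canary release, in which a small, fixed share of users is routed to the candidate model while everyone else stays on the production model. The following is a minimal, hypothetical sketch; the function names, the 5% default share, and the choice of hashing user IDs are assumptions for illustration, not features of any particular MLOps tool.

```python
import hashlib
from typing import Any, Callable, Sequence, Tuple

Model = Callable[[Sequence[float]], Any]  # hypothetical: any callable that scores features


def route_to_candidate(user_id: str, canary_fraction: float = 0.05) -> bool:
    """Return True if this user falls inside the canary (guarded) perimeter.

    Hashing the user ID makes the assignment deterministic, so a given user
    sees the same model for the whole test.
    """
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) onto a bucket in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < canary_fraction


def predict(
    user_id: str,
    features: Sequence[float],
    production_model: Model,
    candidate_model: Model,
    canary_fraction: float = 0.05,
) -> Tuple[str, Any]:
    """Serve a prediction and tag which model produced it, so KPIs can be compared later."""
    if route_to_candidate(user_id, canary_fraction):
        return "candidate", candidate_model(features)
    return "production", production_model(features)
```

Hashing the user ID, rather than drawing a random number per request, keeps each user on one model for the duration of the test, which makes KPI comparisons between the two groups cleaner.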
Once the KPIs are validated and the model’s effect is well understood, the model is released into the wild (the real world). There it is packaged, versioned, and continuously monitored against business, technical, process, and data-related metrics. If the model’s effectiveness degrades, data scientists can choose to retrain it, and the cycle begins again.
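The monitoring-and-retraining decision described above can be sketched as a rolling-window check against the KPI validated during testing. This is a minimal sketch that assumes ground-truth outcomes eventually arrive for each prediction and uses accuracy as the metric; the class name, baseline, tolerance, and window size are all hypothetical.

```python
from collections import deque
from typing import Any, Optional


class PerformanceMonitor:
    """Track a rolling accuracy metric and flag degradation against a baseline.

    `baseline` is assumed to be the accuracy validated during pre-release testing;
    `tolerance` is how far below it the rolling metric may fall before retraining
    is recommended.
    """

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 500):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect

    def record(self, prediction: Any, actual: Any) -> None:
        """Store whether a prediction matched the outcome that was later observed."""
        self.outcomes.append(1 if prediction == actual else 0)

    def rolling_accuracy(self) -> Optional[float]:
        """Accuracy over the most recent window, or None before any outcomes arrive."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """True once a full window shows accuracy below baseline minus tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

Requiring a full window before flagging degradation trades detection speed for fewer false alarms; the same pattern could be applied to business KPIs such as conversion rate.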
MLOps involves designing a process that moves seamlessly from one step to the next, establishing an organization with defined responsibilities and empowering people to manage that process, and implementing the tools needed to bring it to life. Done well, it closes the feedback loop from production back to the business and data science teams.
What are the Challenges in Implementing MLOps?
MLOps is a cross-disciplinary effort involving data science, business, and engineering teams. Beyond the challenges that any cross-team effort faces, some are unique to ML:
- Built by Data Scientists, Not Software Engineers. ML models are developed by data scientists, whose code is not optimized for production and who use a wide variety of tools for different pieces of the work. It is not wise to deploy such prototype code directly into production.
- Experimentation, Not Predictability. 80% of data scientists’ time is spent creating features and trying various combinations of algorithms, hyperparameters, and so on. Their focus is not on delivering predictable software or on reuse, so moving from this mode of work to predictable, continuously improving models is challenging and complicated.
- Complexities in Testing and Packaging. Testing ML is not easy. In addition to tests for the code, there are tests for the data and for the models (which are mostly black boxes). In sophisticated ML scenarios, these tests need to be triggered automatically.
- Complexities in Deployment. In the simplest cases, which make up as many as 90% of deployments, IT engineers can wrap a model in an API. However, this does not automate retraining the model on new data; that requires more complex deployment options.
- Need for Pipelines to Detect Model Degradation. Models need to be continuously improved. This requires additional pipelines to detect drift, plus rules that take action based on data drift or output drift: more pipelines, more complexity.
- Many Tools, but No Unified Platform. At last count, there were over 200 tools for managing various aspects of ML, but they need to be put together so that development and operations are managed seamlessly. This is the challenge that MLOps addresses.
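The drift-detection pipelines mentioned in the list above can be illustrated with the Population Stability Index (PSI), one common way to measure data drift for a single numeric feature by comparing its training-time distribution with its live distribution. PSI is our choice of metric here, not one the post prescribes; the equal-width binning, the small epsilon for empty bins, and the 0.1/0.25 action thresholds (a widely used rule of thumb) are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training-time sample (expected) and a live sample (actual).

    Bin edges are equal-width over the expected sample's range; live values
    outside that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range values
            counts[idx] += 1
        eps = 1e-6  # avoid log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_action(psi: float, warn: float = 0.1, alert: float = 0.25) -> str:
    """Map a PSI value to an action using rule-of-thumb thresholds (assumed)."""
    if psi >= alert:
        return "retrain"
    if psi >= warn:
        return "investigate"
    return "ok"
```

In a pipeline, a check like this would run per feature on a schedule, with `drift_action` standing in for the rules that decide whether to alert someone or trigger retraining. A feature whose live values have shifted by about one standard deviation, for example, yields a PSI well above 0.25.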
How can you get started with MLOps?
In the next post, we will talk about how to get started on an MLOps journey.