Organizations today use machine learning (ML) to solve a wide variety of problems, and ML models have become an integral part of the day-to-day activity of organizations and consumers alike. There is an ever-growing demand for improved models that solve new problems and for recalibrated models that solve old problems in new ways. This creates a need for tools that make building and deploying machine learning models faster, with minimal effort.
Auto-ML: Accelerating model build and deployment
Automated machine learning (AutoML) is an area that aims to automate (parts of) the construction and use of machine learning pipelines, so that models can be built and deployed faster. This automation enables a wider audience to make effective and responsible use of machine learning. AutoML tools today can automate the building of a large variety of ML models, for example, regression, classification, time series, text analytics, and neural network models.
Most AutoML tools achieve automation by analyzing data and selecting algorithms based on the analysis, followed by fitting and tuning models aligned with the algorithms chosen.
Role of AutoML in Data Science
Let us start by understanding an AutoML tool’s role in a machine learning or data science project. A typical data science project goes through multiple development stages, starting from problem identification and business understanding to model development and deployment. Some of the intermediate steps need more human intervention and decision-making than others. Auto-ML tries to automate many stages in the development and deployment of a machine learning model. A few steps that Auto-ML automates include:
- Data pre-processing: This process includes improving data quality and converting unstructured, raw data to a structured format with methods like data cleaning, data integration, data transformation, and data reduction.
- Feature engineering: AutoML can automate this step by analyzing the input data to create features that are more compatible with machine learning algorithms.
- Feature extraction: This process includes combining different features or datasets to generate new features that will enable more accurate results and reduce the size of data processed.
- Feature selection: AutoML can automate the task of selecting only useful features for processing.
- Algorithm selection & hyperparameter optimization: AutoML tools can choose optimal hyperparameters and algorithms without human intervention.
- Model deployment and monitoring: AutoML tools can deploy a model developed using the framework and monitor a model’s decay using dashboards available in the solution.
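The stages listed above can be approximated with a hand-written pipeline, which helps show what an AutoML tool takes over. The sketch below is a minimal illustration using scikit-learn, which is an assumption of this example and is not named in the article. The synthetic dataset, the choice of logistic regression, and the tiny search grid are also illustrative stand-ins for the much larger search space a real AutoML tool explores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data (illustrative only), with a few values knocked out
# so that the pre-processing step has something to clean.
X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.02] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # data pre-processing
    ("scale", StandardScaler()),                    # data transformation
    ("select", SelectKBest(score_func=f_classif)),  # feature selection
    ("model", LogisticRegression(max_iter=1000)),   # algorithm
])

# Hyperparameter optimization over a deliberately tiny grid.
param_grid = {"select__k": [4, 8, 12], "model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Held-out accuracy:", search.score(X_test, y_test))
```

Every entry in the pipeline and in `param_grid` is a decision a practitioner would otherwise make by hand. An AutoML tool makes these choices automatically, usually across many algorithms at once.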
There are three major types of Auto-ML solution providers:
- Open Source: AI/ML is a scientific area where even tech giants publish their research findings despite significant corporate investments. These readily available findings pave the way for open-source solutions.
- Startups: Given the ever-growing demand for machine learning models, numerous startups have launched to capture this opportunity. These startups are trying to meet the demand by creating new AutoML tools that shorten the time to market for machine learning models.
- Tech giants: Google declared itself an ‘AI-first’ company and launched Google Cloud AutoML, one of the first AutoML tools from a tech giant. IBM’s SPSS Modeler also offers various AutoML tools.
Some of the critical solutions in this niche field are:
|Solution name||Category||URL for more info|
|AutoViML||Open Source||https://github.com/AutoViML|
|Google Cloud AutoML||Tech Giant||https://cloud.google.com/automl|
Given the vast number of AutoML tools available, having a set of standard tests and KPIs that an AutoML tool could be scored on makes it easier for users to decide which solution fits their requirements and restrictions.
An Experiment to Compare AutoML and Classical ML Models
A sample dataset is selected that contains data points of two different classes. The experiment aims to build a machine learning model that classifies a new entry into one of the two classes as accurately as possible.
We built the machine learning model using two different methods:
- Classical ML
- AutoML
We will use the following KPIs to compare AutoML and Classical ML models:
- Optimizing result
- Ease of usage
- Completeness of automation
- Time complexity
- Computation complexity
We will use the observations gathered while building the models and the model’s quality to compare AutoML and Classical ML.
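The time and computation KPIs can be recorded the same way for both approaches by wrapping each model build in one measurement helper. The sketch below uses only the Python standard library. The name `measure_run` and the `build_model` callable are hypothetical and introduced here for illustration. `tracemalloc` only sees memory allocated through Python, so tools that run work in a separate process or runtime would need an external monitor.

```python
import time
import tracemalloc


def measure_run(build_model, *args, **kwargs):
    """Run build_model and report wall-clock time and peak Python memory.

    build_model is a hypothetical callable that trains a model and returns
    whatever the caller wants to keep (for example, a fitted model or a score).
    """
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = build_model(*args, **kwargs)
        elapsed = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if the build raises.
        tracemalloc.stop()
    return {
        "result": result,
        "seconds": elapsed,
        "peak_mib": peak_bytes / (1024 * 1024),
    }


# Usage with a stand-in workload; replace it with the AutoML or classical build.
report = measure_run(lambda n: sum(i * i for i in range(n)), 100_000)
print(report)
```

Running the classical build and the AutoML build through the same helper keeps the comparison like for like.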
The AutoML tools chosen for this experiment are:
- H2O – H2O comes bundled with H2O Flow, which enables building, selecting, and monitoring model pipelines from a common interface. It also has extensive ensemble and deep learning capabilities that make finding a good-fit model easier.
- PyCaret – PyCaret offers feature engineering, ensemble methods, and model explainability features that can improve model fit over successive iterations.
Let us look at why these two AutoML tools are fit for our experiment:
- Both PyCaret and H2O have an extensive user base.
- The PyCaret and H2O packages are updated frequently, which improves existing features and adds new ones, so these solutions keep adapting to market needs.
- These solutions generalize across machine learning problem types, such as classification and regression, and support multiple algorithms.
- An array of feature engineering, extraction, interaction, and selection techniques is available with these solutions.
- Both these solutions have model deployment features with end-to-end pipeline creation capability.
- PyCaret and H2O are cloud compatible.
These two solutions are generalized and extensive enough to handle a wide range of problems. The frequent updates and adaptation hint towards these solutions’ longevity, making them ideal for this experiment.
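To give a sense of how little code these tools need, the sketch below shows one way each might be called for a two-class problem. It is not the code used in the experiment. The function names `run_pycaret` and `run_h2o`, the seed, and the `max_models` value are assumptions made for illustration. Arguments and defaults differ between releases of both libraries, so the installed version's documentation should be checked before relying on it.

```python
def run_pycaret(df, target):
    """Compare candidate models with PyCaret and tune the best one.

    df is a pandas DataFrame and target is the name of the label column
    (both are supplied by the caller; neither comes from the article).
    """
    from pycaret.classification import setup, compare_models, tune_model

    setup(data=df, target=target, session_id=42)  # analyzes and prepares the data
    best = compare_models()                       # ranks the available algorithms
    return tune_model(best)                       # tunes the top-ranked model


def run_h2o(df, target, max_models=10):
    """Run H2O AutoML and return its leaderboard and leader model."""
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()                                    # starts or connects to a local H2O cluster
    frame = h2o.H2OFrame(df)
    frame[target] = frame[target].asfactor()      # treat the label as categorical
    features = [c for c in frame.columns if c != target]

    aml = H2OAutoML(max_models=max_models, seed=42)
    aml.train(x=features, y=target, training_frame=frame)
    return aml.leaderboard, aml.leader
```

The imports sit inside the functions so that each sketch can be read and loaded independently, even when only one of the two libraries is installed.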
Can AutoML and Classical ML join forces?
Even though AutoML requires more computational power, memory, and other infrastructure, it can still be used in its current form in combination with the classical ML approach.
Because AutoML demands more computation power and memory than classical ML approaches, a smaller sample of the dataset can be used to derive essential inferences from AutoML tools, thereby reducing these needs. These inferences can then be used directly for building classical models.
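One way to draw such a smaller sample is to sample each class separately, so the sample keeps the class balance of the full dataset. The sketch below uses only the standard library. The helper name `stratified_sample`, its parameters, and the toy data are hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Return roughly `fraction` of rows, keeping each class's share intact.

    rows is a list of records and label_of is a function that extracts the
    class label from a record (both are hypothetical names for this sketch).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    rng = random.Random(seed)
    sample = []
    for members in by_class.values():
        # Keep at least one row per class so that no class disappears.
        keep = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, keep))
    rng.shuffle(sample)
    return sample


# Example: 1,000 rows with a 90/10 class split, reduced to 10 percent.
rows = [(i, "a" if i % 10 else "b") for i in range(1000)]
small = stratified_sample(rows, label_of=lambda r: r[1], fraction=0.1)
print(len(small), sum(1 for r in small if r[1] == "b"))
```

The reduced sample can then be passed to the AutoML tool in place of the full dataset.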
Feature Engineering: AutoML tools like PyCaret and H2O have comprehensive feature engineering support. These tools can automate the generation of multiple additional features in little time with very little code. These features can then be used to build good-fit models using classical modeling techniques.
Algorithm selection: Selecting the best algorithm for a problem statement is a long and iterative process. AutoML tools can quickly determine the algorithm best suited to the problem based on the selected KPI: they iterate over multiple candidate algorithms while trying to minimize or maximize that KPI. This enables the best-suited algorithm to be selected in a very short time frame with very little code.
The selected algorithm can then be used to get a good fit model using the classical ML approach.
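The selection loop described above can be sketched by hand: score each candidate on the chosen KPI with cross-validation and keep the best. The candidate list, the synthetic data, and the choice of ROC AUC as the KPI are assumptions for illustration. AutoML tools run a similar comparison over a much broader set of algorithms.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate algorithms (an arbitrary short list for illustration).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

kpi = "roc_auc"  # the KPI to maximize; any scikit-learn scoring name works
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring=kpi).mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
print("Selected:", best_name)
```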
The good fit model: The classical ML approach offers many techniques to tune a model and converge to the best fit model. Tuning a model is a long and iterative process.
AutoML tools can automate the model tuning process by minimizing or maximizing the chosen KPI. AutoML can yield a good-fit model in a short period with little code.
The hyperparameters of this good fit model can be easily migrated to classically fit machine learning models.
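The migration step can be sketched as follows: tune on a small sample, read off the winning hyperparameters, and pass them to a plainly constructed model that is then fitted on the full data. Here scikit-learn's `RandomizedSearchCV` stands in for the AutoML tuning step. The random forest, the search space, and the 25 percent sample size are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic two-class data (illustrative only) and a 25 percent tuning sample.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=0.25, stratify=y, random_state=0
)

# Automated tuning on the small sample (stand-in for the AutoML step).
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [25, 50, 100],
        "max_depth": [3, 5, None],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_small, y_small)

# Migrate the tuned hyperparameters to a classically fitted model.
classical_model = RandomForestClassifier(random_state=0, **search.best_params_)
classical_model.fit(X, y)
print(search.best_params_)
```

Hyperparameters tuned on a sample are not guaranteed to remain the best choice on the full dataset, so the migrated model should still be validated on held-out data.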
- Market Interest
A Google Trends result showing web search interest in the term ‘automl’ since 2004
Interest over time: Numbers represent search interest relative to the chart’s highest point for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means that there was not enough data for this term.
Image Source: Google Trends
With AutoML slowly gaining attention, the AutoML market generated $270 million in 2019. It is expected to reach $14,512 million by 2030, advancing at a CAGR of 43.7% during the forecast period (2020–2030). The numbers suggest that AutoML has not peaked and that interest in AutoML will continue to grow.
- Market adoption
AutoML has not yet penetrated different markets due to:
- the availability of other infrastructure choices
- lack of demand from organizations since their needs and problems do not require this approach as yet.
According to a survey conducted by automl.org, compared to non-tech companies and governmental organizations, research labs and tech companies are a step ahead in adopting AutoML practices.
The survey’s findings state:
“While overall AutoML adoption is similar across continents, non-profit and low-tech organizations see higher adoption in Europe than in North America. We also found that teams with multiple years of experience are more likely to adopt AutoML techniques. Finally, across the board, there is significant room to increase adoption of AutoML, but this is especially true for non-tech companies and governmental organizations.”
AutoML is here to stay and evolve.
AutoML, when used strategically, can be a potent tool in the hands of a data science practitioner. AutoML puts a vast arsenal of statistical tools behind a simple command. Accessing and implementing such techniques without AutoML would require a considerable amount of manual time and effort. AutoML enables the use of multiple statistical techniques and can converge to a good-fit model algorithmically.
A data science practitioner who understands the problem statement and business context can use an AutoML tool to their benefit. They can access multiple statistical techniques with fewer lines of code, along with algorithmic optimization, in a very short time, thereby quickly reaching the right solution.
Given the increasing interest in and discussion around AutoML, paired with early market adoption, AutoML shows tremendous potential to change the machine learning landscape.