MLOps (or Machine Learning Operations) is a core function of machine learning engineering that focuses on streamlining the process of taking ML models to production, and then maintaining and monitoring them.
But before we get into more details about MLOps, it’s important to understand what operationalization of machine learning is, why it’s important, and how it gave rise to a variety of MLOps tools we see today.
Operationalization of Machine Learning (ML) is the process of making machine learning models run reliably, every day, integrated into data products. Standalone, ad hoc development of models is no longer compelling; it does not make nearly as much business sense. “The majority (85%) of respondent organizations are evaluating AI or using it in production,” according to “AI adoption in the enterprise 2020”. This trend holds across industries, geographies, and scales.
However, operationalization of ML has proven to be more difficult than people expect. Very few ML models reach the production stage. The main challenge is robustness. According to the same report, “Whether it’s controlling for common risk factors—bias in model development, missing or poorly conditioned data, the tendency of models to degrade in production—or instantiating formal processes to promote data governance, adopters will have their work cut out for them as they work to establish reliable AI production lines.”
Machine Learning Platforms
Many prominent companies are building ML platforms to develop models reliably and at scale. Uber has Michelangelo, Stripe has Railyard, Airbnb has Bighead, and Swiggy has DSP. In fact, Uber shared their motivation for Michelangelo: “there were no systems in place to build reliable, uniform, and reproducible pipelines for creating and managing training and prediction data at scale. Prior to Michelangelo, it was not possible to train models larger than what would fit on data scientists’ desktop machines, and there was neither a standard place to store the results of training experiments nor an easy way to compare one experiment to another. Most importantly, there was no established path to deploying a model into production.”
Around the same time, Google’s engineers detailed the problems these platforms are built to solve in “Hidden Technical Debt in Machine Learning Systems” (2015).
The phrase “technical debt”, which Ward Cunningham coined in 1992, refers to the long-term costs of moving quickly in software engineering. According to the Google paper mentioned above, technical debt accumulates in ML systems when ML-specific risk factors are not accounted for in system design. Some of these include:
- Boundary erosion: Because machine learning systems learn their behavior from data rather than having it fully specified in code, it is difficult to enforce strict abstraction boundaries by outlining their intended behavior.
- Entanglement: Machine learning models are fundamentally data-dependent, and different data characteristics affect the model in different ways. If the input distribution of a single feature changes, the model’s output can shift, because all inputs are mixed together inside the model and no input is ever truly independent.
- Hidden feedback loops: These occur when the ML model’s output is inadvertently fed back into its own input. The result is analysis debt: it becomes difficult to anticipate how a model will behave before it is released. Such feedback loops can take several forms, and they are especially hard to spot and fix when they build up gradually over time or when models are updated infrequently.
- ML-system anti-patterns: Despite the appeal and hype, the ML model itself makes up only a small portion of the code in a production system. The code for data pre-processing, filtering, and post-processing is necessary yet often overlooked. The paper discusses several such anti-patterns, including pipeline jungles (which arise when new signals and data sources are bolted on through a haphazard sequence of data preparation steps until the pipeline resembles a jungle) and glue code (the supporting code written to get data into and out of general-purpose packages), among others.
- Data dependencies: According to the paper, data dependencies are harder to untangle than code dependencies, because compilers and linkers can detect code dependencies through static analysis, whereas few equivalent tools exist for data dependencies.
- Correction cascade: This happens when an ML model fails to learn what developers want it to learn, so they apply a hotfix on top of the model’s output. Hotfixes accumulate into a thick layer of heuristics over the ML model, known as a correction cascade (see the sketch after this list). As that layer grows, it becomes harder to tell where the model ends and the surrounding patches begin.
- Configuration issues: As systems mature, they accumulate a large number of adjustable settings, such as which features are used, how data is selected, algorithm-specific learning parameters, verification methods, and so on.
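To make the correction-cascade pattern concrete, here is a minimal, hypothetical sketch (the function and rule names are illustrative, not from the paper): each “hotfix” wraps the model’s prediction in another heuristic, until it is hard to say where the model ends and the patches begin.

```python
# Hypothetical illustration of a correction cascade: each production incident
# adds another heuristic on top of the model instead of fixing the model itself.

def model_predict(features: dict) -> float:
    """Stand-in for the real model; returns a score between 0 and 1."""
    return 0.5  # placeholder score

def hotfix_holiday_traffic(score: float, features: dict) -> float:
    # Patch added after the model under-predicted demand on holidays.
    return min(1.0, score * 1.3) if features.get("is_holiday") else score

def hotfix_new_region(score: float, features: dict) -> float:
    # Patch added because the model was never trained on region "EU-WEST".
    return 0.7 if features.get("region") == "EU-WEST" else score

def predict_with_cascade(features: dict) -> float:
    # The "model" in production is now the model plus a growing stack of
    # heuristics; retraining the underlying model silently changes all of them.
    score = model_predict(features)
    score = hotfix_holiday_traffic(score, features)
    score = hotfix_new_region(score, features)
    return score

print(predict_with_cascade({"is_holiday": True, "region": "EU-WEST"}))
```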
The structure of the technical debt problem remains the same whether the model is simple or complex, and whether development happens in a small company or a large one. MLOps platforms are being used to standardize and manage the development, deployment, and operation of machine learning models in order to achieve robustness and grow adoption.
Uber shared the lessons learned after multiple years of operation. A couple of them are:
- Models need to be monitored: “model monitoring and instrumentation is a key component of real-world machine learning solutions”
- Data is the hardest thing to get right: “data engineers spend a considerable percentage of their time running extraction and transformation routines over datasets”
A recent presentation by Atindriyo Sanyal, Co-Founder of Galileo, at the DCAI Summit 2022 highlighted the following:
- Developers should evaluate model performance with a hybrid set of metrics: a blend of commonly used quality metrics such as accuracy and F1, which are critical checkpoints for any ML engineer, and metrics that gauge the operational health of the model, e.g., prediction latency (see the sketch after this list).
- Emphasis should be on data quality rather than data quantity; as data volumes grow, so does the risk of added bias, class confusion, and labeling cost.
- Paying attention to the limitations of the ML models enables you to determine your model monitoring parameters.
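As a rough illustration of the hybrid-metrics idea, the sketch below combines standard quality metrics with a simple operational health metric, prediction latency. It uses scikit-learn purely for convenience (the talk does not prescribe a library), and the gating thresholds are made-up values.

```python
# Minimal sketch: evaluate a model on quality metrics (accuracy, F1)
# alongside an operational health metric (per-row prediction latency).
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

start = time.perf_counter()
preds = model.predict(X_test)
latency_ms = (time.perf_counter() - start) / len(X_test) * 1000

report = {
    "accuracy": accuracy_score(y_test, preds),
    "f1": f1_score(y_test, preds),
    "latency_ms_per_row": latency_ms,
}
print(report)

# Illustrative gating thresholds; real values depend on the use case.
assert report["f1"] > 0.8 and report["latency_ms_per_row"] < 5.0
```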
Apart from these, Jacopo Tagliabue, Adjunct Professor of ML at NYU, believes that machine learning should be practiced at a reasonable scale. As an advocate of “reasonable scale ML,” he points out that, in contrast to the ML teams at Amazon, Google, or Uber, most ML engineers handle terabytes of data, work in teams of 10 or fewer people, operate on a restricted compute budget, and do not have unlimited cloud computing available. As a result, smaller companies cannot use the same strategies to incorporate machine learning into their business processes. Companies that do not serve billions of consumers must build machine learning models at a practical scale; in contrast to businesses like Google, reasonable-scale ML lets them address their most significant current concerns and generate quick ROI.
MLOps (ML Operations)
As every organization tries to build or buy its own Michelangelo, the way ML models are developed is changing. MLOps, or DevOps for ML, is the new framing, and it is growing in importance. A recent interview study of industry experts found that three workflow and infrastructure characteristics, Velocity, Validation, and Versioning, govern the success of ML models in production.
- Velocity: Because machine learning is an experimental discipline, it is crucial to be able to prototype and iterate on ideas quickly. That demands rigorous and constant testing, performance evaluation, and tweaking of the relevant variables to observe any noticeable changes. In the study, ML engineers attributed their productivity to development platforms that prioritized high experimentation velocity and good debugging environments, allowing them to test ideas quickly.
- Validation: Validation procedures are primarily user-defined error-detection routines that examine datasets for mistakes. Because failures cost more to fix once people notice them, it pays to test adjustments, eliminate bad ideas, and proactively check pipelines for vulnerabilities as early as possible.
- Versioning: Because it is nearly impossible to foresee every bug before it happens, it is important to keep backups and maintain multiple versions of production models and datasets. This helps with querying, debugging, and reducing production pipeline downtime; to fix a problematic model already in use, ML engineers would swap it out for a simpler, older, or retrained version (a minimal sketch of validation checks and version rollback follows this list).
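The sketch below is a simplified, generic illustration of the last two Vs (it is not taken from the study, and the column names and checks are hypothetical): a few user-defined validation checks over a dataset, plus a tiny model-version registry that allows rolling back to an older version when a deployed model misbehaves.

```python
# Minimal sketch of user-defined data validation and simple model versioning.
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Run user-defined error-detection checks; return a list of problems found."""
    problems = []
    if df["user_id"].isna().any():
        problems.append("user_id contains nulls")
    if not df["age"].between(0, 120).all():
        problems.append("age outside the expected 0-120 range")
    if df.duplicated(subset=["user_id", "event_time"]).any():
        problems.append("duplicate (user_id, event_time) rows")
    return problems

class ModelRegistry:
    """Keep every deployed model version so production can roll back quickly."""
    def __init__(self):
        self._versions = {}   # version -> model object
        self.current = None

    def register(self, version: str, model) -> None:
        self._versions[version] = model
        self.current = version

    def rollback(self, version: str) -> None:
        # Swap production back to a simpler, older, or retrained version.
        if version not in self._versions:
            raise KeyError(f"unknown model version: {version}")
        self.current = version

df = pd.DataFrame({"user_id": [1, 2, None], "age": [34, 150, 28],
                   "event_time": ["t1", "t2", "t3"]})
print(validate(df))  # flags the null user_id and the out-of-range age
```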
The 3Vs apply across the sub-areas of MLOps:
- DevOps for Models: Develop and deploy models (e.g., Domino Data)
- DevOps for Data: Preparing and monitoring data (e.g., Scribble Data’s Enrich Intelligence Platform)
For #1, DevOps for Models (e.g., Domino Data Lab’s workbench), capabilities include:
- Model development, including A/B testing
- Exploration of large datasets
- Automatic tracking for reproducibility, reusability, and collaboration
- Scalable compute and deployment management
- Reports, dashboards, and API for model output
- Data preparation
- Integration with major compute platforms such as Kubernetes and Spark
As for #2, DevOps for Data (e.g., Enrich), capabilities include (a rough illustrative sketch follows the list):
- Feature Pipelines for transforming your raw data into features or labels
- A Feature Store for storing historical features and label data
- A Feature Server for serving the latest feature values in production
- An SDK to develop, document, and test feature engineering modules, including transforms, pipelines, and scheduling
- An App Store that enables downstream consumption of datasets with low code decision-making products
- A Monitoring Engine for detecting data quality or drift issues and alerting
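To give a feel for how these pieces fit together, here is a deliberately generic sketch: a feature pipeline turns raw events into features, a store keeps historical snapshots and serves the latest values, and a crude monitor flags drift in a feature’s mean. All names are hypothetical, and this does not use or reflect Enrich’s actual SDK or API.

```python
# Generic, hypothetical sketch of a feature pipeline, feature store, and drift check.
# Illustrative only; not Scribble Data Enrich's real interface.
import statistics

def feature_pipeline(raw_events: list[dict]) -> dict:
    """Transform raw purchase events into per-user features."""
    amounts: dict = {}
    for event in raw_events:
        amounts.setdefault(event["user_id"], []).append(event["amount"])
    return {user: {"n_purchases": len(a), "avg_amount": sum(a) / len(a)}
            for user, a in amounts.items()}

class FeatureStore:
    """Keep historical feature snapshots and serve the latest values."""
    def __init__(self):
        self.history = []  # list of (snapshot_id, features)

    def write(self, snapshot_id: str, features: dict) -> None:
        self.history.append((snapshot_id, features))

    def latest(self) -> dict:
        return self.history[-1][1] if self.history else {}

def drift_alert(old: list[float], new: list[float], tol: float = 0.25) -> bool:
    """Very crude drift check: flag if the mean moved more than `tol` (relative)."""
    old_mean, new_mean = statistics.mean(old), statistics.mean(new)
    return abs(new_mean - old_mean) > tol * abs(old_mean)

store = FeatureStore()
store.write("2024-01-01", feature_pipeline([{"user_id": 1, "amount": 20.0},
                                            {"user_id": 1, "amount": 40.0}]))
print(store.latest())
print(drift_alert([20, 22, 19, 21], [35, 33, 36, 34]))  # True: the mean shifted
```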
Scribble Data’s Enrich platform supports MLOps by:
- Integrating with organizations’ data stack by connecting to the data storage and processing infrastructure, and allowing selection from a wide variety of options for decision-making or modeling. This enables the coupling of several data sources, which can result in the smooth transformation of unprocessed data into enriched, feature-engineered, and reliable data for in-depth analysis.
- Python SDK, which allows users to develop, document, and test feature engineering modules, including transforms, pipelines, and scheduling. This can allow users to generate reusable content, incorporate domain knowledge, optimize procedures, and roll out changes in a controlled and simple environment using a flexible and extensible interface. Moreover, it enables them to construct a workflow that connects various ML phases.
- A full-stack feature engineering platform that takes a concept to an operational data product in a few weeks. This is accomplished by structuring the input dataset into the format required by a particular model or machine learning method; the feature engineering functionality also takes care of replacing missing data with statistical estimates of the missing values (a small illustrative sketch follows this list). Even with limited resources, this lets an organization serve many business use cases while shortening time to value.
- APIs that allow easy cataloging of input data stores and surfacing of features at any frequency. APIs remove the infrastructure complexity and data silos that keep downstream users from accessing and using the data, and they make it possible to define data contracts and integration points clearly. Enrich helps businesses gain insight into many data sources so that executives can make informed decisions.
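As one concrete example of the feature engineering step mentioned above, replacing missing values with statistical estimates, the snippet below uses scikit-learn’s SimpleImputer. It is a generic illustration with made-up column names, not a description of how Enrich implements imputation.

```python
# Illustrative only: fill missing numeric values with a statistical estimate (the median).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

raw = pd.DataFrame({
    "monthly_spend": [120.0, np.nan, 80.0, 95.0],
    "sessions":      [10,    4,      np.nan, 7],
})

imputer = SimpleImputer(strategy="median")
clean = pd.DataFrame(imputer.fit_transform(raw), columns=raw.columns)
print(clean)  # NaNs replaced by each column's median (95.0 and 7.0)
```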
Conclusion
Since machine learning is experimental in nature, several models with different hyperparameters need to be tested before settling on the best one; even so, many ML models never reach production. This is where MLOps helps, by establishing robust processes for developing, deploying, and maintaining machine learning models. MLOps manages the production side of the ML lifecycle and provides a transparent communication channel, ensuring effective, easy cooperation between developers and organizations, reducing barriers, and speeding delivery into production.
Stay tuned for the next part of this article, where we do a deep dive into the MLOps journey.