Last updated: October 17, 2022.
In January 2020, both Microsoft and Google released what they referred to as ‘maturity’ models for MLOps. These outlined a series of levels that described the full spectrum of practices among organizations training and deploying machine learning models. A little later that same year, Google released an ‘AI Adoption Framework’ that tackled similar ground from a different angle. They offered their take on how to go from worst practices to best practices in terms of MLOps.
These frameworks are commonly referred to when thinking through how an organization is approaching machine learning workflows, so it’s worth being aware of what these models propose as well as whether there are other options for assessing your own practices.
Microsoft’s model is the most fine-grained in terms of how it distinguishes progress through the levels. It outlines five separate levels (starting at zero) that can be used to understand the extent of MLOps adoption and maturity in an organization.
Google’s level 0 corresponds to that of Microsoft, where everything is manual and there is no automation anywhere. Not only are processes script-driven and manual, but the teams involved in working no the various parts of developing or deploying a model are siloed from each other. After models are ‘deployed’, there is no way of checking how that model is doing or using some kind of feedback loop to catch errors or to roll back if needed.
At this point, the frameworks start to diverge somewhat. Microsoft’s level 1 (“DevOps, but no MLOps”) describes the beginnings of automation, where releases are still “painful” but builds and tests are automated and there is some limited feedback introduced. The next level is where the training process is automated — with the environment (data, infrastructure, code) versioned and reproducible — making releases of models manual but less painful than before. Some centralized tracking of model performance has also started be introduced at this point.
Level 3 focuses on the automation of model deployment, including A/B (and more complex) testing of models as well as full lineage and traceability of models. The overall environments (from training to test to production) are all fully managed and automated.
Level 4 is the final stage where everything is fully automated, allowing for easy monitoring and where improvements or errors are handled without the need for manual intervention.
Google’s assessment of ‘maturity’ in MLOps also relates to automation.
“The level of automation of these steps defines the maturity of the ML process, which reflects the velocity of training new models given new data or training new models given new implementations.” (source)
To Google, MLOps is therefore about how agile you can be when you’re going through your process. The real challenge, as they see it, “isn’t building an ML model, the challenge is building an integrated ML system and to continuously operate it in production”.
Unfortunately, the leap from the fully manual level to level 1 (“ML Pipeline Automation”) is a pretty sizeable one. This makes the Google framework harder to use as a diagnostic tool, since in many ways you either are doing MLOps or you aren’t; there isn’t much in the way of transitional steps between the two.
Level 1 focuses on the automation of the whole machine learning workflow and work cycle for a single pipeline. At this level, everything is completely automated, including the provision for retraining the entire pipeline when needed, full validation across data, code and models and so on.
Level 2 takes this one step further, abstracting another level higher and includes automation for many pipelines. It’s quite similar to level one, just with the ability to handle many models being trained, deployed and handled at the same time.
While it is useful to see how Google think about the most ‘mature’ implementation of MLOps, it doesn’t help much if you are seeking to reach that yourself. It doesn’t make sense to try to implement everything all at once, especially if you are a small organization.
A slightly different framing of how machine learning is treated within an organization can be found in the Google Cloud AI Adoption Framework white paper, a 37-page report published in June 2020. An organization with some tangential or singular uses for machine learning is said to be in the ‘tactical’ phase of adoption. At a certain point the number of these machine learning systems might increase to where there are several active and this is the ‘strategic’ phase. When machine learning models become a key part of the organization and their success, then they can be said to have reached the ‘transformational’ phase. Even though it is a bit more of a business-focused outline of how an organization goes from little machine learning to a lot, there is still considerable overlap between this and the maturity model mentioned above.
The frameworks explained above are useful diagnostic references, but ultimately they are more descriptive than prescriptive: it’s hard to know how to move through the levels if you’re starting from the beginning. As with many other software engineering best practices, or with DevOps, adoption needn’t simply be a boolean state. There are ways to incrementally work on improving how you iterate through the machine learning workflow from experimentation all the way to deployment (and responsively continue to iterate from there).
As one guide to the adoption of MLOps practices states, there may also be some value in starting your process by working on all the steps manually. You’ll likely learn a lot in the process about all the implicit assumptions that your process contains, just as taking a few hours to manually label your training data can be of tremendous benefit when it comes to understanding how to best approach training your model. (For an overview of some of the tools and options available to you as you navigate this landscape, checkout this useful post by Thinesh Sridar over on the NimbleBox blog.)
One option to thinking through your process might be to look at the MLOps Stack Template, a chart that encourages you to list the various technical decisions being made. The same author(s) have proposed an alternative version to the well-known ‘Business Model Canvas’ in the form of an ‘MLOps Stack Canvas’ which you can use as a framework to think through these incremental improvements.
One final suggestion would be to take a look at a much early iteration from Google, their ‘ML Test Score’ paper published in 2016/7. This is a far more practical take on how to approach ‘production readiness’ and the reduction of technical debt than the maturity frameworks. If you’re looking for ways to improve your machine learning systems, I’d recommend giving it a read over the maturity frameworks.
Even though we didn’t start building ZenML with these frameworks and levels in mind, we do tackle much of the same issues. We believe that to jump through the MLOps maturity levels needn’t be as painful as it currently is. Indeed, our goal is to help you move as quickly through towards a fully- or mostly-automated machine learning workflow. We see the best practices that guide you on this journey as following primarily from seeking to produce reproducible models and pipelines.
Our framework allows you to use the tools which you know and love, connecting them together in a way that covers the whole workflow from experimentation to deployment and onwards with continuous monitoring and training.
If you’re interested in learning more about ZenML, how we help you bring all your tools together and grow in MLOps maturity, visit our Github page, read our docs and test out our showcase examples. If you have questions or want to talk through your specific use case, feel free to reach out to us on Slack!