Last updated: May 26, 2023
The 0.40.0 release introduces a completely reworked interface for developing your ZenML steps and pipelines.
In our continuous efforts to simplify and enhance your experience with ZenML,
we’re thrilled to roll out a significant update that relates to our pipeline and
step definition syntax. This substantial modification, the culmination of user
feedback and internal testing, is designed to make working with ZenML
much more natural, intuitive, and enjoyable.
At the core of this change is the desire to make working with two of ZenML’s core building blocks, pipelines and steps, more flexible. We’ve overhauled the syntax with the primary aim of getting out of your way and letting you focus on what really matters: building efficient, reproducible, and robust machine learning pipelines.
We believe these improvements will make a considerable difference in your ZenML experience. Let’s dive into the new features that you will love using! First things first: new Pipeline goodies:
External artifacts can be used to pass values to steps that are not produced by an upstream step. This common use case provides more flexibility when working with external data or models:
from sklearn import svm

from zenml import pipeline
from zenml.steps.external_artifact import ExternalArtifact

@pipeline
def my_pipeline(lr: float):
    data = process_data()
    # Wrap the external model so it can be passed into the step as an artifact
    trainer(data=data, start_model=ExternalArtifact(svm.SVC(...)))
Instead of having to define an initial dataloader step, you can now just use these ExternalArtifact objects directly within your pipeline definition.
Pipelines now support input parameters, making it easier to pass values to your steps. You can use the step directly in the pipeline function and pass pipeline parameters or raw input parameters:
@pipeline
def my_pipeline(lr: float):
    data = process_data()
    trainer(data=data, lr=lr, gamma=0.0002)
This allows you to configure (and run with) flexible hyperparameters. These input parameters become even more powerful when combined with another new feature we’ll cover below: pipelines within pipelines!
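To make the hyperparameter side of this concrete, here is a minimal sketch of kicking off two runs with different learning rates, assuming the process_data and trainer steps from the snippet above are defined:
if __name__ == "__main__":
    # Each call triggers a separate pipeline run with its own learning rate
    my_pipeline(lr=0.01)
    my_pipeline(lr=0.001)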
Step parameters have also been improved: in previous versions, you had to define a separate class for step parameters using BaseParameters. This is no longer necessary, although it is still supported for backward compatibility. You can now pass parameters directly in the step function:
@step
def trainer(data: pd.DataFrame, lr: float = 0.1, gamma: Optional[float] = 0.02) -> ...:
    print(lr)
    print(gamma)
You can now call pipelines within other pipelines. This does not execute the inner pipeline but instead adds its steps to the parent pipeline, allowing you to create modular and reusable workflows:
@pipeline(enable_cache=False)
def my_pipeline(a: int = 1):
    p1_output = subpipeline(pipeline_param=22)
    step_2(a=a, b=p1_output)
We’ve heard from lots of users that they’d like to have this feature, and it combines neatly with the ability to pass input parameters to your pipeline. As always, you can definitely go overboard with the layers of abstraction used, but at least now you have the power to tackle some of those more complicated workflows.
Pipelines can now define inputs and outputs, providing a clearer interface for working with data and dependencies between pipelines:
@pipeline(enable_cache=False)
def subpipeline(pipeline_param: int):
    out = step_1(k=None)
    step_2(a=3, b=pipeline_param)
    return 17
This would be useful, for example, when running an embedded pipeline that needed to pass some value to either a step or another pipeline. Really the sky’s the limit with these new flexible features!
You can now call steps multiple times inside a pipeline, allowing you to create more complex workflows and reuse steps with different parameters:
@pipeline
def my_pipeline(step_count: int) -> None:
    data = load_data_step()
    after = []
    for i in range(step_count):
        train_step(data, learning_rate=i * 0.0001, name=f"train_step_{i}")
        after.append(f"train_step_{i}")
    model = select_model_step(..., after=after)
This was also a much-requested feature from our users and community members that the new release now unlocks.
You’ll not only want to configure context-specific hyperparameters for your pipelines, but infrastructure-specific configuration is also important. We have a new way to do that: .with_options(). When creating a pipeline, you should now use the .with_options() method to configure it:
if __name__ == "__main__":
    pipeline_copy = my_pipeline.with_options(
        enable_cache=False,
    )
    pipeline_copy()
We added some quality-of-life improvements to how you can work with pipelines and steps:
You no longer need to create a pipeline instance and then run it separately. You can now pass parameters directly at pipeline instance creation and execute the pipeline in a single step:
my_pipeline(lr=0.000001)
This not only makes ZenML a little more Pythonic, it also makes it easier to use because you can run your pipelines and steps just as you would imagine they’d work. To that end, you can now also call steps directly outside of a pipeline, making it easier to test and debug your code:
trainer(data=pd.DataFrame(...),
        start_model=svm.SVC(...))
Note that this just runs the function, so if you want your artifacts tracked and your code and runs versioned (i.e. all the benefits that ZenML brings), you’ll want to run these steps as part of a pipeline.
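For example, a plain unit test that exercises the trainer step from the snippets above directly could look roughly like this (the import path for trainer is just a placeholder for wherever your step lives):
import pandas as pd
from sklearn import svm

from my_project.steps import trainer  # hypothetical import path for your step

def test_trainer():
    # Calling the step outside a pipeline just executes the underlying
    # function, so it behaves like any other Python callable.
    data = pd.DataFrame({"feature": [0.0, 1.0, 2.0], "target": [0, 1, 1]})
    model = trainer(data=data, start_model=svm.SVC())
    assert model is not None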
We thought about how to make working with pipelines and steps cleaner and easier, so here are two other small improvements:
We have made the imports cleaner by removing the need to import BaseParameters and step separately. Now, you can simply import step and pipeline from zenml:
from zenml import step, pipeline
Steps can now have Optional, Union, and Any type annotations for their inputs and outputs. This allows you to pass different types of values at runtime, choose not to pass a value at all, or pass None. You can also return any type and specify a materializer for it, or use the default cloudpickle materializer:
@step
def trainer(data: pd.DataFrame, start_model: Union[svm.SVC, svm.SVR], coef0: Optional[int] = None) -> Any:
    ...  # your code goes here
Additionally, default values are allowed for step inputs.
The new interface is backwards compatible, so you don’t need to worry about it breaking your existing code. However, we recommend switching to the new way of doing things for a more streamlined experience, and we do consider the old way deprecated; it will be removed in a future release.
To migrate, simply update your imports, remove the BaseParameters class, pass parameters directly in the step function, and update your pipeline definition and execution as shown in the examples above.
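To make that concrete, here is a rough before-and-after sketch for a simple training step; the first snippet assumes the pre-0.40 BaseParameters pattern:
# Before (pre-0.40): parameters bundled into a BaseParameters subclass
from zenml.steps import BaseParameters, step

class TrainerParams(BaseParameters):
    lr: float = 0.1
    gamma: float = 0.02

@step
def trainer(params: TrainerParams) -> None:
    print(params.lr, params.gamma)

# After (0.40.0 and later): parameters are plain function arguments
from zenml import step

@step
def trainer(lr: float = 0.1, gamma: float = 0.02) -> None:
    print(lr, gamma)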
To get started, simply import the new @step and @pipeline decorators and check out our new starter guide for more information.
from zenml import step, pipeline

@step
def my_step(...):
    ...

@pipeline
def my_pipeline(...):
    ...
We hope you enjoy the improvements in ZenML 0.40.0 and find it easier to create and manage your pipelines. As always, we welcome your feedback and suggestions for future updates.
If you run into any issues or want to discuss a specific use case, please reach out to us on Slack.