Move over Kubeflow, there’s a new sheriff in town: Github Actions 🤠 | ZenML Blog

Last updated: April 3, 2023.

Note: This example does not work for any ZenML versions > 0.36.1.

We’re really proud of our Kubeflow integration. It gives you a ton of power and flexibility and is a production-ready tool. But we also know that for many of you it’s one step too many. Setting up a Kubernetes cluster is probably nobody’s ideal way to spend their time, and it certainly requires some time investment to maintain.

We thought this was a concern worth addressing, so I worked on an alternative during the ZenHack Day we recently ran. GitHub Actions is a platform that allows you to execute arbitrary software development workflows right in your GitHub repository. It is most commonly used for CI/CD pipelines, but with the new GitHub Actions orchestrator, ZenML now enables you to easily run and schedule your machine learning pipelines as GitHub Actions workflows.

GitHub Actions: best in class for what?

Most technical decisions come with various kinds of tradeoffs, and it’s worth taking a moment to assess why you might want to use the GitHub Actions orchestrator in the first place.

Let’s start with the downsides:

  1. GitHub-hosted runners come with fixed, fairly modest hardware and no GPU support, so heavy training workloads are off the table.
  2. Your code and pipeline runs have to live in a GitHub repository, and every run is subject to GitHub’s workflow limits (such as the maximum job runtime).
  3. You have far less control over the execution environment than with an orchestrator like Kubeflow running on a cluster you manage yourself.

So what’s the point, then? These are indeed some serious downsides. First and foremost, there’s the cost: running your pipelines on GitHub Actions is free. If you’re interested in running your pipelines in the cloud on serverless infrastructure, there’s probably no easier way to get started than to try out this orchestrator.

You are also spared the pain of maintaining a Kubernetes cluster. Once you’ve configured the orchestrator (see below for instructions), there’s basically nothing you have to do on an ongoing basis. I hope you’re sold on trying it out and want to get started, so let’s not hold off any longer.

(Note that some of the commands in this tutorial rely on environment variables or a specific working directory from previous commands, so be sure to run them all in the same shell.) In this tutorial we’re going to use Microsoft’s Azure platform for cloud storage and our MySQL database, but it works just as well on AWS or GCP.

Prerequisites

This tutorial assumes that you have:

  - a GitHub account
  - Python and pip installed locally
  - Docker installed and running on your machine
  - git set up to push to your GitHub repositories

Azure Setup

Create an account

If you don’t have an Azure account yet, go to https://azure.microsoft.com/en-gb/free/ and create one.

Create a resource group

Resource groups are an Azure concept that lets us bundle resources which share a similar lifecycle. We’ll create a new resource group for this tutorial so we can tell its resources apart from everything else in our account and easily delete them all at the end.

Go to the Azure portal and click the hamburger button in the top left to open up the portal menu. Then, hover over the Resource groups section until a popup appears and click on the + Create button:

Resource group step 1

Select a region and enter a name for your resource group before clicking on Review + create:

Resource group step 2

Verify that all the information is correct and click on Create:

Resource group step 3

Create a storage account

An Azure storage account is a grouping of Azure data storage objects which also provides a namespace and authentication options to access them. We’ll need a storage account to hold the blob storage container we’ll create in the next step.

Open up the portal menu again, but this time hover over the Storage accounts section and click on the + Create button in the popup once it appears: Storage account step 1

Select your previously created resource group, a region and a globally unique name and then click on Review + create:

Storage account step 2

Make sure that all the values are correct and click on Create:

Storage account step 3

Wait until the deployment is finished and click on Go to resource to open up your newly created storage account:

Storage account step 4

In the left menu, select Access keys:

Storage account step 5

Click on Show keys, and once the keys are visible, note down the storage account name and the value of the Key field of either key1 or key2. We’re going to use them for the <STORAGE_ACCOUNT_NAME> and <STORAGE_ACCOUNT_KEY> placeholders later.

Storage account step 6

Create an Azure Blob Storage Container

Next, we’re going to create an Azure Blob Storage Container. It will be used by ZenML to store the output artifacts of all our pipeline steps. To do so, select Containers in the Data storage section of the storage account:

Blob storage container step 1

Then click the + Container button on the top to create a new container:

Blob storage container step 2

Choose a name for the container and note it down. We’re going to use it later for the <BLOB_STORAGE_CONTAINER_NAME> placeholder. Then create the container by clicking the Create button.

Blob storage container step 3
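
As an optional alternative to clicking through the portal, the whole Azure setup above can also be scripted with the Azure CLI. This is just a sketch, assuming you have the CLI installed and are logged in via az login; the resource group name and region below are examples:

# Create a resource group to bundle everything from this tutorial
az group create --name zenml-github-actions --location westeurope

# Create a storage account inside it (the name must be globally unique)
az storage account create --name <STORAGE_ACCOUNT_NAME> \
    --resource-group zenml-github-actions --sku Standard_LRS

# Print one of the account keys (the value for <STORAGE_ACCOUNT_KEY>)
az storage account keys list --account-name <STORAGE_ACCOUNT_NAME> \
    --resource-group zenml-github-actions --query "[0].value" --output tsv

# Create the blob storage container that will hold the pipeline artifacts
az storage container create --name <BLOB_STORAGE_CONTAINER_NAME> \
    --account-name <STORAGE_ACCOUNT_NAME> --account-key <STORAGE_ACCOUNT_KEY>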

GitHub Setup

Create a GitHub Personal Access Token

Next up, we’ll need to create a GitHub Personal Access Token that ZenML will use to authenticate with the GitHub API in order to store secrets and upload Docker images.

  1. Go to https://github.com, click on your profile image in the top right corner and select Settings:

    PAT step 1

  2. Scroll to the bottom and click on Developer Settings on the left side:

    PAT step 2

  3. Select Personal access tokens and click on Generate new token:

    PAT step 3

    PAT step 4

  4. Give your token a descriptive name for future reference and select the repo and write:packages scopes:

    PAT step 5

  5. Scroll to the bottom and click on Generate token. This will bring you to a page that allows you to copy your newly generated token:

    PAT step 6

Now that we’ve got our token, let’s store it in an environment variable for future steps, along with the GitHub username it was created for. Replace the <PLACEHOLDERS> in the following command and run it:

export GITHUB_USERNAME=<GITHUB_USERNAME>
export GITHUB_AUTHENTICATION_TOKEN=<PERSONAL_ACCESS_TOKEN>

Login to the Container registry

When we run our pipeline later, ZenML will build a Docker image for us which will be used to execute the steps of the pipeline. To make this image accessible inside the GitHub Actions workflow, we’ll push it to the GitHub container registry. The following command uses the personal access token created in the previous step to authenticate our local Docker client with this container registry:

echo "$GITHUB_AUTHENTICATION_TOKEN" | docker login ghcr.io -u "$GITHUB_USERNAME" --password-stdin

Note: If you run into issues during this step, make sure you’ve set the environment variables in the previous step and Docker is running on your machine.

Fork and clone the tutorial repository

Time to fork and clone an example repository that contains a very simple ZenML pipeline training a scikit-learn SVC classifier on the digits dataset.

If you’re new to ZenML, here are the basic concepts you need to understand what the code in this repository is doing: a step is a single, self-contained unit of work (e.g., loading data or training a model), a pipeline connects steps and defines the order in which they run, and a stack is the set of infrastructure components (orchestrator, artifact store, container registry, secrets manager) that a pipeline runs on.
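
To make this concrete, here’s a minimal sketch of what such a pipeline looks like in the pre-0.40 ZenML API this tutorial targets. The step and pipeline names are illustrative, not the exact code from the tutorial repository:

# A minimal sketch of a ZenML pipeline (pre-0.40 API); names are illustrative.
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.datasets import load_digits
from sklearn.svm import SVC

from zenml.pipelines import pipeline
from zenml.steps import Output, step


@step
def importer() -> Output(X=np.ndarray, y=np.ndarray):
    """Load the digits dataset as flat feature vectors plus labels."""
    digits = load_digits()
    return digits.data, digits.target


@step
def trainer(X: np.ndarray, y: np.ndarray) -> ClassifierMixin:
    """Train a support vector classifier on the digits data."""
    model = SVC(gamma=0.001)
    model.fit(X, y)
    return model


@pipeline
def digits_pipeline(importer, trainer):
    X, y = importer()
    trainer(X=X, y=y)


if __name__ == "__main__":
    # Runs on whatever stack is currently active, e.g. our GitHub Actions stack
    digits_pipeline(importer=importer(), trainer=trainer()).run()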

Let’s get going:

  1. Go to https://github.com/zenml-io/github-actions-orchestrator-tutorial
  2. Click on Fork in the top right:

    Fork step 1

  3. Click on Create fork:

    Fork step 2

  4. Clone the repository to your local machine:
     git clone git@github.com:"$GITHUB_USERNAME"/github-actions-orchestrator-tutorial.git
     # or `git clone https://github.com/"$GITHUB_USERNAME"/github-actions-orchestrator-tutorial.git` if you want to authenticate with HTTPS instead of SSH
     cd github-actions-orchestrator-tutorial
    

ZenML Setup

Now that we’re done setting up and configuring all our infrastructure and external dependencies, it’s time to install ZenML and configure a ZenML stack that connects all these elements together.

Remote ZenML Server

For advanced use cases, e.g. when we have a remote orchestrator (such as the GitHub Actions orchestrator in this tutorial, or Vertex AI) or want to share stacks and pipeline information with a team, we need a separate, non-local ZenML server that is accessible from your machine as well as from all stack components that may need access to it. You can read more about this use case here.

There are two ways to get access to a remote ZenML server:

  1. Deploy and manage the server manually in your own cloud.
  2. Sign up for ZenML Enterprise and get access to a hosted version of the ZenML server with no setup required.

Installation

Let’s install ZenML and all the additional packages that we’re going to need to run our pipeline:

pip install zenml
zenml integration install -y github azure sklearn

We’re also going to initialize a ZenML repository to indicate which directories and files ZenML should include when building Docker images:

zenml init

Connect to ZenML Server

Once the deployment is finished, let’s connect to it by running the following command and logging in with the username and password you set during the deployment phase:

zenml connect --url=<DEPLOYMENT_URL>

Registering the stack

A ZenML stack consists of multiple components that all play a role in making your ML pipeline run in a smooth and reproducible manner. Let’s register all the components that we’re going to need for this tutorial; a sketch of the registration commands follows below.
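
For reference, the registration commands look roughly like the following. The component names match the stack registration below and the flavor names come from the integrations we installed earlier, but the exact flags (in particular for the secrets manager and container registry) vary between ZenML versions, so treat this as a sketch and check zenml <component-type> register --help for your version:

# GitHub Actions as the orchestrator that executes our pipeline steps
zenml orchestrator register github_orchestrator --flavor=github

# GitHub repository secrets as the secrets manager (owner/repository flags assumed)
zenml secrets-manager register github_secrets_manager --flavor=github \
    --owner="$GITHUB_USERNAME" --repository=github-actions-orchestrator-tutorial

# The GitHub container registry we logged in to earlier
zenml container-registry register github_container_registry --flavor=github \
    --uri=ghcr.io/"$GITHUB_USERNAME"

# The Azure blob storage container as the artifact store, authenticated via
# the secret we register in the next step
zenml artifact-store register azure_artifact_store --flavor=azure \
    --path=az://<BLOB_STORAGE_CONTAINER_NAME> \
    --authentication_secret=azure_store_auth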

These are all the components that we’re going to use for this tutorial, but ZenML offers additional components like experiment trackers, step operators, model deployers, and alerters.

With all components registered, we can now create and activate our ZenML stack. This makes sure ZenML knows which components to use when we’re going to run our pipeline later.

zenml stack register github_actions_stack \
    -o github_orchestrator \
    -x github_secrets_manager \
    -c github_container_registry \
    -a azure_artifact_store \
    --set
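
To double-check the wiring before running anything, you can inspect the active stack; this should list the four components registered above:

zenml stack describe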

Registering the secrets

Once the stack is active, we can register the secret that ZenML needs to authenticate with our artifact store. We’re going to need the storage account name and key that we saved when we created our storage account earlier. Replace the <PLACEHOLDERS> in the following command with those concrete values and run it:

zenml secrets-manager secret register azure_store_auth \
    --schema=azure \
    --account_name=<STORAGE_ACCOUNT_NAME> \
    --account_key=<STORAGE_ACCOUNT_KEY>
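
If you want to confirm the secret was stored, listing the registered secrets should now show azure_store_auth (this lists only names, not values):

zenml secrets-manager secret list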

Run the pipeline

That was quite a lot of setup, but luckily we’re (almost) done now. Let’s execute the Python script that “runs” our pipeline and quickly discuss what it is doing:

python run.py

This script runs a ZenML pipeline using our active GitHub stack. The orchestrator first builds a Docker image with our pipeline code and all the requirements installed and pushes it to the GitHub container registry. Once the image is pushed, the orchestrator writes a GitHub Actions workflow file to the .github/workflows directory. Pushing this workflow file triggers the actual execution of our ZenML pipeline. We’ll explain later how to automate this step, but for our first pipeline run there is one last configuration step: we need to make sure our GitHub Actions workflow is allowed to pull the Docker image that ZenML just pushed.

  1. Wait until the Python script has finished running so the Docker image is pushed to GitHub.
  2. Head to https://github.com/users/<GITHUB_USERNAME>/packages/container/package/zenml-github-actions (replace <GITHUB_USERNAME> with your GitHub username) and select Package settings on the right side:

    Package permissions step 1

  3. In the Manage Actions access section, click on Add Repository:

    Package permissions step 2

  4. Search for your forked repository github-actions-orchestrator-tutorial and give it read permissions. Your package settings should then look like this:

    Package permissions step 3

Done! Now all that’s left to do is commit and push the workflow file:

git add .github/workflows
git commit -m "Add ZenML pipeline workflow"
git push

If we now check the GitHub Actions for our repository at https://github.com/<GITHUB_USERNAME>/github-actions-orchestrator-tutorial/actions, we should see our pipeline running! 🎉

Running pipeline

Finished pipeline

Automate the committing and pushing

If we want the orchestrator to automatically commit and push the workflow file for us, we can enable it with the following command:

zenml orchestrator update github_orchestrator --push=true

After this update, calling python run.py should automatically build and push a Docker image and commit and push the workflow file, which will in turn run our pipeline on GitHub Actions.

Delete Azure Resources

Once we’re done experimenting, let’s delete all the resources we created on Azure so we don’t waste any compute or money. Since we bundled everything in one resource group, this step is very easy. Go to the Azure portal and select your resource group in the list of resources:

Cleanup step 1

Next click on Delete resource group on the top:

Cleanup step 2

In the popup on the right side, type the resource group name and click Delete:

Cleanup step 3

This will take a few minutes, but after it’s finished all the resources we created should be gone.
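
If you went the Azure CLI route earlier, the same cleanup is a single command (using the example group name from that sketch):

# Delete the resource group and everything inside it, skipping the confirmation prompt
az group delete --name zenml-github-actions --yes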

Where to go from here?

If you have any questions or feedback regarding this tutorial, let us know here or join our weekly community hour. If you want to know more about ZenML or see more examples, check out our docs and examples or join our Slack.

[Image Credit: Photo by Roman Synkevych on Unsplash]
