Last updated: February 24, 2023
In today’s information age, we are bombarded with a constant stream of news and media from a variety of sources. Summarizing tasks, particularly when it comes to news sources, can be a powerful tool for the efficient consumption of information. They distill complex or lengthy content into easily digestible chunks that can be scanned and absorbed quickly, allowing us to keep up with the news without being overwhelmed. They can also help us separate the signal from the noise, highlighting the most important details and helping us identify what’s worth further investigation.
This is where ZenNews comes into play. It offers a tool that uses ZenML to automate the summarization process and save users time and effort while providing them with the information they need. This can be particularly valuable for busy professionals or anyone who wants to keep up with the news but doesn’t have the time to read every article in full.
Apart from the advantages of solving a summarization task itself, this project aims to showcase some key benefits of using ZenML.
ZenML features a simple and clean Python SDK. In this project, we leverage it to define our steps and pipelines and to access/manage the resources and artifacts that we interact with along the way. This project shows how this such a design can significantly simplify the process of building robust applications.
ZenML is an extensible framework. We realize that ML projects often require custom-tailored solutions that deviate from off-the-shelf offerings. This is why we employed base abstractions that empower users to craft their solutions without needlessly reinventing the wheel. Take a look, for instance, at the custom materializer and the custom stack component showcased in this project to see how effortlessly one can implement custom solutions.
ZenML separates your code from your stack. In other words, it offers a distinct separation between the code and the underlying infrastructure. As you explore this example, you’ll notice how this separation can allow you to switch effortlessly between a local default stack and a remote deployment with scheduled pipelines, all with the simple flip of a flag.
ZenML can help you to scale up. While this PoC-like example serves as evidence of ZenML’s potential to streamline workflows and hasten the development process, it merely scratches the surface of its capabilities. To delve deeper into the extensive possibilities that ZenML has to offer, we encourage you to check out our docs.
The ZenNews project is published as a
PyPI package
that you can install through pip
:
pip install zennews
It includes a main pipeline called zen_news_pipeline
with three steps:
collect
, summarize
, and report
. In this version, the only collect
step
implementation is the bbc_news_source
that collects articles from the BBC
news feed, whereas the only summarize
step implementation uses the
bart_large_cnn_samsum
model to generate summaries, and the only report
step
creates a report and share the results using an alerter. Additionally, the
package includes a custom stack component called DiscordAlerter
.
Lastly, the package also includes a CLI application named zennews
, which
serves as the primary interface for interacting with the pipeline and its steps.
zennews --help
Once you have installed the zennews
package, you can immediately test it
locally. By running the following command, you will retrieve the top five
articles from the BBC news feed, summarize them, and display the results:
zennews bbc
As an output, you should see:
______ _ _
|___ / | \ | |
/ / ___ _ __ | \| | _____ _____
/ / / _ \ '_ \| . ` |/ _ \ \ /\ / / __|
/ /_| __/ | | | |\ | __/\ V V /\__ \
/_____\___|_| |_|_| \_|\___| \_/\_/ |___/
This is where you get the news.
This will change your active ZenML stack to the default stack and run the
pipeline locally. This also means that the pipeline will download and utilize
the model locally.
Would you like to continue? [y/N]: y
----- BUILDING AND RUNNING THE PIPELINE -----
Initializing the ZenML global configuration version to 0.32.1
Creating default project 'default' ...
Creating default user 'default' ...
Creating default stack for user 'default' in project default...
Setting the global active project to 'default'.
Setting the global active stack to default.
Using the default store for the global config.
Registered new pipeline with name zen_news_pipeline.
Running pipeline zen_news_pipeline on stack default (caching disabled)
Step collect has started.
Step collect has finished in 1.048s.
Step summarize has started.
Step summarize has finished in 48.476s.
Step report has started.
Step report has finished in 0.056s.
Pipeline run test_bbc02_24_2023_10_17_36 has finished in 50.220s.
Pipeline visualization can be seen in the ZenML Dashboard. Run zenml up to
see your pipeline!
----- PIPELINE RESULTS -----
╔═════════════════════════════════════════════════════════════════════════════╗
║ From BBC generated at 02/24/2023 10:18:52 ║
╚═════════════════════════════════════════════════════════════════════════════╝
[top_stories] Vuhledar, in eastern Ukraine, was once a prosperous mining town.
But now it's a wasteland - one of many on Ukraine's 1,300 kilometer front line.
Some of the fiercest fighting of recent months has been here. The town sits on
high ground in the heavily contested Donbas region. Link
[top_stories] Harvey Weinstein was sentenced to 16 years in prison for rape.
He was convicted of attacking an actress in a hotel room during a film festival.
The 70-year-old is already serving a 23-year prison sentence for a separate
conviction in New York. Link
[top_stories] Moldova has warned for weeks that Russia is plotting to seize
power. Officials rejected Russia's claims as "psy-ops" as part of the war.
President Maia Sandu spoke of unprecedented security challenges ahead. Link
[top_stories] Thomas H Lee, 78, found dead at his Manhattan office on Thursday
morning. His family say they are "extremely saddened" by his death. Mr Lee
helped pioneer the debt-fuelled corporate acquisition known as a
leveraged buyout. Link
[top_stories] The capsule is scheduled to dock at the ISS on Sunday. It is
not expected to bring home the three astronauts until September. The
original return vehicle was damaged by a tiny meteoroid. Link
You can also parameterize this process. In order to see the possible parameters, please use:
zennews bbc --help
To fully utilize the potential of an application like zennews
, it’s
recommended to schedule the summarization pipelines instead of manually
triggering them. This is possible by using --schedule
if you have a ZenML
stack which features an orchestrator that supports scheduling.
zennews bbc --schedule daily
If you would like to see how you can set up such a stack, you can visit the GitHub page which contains a much more substantial technical summary around implementation and how you can reproduce it on your local setup/system.
If you have any questions or feedback about this implementation of a news summarization tool and pipeline, let us know on Slack or join our weekly community meeting. If you want to know more about ZenML or see more examples, check out our docs, examples or our other projects.