Leveling up Bytewax to v0.20!
🚀 Explore Advanced Visualization with python -m bytewax.visualize, Enhanced Operators, Improved Performance and more.
📣 We just released v0.20 with some awesome updates! It brings a Python interface for building custom windows, dataflow visualization, a caching enrichment operator, and more.
Please review the official migration guide and release notes to help you smoothly transition your code from Bytewax v0.19 to v0.20.
Here’s an overview of what’s new
Dataflow Structure Visualizer
Personally, the visualizer is one of my favorite additions in this release. You can now visualize your dataflow as a mermaid diagram by running:
python -m bytewax.visualize
This helps you understand how your dataflow works during development and debugging.
You could, for example, dump the mermaid diagram into something like Excalidraw to jumpstart some visualizations.
Changes
There are a couple of breaking changes in this version of Bytewax. Depending on how you are currently using the library, these may affect you.
Recovery Serialization Format
Breaking Change
The internal library used for serialization has changed from JsonPickle to Python's built-in pickle module. Recovery stores using the old format will be unusable after upgrading and should be recreated. We know this is a big change for those running production dataflows, but it should mean fewer future headaches around serialization and Python versions.
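Since recovery data now goes through pickle, it can be worth checking that the state your dataflow keeps survives a pickle round trip before you rebuild your recovery stores. The snippet below is a minimal sketch using only the standard library; the state dictionaries and the is_picklable helper are hypothetical names for illustration and are not part of Bytewax.

```python
import pickle
import threading
from datetime import datetime, timezone


def is_picklable(obj: object) -> bool:
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# Hypothetical per-key state built from plain data types.
good_state = {"count": 3, "last_seen": datetime(2024, 1, 1, tzinfo=timezone.utc)}

# State that holds a live resource, such as a lock, cannot be pickled.
bad_state = {"count": 3, "lock": threading.Lock()}

restored = pickle.loads(pickle.dumps(good_state))
print(restored == good_state)
print(is_picklable(bad_state))
```

If a check like this fails for your own state, one option is to keep only plain data in the state and recreate resources such as connections or locks when the dataflow resumes.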
Renaming of Core Operators
Breaking Change
The unary operator and UnaryLogic have been renamed to stateful and StatefulLogic.
Introduces a stateful_batch operator for lower-level batch control while managing state.
Windowing Operators and Configuration
Breaking Change
Windowing operators have moved from bytewax.operators.window to bytewax.operators.windowing.
ClockConfig classes are now simplified to just Clock. For instance, SystemClockConfig is now SystemClock.
WindowConfigs are renamed to Windowers, such as SessionWindow to SessionWindower.
Windowing operators now return a set of streams encapsulated in a WindowOut dataclass, with WindowMetadata and late-arriving data output into their own streams.
Fold Window Merges
Breaking Change
fold_window now requires a merge argument to handle session window merges.
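To see why a merge function is needed, picture two session windows for the same key that turn out to be one session once a late item bridges the gap between them: their two accumulators have to be combined into one. The snippet below is a plain-Python sketch of that idea, assuming the accumulator is a list. The names build_acc, fold_item and merge_accs are hypothetical, and the snippet does not call Bytewax itself; check the API docs for the exact fold_window signature.

```python
from typing import List


def build_acc() -> List[int]:
    """Create an empty accumulator for a new window."""
    return []


def fold_item(acc: List[int], item: int) -> List[int]:
    """Add one item to a window's accumulator."""
    acc.append(item)
    return acc


def merge_accs(left: List[int], right: List[int]) -> List[int]:
    """Combine the accumulators of two windows that became one session."""
    left.extend(right)
    return left


# Two sessions are folded separately at first...
session_a = fold_item(fold_item(build_acc(), 1), 2)
session_b = fold_item(build_acc(), 3)

# ...and are then merged when they turn out to belong together.
merged = merge_accs(session_a, session_b)
print(merged)
```

For accumulators like counts or sums, the merge function would simply add the two values together.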
Join Operators Update
Breaking Change
The join_named and join_window_named operators have been removed to improve compatibility with typed dataflows.
New Additions
Optional Overrides in StatefulLogic
StatefulLogic.on_notify, StatefulLogic.on_eof, and StatefulLogic.notify_at are now optional overrides, retaining state and emitting nothing by default.
Custom Clocks and Windowers
Python interfaces are now available for custom clocks and windowers. Subclass Clock and ClockLogic or Windower and WindowerLogic to define custom time and window definitions.
New Operators
New filter_map_value operator
enrich_cached operator for easier external data source joining
key_rm operator to remove keys from a KeyedStream
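As a rough illustration of what filter_map_value is for, here is a plain-Python sketch of a mapper that both transforms a value and decides whether to keep it, assuming the operator drops items whose mapper returns None. The parse_reading function and the sample data are hypothetical, and the loop is only a stand-in for the operator, not Bytewax code.

```python
from typing import List, Optional, Tuple


def parse_reading(raw: str) -> Optional[float]:
    """Map a raw string to a float, or return None to drop the item."""
    try:
        return float(raw)
    except ValueError:
        return None


keyed_items = [("sensor-1", "21.5"), ("sensor-1", "n/a"), ("sensor-2", "19")]

# Stand-in for the operator: apply the mapper to each value and keep
# the key alongside every result that is not None.
kept: List[Tuple[str, float]] = []
for key, value in keyed_items:
    mapped = parse_reading(value)
    if mapped is not None:
        kept.append((key, mapped))

print(kept)
```

Doing the filter and the map in one function avoids parsing each value twice, once to test it and once to convert it.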
Performance and Functionality Enhancements
Session windows now correctly handle out-of-order data and joins.
Windowing operators process items in timestamp order, improving output consistency.
Simplified operator interfaces for better performance and usability.
Documentation and Guides
Documentation cleanups
New async connector guide
Performance guide addition
Updated deployment documentation
Fixed links and typos for better navigation
Community Contributions
A special thanks to Csaba Hoch for his invaluable contributions in keeping our documentation accurate and up to date.
Conclusion 🐝
The v0.20 release of Bytewax represents a significant upgrade, introducing powerful new features and improvements. From the dataflow visualizer to enriched operators and critical performance enhancements, this version offers robust tools for your streaming data processing needs.
As always, the Bytewax community looks forward to your feedback and contributions to further refine and expand this versatile data processing framework.
We encourage you to open an issue with any feedback or bugs you might have in the GitHub repo.
Cheers to seamless dataflows!