Apache Beam is an open source, unified model for defining and executing data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runtime-specific Runners for executing them.
In January 2016, Google and a number of partners submitted the Dataflow Programming Model and SDKs portion as an Apache Incubator Proposal, under the name Apache Beam (unified Batch + strEAM processing).
We're currently working hard to get the Beam site up and running over the next couple weeks, but in the meantime you can learn more about the Beam Model, though still under the original name of Dataflow, in the World Beyond Batch: Streaming 101 and Streaming 102 posts on O’Reilly’s Radar site, and the
Apache Beam (Batch + strEAM) is a model and set of APIs for doing both batch and streaming data processing. It was open-sourced by Google (with Cloudera and PayPal) in 2016 as an Apache Incubator project.
Beam tries to take all that a step further with a model that makes it easy to describe the various aspects of out-of-order processing, which is often an issue when combining batch and streaming processing, as described in the Programming Model Comparison.
In particular, to quote from the comparison, the Dataflow model is designed to address, elegantly and in a way that is more modular, more robust, and easier to maintain:
... the four critical questions all data processing practitioners must attempt to answer when building their pipelines:
What results are calculated? Sums, joins, histograms, machine learning models?
Where in event time are results calculated? Does the time each event originally occurred affect results? Are results aggregated in fixed windows, sessions, or a single global window?
When in processing time are results materialized? Does the time each event is observed within the system affect results? When are results emitted? Speculatively, as data evolve? When data arrive late and results must be revised? Some combination of these?
How do refinements of results relate? If additional data arrive and results chang
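
To make the four questions a bit more concrete, here is a rough plain-Python sketch. It does not use the Beam SDK, and every name in it (`WINDOW_SIZE`, `window_start`, `process`) is made up for illustration. It assumes events are `(event_time, value)` pairs that arrive out of order, sums them per fixed event-time window, emits a result on every arrival, and lets each new result replace the previous one for that window.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical fixed-window width


def window_start(event_time, size=WINDOW_SIZE):
    """Where: map an event timestamp to the start of its fixed window."""
    return event_time - (event_time % size)


def process(events, size=WINDOW_SIZE):
    """Consume (event_time, value) pairs in arrival order.

    Returns every (window_start, running_sum) result emitted along the way.
    """
    sums = defaultdict(int)  # What: a per-window sum
    emitted = []
    for event_time, value in events:
        start = window_start(event_time, size)
        # How: accumulate, so each result refines the previous one.
        sums[start] += value
        # When: emit a (speculative) result on every arrival.
        emitted.append((start, sums[start]))
    return emitted


# Arrival order differs from event-time order: the event at t=30 shows up last.
arrivals = [(10, 1), (70, 5), (65, 2), (30, 4)]
panes = process(arrivals)

# Keep only the latest result per window.
final = {}
for start, total in panes:
    final[start] = total

print(panes)  # [(0, 1), (60, 5), (60, 7), (0, 5)]
print(final)  # {0: 5, 60: 7}
```

The late event at `t=30` still lands in the window starting at 0 because grouping is by event time, not arrival time. A real pipeline would express these choices through the SDK's windowing and triggering facilities rather than a hand-written loop.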
Asked in February 2016. Viewed 2,737 times. Voted 9. Answered 2 times.