Things You Need To Know: Apache Storm Topology
Streams
The core data structure in Storm is the tuple. A tuple is an ordered list of values, where each value is assigned a name, and a stream is an unbounded sequence of tuples. If you are familiar with complex event processing (CEP), you can think of Storm tuples as events.
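As a rough illustration (a minimal sketch assuming the org.apache.storm.tuple classes from Storm's Java API), a stream's schema is declared once as a list of field names, and each tuple is a list of values matched to those names by position:

```java
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class TupleExample {
    public static void main(String[] args) {
        // A stream's schema: the named fields every tuple on it carries.
        Fields schema = new Fields("event_type", "timestamp");

        // One tuple conforming to that schema: a list of values that line up
        // with the declared field names by position.
        Values tuple = new Values("click", System.currentTimeMillis());

        System.out.println(schema.toList() + " -> " + tuple);
    }
}
```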
Spouts
Spouts represent the main entry point of data into a Storm topology. Spouts act as adapters that connect to a source of data, transform the data into tuples, and emit the tuples as a stream.
Storm provides an API for implementing spouts. Developing a spout is largely a matter of writing the code necessary to consume data from a raw source or API. Potential data sources include:
Click streams from a web-based or mobile application
Twitter or other social network feeds
Sensor output
Application log events
Since spouts typically don’t implement any specific business logic, they can often be reused across multiple topologies.
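Below is a minimal sketch of such a spout, assuming Storm's Java API (the org.apache.storm package names used in Storm 2.x). The LogLineSpout class name and the hard-coded log lines are illustrative stand-ins for a real source such as a message queue or API client.

```java
import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

// Hypothetical spout that emits application log lines as one-field tuples.
// A real spout would consume from a queue, socket, or API instead of this array.
public class LogLineSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final String[] lines = {
        "INFO user=42 action=login",
        "WARN user=17 action=timeout",
        "INFO user=42 action=click"
    };
    private int index = 0;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        // Called once per task; keep the collector for emitting tuples later.
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Storm polls this method repeatedly; emit one tuple per call.
        collector.emit(new Values(lines[index]));
        index = (index + 1) % lines.length;
        Utils.sleep(100); // throttle the demo source
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the schema of the stream this spout emits.
        declarer.declare(new Fields("line"));
    }
}
```

Because all of the source-specific work lives in open() and nextTuple(), swapping the data source rarely affects the rest of the topology, which is what makes spouts easy to reuse.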
Bolts
Bolts can be thought of as the operators of your computation (similar to the operators in a system like DataTorrent). They take one or more streams as input, process the data, and optionally emit one or more new streams. Bolts may subscribe to streams emitted by spouts or by other bolts, making it possible to create a complex network of stream transformations, i.e., a directed acyclic graph (DAG).
Bolts can perform virtually any kind of processing, and the bolt API, like the spout API, is simple to implement. Typical functions performed by bolts include (see the sketch after this list):
Filtering tuples
Joins and aggregations
Calculations
HDFS reads and writes
Database reads and writes
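As an example of the first item above (filtering), here is a minimal bolt sketch under the same Storm 2.x API assumptions; WarnFilterBolt and the field names line/warn_line are hypothetical and simply match the spout sketch shown earlier.

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Hypothetical filtering bolt: passes along only WARN-level log lines
// received on the "line" field declared by the upstream spout.
public class WarnFilterBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String line = input.getStringByField("line");
        if (line.startsWith("WARN")) {
            // Anchor the emitted tuple to the input for reliability tracking.
            collector.emit(input, new Values(line));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("warn_line"));
    }
}
```

Wiring the two together is done with TopologyBuilder, e.g. builder.setSpout("log-spout", new LogLineSpout()) followed by builder.setBolt("warn-filter", new WarnFilterBolt()).shuffleGrouping("log-spout"); these subscriptions are what define the DAG of the topology.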
*This content was inspired by: Storm Blueprints: Patterns for Distributed Real-time Computation