Some of these hands-on exercises expect that you have first set up a local Kafka instance; see the exercises on Writing to and Reading from Kafka.

Connecting streaming programs through Kafka

Apache Kafka is a central component in many data stream infrastructures. Kafka is a distributed publish-subscribe system for data streams based on the concept of durable logs. A stream is called a topic; it can be populated by multiple producers and read by multiple consumers. Topics are persisted to hard disks and can be replayed.
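
To make the log idea concrete, here is a toy model in plain shell. It is not Kafka and uses no Kafka tooling: a temporary file stands in for a topic, and the helper names `produce` and `consume` are made up for this sketch. It only illustrates that records are appended at the end and that any consumer can replay the log from an offset of its choosing.

```shell
#!/bin/sh
# Toy model of a topic: an append-only file (an illustration, not Kafka).
topic=$(mktemp)
trap 'rm -f "$topic"' EXIT   # clean up the temporary file when the shell exits

# produce: append one record to the end of the log.
produce() { printf '%s\n' "$1" >> "$topic"; }

# consume: print every record starting at a 0-based offset.
consume() { tail -n "+$(( $1 + 1 ))" "$topic"; }

# Several producers can append to the same topic.
produce "event-1"
produce "event-2"
produce "event-3"

# Consumers read independently; reading does not remove records.
consume 0   # replays all three records
consume 2   # starts at offset 2 and prints only event-3
```

Because consuming never deletes anything, the same records can be read again later, which is the replay property described above.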

Set up a local Apache Kafka instance

The following instructions show how to set up a local Kafka instance in a few steps.

  • Download Apache Kafka for Scala 2.11 here.

  • Extract the archive file and enter the extracted folder:

tar xvfz kafka_2.11-
cd kafka_2.11-
  • Start an Apache ZooKeeper instance (Kafka uses ZooKeeper for distributed coordination) on localhost:2181:
./bin/ config/ &
  • Start a Kafka instance on localhost:9092:
./bin/ config/ &
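
Since both processes are started in the background, it can be useful to confirm that they are actually accepting connections before moving on. The following is a minimal sketch, assuming the `nc` (netcat) utility is installed; the helper names `port_open` and `check` are made up for this example, and the ports are the ones named in the steps above.

```shell
#!/bin/sh
# port_open HOST PORT: succeeds if something accepts TCP connections there.
# Assumption: the `nc` (netcat) utility is available on this machine.
port_open() {
  nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# check NAME PORT: report whether a service is reachable on localhost.
check() {
  if port_open localhost "$2"; then
    echo "$1 is listening on localhost:$2"
  else
    echo "$1 is NOT reachable on localhost:$2"
  fi
}

check ZooKeeper 2181
check Kafka 9092
```

A listening port only shows that the process is up; it does not prove that the broker is fully initialized, so give it a few seconds after startup.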

Deleting Kafka topics

Note that Kafka persists topics (i.e., data streams) to /tmp/kafka-logs by default. Should you need to, topics can be removed (or cleared) by shutting Kafka down and deleting this directory. You can stop Kafka and ZooKeeper by calling the ./bin/ and ./bin/ scripts (in that order!).
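
The deletion step can be sketched as a small shell function. This is a sketch under assumptions: the function name `clear_kafka_data` is made up for this example, and the default path assumes Kafka's stock configuration, which stores data in /tmp/kafka-logs. Pass a different path if your broker is configured otherwise, and run it only after Kafka has been stopped.

```shell
#!/bin/sh
# clear_kafka_data [DIR]: delete Kafka's data directory if it exists.
# DIR defaults to /tmp/kafka-logs (assumed default; adjust to your setup).
# Run this only after Kafka has been shut down.
clear_kafka_data() {
  dir="${1:-/tmp/kafka-logs}"
  if [ -d "$dir" ]; then
    rm -rf -- "$dir"
    echo "removed $dir"
  else
    echo "nothing to remove at $dir"
  fi
}
```

Calling `clear_kafka_data` without an argument targets the default directory. This removes all topics at once; the data cannot be recovered afterwards.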