Apache Flink® is an open-source platform for scalable stream and batch data processing. It offers expressive APIs for defining batch and streaming dataflow programs, and a robust, scalable engine for executing these jobs.

The Apache Flink community maintains a self-paced training course that contains a set of lessons and hands-on exercises. These training materials were originally developed by Ververica, and were donated to the Apache Flink project in May 2020.

This step-by-step introduction to Flink focuses on learning how to use the DataStream API to meet the needs of common, real-world use cases, including parallel ETL pipelines, streaming analytics, and event-driven applications. This training course includes:

  • An introduction to Flink, laying out Flink’s vision of a unified engine for batch and stream processing based on parallel streaming dataflows, stateful event-time processing, and state snapshots.
  • An introduction to the DataStream API, covering the basics of how applications are put together, along with a high-level description of the runtime.
  • A careful look at the APIs for working with Flink’s state backends, which, together with checkpoints and savepoints, deliver high-throughput, low-latency access to state that is fault tolerant and rescalable.
  • Detailed explanations of the role of event-time processing and watermarks in implementing consistent, accurate streaming analytics.
  • Exercises and examples that illustrate how to implement data enrichment, time windows, process functions, side outputs, timers, and more.
