These instructions walk you through setting up a development environment for developing, debugging, and executing solutions to the training exercises and examples on this site.
1. Software requirements
Flink supports Linux, OS X, and Windows as development environments for Flink programs and local execution. The following software is required for a Flink development setup and should be installed on your system:
- Java JDK 8 only (a JRE is not sufficient, and newer versions of Java will not work)
- Apache Maven 3.x
- an IDE for Java (and/or Scala) development. We recommend IntelliJ, but Eclipse and Visual Studio Code can be used so long as you stick to Java. For Scala you will need to use IntelliJ (and its Scala plugin).
Note that older and newer versions of Java are not supported. Only Java 8 will work; not Java 7, or 9 (or newer).
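As a quick sanity check of these requirements, the following shell sketch inspects whatever java, javac, and mvn are found on your PATH. The helper name check_java_banner is made up for this illustration, and the match on version "1.8. assumes the usual banner format printed by Java 8 JDKs; adjust it if your distribution prints something different.

```shell
# Classify a "java -version" banner line: prints "ok" for Java 8,
# "unsupported" for anything else. (Hypothetical helper for illustration.)
check_java_banner() {
  case "$1" in
    *'version "1.8.'*) echo ok ;;
    *) echo unsupported ;;
  esac
}

# java -version writes its banner to stderr, so redirect it before filtering.
if command -v java >/dev/null 2>&1; then
  banner=$(java -version 2>&1 | grep 'version "' | head -n 1)
  echo "java: $(check_java_banner "$banner") [$banner]"
else
  echo "java: not found on PATH"
fi

# A JDK ships javac; a plain JRE does not.
if command -v javac >/dev/null 2>&1; then
  echo "javac: found (JDK present)"
else
  echo "javac: not found (JRE only, or no Java at all)"
fi

# Show the first line of Maven's version report.
if command -v mvn >/dev/null 2>&1; then
  mvn -version | head -n 1
else
  echo "mvn: not found on PATH"
fi
```

If the java line reports "unsupported", check which installation comes first on your PATH before building the exercises.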
Note for Windows users: Many of the shell command examples in the training instructions are written for UNIX systems. To make things easier, you may find it worthwhile to set up Cygwin or WSL, but you can also use the provided .bat scripts with plain cmd. For developing Flink jobs, Windows works reasonably well: you can run a Flink cluster on a single machine, submit jobs, run the web UI, and execute jobs in the IDE.
2. Clone and build the flink-training-exercises project
The flink-training-exercises project contains exercises, tests, and reference solutions for the programming exercises, as well as an extensive collection of examples. Clone the flink-training-exercises project from GitHub and build it.
For Java, use the java branch:
git clone --branch java https://github.com/dataArtisans/flink-training-exercises.git
cd flink-training-exercises
mvn clean package
For Scala, use the master branch:
git clone https://github.com/dataArtisans/flink-training-exercises.git
cd flink-training-exercises
mvn clean package
If you haven’t done this before, at this point you’ll end up downloading all of the dependencies for this Flink training exercises project. This usually takes a few minutes, depending on the speed of your internet connection.
If all of the tests pass and the build is successful, you are off to a good start.
If you need to use a Maven repository mirror, you can configure one in your Maven settings file (~/.m2/settings.xml). If you don't already have any customized Maven settings, you can use this:
<settings>
  <mirrors>
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>*</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>
3. Import the flink-training-exercises project into your IDE
The project needs to be imported as a Maven project into your IDE.
Once that’s done you should be able to open com.dataartisans.flinktraining.exercises.datastream_java.basics.RideCleansingTest and successfully run this test.
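If you would like to confirm the same test from the command line as well, one option is Maven's -Dtest property, which the Surefire plugin uses to select test classes by name. This is only a sketch: the helper name run_single_test is hypothetical, it assumes the project uses the standard Surefire test setup, and if more than one class in the build is named RideCleansingTest, all of them will run.

```shell
# Hypothetical helper: run one test class through Maven instead of the
# whole suite. Call it from the root directory of the cloned project.
run_single_test() {
  mvn test -Dtest="$1"
}
```

For example, from inside the flink-training-exercises directory you could call run_single_test RideCleansingTest.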
Note for Scala users: For Scala you will need to use IntelliJ with the JetBrains Scala plugin, and you will need to add a Scala 2.12 SDK to the Global Libraries section of the Project Structure.
4. Download the data sets
You will also need to download the taxi data files used in this training by running the following commands:
wget http://training.ververica.com/trainingData/nycTaxiRides.gz
wget http://training.ververica.com/trainingData/nycTaxiFares.gz
It doesn’t matter if you use wget or something else (like curl, or Chrome) to download these files, but however you get the data, do not decompress or rename the .gz files. Some browsers will do the wrong thing by default.
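Because a browser may silently decompress a download, it can be worth confirming that both files are still gzip archives. The sketch below relies on gzip -t, which tests an archive without extracting it. The helper name check_gz is made up for this example, and it assumes the files are in the current directory.

```shell
# Report whether a file is present and still a valid gzip archive.
# (Hypothetical helper for illustration.)
check_gz() {
  if [ ! -f "$1" ]; then
    echo "missing: $1"
  elif gzip -t "$1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "not gzip: $1"
  fi
}

# Check both taxi data files in the current directory.
for f in nycTaxiRides.gz nycTaxiFares.gz; do
  check_gz "$f"
done
```

A "not gzip" result usually means the file was decompressed during download and should be fetched again.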
To learn more about this data, see Using the Taxi Data Streams.
Note: There’s a hardwired path to these data files in the exercises. Before trying to execute them, read How to do the Labs.
If you also want to set up a local cluster for executing Flink jobs outside the IDE, see Setting up a Local Flink Cluster.
If you want to use the SQL client, see Setting up the SQL Client.