The task of the “Taxi Ride Cleansing” exercise is to cleanse a stream of TaxiRide events by removing events that do not start or end in New York City.
GeoUtils utility class provides a static method
isInNYC(float lon, float lat) to check if a location is within the NYC area.
This series of exercises is based a stream of taxi ride events. The Taxi Data Stream instructions show how to setup the
TaxiRideSource which generates a stream of
The result of the exercise should be a
DataStream<TaxiRide> that only contains events of taxi rides which both start and end in the New York City area as defined by
The resulting stream should be printed to standard out.
Rather than following the links in this section, you'll do better to find these classes in the flink-training-exercises project in your IDE. Both IntelliJ and Eclipse have ways to make it easy to
search for and navigate to classes and files. For IntelliJ, see the help on searching, or simply press the Shift key twice and then continue typing something like
RideCleansing and then select from the choices that popup.
This exercise uses these classes:
- Java: com.dataartisans.flinktraining.exercises.datastream_java.basics.RideCleansingExercise
- Scala: com.dataartisans.flinktraining.exercises.datastream_scala.basics.RideCleansingExercise
You will find the test for this exercise in
Like most of these exercises, at some point the
RideCleansingExercise class throws an exception
throw new MissingSolutionException();
Once you remove this line the test will fail until you provide a working solution. You might want to first try something clearly broken, such as
in order to verify that the test does indeed fail when you make a mistake, and then work on implementing a proper solution.
DataStream.filter(FilterFunction)transformation to filter events from a data stream. The
GeoUtils.isInNYC()function can be called within a
FilterFunctionto check if a location is in the New York City area. Your filter function should check both the starting and ending locations of each ride.
Reference solutions are available at GitHub and in the training exercises project: