NiFi: Ingesting real-time data (Optional)

Introduction

In this exercise we will read data in real time (e.g. from Twitter) and store it in HDFS:

Twitter --> NiFi --> HDFS

Mandatory:

  • Read from a real-time data source
  • Apply some internal processors (e.g. filtering, split, conversion, etc.)
  • Store in HDFS
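
In NiFi itself these three steps are normally built as a chain of processors configured in the UI (e.g. GetTwitter or InvokeHTTP for the source, EvaluateJsonPath/SplitJson for filtering and conversion, and PutHDFS for storage). Purely as a reference for what the flow has to do, here is a minimal Python sketch of the same read/transform/store logic; the namenode URL, HDFS path and record fields are placeholder assumptions, not part of the exercise.

```python
# Minimal sketch of the read -> transform -> store steps outside NiFi.
# Assumptions: WebHDFS is reachable at NAMENODE_URL and the `hdfs` package
# (pip install hdfs) is installed; all names below are placeholders.
import json

from hdfs import InsecureClient

NAMENODE_URL = "http://namenode:9870"   # placeholder WebHDFS address
HDFS_DIR = "/user/nifi/tweets"          # placeholder target directory

client = InsecureClient(NAMENODE_URL, user="nifi")


def transform(record: dict) -> dict:
    """Keep only a few fields (the 'filtering/conversion' step)."""
    return {
        "id": record.get("id_str"),
        "created_at": record.get("created_at"),
        "tweet_text": record.get("text"),
        "lang": record.get("lang"),
    }


def store(records, filename):
    """Write one JSON document per line to a file in HDFS (the 'PutHDFS' step)."""
    payload = "\n".join(json.dumps(transform(r)) for r in records) + "\n"
    client.write(f"{HDFS_DIR}/{filename}", data=payload, encoding="utf-8")


# `records` would come from the real-time source, e.g. the Twitter streaming API.
```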

Optional:

  • Make it available in Hive
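
For the optional Hive step, a common approach is an external table that points at the HDFS directory the flow writes to. Below is a minimal sketch using PyHive; the HiveServer2 host, table and column names, HDFS location and the JSON SerDe class are assumptions that depend on your cluster setup.

```python
# Minimal sketch for exposing the ingested JSON files to Hive.
# Host/port, table and column names, the HDFS location and the SerDe class
# are placeholders/assumptions; requires `pip install pyhive`.
from pyhive import hive

DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS tweets (
  id STRING,
  created_at STRING,
  tweet_text STRING,
  lang STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/nifi/tweets'
"""

conn = hive.connect(host="hiveserver2", port=10000, username="nifi")  # placeholders
cursor = conn.cursor()
cursor.execute(DDL.strip())

# Quick check that the data is queryable from Hive.
cursor.execute("SELECT lang, COUNT(*) FROM tweets GROUP BY lang")
print(cursor.fetchall())
```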

Tip:

IMPORTANT NOTE (regarding Twitter account creation): Be aware that in some cases the creation of a Twitter Developer account can take a few days.

Alternative real-time data sources

If you don't want to use Twitter, feel free to use any other real-time data source.

Alternatively, you can read from a source of data that changes over time (e.g. stock quotes or weather) and schedule the ingestion every x seconds/minutes.
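
In NiFi this kind of scheduling is just the processor's Run Schedule (e.g. on an InvokeHTTP processor). As a rough illustration of the idea, here is a minimal polling sketch; the endpoint URL and interval are placeholders for whichever public stock/weather source you choose.

```python
# Minimal polling sketch for a source that changes over time.
# SOURCE_URL and POLL_SECONDS are placeholders; in NiFi the equivalent is
# the Run Schedule of the source processor (e.g. InvokeHTTP).
import json
import time

import requests

SOURCE_URL = "https://example.com/api/quotes"   # placeholder endpoint
POLL_SECONDS = 60                               # placeholder interval

while True:
    response = requests.get(SOURCE_URL, timeout=10)
    response.raise_for_status()
    snapshot = response.json()
    # Each snapshot would then be filtered/converted and stored in HDFS,
    # e.g. with the hdfs client shown in the sketch above.
    print(json.dumps(snapshot)[:200])
    time.sleep(POLL_SECONDS)
```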

Alternative non real-time data source

If you don't want to use Twitter and you don't find any real-time data source, feel free to create any other, more or less complex, workflow.

Deliverables

This is what you will have to deliver:

  • Screenshot of the workflow
  • Short explanation of what you did (data source, transformations, etc.)

Resources