Hot Spot Analysis Of Large Scale Spatio-temporal Data Using Spark

Input: A collection of New York City Yellow Cab taxi trip records for January 2015. The source data may be clipped to an envelope encompassing the five New York City boroughs (e.g., latitude 40.5N – 40.9N, longitude 73.7W – 74.25W) to remove noisy or erroneous records.
Output: A list of the fifty most significant hot spot cells in time and space as identified using the Getis-Ord statistic.
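The clipping and cell-assignment step can be sketched in plain Java as below. This is only an illustration, not the repository's code: the class `CellMapper`, the 0.01 degree cell size, the one-day time step, and the `"x,y,t"` key format are all assumptions introduced here. In a Spark job, the same logic would typically sit inside a filter followed by a pair-producing map and a reduce by key.

```java
import java.util.HashMap;
import java.util.Map;

class CellMapper {
    // Envelope from the problem statement; west longitudes are negative.
    static final double MIN_LAT = 40.5, MAX_LAT = 40.9;
    static final double MIN_LON = -74.25, MAX_LON = -73.7;
    // Assumed cell size in degrees (not stated in the source).
    static final double CELL_DEG = 0.01;

    /** True if the point lies inside the clipping envelope. */
    static boolean inEnvelope(double lat, double lon) {
        return lat >= MIN_LAT && lat <= MAX_LAT && lon >= MIN_LON && lon <= MAX_LON;
    }

    /** Builds a hypothetical "x,y,t" key; t is the zero-based day of the month. */
    static String cellKey(double lat, double lon, int dayOfMonth) {
        int x = (int) Math.floor((lon - MIN_LON) / CELL_DEG);
        int y = (int) Math.floor((lat - MIN_LAT) / CELL_DEG);
        return x + "," + y + "," + (dayOfMonth - 1);
    }

    /** Counts trips per cell; each trip is {latitude, longitude, dayOfMonth}. */
    static Map<String, Integer> countTrips(double[][] trips) {
        Map<String, Integer> counts = new HashMap<>();
        for (double[] t : trips) {
            if (!inEnvelope(t[0], t[1])) {
                continue; // drop records outside the envelope
            }
            counts.merge(cellKey(t[0], t[1], (int) t[2]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] trips = {
            {40.755, -73.985, 1},
            {40.757, -73.983, 1},
            {41.2, -73.985, 1} // outside the envelope, ignored
        };
        System.out.println(countTrips(trips));
    }
}
```

Offsetting by the envelope minimum keeps the cell indices non-negative, which makes them convenient to use as array indices later.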

Methodology:

  • JavaPairRDDs are formed from the input data

[Figure: input to JP]

  • Calculate Getis-Ord statistic from above formed RDDs

[Figure: z-stat calculation]
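For reference, the Getis-Ord Gi* z-score of cell i is (Σ w_ij x_j − X̄ Σ w_ij) / (S · sqrt((n Σ w_ij² − (Σ w_ij)²) / (n − 1))), where x_j is the trip count of cell j, X̄ and S are the mean and standard deviation of all counts, and n is the number of cells. The sketch below evaluates it on a small dense grid in plain Java. It assumes binary weights over the 3×3×3 space-time neighbourhood (the cell itself included), which the source does not state, and the class `GetisOrd` is hypothetical rather than taken from the repository.

```java
class GetisOrd {
    /** Returns Gi* z-scores for every cell of a dense [x][y][t] grid of trip counts. */
    static double[][][] zScores(int[][][] counts) {
        int nx = counts.length, ny = counts[0].length, nt = counts[0][0].length;
        long n = (long) nx * ny * nt;

        // Global mean and standard deviation over all cells.
        double sum = 0, sumSq = 0;
        for (int[][] plane : counts)
            for (int[] row : plane)
                for (int c : row) {
                    sum += c;
                    sumSq += (double) c * c;
                }
        double mean = sum / n;
        double s = Math.sqrt(sumSq / n - mean * mean);

        double[][][] z = new double[nx][ny][nt];
        for (int x = 0; x < nx; x++)
            for (int y = 0; y < ny; y++)
                for (int t = 0; t < nt; t++) {
                    double weightedSum = 0; // sum of w_ij * x_j with w_ij = 1
                    int w = 0;              // sum of w_ij; equals sum of w_ij^2 for binary weights
                    for (int dx = -1; dx <= 1; dx++)
                        for (int dy = -1; dy <= 1; dy++)
                            for (int dt = -1; dt <= 1; dt++) {
                                int i = x + dx, j = y + dy, k = t + dt;
                                if (i < 0 || i >= nx || j < 0 || j >= ny || k < 0 || k >= nt) {
                                    continue; // neighbour falls outside the grid
                                }
                                weightedSum += counts[i][j][k];
                                w++;
                            }
                    double denom = s * Math.sqrt((n * (double) w - (double) w * w) / (n - 1));
                    // A zero or undefined denominator (e.g. uniform counts) yields 0 here.
                    z[x][y][t] = denom > 0 ? (weightedSum - mean * w) / denom : 0.0;
                }
        return z;
    }

    public static void main(String[] args) {
        int[][][] counts = new int[5][5][5];
        counts[2][2][2] = 100; // a single busy cell
        double[][][] z = zScores(counts);
        System.out.println("center z = " + z[2][2][2] + ", corner z = " + z[0][0][0]);
    }
}
```

Sorting the cells by z-score in descending order and keeping the first fifty would then give the output list described above.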

Results:

The results of the analysis are available in Results.

Heat map of the results: Heat Map