This repo contains a word count program that writes its output to a file.
- Clone this repo.
- Uncomment line 14 when running locally. This line is commented out by default so that the job can use the Docker master.
//.master("local") //uncomment this line when running on local
- Build the project by running:
gradle clean build
- Run the spark-submit command as:
spark-submit --master local[4] --verbose --class com.pavanpkulkarni.dockerwordcount.DockerWordCount build/libs/Docker_WordCount_Spark-1.0.jar <input_filename> <output_directory>
For example:
spark-submit --master local[4] --verbose --class com.pavanpkulkarni.dockerwordcount.DockerWordCount build/libs/Docker_WordCount_Spark-1.0.jar "data.txt" "output"
Output will be available under output/part-00000-xxxxx
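
For orientation, here is a minimal sketch of where the commented-out .master("local") line from the steps above might sit in a Spark word count job. It is not the code in this repo: it assumes the project is written in Scala, and the object name DockerWordCountSketch, the argument handling, and the CSV output format are all hypothetical choices made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch, not the repo's actual DockerWordCount class.
object DockerWordCountSketch {
  def main(args: Array[String]): Unit = {
    // Assumed argument order: <input_filename> <output_directory>
    require(args.length >= 2, "usage: <input_filename> <output_directory>")
    val inputFile = args(0)
    val outputDir = args(1)

    val spark = SparkSession.builder()
      .appName("DockerWordCountSketch")
      //.master("local") // uncomment this line when running on local
      .getOrCreate()

    import spark.implicits._

    // Split each line on whitespace, drop empty tokens, count each word.
    val counts = spark.read.textFile(inputFile)
      .flatMap(line => line.split("\\s+"))
      .filter(word => word.nonEmpty)
      .groupByKey(identity)
      .count()

    // Assumption: a single CSV part file is written under the output directory.
    counts.coalesce(1).write.csv(outputDir)

    spark.stop()
  }
}
```

When the master is left commented out in the code, it has to be supplied from outside, as the --master local[4] option does in the spark-submit command above.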