Read text files from HDFS, write them to Parquet files with Spark Java, and query them with Spark SQL.

demanejar/parquet-io-spark

 
 

spark-base

Read text files from HDFS and write them to Parquet files

Push the 5 .dat files in the sample_text folder to HDFS under the path /usr/trannguyenhan (you can use a different path, but then you must change the path in the code as well):

  hdfs dfs -mkdir /usr
  hdfs dfs -mkdir /usr/trannguyenhan
  hdfs dfs -copyFromLocal <file_push> <path>

Then go to the project folder, open a terminal, and run:

  mvn clean package

This builds the jar file, which is placed in the target folder. Run the jar file with spark-submit:

  spark-submit --class main.Main --master local[2] target/<file>-V1.jar

A folder named pageviewlog is then created in HDFS. Open HDFS in a browser (for example Chrome) to see it.
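For orientation, the flow described above (read text from HDFS, write Parquet, query with Spark SQL) might look roughly like the sketch below. It is not the project's actual main.Main: the class name ParquetSketch, the helper convertAndCount, the output location under /usr/trannguyenhan, and the single-column layout (one string column named value per input line) are all assumptions made for illustration. It needs the Spark SQL library on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch; the real project code may parse the lines into typed columns.
public class ParquetSketch {

    // Reads text files, writes them as Parquet, then counts the rows with Spark SQL.
    static long convertAndCount(SparkSession spark, String input, String output) {
        // Each input line becomes one row with a single string column named "value".
        Dataset<Row> lines = spark.read().text(input);

        // Overwrite so the job can be re-run without deleting the folder first.
        lines.write().mode(SaveMode.Overwrite).parquet(output);

        // Read the Parquet folder back and expose it to Spark SQL as a temporary view.
        Dataset<Row> logs = spark.read().parquet(output);
        logs.createOrReplaceTempView("pageviewlog");

        return spark.sql("SELECT COUNT(*) AS line_count FROM pageviewlog")
                .first()
                .getLong(0);
    }

    public static void main(String[] args) {
        // The master URL is expected to come from spark-submit (--master local[2]).
        SparkSession spark = SparkSession.builder()
                .appName("parquet-sketch")
                .getOrCreate();

        // Assumed paths: the input matches the upload step, the output location is a guess.
        String input = "/usr/trannguyenhan/*.dat";
        String output = "/usr/trannguyenhan/pageviewlog";

        long count = convertAndCount(spark, input, output);
        System.out.println("Rows written to Parquet: " + count);

        spark.stop();
    }
}
```

If you uploaded the files to a different HDFS path, the two path strings are the only values that would need to change in this sketch.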
