Source code for the data manipulation part of my HBase project
The source code for data generation is located in https://github.com/Linshuanting/HbaseDataGenerate
Contributors:
sShaAanGg
Linshuanting
- Centos7.9
- Java-1.8.0_202
- Hadoop-3.2.1
- Zookeeper-3.6.3
- Hbase-2.3.7
- maven-3.8.5
Java
org.apache.hadoop.hbase (essential for HBase client API)
org.apache.poi (for the excel format input file)
Please run the commands below at the root directory (HBaseDMsource)
export CLASSPATH=$HBASE_HOME/lib/*:$HADOOP_HOME/lib/native/*:$HADOOP_HOME/share/hadoop/client/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*
javac -d target/ -cp target:$CLASSPATH src/*.java
javac -d target -cp target:$CLASSPATH src/com/tools/*.java
javac -d target -cp target/com/tools/:$CLASSPATH:target/ src/com/data/*.java
cd target && jar -cfe DataGet.jar com/data/DataGet com/data/*.class com/tools/*.class && cd ..
java -cp $CLASSPATH:target/ Processor
java -cp $CLASSPATH:target/DataGet.jar com.data.DataGet
GetData has no main() function. The functions in GetData.* are called by Processor.
There are currently 2 tables. PutData1.java and PutData2.java put data into table1 and table2, respectively. Processor calls getData() from GetData1 and GetData2. It first fetches data from table1 using the row keys of COVID-19 patients (their phone numbers), which yields a Map<Integer, Long> that maps each place code (location) they visited to the corresponding timestamp. With that map, we can then fetch data from table2 to determine who must be quarantined.
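The two-step lookup described above can be sketched with plain Java maps standing in for table1 and table2. This is only an illustration of the flow, not the code in Processor or GetData: the class name `ContactTraceSketch`, the helper `record`, and the `windowMillis` time window are assumptions made for the example, and each (phone, place) pair keeps a single timestamp here, whereas the real tables keep up to 100 versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for the two-step lookup; no HBase calls are made here.
class ContactTraceSketch {

    // Step 1 (table1 stand-in): phone number -> (place code -> visit timestamp).
    static Map<Integer, Long> visitsOf(Map<String, Map<Integer, Long>> table1, String phone) {
        return table1.getOrDefault(phone, new HashMap<>());
    }

    // Step 2 (table2 stand-in): place code -> (phone number -> visit timestamp).
    // Collects every other phone seen at a place the patient visited, within
    // windowMillis of the patient's visit. The window is an assumption.
    static Set<String> contactsOf(Map<String, Map<Integer, Long>> table1,
                                  Map<Integer, Map<String, Long>> table2,
                                  String patientPhone, long windowMillis) {
        Set<String> contacts = new TreeSet<>();
        for (Map.Entry<Integer, Long> visit : visitsOf(table1, patientPhone).entrySet()) {
            Map<String, Long> visitors = table2.getOrDefault(visit.getKey(), new HashMap<>());
            for (Map.Entry<String, Long> other : visitors.entrySet()) {
                boolean samePerson = other.getKey().equals(patientPhone);
                boolean closeInTime = Math.abs(other.getValue() - visit.getValue()) <= windowMillis;
                if (!samePerson && closeInTime) {
                    contacts.add(other.getKey());
                }
            }
        }
        return contacts;
    }

    // Records one visit in both stand-in tables (hypothetical helper).
    static void record(Map<String, Map<Integer, Long>> table1,
                       Map<Integer, Map<String, Long>> table2,
                       String phone, int placeCode, long timestamp) {
        table1.computeIfAbsent(phone, k -> new HashMap<>()).put(placeCode, timestamp);
        table2.computeIfAbsent(placeCode, k -> new HashMap<>()).put(phone, timestamp);
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Long>> table1 = new HashMap<>();
        Map<Integer, Map<String, Long>> table2 = new HashMap<>();
        record(table1, table2, "0911000001", 42, 1_000L); // the patient
        record(table1, table2, "0911000002", 42, 1_200L); // same place, close in time
        record(table1, table2, "0911000003", 7, 1_000L);  // different place
        System.out.println(contactsOf(table1, table2, "0911000001", 500L));
    }
}
```

Keeping one table keyed by phone number and another keyed by place code means both steps are direct row lookups rather than full scans.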
| Map | (RK) place code | (CF) pos | (CQ) position code | (value) isPositionCodeExist |
hbase(main):004:0> desc 'MAP'
Table MAP is ENABLED
MAP
COLUMN FAMILIES DESCRIPTION
{NAME => 'pos', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
hbase(main):005:0> scan "MAP"
1000000 row(s)
Took 127.8527 seconds
| People | (RK) phone number | (CF) liv | (CQ) living pattern | (value) name |
hbase(main):007:0> desc 'PEOPLE'
Table PEOPLE is ENABLED
PEOPLE
COLUMN FAMILIES DESCRIPTION
{NAME => 'liv', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
hbase(main):009:0> scan 'PEOPLE'
99955 row(s)
Took 22.5512 seconds
(Oddly, 45 rows are missing: the scan returns 99955 rows instead of the expected 100000.)
| 1st | phone number | position(VERSIONS => 100) | position code | location |
Row key: phone number; Column family: position(VERSIONS => 100); Column qualifier: position code; value: location
hbase:018:0> desc 'table1'
Table table1 is ENABLED
table1
COLUMN FAMILIES DESCRIPTION
{NAME => 'pos', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '100', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
| 2nd | location | phone_numbers(VERSIONS => 100) | phone number | position code |
Row key: location; Column family: phone_numbers(VERSIONS => 100); Column qualifier: phone number; value: position code
hbase:011:0> desc 'table2'
Table table2 is ENABLED
table2
COLUMN FAMILIES DESCRIPTION
{NAME => 'pho', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '100', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
row_key: (String) phonenum
columnFamily: pos
columnQualifier: (long)positionCode
value: (int)placeCode
row_key: (int)placeCode
columnFamily: pho
columnQualifier: (String)phonenum
value: (long)positionCode
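HBase stores row keys, qualifiers, and values as raw byte arrays, so the typed fields in the two layouts above have to be encoded before a put and decoded after a get. The sketch below shows one way to do that with `java.nio` only, so it runs without the HBase jars. The class and method names are hypothetical, and the assumption is fixed-width big-endian encoding, which is what HBase's own `Bytes` utility is generally understood to produce for `long` and `int`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Dependency-free sketch of encoding the typed schema fields as byte arrays.
class CellEncodingSketch {

    // (String) phonenum -> UTF-8 bytes.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // (long) positionCode -> 8 bytes, big-endian (ByteBuffer's default order).
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // (int) placeCode -> 4 bytes, big-endian.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static void main(String[] args) {
        // One cell of the first layout: row key = phonenum,
        // qualifier = positionCode, value = placeCode (sample values are made up).
        byte[] rowKey = encodeString("0911000001");
        byte[] qualifier = encodeLong(1234567890123L);
        byte[] value = encodeInt(42);
        System.out.println(rowKey.length + " " + qualifier.length + " " + value.length);
        System.out.println(Arrays.toString(value));
        System.out.println(decodeLong(qualifier) + " " + decodeInt(value));
    }
}
```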
row_key: (String)xxx_phonenum
columnFamily: All_of_the_time
columnQualifier: (String)time
value: (String)placeCode
row_key: (String)xxx_placecode_time
columnFamily: People
columnQualifier: (String)phonenum
value: null
row_key: (String)xxx_phonenum
columnFamily: All_of_the_time
columnQualifier: (String)time
value: (String)placeCode
row_key: (String)xxx_placecode
columnFamily: All_position_time
columnQualifier: (String)time
value: phonenum
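The `xxx_placecode_time` row key design listed above relies on HBase keeping rows sorted by key, so all visits to one place sit next to each other and can be read with a prefix scan. The sketch below imitates that with a `TreeMap` instead of a real table. The names are hypothetical, the `xxx` part is passed in as an opaque `prefix` because the source does not say what it is, and keys are assumed to be ASCII so that Java's string ordering matches byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sorted-map stand-in for a table keyed by prefix_placecode_time.
class CompositeKeySketch {

    // Builds the composite row key; "prefix" stands for the unspecified "xxx" part.
    static String rowKey(String prefix, String placeCode, String time) {
        return prefix + "_" + placeCode + "_" + time;
    }

    // Adds a phone number as a qualifier of the row for (place, time).
    static void put(TreeMap<String, Set<String>> table, String prefix,
                    String placeCode, String time, String phone) {
        table.computeIfAbsent(rowKey(prefix, placeCode, time), k -> new TreeSet<>()).add(phone);
    }

    // Imitates a prefix scan: every row whose key starts with prefix_placecode_.
    static List<String> phonesAt(TreeMap<String, Set<String>> table,
                                 String prefix, String placeCode) {
        String start = prefix + "_" + placeCode + "_";
        // '`' is the ASCII character right after '_', so this is the
        // smallest key that no longer shares the start prefix.
        String stop = prefix + "_" + placeCode + "`";
        List<String> phones = new ArrayList<>();
        for (Set<String> qualifiers : table.subMap(start, stop).values()) {
            phones.addAll(qualifiers);
        }
        return phones;
    }

    public static void main(String[] args) {
        TreeMap<String, Set<String>> table = new TreeMap<>();
        put(table, "p", "12", "0900", "0911000001");
        put(table, "p", "12", "1000", "0911000002");
        put(table, "p", "123", "0900", "0911000003"); // different place code
        System.out.println(phonesAt(table, "p", "12"));
    }
}
```

The trailing `_` in the start key keeps place code 12 from matching place code 123. Because the time is part of a string key, rows only sort chronologically if the time strings have a fixed width.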