DE최우형 - W5M1 #272

Open
wants to merge 45 commits into base: DE최우형_W5

Changes from 1 commit
Commits (45)
4d37dbb
#[W2M2] : init
dn7638 Jul 8, 2024
b170433
W1 : squash commit
dn7638 Jul 10, 2024
0669677
W2 : squash commit
dn7638 Jul 10, 2024
68b9b2f
W2 : squash commit
dn7638 Jul 10, 2024
76bb203
W1 : refactor for building docker image
dn7638 Jul 10, 2024
8e4f139
W1 : refactor for making docker image
dn7638 Jul 10, 2024
615ec68
W1 : refactor for making docker image
dn7638 Jul 10, 2024
a915ed8
W1 : add docker script
dn7638 Jul 11, 2024
59fc0f2
W2 : add files for build docker image
dn7638 Jul 11, 2024
3277a79
W2 : try to make correct Dockerfile
dn7638 Jul 11, 2024
0dbf58d
W2 : Dockerfile for amazone linux 2
dn7638 Jul 11, 2024
492d952
resolve merge conflict M1<-M2
dn7638 Jul 11, 2024
9a534c2
W1M3
dn7638 Jul 11, 2024
9c4953c
W2 : update W2 README.md
dn7638 Jul 12, 2024
79d1ef2
W1M3 : apply review
dn7638 Jul 14, 2024
f659b05
W1M3 : apply review
dn7638 Jul 14, 2024
3408ac7
W2M1_4 : add explanation
dn7638 Jul 14, 2024
53846c8
Merge branch 'W1M3' into W2/main
dn7638 Jul 14, 2024
07fde9a
W2 : add explation
dn7638 Jul 14, 2024
0c3ff6d
W3_main : init for W3 missions
dn7638 Jul 15, 2024
335f174
W3M2 : init
dn7638 Jul 17, 2024
30da9b2
W3M2 : add Dockerfile.datanode
dn7638 Jul 17, 2024
dea5272
W3M2 : add Dockerfile.namenode
dn7638 Jul 17, 2024
9f9684f
W3M2 : add hadoop config files
dn7638 Jul 17, 2024
e38f253
W3M2 : add docker-compose and shell script for datanode
dn7638 Jul 17, 2024
c1727cf
W3M2 : add config file
dn7638 Jul 18, 2024
ab990cc
W3M2 : add start_script for hadoop services
dn7638 Jul 18, 2024
bbdeca2
W3M2 : add Dockerfiles
dn7638 Jul 18, 2024
bd94179
W3M2 : add script for scp script
dn7638 Jul 18, 2024
9064161
W3M2 : add script for building docker image and runging compose
dn7638 Jul 18, 2024
900a66a
W3M2 : feat modify, verification, mapreduce script
dn7638 Jul 20, 2024
b396fe3
W3M2 : update readme.md
dn7638 Jul 21, 2024
1f036dc
W3M2 : update readme.md
dn7638 Jul 21, 2024
e2c2f87
main : update gitignore
dn7638 Jul 21, 2024
33a9307
W4MAIN : init
dn7638 Jul 23, 2024
05e728c
W4MAIN : init
dn7638 Jul 23, 2024
c19db8a
W4M1 : init
dn7638 Jul 23, 2024
c1ba33c
W4M1 : add spark test script
dn7638 Jul 23, 2024
ff3f70c
W4M1 : add README.md
dn7638 Jul 23, 2024
c772ad6
W4M1 : trivial change
dn7638 Jul 29, 2024
8e89e55
W5M1 : init
dn7638 Jul 29, 2024
ad0c61c
Merge branch 'W3M2' into W5M1
dn7638 Aug 1, 2024
eed9d4c
Merge branch 'W2/main' into W5M1
dn7638 Aug 11, 2024
4503783
W5M1 : delete dir
dn7638 Aug 19, 2024
09f9c5f
W5M1 : Week 5 mission 1
dn7638 Aug 20, 2024
W3M2 : feat modify, verification, mapreduce script
dn7638 committed Jul 20, 2024
commit 900a66a4709b074dcb77e1f818a09054ba58893c
3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.datanode
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.namenode
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.nodemanager
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.resourcemanager
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

70 changes: 69 additions & 1 deletion missions/W3/M2/README.md
@@ -1 +1,69 @@
# INIT

## Build initial Hadoop cluster
* step 1: run "build_and_run_hadoop_services.sh"
instruction | ./build_and_run_hadoop_services.sh
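
The script's contents are not part of this commit's diff. A plausible outline, assuming the four Dockerfiles in this directory and the docker-compose file added in earlier W3M2 commits (the image tags are illustrative):

```bash
# Sketch only — the actual build_and_run_hadoop_services.sh may differ
docker build -t hadoop-namenode        -f Dockerfile.namenode .
docker build -t hadoop-datanode        -f Dockerfile.datanode .
docker build -t hadoop-resourcemanager -f Dockerfile.resourcemanager .
docker build -t hadoop-nodemanager     -f Dockerfile.nodemanager .
docker compose up -d
```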

## Change Configuration
* step 1: Modify the configuration files in "/change-config".
* step 2: Add the directories required by the configuration changes at the bottom of "make_dir.sh".

Example:
```xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///hadoop/dfs/data</value>
</property>
```

When changing the configuration as above, the paths /hadoop/dfs/name and /hadoop/dfs/data must be created inside the container.

* step 3: Add each container that the changes should apply to in "apply_all.sh" by appending a command line of the form ./configuration_modify.sh $HADOOP_HOME $CONTAINER_NAME $ROLE (see the sketch after this list).

* step 4: run "apply_all.sh"
instruction | ./apply_all.sh
The changes are then applied.
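
For example, to create the directories from the hdfs-site.xml snippet above and apply the change to one more DataNode container, lines like the following could be added (create_directory is the helper defined in make_dir.sh; the container name datanode3 is hypothetical — the stock apply_all.sh covers namenode, resourcemanager, datanode1/2, and nodemanager1/2):

```bash
# Bottom of make_dir.sh — directories referenced by the new configuration
create_directory /hadoop/dfs/name
create_directory /hadoop/dfs/data

# apply_all.sh — one line per target container: <HADOOP_HOME> <CONTAINER_NAME> <ROLE>
./configuration_modify.sh usr/local/hadoop datanode3 datanode   # datanode3 is hypothetical
```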


## Verification
* step 1: run "build-verify-scrips.py"
instruction | python3 build-verify-scrips.py
It creates four .sh files for verifying the changed configuration:
"verify_core-site_conf.sh", "verify_hdfs-site_conf.sh", "verify_mapred-site_conf.sh", "verify_yarn-site_conf.sh"

The above four scripts are required to run configuration_verify.sh.

* step 2: run "configuration_verify.sh"
instruction | ./configuration_verify.sh <HADOOP_HOME> <CONTAINER_NAME>
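
With the defaults used in the TEST section below (HADOOP_HOME of usr/local/hadoop, the namenode container), the two steps look like this:

```bash
python3 build-verify-scrips.py
./configuration_verify.sh usr/local/hadoop namenode
```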

## MapReduce
* Just run "test_mapreduce.sh".
instruction | ./test_mapreduce.sh <HADOOP_HOME> <CONTAINER_NAME>

* If you want to test with a different MapReduce job, change input.txt and wordcount.sh (see the sketch below).
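
A minimal sketch, assuming input.txt in this directory is the job input and using the defaults from the TEST section below (the replacement text is made up):

```bash
# Hypothetical example: swap in different input data, then rerun the word count
echo "hello hadoop hello world" > input.txt
./test_mapreduce.sh usr/local/hadoop namenode
```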


## TEST
* If you only want to test, not customise, just follow the instructions below:
instruction | ./build_and_run_hadoop_services.sh
instruction | ./apply_all.sh
instruction | python3 build-verify-scrips.py
instruction | ./configuration_verify.sh usr/local/hadoop namenode
instruction | ./test_mapreduce.sh usr/local/hadoop namenode


## Troubleshooting
If you change the dfs.datanode.data.dir property, you may get "java.io.IOException: Incompatible clusterIDs in /hadoop/dfs/data: namenode clusterID = CID-8ec62c1c-7b9a-413b-afa2-05dd41fc8f94; datanode clusterID = CID-79c89a70-9b81-4808-a954-d6d4d8c98c02".

This happens because the directory for the changed setting already exists and was initialized under the old clusterID. Remove the directory and run the script again.

```xml
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/data/datanode</value>
</property>
```
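
A minimal cleanup sketch, assuming the datanode1 container from apply_all.sh and the /hadoop/dfs/data path from the earlier example:

```bash
# Remove the DataNode directory initialized under the old clusterID, then re-apply
docker exec datanode1 bash -c "rm -rf /hadoop/dfs/data"
./apply_all.sh
```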
9 changes: 9 additions & 0 deletions missions/W3/M2/change-config/apply_all.sh
@@ -0,0 +1,9 @@
#!/bin/bash

# configuration_modify.sh $HADOOP_HOME $CONTAINER_NAME $ROLE
./configuration_modify.sh usr/local/hadoop namenode namenode
./configuration_modify.sh usr/local/hadoop resourcemanager resourcemanager
./configuration_modify.sh usr/local/hadoop datanode1 datanode
./configuration_modify.sh usr/local/hadoop datanode2 datanode
./configuration_modify.sh usr/local/hadoop nodemanager1 nodemanager
./configuration_modify.sh usr/local/hadoop nodemanager2 nodemanager
192 changes: 192 additions & 0 deletions missions/W3/M2/change-config/configuration_modify.sh
@@ -0,0 +1,192 @@
#!/bin/bash

# Takes three arguments: HADOOP_HOME, CONTAINER_NAME, and the Hadoop service type.
# SERVICE_TYPE = {namenode, datanode, resourcemanager, nodemanager}
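# Example invocation (as used in apply_all.sh): ./configuration_modify.sh usr/local/hadoop namenode namenode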
if [ $# -ne 3 ]; then
echo "Usage: $0 <HADOOP_HOME> <CONTAINER_NAME> <SERVICE_TYPE>"
exit 1
fi

# Run make_dir.sh; its argument is CONTAINER_NAME
./make_dir.sh $2

# Set environment variables
HADOOP_HOME=$1
CONTAINER_NAME=$2
SERVICE_TYPE=$3

# Check that the service type is valid
if [[ ! "namenode datanode resourcemanager nodemanager" =~ (^|[[:space:]])$SERVICE_TYPE($|[[:space:]]) ]]; then
echo "Invalid service type. Please specify one of the following: namenode, datanode, resourcemanager, nodemanager."
exit 1
fi

# Function: create a directory inside the container
create_directory() {
local dir=$1
echo "Creating directory $dir..."
docker exec -it $CONTAINER_NAME bash -c "
if [ ! -d $dir ]; then
mkdir -p $dir
fi"
if [ $? -ne 0 ]; then
echo "Failed to create directory $dir inside the container."
exit 1
fi
}

# Function: copy a file into the container
copy_file() {
local src_file=$1
local dest_dir=$2
echo "Copying $src_file..."
docker cp $src_file $CONTAINER_NAME:$dest_dir/
if [ $? -ne 0 ]; then
echo "Failed to copy $src_file."
exit 1
fi
}

# Function: back up a configuration file (variables in the quoted block are expanded by the host shell)
backup_file() {
local file=$1
local backup_dir=$2
echo "Backing up $file..."
docker exec -it $CONTAINER_NAME bash -c "
timestamp=$timestamp
mkdir -p $backup_dir &&
cp $file $backup_dir &&
echo 'Configuration file[$(basename $file)] has been backed up to $backup_dir.'
"
if [ $? -ne 0 ]; then
echo "Failed to back up configuration file[$(basename $file)] inside the container."
exit 1
fi
}

# Function: update a configuration file
update_file() {
local src_file=$1
local dest_file=$2
echo "Updating $dest_file..."
docker exec -it $CONTAINER_NAME bash -c "
cp $src_file $dest_file &&
rm -rf $src_file &&
chmod +x $dest_file &&
echo 'Configuration file[$(basename $dest_file)] has been updated and temporary files removed.'"
if [ $? -ne 0 ]; then
echo "Failed to update configuration file[$(basename $dest_file)] inside the container."
exit 1
fi
}

# Function: restart the namenode service (HDFS)
restart_namenode_service() {
local service=$1
local name_dir_origin=$2
local name_dir_changed=$3
echo "Restarting Hadoop $service service..."

# Determine whether to format or not
if [ "$name_dir_origin" != "$name_dir_changed" ]; then
echo "Formatting NameNode directory..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/hdfs --daemon stop $service &&
$HADOOP_HOME/bin/hdfs namenode -format &&
nohup $HADOOP_HOME/bin/hdfs --daemon start $service > /dev/null
"
else
echo "NameNode directory has not changed. Restarting Hadoop $service service..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/hdfs --daemon stop $service &&
nohup $HADOOP_HOME/bin/hdfs --daemon start $service > /dev/null
"
fi

if [ $? -ne 0 ]; then
echo "Failed to restart Hadoop $service service."
exit 1
fi
echo "Hadoop $service service has been restarted successfully."
}

# Function: restart the datanode service (HDFS)
restart_datanode_service() {
local service=$1
echo "Restarting Hadoop $service service..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/hdfs --daemon stop $service &&
nohup $HADOOP_HOME/bin/hdfs --daemon start $service > /dev/null
"
if [ $? -ne 0 ]; then
echo "Failed to restart Hadoop $service service."
exit 1
fi
echo "Hadoop $service service has been restarted successfully."
}

# Function: restart a YARN service
restart_yarn_service() {
local service=$1
echo "Restarting Hadoop $service service..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/yarn --daemon stop $service &&
nohup $HADOOP_HOME/bin/yarn --daemon start $service > /dev/null
"
if [ $? -ne 0 ]; then
echo "Failed to restart Hadoop $service service."
exit 1
fi
echo "Hadoop $service service has been restarted successfully."
}

# Generate a timestamp
timestamp=$(date +%s)

# Create directories and copy the new configuration files in
create_directory "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "core-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "hdfs-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "mapred-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "yarn-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"

create_directory "$HADOOP_HOME/etc/hadoop/backup/$timestamp"

# Back up the current configuration files
backup_file "$HADOOP_HOME/etc/hadoop/core-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"
backup_file "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"
backup_file "$HADOOP_HOME/etc/hadoop/mapred-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"
backup_file "$HADOOP_HOME/etc/hadoop/yarn-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"

# Get the NameNode and DataNode directories before the update
NAME_DIR_ORIGIN=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.name.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)
DATA_DIR_ORIGIN=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.datanode.data.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)
echo "NameNode directory: $NAME_DIR_ORIGIN"
echo "DataNode directory: $DATA_DIR_ORIGIN"

# Update the configuration files
update_file "$HADOOP_HOME/etc/hadoop/tmp/core-site.xml" "$HADOOP_HOME/etc/hadoop/core-site.xml"
update_file "$HADOOP_HOME/etc/hadoop/tmp/hdfs-site.xml" "$HADOOP_HOME/etc/hadoop/hdfs-site.xml"
update_file "$HADOOP_HOME/etc/hadoop/tmp/mapred-site.xml" "$HADOOP_HOME/etc/hadoop/mapred-site.xml"
update_file "$HADOOP_HOME/etc/hadoop/tmp/yarn-site.xml" "$HADOOP_HOME/etc/hadoop/yarn-site.xml"

echo "Configuration files have been copied and updated successfully."

# Get the NameNode and DataNode directories after the update
NAME_DIR_CHANGED=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.name.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)
DATA_DIR_CHANGED=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.datanode.data.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)

# Restart the service
case $SERVICE_TYPE in
namenode)
restart_namenode_service $SERVICE_TYPE $NAME_DIR_ORIGIN $NAME_DIR_CHANGED
;;
datanode)
restart_datanode_service $SERVICE_TYPE
;;
resourcemanager|nodemanager)
restart_yarn_service $SERVICE_TYPE
;;
esac

echo "Configuration changes applied and services restarted."
14 changes: 14 additions & 0 deletions missions/W3/M2/change-config/core-site.xml
@@ -0,0 +1,14 @@
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
15 changes: 15 additions & 0 deletions missions/W3/M2/change-config/hdfs-site.xml
@@ -0,0 +1,15 @@
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop/dfs/data</value>
</property>
</configuration>

29 changes: 29 additions & 0 deletions missions/W3/M2/change-config/make_dir.sh
@@ -0,0 +1,29 @@
#!/bin/bash

# Takes exactly one argument: the container name.
if [ $# -ne 1 ]; then
echo "Usage: $0 <CONTAINER_NAME>"
exit 1
fi
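# Example (as invoked from configuration_modify.sh): ./make_dir.sh namenode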

CONTAINER_NAME=$1

# Function: create a directory inside the container
create_directory() {
local dir=$1
echo "Creating directory $dir..."
docker exec -it $CONTAINER_NAME bash -c "
if [ ! -d $dir ]; then
sudo mkdir -p $dir && \
sudo chown -R hadoop:hadoop $dir
fi"
if [ $? -ne 0 ]; then
echo "Failed to create directory $dir inside the container."
exit 1
fi
}

# Add the Hadoop directories required by the configuration change
create_directory /hadoop/dfs/name
create_directory /hadoop/dfs/data
create_directory /hadoop/tmp