DE최우형 - W5M1 #272

Open
wants to merge 45 commits into base: DE최우형_W5
Changes from 1 commit (45 commits total)
4d37dbb
#[W2M2] : init
dn7638 Jul 8, 2024
b170433
W1 : squash commit
dn7638 Jul 10, 2024
0669677
W2 : squash commit
dn7638 Jul 10, 2024
68b9b2f
W2 : squash commit
dn7638 Jul 10, 2024
76bb203
W1 : refactor for building docker image
dn7638 Jul 10, 2024
8e4f139
W1 : refactor for making docker image
dn7638 Jul 10, 2024
615ec68
W1 : refactor for making docker image
dn7638 Jul 10, 2024
a915ed8
W1 : add docker script
dn7638 Jul 11, 2024
59fc0f2
W2 : add files for build docker image
dn7638 Jul 11, 2024
3277a79
W2 : try to make correct Dockerfile
dn7638 Jul 11, 2024
0dbf58d
W2 : Dockerfile for amazone linux 2
dn7638 Jul 11, 2024
492d952
resolve merge conflict M1<-M2
dn7638 Jul 11, 2024
9a534c2
W1M3
dn7638 Jul 11, 2024
9c4953c
W2 : update W2 README.md
dn7638 Jul 12, 2024
79d1ef2
W1M3 : apply review
dn7638 Jul 14, 2024
f659b05
W1M3 : apply review
dn7638 Jul 14, 2024
3408ac7
W2M1_4 : add explanation
dn7638 Jul 14, 2024
53846c8
Merge branch 'W1M3' into W2/main
dn7638 Jul 14, 2024
07fde9a
W2 : add explation
dn7638 Jul 14, 2024
0c3ff6d
W3_main : init for W3 missions
dn7638 Jul 15, 2024
335f174
W3M2 : init
dn7638 Jul 17, 2024
30da9b2
W3M2 : add Dockerfile.datanode
dn7638 Jul 17, 2024
dea5272
W3M2 : add Dockerfile.namenode
dn7638 Jul 17, 2024
9f9684f
W3M2 : add hadoop config files
dn7638 Jul 17, 2024
e38f253
W3M2 : add docker-compose and shell script for datanode
dn7638 Jul 17, 2024
c1727cf
W3M2 : add config file
dn7638 Jul 18, 2024
ab990cc
W3M2 : add start_script for hadoop services
dn7638 Jul 18, 2024
bbdeca2
W3M2 : add Dockerfiles
dn7638 Jul 18, 2024
bd94179
W3M2 : add script for scp script
dn7638 Jul 18, 2024
9064161
W3M2 : add script for building docker image and runging compose
dn7638 Jul 18, 2024
900a66a
W3M2 : feat modify, verification, mapreduce script
dn7638 Jul 20, 2024
b396fe3
W3M2 : update readme.md
dn7638 Jul 21, 2024
1f036dc
W3M2 : update readme.md
dn7638 Jul 21, 2024
e2c2f87
main : update gitignore
dn7638 Jul 21, 2024
33a9307
W4MAIN : init
dn7638 Jul 23, 2024
05e728c
W4MAIN : init
dn7638 Jul 23, 2024
c19db8a
W4M1 : init
dn7638 Jul 23, 2024
c1ba33c
W4M1 : add spark test script
dn7638 Jul 23, 2024
ff3f70c
W4M1 : add README.md
dn7638 Jul 23, 2024
c772ad6
W4M1 : trivial change
dn7638 Jul 29, 2024
8e89e55
W5M1 : init
dn7638 Jul 29, 2024
ad0c61c
Merge branch 'W3M2' into W5M1
dn7638 Aug 1, 2024
eed9d4c
Merge branch 'W2/main' into W5M1
dn7638 Aug 11, 2024
4503783
W5M1 : delete dir
dn7638 Aug 19, 2024
09f9c5f
W5M1 : Week 5 mission 1
dn7638 Aug 20, 2024
W3M2 : feat modify, verification, mapreduce script
dn7638 committed Jul 20, 2024
commit 900a66a4709b074dcb77e1f818a09054ba58893c
3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.datanode
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.namenode
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.nodemanager
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

3 changes: 3 additions & 0 deletions missions/W3/M2/Dockerfile.resourcemanager
@@ -53,6 +53,9 @@ COPY ./config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# chown Hadoop configuration files
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME

# Configure JAVA_HOME in Hadoop environment
RUN echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

70 changes: 69 additions & 1 deletion missions/W3/M2/README.md
@@ -1 +1,69 @@
# INIT

## Build the initial Hadoop cluster
* step 1 : run "build_and_run_hadoop_services.sh"
instruction | ./build_and_run_hadoop_services.sh

## Change Configuration
* step 1 : Modify the configuration files in "/change-config".
* step 2 : Add the directories that need to be included for configuration changes at the bottom of the "make_dir.sh" file.

Example:
```xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///hadoop/dfs/data</value>
</property>
```

When changing as above, the /hadoop/dfs/name and /hadoop/dfs/data paths must be added inside the container.

* step 3 : Add the containers to which the changes should be applied to "apply_all.sh",
i.e. add a command line such as ./configuration_modify.sh $HADOOP_HOME $CONTAINER_NAME $ROLE.

* step 4 : run "apply_all.sh"
instruction | ./apply_all.sh
The changes will be applied.


## Verification
* step 1 : run "build-verify-scripts.py"
instruction | python3 build-verify-scripts.py
It creates four .sh files for verifying the changed configuration:
"verify_core-site_conf.sh", "verify_hdfs-site_conf.sh", "verify_mapred-site_conf.sh", "verify_yarn-site_conf.sh"

The above four scripts are required to run configuration_verify.sh.

* step 2 : run "configuration_verify.sh"
instruction : ./configuration_verify.sh <HADOOP_HOME> <CONTAINER_NAME>

## MapReduce
* just run test_mapreduce.sh
instruction : ./test_mapreduce.sh <HADOOP_HOME> <CONTAINER_NAME>

* if you want to test with another MapReduce job
* change input.txt and wordcount.sh

## TEST
* if you only want to test, not customise, just follow the instructions below
instruction | ./build_and_run_hadoop_services.sh
instruction | ./apply_all.sh
instruction | python3 build-verify-scripts.py
instruction : ./configuration_verify.sh usr/local/hadoop namenode
instruction : ./test_mapreduce.sh usr/local/hadoop namenode


## Troubleshooting
If you change the dfs.datanode.data.dir property, you may get "java.io.IOException: Incompatible clusterIDs in /hadoop/dfs/data: namenode clusterID = CID-8ec62c1c-7b9a-413b-afa2-05dd41fc8f94; datanode clusterID = CID-79c89a70-9b81-4808-a954-d6d4d8c98c02".

This happens because the directory with the changed settings already exists. Remove the directory and run the script again.

```xml
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/data/datanode</value>
</property>
```
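The cleanup above can be sketched without a running cluster. In the real setup you would remove the stale directory through docker (e.g. `docker exec datanode1 bash -c 'rm -rf /hadoop/dfs/data'`, where `datanode1` is the compose container name); the snippet below simulates the same steps against a temporary directory, so the paths and clusterID are illustrative only.

```shell
# Simulate a DataNode data dir left over from a previous clusterID (illustrative).
STALE_DIR=$(mktemp -d)                     # stands in for /hadoop/dfs/data
mkdir -p "$STALE_DIR/current"
echo "clusterID=CID-79c89a70-9b81-4808-a954-d6d4d8c98c02" > "$STALE_DIR/current/VERSION"

# Removing the directory lets the DataNode re-register with the NameNode's clusterID.
rm -rf "$STALE_DIR"

[ ! -d "$STALE_DIR" ] && echo "stale datanode dir removed"
```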
9 changes: 9 additions & 0 deletions missions/W3/M2/change-config/apply_all.sh
@@ -0,0 +1,9 @@
#!/bin/bash

# configuration_modify.sh $HADOOP_HOME $CONTAINER_NAME $ROLE
./configuration_modify.sh usr/local/hadoop namenode namenode
./configuration_modify.sh usr/local/hadoop resourcemanager resourcemanager
./configuration_modify.sh usr/local/hadoop datanode1 datanode
./configuration_modify.sh usr/local/hadoop datanode2 datanode
./configuration_modify.sh usr/local/hadoop nodemanager1 nodemanager
./configuration_modify.sh usr/local/hadoop nodemanager2 nodemanager
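The per-container invocations above could equally be driven from a single list; a sketch (container:role pairs copied from the script, with the command echoed rather than executed so the snippet runs standalone):

```shell
#!/bin/bash
# container:role pairs, as invoked by apply_all.sh
services="namenode:namenode resourcemanager:resourcemanager datanode1:datanode datanode2:datanode nodemanager1:nodemanager nodemanager2:nodemanager"

for pair in $services; do
    container=${pair%%:*}   # text before the colon
    role=${pair##*:}        # text after the colon
    echo "./configuration_modify.sh usr/local/hadoop $container $role"
done
```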
192 changes: 192 additions & 0 deletions missions/W3/M2/change-config/configuration_modify.sh
@@ -0,0 +1,192 @@
#!/bin/bash

# Takes three arguments: HADOOP_HOME, CONTAINER_NAME, and the Hadoop service type.
# SERVICE_TYPE = {namenode, datanode, resourcemanager, nodemanager}
if [ $# -ne 3 ]; then
echo "Usage: $0 <HADOOP_HOME> <CONTAINER_NAME> <SERVICE_TYPE>"
exit 1
fi

# make_dir.sh takes CONTAINER_NAME as its argument
./make_dir.sh $2

# Set environment variables
HADOOP_HOME=$1
CONTAINER_NAME=$2
SERVICE_TYPE=$3

# Validate the service type
if [[ ! "namenode datanode resourcemanager nodemanager" =~ (^|[[:space:]])$SERVICE_TYPE($|[[:space:]]) ]]; then
echo "Invalid service type. Please specify one of the following: namenode, datanode, resourcemanager, nodemanager."
exit 1
fi

# Function: create a directory
create_directory() {
local dir=$1
echo "Creating directory $dir..."
docker exec -it $CONTAINER_NAME bash -c "
if [ ! -d $dir ]; then
mkdir -p $dir
fi"
if [ $? -ne 0 ]; then
echo "Failed to create directory $dir inside the container."
exit 1
fi
}

# Function: copy a file
copy_file() {
local src_file=$1
local dest_dir=$2
echo "Copying $src_file..."
docker cp $src_file $CONTAINER_NAME:$dest_dir/
if [ $? -ne 0 ]; then
echo "Failed to copy $src_file."
exit 1
fi
}

# Function: back up a file
backup_file() {
local file=$1
local backup_dir=$2
echo "Backing up $file..."
docker exec -it $CONTAINER_NAME bash -c "
timestamp=$timestamp
mkdir -p $backup_dir &&
cp $file $backup_dir &&
echo 'Configuration file[$(basename $file)] has been backed up to $backup_dir.'
"
if [ $? -ne 0 ]; then
echo "Failed to back up configuration file[$(basename $file)] inside the container."
exit 1
fi
}

# Function: update a file
update_file() {
local src_file=$1
local dest_file=$2
echo "Updating $dest_file..."
docker exec -it $CONTAINER_NAME bash -c "
cp $src_file $dest_file &&
rm -rf $src_file &&
chmod +x $dest_file &&
echo 'Configuration file[$(basename $dest_file)] has been updated and temporary files removed.'"
if [ $? -ne 0 ]; then
echo "Failed to update configuration file[$(basename $dest_file)] inside the container."
exit 1
fi
}

# Function: restart the namenode service (HDFS)
restart_namenode_service() {
local service=$1
local name_dir_origin=$2
local name_dir_changed=$3
echo "Restarting Hadoop $service service..."

# Determine whether to format or not
if [ "$name_dir_origin" != "$name_dir_changed" ]; then
echo "Formatting NameNode directory..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/hdfs --daemon stop $service &&
$HADOOP_HOME/bin/hdfs namenode -format &&
nohup $HADOOP_HOME/bin/hdfs --daemon start $service > /dev/null
"
else
echo "NameNode directory has not changed. Restarting Hadoop $service service..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/hdfs --daemon stop $service &&
nohup $HADOOP_HOME/bin/hdfs --daemon start $service > /dev/null
"
fi

if [ $? -ne 0 ]; then
echo "Failed to restart Hadoop $service service."
exit 1
fi
echo "Hadoop $service service has been restarted successfully."
}

# Function: restart the datanode service (HDFS)
restart_datanode_service() {
local service=$1
echo "Restarting Hadoop $service service..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/hdfs --daemon stop $service &&
nohup $HADOOP_HOME/bin/hdfs --daemon start $service > /dev/null
"
if [ $? -ne 0 ]; then
echo "Failed to restart Hadoop $service service."
exit 1
fi
echo "Hadoop $service service has been restarted successfully."
}

# Function: restart a YARN service
restart_yarn_service() {
local service=$1
echo "Restarting Hadoop $service service..."
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/bin/yarn --daemon stop $service &&
nohup $HADOOP_HOME/bin/yarn --daemon start $service > /dev/null
"
if [ $? -ne 0 ]; then
echo "Failed to restart Hadoop $service service."
exit 1
fi
echo "Hadoop $service service has been restarted successfully."
}

# Generate a timestamp
timestamp=$(date +%s)

# Create directories and copy files
create_directory "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "core-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "hdfs-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "mapred-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"
copy_file "yarn-site.xml" "$HADOOP_HOME/etc/hadoop/tmp"

create_directory "$HADOOP_HOME/etc/hadoop/backup/$timestamp"

# Back up the files
backup_file "$HADOOP_HOME/etc/hadoop/core-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"
backup_file "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"
backup_file "$HADOOP_HOME/etc/hadoop/mapred-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"
backup_file "$HADOOP_HOME/etc/hadoop/yarn-site.xml" "$HADOOP_HOME/etc/hadoop/backup/$timestamp"

# Get the NameNode directory
NAME_DIR_ORIGIN=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.name.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)
DATA_DIR_ORIGIN=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.datanode.data.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)
echo "NameNode directory: $NAME_DIR_ORIGIN"
echo "DataNode directory: $DATA_DIR_ORIGIN"

# Update the files
update_file "$HADOOP_HOME/etc/hadoop/tmp/core-site.xml" "$HADOOP_HOME/etc/hadoop/core-site.xml"
update_file "$HADOOP_HOME/etc/hadoop/tmp/hdfs-site.xml" "$HADOOP_HOME/etc/hadoop/hdfs-site.xml"
update_file "$HADOOP_HOME/etc/hadoop/tmp/mapred-site.xml" "$HADOOP_HOME/etc/hadoop/mapred-site.xml"
update_file "$HADOOP_HOME/etc/hadoop/tmp/yarn-site.xml" "$HADOOP_HOME/etc/hadoop/yarn-site.xml"

echo "Configuration files have been copied and updated successfully."

# Get the NameNode directory
NAME_DIR_CHANGED=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.name.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)
DATA_DIR_CHANGED=$(docker exec -it $CONTAINER_NAME bash -c "$HADOOP_HOME/bin/hdfs getconf -confKey dfs.datanode.data.dir 2>&1" | awk '/file:\/\// {print $1}' | xargs)

# Restart services
case $SERVICE_TYPE in
namenode)
restart_namenode_service $SERVICE_TYPE $NAME_DIR_ORIGIN $NAME_DIR_CHANGED
;;
datanode)
restart_datanode_service $SERVICE_TYPE
;;
resourcemanager|nodemanager)
restart_yarn_service $SERVICE_TYPE
;;
esac

echo "Configuration changes applied and services restarted."
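The backup-then-update flow used by `backup_file`/`update_file` can be sketched locally, without docker; the file and directory names below are stand-ins:

```shell
#!/bin/bash
timestamp=$(date +%s)
conf_dir=$(mktemp -d)                       # stands in for $HADOOP_HOME/etc/hadoop
backup_dir="$conf_dir/backup/$timestamp"

echo "<configuration/>" > "$conf_dir/core-site.xml"        # current config

# Back up the current file into a timestamped directory, then overwrite it.
mkdir -p "$backup_dir"
cp "$conf_dir/core-site.xml" "$backup_dir/"
echo "<configuration><property/></configuration>" > "$conf_dir/core-site.xml"

echo "backed up to $backup_dir, new config in place"
```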
14 changes: 14 additions & 0 deletions missions/W3/M2/change-config/core-site.xml
@@ -0,0 +1,14 @@
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
15 changes: 15 additions & 0 deletions missions/W3/M2/change-config/hdfs-site.xml
@@ -0,0 +1,15 @@
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop/dfs/data</value>
</property>
</configuration>

29 changes: 29 additions & 0 deletions missions/W3/M2/change-config/make_dir.sh
@@ -0,0 +1,29 @@
#!/bin/bash

# Takes exactly one argument: the container name.
if [ $# -ne 1 ]; then
echo "Usage: $0 <CONTAINER_NAME>"
exit 1
fi

CONTAINER_NAME=$1

# Function: create a directory
create_directory() {
local dir=$1
echo "Creating directory $dir..."
docker exec -it $CONTAINER_NAME bash -c "
if [ ! -d $dir ]; then
sudo mkdir -p $dir && \
sudo chown -R hadoop:hadoop $dir
fi"
if [ $? -ne 0 ]; then
echo "Failed to create directory $dir inside the container."
exit 1
fi
}

# add hadoop directories for change configure
create_directory /hadoop/dfs/name
create_directory /hadoop/dfs/data
create_directory /hadoop/tmp
32 changes: 32 additions & 0 deletions missions/W3/M2/change-config/mapred-site.xml
@@ -0,0 +1,32 @@
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapreduce.jobhistory.address</name>
<value>namenode:10020</value>
</property>

<property>
<name>mapreduce.task.io.sort.mb</name>
<value>256</value>
</property>

<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>

<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>

<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
</configuration>

28 changes: 28 additions & 0 deletions missions/W3/M2/change-config/yarn-site.xml
@@ -0,0 +1,28 @@
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<!-- ResourceManager address settings -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resourcemanager</value>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>resourcemanager:8032</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
</configuration>

7 changes: 0 additions & 7 deletions missions/W3/M2/copy/python_script_scp.sh

This file was deleted.

36 changes: 0 additions & 36 deletions missions/W3/M2/entrypoint.sh

This file was deleted.

File renamed without changes.
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3

import sys

File renamed without changes.
69 changes: 69 additions & 0 deletions missions/W3/M2/mapreduce/test_mapreduce.sh
@@ -0,0 +1,69 @@
#!/bin/bash

# Takes exactly two arguments: HADOOP_HOME and CONTAINER_NAME. There is no third argument.
# CONTAINER_NAME must be the name of the namenode container.
if [ $# -ne 2 ]; then
echo "Usage: $0 <HADOOP_HOME> <CONTAINER_NAME>"
exit 1
fi

HADOOP_HOME=$1
CONTAINER_NAME=$2

# Function: create a directory
create_directory() {
local dir=$1
echo "Creating directory $dir..."
docker exec -it $CONTAINER_NAME bash -c "
if [ ! -d $dir ]; then
mkdir -p $dir
fi"
if [ $? -ne 0 ]; then
echo "Failed to create directory $dir inside the container."
exit 1
fi
}

# Function: copy a file
copy_file() {
local src_file=$1
local dest_dir=$2
echo "Copying $src_file..."
docker cp $src_file $CONTAINER_NAME:$dest_dir/
if [ $? -ne 0 ]; then
echo "Failed to copy $src_file."
exit 1
fi
}

# Create a directory for the MapReduce test
create_directory $HADOOP_HOME/mapreduce_test

# Copy files for the MapReduce test
copy_file input.txt $HADOOP_HOME/mapreduce_test
copy_file wordcount.sh $HADOOP_HOME/mapreduce_test
########## Add your files here ##########
# copy_file <SOURCE_FILE> $HADOOP_HOME/mapreduce_test
# end ##################################

# Change ownership and permissions so the copied files can be executed inside the container.
# Permissions must also be changed for any files you add.
docker exec -it $CONTAINER_NAME bash -c "
sudo chown hadoop:hadoop $HADOOP_HOME/mapreduce_test/* &&
chmod +x $HADOOP_HOME/mapreduce_test/input.txt
chmod +x $HADOOP_HOME/mapreduce_test/wordcount.sh
"

if [ $? -ne 0 ]; then
echo "Failed to change permission of mapreduce test files inside the container."
exit 1
fi

# Run wordcount.sh
docker exec -it $CONTAINER_NAME bash -c "
cd $HADOOP_HOME/mapreduce_test &&
./wordcount.sh
"

if [ $? -ne 0 ]; then
echo "Failed to run wordcount.sh inside the container."
exit 1
fi
@@ -1,20 +1,22 @@
#!/bin/bash

chmod +x mapper.py
chmod +x reducer.py
chmod +x input.txt

hdfs dfs -rm -r /user
hdfs dfs -rm -r /tmp

hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put input.txt /user/hadoop/input
hdfs dfs -ls -R /

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hadoop/input /user/hadoop/output



# hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
# -mapper mapper.py \
# -reducer reducer.py \
# -input /user/input/input.txt \
# -output /user/output
# -file /mapper.py \
# -file /reducer.py


4 changes: 3 additions & 1 deletion missions/W3/M2/start_script/start_datanode.sh
@@ -4,5 +4,7 @@
sudo service ssh start

# Start the Hadoop Datanode
hdfs datanode
$HADOOP_HOME/bin/hdfs --daemon start datanode

# Keep the shell open
tail -f /dev/null
4 changes: 3 additions & 1 deletion missions/W3/M2/start_script/start_namenode.sh
@@ -4,5 +4,7 @@
sudo service ssh start

# Start the Hadoop namenode
hdfs namenode
$HADOOP_HOME/bin/hdfs --daemon start namenode

# Keep the shell open
tail -f /dev/null
4 changes: 3 additions & 1 deletion missions/W3/M2/start_script/start_nodemanager.sh
@@ -4,5 +4,7 @@
sudo service ssh start

# Start the Hadoop nodemanager
yarn nodemanager
$HADOOP_HOME/bin/yarn --daemon start nodemanager

# Keep the shell open
tail -f /dev/null
5 changes: 3 additions & 2 deletions missions/W3/M2/start_script/start_resourcemanager.sh
@@ -4,6 +4,7 @@
sudo service ssh start

# Start the Hadoop resourcemanager
yarn resourcemanager

$HADOOP_HOME/bin/yarn --daemon start resourcemanager

# Keep the shell open
tail -f /dev/null
13 changes: 13 additions & 0 deletions missions/W3/M2/test.sh
@@ -0,0 +1,13 @@
#!/bin/bash

./build_and_run_hadoop_services.sh

cd change-config
./apply_all.sh

cd ../verification
python3 build-verify-scripts.py
./configuration_verify.sh usr/local/hadoop namenode

cd ../mapreduce
./test_mapreduce.sh usr/local/hadoop namenode
103 changes: 103 additions & 0 deletions missions/W3/M2/verification/build-verify-scripts.py
@@ -0,0 +1,103 @@
import xml.etree.ElementTree as ET

# XML file paths
core_site_xml_file_path = '../change-config/core-site.xml'
hdfs_site_xml_file_path = '../change-config/hdfs-site.xml'
mapred_site_xml_file_path = '../change-config/mapred-site.xml'
yarn_site_xml_file_path = '../change-config/yarn-site.xml'

# Read the XML file
tree = ET.parse(core_site_xml_file_path)
root = tree.getroot()

# Generate the core-site verification script
with open('verify_core-site_conf.sh', 'w') as script_file:
script_file.write("#!/bin/bash\n\n")

script_file.write("check_conf() {\n")
script_file.write(" local key=$1\n")
script_file.write(" local expected_value=$2\n")
script_file.write(" local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)\n\n")
script_file.write(" if [ \"$actual_value\" == \"$expected_value\" ]; then\n")
script_file.write(" echo \"PASS: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value\"\n")
script_file.write(" else\n")
script_file.write(" echo \"FAIL: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)\"\n")
script_file.write(" fi\n")
script_file.write("}\n\n")

for property in root.findall('property'):
name = property.find('name').text
value = property.find('value').text
script_file.write(f"check_conf {name} {value}\n")

print("Verification script has been created: verify_core-site_conf.sh")

# Generate the hdfs-site verification script
with open('verify_hdfs-site_conf.sh', 'w') as script_file:
script_file.write("#!/bin/bash\n\n")

script_file.write("check_conf() {\n")
script_file.write(" local key=$1\n")
script_file.write(" local expected_value=$2\n")
script_file.write(" local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)\n\n")
script_file.write(" if [ \"$actual_value\" == \"$expected_value\" ]; then\n")
script_file.write(" echo \"PASS: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value\"\n")
script_file.write(" else\n")
script_file.write(" echo \"FAIL: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)\"\n")
script_file.write(" fi\n")
script_file.write("}\n\n")

tree = ET.parse(hdfs_site_xml_file_path)
root = tree.getroot()

for property in root.findall('property'):
name = property.find('name').text
value = property.find('value').text
script_file.write(f"check_conf {name} {value}\n")

# Generate the mapred-site verification script
with open('verify_mapred-site_conf.sh', 'w') as script_file:
script_file.write("#!/bin/bash\n\n")

script_file.write("check_conf() {\n")
script_file.write(" local key=$1\n")
script_file.write(" local expected_value=$2\n")
script_file.write(" local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)\n\n")
script_file.write(" if [ \"$actual_value\" == \"$expected_value\" ]; then\n")
script_file.write(" echo \"PASS: ['hadoop', 'getconf', '-confKey', '$key'] -> $actual_value\"\n")
script_file.write(" else\n")
script_file.write(" echo \"FAIL: ['hadoop', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)\"\n")
script_file.write(" fi\n")
script_file.write("}\n\n")

tree = ET.parse(mapred_site_xml_file_path)
root = tree.getroot()

for property in root.findall('property'):
name = property.find('name').text
value = property.find('value').text
script_file.write(f"check_conf {name} {value}\n")


# Generate the yarn-site verification script
with open('verify_yarn-site_conf.sh', 'w') as script_file:
script_file.write("#!/bin/bash\n\n")

script_file.write("check_conf() {\n")
script_file.write(" local key=$1\n")
script_file.write(" local expected_value=$2\n")
script_file.write(" local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)\n\n")
script_file.write(" if [ \"$actual_value\" == \"$expected_value\" ]; then\n")
script_file.write(" echo \"PASS: ['yarn', 'getconf', '-confKey', '$key'] -> $actual_value\"\n")
script_file.write(" else\n")
script_file.write(" echo \"FAIL: ['yarn', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)\"\n")
script_file.write(" fi\n")
script_file.write("}\n\n")

tree = ET.parse(yarn_site_xml_file_path)
root = tree.getroot()

for property in root.findall('property'):
name = property.find('name').text
value = property.find('value').text
script_file.write(f"check_conf {name} {value}\n")
76 changes: 76 additions & 0 deletions missions/W3/M2/verification/configuration_verify.sh
@@ -0,0 +1,76 @@
#!/bin/bash

# Takes exactly two arguments: HADOOP_HOME and CONTAINER_NAME. There is no third argument.
if [ $# -ne 2 ]; then
echo "Usage: $0 <HADOOP_HOME> <CONTAINER_NAME>"
exit 1
fi

# Copy the files in this directory into the <HADOOP_HOME>/verification folder of the <CONTAINER_NAME> container, creating the folder if it does not exist.
# Here <HADOOP_HOME> refers to the Hadoop installation directory inside the container.
HADOOP_HOME=$1
CONTAINER_NAME=$2


# Function: create a directory
create_directory() {
local dir=$1
echo "Creating directory $dir..."
docker exec -it --user hadoop $CONTAINER_NAME bash -c "
if [ ! -d $dir ]; then
mkdir -p $dir
fi"
if [ $? -ne 0 ]; then
echo "Failed to create directory $dir inside the container."
exit 1
fi
}

create_directory $HADOOP_HOME/verification

# Function: copy a file
copy_file() {
local src_file=$1
local dest_dir=$2
echo "Copying $src_file..."
docker cp $src_file $CONTAINER_NAME:$dest_dir/
if [ $? -ne 0 ]; then
echo "Failed to copy $src_file."
exit 1
fi
}

# src_file refers to the verify...conf.sh files in the current directory.
copy_file verify_core-site_conf.sh $HADOOP_HOME/verification
copy_file verify_hdfs-site_conf.sh $HADOOP_HOME/verification
copy_file verify_mapred-site_conf.sh $HADOOP_HOME/verification
copy_file verify_yarn-site_conf.sh $HADOOP_HOME/verification


# Change ownership and permissions so the copied .sh files can be executed inside the container.
docker exec -it --user hadoop $CONTAINER_NAME bash -c "
sudo chown hadoop:hadoop $HADOOP_HOME/verification/* &&
chmod +x $HADOOP_HOME/verification/verify_core-site_conf.sh &&
chmod +x $HADOOP_HOME/verification/verify_hdfs-site_conf.sh &&
chmod +x $HADOOP_HOME/verification/verify_mapred-site_conf.sh &&
chmod +x $HADOOP_HOME/verification/verify_yarn-site_conf.sh
"

if [ $? -ne 0 ]; then
echo "Failed to change permission of verification scripts inside the container."
exit 1
fi


# Run the copied .sh files inside the container.
docker exec -it $CONTAINER_NAME bash -c "
$HADOOP_HOME/verification/verify_core-site_conf.sh &&
$HADOOP_HOME/verification/verify_hdfs-site_conf.sh &&
$HADOOP_HOME/verification/verify_mapred-site_conf.sh &&
$HADOOP_HOME/verification/verify_yarn-site_conf.sh
"





17 changes: 17 additions & 0 deletions missions/W3/M2/verification/verify_core-site_conf.sh
@@ -0,0 +1,17 @@
#!/bin/bash

check_conf() {
local key=$1
local expected_value=$2
local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)

if [ "$actual_value" == "$expected_value" ]; then
echo "PASS: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value"
else
echo "FAIL: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)"
fi
}

check_conf fs.defaultFS hdfs://namenode:9000
check_conf hadoop.tmp.dir /hadoop/tmp
check_conf io.file.buffer.size 131072
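`check_conf` can be exercised without a running cluster by shadowing `hdfs` with a shell function; the stub below is purely illustrative and only answers for fs.defaultFS:

```shell
#!/bin/bash

# Stub: mimics `hdfs getconf -confKey <key>` for one key (illustrative only).
hdfs() {
    case "$3" in
        fs.defaultFS) echo "hdfs://namenode:9000" ;;
        *) echo "" ;;
    esac
}

check_conf() {
    local key=$1
    local expected_value=$2
    local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)

    if [ "$actual_value" == "$expected_value" ]; then
        echo "PASS: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value"
    else
        echo "FAIL: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)"
    fi
}

check_conf fs.defaultFS hdfs://namenode:9000     # matches the stub -> PASS
check_conf fs.defaultFS hdfs://namenode:9999     # mismatch -> FAIL
```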
17 changes: 17 additions & 0 deletions missions/W3/M2/verification/verify_hdfs-site_conf.sh
@@ -0,0 +1,17 @@
#!/bin/bash

check_conf() {
local key=$1
local expected_value=$2
local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)

if [ "$actual_value" == "$expected_value" ]; then
echo "PASS: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value"
else
echo "FAIL: ['hdfs', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)"
fi
}

check_conf dfs.replication 2
check_conf dfs.namenode.name.dir file:///hadoop/dfs/name
check_conf dfs.datanode.data.dir file:///hadoop/dfs/data
20 changes: 20 additions & 0 deletions missions/W3/M2/verification/verify_mapred-site_conf.sh
@@ -0,0 +1,20 @@
#!/bin/bash

check_conf() {
local key=$1
local expected_value=$2
local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)

if [ "$actual_value" == "$expected_value" ]; then
echo "PASS: ['hadoop', 'getconf', '-confKey', '$key'] -> $actual_value"
else
echo "FAIL: ['hadoop', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)"
fi
}

check_conf mapreduce.framework.name yarn
check_conf mapreduce.jobhistory.address namenode:10020
check_conf mapreduce.task.io.sort.mb 256
check_conf yarn.app.mapreduce.am.env HADOOP_MAPRED_HOME=/usr/local/hadoop
check_conf mapreduce.map.env HADOOP_MAPRED_HOME=/usr/local/hadoop
check_conf mapreduce.reduce.env HADOOP_MAPRED_HOME=/usr/local/hadoop
19 changes: 19 additions & 0 deletions missions/W3/M2/verification/verify_yarn-site_conf.sh
@@ -0,0 +1,19 @@
#!/bin/bash

check_conf() {
local key=$1
local expected_value=$2
local actual_value=$(hdfs getconf -confKey $key 2>/dev/null)

if [ "$actual_value" == "$expected_value" ]; then
echo "PASS: ['yarn', 'getconf', '-confKey', '$key'] -> $actual_value"
else
echo "FAIL: ['yarn', 'getconf', '-confKey', '$key'] -> $actual_value (expected $expected_value)"
fi
}

check_conf yarn.nodemanager.aux-services mapreduce_shuffle
check_conf yarn.resourcemanager.hostname resourcemanager
check_conf yarn.resourcemanager.address resourcemanager:8032
check_conf yarn.nodemanager.resource.memory-mb 8192
check_conf yarn.scheduler.minimum-allocation-mb 1024