add Dockerfiles for multi-pod Spark #21

Open. Wants to merge 33 commits into base: master.

Changes from all commits (33 commits):
ada9910  add Dockerfiles for spark (thanh-nguyen-dang, May 30, 2024)
3f6eb37  add use quay only is true (thanh-nguyen-dang, May 30, 2024)
9f0891a  add github action to build all images (thanh-nguyen-dang, May 30, 2024)
a2a51d3  add github action to build all images (thanh-nguyen-dang, May 30, 2024)
db1bbbc  add run_config.py (thanh-nguyen-dang, Jun 3, 2024)
6be90bc  add HADOOP_HOME (thanh-nguyen-dang, Jun 4, 2024)
34dfb19  update base version (thanh-nguyen-dang, Jun 4, 2024)
43af8fe  fix environment variable (thanh-nguyen-dang, Jun 4, 2024)
888784a  remove run_config.py (thanh-nguyen-dang, Jun 4, 2024)
921fe16  add some packages (thanh-nguyen-dang, Jun 5, 2024)
6b0ac45  fix ports (thanh-nguyen-dang, Jun 6, 2024)
8ae256c  expose port 8020 (thanh-nguyen-dang, Jun 10, 2024)
3d99b5e  fix port expose (thanh-nguyen-dang, Jun 10, 2024)
d2921d6  use hadoop image as based image for spark (thanh-nguyen-dang, Jun 10, 2024)
4466eb7  fix wrong tag (thanh-nguyen-dang, Jun 11, 2024)
446493b  fix spark-master to use base from quay.io (thanh-nguyen-dang, Jun 11, 2024)
695c55d  turn log to DEBUG (thanh-nguyen-dang, Jun 11, 2024)
bf55441  fix filename (thanh-nguyen-dang, Jun 11, 2024)
f261ee3  fix run for other dockers (thanh-nguyen-dang, Jun 21, 2024)
0d0e596  Merge branch 'feat/multi-pod-hadoop-spark' of github.com:uc-cdis/gen3… (thanh-nguyen-dang, Jun 21, 2024)
f0fc4a5  add ES HADOOP file create apps folder (thanh-nguyen-dang, Jun 30, 2024)
aa4b762  add SQOOP_VERSION (thanh-nguyen-dang, Jun 30, 2024)
2a7139b  install poetry in base image (thanh-nguyen-dang, Jul 1, 2024)
7137c96  add LD_LIBRARY_PATH (thanh-nguyen-dang, Jul 1, 2024)
c1b3cc0  update hadoop base image (thanh-nguyen-dang, Jul 10, 2024)
2bf8908  remove worker and master build (thanh-nguyen-dang, Jul 10, 2024)
7ddc1d7  add log4j properties files to base image (thanh-nguyen-dang, Jul 15, 2024)
9cf4794  add log4j properties files (thanh-nguyen-dang, Jul 15, 2024)
a2aa73c  update base images (thanh-nguyen-dang, Jul 15, 2024)
0caabb9  change log level to INFO (thanh-nguyen-dang, Jul 15, 2024)
ef825c6  add safemode leave (thanh-nguyen-dang, Aug 26, 2024)
67479b7  update resource manager starting script (thanh-nguyen-dang, Aug 26, 2024)
fc272d1  fix resource manager docker waiting for namenode (thanh-nguyen-dang, Aug 26, 2024)
122 changes: 120 additions & 2 deletions .github/workflows/image_build_push.yaml
@@ -3,8 +3,126 @@ name: Build Image and Push to Quay
 on: push
 
 jobs:
-  ci:
-    name: Build Image and Push to Quay
+  build-hadoop-base:
+    name: Build Hadoop base image
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    with:
+      OVERRIDE_REPO_NAME: hadoop-base
+      OVERRIDE_TAG_NAME: v3.3.0
+      DOCKERFILE_LOCATION: "./hadoop/base/Dockerfile"
+      DOCKERFILE_BUILD_CONTEXT: "./hadoop/base"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-spark-base:
+    name: Build Spark base image
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    needs: [build-hadoop-base]
+    with:
+      OVERRIDE_REPO_NAME: spark-base
+      OVERRIDE_TAG_NAME: 3.3.0-hadoop3.3
+      DOCKERFILE_LOCATION: "./spark/base/Dockerfile"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-namenode:
+    name: namenode
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    needs: [build-hadoop-base]
+    with:
+      OVERRIDE_REPO_NAME: namenode
+      OVERRIDE_TAG_NAME: v3.3.0
+      DOCKERFILE_LOCATION: "./hadoop/namenode/Dockerfile"
+      DOCKERFILE_BUILD_CONTEXT: "./hadoop/namenode"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-datanode:
+    name: datanode
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    needs: [build-hadoop-base]
+    with:
+      OVERRIDE_REPO_NAME: datanode
+      OVERRIDE_TAG_NAME: v3.3.0
+      DOCKERFILE_LOCATION: "./hadoop/datanode/Dockerfile"
+      DOCKERFILE_BUILD_CONTEXT: "./hadoop/datanode"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-nodemanager:
+    name: nodemanager
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    needs: [build-hadoop-base]
+    with:
+      OVERRIDE_REPO_NAME: nodemanager
+      OVERRIDE_TAG_NAME: v3.3.0
+      DOCKERFILE_LOCATION: "./hadoop/nodemanager/Dockerfile"
+      DOCKERFILE_BUILD_CONTEXT: "./hadoop/nodemanager"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-resourcemanager:
+    name: resourcemanager
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    needs: [build-hadoop-base]
+    with:
+      OVERRIDE_REPO_NAME: resourcemanager
+      OVERRIDE_TAG_NAME: v3.3.0
+      DOCKERFILE_LOCATION: "./hadoop/resourcemanager/Dockerfile"
+      DOCKERFILE_BUILD_CONTEXT: "./hadoop/resourcemanager"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-historyserver:
+    name: historyserver
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    needs: [build-hadoop-base]
+    with:
+      OVERRIDE_REPO_NAME: historyserver
+      OVERRIDE_TAG_NAME: v3.3.0
+      DOCKERFILE_LOCATION: "./hadoop/historyserver/Dockerfile"
+      DOCKERFILE_BUILD_CONTEXT: "./hadoop/historyserver"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-submit:
+    name: spark submit
+    uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
+    needs: [build-spark-base]
+    with:
+      OVERRIDE_REPO_NAME: spark-submit
+      OVERRIDE_TAG_NAME: 3.3.0-hadoop3.3
+      DOCKERFILE_LOCATION: "./spark/submit/Dockerfile"
+      DOCKERFILE_BUILD_CONTEXT: "./spark/submit"
+      USE_QUAY_ONLY: true
+    secrets:
+      ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
+      ECR_AWS_SECRET_ACCESS_KEY: ${{ secrets.ECR_AWS_SECRET_ACCESS_KEY }}
+      QUAY_USERNAME: ${{ secrets.QUAY_USERNAME }}
+      QUAY_ROBOT_TOKEN: ${{ secrets.QUAY_ROBOT_TOKEN }}
+  build-gen3-spark:
+    name: Build Gen3 spark single node
     uses: uc-cdis/.github/.github/workflows/image_build_push.yaml@master
     secrets:
       ECR_AWS_ACCESS_KEY_ID: ${{ secrets.ECR_AWS_ACCESS_KEY_ID }}
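All of the new jobs call the same reusable uc-cdis workflow and fan out from the two base images: hadoop-base first, then spark-base and the Hadoop daemon images that need it. For a rough local equivalent of what CI produces, a sketch that only assumes the OVERRIDE_REPO_NAME/OVERRIDE_TAG_NAME inputs map to quay.io/cdis/<repo>:<tag> (which matches the FROM lines in the Dockerfiles below):

# Build the base image first, then a dependent image (local sketch only;
# the shared CI workflow may push additional tags or registries):
docker build -t quay.io/cdis/hadoop-base:v3.3.0 -f ./hadoop/base/Dockerfile ./hadoop/base
# build-spark-base sets no DOCKERFILE_BUILD_CONTEXT; the repo root is assumed here:
docker build -t quay.io/cdis/spark-base:3.3.0-hadoop3.3 -f ./spark/base/Dockerfile .
docker build -t quay.io/cdis/namenode:v3.3.0 -f ./hadoop/namenode/Dockerfile ./hadoop/namenode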
3 changes: 3 additions & 0 deletions Dockerfile
@@ -94,6 +94,9 @@ EXPOSE 22 4040 7077 8020 8030 8031 8032 8042 8088 9000 10020 19888 50010 50020 5
 
 RUN mkdir -p /var/run/sshd ${HADOOP_HOME}/hdfs ${HADOOP_HOME}/hdfs/data ${HADOOP_HOME}/hdfs/data/dfs ${HADOOP_HOME}/hdfs/data/dfs/namenode ${HADOOP_HOME}/logs
 
+COPY spark/base/confs/log4j.properties /spark/conf/log4j.properties
+COPY spark/base/confs/log4j2.properties /spark/conf/log4j2.properties
+
 COPY . /gen3spark
 WORKDIR /gen3spark
66 changes: 66 additions & 0 deletions hadoop/base/Dockerfile
@@ -0,0 +1,66 @@
# To check running container: docker exec -it tube /bin/bash
FROM quay.io/cdis/python:python3.9-buster-stable

ENV DEBIAN_FRONTEND=noninteractive \
    HADOOP_VERSION="3.3.2"

ENV HADOOP_INSTALLATION_URL="http://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz" \
    HADOOP_HOME="/hadoop" \
    JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64/"

RUN mkdir -p /usr/share/man/man1
RUN mkdir -p /usr/share/man/man7

RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    libpq-dev \
    build-essential \
    libssl1.1 \
    libgnutls30 \
    ca-certificates-java \
    openjdk-11-jdk \
    openssh-server \
    # dependency for psycopg2, which is a dependency for the sqlalchemy postgres engine
    libpq-dev \
    wget \
    git \
    # dependency for cryptography
    libffi-dev \
    # dependency for cryptography
    libssl-dev \
    vim \
    net-tools \
    netcat \
    gnupg \
    dnsutils \
    curl \
    g++ \
    telnetd \
    && rm -rf /var/lib/apt/lists/*

RUN wget ${HADOOP_INSTALLATION_URL} \
    && ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2 \
    && mkdir -p $HADOOP_HOME \
    && tar -xvf hadoop-${HADOOP_VERSION}.tar.gz -C ${HADOOP_HOME} --strip-components 1 \
    && rm hadoop-${HADOOP_VERSION}.tar.gz \
    && rm -rf $HADOOP_HOME/share/doc

ENV HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop \
    HADOOP_MAPRED_HOME=$HADOOP_HOME \
    HADOOP_COMMON_HOME=$HADOOP_HOME \
    HADOOP_HDFS_HOME=$HADOOP_HOME \
    YARN_HOME=$HADOOP_HOME \
    HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native \
    LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH

RUN apt-get --only-upgrade install libpq-dev

ENV PATH="${PATH}:${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin:${JAVA_HOME}/bin:${SCALA_HOME}/bin"

(GitHub Actions annotations from the "Build Hadoop base image / Build Image and Push" job flag lines 48 and 58 of this file: "Variables should be defined before their use", UndefinedVar for '$LD_LIBRARY_PATH' and '$SCALA_HOME', both referenced before they are defined in this Dockerfile. More info: https://docs.docker.com/go/dockerfile/rule/undefined-var/)

ADD entrypoint.sh /entrypoint.sh

RUN chmod a+x /entrypoint.sh

EXPOSE 22 4040 7077 8020 8030 8031 8032 8042 8088 9000

ENTRYPOINT ["/entrypoint.sh"]
119 changes: 119 additions & 0 deletions hadoop/base/entrypoint.sh
@@ -0,0 +1,119 @@
#!/bin/bash

# Set some sensible defaults
export CORE_CONF_fs_defaultFS=${CORE_CONF_fs_defaultFS:-hdfs://`hostname -f`:8020}

function addProperty() {
  local path=$1
  local name=$2
  local value=$3

  local entry="<property><name>$name</name><value>${value}</value></property>"
  local escapedEntry=$(echo $entry | sed 's/\//\\\//g')
  sed -i "/<\/configuration>/ s/.*/${escapedEntry}\n&/" $path
}

function configure() {
  local path=$1
  local module=$2
  local envPrefix=$3

  local var
  local value

  echo "Configuring $module"
  for c in `printenv | perl -sne 'print "$1 " if m/^${envPrefix}_(.+?)=.*/' -- -envPrefix=$envPrefix`; do
    name=`echo ${c} | perl -pe 's/___/-/g; s/__/@/g; s/_/./g; s/@/_/g;'`
    var="${envPrefix}_${c}"
    value=${!var}
    echo " - Setting $name=$value"
    addProperty $path $name "$value"
  done
}

configure /hadoop/etc/hadoop/core-site.xml core CORE_CONF
configure /hadoop/etc/hadoop/hdfs-site.xml hdfs HDFS_CONF
configure /hadoop/etc/hadoop/yarn-site.xml yarn YARN_CONF
configure /hadoop/etc/hadoop/httpfs-site.xml httpfs HTTPFS_CONF
configure /hadoop/etc/hadoop/kms-site.xml kms KMS_CONF
configure /hadoop/etc/hadoop/mapred-site.xml mapred MAPRED_CONF

if [ "$MULTIHOMED_NETWORK" = "1" ]; then
echo "Configuring for multihomed network"

# HDFS
addProperty /hadoop/etc/hadoop/hdfs-site.xml dfs.namenode.rpc-bind-host 0.0.0.0
addProperty /hadoop/etc/hadoop/hdfs-site.xml dfs.namenode.servicerpc-bind-host 0.0.0.0
addProperty /hadoop/etc/hadoop/hdfs-site.xml dfs.namenode.http-bind-host 0.0.0.0
addProperty /hadoop/etc/hadoop/hdfs-site.xml dfs.namenode.https-bind-host 0.0.0.0
addProperty /hadoop/etc/hadoop/hdfs-site.xml dfs.client.use.datanode.hostname true
addProperty /hadoop/etc/hadoop/hdfs-site.xml dfs.datanode.use.datanode.hostname true

# YARN
addProperty /etc/hadoop/yarn-site.xml yarn.resourcemanager.bind-host 0.0.0.0
addProperty /etc/hadoop/yarn-site.xml yarn.nodemanager.bind-host 0.0.0.0
addProperty /etc/hadoop/yarn-site.xml yarn.timeline-service.bind-host 0.0.0.0

# MAPRED
addProperty /etc/hadoop/mapred-site.xml yarn.nodemanager.bind-host 0.0.0.0
addProperty /etc/hadoop/mapred-site.xml yarn.resourcemanager.scheduler.address 0.0.0.0
addProperty /etc/hadoop/mapred-site.xml yarn.resourcemanager.resource-tracker.address 0.0.0.0
addProperty /etc/hadoop/mapred-site.xml yarn.resourcemanager.address 0.0.0.0
fi

if [ -n "$GANGLIA_HOST" ]; then
  mv /hadoop/etc/hadoop/hadoop-metrics.properties /hadoop/etc/hadoop/hadoop-metrics.properties.orig
  mv /hadoop/etc/hadoop/hadoop-metrics2.properties /hadoop/etc/hadoop/hadoop-metrics2.properties.orig

  for module in mapred jvm rpc ugi; do
    echo "$module.class=org.apache.hadoop.metrics.ganglia.GangliaContext31"
    echo "$module.period=10"
    echo "$module.servers=$GANGLIA_HOST:8649"
  done > /hadoop/etc/hadoop/hadoop-metrics.properties

  for module in namenode datanode resourcemanager nodemanager mrappmaster jobhistoryserver; do
    echo "$module.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31"
    echo "$module.sink.ganglia.period=10"
    echo "$module.sink.ganglia.supportsparse=true"
    echo "$module.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both"
    echo "$module.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40"
    echo "$module.sink.ganglia.servers=$GANGLIA_HOST:8649"
  done > /hadoop/etc/hadoop/hadoop-metrics2.properties
fi

function wait_for_it()
{
  local serviceport=$1
  local service=${serviceport%%:*}
  local port=${serviceport#*:}
  local retry_seconds=5
  local max_try=100
  let i=1

  nc -z $service $port
  result=$?

  until [ $result -eq 0 ]; do
    echo "[$i/$max_try] check for ${service}:${port}..."
    echo "[$i/$max_try] ${service}:${port} is not available yet"
    if (( $i == $max_try )); then
      echo "[$i/$max_try] ${service}:${port} is still not available; giving up after ${max_try} tries. :/"
      exit 1
    fi

    echo "[$i/$max_try] try in ${retry_seconds}s once again ..."
    let "i++"
    sleep $retry_seconds

    nc -z $service $port
    result=$?
  done
  echo "[$i/$max_try] $service:${port} is available."
}

# SERVICE_PRECONDITION is a plain (space-separated) env string, so the
# unquoted expansion below is intentional: it splits into one host:port per item.
for i in ${SERVICE_PRECONDITION[@]}
do
  wait_for_it ${i}
done

exec "$@"
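The name mangling in configure() is the one non-obvious piece of the entrypoint: triple underscores become hyphens, double underscores become literal underscores, and single underscores become dots. A brief illustration; the values below are examples, not settings shipped in this PR:

# Env var -> property appended just before </configuration> under /hadoop/etc/hadoop/:
export CORE_CONF_fs_defaultFS="hdfs://namenode:8020"
#   -> core-site.xml: <property><name>fs.defaultFS</name><value>hdfs://namenode:8020</value></property>
export YARN_CONF_yarn_log___aggregation___enable=true
#   -> yarn-site.xml: yarn.log-aggregation-enable=true ('___' maps to '-')

# SERVICE_PRECONDITION is a space-separated list of host:port pairs that
# wait_for_it polls with nc before the container command is exec'd:
export SERVICE_PRECONDITION="namenode:9870 namenode:8020"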
14 changes: 14 additions & 0 deletions hadoop/datanode/Dockerfile
@@ -0,0 +1,14 @@
FROM quay.io/cdis/hadoop-base:v3.3.0

HEALTHCHECK CMD curl -f http://localhost:9864/ || exit 1

ENV HDFS_CONF_dfs_datanode_data_dir=file:///hadoop/dfs/data
RUN mkdir -p /hadoop/dfs/data
VOLUME /hadoop/dfs/data

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 9864 50010

CMD ["/run.sh"]
9 changes: 9 additions & 0 deletions hadoop/datanode/run.sh
@@ -0,0 +1,9 @@
#!/bin/bash

datadir=`echo $HDFS_CONF_dfs_datanode_data_dir | perl -pe 's#file://##'`
if [ ! -d $datadir ]; then
  echo "Datanode data directory not found: $datadir"
  exit 2
fi

$HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR datanode
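Because the datanode image inherits the base entrypoint, SERVICE_PRECONDITION can hold it back until the namenode is reachable. A minimal two-container smoke test, assuming the images built above; the container names, network name, and port choice are illustrative, not prescribed by this PR:

docker network create hadoop-test
docker run -d --name namenode --network hadoop-test \
  -e CORE_CONF_fs_defaultFS=hdfs://namenode:8020 \
  quay.io/cdis/namenode:v3.3.0
# The datanode waits in wait_for_it until the namenode web UI answers on 9870:
docker run -d --name datanode --network hadoop-test \
  -e CORE_CONF_fs_defaultFS=hdfs://namenode:8020 \
  -e SERVICE_PRECONDITION="namenode:9870" \
  quay.io/cdis/datanode:v3.3.0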
14 changes: 14 additions & 0 deletions hadoop/historyserver/Dockerfile
@@ -0,0 +1,14 @@
FROM quay.io/cdis/hadoop-base:v3.3.0

HEALTHCHECK CMD curl -f http://localhost:8188/ || exit 1

ENV YARN_CONF_yarn_timeline___service_leveldb___timeline___store_path=/hadoop/yarn/timeline
RUN mkdir -p /hadoop/yarn/timeline
VOLUME /hadoop/yarn/timeline

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 8188

CMD ["/run.sh"]
2 changes: 2 additions & 0 deletions hadoop/historyserver/run.sh
@@ -0,0 +1,2 @@
#!/bin/bash
$HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR historyserver
14 changes: 14 additions & 0 deletions hadoop/namenode/Dockerfile
@@ -0,0 +1,14 @@
FROM quay.io/cdis/hadoop-base:v3.3.0

HEALTHCHECK CMD curl -f http://localhost:9870/ || exit 1

ENV HDFS_CONF_dfs_namenode_name_dir=file:///hadoop/dfs/name
RUN mkdir -p /hadoop/dfs/name
VOLUME /hadoop/dfs/name

ADD run.sh /run.sh
RUN chmod a+x /run.sh

EXPOSE 9870 9000

CMD ["/run.sh"]