This is a tutorial on how to run SQLFlow on Kubernetes, and this tutorial will deploy:
- A MySQL server instance with some example data loaded,
- The SQLFlow gRPC server, and
- The Jupyter Notebook server with SQLFlow magic command installed.
- The JupyterHub which can serve multiple Notebook server for various users.
There are two sections in this tutorial:
- Deploy the All-in-One SQLFlow deployed the SQLFlow on Kubernetes quickly.
- Deploy the SQLFlow Hub deployed an SQLFlow cluster and a JupyterHub server which can serve Notebook server instances for users.
- Setup a Kubernetes cluster: You can refer to the official page to set up a full cluster or use a local quick start tool: minikube This tutorial would use [minikube] to demonstrate the SQLFlow.
- Install kubectl, which is the command line tool to interact with the Kubernetes cluster.
- Make sure the Kubernetes nodes can pull the official SQLFlow Docker image [sqlflow/sqlflow:latest] or your custom Docker image.
-
Deploy the SQLFlow Pod on Kubernetes
> kubectl create -f k8s/sqlflow-mysql.yaml
The above command deploys a Pod, a MySQL server instance, a SQLFlow gRPC server and the Jupyter Notebook server runs in this Pod. You can also use your custom Docker image by editting the
image
field of the yaml file: k8s/sqlflow-all-in-one.yamlspec: ... containers: - image: <your repo name>/sqlflow:latest
-
Testing your SQLFlow setup You can find a Pod on Kubernetes which name is
sqlflow-all-in-one-<POD-ID>
:> kubectl get pods NAME READY STATUS RESTARTS AGE NAME READY STATUS RESTARTS AGE sqlflow-all-in-one-9b57566c9-8xkpk 1/1 Running 0 60s
-
Copy the node IP of the sqlflow Pod on minikube as the follows command:
> minikube ip 192.168.99.100
NOTE: If you are using a real cluster, you can find the node domain/IP from the
NODE
column usingkubectl get pods -o wide
:> kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES sqlflow-all-in-one-9b57566c9-8xkpk 1/1 Running 0 24s 172.17.0.9 minikube <none> <none>
-
Open a web browser and go to ':8888', you can find the SQLFlow example in the Jupyter notebook file lists.
This section will deploys SQLFlow Hub using JupyterHub to serve Jupyter notebook for multiple users, and easy to scale up/down the SQLFlow gRPC server according to workload.
-
Build the SQLFlow Hub Docker image and push to a registry server that the Kubernetes Note can access it.
$ cd k8s/sqlflowhub $ docker build -t <your-repo>/sqlflowhub . $ docker push <your-repo>/sqlflowhub
-
Deploy the MySQL, SQLFlow gRPC server and JupyterHub step by step:
kubectl create -f k8s/sqlflow-mysql.yaml kubectl create -f k8s/sqlflow-server.yaml kubectl create -f k8s/sqlflow-jhub.yaml
NOTE: Should grant all the remote hosts can access to the MySQL server if you want to use the custorm MySQL Docker image, the grant command like:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'' IDENTIFIED BY 'root' WITH GRANT OPTION;
-
Check the SQLFlow Pods, you can find:
- A MySQL Pod named
sqlflow-mysql-*
. - A JupyterHub Pod named
sqlflow-jhub-*
. - 3 SQLFlow gRPC server Pods named
sqlflow-server-*
, and it's easy to scale up/down the replica count by modifying thereplicas
field of the yaml file: k8s/sqlflow-server.yaml.
$ kubectl get pods NAME READY STATUS RESTARTS AGE sqlflow-jhub-78f96dcf88-sbvt6 1/1 Running 0 4m13s sqlflow-mysql-55db79fd98-nhfjp 1/1 Running 0 4h8m sqlflow-server-7444b4466d-frbcn 1/1 Running 0 4h sqlflow-server-7444b4466d-h5w9c 1/1 Running 0 4h sqlflow-server-7444b4466d-kndwx 1/1 Running 0 4h
- A MySQL Pod named
-
Check the SQLFlow Service, so the Notebook server can connect them across their ClusterIP and Port:
- A MySQL Service named
sqlflow-mysql
, and - An SQLFlow server Service named
sqlflow-server
$ kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 29d sqlflow-mysql ClusterIP 10.102.193.217 <none> 3306/TCP 6h4m sqlflow-server ClusterIP 10.102.65.39 <none> 50051/TCP 5h56m
- A MySQL Service named
JupyterHub using the PAMAuthenticator as the default authenticate method. the PAM can authenticate the system users with their username and password. You can find more information on authenticators-users-basics, and other authenticator methods from here
Next, please do as the following steps to create a user on the system and login on the Jupyterhub:
-
List the Pods and execute into the
sqlflow-jhub
Pod$ kubectl get po NAME READY STATUS RESTARTS AGE sqlflow-jhub-78f96dcf88-gp8dg 1/1 Running 0 26m sqlflow-mysql-55db79fd98-nhfjp 1/1 Running 0 51m ... $ kubectl exec -it sqlflow-jhub-78f96dcf88-gp8dg bash
-
Create a user and set a password by the
adduser
command:$ adduser sqlflow -q --gecos "" --home /home/sqlflow Enter new UNIX password: Retype new UNIX password: passwd: password updated successfully
-
Open your browser and go to
<node-ip>:8000
and log in by the username/password as the above step. If you passed the authenticator, the JupyterHub would launch a Notebook server for your account, and then you can run your SQLFlow query in it.