This image makes Datahub (and potentially other similarly-configured k8s clusters) compatible with https://github.com/radiant-rstats/docker/blob/master/rsm-msba-intel-jupyterhub/.
Note that it may be helpful to launch this image with this startup script (private repo) if you are on Datahub.
As little as possible has changed; ideally, functionality stays very close to that of rsm-msba-intel-jupyterhub.
However, here is a brief summary of what has changed:
- Certain directories, such as /home/jovyan, are accessible globally and by any user. This is important for Datahub, as users are only permitted to spawn k8s pods with their own UID.
- Postgresql is not launched by root. Because nothing can be run as UID 0 on Datahub (same reason as above), we simply run it as the local user instead. See further instructions below.
- "Optional" directories (such as dotnet) have been deleted in order to fit within the Github runner's default storage quota. (See the action in .github for more details.)
- The launch script above ensures 100% functionality if on Datahub.
If you would like to make changes to core rsm-msba functionality OR have a bug to report that is not caused by our unique environment, please head to the above repo.
For DSMLP, here is an SSH configuration which will permit you to connect to this container with the VS Code Server:
Host rsm-msba
    ProxyCommand ssh -i ~/.ssh/<your_key_here> <username>@dsmlp-login.ucsd.edu /opt/launch-sh/bin/launch-rsm-msba.sh -N vscode-dsmlp -H -j
    User <username>
    Port 22
    IdentityFile ~/.ssh/<your_key_here>
-H spawns an sshd session inside the container, and -j ensures that the Jupyter notebook server is started.
Additionally, please ensure you have the ProxyCommand keyword; otherwise, you may spawn the VS Code server on dsmlp-login rather than in the container itself.
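If you would rather generate the stanza than edit it by hand, a minimal sketch follows. It only reproduces the SSH configuration shown above; the function name render_ssh_config and the host_alias parameter are hypothetical and not part of this repo.

```python
def render_ssh_config(username: str, key_path: str, host_alias: str = "rsm-msba") -> str:
    """Return an SSH config stanza mirroring the one in this README.

    `username` and `key_path` stand in for <username> and
    ~/.ssh/<your_key_here>; `host_alias` is an illustrative parameter.
    """
    # The ProxyCommand is what routes the connection into the container
    # instead of stopping at dsmlp-login.
    proxy_command = (
        f"ssh -i {key_path} {username}@dsmlp-login.ucsd.edu "
        "/opt/launch-sh/bin/launch-rsm-msba.sh -N vscode-dsmlp -H -j"
    )
    lines = [
        f"Host {host_alias}",
        f"    ProxyCommand {proxy_command}",
        f"    User {username}",
        "    Port 22",
        f"    IdentityFile {key_path}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the stanza so it can be appended to ~/.ssh/config.
    print(render_ssh_config("<username>", "~/.ssh/<your_key_here>"))
```

The printed text can be appended to ~/.ssh/config after replacing the placeholders with your own values.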
Because we only have access to one user while live in the container, you can use the provided script start_single_user_postgres.sh to set up postgresql without root access.
You can either download and SCP it to the container manually, or run the following command whilst in the container:
wget -qO- https://github.com/ucsd-ets/rsm-msba-datahub/raw/master/start_single_user_postgres.sh | bash
Once this has run, you can open Jupyter and go to the pgweb application.
This is where we differ from these installation instructions.
Under the Scheme tab in pgweb, use the following URL:
postgresql://<YOUR_USERNAME>:[email protected]:8765/rsm-docker?sslmode=disable
Note that you can use this line in Jupyter Notebooks as well:
from sqlalchemy import create_engine, inspect
engine = create_engine('postgresql://<YOUR_USERNAME>:[email protected]:8765/rsm-docker?sslmode=disable')
## show list of tables
inspector = inspect(engine)
inspector.get_table_names()
SSL is currently disabled because the permission requirements for the key/cert are a bit quirky. This may change soon, but it is acceptable for our purposes (non-production, Kubernetes pod).
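If you prefer to assemble the connection URL from parts rather than editing the string by hand, a minimal sketch might look like this. The function name build_pg_url and its password and host parameters are hypothetical; take their values from the URL above. Only the port (8765), the database name (rsm-docker), and sslmode=disable come from this README.

```python
from urllib.parse import quote, urlencode


def build_pg_url(username: str, password: str, host: str,
                 port: int = 8765, database: str = "rsm-docker") -> str:
    """Assemble a postgresql:// URL with SSL disabled.

    `password` and `host` are illustrative parameters; supply the values
    used in the URL shown in this README.
    """
    # Percent-encode credentials so characters such as "@" or ":" cannot
    # be mistaken for URL delimiters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    query = urlencode({"sslmode": "disable"})
    return f"postgresql://{user}:{secret}@{host}:{port}/{database}?{query}"
```

The returned string can be passed to create_engine in place of the literal URL.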
To START your local postgres server: pg_ctl -D /home/<YOUR_USERNAME>/pgdata -l /home/<YOUR_USERNAME>/logfile start
To STOP your local postgres server: pg_ctl -D /home/<YOUR_USERNAME>/pgdata -l /home/<YOUR_USERNAME>/logfile stop
You can also simply run the initial script again to toggle between shutdown/active.
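The start/stop commands above can also be wrapped in a small helper. This is only a sketch of one way to toggle the server, not how start_single_user_postgres.sh is implemented; the function names are hypothetical. It relies on pg_ctl status exiting with 0 when the server is running and 3 when it is not.

```python
import subprocess
from pathlib import PurePosixPath


def pg_ctl_command(username: str, action: str) -> list[str]:
    """Build the pg_ctl start/stop command shown in this README."""
    if action not in {"start", "stop"}:
        raise ValueError(f"unsupported action: {action!r}")
    home = PurePosixPath("/home") / username
    return ["pg_ctl", "-D", str(home / "pgdata"),
            "-l", str(home / "logfile"), action]


def toggle_action(status_returncode: int) -> str:
    """Map a `pg_ctl status` exit code to the action that flips the state."""
    if status_returncode == 0:   # server is running
        return "stop"
    if status_returncode == 3:   # server is not running
        return "start"
    # Any other code (e.g. 4: no accessible data directory) is an error.
    raise RuntimeError(f"unexpected pg_ctl status code: {status_returncode}")


def toggle_postgres(username: str) -> str:
    """Start the server if it is stopped, stop it if it is running."""
    data_dir = str(PurePosixPath("/home") / username / "pgdata")
    status = subprocess.run(["pg_ctl", "-D", data_dir, "status"],
                            capture_output=True)
    action = toggle_action(status.returncode)
    subprocess.run(pg_ctl_command(username, action), check=True)
    return action
```

Calling toggle_postgres with your username inside the container would then start or stop the server depending on its current state.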
Because we don't have root access and could potentially leave our container in a bad state, please only install apt or pip packages using the Dockerfile in this repo.
(This means forking this repository, updating the Dockerfile, and then modifying the launch script above to use your image instead, e.g. ghcr.io/your-name-here/rsm-msba-datahub:master.)
Example for apt:
USER root
RUN apt-get update && apt-get install -y <package>
Example for pip:
RUN pip install --no-cache-dir <package>
If you are on DSMLP and would like to get your Jupyter server link in a VS Code server environment (or if you forgot it), run the following command:
wget -qO- https://github.com/ucsd-ets/rsm-msba-datahub/raw/master/dsmlp_check_jupyter_url.sh | bash