Merge Request · Bug Report · Feature Request
MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.
SharingHub is an AI-focused web portal designed to help you discover, navigate, and analyze your AI-related Git projects hosted on GitLab.
This repository hosts a MLflow "app" plugin that integrates it with SharingHub and GitLab permission system. The plugin also isolates the experiments from each others per GitLab project.
MLflow SharingHub can be configured with a GitLab instance, or a SharingHub instance.
First, create a file named .env
and edit the content:
PROJECT_CACHE_TIMEOUT=30
LOGIN_AUTO_REDIRECT=false
GITLAB_URL=https://gitlab.example.com
GITLAB_OAUTH_CLIENT_ID=<client-id>
GITLAB_OAUTH_CLIENT_SECRET=<client-secret>
The client-id
and client-secret
can be created in your GitLab User settings (Preferences).
You must create an "Application" with the scopes api read_user openid profile email
.
The callback URL is http://localhost:5000/auth/authorize
.
The configuration may vary depending on your instance config. First, create a file named .env
and edit the content.
If you have a default token configured:
PROJECT_CACHE_TIMEOUT=30
LOGIN_AUTO_REDIRECT=false
SHARINGHUB_URL=http://sharinghub.example.com
SHARINGHUB_AUTH_DEFAULT_TOKEN=true
else:
PROJECT_CACHE_TIMEOUT=30
LOGIN_AUTO_REDIRECT=false
SHARINGHUB_URL=http://sharinghub.example.com
With this integration MLflow SharingHub will use the session cookie of SharingHub to interact with the SharingHub Server.
Being an MLflow plugin, in order to use it you'll have to install this project first. It is recommended to use a virtualenv.
pip install .
# OR
make install
Now you can run the mlflow server, you just need to add the parameter --app-name sharinghub
to enable the plugin.
Example:
mlflow server --app-name sharinghub
And to enable hot-reload, add the --dev
.
mlflow server --app-name sharinghub --dev
Note: the make targets
run
andrun-dev
should be preferred as they add more arguments.
Build the image:
docker build . -t mlflow-sharinghub:latest --build-arg VERSION=$(git rev-parse --short HEAD)
# OR
make docker-build
docker run --rm -v $(pwd)/data:/home/mlflow/data -p 5000:5000 --env-file .env --name mlflow-sharinghub mlflow-sharinghub:latest
# OR
make docker-run
First, create a values file, like ./deploy/helm/values.<platform>.yaml
. It will serve as your deployment values file.
Note: values named like
<var>
are "variables", expected to be filled by your real values.
This project is delivered as a docker image, you will need to publish it to a docker registry in order to deploy the service.
You can build the image with the following command:
docker build . -t <docker-registry>/mlflow-sharinghub:latest
After building the image you can push it to your registry:
docker push <docker-registry>/mlflow-sharinghub:latest
If you don't already have one in the namespace, create a robot account in the docker registry to access the image.
kubectl create namespace sharinghub
kubectl create secret docker-registry regcred --docker-username='<robot-username>' --docker-password='<robot-password>' --docker-server='<docker-registry>' --namespace sharinghub
The server needs a secret key for security purposes, create the secret:
kubectl create secret generic mlflow-sharinghub --from-literal secret-key="<random-secret-key>" --namespace sharinghub
As detailed in Configuration, MLflow SharingHub can be configured to use either SharingHub or GitLab for permission management of projects. Follow the instructions for only one.
Configure your deployment values:
gitlabUrl: https://<gitlab-domain>
gitlabMandatoryTopics: "sharinghub:aimodel" # from sharinghub configuration
You will need to create an application in your Gitlab instance in order to use MLflow SharingHub integration of GitLab.
Configure an application in the GitLab instance for OpenID connect authentication:
Callback URLs example:
http://localhost:5000/auth/authorize
https://mlflow.<domain-name>/auth/authorize
Note: localhost URL is for development purposes, if you don't want it you can remove it.
You must then create the secret containing the OIDC secrets.
kubectl create secret generic mlflow-sharinghub-gitlab --from-literal client-id="<gitlab-app-client-id>" --from-literal client-secret="<gitlab-app-client-secret>" --namespace sharinghub
Configure your deployment values:
sharinghubUrl: https://sharinghub.<domain-name>
sharinghubStacCollection: "ai-model"
sharinghubAuthDefaultToken: true
Take note that it is important for your SharingHub instance to write its session cookie on <domain-name>
.
The MLflow server is rather flexible in the location were its data and artifacts are stored, and each can be configured.
By default, our docker image uses an sqlite database (a single file) for the data, located at /home/mlflow/data/mlflow.db
.
You can alternatively choose PostgreSQL as a database.
First, create the secret that will contain PostgreSQL passwords:
kubectl create secret generic mlflow-sharinghub-postgres --from-literal password="<mlflow-user-password>" --from-literal postgres-password="<root-user-password>" --namespace sharinghub
Then, configure the deployment values:
postgresql:
enabled: true
auth:
existingSecret: mlflow-sharinghub-postgres
Edit your mlflow-sharinghub
secret that contains the key secret-key
, and add a new key named backend-store-uri
with the value postgresql://<user>:<password>@<host>:5432/<database>
, filled with your PostgreSQL instance values.
Then, configure the deployment values:
mlflowSharinghub:
backendStoreUriSecret: true
By default, our docker image uses a directory for the artifacts, located at /home/mlflow/data/mlartifacts
.
You can alternatively choose to store your artifacts in an S3.
If you chose to use one, you need to create a s3 bucket in your provider and create the associated secret:
kubectl create secret generic mlflow-sharinghub-s3 --from-literal access-key-id="<access-key>" --from-literal secret-access-key="<secret-key>" --namespace sharinghub
Then, configure the deployment values:
mlflowSharinghub:
artifactsDestination: s3://<bucket>
s3:
enabled: true
endpointUrl: https://<s3-endpoint>
You must edit your deployment values with these last pieces of informations:
image:
repository: <docker-registry>/mlflow-sharinghub
pullPolicy: IfNotPresent
tag: "latest"
imagePullSecrets:
- name: regcred
ingress:
enabled: true
className: "nginx"
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: mlflow.<domain-name>
paths:
- path: /
pathType: ImplementationSpecific
tls:
- secretName: mlflow-sharinghub-tls
hosts:
- mlflow.<domain-name>
When all is done, install/update your deployment:
# Install & Update
helm upgrade --install --create-namespace --namespace sharinghub mlflow-sharinghub ./deploy/helm/mlflow-sharinghub -f ./deploy/helm/values.<platform>.yaml
If you want to contribute to this project or understand how it works, please check CONTRIBUTING.md.
Any contribution is greatly appreciated.
Copyright 2024 CS GROUP - France
MLflow SharingHub is an open source software, distributed under the Apache License 2.0. See the LICENSE
file for more information.