conda create -n ml_playground
conda activate ml_playground
conda install pip
git clone https://github.com/CMSTrackerDPG/MLplayground
cd MLplayground
pip install -r requirements.txt
Create a .env file with the following content:
DJANGO_DATABASE_ENGINE=django.db.backends.sqlite3
DJANGO_DEBUG=True
DJANGO_DATABASE_NAME=db.sqlite3
DJANGO_SECRET_KEY=django_secret_key
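How the project's settings.py actually loads these values is not shown here, so the following is only a minimal, standard-library sketch of the idea. The helper names `parse_env` and `build_settings` are hypothetical and do not come from the repository.

```python
# Sketch only: parse_env and build_settings are illustrative helpers,
# not functions from the MLplayground repository.

def parse_env(text):
    """Turn KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_settings(env):
    """Map the .env variables onto the Django settings they would feed."""
    return {
        # Environment values are strings, so the flag is converted explicitly.
        "DEBUG": env.get("DJANGO_DEBUG", "False").lower() in {"1", "true", "yes"},
        "SECRET_KEY": env["DJANGO_SECRET_KEY"],
        "DATABASES": {
            "default": {
                "ENGINE": env["DJANGO_DATABASE_ENGINE"],
                "NAME": env["DJANGO_DATABASE_NAME"],
            }
        },
    }


ENV_TEXT = """\
DJANGO_DATABASE_ENGINE=django.db.backends.sqlite3
DJANGO_DEBUG=True
DJANGO_DATABASE_NAME=db.sqlite3
DJANGO_SECRET_KEY=django_secret_key
"""

settings_sketch = build_settings(parse_env(ENV_TEXT))
```

Keeping the secret key and database location in .env rather than in settings.py means they can differ between a laptop and a deployment without touching the code.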
Set up your local database:
python manage.py migrate
python manage.py createsuperuser
Run the website:
python manage.py runserver
When adding new apps, it might be necessary to do the following:
python manage.py makemigrations new_app
python manage.py migrate new_app
The Django core management tool makes it possible to interact with the database through scripts, which can be scheduled. To fill the database, the runs and run_histos apps provide dedicated scripts that can be triggered from the scripts folder.
To test adding runs to the database:
cd scripts
source step1_extract_runs.sh
To test adding run-level histograms (entries, mean, rms, skewness, kurtosis) to the database:
cd scripts
source step2_extract_run_histos.sh
To add everything, just uncomment the command in the loop.
Notebooks can be used to prototype the data-extraction scripts and the views.
pip install jupyter django-extensions
python manage.py shell_plus --notebook
On lxplus, some lines can be added to settings.py to specify the IP address and the port to be used, as well as the no-browser option, so that the notebook can be forwarded.
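As a hedged illustration, django-extensions reads a `NOTEBOOK_ARGUMENTS` list from settings.py and passes it to the notebook server started by shell_plus. The address and port below are placeholder assumptions, not values prescribed by the project.

```python
# Addition to settings.py (sketch). The IP address and port are
# placeholders: pick values that match your own lxplus port forwarding.
NOTEBOOK_ARGUMENTS = [
    "--ip", "0.0.0.0",   # assumed: listen on all interfaces
    "--port", "8888",    # assumed: any free port you forward over SSH
    "--no-browser",      # do not try to open a browser on the remote host
]
```

The chosen port can then be forwarded to the local machine over SSH and the notebook opened in a local browser.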
The main current class structure is the following:
Uniqueness should be established by the use of UniqueConstraint. Runs as well as the combinations of run+dataset+histogram and run+lumisection are unique.
The project can be split into two parts. The first goal is to move from many data sources to a single database which can be queried by anyone. The second goal is to provide a framework to compare various approaches to anomaly detection. To fulfill the first goal while working towards the second, a REST API can be created which allows users to access the relevant information from the database without modifying the project.
In order to build the REST API, install the Django REST framework.
pip install djangorestframework
and add rest_framework to the list of installed apps (INSTALLED_APPS in settings.py).
We then need to add a serializer.py file in each app we want to make accessible through the API, starting with runs.
In progress
Lumisection information from the ML4DQM dataset consists of 1D and 2D histograms. Before this information can be added to the database, the run and lumisection it relates to must exist. Both are created on demand using get_or_create, as can be seen here.
A script to add lumisections can be run as follows:
cd scripts
source step4_extract_lumisections.sh
However, this script is of limited interest and will mostly be useful for testing the code once tests are added. Two more useful scripts can be run to add 1D and 2D histogram information for every run/lumisection.
cd scripts
source step5_extract_lumisection_histos1D.sh
For 2D histograms:
cd scripts
source step6_extract_lumisection_histos2D.sh
Due to the increasing complexity of the project, now is a good time to start adding tests. Django provides a TestCase class on which the tests for each app are built. The test development follows the very useful book Test-Driven Development with Python.
To run all the tests:
python manage.py test
To run all tests for a specific app:
python manage.py test app
Unit tests are run using the python commands shown above. Another set of tests, the functional tests, emulates the behavior of a human connecting to the website, changing pages and interacting with its functionality. To run them, we need to install selenium, which is the standard tool for this kind of browser test, as well as geckodriver, which drives Firefox to test the website. Based on the recommendations of the book, we will install geckodriver in ~/.local/bin and add that directory to the PATH using echo 'PATH=~/.local/bin:$PATH' >> ~/.bashrc.