DomainlifecyclesCodeGenerator

The Domainlifecycles Code Generator (DCG), formerly the NitroX Code Generator (NCG), is the first version of a generative AI assistance system developed for esentri's Domainlifecycles framework, which supports the Domain-Driven Design (DDD) development process.

This first version of the DCG can create syntactically correct Domainlifecycles JSON objects as part of the Domainlifecycles DSL. More information about the DCG and its creation, as well as its limitations and future work, can be found in the Master's thesis PDF.
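Since the property the DCG provides is syntactic correctness of the generated JSON, that property can be checked with a minimal sketch like the following (the sample object and its field names are purely illustrative, not the actual Domainlifecycles DSL schema):

```python
import json

def is_syntactically_valid(sample: str) -> bool:
    """Return True if a generated sample parses as JSON."""
    try:
        json.loads(sample)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical generated output -- illustrative field names only.
generated = '{"name": "Order", "type": "AGGREGATE_ROOT", "fields": []}'
print(is_syntactically_valid(generated))  # True
```

A check like this only covers syntax; whether the object is semantically meaningful within the DSL is a separate question discussed in the thesis.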

The DCG was developed in collaboration with esentri as part of my Master's thesis at the Karlsruhe University of Applied Sciences.

This repository contains the project files, models and logs for the DCG.

The DCG-DemoApp, which was developed to present and showcase the capabilities of this first model, is maintained in a separate repository.

Important

At the time the thesis and the project were created, the Domainlifecycles framework still had the working title NitroX. Throughout this project, as well as in the entire Master's thesis, Domainlifecycles is therefore referred to as NitroX.

Overview

DomainlifecyclesCodeGenerator/
├── [FOLDER]
...
├── .gitignore
├── 1_datasetGenerator.ipynb
├── 2_trainingLoop.ipynb
├── 3_hyperparameterTrainer.ipynb
├── 4_optunaEvaluation.ipynb
├── 5_finalTraining.ipynb
├── ColoredDataPreprocessingProcess.jpg
├── environment.yml
├── finalTraining_v1_tbExport.csv
├── LICENSE
├── Master-Thesis_Götz-Henrik_Wiegand_2024.pdf
└── README.md

Folder Structure

  • all_json:
    • Folder containing the raw JSON files for fine-tuning the DCG.
    • The customer-related project data from the "esentri-Partner" was removed from the dataset. Since this accounted for 80% of the files, the dataset stored here serves only as an example.
  • datasets:
    • Empty folder reserved for export and storage of the generated and cleaned dataset.
  • gen_json:
    • Folder with the generated samples for the Model Assessment phase.
  • models:
    • Reserved path for the model export with the finalTraining_v1 model as the result of the final training for the DCG.
  • runs:
    • Reserved tensorboard callback folder for the training history logs. The results from the final training of the DCG are stored here.
    • The metrics and training curves logged there can be displayed and analyzed with TensorBoard.
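As a rough sketch of how the raw files in all_json could be collected for dataset generation (the actual logic lives in 1_datasetGenerator.ipynb and may differ), one could gather every file in the folder that parses as JSON:

```python
import json
from pathlib import Path

def load_raw_samples(folder: str) -> list[str]:
    """Collect the contents of every .json file in the folder,
    skipping files that do not parse as JSON."""
    samples = []
    for path in sorted(Path(folder).glob("*.json")):
        text = path.read_text(encoding="utf-8")
        try:
            json.loads(text)
        except json.JSONDecodeError:
            continue  # skip syntactically broken files
        samples.append(text)
    return samples
```

The resulting list of JSON strings would then be cleaned and exported to the datasets folder in the notebook's subsequent steps.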

System Requirements

Important

The entire project was developed on a Linux 64-bit system with an NVIDIA graphics card. The setup and the README have therefore only been tested for these specifications:

  • Ubuntu 22.04.4 LTS
  • GeForce RTX 2080 Ti (11GB VRAM)

Installation and Setup

  1. Clone the Repository

    Clone the repository:

    git clone [email protected]:Tr33Bug/DomainlifecyclesCodeGenerator.git
    cd DomainlifecyclesCodeGenerator
  2. Install the Requirements

    Create a conda environment and install all the requirements from the environment.yml:

    # create environment
    conda env create -f environment.yml
    
    # activate environment
    conda activate DCGServerEnv
  3. Run the Notebooks

    Start with 1_datasetGenerator.ipynb and follow the instructions in the Jupyter notebook.

Workflow and Engineering Documentation

This section documents some of the project workflows and their setup.

Remote Training

Note

The entire project was engineered remotely via VS Code SSH access. In order to be able to close the notebook during longer training runs, the notebooks were exported as Python scripts and executed remotely in a tmux session.

The setup and procedure are explained in this section using the 1_datasetGenerator and 2_trainingLoop notebooks as an example:

  1. Start session and setup:
    • Start ssh session or start a terminal to run the notebook on the local computer.
    • Navigate to the project folder or clone the repository to a desired location.
    • Export the corresponding notebooks as Python scripts (e.g. with jupyter nbconvert --to script).
  2. Set run_name in 1_datasetGenerator.py and 2_trainingLoop.py to the desired name (it must be identical in both).
  3. Run the bash commands:
    # start tmux session
    tmux
    
    # run the scripts and pipe the output to a log.txt file
    python 1_datasetGenerator.py > log.txt
    # append (>>) so the dataset generator's output is not overwritten
    python 2_trainingLoop.py >> log.txt
  4. Press Ctrl+B, then D to detach the tmux session.
  5. Open TensorBoard with the ./runs folder (tensorboard --logdir ./runs).

-> Reconnect to a detached session:

  • tmux ls to list sessions:
    0: 1 windows (created Mon Mar 18 21:36:24 2024)
    1: 1 windows (created Wed Apr  3 10:22:08 2024)
  • Reattach to a session with: tmux attach-session -t NUMBER
    # to resume the example session 1 created on Wed Apr 3, attach session 1:
    tmux attach-session -t 1

Optuna Dashboard Setup

The results of the hyperparameter tuning can be visualized with the Optuna Dashboard. More information about the dashboard and getting started can be found here: https://optuna-dashboard.readthedocs.io/en/latest/getting-started.html

Note

The dashboard can also be installed and used without the environment.yml dependencies if only the results from the hyperparameter tuning of the thesis are to be analyzed.

  1. Install the dashboard and the recommended dependencies to speed up the dashboard:
    pip install optuna-dashboard
    
    pip install optuna-fast-fanova gunicorn
  2. Open Optuna results in Dashboard:
    optuna-dashboard sqlite:///optuna/A6000_OptunaRun_2048.db

Note

The database path for the file is optuna/A6000_OptunaRun_2048.db. If you want to analyze your own results, change the command to point at your own path: optuna-dashboard sqlite:///{YOUR PATH}

  3. Open the dashboard in your browser using the address printed by the optuna-dashboard command (for example: Listening at: http://127.0.0.1:8080).
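The sqlite:///{YOUR PATH} storage URL can be derived mechanically from a filesystem path; a small sketch (the helper name is illustrative, not part of Optuna):

```python
from pathlib import Path

def storage_url(db_path: str) -> str:
    """Build the sqlite:/// storage URL that optuna-dashboard expects.
    A relative path yields three slashes, an absolute path four
    (e.g. sqlite:////home/user/study.db)."""
    return f"sqlite:///{Path(db_path).as_posix()}"

print(storage_url("optuna/A6000_OptunaRun_2048.db"))
# sqlite:///optuna/A6000_OptunaRun_2048.db
```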

Log in to Huggingface Hub

  1. Test whether huggingface-cli is installed and whether you are already logged in:
    huggingface-cli whoami
  2. Install huggingface-cli (skip this step if it is already installed):
    pip install -U "huggingface_hub[cli]"
  3. Login using the huggingface-cli (Skip this if you are already logged in).
    huggingface-cli login
    After this you should be prompted to paste an access token. Generate the access token with the required permissions through the Hugging Face website and your user account.
  4. Gain access to the model repository if required. To do this, log in to the Hugging Face Hub, go to the model repository, and agree to the corresponding terms of service.

Note

For more information on the huggingface-cli, see the documentation: Command Line Interface (CLI)

Contributing

This project is not being actively developed further. For questions or suggestions, please open an issue.

Acknowledgements

License

This project is licensed under the MIT License - see the LICENSE file for details.
