DomainlifecyclesCodeGenerator

The Domainlifecycles Code Generator (DCG), formerly the NitroX Code Generator (NCG), is the first version of a generative AI assistance system developed for esentri's Domainlifecycles framework, which supports the Domain-Driven Design (DDD) development process.

This first version of the DCG can create syntactically correct Domainlifecycles JSON objects as part of the Domainlifecycles DSL. More information about the DCG and its creation, as well as its limitations and future work, can be found in the Master's thesis PDF.
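Since the property the DCG provides is syntactic correctness of the generated JSON, that property can be checked with a minimal sketch like the following (the sample object and its field names are purely illustrative, not the actual Domainlifecycles DSL schema):

```python
import json

def is_syntactically_valid(sample: str) -> bool:
    """Return True if a generated sample parses as JSON."""
    try:
        json.loads(sample)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical generated output -- illustrative field names only.
generated = '{"name": "Order", "type": "AGGREGATE_ROOT", "fields": []}'
print(is_syntactically_valid(generated))  # True
```

A check like this only covers syntax; whether the object is semantically meaningful within the DSL is a separate question discussed in the thesis.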

The DCG was developed in collaboration with esentri as part of my Master's thesis at the Karlsruhe University of Applied Sciences.

This repository contains the project files, models and logs for the DCG.

The DCG-DemoApp, which was developed to present and showcase the capabilities of this first model, is maintained in a separate repository.

Important

At the time the thesis and the project were created, the Domainlifecycles framework still had the working title NitroX. Throughout this project, as well as in the entire Master's thesis, Domainlifecycles is therefore referred to as NitroX.

Overview

DomainlifecyclesCodeGenerator/
├── [FOLDER]
...
├── .gitignore
├── 1_datasetGenerator.ipynb
├── 2_trainingLoop.ipynb
├── 3_hyperparameterTrainer.ipynb
├── 4_optunaEvaluation.ipynb
├── 5_finalTraining.ipynb
├── ColoredDataPreprocessingProcess.jpg
├── environment.yml
├── finalTraining_v1_tbExport.csv
├── LICENSE
├── Master-Thesis_Götz-Henrik_Wiegand_2024.pdf
└── README.md

Folder Structure

  • all_json:
    • Folder containing the raw JSON files for fine-tuning the DCG.
    • The customer-related project data from the "esentri-Partner" was removed from the dataset. Since this accounted for 80% of the files, the dataset stored here serves only as an example.
  • datasets:
    • Empty folder reserved for export and storage of the generated and cleaned dataset.
  • gen_json:
    • Folder with the generated samples for the Model Assessment phase.
  • models:
    • Reserved path for the model export with the finalTraining_v1 model as the result of the final training for the DCG.
  • runs:
    • Reserved tensorboard callback folder for the training history logs. The results from the final training of the DCG are stored here.
    • The metrics and training curves logged there can be displayed and analyzed with TensorBoard.
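As a rough sketch of how the raw files in all_json could be collected for dataset generation (the actual logic lives in 1_datasetGenerator.ipynb and may differ), one could gather every file in the folder that parses as JSON:

```python
import json
from pathlib import Path

def load_raw_samples(folder: str) -> list[str]:
    """Collect the contents of every .json file in the folder,
    skipping files that do not parse as JSON."""
    samples = []
    for path in sorted(Path(folder).glob("*.json")):
        text = path.read_text(encoding="utf-8")
        try:
            json.loads(text)
        except json.JSONDecodeError:
            continue  # skip syntactically broken files
        samples.append(text)
    return samples
```

The resulting list of JSON strings would then be cleaned and exported to the datasets folder in the notebook's subsequent steps.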

System Requirements

Important

The entire project was developed on a Linux 64-bit system with an NVIDIA graphics card. The setup and the README have therefore only been tested for these specifications:

  • Ubuntu 22.04.4 LTS
  • GeForce RTX 2080 Ti (11GB VRAM)

Installation and Setup

  1. Clone the Repository

    Clone the repository:

    git clone [email protected]:Tr33Bug/DomainlifecyclesCodeGenerator.git
    cd DomainlifecyclesCodeGenerator
  2. Install the Requirements

    Create a conda environment and install all the requirements from the environment.yml:

    # create environment
    conda env create -f environment.yml
    
    # activate environment
    conda activate DCGServerEnv
  3. Run the Notebooks

    Start with 1_datasetGenerator.ipynb and follow the instructions in the Jupyter notebook.

Workflow and Engineering Documentation

This section documents some of the project workflows and their setup.

Remote Training

Note

The entire project was engineered remotely via VS Code SSH access. In order to be able to close the notebook during longer training runs, the notebooks were exported as Python scripts and executed remotely in a tmux session.

The setup and procedure are explained in this section using the 1_datasetGenerator and 2_trainingLoop notebooks as an example:

  1. Start session and setup:
    • Start ssh session or start a terminal to run the notebook on the local computer.
    • Navigate to the project folder or clone the repository to a desired location.
    • Export the corresponding notebooks as Python scripts (e.g. with jupyter nbconvert --to script).
  2. Set run_name in 1_datasetGenerator.py and 2_trainingLoop.py to the desired name (it must be identical in both).
  3. Run the bash commands:
    # start tmux session
    tmux
    
    # run the scripts and pipe the output to a log.txt file
    python 1_datasetGenerator.py > log.txt
    # append (>>) so the dataset generator's output is not overwritten
    python 2_trainingLoop.py >> log.txt
  4. Press Ctrl+B, then D to detach the tmux session.
  5. Open TensorBoard with the ./runs folder (tensorboard --logdir ./runs).

-> Reconnect to a detached session:

  • tmux ls to list sessions:
    0: 1 windows (created Mon Mar 18 21:36:24 2024)
    1: 1 windows (created Wed Apr  3 10:22:08 2024)
  • Reattach to a session with: tmux attach-session -t NUMBER
    # to resume the example session 1 created on Wed Apr 3, attach session 1:
    tmux attach-session -t 1

Optuna Dashboard Setup

The results of the hyperparameter tuning can be visualized with the Optuna Dashboard. More information about the dashboard and getting started can be found here: https://optuna-dashboard.readthedocs.io/en/latest/getting-started.html

Note

The dashboard can also be installed and used without the environment.yml dependencies if only the results from the hyperparameter tuning of the thesis are to be analyzed.

  1. Install the dashboard and the recommended dependencies to speed up the dashboard:
    pip install optuna-dashboard
    
    pip install optuna-fast-fanova gunicorn
  2. Open Optuna results in Dashboard:
    optuna-dashboard sqlite:///optuna/A6000_OptunaRun_2048.db

Note

The database path for the file is optuna/A6000_OptunaRun_2048.db. If you want to analyze your own results, change the command to point at your own path: optuna-dashboard sqlite:///{YOUR PATH}

  3. Open the dashboard in your browser using the address printed by the optuna-dashboard command (for example: Listening at: http://127.0.0.1:8080).
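The sqlite:///{YOUR PATH} storage URL can be derived mechanically from a filesystem path; a small sketch (the helper name is illustrative, not part of Optuna):

```python
from pathlib import Path

def storage_url(db_path: str) -> str:
    """Build the sqlite:/// storage URL that optuna-dashboard expects.
    A relative path yields three slashes, an absolute path four
    (e.g. sqlite:////home/user/study.db)."""
    return f"sqlite:///{Path(db_path).as_posix()}"

print(storage_url("optuna/A6000_OptunaRun_2048.db"))
# sqlite:///optuna/A6000_OptunaRun_2048.db
```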

Log in to Huggingface Hub

  1. Test whether huggingface-cli is installed and whether you are already logged in:
    huggingface-cli whoami
  2. Install huggingface-cli (skip this step if it is already installed):
    pip install -U "huggingface_hub[cli]"
  3. Login using the huggingface-cli (Skip this if you are already logged in).
    huggingface-cli login
    After this you should be prompted to paste an access token. Generate the access token with the required permissions through the Hugging Face website and your user account.
  4. Gain access to the model repository if required. To do this, log in to the Hugging Face Hub, go to the model repository, and agree to the corresponding terms of service.

Note

For more information on the huggingface-cli, see the documentation: Command Line Interface (CLI)

Contributing

This project is not being actively developed further. For questions or suggestions, please open an issue.

Acknowledgements

License

This project is licensed under the MIT License - see the LICENSE file for details.
