LLaMaDoc

LLaMaDoc is a VSCode extension to automatically detect and update outdated Python docstrings using a large language model.

How To Use

Press "🔎 Find Outdated Docstrings" in the VSCode status bar.

Click the 💡 lightbulb on the line of the function definition to update the outdated docstring.

🎥 A more detailed screencast can be found here.

📝 The corresponding project report can be found here.

Setup (Linux)

Cloning and dependencies

Clone the repository and change directory to the project folder.

git clone https://github.com/CR1337/LLaMaDoc.git
cd LLaMaDoc

Create virtual environments

python3 -m venv llm-server/.venv
python3 -m venv github-scraper/.venv

Install dependencies for llm-server

cd llm-server
source .venv/bin/activate
pip3 install -r requirements.txt
deactivate
cd ..

Install dependencies for github-scraper

cd github-scraper
source .venv/bin/activate
pip3 install -r requirements.txt
deactivate
cd ..

Install dependencies for the LLaMaDoc extension. The Python dependencies must be installed globally.

cd llamadoc
pip3 install -r requirements.txt
npm install
cd ..

Reconstructing needed data/files

If you received the zip file with all the required files, you can merge its contents into this repo and skip the rest of this section. The directory structure is the same.

`github-scraper`

Put your GitHub authentication token in github-scraper/github-auth.token:

echo "YOUR-TOKEN" > github-scraper/github-auth.token

Run all scraping-related scripts and notebooks. This generates the intermediate files github-scraper/data.pkl, github-scraper/train.pkl and github-scraper/test.pkl, as well as the files github-scraper/train_data.json and github-scraper/test_data.json. If you already have these files, you can skip this step:

cd github-scraper
source .venv/bin/activate
python3 repo_metadata_scraper.py
python3 repo_scraper.py
# Run the notebook `scraping-analysis.ipynb`
deactivate
cd ..
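To decide whether the scraping step can be skipped, you can check which of the files listed above already exist. The following is a minimal sketch, not part of the repository; the helper name `missing_outputs` is hypothetical, and it assumes you run it from the repository root.

```python
from pathlib import Path

# Output files of the scraping step, as listed in this README.
EXPECTED_OUTPUTS = [
    "data.pkl",
    "train.pkl",
    "test.pkl",
    "train_data.json",
    "test_data.json",
]


def missing_outputs(scraper_dir: Path) -> list[str]:
    """Return the expected output files that are not present in scraper_dir."""
    return [name for name in EXPECTED_OUTPUTS if not (scraper_dir / name).is_file()]


# Hypothetical usage from the repository root.
missing = missing_outputs(Path("github-scraper"))
if missing:
    print("Scraping step still needed; missing:", ", ".join(missing))
else:
    print("All scraping outputs found; the scraping step can be skipped.")
```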

`finetuning`

Copy the training data into the finetuning directory:

cp github-scraper/train_data.json finetuning/

Run the finetuning notebook finetuning/finetuning.ipynb. This will generate the finetuning checkpoints in finetuning/checkpoints/.

`llm-server`

If you want to run llm-server on a device without a GPU (not recommended), you have to create this file:

touch llm-server/not-on-server

If you don't have a Hugging Face token in your Hugging Face cache directory, you have to create this file and put your token in it:

echo "YOUR-TOKEN" > llm-server/huggingface-token

Copy the finetuned model adapter of your choice (e.g. number 9) into the right llm-server directory and rename it to finetuned_0:

cp -r finetuning/finetuning_checkpoints/checkpoint-ep9 llm-server/checkpoints/finetuned_0

Copy the test data into the right llm-server directory:

cp github-scraper/test_data.json llm-server/evaluation/

Launching the server

Build the docker container:

cd llm-server
./build.sh

(A) Either run the docker container in detached mode:

./run.sh

(B) Or run the docker container in interactive mode:

./run-blocking.sh

(C) Or, if you are on a machine without a GPU, run:

./run-cpu.sh

In each script you can set GROUP, which determines the GPU the server uses (GPU index GROUP % 4) and the server's port (9000 + GROUP).
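The GROUP arithmetic can be illustrated with a short sketch. The function names `gpu_index` and `server_port` are hypothetical and only mirror the two formulas stated above; the run scripts themselves may compute these values differently.

```python
def gpu_index(group: int) -> int:
    """GPU index for a GROUP value, following the README formula GROUP % 4."""
    return group % 4


def server_port(group: int) -> int:
    """Server port for a GROUP value, following the README formula 9000 + GROUP."""
    return 9000 + group


for group in (0, 3, 5):
    print(f"GROUP={group}: GPU {gpu_index(group)}, port {server_port(group)}")
```

For example, GROUP values 1 and 5 map to the same GPU index but to different ports (9001 and 9005).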

Launching the extension

Set the address field and the port field in llamadoc/llm_interface/server_config.json to the address and port of the llm-server:

{
    "address": "http://ADDRESS",
    "port": PORT
}
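As a quick sanity check, the two fields can be combined into the server's base URL. This is a sketch under the assumption that the address already includes the scheme (as in "http://ADDRESS") and that the port is an integer; the function name `server_base_url` is hypothetical, and the extension may assemble the URL differently.

```python
import json


def server_base_url(config_text: str) -> str:
    """Build "ADDRESS:PORT" from the JSON text of a server_config.json file."""
    config = json.loads(config_text)
    address = config["address"].rstrip("/")  # avoid "http://host/:9000"
    port = int(config["port"])
    return f"{address}:{port}"


# Example values only; replace them with your server's address and port.
example_config = '{"address": "http://localhost", "port": 9000}'
print(server_base_url(example_config))
```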

You can now use the extension by running the VSCode launch task Run Extension. In the newly opened VSCode window, open a Python file containing functions with docstrings.

Evaluation

There are two evaluations. One compares the quality of the generated docstrings between the original and the finetuned model: finetuning/evaluation/finetuning-evaluation.ipynb.

The other evaluation finds the best parameters for the out-of-date test and measures the performance of that test: llm-server/evaluation/out-of-date-evaluation.ipynb.

Finetuning Evaluation

Copy the test data into the right directory:

cp github-scraper/test_data.json finetuning/evaluation/

Run the evaluation notebook finetuning/evaluation/finetuning-evaluation.ipynb.

Out-of-Date Evaluation

Navigate to http://ADDRESS:PORT/docs in your browser and run the /compute-predictions endpoint once with index = 0 and once with index = 1. This generates the data needed for the evaluation: llm-server/cache/updated_docstrings_0.json and llm-server/cache/updated_docstrings_1.json.

Run the notebook llm-server/evaluation/out-of-date-evaluation.ipynb.
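Before opening the notebook, it can help to confirm that both cache files were written. The sketch below only checks for the two file names given above; the helper name `prediction_cache_files` is hypothetical, and it assumes you run it from the repository root.

```python
from pathlib import Path


def prediction_cache_files(server_dir: Path) -> list[Path]:
    """Paths of the cache files expected for index 0 and index 1."""
    return [
        server_dir / "cache" / f"updated_docstrings_{index}.json"
        for index in (0, 1)
    ]


for path in prediction_cache_files(Path("llm-server")):
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```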
