LLaMaDoc is a VSCode extension that automatically detects and updates outdated Python docstrings using a large language model.
- Press "🔎 Find Outdated Docstrings" in the VSCode status bar.
- Click the 💡 lightbulb in the line of the function definition to update the outdated docstring.
🎥 A more detailed screencast can be found here.
📝 The corresponding project report can be found here.
- Clone the repository and change directory to the project folder:
  ```sh
  git clone https://github.com/CR1337/LLaMaDoc.git
  cd LLaMaDoc
  ```
- Create virtual environments:
  ```sh
  python3 -m venv llm-server/.venv
  python3 -m venv github-scraper/.venv
  ```
- Install dependencies for `llm-server`:
  ```sh
  cd llm-server
  source .venv/bin/activate
  pip3 install -r requirements.txt
  deactivate
  cd ..
  ```
- Install dependencies for `github-scraper`:
  ```sh
  cd github-scraper
  source .venv/bin/activate
  pip3 install -r requirements.txt
  deactivate
  cd ..
  ```
- Install dependencies for the LLaMaDoc extension. The Python dependencies have to be installed globally:
  ```sh
  cd llamadoc
  pip3 install -r requirements.txt
  npm install
  cd ..
  ```
If you are one of the lucky people who received the zip file with all the required files, you can simply merge its contents into this repository. The directory structure is the same.
- Put your GitHub authentication token in `github-scraper/github-auth.token`:
  ```sh
  echo "YOUR-TOKEN" > github-scraper/github-auth.token
  ```
- Run all scraping-related scripts and notebooks. This will generate the intermediate files `github-scraper/data.pkl`, `github-scraper/train.pkl` and `github-scraper/test.pkl` as well as the files `github-scraper/train_data.json` and `github-scraper/test_data.json`. If you already have these files, you can skip this step:
  ```sh
  cd github-scraper
  source .venv/bin/activate
  python3 repo_metadata_scraper.py
  python3 repo_scraper.py
  # Run the notebook `scraping-analysis.ipynb`
  deactivate
  cd ..
  ```
- Copy the training data into the finetuning directory:
  ```sh
  cp github-scraper/train_data.json finetuning/
  ```
- Run the finetuning notebook `finetuning/finetuning.ipynb`. This will generate the finetuning checkpoints in `finetuning/checkpoints/`.
- IF you want to run `llm-server` on a device without a GPU (not recommended), you have to create this file:
  ```sh
  touch llm-server/not-on-server
  ```
- IF you don't have a Hugging Face token in your Hugging Face cache directory, you have to create this file and put your token in it:
  ```sh
  echo "YOUR-TOKEN" > llm-server/huggingface-token
  ```
- Copy the finetuned model adapter of your choice (e.g. number 9) into the right `llm-server` directory and rename it to `finetuned_0`:
  ```sh
  cp -r finetuning/finetuning_checkpoints/checkpoint-ep9 llm-server/checkpoints/finetuned_0
  ```
- Copy the test data into the right `llm-server` directory:
  ```sh
  cp github-scraper/test_data.json llm-server/evaluation/
  ```
- Build the docker container:
  ```sh
  cd llm-server
  ./build.sh
  ```
- (A) Either run the docker container in detached mode:
  ```sh
  ./run.sh
  ```
- (B) Or run the docker container in interactive mode:
  ```sh
  ./run-blocking.sh
  ```
- (C) Or IF you are on a machine without a GPU, run:
  ```sh
  ./run-cpu.sh
  ```
In each script you can set a `GROUP`, which determines the GPU the server will use by computing `GROUP % 4`. The server's port will be set to `9000 + GROUP`.
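For example, the mapping works out as follows (the value `GROUP=5` is chosen purely for illustration):

```python
# Illustration of the GROUP mapping described above (GROUP=5 is an arbitrary example).
GROUP = 5
gpu_index = GROUP % 4  # GPU the server will use
port = 9000 + GROUP    # port the server will listen on
print(f"GPU index: {gpu_index}")  # GPU index: 1
print(f"Port: {port}")            # Port: 9005
```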
Set the `address` field and the `port` field in `llamadoc/llm_interface/server_config.json` to the address and port of the `llm-server`:
```json
{
    "address": "http://ADDRESS",
    "port": PORT
}
```
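For instance, if the server runs on the local machine with `GROUP=0` (so port 9000), the file could look like this (the host name is an assumption for illustration):

```json
{
    "address": "http://localhost",
    "port": 9000
}
```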
You can now use the extension by running the VSCode launch task `Run Extension`. Just open a Python file containing functions with docstrings in the newly opened VSCode window.
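To try it out, you can open a small Python file like the following, where the docstring no longer matches the implementation (a made-up example; any function with a stale docstring will do):

```python
def add(a, b):
    """Multiply two numbers and return the result."""
    # The docstring above is outdated: the function now adds instead of multiplying,
    # which is exactly the kind of mismatch the extension should flag.
    return a + b

print(add(2, 3))  # 5
```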
There are two evaluations. One compares the quality of the generated docstrings between the original and the finetuned model: `finetuning/evaluation/finetuning-evaluation.ipynb`. The other evaluation finds the best parameters for the out-of-date test and measures the performance of that test: `llm-server/evaluation/out-of-date-evaluation.ipynb`.
- Copy the test data into the right directory:
  ```sh
  cp github-scraper/test_data.json finetuning/evaluation/
  ```
- Run the evaluation notebook `finetuning/evaluation/finetuning-evaluation.ipynb`.
- Navigate to `http://ADDRESS:PORT/docs` in your browser and run the `/compute-predictions` endpoint once with `index = 0` and once with `index = 1`. This will generate the data needed for the evaluation: `llm-server/cache/updated_docstrings_0.json` and `llm-server/cache/updated_docstrings_1.json`.
- Run the notebook `llm-server/evaluation/out-of-date-evaluation.ipynb`.