- A request is sent to BigCode Web, which generates a CSV file of projects for benchmarking based on specified parameters.
- The generated CSV file is processed using the script `gather_repo_stats.py`, which enhances the project data with:
  - The version of Gradle used in the project.
  - The latest commit hash.
  - The commit hash of the last release.
- Projects are filtered based on defined criteria (e.g., using a Gradle version within a specified range).
- Suitable projects are added to `extended_benchmarks.csv`, which serves as the foundation for subsequent benchmarking.
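
The filtering step can be sketched as follows. This is a minimal illustration, not the logic of `gather_repo_stats.py` itself: the column names (`name`, `gradle_version`, `latest_commit`) and the helper names are assumptions made for the example, and the real CSV layout may differ.

```python
import csv
import io


def parse_version(text):
    """Turn a version string such as '7.4.2' into a tuple of ints: (7, 4, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def filter_by_gradle_version(rows, min_version, max_version):
    """Keep rows whose 'gradle_version' (assumed column name) lies in the closed range."""
    low, high = parse_version(min_version), parse_version(max_version)
    kept = []
    for row in rows:
        raw = row.get("gradle_version") or ""
        try:
            version = parse_version(raw)
        except ValueError:
            # Version is missing or not purely numeric (e.g. '8.0-rc-1'): skip the project.
            continue
        if low <= version <= high:
            kept.append(row)
    return kept


def write_rows(rows, fieldnames, out_file):
    """Write the surviving projects to an open text file as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical input standing in for the enriched project table.
SAMPLE = (
    "name,gradle_version,latest_commit\n"
    "a,7.4.2,abc\n"
    "b,4.10,def\n"
    "c,,ghi\n"
    "d,8.5,jkl\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
suitable = filter_by_gradle_version(rows, "7.0", "8.5")
output = io.StringIO()  # stands in for extended_benchmarks.csv
write_rows(suitable, ["name", "gradle_version", "latest_commit"], output)
```

Versions are compared as integer tuples, so `4.10` sorts after `4.9` instead of before it, which a plain string comparison would get wrong.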

The main entry point for the application is the Python script `run.py`. It can be configured using the `config.properties` file with the following parameters:
- `benchmarks_csv`: Path to the CSV file containing benchmark definitions. Example: `benchmark_scripts/extended_benchmarks.csv`
- `benchmarks_dir`: Directory for storing benchmark results. Example: `results/benchmarks`
- `groovy_build_files`: Directory containing original Groovy build files. Example: `results/original_build_files`
- `logs_dir`: Directory for saving logs. Example: `results/logs`
- `number_of_conversion_tries_per_project`: Number of attempts to convert each project. Example: `3`
- `llm_temperature`: Temperature setting for the language model. Example: `0.3`
- `llm_prompt_path`: File path to the prompt used by the language model. Example: `llm_prompt.txt`
- `grazie_token_path`: File path to the Grazie API token. Example: `grazie_token.txt`
- `java_path`: Path to the Java SDK installation. Example: `/Library/Java/JavaVirtualMachines/jdk-17.0.9.jdk/Contents/Home/`
- `android_path`: Path to the Android SDK installation. Example: `/Users/user/Library/Android/sdk/`
- `batch_size`: Number of projects processed in each batch. Example: `50`
- `qodana_token_path`: File path to the Qodana API token. Example: `qodana_token.txt`
- `github_token_path`: File path to the GitHub API token. Example: `github_token.txt`
- `qodana_yaml_path`: Path to the Qodana configuration YAML file. Example: `/Users/user/llm-gradle/llm_gradle_converter/qodana.yaml`
- `qodana_inspection_path`: Path to the Qodana inspection script. Example: `/Users/user/llm-gradle/llm_gradle_converter/inspections/extractGradleDataInspection.inspection.kts`
- `gradle_data_plugin_jar_path`: Path to the Gradle data extraction plugin JAR file. Used for Qodana configuration. Example: `/Users/user/llm-gradle/GradleDataExtractor/build/libs/instrumented-GradleDataExtractor-1.0.0.jar`
- `gradle_data_plugin_xml_path`: Path to the Gradle plugin XML definition file. Used for Qodana configuration. Example: `/Users/kristina/llm-gradle/GradleDataExtractor/src/main/resources/META-INF/plugin.xml`
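
To make the parameter list concrete, here is a minimal sketch of how such a file might be read from Python. How `run.py` actually parses `config.properties` is not shown here, so the helper names `load_properties` and `typed_settings` are hypothetical, and the sketch handles only plain `key=value` lines (no `:` separators or line continuations, which the Java properties format also allows).

```python
def load_properties(text):
    """Parse 'key=value' lines; skip blank lines and '#' or '!' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line; ignored in this sketch
        props[key.strip()] = value.strip()
    return props


def typed_settings(props):
    """Convert the numeric parameters; every other value stays a string."""
    return {
        **props,
        "number_of_conversion_tries_per_project": int(
            props["number_of_conversion_tries_per_project"]
        ),
        "llm_temperature": float(props["llm_temperature"]),
        "batch_size": int(props["batch_size"]),
    }


# A shortened example file using the example values from the list above.
SAMPLE_CONFIG = """\
# paths are relative to the working directory
benchmarks_csv=benchmark_scripts/extended_benchmarks.csv
logs_dir = results/logs
number_of_conversion_tries_per_project=3
llm_temperature=0.3
batch_size=50
"""

settings = typed_settings(load_properties(SAMPLE_CONFIG))
```

Converting the three numeric values up front means a malformed entry (for example `batch_size=fifty`) fails when the configuration is loaded, not in the middle of a benchmark run.
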
- Benchmark Setup
  The script reads the CSV table of benchmark definitions from `benchmarks_csv`.
- Project Processing
  - Projects are downloaded in batches as specified by `batch_size`.
  - Each project is validated to ensure it builds successfully using `./gradlew build` without additional configuration:
    - If the build fails, the project is discarded.
    - If the build succeeds, the project proceeds to the next step.
- Gradle Conversion
  The script triggers the Gradle conversion process using the following component: `src/main/kotlin/org/jetbrains/research/llmconverter/GradleConversionManager.kt`
  This Kotlin class manages the conversion process:
  - Interacts with Grazie to perform the conversion via ChatGPT-o/ChatGPT-Turbo.
  - Checks the converted project for buildability using `./gradlew build`.
  - If the project builds, runs a data extraction process on it using a custom Qodana inspection.
  - Compares the two extracted Project Models and saves the comparison result to `logs/project-name/comparison_results.log`.
- Project Model Extraction
  - The project model is extracted using a custom Qodana inspection script: `inspections/extractGradleDataInspection.inspection.kts`
  - Qodana is configured and launched using the following files:
    - Qodana configuration file: `qodana.yaml`
    - Qodana launch script: `run_qodana.sh`
  - During the inspection process, Qodana invokes the GradleDataExtractor plugin, which performs the actual extraction of the project model.
- Result Storage
  Conversion results are saved in the directory specified by `logs_dir`.
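
The control flow of the steps above (batching, build validation, repeated conversion attempts) can be sketched in Python. This is an illustration under stated assumptions, not the project's code: the real conversion is handled by the Kotlin class `GradleConversionManager`, and the names `batched`, `gradle_build_succeeds`, `process_projects`, and the `builds` / `convert` hooks are hypothetical. Restoring the original Groovy build files between attempts, model extraction, and logging are left out.

```python
import subprocess


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def gradle_build_succeeds(project_dir):
    """Run './gradlew build' inside project_dir; True only if it exits with code 0."""
    try:
        result = subprocess.run(
            ["./gradlew", "build"],
            cwd=project_dir,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # Directory or Gradle wrapper is missing, or the wrapper is not executable.
        return False
    return result.returncode == 0


def process_projects(project_dirs, batch_size, max_tries, builds, convert):
    """
    Walk the projects batch by batch.

    builds(project_dir) -> bool and convert(project_dir) -> None are injected
    hooks (assumptions of this sketch). Returns a dict mapping each project to
    'discarded', 'converted', or 'failed'.
    """
    outcome = {}
    for batch in batched(project_dirs, batch_size):
        for project in batch:
            if not builds(project):
                outcome[project] = "discarded"  # original project does not build
                continue
            outcome[project] = "failed"
            for _ in range(max_tries):
                convert(project)
                if builds(project):
                    outcome[project] = "converted"
                    break
    return outcome
```

In a real run, `builds` would be `gradle_build_succeeds`, while `batch_size` and `max_tries` would come from `batch_size` and `number_of_conversion_tries_per_project` in `config.properties`. Passing the hooks as arguments keeps the loop testable without Gradle or network access.
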