Skip to content

Commit

Permalink
Release v2.3.0+0.1.9+8.19
Browse files Browse the repository at this point in the history
Merge pull request #44 from JetBrains-Research/v2.3.0-dev
  • Loading branch information
K-dizzled authored Oct 16, 2024
2 parents b13806e + e88b14e commit 7e728cf
Show file tree
Hide file tree
Showing 218 changed files with 12,291 additions and 891 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/build-and-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ on:

env:
coqlsp-path: "coq-lsp"
coqlsp-version: "0.1.8+8.19"
coqlsp-version: "0.1.9+8.19"
artifact-name: ubuntu-latest-build

jobs:
Expand Down Expand Up @@ -51,7 +51,7 @@ jobs:
restore-keys: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-

- name: Set-up OCaml ${{ matrix.ocaml-compiler }}
uses: ocaml/setup-ocaml@v2
uses: ocaml/setup-ocaml@v3.0.10
with:
ocaml-compiler: ${{ matrix.ocaml-compiler }}
dune-cache: true
Expand All @@ -60,7 +60,7 @@ jobs:
env:
OPAMYES: true
run: |
opam install coq-lsp.0.1.8+8.19
opam install coq-lsp.0.1.9+8.19
eval $(opam env)
- name: Install Node.js
Expand Down
12 changes: 9 additions & 3 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,15 @@ src/test/resources/coqProj/Makefile.coq
src/test/resources/coqProj/Makefile.coq.conf
src/test/resources/**/.vscode

# Ignore the generated build files in datatests inside benchmarks
## Benchmarking files

# Ignore generated build files inside dataset
dataset/**/result
dataset/**/.vscode/

# Ignore private files used while benchmarking
src/test/benchmark/benchmarkPrivate/
# Ignore logs & cache
benchmarkLogs/
.cache/

# Ignore private files (outdated)
src/test/benchmark/benchmarkPrivate/
15 changes: 15 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,20 @@
# Changelog

## 2.3.0

**A major upgrade of the benchmarking system**: At the moment, only a little **new** functionality is provided; moreover, the ability to run benchmarks on Tactician/CoqHammer is temporarily unavailable. However, it will soon be restored and improved. Excessive work has been done to make the benchmarking system more flexible, secure, robust, self-contained, and easy to use. Experiments via our benchmarking framework have been made more accessible than ever. The configurability and reliability of the pipeline have been improved drastically. In a nutshell, the main features of the improved benchmarking system include:
- Flexible DSL for input setup
- Extensive configuration for fine-grained experiments
- Support for single and multi-workspace runs
- Dataset caching
- Comprehensive logging
- Fail-fast strategy
- Additional metrics (tokens used, context theorems, proof generation stats)

An extensive benchmarking guide is now available in [BENCHMARKING_FRAMEWORK_GUIDE.md](etc/docs/benchmark/BENCHMARKING_FRAMEWORK_GUIDE.md).

We are looking forward to your feedback and suggestions for further improvements/new features.

## 2.2.7

### Internal changes:
Expand Down
17 changes: 13 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@

## Requirements

* `coq-lsp` version `0.1.8+8.19.0` is currently required to run the extension.
* `coq-lsp` version `0.1.9+8.19` is currently required to run the extension.

## Brief technical overview

Expand All @@ -42,7 +42,7 @@ For each service, an array of models could be defined through the settings. Each

When `CoqPilot` completion command is issued, it parses the currently opened file, extracts theorems that have complete proofs and processes them into a message history for the LLM. It helps LLM to keep the style and hallucinate less.

For each `admit.` present in the file, an independent completion process is issued. If a valid proof is found, it is substituted in the editor. `CoqPilot` also allows a multi-round fixing procedure for the proofs from the LLM. I.e. if the proof was incorrect, compiler message could be automatically sent to the LLM with a request to repair it. It can now be configured in the settings. One can set the amount of attempts for the consequtive proof fixing with compiler feedback.
For each `admit.` present in the file, an independent completion process is issued. If a valid proof is found, it is substituted in the editor. `CoqPilot` also allows a multi-round fixing procedure for the proofs from the LLM. I.e. if the proof was incorrect, compiler message could be automatically sent to the LLM with a request to repair it. It can now be configured in the settings. One can set the number of attempts for the consequtive proof fixing with compiler feedback.

As soon as at least one valid proof is found, it is substituted in the editor and the process is finished.

Expand All @@ -66,7 +66,7 @@ As soon as at least one valid proof is found, it is substituted in the editor an

To run the extension, you must install a `coq-lsp` server. Depending on the system used in your project, you should install it using `opam` or `nix`. A well-configured `nix` project should have the `coq-lsp` server installed as a dependency. To install `coq-lsp` using `opam`, you can use the following commands:
```bash
opam pin add coq-lsp 0.1.8+8.19.0
opam pin add coq-lsp 0.1.9+8.19
opam install coq-lsp
```
For more information on how to install `coq-lsp` please refer to [coq-lsp](https://github.com/ejgallego/coq-lsp).
Expand Down Expand Up @@ -213,7 +213,16 @@ git submodule update
```
After that, you need to build the projects. Be careful, the actively maintained way to build this projects is `nix`. Moreover, when adding your own projects, make sure that they are built using `coq-8.19.0`.

First things first, the process of running the benchmark is not perfectly automated yet. We are working on it. For now, one project (one unit containing nix environment) shall be ran at a time. Let's say you are going to run the benchmark on the `imm` project. You will have to do the following:
### New framework (beta)

The new benchmarking framework with extended capabilities is now available.
However, it is still in the testing phase, so some bugs and missing features may be present.

To use it, follow the instructions in the [`BENCHMARKING_FRAMEWORK_GUIDE.md`](etc/docs/benchmark/BENCHMARKING_FRAMEWORK_GUIDE.md).

### Legacy framework

The process of running the benchmark is not perfectly automated and we are working on it. For now, one project (one unit containing nix environment) shall be ran at a time. Let's say you are going to run the benchmark on the `imm` project. You will have to do the following:

<!-- 0. Go the the `imm` subdirectory and add a `_CoqProject` file in the root with the following:
```
Expand Down
File renamed without changes.
File renamed without changes.
23 changes: 23 additions & 0 deletions dataset/teamCityExampleInput/items/0-auto-benchmark-v-test.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"task": {
"goalToProve": "{\"info\":{\"evar\":[\"Ser_Evar\",1],\"name\":null},\"hyps\":[],\"ty\":\"forall (A : Type) (P : A -> Prop) (x : A), P x -> P x\"}",
"positionRange": {
"start": {
"line": 2,
"character": 4
},
"end": {
"line": 2,
"character": 10
}
},
"targetType": "ADMIT",
"relativeSourceFilePath": "auto_benchmark.v",
"sourceTheoremName": "test",
"relativeWorkspacePath": "standalone-source-files"
},
"targetModelIds": [
"invalid-proof",
"prove-with-auto"
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"task": {
"goalToProve": "{\"info\":{\"evar\":[\"Ser_Evar\",36],\"name\":null},\"hyps\":[{\"names\":[\"n\"],\"def\":null,\"ty\":\"nat\"}],\"ty\":\"0 + n = n\"}",
"positionRange": {
"start": {
"line": 35,
"character": 4
},
"end": {
"line": 35,
"character": 10
}
},
"targetType": "ADMIT",
"relativeSourceFilePath": "auto_benchmark.v",
"sourceTheoremName": "test_thr",
"relativeWorkspacePath": "standalone-source-files"
},
"targetModelIds": [
"invalid-proof",
"prove-with-auto"
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"task": {
"goalToProve": "{\"info\":{\"evar\":[\"Ser_Evar\",7],\"name\":null},\"hyps\":[],\"ty\":\"forall n m : nat, n + m = m + n\"}",
"positionRange": {
"start": {
"line": 12,
"character": 4
},
"end": {
"line": 12,
"character": 10
}
},
"targetType": "ADMIT",
"relativeSourceFilePath": "mixed_benchmark.v",
"sourceTheoremName": "add_comm",
"relativeWorkspacePath": "standalone-source-files"
},
"targetModelIds": [
"invalid-proof",
"prove-with-auto"
]
}
8 changes: 8 additions & 0 deletions dataset/teamCityExampleInput/models/invalid-proof.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"ranker": "random",
"modelId": "invalid-proof",
"tactics": [
"a."
],
"llmServiceIdentifier": 0
}
8 changes: 8 additions & 0 deletions dataset/teamCityExampleInput/models/prove-with-auto.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"ranker": "random",
"modelId": "prove-with-auto",
"tactics": [
"auto."
],
"llmServiceIdentifier": 0
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"relativeDirectoryPath": "standalone-source-files",
"requiresNixEnvironment": false
}
Loading

0 comments on commit 7e728cf

Please sign in to comment.