Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release v2.3.0+0.1.9+8.19 #44

Merged
merged 190 commits into from
Oct 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
190 commits
Select commit Hold shift + click to select a range
5d7dc8b
Expose one of hidden `GenerationsLogger` tests
GlebSolovev May 27, 2024
799de6b
Design draft benchmarking results
GlebSolovev May 28, 2024
6b8daea
Design and implement single benchmark
GlebSolovev May 28, 2024
c9802e5
Design draft benchmarks setup
GlebSolovev May 30, 2024
448552b
Introduce `BenchmarkingItem`, refactor
GlebSolovev May 30, 2024
2b05da0
Design benchmarking result
GlebSolovev May 30, 2024
547755b
Handle errors, build result in single benchmark
GlebSolovev Jun 1, 2024
0c5e3fa
Design benchmarks logging
GlebSolovev Jun 1, 2024
1e4bb26
Log single benchmark
GlebSolovev Jun 1, 2024
b01d577
Implement and use `asOneRecord` logs
GlebSolovev Jun 1, 2024
8efd10e
Implement `executeBenchmarkingTask`
GlebSolovev Jun 1, 2024
93709d7
Remove unused `AppendLogsBuilder`
GlebSolovev Jun 1, 2024
2c1d0e8
Implement top-level `benchmark` function
GlebSolovev Jun 2, 2024
dcf15c0
Design & implement targets builders
GlebSolovev Jun 2, 2024
679ca2c
Design & implement benchmarking bundles
GlebSolovev Jun 2, 2024
b6a9749
Implement `Experiment` basics
GlebSolovev Jun 2, 2024
40783f6
Introduce `TargetType`, improve merge targets
GlebSolovev Jun 3, 2024
ac6ae57
Implement draft benchmarking items build
GlebSolovev Jun 3, 2024
0d35cd1
Study IPC, design draft processes interaction
GlebSolovev Jun 3, 2024
fb06b93
Fix `BenchmarkingLogger` implementation
GlebSolovev Jun 4, 2024
3aacf95
Improve ajv errors wrappers
GlebSolovev Jun 4, 2024
56ea671
Design & implement child process execution API
GlebSolovev Jun 4, 2024
fec266a
Implement execution as child process
GlebSolovev Jun 4, 2024
20e0644
Refactor subprocess utils file structure
GlebSolovev Jun 4, 2024
b85df54
Implement `SubprocessesScheduler`
GlebSolovev Jun 4, 2024
37454ea
Improve subprocess utils structure
GlebSolovev Jun 4, 2024
0602f79
Implement test wrapper for subprocess executables
GlebSolovev Jun 5, 2024
32ee044
Implement `checkGeneratedProofsInSubprocess`
GlebSolovev Jun 5, 2024
92f8f1d
Design and resolve experiment run options
GlebSolovev Jun 5, 2024
482fc2e
Call subprocess proofs validation in benchmarks
GlebSolovev Jun 5, 2024
b94a350
Refactor common subprocess utils into specific ones
GlebSolovev Jun 5, 2024
65912ef
Implement execute-subprocess-in-workspace command builder
GlebSolovev Jun 5, 2024
d85ca73
Set timeout for subprocess test executor
GlebSolovev Jun 5, 2024
67f4ea9
Serialize & deserialize theorems and ranges
GlebSolovev Jun 23, 2024
c4a9be4
Serialize & deserialize parsed Coq file
GlebSolovev Jun 23, 2024
8a8f961
Map result of child process executor
GlebSolovev Jun 23, 2024
da0ddcb
Extend command builder with workspace preparation
GlebSolovev Jun 23, 2024
bdc1d0e
Implement `buildAndParseCoqProject` via subprocess
GlebSolovev Jun 23, 2024
92cf485
Build benchmarking items of experiment
GlebSolovev Jun 23, 2024
31dcd59
Redesign & improve `buildAndParseCoqProject` to retrieve target goals
GlebSolovev Jun 24, 2024
c3529af
Add theorem reference to parsing result
GlebSolovev Jun 24, 2024
314c61e
Move `llmServiceIdentifier` inside params
GlebSolovev Jun 24, 2024
84bd4be
Finish `buildBenchmarkingItems` according to new parsing models
GlebSolovev Jun 24, 2024
dce8b1b
Support bundles ids to improve items construction
GlebSolovev Jun 24, 2024
762feaa
Fix root path resolution
GlebSolovev Jun 24, 2024
a00c322
Fix scripts & benchmark test files to run them properly
GlebSolovev Jun 24, 2024
4de9b3d
Find logging bug, support logging into file
GlebSolovev Jun 24, 2024
2b1f0ff
Make paths resolution relative to workspaces
GlebSolovev Jun 25, 2024
919a93d
Introduce mock `noWorkspaceRoot` to guarantee no bugs with `undefined…
GlebSolovev Jun 25, 2024
0e4b01c
Support logging in `mergeRequestedTargets`
GlebSolovev Jun 25, 2024
c5902e2
Clear logs file on start
GlebSolovev Jun 25, 2024
88ee0c7
Disable colors when logging to file
GlebSolovev Jun 25, 2024
32594d6
Fix child logger output, improve subprocess logging
GlebSolovev Jun 25, 2024
038eea5
Make child process executables `.test.ts` files to run them as tests
GlebSolovev Jun 25, 2024
1e29ecb
Fix npm test script by supporting `-r` regex flag
GlebSolovev Jun 25, 2024
37b3eed
Fix `asOneRecord` logger, better support records identifiers
GlebSolovev Jun 25, 2024
c37f65d
Fix subprocess execution finishing
GlebSolovev Jun 25, 2024
6fa30e1
Support stdout & stderr redirection for child process
GlebSolovev Jun 25, 2024
635816a
Separate main benchmarking framework from `test-electron`
GlebSolovev Jun 25, 2024
289ae2a
Fix & enable benchmarking logging into console
GlebSolovev Jun 25, 2024
22ba968
Use `node-ipc` package to fix socket communication
GlebSolovev Jun 28, 2024
eebc087
Fix `buildAndParseCoqProject`: open file via `coq-lsp` client
GlebSolovev Jun 28, 2024
c37cbf5
Silent ipc logger, fix logging messages
GlebSolovev Jun 28, 2024
a66256f
Fix targets extracting and sending in `buildAndParseCoqProject`
GlebSolovev Jun 28, 2024
8049365
Improve benchmarking items logs
GlebSolovev Jun 28, 2024
bc88589
Fix `resultSchema` of checking proofs signature
GlebSolovev Jun 28, 2024
c541440
Create benchmarks output directory if doesn't exist
GlebSolovev Jun 28, 2024
232cd4e
Fix & improve logging messages, fix debug logs disabling
GlebSolovev Jun 28, 2024
2b982fe
Fix experiment report to-json serialization
GlebSolovev Jun 28, 2024
7552f1e
Update experiment setup
GlebSolovev Jun 28, 2024
a925978
Generalize `SubprocessesScheduler` to `AsyncScheduler`
GlebSolovev Jul 18, 2024
1e42613
Make items building more efficient: create resolvers only once
GlebSolovev Jul 18, 2024
d93649e
Improve time measurement
GlebSolovev Jul 18, 2024
06a37ac
Fix failed generation result handling
GlebSolovev Jul 18, 2024
4304680
Merge branch 'main' into benchmarking-rework
GlebSolovev Jul 18, 2024
d7fb886
Separate `completionGenerator.ts` code into exposed & private parts
GlebSolovev Jul 18, 2024
2458a2c
Unify `CoqLspClient` builders, remove test dependency from core
GlebSolovev Jul 18, 2024
572ab41
Implement models scheduler
GlebSolovev Jul 19, 2024
464913c
Design & implement single benchmarking task runner
GlebSolovev Jul 19, 2024
adfb6e6
Design dataset caching interface
GlebSolovev Jul 24, 2024
6ea6901
Design dataset cache models
GlebSolovev Jul 24, 2024
a5ed4e9
Implement cache reader
GlebSolovev Jul 24, 2024
79507be
Make `goalToProve` optional in cache
GlebSolovev Jul 24, 2024
506c097
Implement cache controller basics
GlebSolovev Jul 24, 2024
20e35b3
Redesign input targets structures
GlebSolovev Jul 24, 2024
b4e9eab
Support theorems map inside `ParsedCoqFileData`
GlebSolovev Jul 25, 2024
4613aa9
Redesign `BuildAndParseCoqProject` signature
GlebSolovev Jul 25, 2024
fa2a552
Implement step 1 of dataset parsing: read cache
GlebSolovev Jul 25, 2024
c5c34c8
Implement step 2 of dataset parsing: parse missing targets
GlebSolovev Jul 25, 2024
27c2258
Refactor Goal<PpString> & its serialization
GlebSolovev Jul 25, 2024
8eb87f7
Implement step 3 of dataset parsing: update cache structures
GlebSolovev Jul 25, 2024
a7b062e
Implement step 4 of dataset parsing: build items
GlebSolovev Jul 25, 2024
ddcabc7
Move `resolveOrThrow` utils from test to src
GlebSolovev Jul 25, 2024
af16765
Implement step 5 of dataset parsing: save cache
GlebSolovev Jul 25, 2024
5482105
Refactor cache holders: move algorithms away
GlebSolovev Jul 25, 2024
8e44ff3
Implement dataset parsing pipeline
GlebSolovev Jul 26, 2024
95b0c73
Refactor dataset parsing: reorganize into files
GlebSolovev Jul 26, 2024
1c61812
Handle no missing targets case
GlebSolovev Jul 26, 2024
4ec7979
Handle no input targets / benchmarking items case
GlebSolovev Jul 26, 2024
f61a44c
Implement cache modes support
GlebSolovev Jul 26, 2024
aee50e3
Mock fix single-task runner to compile project
GlebSolovev Jul 29, 2024
f9085fa
Fix & improve no-cache run logging
GlebSolovev Jul 29, 2024
0885283
Fix all theorems request if cache is empty bug
GlebSolovev Jul 29, 2024
0778910
Log cache usage mode properly
GlebSolovev Jul 29, 2024
31d7da2
Fix cache parameters late resolution
GlebSolovev Jul 29, 2024
fc359dc
Make benchmarking logs clear & coherent
GlebSolovev Aug 3, 2024
37793e4
Use `.json` extension for cache files
GlebSolovev Aug 3, 2024
f50b140
Fix cache writer bug: faulty top-level quotes
GlebSolovev Aug 3, 2024
88470ce
Rework cached theorem targets initialization
GlebSolovev Aug 4, 2024
43b5bd6
Properly handle empty cache & document cache data
GlebSolovev Aug 4, 2024
623dfeb
Update benchmarking `.gitignore` targets
GlebSolovev Aug 4, 2024
b517b4a
Fix input targets merge: dependent internal structures
GlebSolovev Aug 14, 2024
d817e2c
Fix cache writer not saving more than 1 file
GlebSolovev Aug 14, 2024
b51e7ff
Rework standalone dataset files root
GlebSolovev Aug 14, 2024
8a846d2
Enhance experiment run options setup
GlebSolovev Aug 15, 2024
707863e
Design `buildDatasetCache` DSL
GlebSolovev Aug 15, 2024
0648e48
Implement dataset cache targets building
GlebSolovev Aug 15, 2024
7beaeb0
Fix `cacheWriter` failing to save deep files
GlebSolovev Aug 16, 2024
19139ea
Refactor internal of `Experiment` to support cache-building command
GlebSolovev Aug 16, 2024
90ccdfd
Implement `buildDatasetCache` operation
GlebSolovev Aug 16, 2024
f39ad17
Fix `EXTEND_CACHE` clearing unused cache
GlebSolovev Aug 16, 2024
93b34d4
Separate `TokensCounter` from `ChatTokensFitter`
GlebSolovev Aug 21, 2024
a74271b
Introduce `CompletelyAnalyzedChatHistory`
GlebSolovev Aug 21, 2024
7161524
Make `AnalyzedChat` a complete one
GlebSolovev Aug 21, 2024
9bd8f7e
Make `validateChoices` static for consistency
GlebSolovev Aug 21, 2024
df1c5cb
Wrap generated raw proofs with tokens metrics
GlebSolovev Aug 21, 2024
44b1213
Support new chat properties with `GenerationsLogger`
GlebSolovev Aug 21, 2024
5c7f728
Refactor `llmService`: separate structures
GlebSolovev Aug 21, 2024
bab500e
Support tokens spent & context theorems in benchmarks
GlebSolovev Aug 23, 2024
7d0ce20
Fix interim becnhmarking results file path
GlebSolovev Aug 23, 2024
8571cb4
Make dataset parsing abstract
GlebSolovev Aug 25, 2024
5966f05
Make proofs check abstract
GlebSolovev Aug 25, 2024
9172bc3
Make `Experiment` abstract, introduce `MultiWorkspaces` one
GlebSolovev Aug 25, 2024
7d93a62
Implement `LocalCoqProjectParser`
GlebSolovev Aug 25, 2024
f39b75e
Improve `coqProjectParser` code style
GlebSolovev Aug 26, 2024
bc886e8
Implement `LocalProofsChecker`
GlebSolovev Aug 26, 2024
910b18f
Improve vscode-module warning
GlebSolovev Aug 26, 2024
b4509d1
Implement `SingleWorkspaceExperiment` & its runner
GlebSolovev Aug 26, 2024
148cfe6
Mark source executable tests with tag
GlebSolovev Aug 30, 2024
de2ebb5
Structure `experiment` folder
GlebSolovev Sep 2, 2024
249967f
Structure `structures` folder
GlebSolovev Sep 2, 2024
6cc2c8f
Structure benchmarking `utils`
GlebSolovev Sep 2, 2024
2b6fc57
Restructure benchmarking core
GlebSolovev Sep 2, 2024
4e6413e
Move `time` module to top-level utils
GlebSolovev Sep 2, 2024
c61b3aa
Make proofs-checking logging less contaminated
GlebSolovev Sep 2, 2024
abceb8c
Improve benchmarking logging
GlebSolovev Sep 2, 2024
3b913d8
Fix file targets resolution algorithm, improve code style
GlebSolovev Sep 3, 2024
ba4e45f
Support spent tokens logging
GlebSolovev Sep 3, 2024
5d25692
Support `proofGenerationRetries` parameter
GlebSolovev Sep 5, 2024
c685b54
Implement fail-fast strategy for benchmarks
GlebSolovev Sep 5, 2024
45cec61
Move `delay` from test to common utils
GlebSolovev Sep 5, 2024
5ba4898
Fix error casting always pass, refactor errors logging
GlebSolovev Sep 5, 2024
690d00c
Fix lsp-v in README: issue#40
K-dizzled Sep 12, 2024
363e210
Update default model to gpt4o: issue#41
K-dizzled Sep 12, 2024
a4c9835
Make `SerializedGoal` a common util
GlebSolovev Sep 19, 2024
ba61cc6
Extend & refactor equality utils
GlebSolovev Oct 2, 2024
f48360d
Implement lighweight benchmarking items serialization
GlebSolovev Oct 2, 2024
7a778b6
Restructure file utils
GlebSolovev Oct 2, 2024
561691a
Refactor execution context preparation
GlebSolovev Oct 7, 2024
faa0d23
Restructure lightweight serialization, support service id in params
GlebSolovev Oct 7, 2024
7754d86
Implement lightweight items execution
GlebSolovev Oct 7, 2024
6e15020
Implement TeamCity `Experiment`-s
GlebSolovev Oct 7, 2024
97f375a
Support TeamCity benchmark setup
GlebSolovev Oct 7, 2024
3a43156
Generate TeamCity example input
GlebSolovev Oct 7, 2024
52b259a
Remove rudimentary single-task runner
GlebSolovev Oct 7, 2024
61218d5
Separate legacy benchmark
GlebSolovev Oct 7, 2024
e508f36
Update `benchmarkSingleCompletionGeneration` docs
GlebSolovev Oct 7, 2024
c872a71
Sketch report holder class for benchmark
K-dizzled Jun 6, 2024
d4735d4
Finish POV for the automatic report generation
K-dizzled Jun 6, 2024
9f4746f
Allow additional imports in benchmarks + generalize report
K-dizzled Jun 8, 2024
e72432c
Fix bug with Additional imports
K-dizzled Jun 18, 2024
132a242
Fix markdown report representation generation
K-dizzled Jun 21, 2024
29b76a6
Get rid of the projectRooot constant in benchmark
K-dizzled Jun 26, 2024
52156f6
Fixed premise selection with different metrics
khram2003 Sep 30, 2024
f744799
Fix bug with dublication of the output channel
K-dizzled Sep 30, 2024
af0a679
Fix last rebase & format code
GlebSolovev Oct 7, 2024
5c75a6a
Update supported openai models
GlebSolovev Oct 7, 2024
76920be
Fix code style: naming & repetitions
GlebSolovev Oct 7, 2024
937f059
Update tests with new openai models
GlebSolovev Oct 7, 2024
e11e740
Create benchmarking framework guide
GlebSolovev Oct 8, 2024
9c9b428
Fix table of contents in benchmarking guide
GlebSolovev Oct 8, 2024
c3324ac
Fix typos & links in benchmarking guide
GlebSolovev Oct 8, 2024
2fede9f
Update `imm` submodule
GlebSolovev Oct 8, 2024
7618911
Merge branch 'v2.3.0-dev' into benchmarking-rework
GlebSolovev Oct 8, 2024
0f4605e
Optimize parsing of prove-theorem targets
GlebSolovev Oct 9, 2024
6c1e16f
Merge pull request #43 from JetBrains-Research/benchmarking-rework
GlebSolovev Oct 9, 2024
e7359f0
Update CHANGELOG for release v2.3.0
K-dizzled Oct 15, 2024
ec4bc2c
Update default coq-lsp version from 0.1.8 to 0.1.9
K-dizzled Oct 15, 2024
4366bf4
Update setup OCaml action version to prevent dep issue
K-dizzled Oct 15, 2024
e88b14e
Fix test timeouts for MacOS CI passing
K-dizzled Oct 16, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions .github/workflows/build-and-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ on:

env:
coqlsp-path: "coq-lsp"
coqlsp-version: "0.1.8+8.19"
coqlsp-version: "0.1.9+8.19"
artifact-name: ubuntu-latest-build

jobs:
Expand Down Expand Up @@ -51,7 +51,7 @@ jobs:
restore-keys: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-

- name: Set-up OCaml ${{ matrix.ocaml-compiler }}
uses: ocaml/setup-ocaml@v2
uses: ocaml/setup-ocaml@v3.0.10
with:
ocaml-compiler: ${{ matrix.ocaml-compiler }}
dune-cache: true
Expand All @@ -60,7 +60,7 @@ jobs:
env:
OPAMYES: true
run: |
opam install coq-lsp.0.1.8+8.19
opam install coq-lsp.0.1.9+8.19
eval $(opam env)

- name: Install Node.js
Expand Down
12 changes: 9 additions & 3 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,15 @@ src/test/resources/coqProj/Makefile.coq
src/test/resources/coqProj/Makefile.coq.conf
src/test/resources/**/.vscode

# Ignore the generated build files in datatests inside benchmarks
## Benchmarking files

# Ignore generated build files inside dataset
dataset/**/result
dataset/**/.vscode/

# Ignore private files used while benchmarking
src/test/benchmark/benchmarkPrivate/
# Ignore logs & cache
benchmarkLogs/
.cache/

# Ignore private files (outdated)
src/test/benchmark/benchmarkPrivate/
15 changes: 15 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,20 @@
# Changelog

## 2.3.0

**A major upgrade of the benchmarking system**: At the moment, only a little **new** functionality is provided; moreover, the ability to run benchmarks on Tactician/CoqHammer is temporarily unavailable. However, it will soon be restored and improved. Excessive work has been done to make the benchmarking system more flexible, secure, robust, self-contained, and easy to use. Experiments via our benchmarking framework have been made more accessible than ever. The configurability and reliability of the pipeline have been improved drastically. In a nutshell, the main features of the improved benchmarking system include:
- Flexible DSL for input setup
- Extensive configuration for fine-grained experiments
- Support for single and multi-workspace runs
- Dataset caching
- Comprehensive logging
- Fail-fast strategy
- Additional metrics (tokens used, context theorems, proof generation stats)

An extensive benchmarking guide is now available in [BENCHMARKING_FRAMEWORK_GUIDE.md](etc/docs/benchmark/BENCHMARKING_FRAMEWORK_GUIDE.md).

We are looking forward to your feedback and suggestions for further improvements/new features.

## 2.2.7

### Internal changes:
Expand Down
17 changes: 13 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@

## Requirements

* `coq-lsp` version `0.1.8+8.19.0` is currently required to run the extension.
* `coq-lsp` version `0.1.9+8.19` is currently required to run the extension.

## Brief technical overview

Expand All @@ -42,7 +42,7 @@ For each service, an array of models could be defined through the settings. Each

When `CoqPilot` completion command is issued, it parses the currently opened file, extracts theorems that have complete proofs and processes them into a message history for the LLM. It helps LLM to keep the style and hallucinate less.

For each `admit.` present in the file, an independent completion process is issued. If a valid proof is found, it is substituted in the editor. `CoqPilot` also allows a multi-round fixing procedure for the proofs from the LLM. I.e. if the proof was incorrect, compiler message could be automatically sent to the LLM with a request to repair it. It can now be configured in the settings. One can set the amount of attempts for the consequtive proof fixing with compiler feedback.
For each `admit.` present in the file, an independent completion process is issued. If a valid proof is found, it is substituted in the editor. `CoqPilot` also allows a multi-round fixing procedure for the proofs from the LLM. I.e. if the proof was incorrect, compiler message could be automatically sent to the LLM with a request to repair it. It can now be configured in the settings. One can set the number of attempts for the consequtive proof fixing with compiler feedback.

As soon as at least one valid proof is found, it is substituted in the editor and the process is finished.

Expand All @@ -66,7 +66,7 @@ As soon as at least one valid proof is found, it is substituted in the editor an

To run the extension, you must install a `coq-lsp` server. Depending on the system used in your project, you should install it using `opam` or `nix`. A well-configured `nix` project should have the `coq-lsp` server installed as a dependency. To install `coq-lsp` using `opam`, you can use the following commands:
```bash
opam pin add coq-lsp 0.1.8+8.19.0
opam pin add coq-lsp 0.1.9+8.19
opam install coq-lsp
```
For more information on how to install `coq-lsp` please refer to [coq-lsp](https://github.com/ejgallego/coq-lsp).
Expand Down Expand Up @@ -213,7 +213,16 @@ git submodule update
```
After that, you need to build the projects. Be careful, the actively maintained way to build this projects is `nix`. Moreover, when adding your own projects, make sure that they are built using `coq-8.19.0`.

First things first, the process of running the benchmark is not perfectly automated yet. We are working on it. For now, one project (one unit containing nix environment) shall be ran at a time. Let's say you are going to run the benchmark on the `imm` project. You will have to do the following:
### New framework (beta)

The new benchmarking framework with extended capabilities is now available.
However, it is still in the testing phase, so some bugs and missing features may be present.

To use it, follow the instructions in the [`BENCHMARKING_FRAMEWORK_GUIDE.md`](etc/docs/benchmark/BENCHMARKING_FRAMEWORK_GUIDE.md).

### Legacy framework

The process of running the benchmark is not perfectly automated and we are working on it. For now, one project (one unit containing nix environment) shall be ran at a time. Let's say you are going to run the benchmark on the `imm` project. You will have to do the following:

<!-- 0. Go the the `imm` subdirectory and add a `_CoqProject` file in the root with the following:
```
Expand Down
File renamed without changes.
File renamed without changes.
23 changes: 23 additions & 0 deletions dataset/teamCityExampleInput/items/0-auto-benchmark-v-test.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"task": {
"goalToProve": "{\"info\":{\"evar\":[\"Ser_Evar\",1],\"name\":null},\"hyps\":[],\"ty\":\"forall (A : Type) (P : A -> Prop) (x : A), P x -> P x\"}",
"positionRange": {
"start": {
"line": 2,
"character": 4
},
"end": {
"line": 2,
"character": 10
}
},
"targetType": "ADMIT",
"relativeSourceFilePath": "auto_benchmark.v",
"sourceTheoremName": "test",
"relativeWorkspacePath": "standalone-source-files"
},
"targetModelIds": [
"invalid-proof",
"prove-with-auto"
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"task": {
"goalToProve": "{\"info\":{\"evar\":[\"Ser_Evar\",36],\"name\":null},\"hyps\":[{\"names\":[\"n\"],\"def\":null,\"ty\":\"nat\"}],\"ty\":\"0 + n = n\"}",
"positionRange": {
"start": {
"line": 35,
"character": 4
},
"end": {
"line": 35,
"character": 10
}
},
"targetType": "ADMIT",
"relativeSourceFilePath": "auto_benchmark.v",
"sourceTheoremName": "test_thr",
"relativeWorkspacePath": "standalone-source-files"
},
"targetModelIds": [
"invalid-proof",
"prove-with-auto"
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"task": {
"goalToProve": "{\"info\":{\"evar\":[\"Ser_Evar\",7],\"name\":null},\"hyps\":[],\"ty\":\"forall n m : nat, n + m = m + n\"}",
"positionRange": {
"start": {
"line": 12,
"character": 4
},
"end": {
"line": 12,
"character": 10
}
},
"targetType": "ADMIT",
"relativeSourceFilePath": "mixed_benchmark.v",
"sourceTheoremName": "add_comm",
"relativeWorkspacePath": "standalone-source-files"
},
"targetModelIds": [
"invalid-proof",
"prove-with-auto"
]
}
8 changes: 8 additions & 0 deletions dataset/teamCityExampleInput/models/invalid-proof.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"ranker": "random",
"modelId": "invalid-proof",
"tactics": [
"a."
],
"llmServiceIdentifier": 0
}
8 changes: 8 additions & 0 deletions dataset/teamCityExampleInput/models/prove-with-auto.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"ranker": "random",
"modelId": "prove-with-auto",
"tactics": [
"auto."
],
"llmServiceIdentifier": 0
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"relativeDirectoryPath": "standalone-source-files",
"requiresNixEnvironment": false
}
Loading