Commit c2f8b82 (1 parent: a6fe7c7)
Showing 14 changed files with 47 additions and 21 deletions.
@@ -0,0 +1,26 @@
# InterCode-CTF Dataset
For a comprehensive description of InterCode-CTF, please refer to our workshop paper, which discusses the motivation behind this environment, the formulation of Capture the Flag as an interactive coding task, the collection procedure, and some initial experiments evaluating existing SOTA LMs on this dataset:

**[Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag](https://john-b-yang.github.io/static/misc/preprint_InterCode_CTF.pdf)**
[John Yang](https://john-b-yang.github.io/), [Akshara Prabhakar](https://aksh555.github.io/), [Shunyu Yao](https://ysymyth.github.io/), [Kexin Pei](https://sites.google.com/site/kexinpeisite/), [Karthik Narasimhan](https://www.cs.princeton.edu/~karthikn/)

## Task Description
Task instances were manually collected from the [picoCTF](https://play.picoctf.org/practice) platform.
Each task instance has three components:
* `query`: Natural language instruction
* `gold`: The correct flag
* `task_assets/{task_id}`: Digital assets (e.g., code, images, executables) necessary for task completion

**Initialization.** At the beginning of each task episode, a task worker is given the `query` as the first observation.
The task worker must then interact with a Bash shell within an Ubuntu OS to solve the task.
Each task episode also places the task worker within the task instance-specific folder (`task_assets/{task_id}`) of the shell.

**Task Completion.** The task worker can then write Python and Bash commands to navigate the shell, investigate the given digital assets, and attempt to find the flag.
If it finds a flag, formatted as `picoCTF{...}`, it must then run `submit picoCTF{...}` as an action. If the submitted value matches the `gold` flag, task completion is successful.
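
The completion check described above can be sketched roughly as follows. This is an illustrative sketch only, not the environment's actual implementation: the helper names `parse_submission` and `is_task_complete` are hypothetical, and the exact-match comparison against `gold` is an assumption based on the description.

```python
import re

# Hypothetical pattern: the text only states that the action has the form
# `submit picoCTF{...}`; whitespace handling here is an assumption.
SUBMIT_PATTERN = re.compile(r"^submit\s+(picoCTF\{.*\})\s*$")


def parse_submission(action):
    """Return the submitted flag if `action` is a submit command, else None."""
    match = SUBMIT_PATTERN.match(action.strip())
    return match.group(1) if match else None


def is_task_complete(action, gold):
    """True only when the action submits a flag that equals `gold` exactly."""
    flag = parse_submission(action)
    return flag is not None and flag == gold
```

Under these assumptions, an ordinary shell command such as `ls -la` is not treated as a submission, and a submitted flag that differs from `gold` does not complete the task.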

## Dataset
The `ic_ctf.json` file contains the following fields: `{task_id, query, gold, source, tags}`.
* `query` and `gold` contain the aforementioned information
* `task_id` is a unique identifier connecting the instance with the associated assets in `task_assets/`
* `source` is a URL to the original problem found on `picoCTF`
* `tags` is a list of the categories associated with the task instance
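
As a rough illustration of how these fields fit together, the sketch below loads the file and resolves each task's asset folder. It assumes that `ic_ctf.json` holds a JSON list of objects, which the description above does not state explicitly; `load_tasks` and `assets_dir` are hypothetical helper names.

```python
import json
from pathlib import Path

# Field names taken from the description above.
EXPECTED_FIELDS = {"task_id", "query", "gold", "source", "tags"}


def load_tasks(json_path):
    """Load task records, assuming the file is a JSON list of objects."""
    with open(json_path, encoding="utf-8") as f:
        tasks = json.load(f)
    for task in tasks:
        missing = EXPECTED_FIELDS - set(task)
        if missing:
            raise ValueError(
                f"task {task.get('task_id')!r} is missing fields: {sorted(missing)}"
            )
    return tasks


def assets_dir(task, root="task_assets"):
    """Path of the asset folder for a task, i.e. task_assets/{task_id}."""
    return Path(root) / str(task["task_id"])
```

A call such as `load_tasks("ic_ctf.json")` followed by `assets_dir(task)` would then give the folder in which the corresponding episode starts, provided the layout matches these assumptions.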
@@ -0,0 +1,12 @@
# NL2Bash Dataset Transformation
* [Link](https://github.com/TellinaTool/nl2bash) to NL2Bash.

## Dataset Description
This dataset consists of 200 natural language to bash command `[query, gold]` pairs acquired from the NL2Bash dataset.
To ensure that the `gold` solutions to each query are non-trivial, entities such as folders and files have been renamed in both `query` and `gold` in order to ground the solution in a file system.

To this end, we also split the queries across four settings for a more diverse evaluation, three of which are grounded in their own file system:
* [fs_1](nl2bash_fs_1.json) has 60 queries based on the file system defined in [setup_fs_1](../../docker/bash_scripts/setup_nl2b_fs_1.sh)
* [fs_2](nl2bash_fs_2.json) has 53 queries based on the file system defined in [setup_fs_2](../../docker/bash_scripts/setup_nl2b_fs_2.sh)
* [fs_3](nl2bash_fs_3.json) has 60 queries based on the file system defined in [setup_fs_3](../../docker/bash_scripts/setup_nl2b_fs_3.sh)
* [fs_4](nl2bash_fs_4.json) has 27 general bash queries that are not grounded to any specific file system
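
The split sizes listed above add up to the stated 200 pairs (60 + 53 + 60 + 27). A small sanity check along these lines can confirm that a loaded split matches the documented size. This is a sketch under assumptions: the helper names are hypothetical, and each JSON file is assumed to hold a list of objects with `query` and `gold` keys.

```python
# Query counts per split, as listed above.
SPLIT_SIZES = {"fs_1": 60, "fs_2": 53, "fs_3": 60, "fs_4": 27}


def total_queries(split_sizes=SPLIT_SIZES):
    """Total number of [query, gold] pairs across all splits."""
    return sum(split_sizes.values())


def validate_split(name, records, split_sizes=SPLIT_SIZES):
    """Check that a loaded split has the documented size and required keys."""
    expected = split_sizes[name]
    if len(records) != expected:
        raise ValueError(f"{name}: expected {expected} records, got {len(records)}")
    for index, record in enumerate(records):
        if "query" not in record or "gold" not in record:
            raise ValueError(f"{name}: record {index} lacks 'query' or 'gold'")
    return True
```

For instance, the records obtained by reading `nl2bash_fs_1.json` with `json.load` could be passed as `validate_split("fs_1", records)`, if the file follows the assumed layout.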
Five files renamed without changes.