CORE Datasets

Structure

The CQPy, CQPyUs and SQJava datasets are, as described in our research, subsets of existing datasets: CQPy and CQPyUs are subsets of the CodeQueries dataset, and SQJava is a subset of the Sorald dataset. All three share the following structure.

<dataset>
    <query>
        <1.py/1.java>
        <2.py/2.java>
        .
        .
        .
        <n.py/n.java>
        result.csv

Here <dataset> is the dataset in use, and <query> is the folder holding the files that raised warnings for the corresponding query. Each query subfolder contains n source files, which are the parent files with flagged checks, along with a summary CSV file (listed as result.csv in the layout above) that records the checks flagged across all files in that subfolder, in a format similar to CodeQL output.
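The layout above can be traversed with a few lines of Python. This is only a minimal sketch, not one of our scripts: the function name index_dataset is hypothetical, the summary file is matched with the pattern result*.csv because its exact name is an assumption, and the CSV rows are read as plain lists since the column names are not specified here.

```python
import csv
from pathlib import Path

# Source file extensions used by the Python and Java datasets.
SOURCE_SUFFIXES = {".py", ".java"}


def index_dataset(dataset_dir):
    """Map each query folder name to its source files and summary CSV rows.

    Hypothetical helper: assumes the <dataset>/<query>/<files> layout and a
    summary file matching result*.csv inside each query folder.
    """
    index = {}
    for query_dir in sorted(Path(dataset_dir).iterdir()):
        if not query_dir.is_dir():
            continue
        sources = sorted(
            p.name for p in query_dir.iterdir() if p.suffix in SOURCE_SUFFIXES
        )
        summaries = sorted(query_dir.glob("result*.csv"))
        rows = []
        if summaries:
            # Read rows without assuming any particular header.
            with summaries[0].open(newline="") as handle:
                rows = list(csv.reader(handle))
        index[query_dir.name] = {"sources": sources, "rows": rows}
    return index
```

Calling index_dataset("dataset/<dataset>") from the project directory would then give one entry per query folder, which can be handy for a quick sanity check after extraction.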

To extract a dataset, run the following from the project directory:

tar -xvf dataset/<dataset>.tar.gz -C dataset

This extracts the dataset into the dataset directory. Our scripts expect the files to be laid out as described above, so they can be run on an extracted dataset by passing dataset/<dataset> as the argument (relative to the project directory).
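If tar is not available, the same extraction can be sketched with Python's standard tarfile module. The helper name extract_dataset is hypothetical, and the sketch assumes the archive unpacks into a top-level folder named after the dataset, as the dataset/<dataset> argument above suggests.

```python
import tarfile
from pathlib import Path


def extract_dataset(name, dataset_dir="dataset"):
    """Extract dataset/<name>.tar.gz into dataset/ and return the extracted path.

    Hypothetical helper: assumes the archive's top-level folder is <name>.
    """
    archive = Path(dataset_dir) / f"{name}.tar.gz"
    with tarfile.open(archive, "r:gz") as tar:
        # Equivalent to: tar -xvf dataset/<name>.tar.gz -C dataset
        tar.extractall(path=dataset_dir)
    return Path(dataset_dir) / name
```

For example, extract_dataset("CQPy") run from the project directory would return the path dataset/CQPy, which can then be passed to the scripts.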
