
Each folder (cordis, sdss, oncomx) holds the files for one dataset: seed data, synth data, and dev data. Each folder also contains a tables.json file, a JSON description of the database schema that includes table names, column names, column data types, and primary/foreign key relationships.

The following is an example of the file structure:

dev.json --> the manually generated development dataset
seed.json --> the manually generated seed dataset
synth.json --> the synthetically generated dataset using the seed query templates
tables.json --> a json representation of the schema containing:
- the database name ("db_id"),
- free text table names for NLP pipelines ("table_names") e.g. "Stellar spectral line indices" vs "spplines"
- original table names ("table_names_original") i.e. the table names as they are in the database
- free text column names for NLP pipelines ("column_names")
- original column names ("column_names_original") i.e. the column names as they are in the database
- column data types ("column_types"): time, text or number
- foreign key relationships ("foreign_keys")
- primary keys ("primary_keys")
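As a rough illustration of how the keys listed above might be consumed, here is a minimal Python sketch. It assumes a Spider-style layout that this README does not confirm: the file is a list of database entries, each column is a `[table_index, name]` pair, "foreign_keys" holds pairs of column indices, and "primary_keys" holds column indices. The sample values (other than the "spplines" example) and the helper names `summarize_schema` and `describe_foreign_keys` are hypothetical; check them against the real tables.json before relying on them.

```python
import json

# Hypothetical sample that mirrors the keys described above.
# The layout (index pairs, list of databases) is an assumption.
SAMPLE_TABLES_JSON = """
[
  {
    "db_id": "sdss",
    "table_names": ["Stellar spectral line indices", "Spectroscopic objects"],
    "table_names_original": ["spplines", "specobj"],
    "column_names": [[-1, "*"], [0, "spec object id"],
                     [1, "spec object id"], [1, "redshift"]],
    "column_names_original": [[-1, "*"], [0, "specobjid"],
                              [1, "specobjid"], [1, "z"]],
    "column_types": ["text", "number", "number", "number"],
    "foreign_keys": [[1, 2]],
    "primary_keys": [2]
  }
]
"""


def summarize_schema(db):
    """Map each original table name to (column, type, is_primary_key) tuples."""
    table_names = db["table_names_original"]
    tables = {name: [] for name in table_names}
    primary = set(db["primary_keys"])
    columns = zip(db["column_names_original"], db["column_types"])
    for idx, ((table_idx, column), column_type) in enumerate(columns):
        if table_idx < 0:
            continue  # skip the assumed "*" placeholder column
        tables[table_names[table_idx]].append((column, column_type, idx in primary))
    return tables


def describe_foreign_keys(db):
    """Render each foreign key as 'table.column -> table.column'."""
    columns = db["column_names_original"]
    table_names = db["table_names_original"]

    def qualified(column_idx):
        table_idx, column = columns[column_idx]
        return f"{table_names[table_idx]}.{column}"

    return [f"{qualified(src)} -> {qualified(dst)}" for src, dst in db["foreign_keys"]]


schemas = json.loads(SAMPLE_TABLES_JSON)
db = schemas[0]
summary = summarize_schema(db)
foreign_keys = describe_foreign_keys(db)
```

To use it on the real data, replace `json.loads(SAMPLE_TABLES_JSON)` with `json.load` on an open handle to one of the tables.json files, after confirming the file matches the assumed layout.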

The PostgreSQL databases for the three datasets used in this benchmark are available at the following links: CORDIS, SDSS, OncoMX

PostgreSQL specification:
- DBMS: PostgreSQL (ver. 9.5.20)
- Case sensitivity: plain=lower, delimited=exact
- Driver: PostgreSQL JDBC Driver (ver. 42.5.0, JDBC4.2)
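The case sensitivity setting above (plain=lower, delimited=exact) means that unquoted identifiers are folded to lowercase while double-quoted identifiers keep their exact spelling. The small Python sketch below mimics that rule so that names from tables.json can be compared with names written in SQL. The function `fold_identifier` is a hypothetical helper, not part of this repository, and it handles only simple identifiers.

```python
def fold_identifier(identifier: str) -> str:
    """Return the name PostgreSQL would resolve for a simple identifier.

    Delimited (double-quoted) identifiers keep their exact case;
    plain identifiers are folded to lowercase.
    """
    is_delimited = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if is_delimited:
        # Strip the surrounding quotes and unescape doubled quotes.
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


plain = fold_identifier("SpecObj")        # plain identifier: folded to lowercase
delimited = fold_identifier('"SpecObj"')  # delimited identifier: case preserved
```

Under this rule, a query that writes `SpecObj` without quotes refers to the table `specobj`, which is why the original table names in tables.json are most safely matched in lowercase unless the database was created with quoted names.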

ruoyiqiao/sciencebenchmark_dataset

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages