Skip to content

ruoyiqiao/sciencebenchmark_dataset

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Each folder cordis, sdss, oncomx holds the relevant files (i.e. seed data, synth data, and dev data) for each of the datasets. Additionally each file contains a tables.json file, which contains a json structure of the database schema including table names, column names, column data types and primary/foreign key relationships.

The following is an example of the file structure:

  • dev.json --> the manually generated development dataset
  • seed.json --> the manually generated seed dataset
  • synth.json --> the synthetically generated dataset using the seed query templates
  • tables.json --> a json representation of the schema containing:
    • the database name ("db_id"),
    • free text table names for NLP pipelines ("table_names") e.g. "Stellar spectral line indices" vs "spplines"
    • original table names ("table_names_original") i.e. the table names as they are in the database
    • free text column names for NLP pipelines ("column_names")
    • original column names ("column_names_original") i.e. the column names as they are in the database
    • column data types ("column_types"): time, text or number
    • foreign key relationships("foreign_keys")
    • primary keys ("primary_keys")

The PostgreSQL databases for each of the 3 databases used for this benchmark can be found at the following links: CORDIS SDSS OncoMX

PostgreSQL specification: DBMS: PostgreSQL (ver. 9.5.20) Case sensitivity: plain=lower, delimited=exact Driver: PostgreSQL JDBC Driver (ver. 42.5.0, JDBC4.2)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • PLpgSQL 100.0%