The purpose of this project is to scrape sequencing project information from NCBI and store it in a database.

the-eon-flux/Database_Creation

Database_Creation

Objectives:

  1. Scrape all the relevant information from the GEO website for the provided GSE IDs and store it in a database of your choice.
  2. Once the database exists, annotate the biological keywords in each dataset's summary and store the annotated keywords for that dataset in the database.
  3. Write a query that returns the IDs of all datasets containing a disease keyword.

GSE IDs: GSE63312, GSE78224, GSE74018, GSE50734, GSE114644, GSE60477, GSE53599, GSE80582, GSE109493, GSE35200
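
As a rough illustration of the first objective, the sketch below builds a request URL for one GSE ID and parses the `!Series_...` lines of a GEO text (SOFT-style) record into a dictionary. It is not the code used in this repository. The query parameters (`targ=self`, `form=text`, `view=brief`) are assumptions about GEO's accession display page and should be checked against the GEO documentation. The function names `build_geo_url`, `parse_soft`, and `fetch_series` are hypothetical.

```python
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumed endpoint for GEO's accession display page.
GEO_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

GSE_IDS = [
    "GSE63312", "GSE78224", "GSE74018", "GSE50734", "GSE114644",
    "GSE60477", "GSE53599", "GSE80582", "GSE109493", "GSE35200",
]


def build_geo_url(gse_id):
    """Build the URL for a plain-text view of one series (parameters are assumptions)."""
    params = {"acc": gse_id, "targ": "self", "form": "text", "view": "brief"}
    return GEO_URL + "?" + urllib.parse.urlencode(params)


def parse_soft(text):
    """Collect '!Series_<key> = <value>' lines into a dict mapping key -> list of values."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.startswith("!Series_") or " = " not in line:
            continue  # skip entity headers such as '^SERIES = ...' and blank lines
        key, value = line[len("!Series_"):].split(" = ", 1)
        fields[key.strip()].append(value.strip())
    return dict(fields)


def fetch_series(gse_id, timeout=30):
    """Download and parse one series record; requires network access."""
    with urllib.request.urlopen(build_geo_url(gse_id), timeout=timeout) as resp:
        return parse_soft(resp.read().decode("utf-8", errors="replace"))
```

Calling `fetch_series(gse_id)` for each entry in `GSE_IDS` would yield one dictionary per dataset. Values are kept as lists because a field such as the summary can span several lines in the record.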

Milestones achieved

  1. Get basic information from the NCBI website for a set of sequencing projects, using their GSE IDs.
  2. Annotate the summaries in the scraped data using the becas API. (Not working at present because of a server error on the becas side.)
  3. Create a database and add the fetched data to it. (Includes example queries to fetch and retrieve data.)
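
The storage step and the disease-keyword query from the objectives could look roughly like the following sketch. It uses SQLite from the Python standard library. The schema (a `datasets` table and an `annotations` table with a `category` column), the helper names, and the choice of SQLite are all assumptions made for illustration, not the schema used in this repository.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open a SQLite database and create the two hypothetical tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS datasets (
            gse_id  TEXT PRIMARY KEY,
            title   TEXT,
            summary TEXT
        );
        CREATE TABLE IF NOT EXISTS annotations (
            gse_id   TEXT NOT NULL REFERENCES datasets(gse_id),
            keyword  TEXT NOT NULL,
            category TEXT NOT NULL  -- e.g. 'disease', as reported by the annotator
        );
        """
    )
    return conn


def add_dataset(conn, gse_id, title, summary):
    """Insert one scraped dataset, replacing any earlier row with the same ID."""
    conn.execute(
        "INSERT OR REPLACE INTO datasets (gse_id, title, summary) VALUES (?, ?, ?)",
        (gse_id, title, summary),
    )


def add_annotation(conn, gse_id, keyword, category):
    """Store one annotated keyword for a dataset."""
    conn.execute(
        "INSERT INTO annotations (gse_id, keyword, category) VALUES (?, ?, ?)",
        (gse_id, keyword, category),
    )


def datasets_with_category(conn, category="disease"):
    """Return the IDs of all datasets with at least one keyword in the given category."""
    rows = conn.execute(
        "SELECT DISTINCT gse_id FROM annotations WHERE category = ? ORDER BY gse_id",
        (category,),
    )
    return [row[0] for row in rows]
```

Keeping annotations in their own table means a dataset can carry any number of keywords, and the disease query reduces to a single `SELECT DISTINCT` on the `category` column.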