extracting GEO information (including links to BioSample) #4
Comments
Thanks, @bswhite! I'll take a closer look. And yes, I recognize that all the scripts I keep asking about apparently live in the repo for the package I created...
As far as I can tell, sex is embedded in the characteristics GEO field, though I don't see that this is required, and the way it is recorded doesn't appear to be standardized. That said, from a few examples, it seems to follow a common form.

Table 1 of this publication lists a bunch of GEO datasets with male/female annotated. I have included 3 examples from this table, each of which was generated by a command line.

I suggest we just grep/pattern match for these common cases. We don't have to catch all datasets. Here are a few examples:

pattern: gender: M$
more GSE19188-metadata.tsv | cut -f2 | head -3

pattern: Sex: Female$
more GSE14814-metadata.tsv | cut -f2 | head -3

pattern: Sex: m
more GSE33113-metadata.tsv | cut -f2 | head -3
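To make the grep idea concrete, here is a minimal Python sketch of the same pattern matching. The regular expression is my own generalisation of the three examples above (a `sex` or `gender` key, then `M`/`F`/`Male`/`Female` at the end of the string), and `extract_sex` is a hypothetical helper name. It assumes each characteristics value is passed in as one string, and it will miss datasets that record sex some other way.

```python
import re

# Assumed pattern, generalised from the three examples in this thread.
# It is anchored at the end of the string, like the grep patterns.
SEX_PATTERN = re.compile(
    r"\b(?:sex|gender):\s*(male|female|m|f)\s*$", re.IGNORECASE
)


def extract_sex(characteristics):
    """Return 'male', 'female', or None for one characteristics string."""
    match = SEX_PATTERN.search(characteristics)
    if match is None:
        return None  # not one of the common cases; leave unannotated
    value = match.group(1).lower()
    return "male" if value in ("m", "male") else "female"


if __name__ == "__main__":
    for text in ["gender: M", "Sex: Female", "Sex: m", "tissue: lung"]:
        print(text, "->", extract_sex(text))
```

Anything that returns `None` would simply stay unannotated, which matches the "just catch the common cases" goal.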
The get-geo-annotations.R script here:
https://github.com/Sage-Bionetworks/syndccutils/blob/master/R/scripts/get-geo-annotations.R
extracts links to BioSample on a per-sample/per-file basis. For example, the following is returned by that script for GSE109089:
geo_accession relation
GSM2931519 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN08354877; SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX3554420
GSM2931520 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN08354876; SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX3554421
...
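If it helps, the accession can be pulled out of the relation string with a few lines of Python. This is only a sketch: it assumes the relation value always has the `Name: URL; Name: URL` shape shown above, and `parse_relation` and `biosample_accession` are hypothetical helper names, not part of get-geo-annotations.R.

```python
def parse_relation(relation):
    """Split 'Name: url; Name: url' into a {name: url} dict."""
    links = {}
    for part in relation.split(";"):
        # Split on the first ': ' only, so the ':' in 'https://' is untouched.
        name, sep, url = part.strip().partition(": ")
        if sep:
            links[name] = url
    return links


def biosample_accession(relation):
    """Return the accession at the end of the BioSample URL, or None."""
    url = parse_relation(relation).get("BioSample")
    if not url:
        return None
    return url.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    example = (
        "BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN08354877; "
        "SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX3554420"
    )
    print(biosample_accession(example))
```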
Now, can we backtrack and get the BioSample "dataset" associated with SAMN08354877 and SAMN08354876?
Nothing comes up when I Google "SAMN08354876".
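One thing that might be worth trying instead of a web search is querying NCBI's E-utilities against the `biosample` database. The sketch below only builds the request URLs and does not call the service; I have not checked what comes back for these particular accessions, so treat the approach as an assumption. `biosample_search_url` and `biosample_fetch_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def biosample_search_url(accession):
    """Build an esearch URL that looks up a BioSample accession."""
    query = urlencode({"db": "biosample", "term": accession, "retmode": "json"})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def biosample_fetch_url(uid):
    """Build an efetch URL for a UID taken from an esearch result."""
    query = urlencode({"db": "biosample", "id": uid})
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


if __name__ == "__main__":
    print(biosample_search_url("SAMN08354876"))
```

The idea would be to run the search URL first, take any UID it returns, and pass that to the fetch URL to see the record's attributes.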