extracting GEO information (including links to BioSample) #4

Open
bswhite opened this issue Apr 29, 2020 · 2 comments
Labels: enhancement (New feature or request)


bswhite commented Apr 29, 2020

The get-geo-annotations.R script here:
https://github.com/Sage-Bionetworks/syndccutils/blob/master/R/scripts/get-geo-annotations.R

extracts per-sample (per-file) links to BioSample. For example, the script returns the following for GSE109089:

geo_accession relation
GSM2931519 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN08354877; SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX3554420
GSM2931520 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN08354876; SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX3554421
...
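A minimal Python sketch of pulling the accessions out of the relation column, assuming the "BioSample: URL; SRA: URL" layout shown above. The helper name `parse_relation` and the accession patterns (SAMN/SAME/SAMD prefixes for BioSample, SRX/ERX/DRX for SRA experiments) are my own assumptions, not part of get-geo-annotations.R.

```python
import re

# Assumed accession shapes: BioSample IDs start with SAMN/SAME/SAMD (SAMEA for EBI),
# SRA experiment IDs start with SRX/ERX/DRX. Both are matched as whole tokens.
BIOSAMPLE_RE = re.compile(r"\bSAM[NED][A-Z]?\d+\b")
SRA_RE = re.compile(r"\b[SED]RX\d+\b")


def parse_relation(relation: str) -> dict:
    """Hypothetical helper: return the BioSample and SRA accessions in a relation string."""
    return {
        "biosample": BIOSAMPLE_RE.findall(relation),
        "sra": SRA_RE.findall(relation),
    }


row = ("BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN08354877; "
       "SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX3554420")
print(parse_relation(row))  # {'biosample': ['SAMN08354877'], 'sra': ['SRX3554420']}
```

Matching the accession tokens rather than the full URLs keeps this working if the link format changes slightly, as long as the accessions themselves are present.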

Now, can we backtrack and get the BioSample "dataset" associated with SAMN08354877 and SAMN08354876?

Nothing comes up when I google "SAMN08354876".
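One possible way to backtrack, sketched under assumptions: NCBI E-utilities `esearch` can map an accession such as SAMN08354876 to a numeric Entrez UID, and `elink` can then list records linked to that UID in another Entrez database. Treating the linked BioProject as the "dataset" is an assumption on my part, and this sketch only builds the request URLs; it does not fetch or parse the responses. The function names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(accession: str, db: str = "biosample") -> str:
    """URL that asks Entrez for the numeric UID of an accession (e.g. SAMN08354876)."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": accession})


def elink_url(uid: str, dbfrom: str = "biosample", db: str = "bioproject") -> str:
    """URL that asks Entrez for records in `db` linked to the given UID from `dbfrom`."""
    return f"{EUTILS}/elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uid})


# Step 1: look up the UID; step 2 would pass the UID from that response to elink_url().
print(esearch_url("SAMN08354876"))
```

The two-step design mirrors how E-utilities works: `elink` takes UIDs rather than accessions, so the accession has to be resolved first.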

jaeddy added the enhancement (New feature or request) label Apr 29, 2020

jaeddy commented Apr 29, 2020

Thanks, @bswhite! I'll take a closer look.

And yes, I recognize that all the scripts I keep asking about apparently live in the repo for the package I created...


bswhite commented Apr 29, 2020

As far as I can tell, sex is embedded in the GEO characteristics field (characteristics_ch1 in the script output), though it doesn't appear to be required and the format doesn't appear to be standardized. That said, from a few examples, it seems to take forms like:
sex: m
sex: Female
gender: Male
gender: f
i.e., the key can be either "sex: " or "gender: ", the value can be "m"/"f" or "male"/"female", and either may or may not be capitalized.
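A rough Python sketch of that pattern match. The regex and the `extract_sex` name are hypothetical; it only recognizes a single m/f/male/female word directly after "sex:" or "gender:" (case-insensitive) and returns None for anything else, such as "Not available".

```python
import re
from typing import Optional

# Key: "sex" or "gender"; value: male/female/m/f; any capitalization.
# The trailing \b stops "m" from matching the start of longer words like "mixed".
SEX_RE = re.compile(r"\b(?:sex|gender)\s*:\s*(male|female|m|f)\b", re.IGNORECASE)


def extract_sex(characteristics: str) -> Optional[str]:
    """Return 'male', 'female', or None if no recognizable sex annotation is present."""
    match = SEX_RE.search(characteristics)
    if match is None:
        return None
    return "male" if match.group(1).lower().startswith("m") else "female"


for example in ["sex: m", "Sex: Female", "gender: Male", "gender: f", "gender: Not available"]:
    print(example, "->", extract_sex(example))
```

Normalizing to two canonical values up front means downstream code never has to care which of the spellings a given GEO series used.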

Table 1 of this publication lists a number of GEO datasets annotated with male/female sex:
https://link.springer.com/article/10.1007/s00204-015-1632-4#Tab1
This list may be biased toward particular ways of specifying sex, but it may also reveal alternative ones.

I have included three examples from this table below, each generated with a command like:
Rscript ./get-geo-annotations.R --gse=GSE19188 > GSE19188-metadata.tsv
Evidently, I can't attach TSV files here. Blah.

I suggest we just grep/pattern-match for these common cases; we don't need to catch every dataset.

Here are a few examples:

pattern: gender: M

$ cut -f2 GSE19188-metadata.tsv | head -3
characteristics_ch1
tissue type: tumor;cell type: LCC;overall survival: 12.5;status: deceased;gender: M
tissue type: healthy;cell type: healthy;overall survival: Not available;status: Not available;gender: Not available

pattern: Sex: Female

$ cut -f2 GSE14814-metadata.tsv | head -3
characteristics_ch1
tissue: primary lung cancer;Post Surgical Treatment: OBS;Stage: II;age: 44.9;Sex: Female;Cause of death: Alive;Histology type: ADC;OS time: 8.52;OS status: Alive;DSS time: 8.52;DSS status: Alive;predominant subtype: Acinar
tissue: primary lung cancer;Post Surgical Treatment: OBS;Stage: I;age: 53.4;Sex: Male;Cause of death: Alive;Histology type: SQCC;OS time: 9.03;OS status: Alive;DSS time: 9.03;DSS status: Alive;predominant subtype: not applicable

pattern: Sex: m

$ cut -f2 GSE33113-metadata.tsv | head -3
characteristics_ch1
disease status: AJCC stage II CRC;tissue: primary tumor resection;age at diagnosis: 41,6;Sex: m;meta or recurrence within 3 years: no;time to meta or recurrence: 2000
disease status: AJCC stage II CRC;tissue: primary tumor resection;age at diagnosis: 66,06;Sex: m;meta or recurrence within 3 years: no;time to meta or recurrence: 140

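Since the characteristics_ch1 values above are `key: value` pairs separated by semicolons, another option is to parse each row into a dictionary and then look up a "sex" or "gender" key. A minimal sketch, assuming no value itself contains a semicolon; `parse_characteristics`, `sex_from_characteristics`, and the `SEX_VALUES` table are hypothetical and cover only the spellings seen in this thread.

```python
from typing import Dict, Optional


def parse_characteristics(field: str) -> Dict[str, str]:
    """Split 'key: value;key: value' into a dict with lower-cased, trimmed keys."""
    pairs: Dict[str, str] = {}
    for item in field.split(";"):
        key, sep, value = item.partition(":")  # split on the first colon only
        if sep:  # skip fragments with no "key:" part
            pairs[key.strip().lower()] = value.strip()
    return pairs


# Only the value spellings observed in this thread; anything else maps to None.
SEX_VALUES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def sex_from_characteristics(field: str) -> Optional[str]:
    """Return 'male'/'female' from a sex or gender key, or None if absent or unrecognized."""
    pairs = parse_characteristics(field)
    for key in ("sex", "gender"):
        if key in pairs:
            return SEX_VALUES.get(pairs[key].lower())  # None for e.g. "Not available"
    return None


row = ("disease status: AJCC stage II CRC;tissue: primary tumor resection;"
       "age at diagnosis: 41,6;Sex: m;meta or recurrence within 3 years: no;"
       "time to meta or recurrence: 2000")
print(sex_from_characteristics(row))  # male
```

Keying on the field name rather than scanning the whole string avoids accidental matches inside other values, at the cost of missing datasets that use some other key for sex.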
vpchung transferred this issue from mc2-center/csbc-pson-dcc Aug 30, 2022
vpchung self-assigned this Aug 30, 2022