Transgene curation OA
Transgene postgres tables:
trp_curator | trp_coinjection (dumped as of 4/19/13) | trp_clone (not dumped) | trp_marker_for_paper |
trp_name | trp_public_name | trp_synonym | trp_summary |
trp_driven_by_gene | trp_reporter_product | trp_other_reporter | trp_gene |
trp_integration_method | trp_strain | trp_map | trp_map_person |
trp_map_paper | trp_marker_for | trp_paper | trp_laboratory |
trp_person | trp_reporter_type | trp_threeutr | trp_remark |
trp_species | trp_driven_by_construct | trp_movie | trp_picture |
trp_constructionsummary | trp_cgc_remarks (not dumped) |
WS238:
- added trp_coinjection to dumper as Coinjection marker
WS237:
- added trp_constructionsummary for OA and dumper as Construction summary
- added trp_cgc_remarks for OA
WS229:
- Added WBTransgeneIDs to transgene objects. This required creating a public_name field (trp_public_name) for the transgene, so that expression constructs created by the expression curator can be merged with transgene objects through postgres.
This presents two possible problems: an object can end up without a WBTransgeneID, or two objects can share one. WBTransgeneIDs need to be unique, so when duplicating lines in the OA, make sure the IDs are adjusted to be unique. Two periodic checks can be made to make sure this process is working smoothly:
- 1. Find all objects missing WBTransgeneIDs:
SELECT * FROM trp_curator WHERE joinkey NOT IN (SELECT joinkey FROM trp_name);
- 2. Find all objects with the same WBTransgeneID:
SELECT trp_name, COUNT(*) AS count FROM trp_name GROUP BY trp_name HAVING COUNT(*) > 1;
- added nightly cron job to assign new WBTransgeneIDs
Yook cronjobs on tazendra: log on as acedb, then run crontab -e to see the cronjobs.
The crontab entry for the script is:
0 4 * * * /home/acedb/karen/transgene/assign_transgene_IDs.pl
If it shouldn't run, log on to tazendra as acedb and comment it out.
The cron job runs every night at 4am to assign a new WBTransgeneID to any row/PGID in the transgene OA that does not already have a WBTransgeneID and that meets the following criteria.
Criteria for getting a new WBTransgeneID are as follows:
- The transgene object/PGID does NOT already have a WBTransgeneID
- The transgene object/PGID is NOT flagged as FAIL
N.B. This script is based on the interaction ID cron job script:
/home/acedb/xiaodong/assigning_interaction_ids/assign_interaction_ids.pl
What the script does:
- Looks at data from trp_name, trp_objpap_falsepos, and trp_curator.
- Anything that exists in trp_curator and has neither a trp_name nor a trp_objpap_falsepos entry gets an ID assigned by padding the joinkey to 8 digits, adding WBTransgene in front, and adding the result to trp_name and trp_name_hst.
- If we ever change any of those table names, this script will not work properly; interaction and protein call the "False Positive" tables 'falsepositive' instead of objpap_falsepos.
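The assignment rule and the duplicate-ID check described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Perl script itself: the function names are hypothetical, the three tables are represented as plain in-memory collections rather than postgres queries, and joinkeys are assumed to be numeric strings. Nothing is written back to trp_name or trp_name_hst.

```python
from collections import Counter


def format_transgene_id(joinkey):
    """Pad the joinkey to 8 digits and put 'WBTransgene' in front."""
    return f"WBTransgene{int(joinkey):08d}"


def assign_missing_ids(curator_keys, names, false_positives):
    """Return {joinkey: new_id} for every row that qualifies for a new ID.

    curator_keys    -- joinkeys present in trp_curator (assumed numeric strings)
    names           -- {joinkey: WBTransgeneID} already stored in trp_name
    false_positives -- joinkeys present in trp_objpap_falsepos (flagged FAIL)
    """
    assigned = {}
    for key in sorted(curator_keys, key=int):
        # Skip rows that already have an ID or that are flagged as FAIL.
        if key in names or key in false_positives:
            continue
        assigned[key] = format_transgene_id(key)
    return assigned


def find_duplicate_ids(names):
    """Return the WBTransgeneIDs that are attached to more than one joinkey."""
    counts = Counter(names.values())
    return sorted(tid for tid, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical toy data standing in for the three postgres tables.
    curator_keys = {"1", "2", "3", "4"}
    names = {"1": "WBTransgene00000001"}
    false_positives = {"3"}
    print(assign_missing_ids(curator_keys, names, false_positives))
    print(find_duplicate_ids(names))
```

Because the ID is derived from the joinkey, a line duplicated in the OA gets its own joinkey but can carry over the copied trp_name value, which is the case the duplicate check is meant to catch.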
WS???:
- deleted trp_rescues table
WS227:
1. Change the 'Rescues' field to a variation list, not a gene list, and move it to the top of tab 2
There are 33 genes that you'd have to change: you'd have to remove them, then change the type to variation, and then you could add them again. Just get rid of them (it is fine that they are stored in a backup). I will start from scratch from now on with variations.
- I can easily delete them from the datatable and the history table, if that's what you want (please confirm)
  - Consider it confirmed
  - Removed from postgres, changed to variation, moved to tab2.
2. Change "Location" to "Laboratory" with corresponding changes in the dumper
- You mean the Label and the .ace tag? When should we do that? I mean, will it go in before the next upload?
  - yes
- Or do we need model approval first?
  - It has already been approved (in fact it was approved many times), but it will be part of WS227.
- I'll be glad to change the postgres table name (but would rather do it all at once).
  - Now is the time to do it all at once.
- Ok, will do. (would be good to have this all in one wiki, so I don't lose track of what you gave the go-ahead to do, and what's still back and forth questions)
  - okay, it's on the wiki now.
- Also, where's the dumper?
  - /home/acedb/wen/phenote_transgene/transgene_dump_ace.pl -- you can also see Transgene.ace dumper
  - Changed OA label and dumper and postgres table to trp_laboratory
3. Change "Integrated_by" to "Integration_method"
- OA Label + .ace tag?
  - yes, both
  - Will do in future OA -- done, changed table to trp_integration_method
4. Remove the value "not_integrated" from the drop down list
- Once you remove the data we can do it; once removed you won't be able to query it.
  - I won't need to query it, as it is a value that is based on the 'Ex' in the name of the transgene, so I can query all 'Extrachromosomal' by virtue of the transgene name.
  - You'll need to query it to remove the 2951, let me know when you're done. Karen did it, I've removed it -- J
5. Add values of "MMS mutagenesis" and "Single copy insertion" to the drop down list for 'Integration_method'
- We can do that. (make sure the case is correct)
  - Case is now correct.
  - Ok -- added values
6. Add a field 'Reporter type' with a drop down list containing values "Transcriptional fusion" and "Translational fusion"; add the field to the bottom of tab 1 above 'Remark'
- We'll do that (a wiki would be good, the models wiki doesn't have all these changes)
  - The new model does. The line is 'Reporter_type' ?Text
- Sorry, I meant a wiki on these proposed changes. Also, confirm case, but will assume it's correct.
  - the cases are correct.
  - added table trp_reporter_type*
7. Add Person evidence multi-ontology field to tab three below paper
- Same
  - Shoot, you are right, please add it, don't dump it yet, and I will request it from Paul.
- will do (all together when all is clear)
  - added table trp_person and changed trp_reference to trp_paper*
8. Move Remark field to bottom of tab one
- We can do this now, but I'm hoping the future OA will finally be approved so I don't have to make all these changes in two places. -- moved
  - I approve the new fOA.
  - Yay ! <party hat>
9. Remove 'Picture'; delete table in terms of transgene
10. Remove 'Movie'; delete table in terms of transgene
- Easy to do, let me know when we can do it.
  - anytime, there are no values in these tables for transgenes.
  - Will do (all together) -- done -- J
The Textpresso transgene search deposits the transgene name and all new paper instances of the transgene directly into the transgene table.
The curation of transgenes has moved from Phenote to the Ontology Annotator (OA).
TAB1:
Pgdbid->postgres database ID, entered automatically when curator enters a new transgene
Name(Text)--trp_name-- -> approved name following Lab-prefix (or WBPaperID), Is or Ex, number.
Synonym(T,M)--trp_synonym-- -> keep free text, separate with a pipe; other names for the transgene or construct
Summary(BigText)-> genotype only and everything bounded in brackets, all other information should be added in the Remark field. If papers report conflicting genotypes also use Remark field and controlled vocabulary "Conflicting genotype: ...", if no information enter "No transgene info in original publication." in Remark field.
Driven by Gene(T,MO)-> entry by gene's public_name convert to WBGeneID based on latest genename server version, make sure all entries per line are unique - enter WBGeneID used for promoters in every promoter driven construct of the transgene
Reporter Product(S,M,L)-> list has common reporter genes (heterologous in C. elegans): GFP, RFP, LacZ, etc. These values are listed in the model, and any changes to the list need to be appended to the model as well (see Transgene model). Changed in model; still should be a drop down menu, need access to file to add/change values.
Other Reporter(T,M)-> (enter with pipes) enter other products encoded as reporters that do not appear in the drop down list. Ideally, entries added through here can be added automatically to the reporter product drop down list.
Gene(T,MO)-> make selection list, allow multiple entry by public_name convert to WBGeneID based on latest genename server version, make sure all entries are unique - enter WBGeneID for protein output of construct, which isn't considered a reporter product
(move to top of tab 2) Rescues(O)-> gene, only allows a single gene entry; delete all info in this table based on gene; change this field to a variation ontology; change .ace to start dumping this info
Coinjection (T) -> field not dumped
Add a field 'Reporter type' -> drop down list containing values "Transcriptional fusion" and "Translational fusion" add field to bottom of tab 1 above 'Remark'
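The Name convention above (lab prefix or WBPaperID, then Is or Ex, then a number) can be checked mechanically. The sketch below is an illustration only, and the exact pattern is an assumption rather than anything the OA or the dumper is documented to use here: it takes the lab prefix to be 1 to 3 lowercase letters and a WBPaperID to be "WBPaper" followed by 8 digits. The function name and the example names are hypothetical.

```python
import re

# Assumed pattern: (lab prefix | WBPaperID) + ("Is" | "Ex") + number.
# The prefix length and the WBPaperID digit count are assumptions.
NAME_RE = re.compile(
    r"^(?P<prefix>[a-z]{1,3}|WBPaper\d{8})(?P<kind>Is|Ex)(?P<number>\d+)$"
)


def parse_transgene_name(name):
    """Split a canonical transgene name into its parts.

    Returns None when the name does not follow the assumed convention,
    for example a construct name entered as a synonym.
    """
    match = NAME_RE.match(name)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        # "Is" marks an integrated array, "Ex" an extrachromosomal one.
        "kind": "integrated" if match.group("kind") == "Is" else "extrachromosomal",
        "number": int(match.group("number")),
    }


if __name__ == "__main__":
    # Hypothetical example names.
    for example in ("syIs17", "WBPaper00012345Ex2", "pMH86"):
        print(example, parse_transgene_name(example))
```

A check of this kind is also how extrachromosomal transgenes can be found from the name alone, and how names outside the canonical nomenclature can be singled out.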
TAB2
(move to bottom of tab 1 below 'Reporter_type') Remark(T,M)-> Big text. Catch-all used for clarifying info from other fields, and for entering construct specifics, including co-injection marker. In some cases use controlled vocabulary:
- "Conflicting mapping info: ..."
- "Conflicting genotype: ..."
- "No transgene info in original publication."
- "Other integration method: ..."
- "Mapping info: "
Clone(MO)-> waiting for plasmid class to get up to speed. ->Not dumped.
(change to 'Integration_method') Integrated by(S)-> dropdown - choose integration method if known; if integration method is not listed, use Remark field and controlled vocabulary: "Other integration method: ...". Changed in model; remove 'not_integrated' value; still should be a drop down menu, need access to file to add/change values. Add values of "MMS mutagenesis" and "Single copy insertion".
Map(S,ML)-> keep together with other Map fields - choose LG(s) of integrated array if known, if papers report differing map positions use Remark field and controlled vocabulary "Conflicting mapping info: ..."
Map Paper(T,MO)-> make selection list - WBPaperID for paper that reports mapping info or that performed the mapping
Map Person(T,MO) -> Name, person evidence for Mapping data
(change to Laboratory) Location(T,MO) -> Lab designation, from static file obo_data_laboratory. Change name to Laboratory. The field is now only used to populate values for transgenes that do not use canonical nomenclature; laboratory values for other transgenes are assigned automatically through the .ace dumper script.
Strain(T,M) -> SOP to only link to strains that exist in the CGC, which will be done by cross referencing from CGC's transgene tags submitted with strain info.
TAB3
Curator(O)
Reference(T,MO)-> WBPaperID, generally autofilled by Textpresso cron job script and bulk upload of Ex search script
Add Person (MO) multi-ontology field of WBPerson obo to tab three below paper
Marker for(T,M)-> for Wen's expression data
Marker Paper(T,M)-> same as above- WBPaperID, not used ->Wen's expression data
Species(T,M?)-> if this is for species the construct is expressed in, can we make this default C. elegans unless otherwise stated, and can we make this a selection list?
Driven by Construct(T,M)-> for artificial promoters
Remove Movie(T)
Remove Picture(T)
FAIL -> Toggle for false positive textpresso hits
Invoke the phenote transgene configuration interface and access postgres: ./phenote -c worm-transgene.cfg
If you want to see all the current 'new' transgenes picked up by Textpresso, go to Tab 3 and press the "Search New Transgene" retrieve button. This action will retrieve all transgene objects that have no data in the Summary or Remark fields. Usually there will be paper object info already, since they were entered from the Textpresso search.
Curators should look for the information of new transgenes in the paper document provided by Textpresso (main paper or supplementary file).
Sometimes papers do not provide any information on the transgenes, only the name is provided. Then "No transgene info in original publication." should be entered into the Remark field so that it will not be identified as a new transgene again.
Here is the controlled vocabulary for the transgene remark field:
Remark "Conflicting mapping info: ..."
Remark "Conflicting genotype: ..."
Remark "No transgene info in original publication."
Remark "Other integration method: ..."
Remark "Clone = "
Remark "Mapping info: "
Unknown/unused fields on the OA but existed on Phenote:
Search New Transgene(T)-> Retired from Phenote. The query was used to retrieve all transgenes that do not have any summary or remark data; these would be all the transgenes entered by the script. Since we can now find them through curator (Arun), this query field is not necessary.
SQL-> used?
Newlines are entered by mistake through copy/paste or through hitting return in the free text fields of the OA during curation. There is no way to see these characters through the OA, so they are entered into postgres and dumped as newlines in the .ace, which creates read-in errors. This needs to be addressed either through the OA (constraints?) or through the .ace dumper.
I thought I'd already fixed this in the email I sent before you sent the email about this wiki, did you get that ? They're getting globally replaced with spaces -- J
It is fixed.
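The fix described above, replacing newlines with spaces, can be sketched as a small Python helper. This is an illustration only: the function name is hypothetical, and whether the actual replacement happens in the OA or in the dumper, and whether it also collapses runs or trims the ends as done here, is not stated in these notes.

```python
import re


def flatten_newlines(value):
    """Replace each run of CR/LF characters with one space and trim the ends."""
    return re.sub(r"[\r\n]+", " ", value).strip()


if __name__ == "__main__":
    # Hypothetical Remark value with a stray return typed during curation.
    print(flatten_newlines("Conflicting genotype:\nsee paper\r\n"))
```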