Transgene curation oa

Transgene postgres tables:

trp_curator	trp_coinjection (dumped as of 4/19/13)	trp_clone (not dumped)	trp_marker_for_paper
trp_name	trp_public_name	trp_synonym	trp_summary
trp_driven_by_gene	trp_reporter_product	trp_other_reporter	trp_gene
trp_integration_method	trp_strain	trp_map	trp_map_person
trp_map_paper	trp_marker_for	trp_paper	trp_laboratory
trp_person	trp_reporter_type	trp_threeutr	trp_remark
trp_species	trp_driven_by_construct	trp_movie	trp_picture
trp_constructionsummary	trp_cgc_remarks (not dumped)

OA and dumper changes

WS238:

add trp_coinjection to dumper as Coinjection marker

WS237:

added trp_constructionsummary for OA and dumper as Construction summary
added trp_cgc_remarks for OA

WS229:

Added WBTransgeneIDs to transgene objects. This required creating a public_name field (trp_public_name) for the transgene, so that expression constructs created by the expression curator can be merged with transgene objects through postgres.
WBTransgeneIDs need to be unique, so when duplicating lines in the OA the IDs must be adjusted to stay unique. This leaves two possible problems, objects with no ID and objects sharing an ID, and two periodic checks can be made to make sure the process is working smoothly:
- 1. Find all objects missing WBTransgeneIDs:
  SELECT * FROM trp_curator WHERE joinkey NOT IN (SELECT joinkey FROM trp_name);
- 2. Find all objects with the same WBTransgeneID:
  SELECT trp_name, COUNT(*) AS count FROM trp_name GROUP BY trp_name HAVING COUNT(*) > 1;
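
The two checks above can be tried out without touching the live database. The following is a minimal sketch, assuming a simplified two-column layout (joinkey plus value) loaded into an in-memory SQLite database; the real postgres tables may have more columns, the sample rows are made up, and `build_demo_db` and `find_id_problems` are hypothetical helper names. The first query selects only the joinkey rather than `*` to keep the output short.

```python
import sqlite3

# Assumed, simplified layout: one joinkey column and one value column per table.
SCHEMA = """
CREATE TABLE trp_curator (joinkey TEXT, trp_curator TEXT);
CREATE TABLE trp_name (joinkey TEXT, trp_name TEXT);
"""


def build_demo_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO trp_curator VALUES (?, ?)",
        [("1", "curator_a"), ("2", "curator_a"),
         ("3", "curator_a"), ("4", "curator_a")],
    )
    conn.executemany(
        "INSERT INTO trp_name VALUES (?, ?)",
        [("1", "WBTransgene00000001"),
         ("2", "WBTransgene00000001"),  # duplicated ID on purpose
         ("4", "WBTransgene00000004")],  # joinkey 3 has no ID on purpose
    )
    return conn


def find_id_problems(conn):
    """Return (joinkeys with no WBTransgeneID, WBTransgeneIDs used more than once)."""
    missing = [row[0] for row in conn.execute(
        "SELECT joinkey FROM trp_curator "
        "WHERE joinkey NOT IN (SELECT joinkey FROM trp_name) "
        "ORDER BY joinkey"
    )]
    duplicated = [row[0] for row in conn.execute(
        "SELECT trp_name FROM trp_name "
        "GROUP BY trp_name HAVING COUNT(*) > 1 "
        "ORDER BY trp_name"
    )]
    return missing, duplicated


if __name__ == "__main__":
    missing, duplicated = find_id_problems(build_demo_db())
    print("missing IDs for joinkeys:", missing)
    print("duplicated IDs:", duplicated)
```

Both lists should be empty when the ID assignment is working as intended.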

Added a nightly cron job to assign new WBTransgeneIDs.
Yook's cron jobs are on tazendra; log on as acedb, then run crontab -e to see them.
The crontab entry is:
0 4 * * * /home/acedb/karen/transgene/assign_transgene_IDs.pl
If it shouldn't run, log on to tazendra as acedb and comment it out.

The cron job runs every night at 4am and assigns a new WBTransgeneID to any row/PGID in the transgene OA that meets the following criteria:

The transgene object/PGID does NOT already have a WBTransgeneID
The transgene object/PGID is NOT flagged as FAIL

N.b. this script is based on the interaction ID cron job script
/home/acedb/xiaodong/assigning_interaction_ids/assign_interaction_ids.pl

What the script does:

It looks at data from trp_name, trp_objpap_falsepos and trp_curator.
Anything that exists in trp_curator and has neither a trp_name nor a trp_objpap_falsepos entry gets an ID assigned by padding the joinkey to 8 digits and adding WBTransgene in front; the ID is written to trp_name and trp_name_hst.
If any of those table names ever change, this script will not work properly. Note that interaction and protein call the "False Positive" tables 'falsepositive' instead of objpap_falsepos.
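
The ID rule described above (pad the joinkey to 8 digits, prefix with WBTransgene, skip anything already named or flagged as a false positive) can be sketched as follows. This is an illustration in Python, not the Perl cron script itself; `make_transgene_id` and `joinkeys_needing_ids` are hypothetical names, and the inputs are plain collections of joinkeys rather than database rows.

```python
def make_transgene_id(joinkey):
    """Pad the joinkey to 8 digits and put 'WBTransgene' in front."""
    return "WBTransgene" + str(joinkey).zfill(8)


def joinkeys_needing_ids(curator_keys, named_keys, falsepos_keys):
    """Joinkeys present in trp_curator with neither a name nor a FAIL flag."""
    pending = set(curator_keys) - set(named_keys) - set(falsepos_keys)
    # Sort numerically so the output order is stable and easy to read.
    return sorted(pending, key=int)


if __name__ == "__main__":
    todo = joinkeys_needing_ids(
        curator_keys=["1", "2", "3", "10"],
        named_keys=["1"],
        falsepos_keys=["3"],
    )
    for key in todo:
        print(key, make_transgene_id(key))
```

Because the ID is derived from the joinkey, it is unique as long as joinkeys are unique; duplicated OA lines are the case where the periodic checks matter.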

WS???:

deleted trp_rescues table

WS227:
1. Change the 'Rescues' field to a variation list, not gene list and move to top of tab 2
There are 33 genes that you'd have to change, you'd have to remove them, then change the type to variation, and then you could add them again. just get rid of them (it is fine that they are stored in a backup). I will start from scratch from now on with variations.

I can easily delete them from the datatable and the history table, if that's what you want (please confirm)
- Consider it confirmed
  - Removed from postgres, changed to variation, moved to tab2.

2. change "Location" to "Laboratory" with corresponding changes in the dumper

You mean the Label and the .ace tag ? When should we do that, I mean, it will go in before the next upload ?
- yes
Or do we need model approval first ?
- It has already been approved (in fact it was approved many times), but it will be part of WS227.
I'll be glad to change the postgres table name (but would rather do it all at once).
- Now is the time to do it all at once.
Ok, will do. (would be good to have this all in one wiki, so I don't lose track of what you gave the go-ahead to do, and what's still back and forth questions)
- okay, it's on the wiki now.
Also, where's the dumper ?
- /home/acedb/wen/phenote_transgene/transgene_dump_ace.pl --you can also see Transgene.ace dumper
  - Changed OA label and dumper and postgres table to trp_laboratory

3. Change "Integrated_by" to "Integration_method"

OA Label + .ace tag ?
- yes, both
Will do in future OA -- done, changed table to trp_integration_method

4. Remove the value "not_integrated" from the drop down list

Once you remove the data we can do it, once removed you won't be able to query it.
- I won't need to query it as it is a value that is based on the 'Ex' in the name of the transgene, so I can query all 'Extrachromosomal' by virtue of the transgene name.
You'll need to query it to remove the 2951, let me know when you're done. Karen did it, I've removed it -- J

5. Add values of "MMS mutagenesis" and "Single copy insertion" to the drop down list for 'Integration_method'

We can do that. (make sure the case is correct)
- Case is now correct.
Ok -- added values

6. Add a field 'Reporter type' with a drop down list containing values "Transcriptional fusion" and "Translational fusion"; add the field to bottom of tab 1 above 'Remark'

We'll do that (a wiki would be good, the models wiki doesn't have all these changes)
- The new models does. The line is 'Reporter_type' ?Text
Sorry, I meant a wiki on these proposed changes. Also, confirm case, but will assume it's correct.
- the cases are correct.
  - added table trp_reporter_type*

7. Add Person evidence multi-ontology field to tab three below paper

Same
- Shoot, you are right, please add it, don't dump it yet, and I will request it from Paul.
will do (all together when all is clear)
  - added table trp_person and changed trp_reference to trp_paper*

8. Move Remark field to bottom of tab one

We can do this now, but I'm hoping the future OA will finally be approved so I don't have to make all these changes in two places. -- moved
- I approve the new fOA.
Yay ! <party hat>

9. Remove 'Picture'; delete the table as far as transgenes are concerned
10. Remove 'Movie'; delete the table as far as transgenes are concerned

Easy to do, let me know when we can do it.
- anytime, there are no values in these tables for transgenes.
Will do (all together) -- done -- J

Ontology Annotator Curation

The Textpresso transgene search deposits the transgene name and all new paper instances of the transgene directly into the transgene table.

The curation of transgenes has moved from Phenote to the Ontology Annotator (OA).

TAB1:
Pgdbid->postgres database ID, entered automatically when curator enters a new transgene

Name(Text)--trp_name-- -> approved name following Lab-prefix (or WBPaperID), Is or Ex, number.
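
The naming rule above (lab prefix or WBPaperID, then Is or Ex, then a number) can be checked mechanically. The sketch below is an assumption-laden illustration: it assumes lab prefixes are lowercase letters and that a WBPaperID is "WBPaper" followed by 8 digits, and the example names are made up. `parse_transgene_name` is a hypothetical helper, not part of the OA.

```python
import re

# Assumed pattern: lowercase lab prefix OR WBPaper + 8 digits, then Is/Ex, then a number.
NAME_PATTERN = re.compile(
    r"^(?P<prefix>[a-z]+|WBPaper\d{8})(?P<kind>Is|Ex)(?P<number>\d+)$"
)


def parse_transgene_name(name):
    """Split a transgene name into its parts, or return None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    kind = match.group("kind")
    return {
        "prefix": match.group("prefix"),
        "kind": kind,
        "number": int(match.group("number")),
        # 'Ex' in the name marks an extrachromosomal array.
        "extrachromosomal": kind == "Ex",
    }


if __name__ == "__main__":
    for example in ["syIs123", "WBPaper00012345Ex1", "not a transgene"]:
        print(example, "->", parse_transgene_name(example))
```

Names that return None are candidates for the non-canonical cases that need a manually entered Laboratory value.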

Synonym(T,M)--trp_synonym-- -> keep Free text, separate with a pipe- other names for the transgene or construct

Summary(BigText)-> genotype only, with everything bounded in brackets; all other information should be added in the Remark field. If papers report conflicting genotypes, also use the Remark field with controlled vocabulary "Conflicting genotype: ..."; if there is no information, enter "No transgene info in original publication." in the Remark field.

Driven by Gene(T,MO)-> entry by gene's public_name, converted to WBGeneID based on the latest genename server version; make sure all entries per line are unique. Enter the WBGeneID used for promoters in every promoter-driven construct of the transgene.

Reporter Product(S,M,L)-> list has common reporter genes (heterologous in C. elegans): GFP, RFP, LacZ, etc. These values are listed in the model, and any changes to the list need to be appended to the model as well (see Transgene model). Changed in model; still should be a drop down menu, need access to file to add/change values.

Other Reporter(T,M)-> (enter with pipes) enter other products encoded as reporters that do not appear in the drop down list. Ideally, entries added through here can be added automatically to the reporter product drop down list.
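
Pipe-separated free-text fields such as Synonym and Other Reporter can be split into individual values as sketched below. This is a minimal illustration, assuming that whitespace around each value and empty entries should be dropped; `split_pipe_field` is a hypothetical helper and the sample values are made up.

```python
def split_pipe_field(value):
    """Split a pipe-separated field into a list of trimmed, non-empty entries."""
    return [part.strip() for part in value.split("|") if part.strip()]


if __name__ == "__main__":
    print(split_pipe_field("mCherry | YFP|"))
```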

Gene(T,MO)-> make selection list; allow multiple entry by public_name, converted to WBGeneID based on the latest genename server version; make sure all entries are unique. Enter the WBGeneID for the protein output of the construct that isn't considered a reporter product.

Rescues(O) (move to top of tab 2)-> gene, only allows a single gene entry. Delete all info in this table based on gene; change this field to a variation ontology; change .ace to start dumping this info.

Coinjection (T) -> field not dumped

Add a field 'Reporter type' -> drop down list containing values "Transcriptional fusion" and "Translational fusion"; add the field to bottom of tab 1 above 'Remark'

TAB2
Remark(T,M) (move to bottom of tab 1 below 'Reporter_type')-> Big text. Catch-all used for clarifying info from other fields and for entering construct specifics, including co-injection marker. In some cases use controlled vocabulary:

"Conflicting mapping info: ..."
"Conflicting genotype: ..."
"No transgene info in original publication."
"Other integration method: ..."
"Mapping info: "

Clone(MO)-> waiting for plasmid class to get up to speed. ->Not dumped.

Integrated by(S) (change to 'Integration_method')-> dropdown. Choose the integration method if known; if the integration method is not listed, use the Remark field and controlled vocabulary: "Other integration method: ...". Changed in model: remove the 'not_integrated' value; still should be a drop down menu, need access to file to add/change values. Add values of "MMS mutagenesis" and "Single copy insertion".

Map(S,ML)-> keep together with other Map fields - choose LG(s) of integrated array if known, if papers report differing map positions use Remark field and controlled vocabulary "Conflicting mapping info: ..."

Map Paper(T,MO)-> make selection list - WBPaperID for paper that reports mapping info or that performed the mapping

Map Person(T,MO) -> Name, person evidence for Mapping data

Location(T,MO) (change name to Laboratory) -> Lab designation, from static file obo_data_laboratory. The field is now only used to populate values for transgenes that do not use canonical nomenclature; laboratory values for other transgenes are assigned automatically through the .ace dumper script.

Strain(T,M) -> SOP to only link to strains that exist in the CGC, which will be done by cross referencing from CGC's transgene tags submitted with strain info.

TAB3
Curator(O)

Reference(T,MO)-> WBPaperID, generally autofilled by Textpresso cron job script and bulk upload of Ex search script

Add Person (MO) multi-ontology field of WBPerson obo to tab three below paper

Marker for(T,M)-> for Wen's expression data

Marker Paper(T,M)-> same as above- WBPaperID, not used ->Wen's expression data

Species(T,M?)-> if this is for species the construct is expressed in, can we make this default C. elegans unless otherwise stated, and can we make this a selection list?

Driven by Construct(T,M)-> for artificial promoters

Remove Movie(T)

Remove Picture(T)

FAIL -> Toggle for false positive textpresso hits

Left over from Phenote

Invoke the phenote transgene configuration interface and access postgres: ./phenote -c worm-transgene.cfg

If you want to see all the current 'new' transgenes picked up by Textpresso, go to Tab 3 and press the "Search New Transgene" retrieve button. This action will retrieve all transgene objects that have no data in the Summary or Remark fields. Usually there will be paper object info already since they were entered from the Textpresso search.

Curators should look for the information of new transgenes in the paper document provided by Textpresso (main paper or supplementary file).

Sometimes papers do not provide any information on the transgenes, only the name is provided. Then "No transgene info in original publication." should be entered into the Remark field so that it will not be identified as a new transgene again.

Here is the controlled vocabulary for the transgene remark field:

Remark "Conflicting mapping info: ..."
Remark "Conflicting genotype: ..."
Remark "No transgene info in original publication."
Remark "Other integration method: ..."
Remark "Clone = "
Remark "Mapping info: "

Fields that existed in Phenote but are unknown/unused on the OA:
Search New Transgene(T)->Retired from Phenote. The query used to retrieve all transgenes that do not have any summary or remark data; these would be all the transgenes entered by the script. Since we can now find them through curator (Arun), this query field is not necessary.

SQL-> used?

Problems

New lines are entered by mistake through copy/paste or through hitting return in the free text fields of the OA during curation. There is no way to see these line breaks through the OA, so they are entered into postgres and dumped as new lines in the .ace, which causes read-in errors. This needs to be addressed either through the OA (constraints?) or through the .ace dumper.

I thought I'd already fixed this in the email I sent before you sent the email about this wiki, did you get that ? They're getting globally replaced with spaces -- J

It is fixed.
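
The fix described above (globally replacing new lines with spaces) amounts to something like the following. This is a sketch in Python rather than the actual OA or dumper code; `flatten_newlines` is a hypothetical name, and collapsing a run of consecutive line breaks into a single space is an assumption about the desired behaviour.

```python
import re


def flatten_newlines(value):
    """Replace each run of carriage returns / line feeds with a single space."""
    return re.sub(r"[\r\n]+", " ", value)


if __name__ == "__main__":
    pasted = "Conflicting genotype:\r\nsee paper"
    print(flatten_newlines(pasted))
```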
