How to prepare a novel strain/isolate of a bacteria? #71
Comments
Hi Peter, As for strain, we submit using their API interface, so we have to provide a header in the EMBL file, which then gets overwritten with whatever metadata is in the BioSample. It's possible they have moved the goalposts again in the week since we last submitted data...
Ah. My hunch was right, and yes - this is exactly the chicken-and-egg situation I am facing. Could you elaborate on what you meant by using a temporary taxa?
See enasequence/sequencetools#15. This error turned out to be with the validator's internal settings:
However, to avoid this error I currently need to manually edit the source feature in my EMBL file:
Perhaps for people like me using the ENA webin (web interface), rather than the API, there needs to be an extra set of options on [Update: Human error, see below - I was not giving the full organism name to
(I've not actually submitted this new sequence yet - but I intend to try using the genus level taxid as before)
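For anyone wanting to script the manual edit of the source feature described above, here is a minimal Python sketch. It assumes the standard EMBL feature table layout (feature key starting in column 6, qualifiers in column 22). The function name `add_source_qualifiers` and the example organism and strain values are hypothetical and are not part of `gff3_to_embl` or any ENA tool.

```python
# Hypothetical helper, not part of gff3_to_embl or any ENA tool.
# Assumes the EMBL feature table layout: qualifiers start at column 22.
QUALIFIER_PREFIX = "FT" + " " * 19


def add_source_qualifiers(embl_text, qualifiers):
    """Return embl_text with extra qualifiers appended to each source feature.

    Qualifiers already present on a source feature are left untouched.
    """
    lines = embl_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        # The feature key occupies columns 6-20 of an FT line.
        if line.startswith("FT") and line[5:21].strip() == "source":
            block = []
            # Collect the qualifier lines that belong to this feature.
            while i < len(lines) and lines[i].startswith(QUALIFIER_PREFIX):
                block.append(lines[i])
                i += 1
            out.extend(block)
            present = {
                entry[len(QUALIFIER_PREFIX) + 1:].split("=", 1)[0]
                for entry in block
                if entry[len(QUALIFIER_PREFIX):].startswith("/")
            }
            for key, value in qualifiers.items():
                if key not in present:
                    out.append(f'{QUALIFIER_PREFIX}/{key}="{value}"')
    return "\n".join(out) + "\n"


# Made-up example record fragment; "Serratia sp. XYZ" is a placeholder name.
example = (
    "FT   source          1..100\n"
    'FT                   /organism="Serratia sp. XYZ"\n'
    'FT                   /mol_type="genomic DNA"\n'
    "FT   CDS             1..50\n"
)
print(add_source_qualifiers(example, {"strain": "XYZ"}))
```

This only edits the text of the file. Whether the ENA validator then accepts it depends on the validator version and on the organism name and taxid used, as discussed in this thread.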
Hi Peter,
If you could edit your example above on GitHub to wrap it in triple back-ticks, GitHub will render it as a code block, and preserve the white space (so I can copy and paste it for testing here). I suspect the key difference is your example has a taxid for a full species name, Staphylococcus aureus taxon 1280. What happens if you change the example to pretend you have a new species/strain without a pre-existing taxon id, say Staphylococcus sp. XYZ, and try either taxon 1279 (Staphylococcus) or 29387 (Staphylococcus sp.)?
Here's the file (as a file). So the genus taxon 1279 (Staphylococcus) gets through the validator, but you'll get an email in a few days/weeks informing you that the 'computer says NO'.
Confirmed using
This was my problematic version:
I can pass validation by adding So there were at least two problems: I was not telling I hope to submit this week, anticipating a query back about this being a novel species without a taxon ID. I will report back later with an update for future readers of this issue. Thanks!
Good luck with your submission!
Update on the ENA side of interest: http://listserver.ebi.ac.uk/pipermail/ena-announce/2017-January/000165.html
Thanks
(Some months back I did this successfully to submit a new strain from a different genus, so while I might be doing something wrong/different, I suspect the ENA validator has become stricter in the meantime)
For an un-named Serratia which does not (yet) have a unique NCBI taxonomy entry - the parent would be Serratia, taxid 613, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=613&lvl=3&lin=f&keep=1&srchmode=1&unlock
I have tried that, and the entry Serratia sp., taxid 616, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=616&lvl=5&lin=f&keep=1&srchmode=1&unlock
Either taxid approach fails validation:
Here line 17 was the `source` feature. Manually editing the EMBL file to add a `strain` qualifier to the feature worked for me, but what exactly it wants for species name eludes me.

Am I missing something simple?

[Update: Yes, I was not giving the full organism name to `gff3_to_embl`, but also there was a problem with this version of the validator]

Should `gff3_to_embl` have options for inserting source feature qualifiers "strain, environmental_sample, isolate" (or should I have done this in prokka)?

Thanks!