automatically pick a result when multiple hits returned? classification() #890
Comments
@zachary-foster maybe i'm not remembering but I think
Yea,
classification('Asterina', db = 'ncbi')
classification('Asterina', db = 'ncbi', rows = 1)
Although, you might be looking for a fungus and get a starfish : )
Ah, thanks! Yes, this does what I want. For now I'll go with rows = 1, knowing I might get some weird results from blindly picking the first row every time. There will be more hands-on QC downstream in this case, so I think this fix will work. Thanks, team!
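For an unattended cluster run, the rows = 1 approach could be wrapped so that one failing name does not stop the whole loop. This is only a minimal sketch: `classify_first_hit` and its `lookup` argument are hypothetical names, not part of taxize, and the default lookup assumes the NCBI call shown above.

```r
# Sketch only: `classify_first_hit` and `lookup` are hypothetical names,
# not taxize functions.
classify_first_hit <- function(taxa,
                               lookup = function(x) {
                                 # rows = 1 takes the first hit instead of prompting
                                 taxize::classification(x, db = "ncbi", rows = 1)
                               }) {
  results <- lapply(taxa, function(taxon) {
    tryCatch(
      lookup(taxon),
      error = function(e) {
        # Report the failure and keep going with the next name
        message("Lookup failed for '", taxon, "': ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- taxa
  results
}
```

Names whose lookup failed come back as NULL entries, so they are easy to pull out for the downstream QC step. The `lookup` argument is injectable mainly so the wrapper can be tried without hitting the NCBI API.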
I don't know why these filters aren't integrated, but to get the genus from the right division you can do:
taxize::classification(taxize::get_uid('Asterina', division_filter = "ascomycete fungi")[1], "ncbi")
taxize::classification(taxize::get_uid('Asterina', division_filter = "starfish")[1], "ncbi")
There are similar functions to get IDs for each database. The filters vary by function (different APIs).
@salix-d Good observation. I will try to look into making that an option for
For some reason in the classification function, ncbi is the only one to not have Although the argument's name changes between db and idk that all dbs have that option (I know itis doesn't).
Yea, I have been wanting to go through and make them all more consistent. Ideally even combine all the
library(taxize)
taxize::classification('Asterina', division_filter = "ascomycete fungi", db = "ncbi")
#> ══ 1 queries ═══════════════
#>
#> Retrieving data for taxon 'Asterina'
#> ✔ Found: Asterina
#> ══ Results ═════════════════
#>
#> • Total: 1
#> • Found: 1
#> • Not Found: 0
#> $Asterina
#>                              name         rank      id
#> 1              cellular organisms      no rank  131567
#> 2                       Eukaryota superkingdom    2759
#> 3                    Opisthokonta        clade   33154
#> 4                           Fungi      kingdom    4751
#> 5                         Dikarya   subkingdom  451864
#> 6                      Ascomycota       phylum    4890
#> 7                  saccharomyceta        clade  716545
#> 8                  Pezizomycotina    subphylum  147538
#> 9                    leotiomyceta        clade  716546
#> 10                 dothideomyceta        clade  715962
#> 11                Dothideomycetes        class  147541
#> 12 Dothideomycetes incertae sedis      no rank  159987
#> 13                    Asterinales        order 1619909
#> 14                   Asterinaceae       family  281108
#> 15                       Asterina        genus  859380
#>
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"
Created on 2023-03-09 with reprex v2.0.2
I am trying to run classification() in a loop on a computing cluster to look up lineage info for a large list of species-ish names. Right now I am getting an error on the cluster that, based on all of my debugging, comes from the entries that return multiple hits and then pop up a prompt asking for the number of the hit you want to select. Is it possible to have classification() automatically pick the first hit when multiple hits are found? Or do you have another suggestion for bypassing this issue when running automatically on a cluster? Thanks!
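Another way to avoid the prompt, sketched here under assumptions, is to resolve IDs first with `ask = FALSE`, which makes `get_uid()` return NA instead of asking when several hits come back. The ambiguous names can then be set aside for manual review rather than silently taking the first row. `split_ambiguous` and its `get_id` argument are hypothetical names used only for illustration.

```r
# Sketch only: `split_ambiguous` and `get_id` are hypothetical names,
# not taxize functions.
split_ambiguous <- function(taxa,
                            get_id = function(x) {
                              # ask = FALSE: NA instead of an interactive prompt
                              as.character(taxize::get_uid(x, ask = FALSE))
                            }) {
  ids <- vapply(taxa, get_id, character(1), USE.NAMES = FALSE)
  names(ids) <- taxa
  list(
    # IDs that resolved to a single hit, named by the query
    resolved = ids[!is.na(ids)],
    # NA covers both "multiple hits" and "not found"
    needs_review = taxa[is.na(ids)]
  )
}
```

The `resolved` IDs could then be passed on to classification() with db = "ncbi", while `needs_review` goes to whoever does the hands-on QC.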