gnr_resolve fails with incorrectly spelled species name Internal Server Error HTTP 500 #940

Open
user3386170 opened this issue Nov 20, 2024 · 8 comments

user3386170 commented Nov 20, 2024

I last used taxize on September 27th but this morning I am running into an error with the function gnr_resolve. Whenever I provide it with species names that are incorrectly spelled, it spits out the following error:
Error: Internal Server Error (HTTP 500)

I tested it with the code lines from the tutorial on the vignette page and I get the same error. If I supply the names spelled correctly, taxize gives me all the other information on the species. But the whole point of the function is to be able to fix errors, so I hope it is easy to get it up and running again ;) However, I saw the thread about taxize being pulled off CRAN during arduous updates, so I wonder if it is related and may be too complicated to fix in the near future.

library(taxize)
temp <- gnr_resolve(c("Helianthos annus", "Homo saapiens"))
#Error: Internal Server Error (HTTP 500)
head(temp)

temp2 <- gnr_resolve(c("Helianthus annus", "Homo sapiens"))
head(temp2)

UPDATE: I continued to play around with the function. It can still handle minor errors such as capitals in species names (e.g. Acer Rubrum) and oddities after the genus names (e.g. Crataegus sp.. with two periods).

Session Info
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Canada.utf8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] taxize_0.9.100.1

loaded via a namespace (and not attached):
[1] Matrix_1.6-4 jsonlite_1.8.7 dplyr_1.1.4 compiler_4.3.2
[5] crayon_1.5.2 tidyselect_1.2.0 Rcpp_1.0.11 xml2_1.3.6
[9] urltools_1.7.3 parallel_4.3.2 splines_4.3.2 triebeard_0.4.1
[13] uuid_1.1-1 lattice_0.21-9 TH.data_1.1-2 R6_2.5.1
[17] generics_0.1.3 curl_5.1.0 iterators_1.0.14 MASS_7.3-60
[21] tibble_3.2.1 crul_1.4.2 pillar_1.9.0 rlang_1.1.2
[25] utf8_1.2.4 multcomp_1.4-25 httpcode_0.3.0 cli_3.6.1
[29] magrittr_2.0.3 foreach_1.5.2 digest_0.6.33 grid_4.3.2
[33] rstudioapi_0.15.0 mvtnorm_1.2-4 sandwich_3.0-2 lifecycle_1.0.4
[37] nlme_3.1-163 vctrs_0.6.4 glue_1.6.2 data.table_1.14.8
[41] codetools_0.2-19 zoo_1.8-12 survival_3.5-7 ape_5.8
[45] fansi_1.0.5 conditionz_0.1.0 tools_4.3.2 pkgconfig_2.0.3

ngodron commented Dec 5, 2024

Dear maintainers, let me start by thanking you for your contributions to the R ecosystem.

I second the issue reported by @user3386170, which also comes up in my work. I see the same limitation: only minor oddities are corrected.
Your tool is a great help and a true time-saver. Do you believe that this issue could be fixed in the near future?

Error reproduction and R sessionInfo():

Restarting R session...

> taxize::gnr_resolve(sci = "Escherichia coli", data_source_ids = 12)
# A tibble: 1 × 5
  user_supplied_name submitted_name   matched_name     data_source_title    score
* <chr>              <chr>            <chr>            <chr>                <dbl>
1 Escherichia coli   Escherichia coli Escherichia coli Encyclopedia of Life 0.988
> #### Works fine and dandy

> taxize::gnr_resolve(sci = "Escherichia colI", data_source_ids = 12)
# A tibble: 1 × 5
  user_supplied_name submitted_name   matched_name     data_source_title    score
* <chr>              <chr>            <chr>            <chr>                <dbl>
1 Escherichia colI   Escherichia coli Escherichia coli Encyclopedia of Life 0.988
> #### Works fine and dandy

> taxize::gnr_resolve(sci = "Escherichia Ecoli", data_source_ids = 12)
Error: Internal Server Error (HTTP 500)
> #### Works neither fine nor dandy :(
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11       rstudioapi_0.15.0 magrittr_2.0.3    xml2_1.3.5        uuid_1.1-0        ape_5.8          
 [7] lattice_0.21-8    R6_2.5.1          rlang_1.1.1       foreach_1.5.2     fansi_1.0.4       tools_4.1.2      
[13] parallel_4.1.2    grid_4.1.2        data.table_1.14.8 nlme_3.1-162      utf8_1.2.3        cli_3.6.1        
[19] iterators_1.0.14  digest_0.6.33     httpcode_0.3.0    tibble_3.2.1      taxize_0.9.100.1  lifecycle_1.0.3  
[25] crayon_1.5.2      vctrs_0.6.3       codetools_0.2-19  triebeard_0.4.1   curl_5.2.3        crul_1.4.0       
[31] glue_1.6.2        pillar_1.9.0      conditionz_0.1.0  compiler_4.1.2    urltools_1.7.3    jsonlite_1.8.7   
[37] pkgconfig_2.0.3   zoo_1.8-12 

@zachary-foster
Collaborator

Thanks for the info! Looking at this now.

@zachary-foster
Collaborator

It appears that the problem is with the GNR service itself rather than with taxize. If you go to https://resolver.globalnames.org/ and type in "Homo saapiens", you get an error, and making the analogous API call manually also causes the 500 error. GNR resolve has been deprecated, although there is little to indicate that at first glance: the service is still running but is increasingly unreliable. GNR resolve has been replaced with GNA verifier:

https://verifier.globalnames.org/

I will need to write a new function for GNA verifier and deprecate gnr_resolve. That could be quick if I don't run into anything too difficult, but it's hard to say.
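For anyone who wants to try the "analogous API call" against the new service by hand, here is a minimal sketch of building a request URL. The helper name `build_verifier_url` is hypothetical, and the endpoint path (`/api/v1/verifications/`) and the `data_sources` query parameter are assumptions based on the public GNA verifier API, so check them against the service documentation before relying on them.

```r
# Hypothetical helper (not part of taxize): build a GNA verifier request URL.
# ASSUMPTION: the endpoint path and the `data_sources` parameter follow the
# public verifier.globalnames.org API; they may differ or change.
build_verifier_url <- function(names, data_sources = NULL,
                               base = "https://verifier.globalnames.org/api/v1/verifications/") {
  stopifnot(is.character(names), length(names) > 0)
  # Join names with "|" and percent-encode so spaces and pipes are URL-safe.
  path <- utils::URLencode(paste(names, collapse = "|"), reserved = TRUE)
  url <- paste0(base, path)
  if (!is.null(data_sources)) {
    # Multiple source ids are joined with an encoded pipe ("%7C").
    url <- paste0(url, "?data_sources=", paste(data_sources, collapse = "%7C"))
  }
  url
}

url <- build_verifier_url(c("Homo saapiens", "Helianthos annus"), data_sources = 12)
print(url)
# The URL could then be fetched with any HTTP client, for example:
# res <- jsonlite::fromJSON(url)
```

Because every name travels in the URL path, very long name lists would make the URL very long, so this form is probably only practical for small checks.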

@user3386170
Author

That's a pain that the GNR service changed the API! Thanks for looking into this and I hope it can get back up and running soon via taxize. We'll use a work-around in the meantime.
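One possible work-around of this kind, sketched under assumptions: call the resolver once per name and catch the error, so that a single name that triggers the HTTP 500 does not abort the whole batch. `resolve_each` and `fake_resolver` are hypothetical names; the stand-in resolver only imitates the failure so the sketch runs offline.

```r
# Hypothetical helper: call `resolver` once per name so that one failing
# name does not abort the whole batch. Failed names are returned separately.
resolve_each <- function(names, resolver) {
  results <- list()
  failed <- character(0)
  for (nm in names) {
    out <- tryCatch(resolver(nm), error = function(e) NULL)
    if (is.null(out)) {
      failed <- c(failed, nm)          # remember names that need manual review
    } else {
      results[[length(results) + 1]] <- out
    }
  }
  list(resolved = do.call(rbind, results), failed = failed)
}

# Stand-in for the remote service (ASSUMPTION: purely illustrative). It fails
# on a doubled "a" to imitate the server error reported above.
fake_resolver <- function(nm) {
  if (grepl("aa", nm)) stop("Internal Server Error (HTTP 500)")
  data.frame(user_supplied_name = nm, matched_name = nm, stringsAsFactors = FALSE)
}

out <- resolve_each(c("Homo sapiens", "Homo saapiens"), fake_resolver)
print(out$resolved)
print(out$failed)

# With taxize, the resolver could be something like:
# resolve_each(my_names, function(nm) taxize::gnr_resolve(sci = nm))
```

Calling the service once per name is slower than one batched request, so this trades speed for the ability to see exactly which names fail.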

zachary-foster pushed a commit that referenced this issue Dec 13, 2024
@zachary-foster
Collaborator

I added a new function called gna_verifier that will replace the functionality of gnr_resolve soon. It will have a different output format to match what the new API returns. It has not been tested much yet, but you can give it a try if you like. You will need to install the current master branch here on GitHub. This can be done with devtools::install_github. You might need to install a few dependencies off GitHub as well depending on what is already installed on your system.

@user3386170
Author

Thanks for your hard work! Overall, the new function works but it took me a little bit of getting used to. I'd suggest a couple of tweaks to the error messages if possible but they are minor ones just to make the user experience a bit more smooth in the transition to the new function and are not strictly necessary if you don't like them.

  1. To anyone else trying to test the new taxize package from GitHub: it is necessary to install bold directly from GitHub first. I figured this out after I updated R, since the error message said my R version did not support bold. The following code worked for me.
library(devtools)
install_github("ropensci/bold")
install_github("ropensci/taxize")

library(bold)
library(taxize)
  2. Here is the code that I ran to test the new function. There are two cases where the error messages could be adjusted to help the user:
    a) When I first wrote out the code, I quickly realized that names replaces sci as the argument to use (there is an error message to show this). However, no error message is shown when data_source_ids is entered instead of data_sources; the function just ignores the value entered and uses the default. An error message such as "data_source_ids is not a recognized parameter for gna_verifier : the value is ignored" could be useful.
    b) When there is an error in both the species and the genus (or a really weird entry, such as a number added to the name), the function fails. I suspect that this comes from the API server itself and cannot be resolved within taxize. Ideally, the entry with the unmatchable name would be skipped or returned with a blank in matchedName. I found the error message difficult to understand: "Error in match.names(clabs, names(xi)) : names do not match previous names". Perhaps a more informative error could indicate that one of the supplied names cannot be interpreted and needs to be verified manually?
library(bold)
library(taxize)
library(dplyr)
library(data.table)

test<-c("Parathelypteris noveboracensis",
        "Dennstaedtia punctilobula",
        "Viola macloskeyi",
        'Lonicera villosa',
        'Lonicera canadensis', 'Cornus canadensis')
gna_verifier(names=test, data_source_ids=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#submittedName                                 matchedName     currentCanonicalSimple sortScore dataSourceTitleShort
#1: Parathelypteris noveboracensis Parathelypteris noveboracensis (L.) Holttum Amauropelta noveboracensis  9.414614    Catalogue of Life
#2:      Dennstaedtia punctilobula Dennstaedtia punctilobula (Michx.) T. Moore   Sitobolium punctilobulum  9.414614    Catalogue of Life
#3:               Viola macloskeyi                Viola macloskeyi F. E. Lloyd           Viola macloskeyi  9.414964    Catalogue of Life
#4:               Lonicera villosa   Lonicera villosa (Michx.) Roem. & Schult.           Lonicera villosa  9.414964    Catalogue of Life
#5:            Lonicera canadensis     Lonicera canadensis Bartram ex Marshall        Lonicera canadensis  9.414876    Catalogue of Life
#6:              Cornus canadensis                        Cornus canadensis L.          Cornus canadensis  9.414964    Catalogue of Life

#no error message is thrown to indicate that "data_source_ids" is not an accepted term; the function goes on to ignore the data source and use the default

gna_verifier(names=test, data_sources=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#submittedName                                     matchedName     currentCanonicalSimple sortScore dataSourceTitleShort
#1: Parathelypteris noveboracensis Parathelypteris noveboracensis (Linnaeus) Ching Amauropelta noveboracensis  9.412505               VASCAN
#2:      Dennstaedtia punctilobula    Dennstaedtia punctilobula (Michaux) T. Moore   Sitobolium punctilobulum  9.412505               VASCAN
#3:               Viola macloskeyi                     Viola macloskeyi F.E. Lloyd           Viola macloskeyi  9.412857               VASCAN
#4:               Lonicera villosa             Lonicera villosa (Michaux) Schultes          Lonicera caerulea  9.412505               VASCAN
#5:            Lonicera canadensis                    Lonicera canadensis Marshall        Lonicera canadensis  9.412857               VASCAN
#6:              Cornus canadensis                      Cornus canadensis Linnaeus          Cornus canadensis  9.412857               VASCAN
#correct output once I correct data_sources to the new terms for the new function

gna_verifier(sci=test, data_sources=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#Error in gna_verifier(sci = test, data_sources = 147) : 
#  argument "names" is missing, with no default
#here, I get an error because I used the old term for the list of names

test2<-c("Parathelyppteris noveboracensis",
         "Dennstaedtiaa punctilobula",
         "Viola macloskeyyi",
         'Lonicera villosei',#error in species only villosa
         'Loniceraa canadensis', 'Cornus ccanadensis')


gna_verifier(names=test2, data_sources=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#submittedName                                     matchedName     currentCanonicalSimple sortScore dataSourceTitleShort
#1: Parathelyppteris noveboracensis Parathelypteris noveboracensis (Linnaeus) Ching Amauropelta noveboracensis  9.389352               VASCAN
#2:      Dennstaedtiaa punctilobula    Dennstaedtia punctilobula (Michaux) T. Moore   Sitobolium punctilobulum  9.389352               VASCAN
#3:               Viola macloskeyyi                     Viola macloskeyi F.E. Lloyd           Viola macloskeyi  9.389723               VASCAN
#4:               Lonicera villosei             Lonicera villosa (Michaux) Schultes          Lonicera caerulea  9.364894               VASCAN
#5:            Loniceraa canadensis                    Lonicera canadensis Marshall        Lonicera canadensis  9.389723               VASCAN
#6:              Cornus ccanadensis                      Cornus canadensis Linnaeus          Cornus canadensis  9.389723               VASCAN

test3<-c("Parathelyppteris noveboracensis",
         "Dennstaedtiaa punctilobula",
         "Viola macloskeyyi",
         'Loniceraa villosei',#error in genus Lonicera and error in species villosa
         'Loniceraa canadensis', 'Cornus ccanadensis')

gna_verifier(names=test3, data_sources=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#Error in match.names(clabs, names(xi)) : 
#  names do not match previous names
#error message when error in two different terms of the species name submitted

gna_verifier(names=test3)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#Error in match.names(clabs, names(xi)) : 
#  names do not match previous names
#error message even when the data source is not specified

test4<-c("Parathelyppteris noveboracensis",
         "Dennstaedtiaa punctilobula",
         "Viola macloskeyyi",
         'Loniceraa villosa',#error in genus Lonicera only
         'Loniceraa canadensis', 'Cornus ccanadensis')

gna_verifier(names=test4, data_sources=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#submittedName                                     matchedName     currentCanonicalSimple sortScore dataSourceTitleShort
#1: Parathelyppteris noveboracensis Parathelypteris noveboracensis (Linnaeus) Ching Amauropelta noveboracensis  9.389352               VASCAN
#2:      Dennstaedtiaa punctilobula    Dennstaedtia punctilobula (Michaux) T. Moore   Sitobolium punctilobulum  9.389352               VASCAN
#3:               Viola macloskeyyi                     Viola macloskeyi F.E. Lloyd           Viola macloskeyi  9.389723               VASCAN
#4:               Loniceraa villosa             Lonicera villosa (Michaux) Schultes          Lonicera caerulea  9.389352               VASCAN
#5:            Loniceraa canadensis                    Lonicera canadensis Marshall        Lonicera canadensis  9.389723               VASCAN
#6:              Cornus ccanadensis                      Cornus canadensis Linnaeus          Cornus canadensis  9.389723               VASCAN


test5<-c("Parathelyppteris noveboracensis",
         "Dennstaedtiaa punctilobula",
         "Viola macloskeYi",
         'Loniceraa villosa',#error in genus Lonicera only
         'Loniceraa canadensis', 'Cornus ccanadensis')

gna_verifier(names=test5, data_sources=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#submittedName                                     matchedName     currentCanonicalSimple sortScore dataSourceTitleShort
#1: Parathelyppteris noveboracensis Parathelypteris noveboracensis (Linnaeus) Ching Amauropelta noveboracensis  9.389352               VASCAN
#2:      Dennstaedtiaa punctilobula    Dennstaedtia punctilobula (Michaux) T. Moore   Sitobolium punctilobulum  9.389352               VASCAN
#3:                Viola macloskeYi                                  Viola Linnaeus                      Viola  9.412857               VASCAN
#4:               Loniceraa villosa             Lonicera villosa (Michaux) Schultes          Lonicera caerulea  9.389352               VASCAN
#5:            Loniceraa canadensis                    Lonicera canadensis Marshall        Lonicera canadensis  9.389723               VASCAN
#6:              Cornus ccanadensis                      Cornus canadensis Linnaeus          Cornus canadensis  9.389723               VASCAN

test6<-c("Parathelyppteris noveboracensis",
         "Dennstaedtiaa punctilobula",
         "Viola macloskeYi",
         'Loniceraa6 villosa',#error in genus Lonicera only but with a number at the end of the name
         'Loniceraa canadensis', 'Cornus ccanadensis')

gna_verifier(names=test6, data_sources=147)%>%data.table%>%select(submittedName, matchedName, currentCanonicalSimple, sortScore, dataSourceTitleShort)
#Error in match.names(clabs, names(xi)) : 
#  names do not match previous names
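The match.names message above is the error base R's rbind raises when two data frames with the same number of columns have different column names. The sketch below assumes (this is not confirmed in the thread) that the record for an unmatchable name comes back with different fields than a matched one; the column `matchType`, its value, and the helper `rbind_fill` are hypothetical. It reproduces the error and shows one way to pad missing columns with NA before binding.

```r
# Hypothetical helper: row-bind records whose columns differ by first adding
# any missing columns as NA, then putting all columns in the same order.
rbind_fill <- function(records) {
  all_cols <- unique(unlist(lapply(records, names)))
  padded <- lapply(records, function(rec) {
    for (col in setdiff(all_cols, names(rec))) rec[[col]] <- NA
    rec[, all_cols, drop = FALSE]
  })
  do.call(rbind, padded)
}

# Toy records (ASSUMPTION: field names chosen for illustration only).
matched <- data.frame(submittedName = "Loniceraa villosa",
                      matchedName = "Lonicera villosa (Michaux) Schultes",
                      stringsAsFactors = FALSE)
unmatched <- data.frame(submittedName = "Loniceraa6 villosa",
                        matchType = "NoMatch",
                        stringsAsFactors = FALSE)

# Plain rbind fails with the message reported in this comment.
err <- tryCatch(rbind(matched, unmatched), error = function(e) conditionMessage(e))
print(err)

# Padding first keeps the unmatched entry, with NA in matchedName.
combined <- rbind_fill(list(matched, unmatched))
print(combined)
```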

These are just suggestions. The main issue has been resolved and we can go back to using taxize to correct our data sheets. Big thanks!
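To make suggestion (a) concrete, here is a minimal sketch of a guard that rejects unrecognized argument names instead of ignoring them. It assumes the silent behaviour comes from extra arguments being absorbed by `...`, which the thread does not confirm; `verify_stub` is a hypothetical stand-in and does not reflect how gna_verifier is implemented.

```r
# Hypothetical stand-in with the two argument names used in this thread.
# ASSUMPTION: unknown named arguments arrive through `...`.
verify_stub <- function(names, data_sources = NULL, ...) {
  extra <- names(list(...))
  extra <- extra[nzchar(extra)]        # keep only named extras
  if (length(extra) > 0) {
    stop(sprintf("%s is not a recognized parameter for verify_stub",
                 paste(extra, collapse = ", ")), call. = FALSE)
  }
  data.frame(submittedName = names, stringsAsFactors = FALSE)
}

# The old argument name now produces an explicit error.
msg <- tryCatch(verify_stub(names = "Cornus canadensis", data_source_ids = 147),
                error = function(e) conditionMessage(e))
print(msg)

# The new argument name is accepted.
ok <- verify_stub(names = "Cornus canadensis", data_sources = 147)
print(ok)
```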

 sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.8 dplyr_1.1.4       taxize_0.9.102    bold_1.3.0       

loaded via a namespace (and not attached):
 [1] raster_3.6-26      htmlwidgets_1.6.4  lattice_0.21-9     tzdb_0.4.0         vctrs_0.6.4        tools_4.3.2        crosstalk_1.2.1    generics_0.1.3    
 [9] parallel_4.3.2     stats4_4.3.2       curl_5.1.0         sandwich_3.0-2     tibble_3.2.1       proxy_0.4-27       fansi_1.0.5        pkgconfig_2.0.3   
[17] Matrix_1.6-4       KernSmooth_2.23-22 satellite_1.0.4    leaflet_2.2.1      lifecycle_1.0.4    compiler_4.3.2     munsell_0.5.1      terra_1.7-71      
[25] codetools_0.2-19   htmltools_0.5.7    usethis_2.2.3      class_7.3-22       crayon_1.5.3       pillar_1.9.0       MASS_7.3-60        classInt_0.4-10   
[33] iterators_1.0.14   foreach_1.5.2      multcomp_1.4-25    nlme_3.1-163       tidyselect_1.2.1   digest_0.6.33      mvtnorm_1.2-4      stringi_1.8.2     
[41] sf_1.0-14          purrr_1.0.2        splines_4.3.2      fastmap_1.1.1      grid_4.3.2         colorspace_2.1-0   cli_3.6.1          magrittr_2.0.3    
[49] base64enc_0.1-3    triebeard_0.4.1    survival_3.5-7     crul_1.5.0         utf8_1.2.4         ape_5.8            leafem_0.2.3       TH.data_1.1-2     
[57] e1071_1.7-13       withr_3.0.2        readr_2.1.5        scales_1.3.0       sp_2.1-2           zoo_1.8-12         png_0.1-8          hms_1.1.3         
[65] mapview_2.11.2     urltools_1.7.3     rlang_1.1.2        Rcpp_1.0.11        httpcode_0.3.0     glue_1.6.2         DBI_1.1.3          xml2_1.3.6        
[73] jsonlite_1.8.9     rstudioapi_0.17.1  R6_2.5.1           fs_1.6.3           units_0.8-5 

@user3386170
Author

@superornitho: An update to taxize was needed because of a change on the server side. Do you have a list of species entries and/or some old code that you could run to test the new function (gna_verifier)?

@user3386170
Author

I ran into another minor issue today running gna_verifier. The error message indicating that too many species/taxa were queried at once is inconsistent. When I ran about 500 names, I got the message Error: Request-URI Too Long (HTTP 414), which seems correct to me. When I ran about 700 names, I instead got Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle); Recv failure: Connection was reset, which is less meaningful to me.
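Until the limit is reported consistently, one way to stay under it is to send the names in smaller batches and combine the results. This is a sketch only: `verify_in_chunks` and `fake_verifier` are hypothetical names, and the chunk size of 100 is a guess rather than a documented limit.

```r
# Hypothetical helper: split `names` into batches of at most `chunk_size`,
# call `verifier` on each batch, and row-bind the results in order.
verify_in_chunks <- function(names, verifier, chunk_size = 100) {
  stopifnot(chunk_size >= 1)
  groups <- split(names, ceiling(seq_along(names) / chunk_size))
  do.call(rbind, lapply(unname(groups), verifier))
}

# Stand-in for the remote call (ASSUMPTION: illustrative only). It records
# how many names each request carried.
fake_verifier <- function(nms) {
  data.frame(submittedName = nms, n_in_request = length(nms),
             stringsAsFactors = FALSE)
}

many <- sprintf("Genus species%03d", 1:250)
out <- verify_in_chunks(many, fake_verifier, chunk_size = 100)
print(table(out$n_in_request))

# With taxize, the verifier could be something like:
# verify_in_chunks(my_names,
#                  function(nms) taxize::gna_verifier(names = nms, data_sources = 147))
```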

3 participants