
Description
In submission S2440, all binomial taxon labels with suffix codes (e.g. "Arianta schmidtii EG71B") validate automatically, but trinomial taxon labels with suffix codes (e.g. "Arianta arbustorum styriaca EG468") fail to validate, except where I have entered uBio IDs manually, as with "Arianta arbustorum styriaca AT EG454". I'm not sure where the problem lies. In fact, most if not all of these trinomials are already in TreeBASE, so the failure happens "in house", before uBio's web services are even used.

I would suggest running a series of regular expressions on each taxon label:

(1) First make sure there is a space between the species or subspecies name and the suffix code. A lowercase letter followed by an uppercase letter or a digit probably indicates a suffix code stuck to the end of a species or subspecies name:
    s/([a-z]{3,})([A-Z\d]+)/\1 \2/

(2) Then test whether the label is a trinomial followed by a possible suffix, bearing in mind that hyphens are allowed in species and subspecies names:
    m/^([A-Z][a-z]+) ([a-z-]+) ([a-z-]+)(.*)$/
If you get a hit, search the taxon_variants table for "$1 $2 $3", and if nothing is there, send "$1 $2 $3" to uBio's web services.

(3) If there is no hit, test whether the label is a binomial followed by a possible suffix:
    m/^([A-Z][a-z]+) ([a-z-]+)(.*)$/
If you get a hit, search the taxon_variants table for "$1 $2", and if nothing is there, send "$1 $2" to uBio's web services.
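A minimal Python sketch of the three suggested steps, for illustration only. It uses `(.*)` for the trailing suffix group so that multi-character codes such as "EG468" are captured, and it reads "if there is no hit" in step (3) as "the trinomial pattern did not match" (falling back to the binomial after a failed trinomial lookup would be a different reading). `find_variant` and `query_ubio` are hypothetical stand-ins for the taxon_variants query and the uBio web-service call; they are not real TreeBASE or uBio APIs.

```python
import re
from typing import Callable, Optional

# Step 1: a run of >= 3 lowercase letters immediately followed by uppercase
# letters or digits is taken to be an epithet with a suffix code glued on.
GLUED_SUFFIX = re.compile(r"([a-z]{3,})([A-Z\d]+)")
# Step 2: genus, species, subspecies (hyphens allowed), then any suffix.
TRINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z-]+) ([a-z-]+)(.*)$")
# Step 3: genus and species (hyphens allowed), then any suffix.
BINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z-]+)(.*)$")

# Hypothetical lookup callables: take a name, return an ID or None.
Lookup = Callable[[str], Optional[str]]


def separate_suffix(label: str) -> str:
    """Insert a space before a glued-on suffix code (first occurrence only, like s/// without /g)."""
    return GLUED_SUFFIX.sub(r"\1 \2", label, count=1)


def candidate_name(label: str) -> Optional[str]:
    """Return "genus species subspecies" if the label looks like a trinomial,
    else "genus species" if it looks like a binomial, else None."""
    label = separate_suffix(label.strip())
    m = TRINOMIAL.match(label)
    if m:
        return " ".join(m.group(1, 2, 3))
    m = BINOMIAL.match(label)
    if m:
        return " ".join(m.group(1, 2))
    return None


def resolve_label(label: str, find_variant: Lookup, query_ubio: Lookup) -> Optional[str]:
    """Check the local taxon_variants table first; only query uBio on a miss."""
    name = candidate_name(label)
    if name is None:
        return None
    return find_variant(name) or query_ubio(name)
```

For example, `candidate_name("Arianta arbustorum styriacaEG468")` first becomes "Arianta arbustorum styriaca EG468" and then yields "Arianta arbustorum styriaca", while "Arianta schmidtii EG71B" fails the trinomial pattern (the third token starts with an uppercase letter) and yields "Arianta schmidtii". Trying the local table before the web service keeps names already in TreeBASE from depending on uBio at all.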
Reported by: piel