How to detect chresonyms? #39
Comments
Hi, Markus. You're very brave to try to tackle chresonyms on a "rules" basis, because a rule in the usual sense would have to refer to what librarians for many decades have called an "authority file" - an agreed standard listing, in this case of accepted names. Taxonomy has no authority file. An alternative might be to allow variably sized clusters of related names, with the relations between them specified. Graph databasing is best for this, but it can also be done in a table. One of the related names could be designated "accepted" as an extra property, but that designation might change with taxonomic opinion. "Orthochresonymy" and "heterochresonymy" are relations and IMO more easily definable and changeable than "orthochresonym" and "heterochresonym" as entities. You're also very brave to deal with the Reptile Database at all. Peter Uetz has put a lot of love and effort into it, but "Database" is a misnomer. He offers a checklist as an Excel file with numerous separate data items crowded into individual spreadsheet cells. The "dump" (the latest one I looked at is Dec 2014) contains two "tab-separated" text files which are structurally a mess (tabs and linefeeds) and which have an astonishingly high content of control and replacement character gibberish. These files are completely unusable without hours of rebuilding. |
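The "cluster of related names" idea above can be sketched in table form. This is only an illustration: the class names, the `relate` and `pairs` helpers, and the made-up "Aus bus" names are assumptions, not part of any existing tool, and the direction of each relation (usage on the left, name on the right) is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class NameRelation:
    """One row of the relation table: subject --relation--> target."""
    subject: str
    relation: str  # e.g. "orthochresonymy" or "heterochresonymy"
    target: str


@dataclass
class NameCluster:
    """A variably sized cluster of names plus the relations between them."""
    relations: List[NameRelation] = field(default_factory=list)
    # "accepted" is an extra property, so it can change with taxonomic
    # opinion without touching the recorded relations.
    accepted: Optional[str] = None

    def relate(self, subject: str, relation: str, target: str) -> None:
        self.relations.append(NameRelation(subject, relation, target))

    def names(self) -> Set[str]:
        found: Set[str] = set()
        for rel in self.relations:
            found.update((rel.subject, rel.target))
        return found

    def pairs(self, relation: str) -> List[Tuple[str, str]]:
        return [(r.subject, r.target) for r in self.relations if r.relation == relation]


# Hypothetical names, for illustration only.
cluster = NameCluster()
cluster.relate("Aus bus sensu Jones, 1950", "orthochresonymy", "Aus bus Smith, 1900")
cluster.relate("Aus cus sensu Brown, 1980", "heterochresonymy", "Aus bus Smith, 1900")
cluster.accepted = "Aus bus Smith, 1900"
```

Because the relations are rows rather than labels baked into the names, changing `cluster.accepted` leaves every recorded relation intact.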
@mdoering I think we've hit this exact issue and worked a little on it over the last couple of months. It arose as we are working on moving existing Species Files into TaxonWorks. The latter uses a graph representation, as @Mesibov notes would be useful, to store all its nomenclature; the former has a good number of rules, but it also allowed for free-text nomenclators. We have rules that facilitate matching nomenclators (species epithets as strings) against the "authority file", i.e. assertions that have been successfully translated into the graph. These sit as a middle layer between @dimus's Biodiversity gem and the TaxonWorks model. I strongly suspect that you could greatly narrow down the list of chresonyms using a similar approach. Without the full semantics of a graph I think @Mesibov is right in many respects, but you could eliminate a lot of manual work because you'll be treating each GSD as the authority, and playing it off against itself. The middle layer library is at https://github.com/SpeciesFileGroup/taxonworks/blob/development/lib/vendor/biodiversity.rb and I suspect it would be relatively trivial for you to translate it given what you seem to have available as documented in the new CoL API. For reference, the graph model is documented here: https://github.com/SpeciesFileGroup/taxonworks_doc/blob/master/concepts/TaxonWorksNomenclature.pdf. Sooooon we'll be translating all that into API doc like you've nicely done. |
Hi, Matt. Great to see the library in Ruby, I think I'm allergic to Java. The nomenclature graph also looks good, although I'm not sure how TaxonNameRelationship works? For Markus' benefit, mine (and anyone else interested), could you have a go at listing the non-overlapping relationships that might exist between names? (And why do I have the sneaking suspicion that Rich Pyle did this 20 years ago...?) |
@Mesibov All the semantics come from NOMEN (https://github.com/SpeciesFileGroup/nomen), which we've completely hidden away in the interfaces. TaxonNameRelationships are object properties in OWL. Briefly, whenever you see an epithet, or any relation between monomials/protonyms, you use a TNR to define that relationship (it really is a graph). TaxonNameClassifications are assertions (attributes) on a monomial/protonym. Rich Pyle's work was definitely referenced in NOMEN, but we've worked out more technicalities (I think). Our model is also a "true" graph (nodes, edges, attributes on nodes); from my understanding we can traverse various aspects of this graph to reproduce Rich's model. I should add that if you want to see it in action I'd be happy to set you up with a sandbox account. |
We based a lot of the CoL+ models on TCS especially the name relationship types: See also the very useful guide with lots of examples: https://github.com/tdwg/tcs/blob/master/TCS101/UserGuidev_1.3.pdf I wonder how well these relations map to NOMEN |
To preface the spewing below: your question was about chresonyms, and to me that clearly falls in the domain of nomenclature, not TCS. YRMV. My knee-jerk reaction is that NOMEN has nothing to do with taxon concepts, it's about the rules of nomenclature, therefore NOMEN is completely orthogonal to TCS (the intro to TCS clearly demarcates these worlds). NOMEN allows you to make a set of assertions, and in theory infer with them; those assertions are not about biological concepts. Since TCS101 is about concepts, the two worlds of assertions do not overlap. If one wants to infer the existence of concepts based on assertions that reference NOMEN that's up to them, but those inferences need to be recorded as such, likely as referenced in TCS. I think you likely made the right decision to adopt TCS: while the CoL uses words like "synonym", it is a catalog of taxa (more specifically "species"); it doesn't, to my knowledge, claim to represent names, but rather its tips are assertions of the existence of a biologically meaningful entity. Attempting to overload the TCS with rules of nomenclature in the context of the CoL may lead to "Bad Things". In TW we'll be focusing on adopting Nico Franz' approach to managing assertions about the relationships between biological entities (taxon concepts), primarily because at its heart it's a logical model that facilitates inference (as, in theory, does NOMEN). https://docs.google.com/document/d/1GpTJwrNoXjfV88Bupf4Lhx7JwzEFCrBdIVlJ0232zs8/edit?usp=sharing Inasmuch as the relationships b/w TCS and Euler have been worked out we'll support both worlds. |
Quick notes, @mjy. TCS clearly also deals with |
@mdoering Point taken. @proceps and I will try and spend some time reconciling
:) If I had a dime for the number of times I've heard this said for all references to Names in general. To me the heart of the issue is this: What can I possibly say about the biology of the taxon if all I know is Really, I don't think it matters which perspective one takes, just make sure you have concepts in one box, names in the other, and clearly indicate where/how the two are linked. Practically this means that at the level of persistence there are unique IDs for concepts that must have no dependencies on names, i.e. the system must allow one to describe a concept without a taxon name. |
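A minimal sketch of the "concepts in one box, names in the other" layout, assuming an in-memory store. `Registry`, `Concept`, `Name` and their methods are hypothetical names for illustration; they do not describe the TaxonWorks or CoL schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def new_id() -> str:
    """Opaque identifier that carries no information about any name."""
    return str(uuid.uuid4())


@dataclass
class Concept:
    description: str
    id: str = field(default_factory=new_id)


@dataclass
class Name:
    label: str
    id: str = field(default_factory=new_id)


class Registry:
    """Keeps concepts and names in separate boxes, with explicit links."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.names: Dict[str, Name] = {}
        self.links: List[Tuple[str, str]] = []  # (concept_id, name_id)

    def add_concept(self, description: str) -> Concept:
        concept = Concept(description)
        self.concepts[concept.id] = concept
        return concept

    def add_name(self, label: str) -> Name:
        name = Name(label)
        self.names[name.id] = name
        return name

    def link(self, concept: Concept, name: Name) -> None:
        self.links.append((concept.id, name.id))

    def names_for(self, concept: Concept) -> List[str]:
        return [self.names[n].label for c, n in self.links if c == concept.id]


registry = Registry()
# A concept can be described before any taxon name is attached to it.
unnamed = registry.add_concept("large whiptail lizards of the study area")
named = registry.add_concept("a second, named concept")
registry.link(named, registry.add_name("Aus bus Smith, 1900"))  # made-up name
```

The point of the separate `links` list is that deleting or renaming a name never touches a concept record or its ID.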
Had some conversations with others here. In retrospect I think the concept of In general I think my issue was that the way we (TW) represent data more or less eliminates the need to address this problem, but I was not thinking about the many ways others represent their data in a less refined (and perhaps therefore more easily conflicting) manner. It's those data that you're looking to resolve: true unknowns, or synonyms in the broad sense (like why are there 4 different authors for the same Aus bus). The conclusion remains somewhat similar: the quality of the solution will be bound by the nature of what you do actually know (the quality and scope of your protonyms), but the approach to resolving the problem is more nuanced than I was thinking. |
My introduction to chresonyms was first with the COL reptile data from Uetz but I got my teeth into them with Hershkovitz’ Catalog of Living Whales.
See https://www.biodiversitylibrary.org/item/33227#page/43/mode/1up
https://www.biodiversitylibrary.org/item/33227#page/51/mode/1up
For purposes of improving recall, I wanted the synonymy (broad sense) but this required working out how to handle and model this both syntactically and semantically. In order to not offend the author I also wanted to be able to recreate the fidelity of the original. I subsequently came across many zoological catalogs that followed this general format.
|
When looking at a synonymy full of chresonyms, of the kind we see as real input to the CoL, I think we can flag likely chresonyms in many cases: http://reptile-database.reptarium.cz/species?genus=Aspidoscelis&species=tigris You can easily recognize chresonyms here by the markup using the dash before the authorship. Unfortunately this is gone when the data gets to us. The idea would be to first identify clear real names and then mark all "homonyms" with different authorships as chresonyms. If they were real later homonyms they would be heterotypic and unlikely to appear in the synonymy. In the example above the accepted name and its basionym can be identified as clear good names. This leaves the rest of the entire first block as potential chresonyms. Also, chresonyms never have a basionym authorship in brackets. So if there is a cluster of identical canonical names which includes a name with brackets, all others are likely chresonyms:
You can spot In the case of
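The two rules described above (anchor on clear real names, then treat bracketed authorship as a second anchor within a cluster of identical canonical names) could be sketched roughly like this. It is a heuristic under stated assumptions, not the CoL implementation: `SynonymyEntry`, `flag_likely_chresonyms` and the "Aus bus" records are invented for illustration, and canonical names and authorships are assumed to be parsed already.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class SynonymyEntry:
    canonical: str   # name without authorship, e.g. "Aus bus"
    authorship: str  # e.g. "Smith, 1900" or "(Smith, 1900)"


def has_basionym_authorship(authorship: str) -> bool:
    """Authorship in brackets signals a recombination, not a chresonym."""
    return authorship.strip().startswith("(")


def flag_likely_chresonyms(
    entries: Iterable[SynonymyEntry], known_good: Set[SynonymyEntry]
) -> List[SynonymyEntry]:
    # Group the synonymy by identical canonical name.
    clusters: Dict[str, List[SynonymyEntry]] = defaultdict(list)
    for entry in entries:
        clusters[entry.canonical].append(entry)

    flagged: List[SynonymyEntry] = []
    for members in clusters.values():
        # Anchors: clear real names (accepted name, basionym) or bracketed authorship.
        anchors = [
            m for m in members
            if m in known_good or has_basionym_authorship(m.authorship)
        ]
        if not anchors:
            continue  # nothing to compare against; leave for manual review
        flagged.extend(m for m in members if m not in anchors)
    return flagged


# Made-up synonymy: accepted name, its basionym, and later usages.
basionym = SynonymyEntry("Aus bus", "Smith, 1900")
accepted = SynonymyEntry("Cus bus", "(Smith, 1900)")
synonymy = [
    basionym,
    SynonymyEntry("Aus bus", "Jones, 1950"),
    accepted,
    SynonymyEntry("Cus bus", "Brown, 1980"),
    SynonymyEntry("Dus bus", "White, 1920"),
]
likely = flag_likely_chresonyms(synonymy, known_good={accepted, basionym})
```

Clusters without any anchor are deliberately skipped rather than guessed at, so a lone "Dus bus White, 1920" stays unflagged.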
|
Some resources, e.g. the Reptile DB, contain many chresonyms for a name which the CoL would like to exclude. Manually flagging these names is very time-consuming and not really feasible at this scale. What rules can we apply to discover the real name and flag chresonyms to discard them in the assembly process?