Systema Dipterorum (id 1101): test report #127

Open
2 tasks done
yroskov opened this issue May 14, 2021 · 72 comments
Comments

yroskov commented May 14, 2021

Version 3.1 received 2021-05-12.
Imported to prod: https://data.catalogueoflife.org/dataset/1101/about

(previous reports are in #6)

  • Metadata: updated

  • Sector: order Diptera minus 4 families

  1. delete CIPA sectors
  2. re-establish and sync CCW (superfamily Tipuloidea)
  3. establish SD sector as order Diptera
  4. block 4 families:
    Cylindrotomidae
    Limoniidae
    Pediciidae
    Tipulidae
  5. sync

As a result, the assembly tree looks like this:

image

yroskov commented May 14, 2021

TASKS

image

ACC-ACC species (different authors)

image

image

image

image

image

image

image

image

  • ACC-ACC species (same authors)

image

Resolved 2021-05-17

image

  • Identical genus (437 names); Identical subgenus (743 names) - cases of "split" taxa in the classification remain unresolved. Need attention from the authors.

yroskov commented May 17, 2021

Synced with 4 blocked families in the assembly tree, 2021-05-17

4 blocked families need to be confirmed after the sync: confirmed, OK.

yroskov commented Jun 14, 2021

See CatalogueOfLife/data#273

Both, species and its parent genus, are blocked in CoL.
Systema Dipterorum re-synced 2021-06-14.

yroskov commented Jun 17, 2021

Reported by @olafbanki 2021-06-16:

On Diptera I have a question. I see quite some family names that closely resemble each other (Archisargidaae & Archisargidae) and where duplicate genera exist (e.g. Archirhagio). I attach a screen shot from catalogueoflife.org as example. Looks like there are some data quality issues. What is your take on this?

  • Misspelled family name? Archisargidaae (1 sp Archirhagio obscurus Rohdendorf, 1938) vs Archisargidae (59 spp) - for attention of Neal.

In CoL: Archisargidaae "taxon blocked" in assembly tree 2021-06-17. Synced.

yroskov commented Jun 17, 2021

Reported by @olafbanki 2021-06-16:

In addition the family Archisargidae is extinct, but it does not have a flag.

  • CoL has no "extinct" flag for SD species. - NOW FIXED ON SPECIES LEVEL IN SD. EXTINCT FLAG FOR PARENT TAXA SHOULD BE CALCULATED BY THE SOFTWARE
    @gdower, could we use data from the SD field "Epoch" (Amber, Baltic Amber, Cretaceous, etc.)? Would it be correct to apply the "extinct" flag to every species that has a non-empty Epoch field?

We agreed on the following action plan: (1) modify the script to take the available values from the Epoch field of Systema Dipterorum V3.1_2021-05-12 and fill the start & end periods in CoL (where necessary) for 4K accepted species; (2) apply the flag “extinct” to species with non-empty values in the Epoch field.
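
A minimal sketch of step (2) of the plan, for illustration only. It is not the actual import script: the record layout (dicts with `name` and `epoch` keys) and the function name `flag_extinct` are assumptions made for this example, and the sample rows are invented.

```python
def flag_extinct(species_records):
    """Return copies of the records with an 'extinct' flag derived from Epoch.

    Assumption: each record is a dict whose optional 'epoch' key holds the
    free-text value from the source spreadsheet; empty or missing means extant.
    """
    flagged = []
    for record in species_records:
        epoch = (record.get("epoch") or "").strip()
        updated = dict(record)  # leave the input record untouched
        updated["extinct"] = bool(epoch)
        flagged.append(updated)
    return flagged


# Invented sample rows, for illustration only.
sample = [
    {"name": "Species one", "epoch": "Baltic Amber"},
    {"name": "Species two", "epoch": ""},
    {"name": "Species three"},
]

for row in flag_extinct(sample):
    print(row["name"], "extinct" if row["extinct"] else "extant")
```

Whitespace-only Epoch values are treated as empty here; whether the real data needs that is not known from this report.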

yroskov commented Jun 18, 2021

  • In the Tree "Chironomus mixtus Holmgren, 1869: 45. TL: Norway." as a family. BLOCKED.
    image

  • ISSUES

  • TASKS

V3.1_2021-05-12 synced 2021-06-18

Preview 2021-06-18 https://preview.catalogueoflife.org/ looks good.
New spp stats: 145933 extant & 3759 extinct spp.

  • Parent taxa with all extinct children species are not marked as extinct (i.e. no dagger with genus, family, etc.)
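
One way the missing dagger on parent taxa could be calculated is sketched below. This is an assumption about the logic, not the ChecklistBank implementation: a parent is treated as extinct only if it has at least one child and all of its children are extinct. The tree layout (`children` dict) and all taxon names are hypothetical.

```python
def compute_extinct(children, species_extinct):
    """Derive an extinct flag for every parent taxon.

    children: dict mapping a parent taxon to the list of its direct children.
    species_extinct: dict mapping a species to its known extinct flag.
    """
    cache = {}

    def is_extinct(taxon):
        if taxon in species_extinct:
            return species_extinct[taxon]
        if taxon not in cache:
            kids = children.get(taxon, [])
            # A taxon without children cannot be judged; treat it as not extinct.
            cache[taxon] = bool(kids) and all(is_extinct(kid) for kid in kids)
        return cache[taxon]

    return {taxon: is_extinct(taxon) for taxon in children}


# Hypothetical tree, for illustration only.
children = {
    "FamilyA": ["GenusA1", "GenusA2"],
    "GenusA1": ["sp1", "sp2"],
    "GenusA2": ["sp3"],
    "FamilyB": ["GenusB1"],
    "GenusB1": ["sp4", "sp5"],
}
species_extinct = {"sp1": True, "sp2": True, "sp3": True, "sp4": True, "sp5": False}

print(compute_extinct(children, species_extinct))
```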

yroskov commented Jul 1, 2021

ITIS offers global checklist for Culicidae family:
#8 (comment)

Response from SD: keep Culicidae from SD.

yroskov commented Sep 7, 2021

A set of correctly assigned species binomials is placed under an incorrect genus in the classification, which acts as a false parent: gbif/checklistbank#187
see also CatalogueOfLife/backend#1052

Example:
image

A simple re-sync did not fix the problem.

Two hypotheses (@gdower):

(1) Option "Union", which was used for the sector Diptera, may cause a problem. = No
(2) Name Trigonometopus (Culex) canus (as in a dataset) may cause a problem. = Yes
Generalised problem: the same subgenus name may appear in different genera in SD:

  • single accepted species name Trigonometopus (Culex) canus and rest of species names are in Culex (Culex)
  • single accepted species name Bactrocera (Callantra) axanthina and rest of species names are in Dacus (Callantra)
    etc.

Experiment 1: do not use "Union" in assembly.
Steps to repair via re-assembly of sectors:
(1) delete all sectors in Diptera
(2) delete entire Diptera subtree
(3) establish order Diptera from SD
(4) block 4 families (which we'll take from CCW):
Cylindrotomidae
Limoniidae
Pediciidae
Tipulidae
(5) add 4 CCW families as sectors in Diptera (skip superfamily Tipuloidea as a rank between family & order)
(6) sync SD and CCW

Bad news: the above steps did not fix the problem:

image

Experiment 2: fix "incorrect" name.
Steps to repair via complex decision over species binomial:
(1) complex decision Trigonometopus (Culex) canus --> Trigonometopus canus
(2) re-sync SD

Result successful: subgenus Culex is correctly placed in genus Culex
image
image

So the sync process is misinterpreting the original placement of "homonymic" subgenera.
The source dataset may contain a mistake (i.e. an incorrect subgenus in a set of species from another genus), but it may also use homonymic subgenera on purpose. I would expect the software to generate a report on detected problems for the GSD authors, but also to reproduce the placement of a subgenus in a genus as it occurs in the original dataset.
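
A report of the kind suggested here could be produced along these lines. This is only a sketch under stated assumptions: names are plain strings of the form `Genus (Subgenus) epithet`, the function name is hypothetical, and the epithet in the `Dacus (Callantra)` sample is invented for illustration.

```python
import re
from collections import defaultdict

# Matches "Genus (Subgenus) epithet" at the start of a name string.
NAME_PATTERN = re.compile(r"^([A-Z][a-z]+) \(([A-Z][a-z]+)\) [a-z-]+")


def subgenera_in_multiple_genera(names):
    """Return {subgenus: [genera]} for subgenus names used under more than one genus."""
    genera_by_subgenus = defaultdict(set)
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:
            genus, subgenus = match.groups()
            genera_by_subgenus[subgenus].add(genus)
    return {
        subgenus: sorted(genera)
        for subgenus, genera in genera_by_subgenus.items()
        if len(genera) > 1
    }


names = [
    "Trigonometopus (Culex) canus",
    "Culex (Culex) pipiens",
    "Bactrocera (Callantra) axanthina",
    "Dacus (Callantra) exemplaris",  # invented epithet, for illustration only
    "Polypedilum (Polypedilum) xianjuensis",
]

print(subgenera_in_multiple_genera(names))
```

The sketch only reports candidates; deciding whether a case is a mistake or an intended homonym would still be up to the GSD authors.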

yroskov commented Sep 7, 2021

Report for Identical Subgenus contains 743 names:
image

CoL cannot resolve all cases on our end.

yroskov commented Sep 8, 2021

Another strange case of subgenus interpretation in the hierarchy: four species with assigned subspecies in the name have been placed in NotAssignedSubgenus node:
image

Can it be addressed in the ChecklistBank code, @mdoering? (@gdower)

yroskov commented Sep 14, 2021

Subgenus issues fixed in the code.

SD synced 2021-09-14.

2021-09-16: looks like technical problem is resolved and species placed in correct genera.
Waiting for a new version of the checklist without "homonymic" subgenera in different genera from SD team.

yroskov commented Feb 18, 2022

Version 3.6 received 2022-02-14

Imported to DEV https://data.dev.catalogueoflife.org/dataset/1101/classification

yroskov commented Feb 18, 2022

Checks of the view 2022-02-18
(few notes)

image

  • A single case where species is placed as a family: = BLOCKED
    image
    image

In the source (SPECIES table):

Family Full Name Line Full Name Line Range Full Species Line
Chironomidae Chironomus mixtus Holmgren, 1869: 45. TL: Norway. Bear I. (HT F NRS). Chironomus mixtus Holmgren, 1869: 45. TL: Norway. Bear I. (HT F NRS). Orthocladius (Orthocladius) mixtus. (PA: PA)

*)
image

  • 18 subfamilies, 69 genera (without square brackets) & 25 genera in brackets are outside families:
    image

CONCLUSION: set of species in these fam & gen have no parent families in the source file (blank values). CLB INTERPRETATION IS CORRECT

Checked few against the source (SPECIES):

Ceratopogoninae have 3 parent families, plus blank family with Serromyia errata:
image

Chironominae have 2 parent families, plus blank family with 3 spp Nandeva pudens, Parachironomus inageheus, Polypedilum (Polypedilum) xianjuensis :
image

Chironomiinae (is it different with above? - check with Neal) have 1 parent family Chironomidae, plus blank family with species Yaeprimus balteatus:
image

Clitellariinae have 2 parent families, plus blank family with species Adoxomyia hasbenlii
image

mdoering commented Feb 19, 2022

Species with uncertain placement in the genus: the original genus goes in square brackets, as agreed with Neal. (Previously, such names had a question mark in the genus field, e.g. ? stercoreus.)
How does CLD deal with such names?

I don't think it is a good idea to feed names with square bracket embraced genera into CLB to indicate uncertain placement. This is a very specific convention for SD only and not known to anyone nor the system itself.

We should try to change those names and rather follow the guidelines of ColDP, where we have discussed this problem and how to deal with it in a consistent way so both CLB and other users understand the data correctly.

Looking at the verbatim Taxon record of that example I see various problems:

  • col:genus is given as the classification (this is a Taxon, not Name record). If it is uncertain don't do that and better remove the field.
  • col:species is given as an epithet only, the classification expects a binomial. Remove the field.
  • col:provisional should be used to indicate the uncertain placement

In the linked Name record I would suggest to simply remove the square brackets.

yroskov commented Feb 22, 2022

In the linked Name record I would suggest to simply remove the square brackets.

I am comfortable with presentation of an original genus in square brackets where a new placement in a genus is not resolved yet. If CLB allows search for names with square bracket, I'll be happy to mark these accepted names as Provisionally Accepted in the CoL. See: CatalogueOfLife/backend#1112

mdoering commented Feb 23, 2022

Square brackets around genera are not something we support at this stage. They will be considered bad data and will likely have impacts down the line when we assemble COL, e.g. when we make sure to have a genus record for every accepted species. Don't be surprised if you find new genera with brackets in COL.

yroskov commented May 17, 2022

Version 3.6 (2022-02-14) imported to the PROD 2022-05-17

  • Imported 177,088 spp
    image

  • Metadata: OK (ver. 3.6 Feb 2022, 2022-02-14)

  • Sectors: OK
    Blocked families Cylindrotomidae, Limoniidae, Pediciidae, Tipulidae in Systema Dipterorum (taken from CCW)
    pre-synced 2022-05-17

yroskov commented May 17, 2022

ISSUES assessed 2022-05-17

image

yroskov commented May 17, 2022

TASKS as 2022-05-17
image

  • Broken decisions, 3506; rematch all = 3501 remain broken; deleted all.

!Remember! ACC=ACC sp (diff auth):
all names with genus in square brackets = Prov Acc
? what to do with names with authorstrings without year (they may have different synonyms - keep?)
image

yroskov commented May 19, 2022

Version 3.6 (2022-02-14), new crawl iteration imported to the PROD 2022-05-19

  • Imported 169,487 spp (vs 177,088 in previously crawled version)
    image

  • Metadata: OK

  • Sectors: families Cylindrotomidae, Limoniidae, Pediciidae, Tipulidae in Systema Dipterorum (taken from CCW)
    should be blocked again = FIXED

yroskov commented May 19, 2022

ISSUES assessed 2022-05-19 (many previous decisions remain in place)
image

yroskov commented May 19, 2022

Investigating bare names, 8,455
https://www.checklistbank.org/catalogue/3/dataset/1101/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=100&offset=0&status=bare%20name

? ambigua Pankratova, 1950 = ok
? arcudae Botnariuc, 1956 = ok
? delicatula Botnariuc & Cure, 1956 = ok

However, many names become "bare" for unclear (yet) reasons:
Mesembrinella dorsimacula Aldrich, 1922, it is (Available, Valid) Current Status
Komisca nanensis (Chaiwong, Sukontason & Sukontason, 2009), it is (Available, Valid) Changed Combination / Rank
Abago rohdendorfi Grunin, 1966 (Available, Invalid) Junior SECONDARY Homonym

yroskov commented May 19, 2022

TASKS as 2022-05-19
image

  • Broken decisions, 1082; deleted all.
  • Split genera, tribes, subfamilies only partially marked as "prov Acc"

Resolved 2022-05-19:
image

yroskov commented May 19, 2022

  • All accepted names with square bracket [ need to be flagged as provisionally accepted. Examples: [Aricia] coronata (Holmgren, 1883); [Cordylura] marginipennis Gimmerthal, 1847
    I cannot do it in CLB: such names do not appear in the ISSUE reports, and the Workbench search does not return them either.

NEW SEARCH OPTION IN Workbench @clb: RegEx Search (Regular Expression Search)
image

#197
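
For reference, a pattern that would pick out names with a bracketed genus is sketched below using Python's `re` module. Whether the Workbench RegEx search accepts exactly the same syntax is not confirmed here; the pattern and the helper name `has_bracketed_genus` are assumptions for illustration.

```python
import re

# A name that starts with "[...]" followed by whitespace and at least one more token.
BRACKETED_GENUS = re.compile(r"^\[[^\]]+\]\s+\S+")


def has_bracketed_genus(name):
    """True if the name string starts with a genus given in square brackets."""
    return BRACKETED_GENUS.match(name) is not None


names = [
    "[Aricia] coronata (Holmgren, 1883)",
    "[Cordylura] marginipennis Gimmerthal, 1847",
    "Mesembrinella dorsimacula Aldrich, 1922",
]

for name in names:
    print(has_bracketed_genus(name), name)
```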

yroskov commented May 20, 2022

Crawl iteration with pre-flagged "prov acc" names imported 2022-05-19 & 20.
(The main problem: the Tasks page failed to display in CLB (spinning progress forever), and the Classification page either also failed or was too slow because of multiple imports. = now resolved)

  • Now crawler automatically flagged all species with genera in square brackets as "Prov Acc" (3,275 prov acc spp in this version).

  • ~700 genera with square brackets were flagged as "Prov Acc" via decisions. (Steps: workbench - filter for acc genera: all are at the end of the list, resorting - set up 700 lines per page - applied bulk decision)

yroskov commented May 20, 2022

  • Imported: 169,492 spp
  • Sectors: families Cylindrotomidae (blocked), Limoniidae (blocked), Pediciidae (blocked), Tipulidae (not blocked) in Systema Dipterorum (taken from CCW) = FIXED

TASKS as 2022-05-20
(all previous decisions re-applied successfully, new decisions added)

image

Synced 2022-05-20

yroskov commented Jun 12, 2023

Systema Dipterorum 4.2.2, May 2023, received 2023-05-27; imported to prod 2023-05-30

  • Imported: 171,937 spp (vs 172,050 spp in 4.2, May 2022)
  • Metadata: OK, ver. 4.2.2 -> 4.2.2, May 2023
  • Classification: 7 subfamilies and many genera outside families in the Tree root
  • Sectors: CCW families Cylindrotomidae, Limoniidae, Pediciidae, Tipulidae should be blocked in Systema Dipterorum; plus subfamily Tipulinae & genus Tipula in the root of Diptera. = all families have no block = FIXED 2023-06-12

image

TASKS

image

  • Broken decisions, 369: deleted all

  • Genera with square brackets blocked

  • Split subgenera - failed to resolve = the sector synced without rank subgenus

Resolved 2023-06-12:

image

Synced 2023-06-12 (without rank subgenus)

yroskov commented Jun 15, 2023

2023-06-15: temporary names such as *FChironominae (start as *F) deleted as a node (“taxon”) in Assembly - Draft. All children attached to the next parent. Sync is not involved (i.e. such names will be back with next sync).

yroskov commented Nov 13, 2023

Both names blocked in CoL. Reported to Neal.

Systema Dipterorum re-synced 2023-11-13.

yroskov commented Nov 20, 2023

https://www.checklistbank.org/catalogue/3/dataset/1101/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&q=%5Cn

image

Re-synced 2023-11-20

yroskov commented Dec 5, 2023

  • Uninomials with prefixes *F, *T (e.g. *FChironominae, *FChloropinae, *FTephritini, *TLestremiinae) = blocked 2023-12-05

image

Systema Dipterorum 4.2.2, May 2023 re-synced 2023-12-05

After the check of PREVIEW 2023-12-06:
*F & *T names were blocked as taxa, not names.

Test names:
*FChironominae (next parent subfamily Chironominae)
Kribiopelma albidum Kieffer, 1923
Kribiobius modestus Kieffer, 1923

Decision "Ignore" applied instead of "Block". Synced 2023-12-07. That works: only the names were blocked and the children taxa were synced into the CoL.
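
The check for temporary names reported in this thread (names starting with `*F`, `*T`, a literal `\n`, or `?`) can be sketched as a simple prefix test. The function names are hypothetical, and the `\nSomething` sample is invented; this is an illustration, not how CLB detects such names.

```python
# Prefixes seen in this dataset: "*F", "*T", a literal backslash + "n", and "?".
# A real newline character is included as well, as a precaution.
PROBLEM_PREFIXES = ("*F", "*T", "\\n", "\n", "?")


def find_placeholder_names(names):
    """Return the names that start with one of the problematic prefixes."""
    return [name for name in names if name.startswith(PROBLEM_PREFIXES)]


def strip_rank_marker(name):
    """Drop a leading *F / *T marker, e.g. to look up the intended parent name."""
    return name[2:] if name.startswith(("*F", "*T")) else name


names = [
    "*FChironominae",
    "*TLestremiinae",
    "? ambigua Pankratova, 1950",
    "\\nSomething",  # invented example of a name starting with a literal \n
    "Kribiopelma albidum Kieffer, 1923",
]

print(find_placeholder_names(names))
print(strip_rank_marker("*FChironominae"))
```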

yroskov commented Dec 5, 2023

Remains unresolved. An attempt to block subgenus as a rank (1) removed all subgenera from the tree and from species names and (2) created "self-synonymy" (identical ACC-SYN). The decision to block subgenus was reversed 2023-12-05. Duplicated subgenera are back (sic! PREVIEW 2023-12-07).

The list was sent to Neal 2023-12-05.

yroskov commented Feb 8, 2024

Tests of Systema Dipterorum ver. 4.5, 2023-11-16 processed via TW by DD vs data by GO: #244

yroskov commented Feb 8, 2024

Systema Dipterorum ver. 5.0, 2024-01-08 processed via TW by DD; imported 2024-02-07

  • Imported: 177193 spp (vs 171,937 spp in 4.2.2, May 2023)
  • Metadata: Corrected Version / Issued 0.38.1 / 2024-02-07 --> 5.0, Jan 2024 / 2024-01-08
    Added paragraph in Description:
    This version of the Systema Dipterorum data has been imported into TaxonWorks (the author of the import script is D. Dmitriev) and passed soft validation there before being exported to the CoLDP format (the author of the export script is G. Ower).
  • Classification: 17 subfamilies and many genera & species are outside families in the Tree root (inside order Diptera). Species are flagged as Prov. Acc. 7 species with the portion [GENUS NOT SPECIFIED] in the root.
  • Sectors: sector "Diptera" broken = FIXED 2024-02-12
    CCW families Cylindrotomidae, Limoniidae, Pediciidae, Tipulidae should be blocked in Systema Dipterorum; plus subfamily Tipulinae (also, check genus Tipula in the root of Diptera) = all families re-blocked 2024-02-12
  • Extinct taxa: missing extinct flag with species (0 spp now vs 3,792 in May 2023). (Field Epoch in the source spreadsheet should be used). There are extinct flags in all other ranks.
  • Genera with square brackets: 37 names with the portion [GENUS NOT SPECIFIED] (workbench, reverse ordering, 40 n/p) = blocked
  • Names start with *F, *T, \n, ? = none
  • Split subgenera, 7 = resolved

image

ISSUES assessed 2024-02-12

image

TASKS

image

Hoplacephala nigriventris (Villeneuve, 1913)
vs
Hoplacephala nigriventris Villeneuve, 1913

Hoplacephala retroseta (Villeneuve, 1913)
vs
Hoplacephala retroseta Villeneuve, 1913

Huttonobesseria verecunda (Hutton, 1901)
vs
Huttonobesseria verecunda Hutton, 1901

Hystricia cuestae (Engel, 1920)
vs
Hystricia cuestae Engel, 1920

Isomyia pseudolucilia (Malloch, 1928)
vs
Isomyia pseudolucilia Malloch, 1928

plus, few cases of two identical accepted species (full list):
Empis (Polyblepharis) fedtschenkoi Shasmshev, 2023
Empis (Polyblepharis) hirsutitarsis Shamshev, 2023
Empis (Polyblepharis) sogdiensis Shamshev, 2023
Empis (Polyblepharis) sogdiensis Shasmshev, 2023
Holops anarayae Barahona-Segovia, 2021
Holops grezi Barahona-Segovia, 2021
Holops pullomen Baharona-Segovia, 2021
Paraclius brooksi Soares, Capellari & Ale-Rocha, 2023
Physoconops tentenvilu Baharona-Segovia, 2020
Polleniopsis bomdilaensis Bharti & Verves, 2016
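
The two kinds of ACC-ACC conflict listed above (authorship differing only by parentheses, and near-identical author spellings) could be told apart roughly as follows. This is a sketch under assumptions: the name-splitting regex, the function names and the 0.9 similarity threshold are all choices made for this example, not part of CLB.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# "Genus [(Subgenus)] epithet" followed by the authorship string.
NAME_SPLIT = re.compile(r"^(\S+(?: \([A-Z][a-z]+\))? [a-z-]+)\s+(.*)$")


def split_name(full_name):
    """Split a full name into (binomial, authorship)."""
    match = NAME_SPLIT.match(full_name)
    return match.groups() if match else (full_name, "")


def compare_authorships(first, second, threshold=0.9):
    """Classify how two authorship strings differ (threshold is an assumption)."""
    bare_first = first.replace("(", "").replace(")", "").strip()
    bare_second = second.replace("(", "").replace(")", "").strip()
    if first == second:
        return "identical"
    if bare_first == bare_second:
        return "parentheses only"
    if SequenceMatcher(None, bare_first, bare_second).ratio() >= threshold:
        return "possible misspelling"
    return "different"


def find_authorship_conflicts(names):
    """Group names by binomial and classify each pair of differing authorships."""
    by_binomial = defaultdict(set)
    for full_name in names:
        binomial, authorship = split_name(full_name)
        by_binomial[binomial].add(authorship)
    report = []
    for binomial, authorships in sorted(by_binomial.items()):
        for first, second in combinations(sorted(authorships), 2):
            report.append((binomial, first, second, compare_authorships(first, second)))
    return report


names = [
    "Hoplacephala nigriventris (Villeneuve, 1913)",
    "Hoplacephala nigriventris Villeneuve, 1913",
    "Empis (Polyblepharis) sogdiensis Shamshev, 2023",
    "Empis (Polyblepharis) sogdiensis Shasmshev, 2023",
]

for row in find_authorship_conflicts(names):
    print(row)
```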

ACC-ACC species (same authors) 0 of 342: https://www.checklistbank.org/dataset/1101/duplicates?authorshipDifferent=false&category=binomial&limit=50&minSize=2&mode=STRICT&offset=0&status=accepted
Two identical accepted species:
Agadasys hexablepharis Whittington, 2000
Amblypsilopus qinlingensis Yang & Saigusa, 2005
Amplisegmentum venezuelensis Winterton, 2021
etc.

Resolved 2024-02-12:

image

Synced 2024-02-12

yroskov commented Mar 4, 2024

TASKS does not detect such cases.

yroskov commented Mar 8, 2024

Systema Dipterorum ver. 5.0, 2024-01-08 processed via TW by DD; second iteration (bring back extinct spp); imported 2024-03-07

  • Imported: 177,193 spp (vs 177,193 spp)
  • Metadata: Corrected Version / Issued 0.39.0 / 2024-03-07 --> 5.0, Jan 2024 / 2024-01-08
  • Classification: as above.
  • Sectors: sector "Diptera" broken (subject is missing) = REPAIRED 2024-03-08.
    No blocked CCW families Cylindrotomidae, Limoniidae, Pediciidae, Tipulidae in Systema Dipterorum; plus subfamily Tipulinae (also, check genus Tipula in the root of Diptera) = all families re-blocked 2024-03-08. Species with the portion [GENUS NOT SPECIFIED] in the root = BLOCKED.
  • Extinct taxa: 4742 spp = FIXED!
  • Genera with square brackets: 28 names with the portion [GENUS NOT SPECIFIED] (workbench, reverse ordering, 40 n/p) = blocked.
  • Names start with *F, *T, \n, ? = none
  • Split taxa (see in TASKS)

METRICS

image

TASKS

image

  • Broken decisions, 6765 = no actions (because I have no assurance that decisions/CLB operates correctly)

  • Identical subfamilies approx. 45. Split subfamilies. = RESOLVED: (1) alternatives without authorstring ignored; (2) if both tribes without authorstrings, item with less species ignored.

  • Identical tribe 0 of 37. Split tribes. = RESOLVED: (1) alternatives without authorstring ignored; (2) if both tribes without authorstrings, item with less species ignored. However, decisions are not shown in the interface (see screenshot below) - for attention of @thomasstjerne (https://www.checklistbank.org/catalogue/3/dataset/1101/duplicates?catalogueKey=3&category=uninomial&limit=50&minSize=2&mode=STRICT&offset=0&rank=tribe&status=accepted&withDecision=false)

  • Identical genus 0 of 62. Split genera. = RESOLVED: alternatives without authorstring ignored. However, decisions are not shown in the interface (see screenshot below) - for attention of @thomasstjerne

  • Identical subgenus 0 of 7. Split subgenera, 7 = RESOLVED: blocked alternative without authorstring - they are empty.

image

image

Seems, a bug resolved. On 2024-03-11, ACC-ACC species (different authors) 512 of 512:
image

The same problem: the interface does not show the results of applying decisions, neither in the report nor in the panel.
https://www.checklistbank.org/catalogue/3/dataset/1101/duplicates?authorshipDifferent=false&catalogueKey=3&category=binomial&limit=500&minSize=2&mode=STRICT&offset=0&status=accepted&withDecision=false

image

image

See comments on bugs - stopper:
CatalogueOfLife/backend#1300 (comment)

CatalogueOfLife/backend#1300 (comment)

CatalogueOfLife/backend#1300 (comment)

Synced 2024-03-18, probably with sets of unresolved duplicates:

image

yroskov commented Mar 22, 2024

Re-do TASKS after software fixes, 2024-03-22

image

  • Broken decisions: 6765 = deleted
  • ACC-ACC species (same authors), 98 of 342: names with higher IDs blocked

Resolved 2024-03-22:

image

Re-synced 2024-03-22

yroskov commented May 23, 2024

Systema Dipterorum ver. 5.2 of 2024-05-15 (as 0.41.1 / 2024-05-23) processed via TW by GO; 1st iteration ; imported 2024-05-23

  • Imported: 216,630 spp (vs 177,193 spp); Synonym Count: 1 = something wrong
  • Metadata: ver. as "0.41.1 / 2024-05-23" needs to be corrected
  • Classification:
    Species with the portion [GENUS NOT SPECIFIED] in the root
  • Extinct taxa: no flag
  • Genera with square brackets: species with the portion [GENUS NOT SPECIFIED] (workbench, reverse ordering, 40 n/p)
  • Names start with *F, *T, \n, ? = none
  • Split taxa (see in TASKS)
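
A quick sanity check on import counts, of the kind that flags "something wrong" above, might look like the sketch below. The 5% threshold and the function name are assumptions for illustration; the figures are the species and synonym counts quoted in this thread.

```python
def check_import_metrics(previous, current, max_relative_change=0.05):
    """Return warnings for counts that changed by more than the given fraction."""
    warnings = []
    for key in ("species", "synonyms"):
        old, new = previous[key], current[key]
        if old == 0:
            continue  # no baseline to compare against
        change = abs(new - old) / old
        if change > max_relative_change:
            warnings.append(f"{key}: {old} -> {new} ({change:.1%} change)")
    return warnings


previous_version = {"species": 177193, "synonyms": 135855}
first_iteration = {"species": 216630, "synonyms": 1}
second_iteration = {"species": 176894, "synonyms": 133377}

print(check_import_metrics(previous_version, first_iteration))
print(check_import_metrics(previous_version, second_iteration))
```

With these numbers the first iteration raises a warning for both counts, while the second iteration stays within the assumed threshold.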

METRICS

image

  • Incorrect authorstring with family Acartophthalmidae Enderlein, Lopes, Hardy, Czerny, Hennig, McAlpine, Munroe, Brauer, Bergenstamm, Fleming, Townsend, Zimin, Leach, Samouelle, Zetterstedt, Agassiz, Marschall, Griffini, Acloque, Arias, Brues, Melander, Lehrer, Hull, Alexander, Lahille, Rohdendorf, Verves, Aldrich, Griffiths, Guimarães, Dugdale, Rübsaamen, Hedicke, McAlpine, Mamaev, Robineau-Desvoidy, Macquart, Swainson, Griffith, Pidgeon, Harris, Rondani, Desmarest, Lioy, Rye, Barrett, Hendel, Coe, Latreille, Burmeister, Carpenter, Bowden, Becker, Brundin, Bigot, Mesnil, Ussatchov, Sack, Bezzi, Stein, Williston, Pleske, Newman, Grimshaw, Verrall, Young, Nowicki, Stuckenberg, Schnabl, Bernardi, Duda, Kloet, Hincks, Shannon, Crozy, Nartshuk, Bickel, Byers, Philip, Westwood, Baranov, Venturi, Schiner, Walker, Roback, Shewell, Winnertz, Malloch, Billberg, Hutton, Brodie, White, Stenhammar, Saether, Zimina, Herting, Glumac, Goffe, Egger, Pascoe, Townes, Frey, Beschovski, Jones, Albuquerque, Cogan, Wesché, Handlirsch, Aczél, Séguy, Wahlgren, Peck, Crampton, Cook, Hong, Cockerell, Wirth, Stone, Vujić, Curran, Kalugina, Zumpt, Blanchard, Drensky, Sturtevant, Bromley, Mani, Shiraki, Bertrand, Loew, Pandellé, Okada, Harris, Alcock, Yabar, Hall, Evenhuis, Edwards, Bhatia, Keilin, Morrison, Kertész, Shatalkin, Anonymous, Sharp, White, Wiedemann, Vimmer, Mathis, Adisoemarto, Wood, Wilcox, Papavero, Costa, Bode, Sabrosky, Arnaud, Smirnov, Nagatomi, Iwata, Daniels, Springer, Dziedzicki, Mallo, Roonwal, Tonnoir, Belkin, Fallén, Haliday, Coquillett, Slosson, Knowlton, Cutler, Hesse, Zaitzev, Theobald, Krivosheina, Ashe, Maa, Gressitt, Steyskal, Wheeler, Erichson, Presl, Bellardi, Berthold, Martin, Dahl, Bibby, Zilahi-Sebess, Grünberg, Brèthes, Curtis, Wesenberg-Lund, Miyatake, Anduze, Sedman, Weems, Vossbrinck, Friedman, Thompson, Nagler, Stanescu, Verb, Eysell, Austen, Rossi, Bau, Wing, Fluke, Holloway, Theodor, Papp, Weidner, Kieffer, Wu, Wenzel, Tokunaga, Grunin, Thomson, 
Kessel, Maggioncalda, Oken, Bequaert, Wingate, Carrera, Andretta, Woodley, Osten Sacken, Wulp, Stackelberg, Oldenberg, Colless, Riedel, Verbeke, Knutson, Lyneborg, Lameere, Lundström, Camras, Korneyev, Barnes, Heer, Jeannel, Elouard, Andersson, Blanchard, Boyes, Illingworth, Vaillant, Wandolleck, Kröber, Prado, Rafinesque, Perty, Doleschall, Philippi, Jaennicke, Hippa, Ass, Needham, Wasmann, Breddin, Börner, Wheeler, Seebold, Hardy, Meijere, Surcouf, Comstock, Hinton, Stephens, Meigen, Thon, Telford, Vockeroth, Roháček, Yang, Ren, Yang, Grichanov, Mazzarolo, Amorim, Liu, Chandler, Jaschhof, Ansorge, Norris, Richter, Lehr, Gaimari, Irwin, Labandeira, Schmitz, Gerstaecker, Hackman, Ovchinnikova, Barraclough, Freidberg, Tozoni, Andersen, Gagné, Kirby, Spence, Nitzsch, Neuhaus, Dyar, Borkent, Kano, Shinonaga, Kurahashi, Wang, Sasakawa, Marshall, Mostovski, Munari, Yeates, Lukashevich, Shcherbakov, Mason, Naglis, Brake, Kuznetzov, Szadziewski, Grootaert, Meuffels, Meunier, Shaw, Shaw, Greathead, Jaschhof, Didham, Pinto, Han, Schinz, Artigas, Wiegmann, Hancock, Krzemiński, Krzemińska, Saigusa, Nagatomi, Peris, González-Mora, Zloty, Sinclair, Pritchard, Séguy, Vilkamaa, Ševčík, Huang, Lin, Rindal, Guo, Wang, Michelsen, Zhang, Shih, Zhang, Krzeminska, Papier, Ebejer, Cherian, Shinimol, Zhu, Wang, Zhang, Fedotova, Sidorenko, Grimaldi, Cumming, Pape, Pimentel, Azar, Jaschhof, Li, Riccardi, Barták, Skartveit, Brown, Kung, Skibińska, Kaddumi, Pepinelli, Currie, Perkovsky, Lessard, Sidorenko, Lukashevich, Przhiboro, Plakidas, Tanasijtshuk, Oliveira, Ježek, Tang, Cranston, Rasnitsyn, Astakhov, Winterton, Ware, Hoffeins, Oldenburg, Mik, Wang, Blagoderov & Šifner, 2024 [1914]

Should be: Acartophthalmidae Czerny, 1928 = FIXED in 2nd iteration

  • Acartophthalmus coxatus Zetterstedt, 1848 = missing original combination as Agromyza coxata Zetterstedt, 1848 and missing brackets in the authorstring = FIXED in 2nd iteration

Both Acartophthalmidae & Acartophthalmus coxatus were correct in previous version.

yroskov commented Jun 3, 2024

Systema Dipterorum ver. 5.2 of 2024-05-15 (as 0.41.1 / 2024-06-01) processed via TW by GO; 2nd iteration ; imported 2024-06-01

  • Imported: 176,894 (vs 216,630 spp in 1st iteration, vs 177,193 spp); "Synonym Count: 133,377" = OK (vs 135,855)

  • Metadata: ver. "0.41.1 / 2024-06-01" corrected as "5.2 of 2024-05-15" 2024-06-03

  • Classification: OK.
    Species with the portion [GENUS NOT SPECIFIED] in the root

  • Sector: order Diptera is broken (missing subject); RESTORED 2024-06-03. 4 CCW families Cylindrotomidae, Limoniidae, Pediciidae & Tipulidae = BLOCKED

  • Extinct taxa: OK (4737 spp vs 4742 spp)

  • Genera with square brackets: species with the portion [GENUS NOT SPECIFIED] (workbench, reverse ordering, 40 n/p) = 19 blocked

  • Names start with *F, *T, \n, ? = none

  • Split taxa (see in TASKS)

  • Family Acartophthalmidae Czerny, 1928 = OK

METRICS

image

ISSUES

image

TASKS

image

  • Broken decisions, 5885 = all deleted
  • Outdated decisions, 57 = no actions (ref: the problem of decisions between GSDs) = 0 after deletion of broken decisions

Resolved 2024-06-03:

image

Synced 2024-06-03

yroskov commented Aug 20, 2024

Systema Dipterorum ver. 5.3 of 2024-07-17 (imported as 0.43.1 / 2024-08-12) processed via TW by GO; imported 2024-08-12

  • Imported: 177,314 (vs 176,894 spp); "Synonym Count: 133,479" = OK (vs 133,377)
  • Metadata: ver. "0.43.1 / 2024-08-12" corrected as "5.3 / 2024-07-17"
  • Classification: OK.
    Species with the portion [GENUS NOT SPECIFIED] in the root
  • Sector: Sector shown as healthy here https://www.checklistbank.org/catalogue/3/sector?limit=100&offset=0&subjectDatasetKey=1101, but with wrong subject as "Stenomyia fasciapennis Cresson, 1913" in expand view.
  • Sector expanded: incorrect subject = FIXED 2024-08-20
    image

Assembly view:
image

  • Sector nested: 4 CCW families Cylindrotomidae, Limoniidae, Pediciidae & Tipulidae should be blocked in SD = RE-BLOCKED 2024-08-20

  • Extinct taxa: OK (4746 spp vs 4737 spp)

  • Genera with square brackets: species with the portion [GENUS NOT SPECIFIED] (workbench, reverse ordering, 40 n/p) = 22 blocked 2024-08-20

  • Names start with *F, *T, \n, ? = none

  • Split taxa (see in TASKS)

  • Family Acartophthalmidae Czerny, 1928 = OK

METRICS

image

ISSUES assessed 2024-08-21

image

TASKS

image

  • Broken decisions, 4980 = deleted

  • Outdated decisions, 5 = no actions

  • ACC-ACC species (different authors) 2 of 61. Names reported to Neal:
    Bryophaenocladius pollexus Som, Mukherjee, Das & Charaborty, 2023 [incorrect spelling Chakraborty] = blocked
    Bryophaenocladius pollexus Som, Mukherjee, Das & Chakraborty, 2023
    Gedanohelea liaoningensis Steber, Szadziewski & Wang, 2016 [incorrect spelling of Stebner] = blocked
    Gedanohelea liaoningensis Stebner, Szadziewski & Wang, 2016
    Melanagromyza pseudolappae Gu, Fan & Sasakawa, 1991
    Melanagromyza pseudolappae Gu, 1991
    Ophiomyia bispina Gu, 1991
    Ophiomyia bispina Gu, Fan & Sasakawa, 1991
    Phytomyza disjunctivena Gu, Fan & Sasakawa, 1991
    Phytomyza disjunctivena Gu, 1991
    Urophora sevanensis Evsitgneev, 2023 [incorrect spelling of Evstigneev] = blocked
    Urophora sevanensis Evstigneev, 2023

  • Misspellings reported to Neal:
    Cdeeratopogonidae
    Chironlomidae
    Dolicvhopodidae
    Mycxetophilidae, Myceetophilidae (vs Mycetophilidae)

  • Identical family, genus, subgenus, etc. = pattern: "ignore" taxa with empty authorstring; ignore taxa with less spp; block taxa with 0 sp
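
The resolution pattern for identical taxa can be written down as a small rule set. The sketch below is one reading of that pattern, not CLB code: taxa with 0 species are blocked; if some alternatives have an authorstring, those without are ignored; if none has an authorstring, the one with the most species is kept. Treating several alternatives that all have authorstrings as "review" is an added assumption, as are the record layout and the sample values.

```python
def resolve_duplicates(group):
    """Map each taxon id in a group of identical names to a decision.

    group: list of dicts with 'id', 'authorship' and 'species_count' keys
    (a hypothetical layout chosen for this example).
    """
    decisions = {}
    remaining = []
    for taxon in group:
        if taxon["species_count"] == 0:
            decisions[taxon["id"]] = "block"
        else:
            remaining.append(taxon)

    with_author = [t for t in remaining if t["authorship"].strip()]
    if with_author:
        for taxon in remaining:
            if not taxon["authorship"].strip():
                decisions[taxon["id"]] = "ignore"
        # Several alternatives with authorstrings are left for manual review.
        outcome = "keep" if len(with_author) == 1 else "review"
        for taxon in with_author:
            decisions[taxon["id"]] = outcome
    elif remaining:
        best = max(remaining, key=lambda t: t["species_count"])
        for taxon in remaining:
            decisions[taxon["id"]] = "keep" if taxon is best else "ignore"
    return decisions


# Invented groups, for illustration only.
group_one = [
    {"id": "a", "authorship": "Author, 1900", "species_count": 5},
    {"id": "b", "authorship": "", "species_count": 2},
]
group_two = [
    {"id": "c", "authorship": "", "species_count": 10},
    {"id": "d", "authorship": "", "species_count": 3},
    {"id": "e", "authorship": "", "species_count": 0},
]

print(resolve_duplicates(group_one))
print(resolve_duplicates(group_two))
```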

Resolved 2024-08-21,22:

image

Synced 2024-08-22

yroskov commented Aug 28, 2024

yroskov commented Nov 20, 2024

Systema Dipterorum ver. 5.5 of 2024-10-01 (imported as Nov 2024 / 2024-11-12) processed via TW by GO; imported 2024-11-13 (first iteration - no extinct flag); second iteration imported 2024-11-21

  • Imported: 177,646 (vs 177,314 spp); "Synonym Count": 135,897 (vs 133,479)
  • Metadata: OK
  • Classification: OK.
    Species with the portion [GENUS NOT SPECIFIED] in the root
  • Sector: OK
  • Sector nested: 4 CCW families Cylindrotomidae, Limoniidae, Pediciidae & Tipulidae should be blocked in SD = OK 2024-11-21
  • Extinct taxa: OK (4757 spp vs 4746 spp)
  • Genera with square brackets: species with the portion [GENUS NOT SPECIFIED] (workbench, reverse ordering, 40 n/p) = 29 without decision = blocked 2024-11-21 - Request failed with status code 400
  • Names start with *F, *T, \n, ? = none
  • Split taxa (see in TASKS)

METRICS

image

ISSUES - do together with GO to assess the interface functionality

image

TASKS

image

Resolved 2024-11-25:

image

Synced 2024-11-25

yroskov commented Nov 26, 2024

Tests of PREVIEW 2024-11-26:

  • Empty families in the Tree (10) = caused by "ignore" decisions in all children taxa down to the genus, excluding species; family also had no decision = families BLOCKED in Assembly 2024-11-26
    Cdeeratopogonidae
    Cecidomyidae vs Cecidomyiidae
    Chironlomidae vs Chironomidae
    Dolcihopodidae vs Dolichopodidae
    Eoptychopteridae
    Huaxiasciaritidae
    Nemestrionidae vs Nemestrinidae (?)
    Simuiiidae vs Simuliidae
    Sinoditomyiidae
    Tethinidae = no such family in Systema Dipterorum = deleted in assembly CoL
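
Misspelled family names like those listed above could be caught before sync by comparing each family name with a reference list. The sketch below uses Python's `difflib.get_close_matches`; the reference list, the 0.85 cutoff and the function name are assumptions for illustration, and the reference list here is deliberately tiny.

```python
from difflib import get_close_matches

# Assumed reference list of correctly spelled family names (not exhaustive).
REFERENCE_FAMILIES = [
    "Cecidomyiidae",
    "Ceratopogonidae",
    "Chironomidae",
    "Dolichopodidae",
    "Mycetophilidae",
    "Simuliidae",
]


def suggest_spelling(name, reference_names=REFERENCE_FAMILIES, cutoff=0.85):
    """Return the closest reference name for a suspected misspelling, else None."""
    if name in reference_names:
        return None  # already a known spelling
    matches = get_close_matches(name, reference_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for family in ["Cdeeratopogonidae", "Chironlomidae", "Dolcihopodidae", "Simuiiidae", "Eoptychopteridae"]:
    print(family, "->", suggest_spelling(family))
```

A name with no suggestion is not necessarily wrong; it may simply be absent from the reference list, so such cases still need a manual look.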

image

image

image

image

image

Re-synced 2024-11-26

yroskov commented Dec 16, 2024

Tests of the PREVIEW 2024-12-16:

ACC-ACC same sp same auth: 171 pairs, mainly Systema Dipterorum vs CCW
image
https://www.checklistbank.org/dataset/3/duplicates?authorshipDifferent=false&category=binomial&limit=100&minSize=2&mode=STRICT&offset=0&status=accepted

  • Insecta>Diptera>Conosia = BLOCKED
  • Insecta>Diptera>Styringomyia = BLOCKED
  • Empididae>Antocha>Antocha (Antocha) vs CCW Limoniinae>Antocha>(Antocha) = BLOCKED subfamily Limoniinae in fam Empididae

yroskov commented Dec 16, 2024

Systema Dipterorum ver. 5.5 of 2024-10-01 (imported as Nov 2024 / 2024-11-12) processed via TW by GO; imported 2024-11-13 (first iteration - no extinct flag); second iteration imported 2024-11-21; third iteration imported 2024-12-28 = it isn't 3rd iteration, the same dataset as 2024-11-13 but with broken decisions (were IDs changed during a new import?) - Tasks need to be re-done

image
@mdoering, what is the reason for your intervention?

TASKS

image

mdoering commented Dec 17, 2024

Sorry Yuri, I don't remember the exact reason. The import wasn't done explicitly, but probably due to restarts or sth. It was the exact same archive as the import by Geoff before, no data has changed as you can see in the history.

https://www.checklistbank.org/dataset/1101/diff?attempts=35..36

yroskov commented Dec 17, 2024

Systema Dipterorum ver. 5.5 of 2024-10-01 (re-imported 2024-12-17)

  • Imported: 177,646 (vs 177,646 spp); "Synonym Count": 135,897 (vs 135,897)
  • Metadata: OK
  • Classification: OK.
  • Sector: OK
  • Sector nested: 4 CCW families Cylindrotomidae, Limoniidae, Pediciidae & Tipulidae should be blocked in SD = OK 2024-12-17
  • Extinct taxa: OK
  • Genera with square brackets: species with the portion [GENUS NOT SPECIFIED] (workbench, reverse ordering, 40 n/p) = "block" decision FAILED for 16 names again "Request failed with status code 400"
    image
  • Names start with *F, *T, \n, ? = none
  • Split taxa (check TASKS in the project3)

METRICS

image

TASKS

image

  • Broken decisions, 5392 = deleted

Resolved:

image

Comment: SYN-SYN species (different accepted, same authors) 44 of 47 related to ACC-ACC species (same authors) 82 of 82

Synced 2024-12-18

yroskov commented Dec 19, 2024

After PREVIEW id 306706 2024-12-19 checks:

  • empty families Cryomyidae (subfam Aphaniosominae = ignore) & Empiidae = blocked

yroskov commented Dec 20, 2024

As we learned, Decisions Rematch in dataset options broke decisions in SD (see the "problem of December" above) CatalogueOfLife/backend#1382
