Other data types

Multimedia data (Acoustic, Imaging)

If you have multimedia data (e.g., images, audio, video) that you want to publish alongside your dataset, document it in the associatedMedia field of your Occurrence table. This field requires the media to be hosted somewhere with a persistent URL for the annotated image(s) or other media file(s), e.g., a publication or a museum database. Copy that URL into the associatedMedia field for the relevant occurrence. To list multiple sources, provide a concatenated list of URLs.
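As a minimal sketch of the concatenated list mentioned above, the snippet below joins several media URLs into one associatedMedia value. The URLs, the occurrenceID, and the `join_media` helper are hypothetical. The " | " separator follows the general Darwin Core recommendation for lists of values, so check it against your publishing tool before relying on it.

```python
# Hypothetical media links for one occurrence; replace with your own persistent URLs.
media_urls = [
    "https://example.org/media/specimen-001.jpg",
    "https://example.org/media/specimen-001.wav",
]


def join_media(urls, separator=" | "):
    """Join media URLs into one associatedMedia value, skipping blanks and duplicates."""
    seen = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.append(url)
    return separator.join(seen)


occurrence = {
    "occurrenceID": "example-occ-001",  # hypothetical identifier
    "associatedMedia": join_media(media_urls),
}
print(occurrence["associatedMedia"])
```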

While there are core types and extensions designed for image, video, and audio files (e.g., Audubon Core and the Simple Multimedia extension), these are not currently processed by OBIS. For now, we therefore recommend including links in the associatedMedia field. Stay tuned, however, as OBIS is looking to incorporate the Simple Multimedia extension.

Martin-Cabrera et al. (2022) have produced best practices for datasets with plankton imaging data that can also apply to acoustic and other imaging data types. Following their guidelines, we strongly recommend including the following terms in your Occurrence table for any of these data types:

  • basisOfRecord - recommended best practice is to always use the term MachineObservation, especially for datasets derived from imaging instruments
  • identifiedBy - name(s) of the person(s) involved in verifying the taxon identification, particularly if the identification was made automatically by software and then validated by a human
  • identificationVerificationStatus - categorical indicator for the extent of taxonomic identification verification. Recommended to use PredictedByMachine or ValidatedByHuman
  • identificationReferences - references used in identification (e.g. citation and version of software or algorithm that identified taxa)

The fields identifiedBy and identificationVerificationStatus are crucial to indicate whether an observation has been validated, and by whom. These fields allow users to filter data when basisOfRecord = MachineObservation, so that they can be confident in the taxonomic identification when identificationVerificationStatus = ValidatedByHuman (Martin-Cabrera et al., 2022).
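The filtering described above can be sketched as follows. The records and the `human_validated` helper are made up for illustration; only the term names and the two controlled values come from the text.

```python
# Hypothetical occurrence records using the terms recommended above.
occurrences = [
    {"occurrenceID": "occ-1", "basisOfRecord": "MachineObservation",
     "identificationVerificationStatus": "ValidatedByHuman", "identifiedBy": "Jane Doe"},
    {"occurrenceID": "occ-2", "basisOfRecord": "MachineObservation",
     "identificationVerificationStatus": "PredictedByMachine", "identifiedBy": ""},
    {"occurrenceID": "occ-3", "basisOfRecord": "HumanObservation",
     "identificationVerificationStatus": "", "identifiedBy": "John Roe"},
]


def human_validated(records):
    """Keep machine observations whose identification was validated by a human."""
    return [
        r for r in records
        if r.get("basisOfRecord") == "MachineObservation"
        and r.get("identificationVerificationStatus") == "ValidatedByHuman"
    ]


validated_ids = [r["occurrenceID"] for r in human_validated(occurrences)]
print(validated_ids)
```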

The identificationVerificationStatus also has implications for documenting grouped occurrences, particularly for planktonic organisms. For example, if all identifications for a specific taxon in a sample have the same identificationVerificationStatus, you only need one occurrence record with one unique occurrenceID. The summed count or concentration for that taxon can then be reported in the eMoF as, e.g., “Abundance of biological entity specified elsewhere per unit volume of the water body”. However, if individuals of a taxon have more than one identificationVerificationStatus (e.g., ValidatedByHuman and PredictedByMachine), you will need two occurrence records, each with its own unique occurrenceID. The two records document the same taxon with different identificationVerificationStatus values, and each has its own summed abundance or concentration reported in the eMoF.
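A rough sketch of this grouping rule is given below: one occurrence record per combination of taxon and identificationVerificationStatus, with the summed concentration written to a matching eMoF row. The sample identifier, the input rows, the occurrenceID pattern, and the `group_occurrences` helper are all assumptions made for illustration. Units and the vocabulary identifiers (measurementTypeID, measurementUnitID) are deliberately left out.

```python
from collections import defaultdict

MEASUREMENT_TYPE = ("Abundance of biological entity specified elsewhere "
                    "per unit volume of the water body")

# Hypothetical per-identification concentrations from a single sample.
identifications = [
    {"scientificName": "Calanus finmarchicus", "status": "ValidatedByHuman", "concentration": 2.0},
    {"scientificName": "Calanus finmarchicus", "status": "ValidatedByHuman", "concentration": 3.0},
    {"scientificName": "Calanus finmarchicus", "status": "PredictedByMachine", "concentration": 1.5},
    {"scientificName": "Oithona similis", "status": "ValidatedByHuman", "concentration": 4.0},
]


def group_occurrences(sample_id, rows):
    """Build one occurrence and one eMoF row per (taxon, verification status) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["scientificName"], row["status"])] += row["concentration"]

    occurrence_rows, emof_rows = [], []
    for index, ((name, status), total) in enumerate(sorted(totals.items()), start=1):
        occurrence_id = f"{sample_id}-occ-{index}"  # hypothetical ID pattern
        occurrence_rows.append({
            "occurrenceID": occurrence_id,
            "scientificName": name,
            "basisOfRecord": "MachineObservation",
            "identificationVerificationStatus": status,
        })
        emof_rows.append({
            "occurrenceID": occurrence_id,
            "measurementType": MEASUREMENT_TYPE,
            "measurementValue": total,
        })
    return occurrence_rows, emof_rows


occurrence_rows, emof_rows = group_occurrences("sample-01", identifications)
for row in occurrence_rows:
    print(row["occurrenceID"], row["scientificName"], row["identificationVerificationStatus"])
```

In this made-up sample, Calanus finmarchicus ends up with two occurrence records because its identifications carry two different statuses, while Oithona similis needs only one.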

Example Resources: Martin-Cabrera et al. (2022) have created a best practices document for plankton imaging data that you can reference. To see an example imaging dataset implementing these best practices, see the supplementary material of Establishing Plankton Imagery Dataflows Towards International Biodiversity Data Aggregators.

Data originating from ROV (Remotely Operated Vehicle) observations may require additional processing. Ocean Networks Canada (ONC) is developing a pipeline for publishing ROV data to OBIS. ROV datasets should have:

  • An Event core that documents the hierarchical nature of ROV dives (e.g., ROV dives nested within a cruise)
  • Occurrence and eMoF extensions to record taxonomic and other measurement data e.g., from sensors.
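To illustrate the structure listed above, the sketch below nests two dive events under a cruise event via parentEventID and runs a few basic consistency checks. All identifiers, names, and the `find_problems` helper are hypothetical. This is not ONC's pipeline, only an example of the kind of checks one might run before publishing.

```python
# Hypothetical Event core: two ROV dives nested within one cruise.
events = [
    {"eventID": "cruise-01", "parentEventID": "", "eventRemarks": "cruise"},
    {"eventID": "cruise-01:dive-01", "parentEventID": "cruise-01", "eventRemarks": "ROV dive"},
    {"eventID": "cruise-01:dive-02", "parentEventID": "cruise-01", "eventRemarks": "ROV dive"},
]

# Hypothetical Occurrence extension rows linked to the dives by eventID.
occurrences = [
    {"occurrenceID": "occ-001", "eventID": "cruise-01:dive-01",
     "scientificName": "Actiniaria", "identifiedBy": "Jane Doe"},
    {"occurrenceID": "occ-002", "eventID": "cruise-01:dive-02",
     "scientificName": "Porifera", "identifiedBy": ""},
]


def find_problems(events, occurrences):
    """Report dangling parent/event links and occurrences without identifiedBy."""
    event_ids = {e["eventID"] for e in events}
    problems = []
    for e in events:
        parent = e["parentEventID"]
        if parent and parent not in event_ids:
            problems.append(f"{e['eventID']}: unknown parentEventID {parent}")
    for o in occurrences:
        if o["eventID"] not in event_ids:
            problems.append(f"{o['occurrenceID']}: unknown eventID {o['eventID']}")
        if not o["identifiedBy"].strip():
            problems.append(f"{o['occurrenceID']}: identifiedBy is empty")
    return problems


problems = find_problems(events, occurrences)
print(problems)
```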

ONC’s pipeline highlights the importance of including identifiedBy so that taxon identifications can be vetted by experts.

Habitat data

Event Core is perfect for enriching OBIS with interpreted information such as biological community, biotope or habitat type (collectively referred to as 'habitats'). However, the unconstrained nature of the terms measurementTypeID, measurementValueID, and measurementUnitID creates a risk that habitat measurements are structured inconsistently within the Darwin Core Archive standard and, as a result, are not easily discoverable, understood or usable.

To address this, members of the European Marine Observation and Data Network (EMODnet) Seabed Habitats and Biology thematic groups have produced a technical report, Duncan et al. (2021), that provides guidance on using the Darwin Core eMoF extension to submit habitat data to OBIS, following the ENV-DATA approach and using Seabed Habitats as a use case. Note that the guidelines and structuring approach outlined in this document have not yet been approved or accepted at the global level and are only a recommended approach as agreed upon by EMODnet Seabed Habitats, EMODnet Biology, and OBIS. The implementation at the EurOBIS level may be considered a pilot.

The overarching principles are summarised here. Because of the numerous classification systems and priority habitat lists in existence, it is not possible, as it is for other measurement types, to point to a single vocabulary for populating each of measurementTypeID, measurementValueID and measurementUnitID. Below are the types of information to include, with an example, as recommended by Duncan et al. (2021):

Please consult the Duncan et al. (2021) technical report (title: A standard approach to structuring classified habitat data using the Darwin Core Extended Measurement or Fact Extension; note you must refine search to Technical Reports from 2021 to identify this report) for more details, including:

  • how to handle a single event with multiple habitat measurements
  • recommended vocabularies and terms for common habitat classification systems
  • example eMoF table

To fill measurementType with habitat-related data, and/or the dwc:habitat column, consult the NERC vocabulary search. While the Coastal and Marine Ecological Classification Standard (CMECS) and the Environment Ontology (ENVO) also contain habitat vocabularies, OBIS recommends using the NERC vocabulary. If other vocabularies are used, please provide the NERC vocabulary equivalent as additional records in the eMoF table.

Tracking data

Encoding tracking data into Darwin Core follows the same standards as survey/sighting data. Tracking data should additionally indicate the accuracy of the latitude and longitude received from the positioning system, grouped by location accuracy class and recorded in the coordinateUncertaintyInMeters field. The Ocean Tracking Network (OTN) has developed some guidelines for formatting this type of data in Darwin Core. We summarize the main points below.

Using Event core for tracking data is recommended as there can be multiple events involved in tracking an organism. There are capture/tag and release events, receiver deployment events, and detection occurrences. Note that the capture and release of an organism are not considered to be distinct Occurrence records because they are not natural occurrences. Thus, in the Event core table you may record unique events for:

  • The capture of an animal
  • The release of an animal
  • The deployment of a listening (or receiver) station

Information pertaining to a specific individual is linked by a unique organismID. You can use the eventIDs associated with a receiver to record detection occurrences in the Occurrence table. One organism may then have multiple occurrences (and thus multiple occurrenceIDs), but always the same organismID. Any measurements taken from an organism during capture can be recorded in the extendedMeasurementOrFact extension, linked to the core by the capture event’s eventID as well as the unique organismID. For more details, see the DwC guidelines for biologging.
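The relationships described above can be sketched as below: capture, release, and receiver deployments are events, detections are occurrences tied to a receiver's eventID, and organismID groups the detections of one animal. All identifiers, dates, and the `detections_by_organism` helper are invented for illustration and do not come from the OTN guidelines.

```python
from collections import defaultdict

# Hypothetical Event core rows: capture, release and receiver deployments.
events = [
    {"eventID": "capture-T01", "eventRemarks": "capture and tagging"},
    {"eventID": "release-T01", "eventRemarks": "release"},
    {"eventID": "receiver-R1", "eventRemarks": "receiver deployment"},
    {"eventID": "receiver-R2", "eventRemarks": "receiver deployment"},
]

# Hypothetical detection occurrences, each linked to a receiver event.
occurrences = [
    {"occurrenceID": "det-0001", "eventID": "receiver-R1",
     "organismID": "turtle-T01", "eventDate": "2020-01-05"},
    {"occurrenceID": "det-0002", "eventID": "receiver-R2",
     "organismID": "turtle-T01", "eventDate": "2020-01-09"},
    {"occurrenceID": "det-0003", "eventID": "receiver-R1",
     "organismID": "turtle-T02", "eventDate": "2020-01-06"},
]


def detections_by_organism(occurrence_rows):
    """Group occurrenceIDs by organismID, ordered by eventDate."""
    tracks = defaultdict(list)
    for row in sorted(occurrence_rows, key=lambda r: r["eventDate"]):
        tracks[row["organismID"]].append(row["occurrenceID"])
    return dict(tracks)


tracks = detections_by_organism(occurrences)
print(tracks)
```

Note that no occurrence row points at the capture or release event, in line with the rule that those are not natural occurrences.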

Extracts from the extendedMeasurementOrFact Extension (eMoF) of the actual dataset Ningaloo Outlook turtle tracking of Green turtles (Chelonia mydas), Western Australia (2018-present), are shown as an example tracking dataset, following ARGOS Location class codes.

extendedMeasurementOrFact (eMoF) extension:

| id | measurementID | occurrenceID | measurementType | measurementValue | measurementValueID |
|---|---|---|---|---|---|
| 2347540 | 2347540-argosclass | 2347540 | ARGOS Location Class | A | http://vocab.nerc.ac.uk/collection/R05/current/A |
| 2347541 | 2347541-argosclass | 2347541 | ARGOS Location Class | B | http://vocab.nerc.ac.uk/collection/R05/current/B |
| 2347542 | 2347542-argosclass | 2347542 | ARGOS Location Class | 2 | http://vocab.nerc.ac.uk/collection/R05/current/2 |
| 2347543 | 2347543-argosclass | 2347543 | ARGOS Location Class | 3 | http://vocab.nerc.ac.uk/collection/R05/current/3 |
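Rows like those in the extract can be generated programmatically. The sketch below mirrors the column layout and the measurementID and measurementValueID patterns seen in the example. The `argos_emof_row` helper is hypothetical, and it accepts only the four class codes that appear in the extract. Verify any other code against the R05 collection before adding it.

```python
R05_BASE = "http://vocab.nerc.ac.uk/collection/R05/current/"

# Only the class codes shown in the extract; extend after checking the R05 collection.
KNOWN_CLASSES = {"A", "B", "2", "3"}


def argos_emof_row(occurrence_id, location_class):
    """Build one eMoF row recording the ARGOS location class of a detection."""
    location_class = str(location_class).strip().upper()
    if location_class not in KNOWN_CLASSES:
        raise ValueError(f"Unverified ARGOS location class: {location_class!r}")
    occurrence_id = str(occurrence_id)
    return {
        "id": occurrence_id,
        "measurementID": f"{occurrence_id}-argosclass",
        "occurrenceID": occurrence_id,
        "measurementType": "ARGOS Location Class",
        "measurementValue": location_class,
        "measurementValueID": R05_BASE + location_class,
    }


row = argos_emof_row(2347540, "a")
print(row)
```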