
Commit

Merge pull request #51 from EMODnet/develop
update BODC vocabs, add terms to dictionaries, deprecate Q01, fix cleandataframe
cyrilrader authored Jul 29, 2024
2 parents 8dbb50a + 2c514b2 commit 12163ec
Showing 16 changed files with 38 additions and 16 deletions.
2 changes: 1 addition & 1 deletion R/checkdataset.R
@@ -542,7 +542,7 @@ checkdataset = function(Event = NULL, Occurrence = NULL, eMoF = NULL, IPTreport
mof_noSamplingdescriptor <- data.frame(level = "warning",
field = "measurementType",
row = NA,
- message = "No sampling descriptors present: see http://vocab.nerc.ac.uk/collection/Q01/current/")
+ message = "No sampling descriptors/effort present: see https://github.com/EMODnet/EMODnetBiocheck/blob/master/files/workingEnvironment.R#L26")
}


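The check behind this warning looks for sampling descriptor or effort terms among the eMoF measurementTypeIDs. A minimal base-R sketch of such a check follows, assuming each entry of `BODCeffort` (defined in files/workingEnvironment.R) is matched as a plain substring of the URI, so that a collection code such as 'Q01' matches any Q01 term. `check_sampling_effort()` and this matching rule are hypothetical, not the package's actual implementation.

```r
# Hypothetical sketch, not EMODnetBiocheck's code: return a one-row warning in
# the same shape as mof_noSamplingdescriptor when no eMoF measurementTypeID
# contains any effort code; return NULL when at least one matches.
check_sampling_effort <- function(eMoF, effort_codes) {
  ids <- as.character(eMoF[["measurementTypeID"]])  # NULL eMoF -> character(0)
  ids <- ids[!is.na(ids) & nzchar(ids)]
  # Assumed matching rule: plain substring search for each code in each URI
  found <- vapply(effort_codes,
                  function(code) any(grepl(code, ids, fixed = TRUE)),
                  logical(1))
  if (any(found)) {
    return(NULL)
  }
  data.frame(level = "warning",
             field = "measurementType",
             row = NA,
             message = "No sampling descriptors/effort present",
             stringsAsFactors = FALSE)
}

effort <- c("Q01", "AREABEDS", "VOLWBSMP")
emof <- data.frame(
  measurementTypeID = c("http://vocab.nerc.ac.uk/collection/P01/current/ENTSEX01/", NA),
  stringsAsFactors = FALSE)
check_sampling_effort(emof, effort)  # no effort term present, so a one-row warning
```

Substring matching is deliberately loose here; a stricter variant could compare only the final path segment of each URI against the P01 codes and handle collection codes such as 'Q01' separately.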
19 changes: 14 additions & 5 deletions R/smallfunctions.R
@@ -76,17 +76,26 @@ cleandataframe <- function (x, vector = TRUE) {
}
}

- # turn character NAs and spaces into an actual empty cell
- x <- x %>% mutate(across(where(is.character), ~na_if(., "NA"))) %>%
- mutate(across(where(is.character), ~na_if(., ""))) %>%
- mutate(across(where(is.character), ~na_if(., " ")))

# remove empty columns
x <- x[,colSums(is.na(x))<nrow(x)]

# Transform to character
if (vector == TRUE ){
x <- data.frame(lapply(x, as.character), stringsAsFactors=FALSE)
}
+
+ # Function to replace non-ASCII characters
+ replace_non_ascii <- function(x) {
+ iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
+ }
+
+
+ # turn character NAs and spaces into an actual empty cell
+ x <- x %>% mutate(across(where(is.character), ~na_if(., "NA"))) %>%
+ mutate(across(where(is.character), ~na_if(., "NULL"))) %>%
+ mutate(across(where(is.character), ~na_if(., ""))) %>%
+ mutate(across(where(is.character), ~na_if(., " "))) %>%
+ mutate(across(where(is.character), replace_non_ascii))
+
return (x)
}
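With this commit, `cleandataframe()` drops all-NA columns first and only afterwards turns the placeholder strings "NA", "NULL", "" and " " into real NA values, then transliterates the remaining text to ASCII. One consequence of that ordering is that a column holding only blanks now survives the empty-column step. The base-R sketch below reproduces the sequence on a toy data frame; `clean_sketch()` is a hypothetical stand-in rather than the package function, and it adds `drop = FALSE` so that a single surviving column is not simplified to a vector.

```r
# Hypothetical base-R approximation of the reordered cleandataframe() body
clean_sketch <- function(x) {
  # 1. Drop columns that are entirely NA (before any placeholder conversion)
  x <- x[, colSums(is.na(x)) < nrow(x), drop = FALSE]
  # 2. Coerce every column to character (the vector = TRUE branch)
  x <- data.frame(lapply(x, as.character), stringsAsFactors = FALSE)
  # 3. Turn placeholder strings into NA, then transliterate to ASCII
  placeholders <- c("NA", "NULL", "", " ")
  for (col in names(x)) {
    v <- x[[col]]
    v[v %in% placeholders] <- NA
    x[[col]] <- iconv(v, from = "UTF-8", to = "ASCII//TRANSLIT")
  }
  x
}

df <- data.frame(id = c("a", "b"),
                 note = c("NULL", "caf\u00e9"),
                 blank = c("", " "),
                 empty = c(NA, NA),
                 stringsAsFactors = FALSE)
out <- clean_sketch(df)
names(out)   # "id" "note" "blank"
out$blank    # NA NA: kept as a column even though every value is now NA
```

The output of `ASCII//TRANSLIT` depends on the platform's iconv implementation, so the accented value may be rendered as "cafe" on one system and differently (for example with `?` substitutions) on another.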
2 changes: 1 addition & 1 deletion README.md
@@ -105,7 +105,7 @@ The tool is based on and builds on top of the <a href="https://github.com/iobis/
| Any missing measurementTypeIDs in the eMoF table (for non occurrence related mofs) | measurementTypeID is missing | IPTreport\$dtb\$mof_issues | For interoperability reasons, the measurementTypeID field should be populated if the term exists in BODC, in case it doesn't it is good practice to request the creation of one. |
| Check if measurementTypeIDs resolve to BODC terms (for non occurrence related mofs) | measurementTypeID does not resolve | IPTreport\$dtb\$mof_issues | The vocabulary terms used under the measurementTypeID field must resolve to an existing concept, ideally to a concept from BODC |
| Check if there is a BODC vocab term for "sampling instrument" in the eMoF table | No sampling instrument present | IPTreport\$dtb\$general_issues | It is highly recommended to add sampling related information to a dataset and to use standardised vocabulary terms for interoperability reasons. |
- | Check if there is a BODC vocab term for "sampling descriptors" in the eMoF table | No sampling descriptors present: see http://vocab.nerc.ac.uk/collection/Q01/current/ | IPTreport\$dtb\$general_issues | It is highly recommended to add sampling related information to a dataset and to use standardised vocabulary terms for interoperability reasons. |
+ | Check if there is a BODC vocab term for "sampling descriptors" in the eMoF table | No sampling descriptors/effort present: see https://github.com/EMODnet/EMODnetBiocheck/blob/master/files/workingEnvironment.R#L26 | IPTreport\$dtb\$general_issues | It is highly recommended to add sampling related information to a dataset and to use standardised vocabulary terms for interoperability reasons. |
| Any missing measurementTypeIDs in the eMoF table | measurementTypeID is missing | IPTreport\$dtb\$mof_issues | For interoperability reasons, the measurementTypeID field should be populated if the term exists in BODC, in case it doesn't it is good practice to request the creation of one. |
| Check if occurrence related measurementTypeIDs resolve to BODC terms | measurementTypeID does not resolve | IPTreport\$dtb\$mof_issues | The vocabulary terms used under the measurementTypeID field must resolve to an existing concept, ideally to a concept from BODC |
| Check if measurementUnitIDs resolve to BODC terms (for non occurrence related mofs) | measurementUnitID does not resolve | IPTreport\$dtb\$mof_issues | The vocabulary terms used under the measurementUnitID field must resolve to an existing concept, ideally to a concept from BODC |
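The README rows above describe checks on the eMoF table. As one example, flagging rows whose measurementTypeID is missing could look like the base-R sketch below. `missing_type_ids()` is hypothetical; treating blank strings as missing and reporting at "warning" level are assumptions, and unlike the package it does not separate occurrence-related from other eMoF records.

```r
# Hypothetical sketch: list eMoF rows that name a measurementType but carry no
# measurementTypeID (NA, empty or whitespace-only counts as missing).
missing_type_ids <- function(eMoF) {
  n <- nrow(eMoF)
  type <- as.character(eMoF[["measurementType"]])
  id <- eMoF[["measurementTypeID"]]
  # An absent measurementTypeID column means every row lacks an ID
  id <- if (is.null(id)) rep(NA_character_, n) else as.character(id)
  has_type <- !is.na(type) & nzchar(trimws(type))
  no_id <- is.na(id) | !nzchar(trimws(id))
  bad <- which(has_type & no_id)
  if (length(bad) == 0) {
    return(NULL)
  }
  data.frame(level = "warning",  # assumed severity
             field = "measurementTypeID",
             row = bad,
             message = "measurementTypeID is missing",
             stringsAsFactors = FALSE)
}

emof <- data.frame(
  measurementType = c("life stage", "sex", "sampling area"),
  measurementTypeID = c("http://vocab.nerc.ac.uk/collection/P01/current/LSTAGE01/", "", NA),
  stringsAsFactors = FALSE)
missing_type_ids(emof)$row  # 2 3
```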
Binary file modified data/BODCbiometrics.rda
Binary file not shown.
Binary file modified data/BODCeffort.rda
Binary file not shown.
Binary file modified data/BODCinstrument.rda
Binary file not shown.
Binary file modified data/BODCnomofvalues.rda
Binary file not shown.
Binary file modified data/BODCparameters.rda
Binary file not shown.
Binary file modified data/BODCquantity.rda
Binary file not shown.
Binary file modified data/BODCunits.rda
Binary file not shown.
Binary file modified data/BODCvalues.rda
Binary file not shown.
Binary file modified data/p01todwc.rda
Binary file not shown.
25 changes: 19 additions & 6 deletions files/workingEnvironment.R
@@ -21,11 +21,15 @@ BODCbiometrics <- c ('http://vocab.nerc.ac.uk/collection/P01/current/LSTAGE01/',
'http://vocab.nerc.ac.uk/collection/P01/current/CELLVOLM/', 'http://vocab.nerc.ac.uk/collection/P01/current/OBSMAXLX/',
'http://vocab.nerc.ac.uk/collection/P01/current/OBSMINLX/', 'http://vocab.nerc.ac.uk/collection/P01/current/CTROPH01/',
'http://vocab.nerc.ac.uk/collection/P01/current/ENTSEX01/', 'http://vocab.nerc.ac.uk/collection/P01/current/SL01XX01/',
- 'http://vocab.nerc.ac.uk/collection/S06/current/S0600089/', 'http://vocab.nerc.ac.uk/collection/S06/current/S0600270/') #create a list of the measurementtypeIDs to be taken into account when checking the occurrence table for duplicates
+ 'http://vocab.nerc.ac.uk/collection/S06/current/S0600089/', 'http://vocab.nerc.ac.uk/collection/S06/current/S0600270/',
+ 'https://vocab.nerc.ac.uk/collection/P01/current/SZCLBIO1/') #create a list of the measurementtypeIDs to be taken into account when checking the occurrence table for duplicates

- BODCeffort <- c('AREABEDS', 'Q01', 'VOLWBSMP', 'LENTRACK' , 'AZDRZZ01' ,'VOLFFMXX')
+ BODCeffort <- c('Q01', 'AREABEDS', 'VOLWBSMP', 'LENTRACK', 'AZDRZZ01', 'VOLFFMXX', 'SZCLBIO1', 'PRSZSPLW', 'MSHSIZE2', 'DSSPXTMT',
+ 'DSSPPRMT', 'DSAMPA01', 'MTHHGHT1', 'MTHAREA1', 'MTHWDTH1', 'DSSPMT01', 'MSHSIZE1', 'NMSPPF01', 'DSPSSP01',
+ 'PRSZSPUP', 'MSHSIZE3', 'WASPXX01', 'COREWDTH', 'VOLXXXXX', 'VOLFFMDT', 'VOLFDGDT', 'VOLFMCXX', 'VOLSEDSM',
+ 'VOLSPOPC', 'M91BA36A') # Mostly P01s linked to https://vocab.nerc.ac.uk/collection/P02/current/SAMP/

- BODCinstrument <- c('Q0100002')
+ BODCinstrument <- c('NMSPINST')



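In the updated `BODCbiometrics` vector, the newly added SZCLBIO1 entry uses `https://` while the other entries shown use `http://`. If the occurrence-table check compares measurementTypeIDs against this vector with an exact string test such as `%in%`, a dataset citing SZCLBIO1 with `http://` (or without the trailing slash) would not match. A minimal sketch of normalising URIs before comparison, assuming the scheme and the trailing slash are the only differences that matter; `normalize_vocab_uri()` is hypothetical.

```r
# Hypothetical helper: map https:// to http:// and ensure a trailing slash so
# that equivalent NERC vocabulary URIs compare equal; NA stays NA.
normalize_vocab_uri <- function(uri) {
  uri <- trimws(as.character(uri))
  uri <- sub("^https://", "http://", uri)
  ifelse(is.na(uri) | grepl("/$", uri), uri, paste0(uri, "/"))
}

biometrics <- c("http://vocab.nerc.ac.uk/collection/P01/current/ENTSEX01/",
                "https://vocab.nerc.ac.uk/collection/P01/current/SZCLBIO1/")
ids <- c("http://vocab.nerc.ac.uk/collection/P01/current/SZCLBIO1",
         "https://vocab.nerc.ac.uk/collection/P01/current/ENTSEX01/")

ids %in% biometrics                                            # FALSE FALSE
normalize_vocab_uri(ids) %in% normalize_vocab_uri(biometrics)  # TRUE TRUE
```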
@@ -59,9 +63,18 @@ p01todwc <- data.frame(P01s = c("SNANID01", "SCNAME01", "SSAMID01", "SAMPPROT",
stringsAsFactors = FALSE)


- # use_data(BODCunits, overwrite = TRUE)
- # use_data(BODCvalues, overwrite = TRUE)
- # use_data(BODCparameters, overwrite = TRUE)


+ # Gets new params, values and units from https://gitlab.vliz.be/datac/eurobis-dmt/-/blob/master/BODC/BODC_tables_generator.py
+ BODCunits <- read.csv("~/EMODnetBiocheck/files/BODCunits.csv") %>% cleandataframe()
+ BODCparameters <- read.csv("~/EMODnetBiocheck/files/BODCparameters.csv") %>% cleandataframe()
+ BODCvalues <- read.csv("~/EMODnetBiocheck/files/BODCvalues.csv") %>% cleandataframe()



+ use_data(BODCunits, overwrite = TRUE)
+ use_data(BODCvalues, overwrite = TRUE)
+ use_data(BODCparameters, overwrite = TRUE)
+ use_data(BODCnomofvalues, overwrite = TRUE)
+ use_data(BODCbiometrics, overwrite = TRUE)
+ use_data(BODCeffort, overwrite = TRUE)
2 changes: 1 addition & 1 deletion man/BODCbiometrics.Rd


2 changes: 1 addition & 1 deletion man/BODCeffort.Rd


2 changes: 1 addition & 1 deletion man/BODCquantity.Rd

