- Kobo surveys are now retrieved through the new v2 API. The function `retrieve_survey_data()` has been replaced by `get_kobo_data()`, which has been updated to retrieve data from the new API. The function `retrieve_survey_metadata()` has been deprecated.
- Align variables to the Aquatic Food Ontology (AQFO)
- Estimations are now weighted by the number of records in each monthly bin. This improves the accuracy of the estimates and reduces the effect of outliers.
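A minimal sketch of the weighting idea, using hypothetical column names (`month`, `catch_kg`) rather than the pipeline's actual schema:

```r
library(dplyr)

# Toy landings data: one row per surveyed trip
landings <- tibble(
  month    = c("2021-01", "2021-01", "2021-01", "2021-02"),
  catch_kg = c(12, 15, 9, 40)
)

# Summarise each monthly bin and record how many observations it holds
monthly <- landings %>%
  group_by(month) %>%
  summarise(mean_catch = mean(catch_kg), n_records = n(), .groups = "drop")

# Months with few records (and their outliers) contribute less
overall <- weighted.mean(monthly$mean_catch, w = monthly$n_records)
```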
- Integrated the ingestion of the new survey form "Peskas 2". The new form was deployed to be easier for enumerators to manage, and it includes additional questions on fish handling and trading.
- The validation step is now implemented on Google Sheets using the `googlesheets4` package.
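For illustration, a round trip with `googlesheets4` might look like the sketch below; the sheet identifier, tab name, and `flags_df` contents are placeholders, not the pipeline's actual objects:

```r
library(googlesheets4)

flags_df <- data.frame(submission_id = "a1", flag = "negative_revenue")

gs4_auth()  # interactive, or point to a service-account token

# `validation_sheet_id` is a placeholder for the sheet's ID or URL
sheet_write(flags_df, ss = validation_sheet_id, sheet = "flags")

# Later, read the reviewed flags back
reviewed <- read_sheet(validation_sheet_id, sheet = "flags")
```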
- Improved national and municipal estimates by combining the `Amelia` and `mice` packages for imputation of missing values and outliers (see the sketch below).
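A rough sketch of combining the two packages for multiple imputation; the toy data and model settings are placeholders, not the pipeline's actual specification:

```r
library(Amelia)
library(mice)

set.seed(1)
estimates_df <- data.frame(
  municipality = rep(c("Dili", "Baucau"), each = 10),
  catch_kg     = c(rnorm(18, 100, 20), NA, NA),
  revenue_usd  = rnorm(20, 250, 50)
)

# Multiple imputation assuming multivariate normality (Amelia)
am <- amelia(estimates_df, m = 5, idvars = "municipality")

# Predictive mean matching as an alternative (mice)
mi <- mice(estimates_df, m = 5, method = "pmm", printFlag = FALSE)
imputed <- complete(mi, action = 1)
```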
- Integrated price per kg into the export data.
- Improved revenue outlier identification based on empirical information.
- Integrated taxa selection and filtering into the tracks file that feeds the leaflet map in the portal.
- Added a new feature to `ingest_pds_map()`. The function processes and uploads a data frame containing the number of trips, CPE (catch per unit effort), and RPE (revenue per unit effort) split by grid cell, used to produce leaflet maps in the web portal.
- Improved the estimation of some fish groups' catches by calculating the weight per individual on a case-by-case basis.
- Fixed an error in the cleaning of legacy landings: the columns indicating the number of individuals and the fish length for catches > 60 cm were swapped.
- Added an option to produce the Timor map filtered by fishing trips.
- Improved the validation step by flagging observations with positive revenue but zero or missing individuals, and vice versa.
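A sketch of that flagging logic, with illustrative column names:

```r
library(dplyr)

landings <- tibble(
  revenue       = c(10, 0, 25, NA),
  n_individuals = c(0, 3, 5, 2)
)

flagged <- landings %>%
  mutate(
    # Revenue without individuals, or individuals without revenue
    inconsistent =
      (revenue > 0 & (is.na(n_individuals) | n_individuals == 0)) |
      (n_individuals > 0 & (is.na(revenue) | revenue == 0))
  )
```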
- Used a log model to identify abnormal weight-revenue relationships via Cook's distance.
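A base-R sketch of the approach; the simulated data and the common 4/n cut-off are illustrative choices, not the pipeline's actual thresholds:

```r
set.seed(42)
landings <- data.frame(weight_kg = runif(100, 1, 50))
landings$revenue <- 5 * landings$weight_kg * exp(rnorm(100, 0, 0.2))

# Fit a log-log model of revenue on catch weight
fit <- lm(log(revenue) ~ log(weight_kg), data = landings)

# Observations with outsized influence on the fit are suspect
d <- cooks.distance(fit)
suspicious <- landings[d > 4 / length(d), ]  # rule-of-thumb cut-off
```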
- Replaced NA catch codes with catch code "0" and ensured these observations have no positive individuals or revenue.
- Used total length for the weight calculation of MOO in all the landings (weights calculated with fork length (FL) seemed quite unrealistic).
- Used the 95th percentile instead of the median to summarise weight parameters for catch types, as it returns more realistic weight estimates for single individuals.
- Split landings preprocessing into two jobs to avoid running out of memory (Docker exit code 137).
- Added a new function, `ingest_pds_matched_trips()`, to ingest matched PDS tracks and survey landings as a monthly zip archive.
- Added nutritional values for each catch via the function `get_nutrients_table()`. The function links to the repository https://github.com/mamacneil/NutrientFishbase and joins the estimated nutrient values with the FishBase species data.
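A hypothetical sketch of that join; the tables, values, and key column below are stand-ins for illustration only:

```r
library(dplyr)

# Placeholder stand-ins for the NutrientFishbase estimates and the
# FishBase-derived species table used by the pipeline
nutrients <- tibble(
  species    = c("Lutjanus gibbus", "Katsuwonus pelamis"),
  calcium_mg = c(35.2, 12.1)  # illustrative values
)
species_tbl <- tibble(
  species = c("Lutjanus gibbus", "Katsuwonus pelamis"),
  family  = c("Lutjanidae", "Scombridae")
)

nutrient_table <- left_join(nutrients, species_tbl, by = "species")
```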
- Added the folder `report/` containing the scripts used to generate a PDF report of Peskas. The report is downloadable from the Peskas portal.
- Added a series of functions (`get_sync_tracks()`, `get_full_tracks()`, `get_full_trips()`) to retrieve the complete file of PDS tracks.
- Generate and retrieve the PDS track map with `ingest_pds_map()` and `get_tracks_map()`.
- The Dockerfile now runs `rocker/r-ver:4.1.1` instead of `rocker/geospatial:4.0.3`.
- Fixed the bug when calling `ingest_rfish_table()` in the main pipeline.
- Added a series of functions (`get_catch_types()`, `get_fish_length()`, `retrieve_lengths()`) to retrieve morphometric conversion factors from catch type names in the metadata tables.
- Added a new metadata table in Airtable (`morphometric_table`) containing length-weight and length-length conversion factors.
- Added `join_weights()`, which integrates morphometric data with the merged landings (see the sketch below).
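Length-weight conversion typically follows the standard relationship W = a·L^b; a minimal sketch of applying such factors, with illustrative codes and coefficients:

```r
library(dplyr)

morphometric_table <- tibble(
  catch_type = c("ECN", "MOO"),  # illustrative catch codes
  a = c(0.012, 0.020),
  b = c(3.02, 2.95)
)

landings <- tibble(catch_type = c("ECN", "MOO"), length_cm = c(25, 40))

# Standard length-weight relationship: W = a * L^b
landings_weights <- landings %>%
  left_join(morphometric_table, by = "catch_type") %>%
  mutate(weight_g = a * length_cm^b)
```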
- We test validated data to ensure its integrity
- Added `merge_trips()`, which integrates data from the landings and tracking together.
- Added `format_public_data()` to format and export data to be used for analytics.
- Added functions to process and validate PDS trips, namely `preprocess_pds_trips()` and `validate_pds_trips()`. These functions make sure the data types are appropriate, check for duplicate trips, and perform basic checks on trip duration and distance.
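A sketch of those basic checks; the column names and bounds are illustrative, not the pipeline's actual thresholds:

```r
library(dplyr)

pds_trips <- tibble(
  trip_id      = c(1, 1, 2),
  duration_hrs = c(5, 5, 90),
  distance_km  = c(12, 12, 30)
)

pds_trips_checked <- pds_trips %>%
  distinct(trip_id, .keep_all = TRUE) %>%  # drop duplicated trips
  mutate(
    duration_ok = duration_hrs > 0 & duration_hrs < 72,  # illustrative bounds
    distance_ok = distance_km > 0 & distance_km < 500
  )
```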
- Added `retrieve_pds_trips_data()`, `retrieve_pds_trips()`, `retrieve_pds_tracks_data()`, and `retrieve_pds_tracks()` to download trips and tracks from the Pelagic Data Systems API (see the sketch below).
- Added `get_pds_res()` to convert Pelagic Data Systems API responses to data frames and merge the trips and tracks data into a single file.
- Added `ingest_pds_trips()` and `ingest_pds_tracks()` to upload Pelagic Data Systems data to the cloud.
- Created a new bucket (`pds-timor-dev`) to store PDS data.
- `cloud_object_name()` returns an empty vector when the bucket is empty.
- Fixed a bug in the survey retrieval process by integrating a server response status check in `retrieve_survey_data()`.
- Added `merge_landings()` to merge and upload pre-processed recent and legacy landings data.
- Simplified jobs in the pipeline workflow
- Fixed workflow so that runs that are supposed to run in production actually do so
- Added `clean_catches()`, `coalist()`, and `clean_legacy_landings()` to restructure legacy landings to match the recent landings.
- Added `preprocess_legacy_landings()` to clean and ingest the preprocessed legacy data.
- Added `validate_landings()` and `ingest_validation_tables()` to get validation data, check the landings, and upload flags to Airtable.
- Added `air_tibble_to_records()` and `air_upload_records()` to create and update records in Airtable (sketched below).
- Added logic to validate IMEIs in the landing data.
- Added a script, `inst/airtable/edit-submission-link.js`, which can be used in Airtable to generate a link to the editable submission form.
- Deactivated the check of uniqueness of IMEI per boat in `pt_`.
- Landings pre-processing now renames a few columns in the data: specifically `submission_id` (used to be `_id`) and `landing_date` (used to be `date`).
- Added `ingest_legacy_landings()` to retrieve the legacy data (SFF landings).
- Added `air_get_records()` and `air_records_to_tibble()` to retrieve and process records from Airtable.
- Added `pt_validate_boats()`, `pt_validate_devices()`, and `pt_validate_vms_installs()` to perform basic data validation on the metadata tables.
- `cloud_object_name()` can now also match files by exact name and not just by prefix.
- `ingest_metadata_tables()` and `preprocess_metadata_tables()` now use Airtable instead of Google Sheets.
- Added `ingest_metadata_tables()` to ingest data about boats, species, municipalities, etc.
- Added `preprocess_metadata_tables()` to preprocess the data from the metadata ingestion.
- Added `pt_get_devices_table()` and `pt_validate_flags()` as helper functions for the metadata preprocessing.
- We use a single function to read the config file and load the environment parameters.
- Added a skeleton pipeline to provide guidance for future development. This is now visible in the README.
- Renamed `ingest_timor_landings()` to `ingest_landings()` for brevity and because all functions relate to Timor anyway.
- Renamed the jobs in the GitHub workflow after the functions they call.
- Improved the documentation of the package, particularly the README and the function reference.
- We now skip RCurl in Windows tests.
- The preprocessing of East Timor landings is implemented in `preprocess_landings()`.
- Added `pt_nest_attachments()` to group all attachment columns into a nested column containing data frames (see the sketch below).
- Added `pt_nest_species()` to group all species columns into a nested column containing data frames.
- Added `cloud_object_name()` as a complement to `add_version()` to return the latest or a specified version of an object in a storage location.
- Added `download_cloud_file()` to download files from cloud storage providers.
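A minimal sketch of the nesting idea with `tidyr`; the column names are illustrative:

```r
library(dplyr)
library(tidyr)

landings <- tibble(
  submission_id  = c("a", "b"),
  attachment_id  = c(1, 2),
  attachment_url = c("u1", "u2")
)

# Collapse all attachment_* columns into one list-column of data frames
nested <- nest(landings, attachments = starts_with("attachment_"))
```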
- Now using `cloud_storage_authenticate()` to internally authenticate to cloud storage instead of authenticating separately in each cloud function. This simplifies authentication and ensures authentication is not attempted when credentials have already been validated.
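A sketch of the skip-if-authenticated pattern, assuming GCS via `googleCloudStorageR`; the function and environment-variable names are illustrative, not the package's actual implementation:

```r
cloud_storage_authenticate_sketch <- function() {
  # Do nothing if a token has already been validated in this session
  if (!googleAuthR::gar_has_token()) {
    googleCloudStorageR::gcs_auth(Sys.getenv("GCS_AUTH_FILE"))
  }
}
```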
- `download_survey_data()`, `download_survey_metadata()`, and `download_survey()` have been renamed to `retrieve_survey_data()`, `retrieve_survey_metadata()`, and `retrieve_survey()`. This avoids confusion with planned functions that download data from cloud locations.
- The suffixes `raw` and `metadata` that are appended to the prefix when retrieving survey information are now separated using "_" rather than "-", to more easily distinguish between the pieces of information encoded in the file name.
- The prefix name of surveys is no longer hard-coded and can be specified in the config file (`file_prefix` field).
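For example, the prefix could be read with the `config` package; the file name and nesting below are assumptions, only the `file_prefix` field name comes from the changelog:

```r
# Write a tiny example config (hypothetical structure)
writeLines(c(
  "default:",
  "  surveys:",
  "    file_prefix: timor-landings"
), "conf.yml")

conf <- config::get(file = "conf.yml")
prefix <- conf$surveys$file_prefix  # "timor-landings"
```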
Adds infrastructure to download survey data and upload it to cloud storage providers and implements the ingestion of East Timor landings.
- The ingestion of East Timor landings is implemented in `ingest_timor_landings()`.
- The functions `download_survey_data()` and `download_survey_metadata()` download data and metadata for an electronic survey hosted by kobo, kobohr, or ona. `download_survey()` can be used as a wrapper to download data and metadata in a single call.
- `upload_cloud_file()` can be used to upload a set of files to a cloud storage bucket. Currently only Google Cloud Storage (GCS) is supported.
- `add_version()` is a utility function that appends date-time and sha information to a string and is used to version file names (see the sketch below).
- `get_host_url()` is a utility function that gets the host URL of an electronic survey provider API.
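A sketch of the versioning idea; the separator, timestamp format, and sha source are assumptions rather than `add_version()`'s actual implementation:

```r
add_version_sketch <- function(name, extension = "") {
  timestamp <- format(Sys.time(), "%Y%m%dT%H%M%S", tz = "UTC")
  sha <- substr(Sys.getenv("GITHUB_SHA", "local"), 1, 7)  # short commit sha in CI
  paste0(name, "__", timestamp, "__", sha, extension)
}

add_version_sketch("timor-landings_raw", ".csv")
#> e.g. "timor-landings_raw__20211015T101500__ab12cd3.csv"
```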
The data pipeline is implemented and run in GitHub Actions on a schedule.