diff --git a/globalprep/le/v2023/README.html b/globalprep/le/v2023/README.html new file mode 100644 index 00000000..90bbb207 --- /dev/null +++ b/globalprep/le/v2023/README.html @@ -0,0 +1,676 @@
+In 2023 we cleaned and prepped the best available data for all
+sectors and components included in this goal. When newly updated data
+wasn’t available, we re-downloaded and cleaned the previous data source,
+because the old versions of the raw data prepped by OHI were stored on a
+server no longer in use and are no longer accessible. All of the cleaned
+files are now saved in the folder
+~/ohiprep_v2023/globalprep/le/v2023/int
in the format
+sector_component.csv.
More detailed methods and explanations are available in the
+livelihoods_economies_dataprep.RMD saved in
+~/ohiprep_v2023/globalprep/le/v2023
. Included below is a
+summary of what tasks were completed in the methods update and what
+still needs to be done.
For all datasets, except tourism revenue, the current format has one +value for each country and year included in the dataset. Tourism uses a +pre-cleaned version of the revenue data, so countries have already been +converted to regions. We did not do any gapfilling to fill in countries +missing from the cleaned data sets, so this will likely need to be done +for most of the included data.
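As a sketch of what that gapfilling step might look like, tidyr can expand a cleaned file to the full country-by-year grid so that missing combinations become explicit NA rows ready for a later gapfilling pass. The column names `country`, `year`, and `value` are assumed to match the cleaned `sector_component.csv` files; the data below are invented for illustration:

```r
library(tidyr)

# invented cleaned data: one value per country-year, with Chile missing 2020
clean_df <- data.frame(
  country = c("Chile", "Peru", "Peru"),
  year    = c(2019, 2019, 2020),
  value   = c(100, 250, 260)
)

# expand to the full country x year grid; missing combinations become
# explicit NA rows that a later gapfilling step can fill
full_df <- complete(clean_df, country, year)
```

The resulting NA rows could then be filled with, for example, UN georegion averages, as is done elsewhere in the global assessment.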
+Acronyms for sectors used in the original output layers are kept for ease of incorporation into the finalized OHI model. A new sector, FP, was added in this analysis and will need to be incorporated into the model.
Sector | Acronym
---|---
Fishing | cf
Mariculture | mar
Tourism | tour
Ports and Harbors | ph
Ship and Boat Building | sb
Aquarium Fishing | aqf
Transportation and Shipping | tran
Marine Mammal Watching | mmw
Wave and Renewable Energy | og
The value in each of these datasets is the total estimated revenue per country in US dollars.
+Data:
+FAO +Capture Production Database: This database contains capture +production statistics by country or territory, species item, and FAO +Major Fishing Area.
Type: | Quantity
Unit: | Tonnes
Unit: | Number of animals (removed)
Methods Description:
A disaggregated version of the OECD data can be found in OECD’s Employment in fisheries, aquaculture and processing Database if needed. We did not use the disaggregated data when preparing this data, as disaggregated numbers were not available for the FAO data used for gapfilling.
+We used OECD data on the variable “People employed in aquaculture sector (marine and inland), total by occupation rate, thousands.”
Because this data includes both marine and inland values, to estimate the proportion of total aquaculture jobs attributable to marine and brackish aquaculture we used country-specific proportions of marine and brackish aquaculture revenue (relative to total revenue) calculated from the FAO aquaculture production value data set.
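The proportional attribution described above can be sketched as follows. The numbers and column names here are invented for illustration only; the real prep derives the proportions from the FAO aquaculture production value data set:

```r
library(dplyr)

# hypothetical FAO aquaculture value by environment (invented numbers)
fao_value <- data.frame(
  country     = c("A", "A", "A", "B", "B"),
  environment = c("Marine", "Brackishwater", "Freshwater", "Marine", "Freshwater"),
  value       = c(60, 20, 20, 10, 90)
)

# country-specific proportion of revenue from marine + brackish aquaculture
marine_prop <- fao_value %>%
  group_by(country) %>%
  summarize(prop_marine = sum(value[environment %in% c("Marine", "Brackishwater")]) / sum(value))

# hypothetical OECD total aquaculture jobs (marine + inland, invented numbers)
oecd_jobs <- data.frame(country = c("A", "B"), total_jobs = c(50, 40))

# attribute jobs to marine/brackish aquaculture by the revenue proportion
marine_jobs <- oecd_jobs %>%
  left_join(marine_prop, by = "country") %>%
  mutate(marine_jobs = total_jobs * prop_marine)
```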
See the note in the fishing section about data disaggregation for full-time, part-time, occasional, and status unspecified.
Data for the fishery processing sector is from the OECD Sustainable Economies database. We use the variable “People employed in fishery processing sector (marine and inland), total by occupation rate, thousands.”
Due to timing constraints we did not determine a method to subset this data to only marine-related fishery processing. This will need to be added to the cleaning script.
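One possible approach for that subset, mirroring the proportional treatment used for aquaculture, would be to scale total processing employment by each country's marine share of capture production. This is only a sketch of a candidate method, not the decided one, and all numbers and column names are invented:

```r
library(dplyr)

# hypothetical FAO capture production by area type (invented tonnages)
capture <- data.frame(
  country   = c("A", "A", "B", "B"),
  area_type = c("Marine areas", "Inland waters", "Marine areas", "Inland waters"),
  tonnes    = c(900, 100, 300, 700)
)

# marine share of each country's total capture production
marine_share <- capture %>%
  group_by(country) %>%
  summarize(prop_marine = sum(tonnes[area_type == "Marine areas"]) / sum(tonnes))

# hypothetical OECD processing employment (marine + inland, invented numbers)
processing <- data.frame(country = c("A", "B"), jobs = c(20, 10))

# scale total processing jobs by the marine share
processing_marine <- processing %>%
  left_join(marine_share, by = "country") %>%
  mutate(marine_jobs = jobs * prop_marine)
```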
+
+
This script was created in 2023 to progress towards updating the +livelihoods and economies goal.
It is used to read in and wrangle all available livelihoods and economies data as of August 2023. When newer updated data is not available, the original data used in the livelihoods and economies goal in 2013 is included. For more details on what data is available, please refer to the spreadsheet linked in the livelihoods and economies issue.
+# load libraries, set directories
+library(ohicore) #devtools::install_github('ohi-science/ohicore@dev')
+library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
+## ✔ dplyr 1.1.3 ✔ readr 2.1.4
+## ✔ forcats 1.0.0 ✔ stringr 1.5.0
+## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
+## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
+## ✔ purrr 1.0.2
+## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
+## ✖ dplyr::filter() masks stats::filter()
+## ✖ dplyr::lag() masks stats::lag()
+## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
+
+##
+## Attaching package: 'plotly'
+##
+## The following object is masked from 'package:ggplot2':
+##
+## last_plot
+##
+## The following object is masked from 'package:stats':
+##
+## filter
+##
+## The following object is masked from 'package:graphics':
+##
+## layout
+
+## here() starts at /home/arobinson/OHI_repositories/ohiprep_v2023
+
+##
+## Attaching package: 'janitor'
+##
+## The following objects are masked from 'package:stats':
+##
+## chisq.test, fisher.test
+
+## Using poppler version 0.86.1
+
+## This file makes it easier to process data for the OHI global assessment
+## by creating the following objects:
+##
+## * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
+## * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
+## * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
+## * ohi_rasters() = function to load two rasters: global eez regions and ocean region
+## * region_data() = function to load 2 dataframes describing global regions
+## * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
+## * low_pop() = function to load dataframe of regions with low and no human population
+## * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data
+
+Dynamic Data Sources:
+FAO Capture Production Quantity
+The FAO capture production data was downloaded via the fishstat +query portal. This data is updated annually and can be downloaded using +the following instructions.
Navigate to the online query portal for FAO Global Capture Production Quantity. Deselect all pre-selected years. Drag these fields into selected rows: ASFIS species Name En, FAO major fishing area Name En, ASFIS species ISSCAAP group Name En, ASFIS species Family scientific name, FAO major fishing areas, Inland/Marine areas Name En. Click on show data and confirm that data is present for 1950 through two years prior to the current year. Click download and select yes to include null values.
Static Data Sources:
Ex-vessel price database
+This database contains ex-vessel price per tonne. It was downloaded from the Environmental Markets Lab on request. It is possible that this database may be updated in future years.
Recommended citation: Melnychuk, M. C., Clavelle, T., Owashi, B., and Strauss, K. 2016. Reconstruction of global ex-vessel prices of fished species. ICES Journal of Marine Science. doi:10.1093/icesjms/fsw169.
# OECD Data
+oecd_raw <- read_csv(file.path(dir_M, "/git-annex/globalprep/_raw_data/OECD/d2023/ocean_economy.csv")) %>%
+ select(Country, Year, VARIABLE, Variable, Unit, Value, PowerCode)
## Rows: 124736 Columns: 15
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (9): COUNTRY, Country, VARIABLE, Variable, Unit Code, Unit, PowerCode, F...
+## dbl (4): YEAR, Year, PowerCode Code, Value
+## lgl (2): Reference Period Code, Reference Period
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+#fao commodities data
+#used for aquarium fish
+commodities_value <- read_csv(file.path(dir_M, "git-annex/globalprep/_raw_data/FAO_commodities/d2023/FAO_raw_commodities_value_1976_2021.csv")) %>%
+ janitor::clean_names()
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
+## e.g.:
+## dat <- vroom(...)
+## problems(dat)
+## Rows: 106510 Columns: 51
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (51): Reporting country (Name), Commodity (Name), Trade flow (Name), Uni...
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+#fao capture production data, used for fishing revenue
+fao_capture <- read_csv(file.path(dir_M, "git-annex/globalprep/_raw_data/FAO_capture/d2023/capture_quantity_2023_online_query.csv"))
## Rows: 27854 Columns: 151
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (79): Country Name En, ASFIS species Name En, FAO major fishing area Nam...
+## dbl (72): 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, ...
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+#ex-vessel prices, used for fishing revenue
+exvessel_prices <- read_csv(file.path(dir_M, "git-annex/globalprep/_raw_data/ex-vessel-price-database-updated/price-db-results/exvessel_price_database_1976_2019.csv"))
## New names:
+## Rows: 101921 Columns: 8
+## ── Column specification
+## ──────────────────────────────────────────────────────── Delimiter: "," chr
+## (5): ASFIS_species, scientific_name, pooled_commodity, ISSCAAP_group, IS... dbl
+## (3): ...1, Year, exvessel
+## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
+## Specify the column types or set `show_col_types = FALSE` to quiet this message.
+## • `` -> `...1`
+#FAO aquaculture value
+aquaculture_value <- read_csv(file.path(dir_M, "git-annex/globalprep/_raw_data/FAO_mariculture/d2023/FAO_GlobalAquacultureProduction_Value_1950_2021.csv"))
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
+## e.g.:
+## dat <- vroom(...)
+## problems(dat)
+## Rows: 3809 Columns: 44
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (44): Country (Name), ASFIS species (Name), FAO major fishing area (Name...
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+#fao yearbook pdf tables
+fao_pdf <- pdf_text(file.path(dir_M, "/git-annex/globalprep/_raw_data/FAO/d2023/fao_fishers_table.pdf"))
+
+#whale watching pdf
+whale_pdf <- pdf_text(file.path(dir_M, "/git-annex/globalprep/_raw_data/IFAW_MarineMammalWatching/whale_watching_worldwide.pdf"))
+
+#oww
+oww <- read_csv(file.path(dir_M,"/git-annex/globalprep/_raw_data/OWW/d2023/oww3.csv"))
## Rows: 206449 Columns: 63
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (11): y1, country_code, country_name, y3, isic88, isco88, curr_current, ...
+## dbl (52): y0, y4, hw1wl_current, hw2wl_current, hw3wl_current, hw4wl_current...
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+
+All revenue values are reported in US dollars.
+#get the ocean energy revenue
+offshore <- oecd_raw %>%
+ filter(VARIABLE == "RD_TOTAL_OCEAN_OFFSHORE") %>%
+ select(-VARIABLE) %>%
+ clean_names() %>%
+ mutate(sector = "ocean energy",
+ unit = "USD (1)",
+ component = "ocean and offshore energy",
+ data_source = "oecd sustainable ocean economy",
+ value = (value * 1000000)
+ ) %>%
+ select(-c(power_code)) %>%
+ filter(!country %in% c("OECD - Total",
+ "European Union – 27 countries (from 01/02/2020)")) %>%
+ select(country, year, value, unit, component, sector, data_source)
+
+
+write_csv(offshore, here(data_path, "int/offshore_energy_revenue.csv"))
The FAO Global Fish Trade Data (Value) is used to estimate revenue from aquarium fishing. This is the original data source used, but it has been updated since the livelihoods and economies goal was originally calculated.
+From the methods:
To approximate revenue from aquarium fishing, we used export data from the FAO Global Commodities database for ‘Ornamental fish’ for all available years.
We used data from two of the four subcategories listed, excluding the subcategory ‘Fish for culture including ova, fingerlings, etc.’ because it is not specific to ornamental fish, and the subcategory ‘Ornamental freshwater fish’ because it is not from marine systems.
#clean up data for ornamental fishing revenue
+ornamental_value <- commodities_value %>%
+ filter(str_detect(commodity_name, "Ornamental")) %>%
+ rename(country = reporting_country_name,
+ commodity = commodity_name,
+ trade = trade_flow_name) %>%
+ pivot_longer(cols = -c(country, commodity, trade, unit, unit_name),
+ names_to = "year", values_to = "value") %>%
+ mutate(year = str_remove(year, "x")) %>%
+ filter(trade == "Exports" & commodity != "Ornamental freshwater fish") %>%
+ fao_clean_data_new() %>%
+ mutate(value = (value * 1000), #convert to single dollar
+ unit = "USD (1)")
+
+#sum for each country
+ornamental_revenue <-
+ ornamental_value %>%
+ group_by(country, year) %>%
+ summarize(value = sum(value, na.rm = TRUE), unit = first(unit)) %>%
+ mutate(component = "revenue",
+ sector = "aquarium fishing",
+ data_source = "FAO Global Trade") %>%
+ select(country, year, value, unit, component, sector, data_source)
## `summarise()` has grouped output by 'country'. You can override using the
+## `.groups` argument.
+
+FAO capture data is combined with ex-vessel price data to estimate +total revenue from fishing for each country.
+Here we do some basic cleanup of the global capture production +database.
+fao_latest_data_year <- 2021 #update to the last year available in FAO data
+
+fao_capture_clean <- fao_capture %>%
+ dplyr::rename(country = "Country Name En",
+ asfis_species = "ASFIS species Name En",
+ area = "FAO major fishing area Name En",
+ area_type = "Inland/Marine areas Name En",
+ isscaap_group = "ISSCAAP group Name En",
+ family_scientific = "Family Scientific name")
+
+fao_capture_clean <- fao_capture_clean %>%
+ fao_online_portal_clean(last_data_year = fao_latest_data_year) %>% #function to clean fao data
+ mutate(FAO_name = ifelse(!is.na(asfis_species), asfis_species, family_scientific)) %>%
+ filter(area_type == "Marine areas") %>% #we only want marine capture
+ mutate(year = as.numeric(year)) %>%
+ filter(year >= 1976 & year <= 2019) %>% #filter to years in the price database
+ filter(!is.na(value) & value > 0) # drop NAs and zeros; we only want records with a value for tonnes
Next we’ll clean up and gapfill the ex-vessel price data, which has ex-vessel price in USD per metric tonne.
+exvessel_prices_clean <- exvessel_prices %>%
+ clean_names() %>%
+ mutate(year = as.numeric(year)) %>%
+ group_by(asfis_species, scientific_name, pooled_commodity, isscaap_group, isscaap_division, year) %>%
+ summarize(exvessel = mean(exvessel, na.rm = TRUE)) %>%
+ filter(!(is.na(year) & is.na(exvessel))) %>%
+ #there are some duplicate prices for a species- average when this is the case
+ dplyr::group_by(asfis_species) %>%
+ #counts the numbers of non-missing values for each country (logical TRUEs regarded as one)
+ dplyr::mutate(value_num = sum(!is.na(exvessel))) %>%
+ filter(value_num > 0) #the minimum we have is 8 years
## `summarise()` has grouped output by 'asfis_species', 'scientific_name',
+## 'pooled_commodity', 'isscaap_group', 'isscaap_division'. You can override using
+## the `.groups` argument.
+#look at the data to see how price and year are related overall
+model <- lm(exvessel~year, data = exvessel_prices_clean)
+summary(model)
##
+## Call:
+## lm(formula = exvessel ~ year, data = exvessel_prices_clean)
+##
+## Residuals:
+## Min 1Q Median 3Q Max
+## -2473 -1093 -505 307 34309
+##
+## Coefficients:
+## Estimate Std. Error t value Pr(>|t|)
+## (Intercept) -7.469e+04 1.206e+03 -61.94 <2e-16 ***
+## year 3.832e+01 6.035e-01 63.49 <2e-16 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## Residual standard error: 2099 on 76174 degrees of freedom
+## (2144 observations deleted due to missingness)
+## Multiple R-squared: 0.05026, Adjusted R-squared: 0.05025
+## F-statistic: 4031 on 1 and 76174 DF, p-value: < 2.2e-16
+#very significant though does not explain a high % of the variation
+
+#look at linear model by individual species
+
+#define all of the unique species
+species <- unique(exvessel_prices_clean$asfis_species)
+
+#take a look at it at the individual species level
+model_list <- list()
+
+#loop through all of the unique species and run the model, store the results in a list
+for (i in seq_along(species)) {
+
+ species_current <- species[i]
+
+ new <- exvessel_prices_clean %>%
+ filter(asfis_species == species_current)
+
+ model <- lm(exvessel~year, data = new)
+ summary <-summary(model)
+ coefficients <- summary$coefficients %>%
+ as.data.frame()
+ model_list[[i]] <-coefficients
+}
+
+model_test <- bind_rows(model_list)
+
+# Filter rows based on row names containing "year", only want significance of year
+filtered_data <- model_test[grep("year", rownames(model_test)), ] %>%
+ clean_names()
+not_sig <- filtered_data %>% filter(pr_t > 0.05)
+
+#roughly 85% of these have a significant relationship, appears to be a reasonable gapfilling method
+
+#use lm to predict the value for years which are missing data
+#based on code from ao_need data prep
+price_model <- function(df) {lm(exvessel~year, data = df)}
+
+exvessel_prices_gapfilled <- exvessel_prices_clean %>%
+ dplyr::group_by(asfis_species) %>%
+ tidyr::nest() %>%
+ dplyr::mutate(
+ ### Apply the model to all country groupings
+ model = purrr::map(data, price_model),
+ #Use the trained model to get predicted values
+ predictions = purrr::map2(data, model, add_predictions)) %>%
+ tidyr::unnest(cols = c(predictions)) %>%
+ dplyr::select(-data, -model, prediction = pred) %>%
+ dplyr::mutate(
+ gapfilled = dplyr::case_when(is.na(exvessel) | value_num == 1 ~ 1, T ~ 0),
+ exvessel = dplyr::case_when(is.na(exvessel) ~ prediction, T ~ exvessel),
+ method = dplyr::case_when(
+ value_num == 1 ~ "gapfilled using one year of data",
+ gapfilled == 1 & value_num > 1 ~ paste0("lm based on N years data: ", value_num),
+ T ~ as.character(NA))) %>%
+ dplyr::ungroup() %>%
+ filter(exvessel > 0) #remove non-positive prices; these don't make sense and never occur after 1998, so they would not be used to calculate any scores.
fao_capture_price <- fao_capture_clean %>%
+ left_join(exvessel_prices_gapfilled, by = c("asfis_species", "year", "isscaap_group")) %>%
+ select(-prediction)
+
+#8% of observations still have NAs
+summary(fao_capture_price)
## country asfis_species area family_scientific
+## Length:394382 Length:394382 Length:394382 Length:394382
+## Class :character Class :character Class :character Class :character
+## Mode :character Mode :character Mode :character Mode :character
+##
+##
+##
+##
+## isscaap_group area_type Unit Name year
+## Length:394382 Length:394382 Length:394382 Min. :1976
+## Class :character Class :character Class :character 1st Qu.:1990
+## Mode :character Mode :character Mode :character Median :2002
+## Mean :2001
+## 3rd Qu.:2011
+## Max. :2019
+##
+## value FAO_name scientific_name pooled_commodity
+## Min. : 0 Length:394382 Length:394382 Length:394382
+## 1st Qu.: 18 Class :character Class :character Class :character
+## Median : 201 Mode :character Mode :character Mode :character
+## Mean : 8783
+## 3rd Qu.: 1741
+## Max. :9800223
+##
+## isscaap_division exvessel value_num gapfilled
+## Length:394382 Min. : 14.16 Min. : 8.00 Min. :0.00
+## Class :character 1st Qu.: 758.74 1st Qu.:44.00 1st Qu.:0.00
+## Mode :character Median : 1433.05 Median :44.00 Median :0.00
+## Mean : 2018.30 Mean :42.66 Mean :0.02
+## 3rd Qu.: 2520.06 3rd Qu.:44.00 3rd Qu.:0.00
+## Max. :36798.14 Max. :44.00 Max. :1.00
+## NA's :33494 NA's :33494 NA's :33494
+## method
+## Length:394382
+## Class :character
+## Mode :character
+##
+##
+##
+##
+#if still NA fill with average price for species group for that year
+
+#find the average price for each group/year
+group_average_price_year <- exvessel_prices_gapfilled %>%
+ group_by(isscaap_group,year) %>%
+ summarize(mean_group_price = mean(exvessel))
## `summarise()` has grouped output by 'isscaap_group'. You can override using the
+## `.groups` argument.
+fao_capture_price_final <-
+ fao_capture_price %>%
+ left_join(group_average_price_year, by = c("isscaap_group", "year")) %>%
+ mutate(final_price = ifelse(!is.na(exvessel), exvessel, mean_group_price)) %>% mutate(gapfilled = ifelse((is.na(exvessel) & !is.na(final_price)), 1, gapfilled)) %>%
+ mutate(method = ifelse((is.na(exvessel) & !is.na(final_price)), "filled based on isscaap group average", method))
+
+summary(fao_capture_price_final)
## country asfis_species area family_scientific
+## Length:394382 Length:394382 Length:394382 Length:394382
+## Class :character Class :character Class :character Class :character
+## Mode :character Mode :character Mode :character Mode :character
+##
+##
+##
+##
+## isscaap_group area_type Unit Name year
+## Length:394382 Length:394382 Length:394382 Min. :1976
+## Class :character Class :character Class :character 1st Qu.:1990
+## Mode :character Mode :character Mode :character Median :2002
+## Mean :2001
+## 3rd Qu.:2011
+## Max. :2019
+##
+## value FAO_name scientific_name pooled_commodity
+## Min. : 0 Length:394382 Length:394382 Length:394382
+## 1st Qu.: 18 Class :character Class :character Class :character
+## Median : 201 Mode :character Mode :character Mode :character
+## Mean : 8783
+## 3rd Qu.: 1741
+## Max. :9800223
+##
+## isscaap_division exvessel value_num gapfilled
+## Length:394382 Min. : 14.16 Min. : 8.00 Min. :0.000
+## Class :character 1st Qu.: 758.74 1st Qu.:44.00 1st Qu.:0.000
+## Mode :character Median : 1433.05 Median :44.00 Median :0.000
+## Mean : 2018.30 Mean :42.66 Mean :0.077
+## 3rd Qu.: 2520.06 3rd Qu.:44.00 3rd Qu.:0.000
+## Max. :36798.14 Max. :44.00 Max. :1.000
+## NA's :33494 NA's :33494 NA's :12134
+## method mean_group_price final_price
+## Length:394382 Min. : 27.76 Min. : 14.16
+## Class :character 1st Qu.: 1005.63 1st Qu.: 770.98
+## Mode :character Median : 1705.12 Median : 1433.05
+## Mean : 1988.46 Mean : 1997.54
+## 3rd Qu.: 2430.76 3rd Qu.: 2500.07
+## Max. :20579.98 Max. :36798.14
+## NA's :12134 NA's :12134
+#there are still some NAs here, 3%
+nas <- fao_capture_price_final %>%
+ filter(is.na(final_price))
+unique(nas$isscaap_group)
## [1] "Corals"
+## [2] "Miscellaneous aquatic plants"
+## [3] "Blue-whales, fin-whales"
+## [4] "Sperm-whales, pilot-whales"
+## [5] "Brown seaweeds"
+## [6] "Red seaweeds"
+## [7] "Green seaweeds"
+## [8] "Pearls, mother-of-pearl, shells"
+## [9] "Sponges"
+## [10] "Turtles"
+## [11] "Eared seals, hair seals, walruses"
+## [12] "Sea-squirts and other tunicates"
+## [13] "Horseshoe crabs and other arachnoids"
+## [14] "Frogs and other amphibians"
+#we can either leave these with no revenue, or fill with an average price across all species for that year
+fao_capture_price_final <- fao_capture_price_final %>%
+ select(country, year, value, asfis_species, final_price, gapfilled, method) %>% mutate(revenue = final_price * value) # multiply price per tonne by number of tonnes
+
+summary(fao_capture_price_final)
## country year value asfis_species
+## Length:394382 Min. :1976 Min. : 0 Length:394382
+## Class :character 1st Qu.:1990 1st Qu.: 18 Class :character
+## Mode :character Median :2002 Median : 201 Mode :character
+## Mean :2001 Mean : 8783
+## 3rd Qu.:2011 3rd Qu.: 1741
+## Max. :2019 Max. :9800223
+##
+## final_price gapfilled method revenue
+## Min. : 14.16 Min. :0.000 Length:394382 Min. :1.000e+00
+## 1st Qu.: 770.98 1st Qu.:0.000 Class :character 1st Qu.:2.676e+04
+## Median : 1433.05 Median :0.000 Mode :character Median :2.924e+05
+## Mean : 1997.54 Mean :0.077 Mean :9.955e+06
+## 3rd Qu.: 2500.07 3rd Qu.:0.000 3rd Qu.:2.434e+06
+## Max. :36798.14 Max. :1.000 Max. :1.709e+10
+## NA's :12134 NA's :12134 NA's :12134
+#sum prices by country for each year
+fishing_revenue <- fao_capture_price_final %>%
+ group_by(country, year) %>%
+ summarize(value = sum(revenue, na.rm = TRUE)) %>%
+ mutate(unit = "USD (1)",
+ component = "revenue",
+ sector = "fishing",
+ data = "FAO capture production and ex-vessel prices")
## `summarise()` has grouped output by 'country'. You can override using the
+## `.groups` argument.
+
+#clean the aquaculture value data
+#this will also be used in the mariculture jobs data section, so we need to keep all environments in this first step
+aquaculture_value_clean <- aquaculture_value %>%
+ dplyr::rename(country = "Country (Name)",
+ species = "ASFIS species (Name)",
+ area = "FAO major fishing area (Name)",
+ environment = "Environment (Name)") %>%
+ dplyr::select(-c("Unit (Name)")) %>% #added removing Unit also
+ dplyr::rename_with(~ base::gsub("\\[", "", .)) %>%
+ dplyr::rename_with(~ base::gsub("\\]", "", .)) %>%
+ pivot_longer(cols = "1984":"2021", names_to = "year", values_to = "value") %>%
+ fao_clean_data_new()
+
+ mariculture_revenue <- aquaculture_value_clean %>%
+ filter(environment %in% c("Brackishwater", "Marine")) %>%
+ mutate(unit = "USD (1)",
+ value = value * 1000) #multiply value by 1000 to get dollars
+
+#sum by region and year (we need total revenue)
+total_mar_rev <- mariculture_revenue %>%
+ group_by(country, year) %>%
+ summarize(value = sum(value, na.rm = TRUE), unit = first(unit)) %>%
+ mutate(sector = "mariculture", component = "revenue", data_source = "FAO aquaculture value") %>%
+ select(country, year, value, unit, component, sector, data_source)
## `summarise()` has grouped output by 'country'. You can override using the
+## `.groups` argument.
+
+The whale watching revenue is stored in a separate table for each country throughout the paper. We will create a function to extract each of the tables and then combine them.
Key takeaways from the OHI methods (7.19.0.4 Marine mammal watching and 7.61.1.3 Marine mammal watching) to guide cleaning:
+Total expenditures are used as a close proxy for total revenue.
We used total expenditure data (direct and indirect expenditures) +to avoid using a literature derived multiplier effect.
When IFAW reported “minimal” revenue from whale watching, we +converted this description to a 0 for lack of additional +information.
For countries with both marine and freshwater cetacean viewing, +we adjusted by the proportion of marine revenue as described for the +jobs dataset.
First we will extract all the tables with summaries of country results. This will mainly be used for the total number of whale watchers, which is part of the jobs calculation, but we will extract it now to get the list of countries included in the paper.
+#number of whale watchers per country
+
+# there are 7 different tables (one per larger region) in the paper with this information, need to extract all of them and combine
+
+page <- 199
+slice_set <- 7:10
+region <- "North America"
+new_name <- "na_data"
+#make a function to extract each one
+whale_pdf_convert_1 <- function(page, slice_set, region, new_name){
+
+ columns <- c("country", "1998", "2008", "growth") #create columns
+
+ final_data <- whale_pdf[page] %>% #call the entire page
+ as.data.frame() %>%
+ separate_rows('.', sep = "\n") %>% #split into rows based on line breaks
+ rename(main_column = ".") %>%
+ slice(slice_set) %>% #input rows to select
+ mutate(main_column = str_trim(main_column)) %>%
+ separate(main_column, into = columns, sep = " {2,}", extra = "merge") %>% #split into columns when there are two or more spaces
+ select(-c("growth")) %>%
+ pivot_longer(cols = c("1998", "2008"), names_to = "year") %>%
+ mutate(value = case_when(value == "None" ~ "0",
+ value == "Minimal"~ "0",
+ T ~ value)) %>%
+ mutate(value = str_remove_all(value, ",")) %>% #remove commas
+ mutate(value = str_remove(value, " ")) %>% #remove spaces between numbers
+ mutate(value = ifelse(value == "N/A", NA, value)) %>%
+ mutate(value = as.numeric(value)) %>%
+ mutate(region = region)
+
+ assign(new_name, final_data, envir = .GlobalEnv)
+}
+
+#africa and middle east
+whale_pdf_convert_1(42, 5:26, "Africa and Middle East", "africa_data")
## Warning: Expected 4 pieces. Missing pieces filled with `NA` in 1 rows [13].
+#europe
+whale_pdf_convert_1(82, 5:26, "Europe", "europe_data")
+
+#asia
+whale_pdf_convert_1(121, 7:25, "Asia", "asia_data")
+
+#oceania
+whale_pdf_convert_1(157, 5:22, "Oceania, Pacific Islands and Antarctica", "oceania_data")
## Warning: Expected 4 pieces. Missing pieces filled with `NA` in 1 rows [6].
+oceania_data <- oceania_data %>%
+ filter(country != "Micronesia") %>%
+ mutate(country = str_replace(country, "Federated States of", "Federated States of Micronesia"))
+
+#north america
+whale_pdf_convert_1(199, 7:10, "North America", "na_data")
+
+#mexico had a subscript that got combined with value, replace manually
+na_data <- na_data %>%
+ mutate(value = ifelse(country == "México" & year == "2008", 190184, value))
+
+#central america and caribbean
+whale_pdf_convert_1(238, 5:27, "Central America and Caribbean", "ca_data")
+
+#south america
+whale_pdf_convert_1(269, 17:27, "South America", "sa_data")
+
+#now that we have all of the available countries, combine into one data frame
+whale_watching_numbers <- rbind(africa_data, europe_data, asia_data, oceania_data, na_data, ca_data, sa_data)
For whale watching revenue, each of the countries included in the table has its own table in the paper. We’ll use the list of countries from the previous step to locate and convert all of these tables.
+#country <- country_list[15] # for testing
+#region <- region_list[15]
+
+#test <- whale_extract(country,region)
+
+#create a function to find all of the country tables with revenue and extract
+whale_extract <- function(country, region){
+
+#the summary table has some of the countries written a bit differently
+#this replaces them with the way they are written in the separate tables when needed
+replacements <- c("The Gambia" = "Gambia, The",
+ "Portugal – Madeira Archipegalo" = "Portugal ‐ Madeira Archipelago",
+ "China ‐ Mainland" = "China (Mainland)",
+ "Myanmar" = "Myanmar (Burma)",
+ "Guadeloupe and islands" = "Guadeloupe and islands (including St. Martin and St. Barthélemy)",
+ "Falkland Islands" = "Falkland Islands (Las Malvinas)",
+ "Georgia, Ukraine – Black Sea" = "Georgia, Ukraine and Russia – Black Sea",
+ "St. Vincent and the Grenadines" = "St Vincent and the Grenadines",
+ "Netherlands Antilles" = "Netherlands Antilles – Aruba, Bonaire, Curaçao and St. Maarten")
+
+country_new <- str_replace_all(country, replacements)
+
+# Escape special characters in the country name
+ country_new <- str_replace_all(country_new, "[\\(\\)]", "\\\\\\0") #escape parentheses so they match literally in the regex
+
+
+match_character <- paste0(country_new, "\n") # add "\n" to help find the locations
+
+matches <- gregexpr(match_character, whale_pdf) #search the pdf for the country
+
+#set each item to true or false depending on if it was a match
+matches_good <- sapply(matches, function(match) any(match != -1))
+
+#get the page number with the table (where match is true)
+page_number_original <- which(matches_good)
+
+#if there is more than one match use another method to narrow it down
+if(length(page_number_original) >1) {
+ updated_match <- paste(country_new, "Year")
+ #search again
+ page_search <- whale_pdf[page_number_original]
+
+ squish <- str_squish(page_search)
+
+ matches <- gregexpr(updated_match, squish)
+ matches_good <- sapply(matches, function(match) any(match != -1))
+
+#get the page number with the table
+page_number_new <- which(matches_good)
+
+#update the page number to just the correct one
+page_number_original <- page_number_original[page_number_new]
+
+#if more than one possible page remains, select the correct one by page number (determined by inspecting the pdf)
+if (length(page_number_original) > 1){
+
+page_number_original <- subset(page_number_original, page_number_original %in% c(118, 114, 160, 162, 194))
+
+}
+
+}
+
+#subset to the page with the table
+page <- whale_pdf[page_number_original]
+
+#set column names
+column_names <- c("Year", "Number of whale watchers", "AAGR", "Number of operators", "Direct expenditure", "Indirect expenditure", "Total Expenditure")
+
+#turn the page into a table
+data <- page %>%
+  as.data.frame() %>%
+  separate_rows('.', sep = "\n") %>%
+  rename(main_column = ".") %>%
+  mutate(main_column = str_trim(main_column)) %>%
+  separate(main_column, into = column_names, sep = " {2,}", extra = "merge")
+
+# Find the index of the row with value of country
+index <- which(data$Year == country_new)
+if(length(index) == 0){
+ index <- 1
+}
+
+#filter the page to just the table
+data_clean <- data %>%
+  slice((index + 1):n()) %>% #remove any other tables above
+  filter(Year %in% c("1991", "1994", "1998", "2003", "2004", "2005", "2006", "2007", "2008")) %>%
+  #keep rows up to the first 2008 entry; if there is no 2008 row, keep everything
+  mutate(first_2008_index = if (any(Year == "2008")) min(which(Year == "2008")) else NA_integer_) %>%
+  filter(ifelse(is.na(first_2008_index), TRUE, row_number() <= first_2008_index)) %>%
+  select(-first_2008_index) %>%
+  clean_names() %>%
+  mutate_all(~ str_replace_all(., ",", "")) %>%
+  mutate_all(~ str_replace_all(., "\\$", "")) %>%
+  mutate_all(~ str_replace_all(., "None", "0")) %>% #set none to zero
+  mutate_all(~ str_replace_all(., "Minimal", "0")) %>% #set minimal to zero based on methods
+  mutate(country = country, #add the country, region, and page number (used to find other tables later on)
+         region = region,
+         page = page_number_original)
+
+return(data_clean)
+}
+
+#identify the countries based on data we created earlier
+#118 countries
+country_regions <- whale_watching_numbers %>%
+ group_by(country, region) %>%
+ summarize(n=n())
## `summarise()` has grouped output by 'country'. You can override using the
+## `.groups` argument.
+country_list <- country_regions$country
+region_list <- country_regions$region
+
+
+whale_list <- list()
+for (i in seq_along(country_list)) {
+
+ country <- country_list[i]
+ region <- region_list[i]
+ data <- whale_extract(country,region)
+
+ whale_list[[i]] <- data
+
+}
## Warning: Expected 7 pieces. Missing pieces filled with `NA` in 35 rows [1, 3, 4, 10, 11,
+## 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, ...].
+#combine all of the individual tables into one and clean
+whale_revenue_data <- bind_rows(whale_list) %>%
+ mutate_all(~ str_replace_all(., "N/A", NA_character_)) %>%
+ mutate(number_of_whale_watchers = as.numeric(str_extract(number_of_whale_watchers, "^[0-9]+")),
+ aagr = str_extract(aagr, "[-‐]?[0-9]+\\.*[0-9]*"),
+ total_expenditure = as.numeric(str_extract(total_expenditure, "^[0-9]+"))) %>%
+ mutate(direct_expenditure = as.numeric(direct_expenditure),
+ total_expenditure = as.numeric(total_expenditure),
+ indirect_expenditure = as.numeric(indirect_expenditure),
+ number_of_whale_watchers = as.numeric(number_of_whale_watchers)) %>%
+ mutate(page = as.numeric(page))
We now need to subset the revenue to just the revenue associated with freshwater species. The first step is to identify the main species watched in each country. These are also stored in various tables scattered throughout the pdf.
+#example values for testing the function below
+page <- 135
+country <- "Japan"
+
+key_species_country <- function(page, country){
+ #the first few countries were split into multiple regions and had to be processed separately
+ if (country %in% c("New Zealand")){
+ filtered_data <- data.frame(species = c("Bryde’s whale",
+ "bottlenose dolphin",
+ "short‐beaked common dolphin",
+ "orca",
+ "sperm whale",
+ "bottlenose dolphin",
+ "dusky dolphin",
+ "Hector’s dolphin")) %>%
+ mutate(country = country)
+ } else if (country %in% c("Japan", "Canada", "Australia", "USA")){
+
+    start_page <- page
+    page_list <- sort(species_country$page)
+
+    end_page <- which(page_list == page)
+
+    end_page <- page_list[end_page + 1]
+
+ next_country <- species_country %>% filter(page == end_page)
+ next_country <- next_country$country
+
+ pages_search <- whale_pdf[start_page:end_page]
+
+ whale_data <- pages_search %>%
+ as.data.frame() %>% #turn into data frame
+ separate_rows('.', sep = "\n") %>%
+ rename(main_column = ".")
+
+ end_row <- which(whale_data$main_column == next_country) -1
+
+ whale_data <- whale_data %>%
+ slice(1:end_row) %>% mutate(country = country)
+
+    #filter to just the rows that list the main species
+ index <- which(str_detect(whale_data$main_column, "Main species"))
+
+ temp_list <- list()
+ for (i in seq_along(index)){
+ start_row <- index[i] + 1
+ data_subset <- whale_data[(start_row):nrow(whale_data), ]
+ end_row <- which(str_detect(data_subset$main_column,"Tourists:"))[1] -1
+    # keep rows from the start of this subset through 'end_row'
+filtered_data <- data_subset %>%
+ slice(1:end_row)
+
+filtered_data <- filtered_data %>%
+ filter(!str_detect(main_column, "Small cetaceans:")) %>%
+ mutate(main_column = str_squish(main_column)) %>%
+ rename(species = main_column) %>%
+ mutate(country = country) %>%
+ separate_rows(species, sep = ",") %>%
+ mutate(species = str_squish(species)) %>%
+ filter(!(species == ""))
+ temp_list[[i]] <-filtered_data
+ }
+ filtered_data <- bind_rows(temp_list) %>%
+ mutate(
+ species = ifelse(species == "false", paste("false", lead(species)), species)
+ ) %>%
+ filter(species != "false" | lead(species) != "false") %>%
+ filter(species!= "Small Cetaceans:")
+ } else if (country %in% c("Kenya")){
+ filtered_data <- data.frame(species = "Indo‐Pacific bottlenose dolphin",
+ country = country)
+ #these countries are included but don't have formal whale watching
+ } else if (country %in% c("Morocco", "Senegal", "Seychelles", "Turkey")){
+ filtered_data <- data.frame(species = NA,
+ country = country)
+ } else{ # the rest of the countries can be processed in the same manner
+ #subset the whale pdf to the page and the two below
+ pages_search <-whale_pdf[page:(page+2)]
+
+ whale_data <- pages_search %>%
+ as.data.frame() %>% #turn into data frame
+ separate_rows('.', sep = "\n") %>%
+ rename(main_column = ".")
+
+ #filter to just rows below where the country name is
+ index <- which(whale_data$main_column == country)
+ if (length(index) <1) {
+ index <-1
+ }
+
+ filtered_data <- whale_data[(index):nrow(whale_data), ]
+
+ #Find the row index where the condition is met
+start_row <- which(str_detect(filtered_data$main_column,"Main species:"))[1] +1
+
+# Calculate the ending row index
+end_row <- which(str_detect(filtered_data$main_column,"Tourists:"))[1] -1
+
+# Filter the data frame to the rows between 'start_row' and 'end_row'
+filtered_data <- filtered_data[(start_row):end_row, ]
+
+filtered_data <- filtered_data %>%
+ filter(!str_detect(main_column, "Small cetaceans:")) %>%
+ mutate(main_column = str_squish(main_column)) %>%
+ rename(species = main_column) %>%
+ mutate(country = country) %>%
+ separate_rows(species, sep = ",") %>%
+ mutate(species = str_squish(species)) %>%
+ filter(!(species == ""))
+
+}
+
+return(filtered_data)
+
+
+}
+
+#make a data frame of unique country and page number combos
+species_country <- whale_revenue_data %>%
+ group_by(country,page) %>%
+ summarize(n =n())
## `summarise()` has grouped output by 'country'. You can override using the
+## `.groups` argument.
+#loop through all of the countries and the page numbers we located earlier
+species_list <- list()
+for (i in seq_along(species_country$country)) {
+
+ country <- species_country$country[i]
+ page <- species_country$page[i]
+ data <- key_species_country(page, country)
+
+ species_list[[i]] <- data
+
+}
+
+#Pakistan's species is given in the text description rather than a table, so add it separately
+pakistan <- data.frame(country = "Pakistan", species = "Indus River dolphin")
+
+species_final <- bind_rows(species_list) %>%
+ mutate(
+ species = case_when(
+ lead(species) == "dolphin" ~ paste(species, "dolphin"),
+ lead(species) == "headed" ~ paste(species, "headed"),
+ lead(species) == "whale" ~ paste(species, "whale"),
+ lead(species) == "whale)" ~ paste(species, "whale)"),
+ lag(species) == "southern" ~ paste("southern", species),
+ lag(species) == "short‐" ~ paste0("short‐", species),
+ lag(species) == "beaked" ~ paste0("beaked", species),
+ lag(species) == "false" ~ paste0("false", species),
+ lag(species) == "gray" ~ paste("gray", species),
+ lag(species) == "Atlantic" ~ paste("Atlantic", species),
+ lag(species) == "melon-" ~ paste("melon-", species),
+ lag(species) == "long‐finned" ~ paste("long‐finned", species),
+ lag(species) == "Indo‐Pacific" ~ paste("Indo‐Pacific", species),
+ lag(species) == "short‐beaked" ~ paste("short‐beaked", species),
+ lag(species) == "short‐finned" ~ paste("short‐finned", species),
+ lag(species) == "harbour" ~ paste("harbour", species),
+ lag(species) == "various" ~ paste("various", species),
+ lag(species) == "Pacific" ~ paste("pacific", species),
+ lag(species) == "Dall’s" ~ paste("dall’s", species),
+ TRUE ~ species
+ )) %>%
+  filter(!species %in% c("dolphin", "southern", "short‐", "beaked", "short‐beaked", "false", "gray",
+                         "whale", "Atlantic", "melon‐", "species", "others occasionally sighted",
+                         "as the Chinese white dolphin)", "None", "101", "long‐finned", "headed whale",
+                         "Indo‐Pacific", "Small cetaceans", "harbour", "short‐finned", "various",
+                         "Pacific", "Dall’s", "whale)")) %>%
+ filter(country != "Pakistan") %>%
+ bind_rows(pakistan) %>%
+ mutate(species = str_remove(species, "known locally")) %>%
+ mutate(species = str_to_lower(species)) %>%
+ mutate(species = case_when(species == "risso’s dolphin dolphin" ~ "risso's dolphin",
+ T~species))
Now that we have a list of all the main species for each country, +we’ll match as many of these species as possible to the IUCN Red List and query the +IUCN API for habitat information.
+#check the iucn redlist for species matches
+
+#convert some species to match the iucn redlist
+species_final <- species_final %>%
+ mutate(species = str_remove(species, "‐")) %>%
+ mutate(species = str_remove(species, "[-’']")) %>%
+ mutate(species = str_remove(species, "’")) %>%
+ mutate(species = str_remove(species, "'")) %>%
+ mutate(species = str_remove_all(species, "[()]"))%>%
+ mutate(species = str_remove(species, "–")) %>%
+ mutate(species = str_squish(species)) %>%
+ mutate(species = str_replace(species, "dolphins", "dolphin")) %>%
+ mutate(species = case_when(species == "orca"~ "killer whale",
+ species == "bottlenose dolphin" ~ "common bottlenose dolphin",
+                             species == "pantropical spotted dolphins" ~ "pantropical spotted dolphin",
+ species == "sperm whale (north west of crete)" ~ "sperm whale",
+ species == "falsekiller whale" ~ "false killer whale",
+ species %in% c("rissos dophin", "rissos dolphin dolphin") ~ "rissos dolphin",
+ species %in% c("longbeaked common dolphin",
+ "shortbeaked common dolphin") ~ "common dolphin",
+ species == "dwarf or pygmy sperm whale" ~ "pygmy sperm whale",
+ species == "beakedwhales especially blainvilles." ~ "blainvilles beaked whale",
+ species %in% c("minke whale", "minke whales",
+ "dwarf minke whale subspecies of minke whale" ) ~ "common minke whale",
+ species == "sperm whale north west of crete"~ "sperm whale",
+                             species == "amazon bolivian river dolphin" ~ "amazon river dolphin",
+                             species == "beluga" ~ "beluga whale",
+ species == "humpback" ~ "humpback whale",
+ (species == "finless porpoise" & country == "Bangladesh")~ "indopacific finless porpoise",
+ (species == "finless porpoise" & country == "Japan")~ "narrowridged finless porpoise",
+ species == "sperm whale occasional" ~ "sperm whale",
+
+ TRUE ~species))
+
+#read in the list of all the species IUCN has
+iucn_available <- read_csv(here("globalprep/ico/v2023/raw/spp_list_from_api.csv")) %>%
+ filter(class == "MAMMALIA")
## Rows: 153732 Columns: 13
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (12): region_identifier, kingdom, phylum, class, order, family, genus, s...
+## dbl (1): iucn_sid
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+#create a new column join_name, where we modify the list to match our species
+iucn_available <- iucn_available %>%
+ filter(!is.na(main_common)) %>%
+ distinct(sciname, main_common) %>%
+ mutate(join_name = str_remove(main_common, "-")) %>%
+ mutate(join_name = str_to_lower(join_name)) %>%
+ mutate(join_name = str_remove(join_name, "'")) %>%
+ mutate(join_name = str_remove(join_name, "’")) %>%
+ mutate(join_name = str_remove_all(join_name, "[()]")) %>%
+ distinct(sciname, join_name, .keep_all = T)
+
+
+#check which species we can't get from iucn
+setdiff(species_final$species, iucn_available$join_name)
## [1] "various dolphin" "tucuxi marine"
+## [3] "various tropical dolphin" "dolphin various species"
+## [5] NA "marine tucuxi"
+#according to the paper, all the species with no match are associated with marine watching
+#various dolphin- marine
+#tucuxi marine & marine tucuxi -marine
+#various tropical dolphin -marine
+#dolphin various species - marine
+
+
+#add the iucn data to the final species list
+species_iucn <- species_final %>%
+ left_join(iucn_available, by = c("species" = "join_name")) %>%
+ filter(!is.na(species) & !is.na(sciname))
+
+query_list <- unique(species_iucn$sciname)
+
+#query the iucn redlist for habitat information
+
+# Loop through each species and query the API for the habitats for these species
+
+results_list <- list()
+api_file <- "/home/shares/ohi/git-annex/globalprep/ico/api_key_gc.csv"
+api_key <- scan(api_file, what = 'character')
+for (species_name in query_list) {
+ # Construct the API endpoint URL for the current species
+ spp_page_url <- paste0('https://apiv3.iucnredlist.org/api/v3/habitats/species/name/',
+ URLencode(species_name), '?token=', api_key)
+
+ # Make the API request and parse the JSON response
+ species_data <-jsonlite::fromJSON(spp_page_url) %>% as_tibble()
+
+ #add back to the list
+ results_list[[species_name]] <- species_data
+ }
+
+
+#turn into a single dataframe
+species_results <- bind_rows(results_list) %>%
+ unnest(col = result)
+
+#check if any of them are found in freshwater habitats
+
+potentially_freshwater <- species_results %>%
+ filter(str_detect(habitat,"Wetlands")| str_detect(habitat, "Artificial/Aquatic"))
+
+freshwater_species <- species_iucn %>%
+ left_join(potentially_freshwater, by = c("sciname" = "name")) %>%
+ filter(!is.na(habitat)) %>%
+ filter(!suitability == "Marginal") #assume they're not common, so unlikely to be tourists watching for them in marginal habitat
## Warning in left_join(., potentially_freshwater, by = c(sciname = "name")): Detected an unexpected many-to-many relationship between `x` and `y`.
+## ℹ Row 36 of `x` matches multiple rows in `y`.
+## ℹ Row 1 of `y` matches multiple rows in `x`.
+## ℹ If a many-to-many relationship is expected, set `relationship =
+## "many-to-many"` to silence this warning.
+#make a list of all the freshwater species
+freshwater_list <- unique(freshwater_species$species)
+
+species_final_hab <- species_final %>%
+ filter(!is.na(species)) %>% #remove countries with no species
+ mutate(habitat = case_when(species %in% freshwater_list ~"Freshwater",
+ TRUE ~ "Marine")) %>%
+ group_by(country, habitat) %>%
+ summarize(count = n()) %>% #count the total number of species in each habitat
+ ungroup() %>%
+ pivot_wider(names_from = habitat, values_from = count) %>%
+ replace_na(list(Freshwater = 0, Marine = 0)) %>%
+ mutate(percent_marine = Marine/(Marine + Freshwater))
## `summarise()` has grouped output by 'country'. You can override using the
+## `.groups` argument.
+
+Now that we have the percent of mammal watching in each country that is marine, we can calculate the +final marine mammal watching revenue.
+whale_revenue_final <- whale_revenue_data %>%
+ left_join(species_final_hab, by = "country") %>%
+ mutate(value = percent_marine * total_expenditure) %>%
+  mutate(value = ifelse(is.na(value), 0, value)) %>% #NAs are 0: countries with no documented revenue
+ mutate(unit ="USD (1)",
+ component = "revenue",
+ sector = "marine mammal watching",
+ data_source = "O’Connor et al 2009") %>%
+ select(country, year, value, unit, component, sector, data_source)
+
+write_csv(whale_revenue_final, here(data_path, "int/marine_mammal_revenue.csv"))
We will prep two sources to provide future options. The UNWTO data +was available with projections through 2030, and is the same data source +used in the livelihoods and economies goal. Real data is available +through x, with future years of data not available for free from this +source. However, this is the most complete data set and would still be a +major update from the last calculation of the livelihoods and economies +goal, even if projections were not used. This data was already cleaned +in the v2022 tourism data prep. We will use this cleaned file and modify it +slightly.
+tourism_revenue <- read_csv("~/OHI_repositories/ohiprep_v2023/globalprep/tr/v2022/intermediate/wttc_empd_rgn.csv")
## Rows: 5364 Columns: 5
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (1): rgn_name
+## dbl (4): rgn_id, year, jobs_ct, jobs_pct
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+tourism_revenue_clean <- tourism_revenue %>%
+  mutate(unit = "persons (1)", #jobs_ct is an employment count, not revenue
+         value = jobs_ct,
+         sector = "tourism",
+         component = "jobs",
+         data_source = "WTTC"
+ ) %>%
+ select(rgn_id, rgn_name, year, value, unit, component, sector, data_source)
+
+write_csv(tourism_revenue_clean, here(data_path, "int/tourism_jobs.csv"))
Turn the number of fishers table from the FAO yearbook into a data +frame and clean.
+#turn the fao pdf into a format we can use
+fishers_table <- fao_pdf[[3]] %>% #page number of the pdf
+ as.data.frame()
+
+#specify columns
+columns <- c("country", "1995", "2000", "2005", "2010", "2014", "2015", "2016", "2017", "2018", "2019")
+
+fao_fisher_jobs <- fishers_table %>%
+ separate_rows('.', sep = "\n") %>% #separate rows based on line breaks
+ rename(main_column = ".") %>%
+ mutate(main_column = str_remove_all(main_column, "\\*")) %>% #remove the *
+ mutate(main_column = str_replace_all(main_column, "F(?!\\p{L})", " ")) %>% #replace all the Fs not followed by letters (used to indicate estimate)
+ slice(7:69) %>% #remove all rows not in the table
+ separate(main_column, into = columns, sep = " {2,}", extra = "merge") %>% #separate into columns based on double spaces
+ pivot_longer(cols = -c("country"), names_to = "year") %>%
+ mutate(value = str_squish(value)) %>% #remove extra spaces
+ mutate(value = str_remove_all(value, " ")) %>%
+ mutate(value = if_else(value == "…" , NA, value)) %>%
+ mutate(value = as.numeric(value)) %>%
+ mutate(Unit = "persons (1)",
+ sector = "fishing",
+ component = "jobs",
+ data_source = "FAO Number of Fishers") %>%
+ clean_names() %>%
+ select(country, year, value, unit, component, sector, data_source)
#turn the fao pdf into a format we can use
+fish_farm_table <- fao_pdf[[5]] %>% #page number of the pdf
+ as.data.frame()
+
+#specify columns
+columns <- c("country", "1995", "2000", "2005", "2010", "2014", "2015", "2016", "2017", "2018", "2019")
+
+fao_farm_jobs <- fish_farm_table %>%
+ separate_rows('.', sep = "\n") %>% #separate into rows based on line breaks
+ rename(main_column = ".") %>%
+ mutate(main_column = str_remove_all(main_column, "\\*")) %>% #remove the *
+ mutate(main_column = str_replace_all(main_column, "F(?!\\p{L})", " ")) %>%
+ slice(7:36) %>% #select the part of the page with the table
+ separate(main_column, into = columns, sep = " {2,}", extra = "merge") %>% #split into columns if there is more than two spaces
+ pivot_longer(cols = -c("country"), names_to = "year") %>%
+ mutate(value = str_squish(value)) %>%
+ mutate(value = str_remove_all(value, " ")) %>%
+ mutate(value = if_else(value == "…" , NA, value)) %>%
+ mutate(value = as.numeric(value)) %>%
+ mutate(unit = "persons (1)",
+ sector = "mariculture",
+ component = "jobs",
+ data_source = "FAO Number of Fish Farmers") %>%
+ clean_names() %>%
+ select(country, year, value, unit, component, sector, data_source)
#number of people employed in the fishing sector
+
+
+#note that this is data for the same category as the FAO fishers data, used for gapfilling OECD
+
+oecd_fisher_jobs <- oecd_raw %>%
+ filter(VARIABLE == "FISH_EMP_MARINE_FISH_TOT") %>%
+ select(- c(PowerCode, VARIABLE)) %>%
+ filter(!Country == "OECD - Total") %>%
+ clean_names() %>%
+ mutate(unit = ("persons (1)"),
+ value = (value * 1000),
+ component = "jobs",
+ sector = "fishing",
+ data_source = "oecd sustainable economies") %>%
+ select(country, year, value, unit, component, sector, data_source)
+
+#number of people employed in aquaculture, both marine and inland
+
+#note that this is data for the same category as the FAO fish farmers data, one could potentially be used for gapfilling the main data set
+oecd_aqua_jobs <- oecd_raw %>%
+ filter(VARIABLE == "FISH_EMP_AQUACULTURE_TOT") %>%
+ select(- c(PowerCode, VARIABLE)) %>%
+ filter(!Country == "OECD - Total") %>%
+ clean_names() %>%
+  mutate(unit = ("persons (1)"), #lowercase so the select() below works; clean_names() has already run
+ value = (value * 1000),
+ component = "jobs",
+ sector = "mariculture",
+ data_source = "oecd sustainable economies") %>%
+ select(country, year, value, unit, component, sector, data_source)
+
+
+oecd_fishproc_jobs <- oecd_raw %>%
+ filter(VARIABLE == "FISH_EMP_PROCESSING_TOT") %>%
+ select(- c(PowerCode, VARIABLE)) %>%
+ clean_names() %>%
+ mutate(unit = ("persons (1)"),
+ value = (value * 1000))
Gapfill and save files.
+For the fishers jobs data we will use the OECD data set as the +primary data source and gapfill using data from FAO. To update this data +in future years, we will need to consider gapfilling OECD data using +another method, as FAO no longer publishes disaggregated data on the +number of people employed in fishing. Another possibility is contacting +FAO or ILOSTAT to see if they are able to provide disaggregated data for +this sector.
+oecd_fisher_jobs <-oecd_fisher_jobs %>%
+ mutate(country = ifelse(country == "Korea", "South Korea", country)) #south korea is an oecd member
+
+fao_fisher_jobs <- fao_fisher_jobs %>%
+ mutate(country = case_when(country == "USA" ~ "United States",
+ country == "Turkey" ~ "Türkiye",
+ country == "China" ~ "China (People's Republic of)",
+ country == "China,Taiwan" ~ "Chinese Taipei",
+ country == "UK" ~ "United Kingdom",
+ country == "Korea Rep" ~ "South Korea",
+ country == "Korea D P Rp" ~ "North Korea",
+ T~ country))
+
+#look at the difference to see if we need to rename any other countries
+setdiff(oecd_fisher_jobs$country, fao_fisher_jobs$country)
## [1] "Australia" "Belgium" "Finland" "Greece" "Italy"
+## [6] "Portugal" "Sweden" "Colombia" "Costa Rica" "Estonia"
+## [11] "Latvia" "Lithuania" "Slovenia" "Israel"
+
+#countries in fao data not in oecd
+missing_oecd <- setdiff(fao_fisher_jobs$country, oecd_fisher_jobs$country)
+missing_oecd
+## [1] "Russian Fed" "Myanmar" "Morocco" "Nigeria" "Iran"
+## [6] "Mauritania" "Cambodia" "Faroe Is" "Ecuador" "Uganda"
+## [11] "Oman" "Senegal" "Sri Lanka" "Pakistan" "Tanzania"
+## [16] "Namibia" "South Africa" "Angola" "Egypt" "Ghana"
+## [21] "Mozambique" "Guinea" "Cameroon" "Papua N Guin" "Venezuela"
+## [26] "Greenland" "Georgia" "Congo Dem R" "Panama" "Kiribati"
+## [31] "Belize" "North Korea" "Sierra Leone"
+fao_missing_oecd <- fao_fisher_jobs %>%
+ filter(country %in% c(missing_oecd))
+
+fishing_jobs_final <- oecd_fisher_jobs %>%
+ rbind(fao_missing_oecd)
+
+write_csv(fishing_jobs_final, here(data_path, "int/fishing_jobs.csv"))
For the mariculture jobs data we will use the OECD data set as the +primary data source and gapfill using data from FAO. To update this data +in future years, we will need to consider gapfilling OECD data using +another method, as FAO no longer publishes disaggregated data on the +number of people employed in aquaculture. Another possibility is +contacting FAO or ILOSTAT to see if they are able to provide +disaggregated data for this sector.
+oecd_aqua_jobs <- oecd_aqua_jobs %>%
+ mutate(country = ifelse(country == "Korea", "South Korea", country)) #south korea is an oecd member
+
+fao_farm_jobs <- fao_farm_jobs %>%
+ mutate(country = case_when(country == "USA" ~ "United States",
+ country == "Turkey" ~ "Türkiye",
+ country == "China" ~ "China (People's Republic of)",
+ country == "China, Taiwan" ~ "Chinese Taipei",
+ country == "UK" ~ "United Kingdom",
+ country == "Korea Rep" ~ "South Korea",
+ T~ country))
+
+setdiff(oecd_aqua_jobs$country, fao_farm_jobs$country)
## [1] "Australia" "Austria" "Czech Republic" "Denmark"
+## [5] "Finland" "Greece" "Hungary" "Iceland"
+## [9] "Ireland" "Italy" "Netherlands" "New Zealand"
+## [13] "Poland" "Portugal" "Slovak Republic" "Sweden"
+## [17] "Argentina" "Costa Rica" "Estonia" "Latvia"
+## [21] "Lithuania" "Slovenia" "Switzerland" "Israel"
+## [25] "Belgium" "Germany" "Peru"
+#countries in fao data not in oecd
+oecd_missing <- setdiff(fao_farm_jobs$country, oecd_aqua_jobs$country)
+
+fao_farm_oecd_missing <- fao_farm_jobs %>%
+ filter(country %in% c(oecd_missing))
+
+#combine the two datasets
+aqua_jobs_combined <- oecd_aqua_jobs %>%
+ rbind(fao_farm_oecd_missing) %>%
+ mutate(year = as.numeric(year))
Both datasets combine mariculture with other aquaculture. Because of +this, we need to subset the aquaculture jobs to just those related to +marine and brackish mariculture.
+In the methods document it states: “In order to estimate the +proportion of total aquaculture jobs that can be attributed to marine +and brackish aquaculture, we used country-specific proportions of marine +and brackish aquaculture revenues (compared to total revenues) +calculated from FAO aquaculture production data, assuming that numbers +of jobs approximately scale with production in terms of revenue. For +country-years with no data for the proportion of marine/brackish +production because of gaps in the FAO production data, we used the +proportion from the most recent year for which data were available. For +countries without proportion estimates from any years, we used the +average proportion from the country’s geographic region (e.g., +Caribbean, Polynesia, Eastern Asia), with the exception of American +Samoa, for which we used the proportion value from Guam.”
+To replicate these methods we will use the FAO Global Aquaculture +Production value database. We will then find the proportion of +revenue for each country that is associated with marine/brackish +species.
+#classify brackish as marine for summing
+ aquaculture_value_proportions <- aquaculture_value_clean %>%
+ filter(!is.na(environment)) %>%
+ mutate(environment = ifelse(environment == "Brackishwater", "Marine", environment)) %>%
+ group_by(country, year, environment) %>%
+ summarize(value = sum(value, na.rm = T)) %>%
+ pivot_wider(names_from = environment, values_from = value) %>%
+ mutate(Marine = ifelse((is.na(Marine) & !is.na(Freshwater)), 0, Marine)) %>%
+ mutate( Freshwater = ifelse((is.na(Freshwater) & !is.na(Marine)), 0, Freshwater)) %>%
+  mutate(prop_marine = Marine / (Marine + Freshwater)) %>% #row-wise proportion; data are grouped by country and year
+ mutate(prop_marine = ifelse(Marine == 0 & Freshwater == 0, NA, prop_marine)) #if both are 0, we don't know the proportions
## `summarise()` has grouped output by 'country', 'year'. You can override using
+## the `.groups` argument.
+ aquaculture_value_proportions <- aquaculture_value_proportions %>%
+ select(country, year, prop_marine)
+
+#combine prop_marine with the jobs data set
+mariculture_jobs_final <- aqua_jobs_combined %>%
+ left_join(aquaculture_value_proportions, by = c("year", "country")) %>%
+ mutate(jobs = round(prop_marine * value)) %>%
+ select(country, year, value= jobs, unit, component, sector, data_source)
+
+
+write_csv(mariculture_jobs_final, here(data_path, "int/mariculture_jobs.csv"))
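The methods quoted above also call for gapfilling: country-years with no proportion take the most recent year with data, and countries with no data in any year take a regional average. That step is not implemented in the chunk above; a minimal sketch of the most-recent-year fill, using made-up countries and proportions, could look like:

```r
# Sketch only (hypothetical example data, not the FAO dataset): carry each
# country's most recent prop_marine forward into years with no data.
library(dplyr)
library(tidyr)

props <- tibble(
  country     = c("A", "A", "A", "B", "B"),
  year        = c(2015, 2016, 2017, 2016, 2017),
  prop_marine = c(0.8, NA, NA, NA, 0.4)
)

props_filled <- props %>%
  arrange(country, year) %>%
  group_by(country) %>%
  fill(prop_marine, .direction = "down") %>% # use most recent prior year
  ungroup()
```

A country whose earliest years are missing (country B in 2016 here) stays NA after this step and would fall through to the regional-average rule described in the methods.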
Key takeaways from methods:
+Jobs based on number of whale watchers in a country and a +regional average number of whale watchers per employee. Includes all +marine mammal watching.
When IFAW reported “minimal” numbers of whale watchers, we +converted this description to a 0 for lack of additional +information.
For countries with both marine and freshwater cetacean viewing, +we adjusted by the proportion of marine revenue accordingly. Because +some of the whale watching in O’Connor et al. focused on freshwater +cetacean viewing, we categorized the target species listed for each +country as freshwater or marine. For countries with both marine and +freshwater species, we categorized the whale watching in those countries +as either 50% or 90% marine, based on the number of marine versus +freshwater target species and information provided in the report +narrative. For Colombia and Indonesia, more detailed information in the +report narrative allowed for a more precise determination of the +percentage of marine-based whale watching. We applied these marine +proportions to data on the number of whale watchers before converting +these estimates into employment estimates.
#get table with regional average number of whale watchers per employee out of pdf
+columns <- c("region", "job_number", "watch_per_employee")
+
+regional_whale_employee <- whale_pdf[26] %>% #page number
+ as.data.frame() %>%
+  separate_rows('.', sep = "\n") %>% #separate into rows based on line breaks
+ rename(main_column = ".") %>%
+ slice(23:31) %>% #select rows with table
+ slice(1,2,3,4, 6,7,9) %>% #remove rows that get split into two
+ mutate(main_column = str_trim(main_column)) %>%
+ separate(main_column, into = columns, sep = " {2,}", extra = "merge") %>% #separate into columns if there is a double or more space
+  mutate(region = case_when(str_detect(region, "Oceania") ~ "Oceania and the Pacific Islands",
+str_detect(region, "Central America") ~ "Central America and Caribbean",
+T ~ region)) %>% #fix a few regions that were split up
+ mutate(watch_per_employee = as.numeric(str_remove(watch_per_employee, ","))) %>% #remove commas from numbers
+ select(-job_number)
Apply the marine proportion to the number of whale watchers.
+whale_watching_marine <- whale_watching_numbers %>%
+ left_join(species_final_hab, by = "country") %>%
+ mutate(watcher = value * percent_marine) %>%
+ select(watcher,country, region, year)
whale_jobs <- whale_watching_marine %>%
+ left_join(regional_whale_employee, by = c("region")) %>%
+ mutate(jobs = round(watcher/watch_per_employee,0)) %>%
+ mutate(value = ifelse(is.na(jobs), 0, jobs)) %>% #nas are 0s in this case
+ mutate(unit ="persons (1)",
+ component = "jobs",
+ sector = "marine mammal watching",
+ data_source = "O’Connor et al 2009") %>%
+ select(country, year,value, unit, component, sector, data_source)
+
+write_csv(whale_jobs, here(data_path, "int/whale_jobs.csv"))
From the OHI methods:
+We used the Occupational Wages around the World (OWW) database +produced by Remco H. Oostendorp and Richard B. Freeman in 2005 (http://www.nber.org/oww/). These data were drawn from +the International Labour Organization and subjected to a standardization +process (for more information, see http://www.nber.org/oww/Technical_document_1983-2003_standardizationv3.pdf).
+“The database provides several different calibrations, and we use +the ‘x3wl calibration’, described as a ‘country-specific and uniform +calibration with lexicographic weighting,’ and recommended as being the +preferred calibration in most cases.”
Although significant gaps exist in this database, it contains +country-specific information on average wages in many industries for +more than 150 countries from 1983-2003.
Data represent average monthly wages of a male worker. Wage data +were divided by the inflation conversion factor for 2010 so that wage +data across years would be comparable (http://oregonstate.edu/cla/polisci/sahr/sahr), and then +multiplied by the purchasing power parity-adjusted per capita GDP +(ppppcgdp, World Bank).
The adjusted wage data were then multiplied by 12 to get annual +wages.
We used the industry and occupation classifications reported in +the OWW to estimate wages for marine-related sectors.
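Taken literally, the adjustment steps above reduce to a small calculation. The factor values below are hypothetical placeholders for illustration, not the real 2010 conversion factors:

```r
# Sketch only: wage adjustment as described in the methods above.
# All numeric values are hypothetical.
monthly_wage     <- 250   # OWW monthly wage, USD
inflation_factor <- 0.85  # inflation conversion factor for 2010
ppp_adjustment   <- 1.2   # PPP-based multiplier described in the methods

adjusted_monthly <- (monthly_wage / inflation_factor) * ppp_adjustment
annual_wage      <- adjusted_monthly * 12 # multiply by 12 to get annual wages
```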
Update Notes:
+The available data from OWW goes through 2008; the database has +been discontinued and will not be updated in the future.
Since we have not decided what year we will be using the +inflation conversion factor for, data are cleaned only to the point of +selecting the appropriate sectors and occupations.
Although the methods say we used “x3wl”, it appears this is a +typo and the correct calibration is w3wl. OWW documentation says: “The +standardized wage with country-specific calibration and imputation and +lexicographic weighting (w3wl) is recommended for use.” The variable +mw3wl_us is selected: monthly, country-specific +calibration with imputation (lex), in US$. Occupation codes are +defined in Technical_document_OWW_1953-2008_release1.0.docx under the +documentation section.
From the methods we know the occupations we need:
+Fishing
+Industry: deep sea & coastal fishing; (y3 Industry code +AE)
Occupations: Deep-sea fisherman; inshore (coastal) maritime +fisherman (y4 Occupation codes 9 & 10)
Ports & harbors
+Industry: supporting services to maritime transport; (y3: +NE)
Occupation: dock worker (y4: 117)
Transportation and shipping
+Industry: maritime transport (y3 : ND)
occupations: ship’s chief engineer; ship’s passenger stewards; +able seaman (y4: 114, 115, 116)
All data are in monthly wages (USD)
Ship & boat building Industry
+Industry: shipbuilding and repairing; (y3: JD)
Occupation: ship plater (y4: 75)
Tourism
+Industry: restaurants and hotels; (y3: MC)
Occupations: hotel receptionist; cook; waiter; room attendant or +chambermaid. (y4: 97, 98, 99, 100)
These data are not specific to coastal/marine tourism jobs, and +thus we assumed that wages in these jobs are equal in coastal and +non-coastal areas
Process and save the oww data
+oww_clean <- oww %>%
+  filter(y4 %in% c(9, 10, 75, 97, 98, 99, 100, 114, 115, 116, 117)) %>% #select the relevant occupation codes
+ select(y0, y1, y3, y4, country_name, mw3wl_us) %>%
+ rename(country = country_name,
+ year = y0,
+ industry_code = y3,
+ occupation_code = y4,
+ value = mw3wl_us) %>%
+ mutate(unit = "monthly wage USD (1)",
+ sector = case_when(occupation_code %in% c(9, 10) ~ "cf",
+ occupation_code %in% c(117) ~"ph",
+ occupation_code %in% c(114, 115, 116) ~ "tran",
+ occupation_code == 75 ~ "sb",
+ occupation_code %in% c(97, 98, 99, 100) ~ "tour"),
+ data_source = "oww database") %>%
+ select(country, year, value, unit, occupation_code, sector, data_source)
+
+
+write_csv(oww_clean, here(data_path, "int/all_sectors_wages.csv"))