
Commit

Merge pull request #26 from gbif/private-sec-table
table building and machine tags with gh actions
MattBlissett authored Jul 15, 2024
2 parents 97092b4 + bface13 commit 2112f8c
Showing 16 changed files with 1,290 additions and 116 deletions.
47 changes: 47 additions & 0 deletions .github/workflows/run.yml
@@ -0,0 +1,47 @@
on:
  push:
  schedule:
    - cron: '0 0 1 * *' # every month

jobs:
  build:
    runs-on: ubuntu-latest
    name: run R script

    env:
      GBIF_USER: ${{ secrets.GBIF_USER }}
      GBIF_PWD: ${{ secrets.GBIF_PWD }}

    steps:
      - uses: actions/checkout@v3
      - name: Setup R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: '4.2.2'
      - run: Rscript -e 'print("hello")'

      - name: Cache R packages
        uses: actions/cache@v3
        with:
          path: ${{ env.R_LIBS_USER }}
          key: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-2-${{ hashFiles('.github/depends.Rds') }}
          restore-keys: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-2-

      - name: Install pak
        run: |
          install.packages("pak", repos = "https://r-lib.github.io/p/pak/devel/")
        shell: Rscript {0}

      - name: Install dependencies
        run: |
          pak::pkg_install("jhnwllr/gbifmt")
          pak::pkg_install(c("httr2","tibble","tidyr","rgbif","readr","purrr","dplyr", "ascii"))
        shell: Rscript {0}

      - name: build table
        run: Rscript build-table-script/R/table.R

      # - name: update machine tags
      #   run: Rscript build-table-script/R/mt.R

      - uses: stefanzweifel/git-auto-commit-action@v4
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,6 +1,8 @@
.bash_history
.DS_Store
.idea
lunr-index.json
lunr.log
*.asis
*.brotli
*.html
@@ -10,6 +12,9 @@
*.??.adoc
!*.en.adoc
*.??-??.adoc
sectors.??.txt
!sectors.en.txt
*.mo
*.icloud
build.sh
.history
16 changes: 11 additions & 5 deletions 200.en.adoc
@@ -56,16 +56,22 @@ Sep 2022: 7,789,180
<<table-01,Table 1>> lists the most significant private sector publishers.

[[table-01]]
[caption="Table 1. "]
.Private-sector companies that publish their data through GBIF (as of August 2023)
include::250-private-sector-table.adoc[]
[%header,cols="4,2,2,>1,>1,>1",format=csv]
.Private-sector companies that publish their data through GBIF
|===
Company,Activity sector,Country,Datasets,Occurrence records,Data citations

include::250-private-sector-table.csv[]
|===

[[table-02]]
[caption="Table 2. "]
[%header,format=csv]
.Grand Totals
include::260-private-sector-totals.adoc[]
|===
Datasets,Occurrence records,Data citations

include::260-private-sector-totals.csv[]
|===

=== What data could the company publish through GBIF?

65 changes: 0 additions & 65 deletions 250-private-sector-table.adoc

This file was deleted.

62 changes: 62 additions & 0 deletions 250-private-sector-table.csv
@@ -0,0 +1,62 @@
https://www.gbif.org/publisher/ca11748e-a30a-4252-930f-bdb017e942c5[AGBAR],{Consulting},{ES},1,142 194,67
https://www.gbif.org/publisher/f2429cd1-4d45-475c-852a-892024cb4aba[ARC - Arctic Research and Consulting DA],{Consulting},{NO},1,8 914,89
https://www.gbif.org/publisher/6d1beb45-43bc-499a-85a0-f06f67e81591[Aguas de Bogotá S.A. E.S.P.],{Utilities},{CO},1,13 280,107
https://www.gbif.org/publisher/620e3d31-d433-4154-9cf6-232a6a6b5e3f[Akvaplan-niva],{Consulting},{NO},3,594,25
https://www.gbif.org/publisher/b5904aaf-02c7-4ff3-85a6-0f528dbb632e[Anadarko Colombia Company],{Energy},{CO},7,1 178,57
https://www.gbif.org/publisher/df604473-66f0-444d-94c4-22795f268afe[AngloGold Ashanti Colombia S.A.S],{Materials},{CO},5,87 020,196
https://www.gbif.org/publisher/612c9b58-e739-4af4-a038-4b3901fa5649[Asplan Viak AS],{Engineering},{NO},14,3 775,440
https://www.gbif.org/publisher/e62a5313-e771-4c81-b6d1-cba6e4085635[Aures Bajo],{Energy},{CO},2,368,47
https://www.gbif.org/publisher/83500190-21b6-445c-ab2c-c0565fc0afce[Awake Travel],{Consulting},{CO},1,8 644,21
https://www.gbif.org/publisher/eea64f26-8fd5-49fb-be7e-a1d4cfc051ee[Aïgos SAS],{Consulting},{CO},3,2 404,48
https://www.gbif.org/publisher/b2c1126d-e3b4-4619-9f94-b236dcc0a947[Biofokus],{Consulting},{NO},1,444 289,1 133
https://www.gbif.org/publisher/a41046bd-eaca-49bf-919b-419062ffc2a2[Biolog J.B. Jordal AS],{Consulting},{NO},1,177 814,684
https://www.gbif.org/publisher/8e6bc843-c1b4-4b10-b546-881f06049004[Biotica Consultores Ltda],{Consulting},{CO},4,1 318,219
https://www.gbif.org/publisher/14fb9c57-68a5-4870-b434-5355df7a9c3c[Carbones del Cerrejón Limited],{Materials},{CO},9,197 100,279
https://www.gbif.org/publisher/0fd86a13-3d0d-4d6e-b809-2811706f35d6[Celsia Colombia S.A. E.S.P.],{Energy},{CO},11,55 792,63
https://www.gbif.org/publisher/bbf93124-1cc2-4cac-a101-b4412dd04e2a[Central Hidroeléctrica de Caldas S.A E.S.P],{Energy},{CO},2,3 570,48
https://www.gbif.org/publisher/1a4f4e64-eb3d-42c3-a359-1be3869b3a20[Cerro Matoso S.A],{Materials},{CO},3,19 309,205
https://www.gbif.org/publisher/d49251f5-379f-43b4-b747-9d8240334fa5[Chevron Australia],{Energy},{AU},1,2 048,68
https://www.gbif.org/publisher/03a8bc52-9c2e-4aee-8dd7-9b4d279e4960[Compensation International Progress S.A. -Ciprogress Greenlife-],{Industrials},{CO},1,820,77
https://www.gbif.org/publisher/db41c5c6-d34a-4d27-8ac9-0c8d085393f7[Concesión La Pintada S.A.S],{Industrials},{CO},2,0,0
https://www.gbif.org/publisher/d3c29fed-bcac-4f84-8d3d-f4b7f76fdc8e[Construcciones y Ambiente Conambiente S.A.S],{Consulting},{CO},10,4 392,82
https://www.gbif.org/publisher/c5245889-c63d-48fa-ae4b-90ddd74f1d2d[Cunaguaro Consultores LTDA],{Consulting},{CO},1,657,67
https://www.gbif.org/publisher/efc5d3c7-2fec-42dd-85de-078a73973bd1[DNV],{Energy},{NO},1,2 372 473,73
https://www.gbif.org/publisher/e5150835-f502-424c-b470-24dd496b1b18[EDP],{Energy},{PT},120,1 855 649,475
https://www.gbif.org/publisher/76c3443b-bf10-4fb6-a6e7-aeaa65be383c[ENGIE],{Energy},{FR},20,29 555,10
https://www.gbif.org/publisher/fac91b96-c087-460f-ab01-b808f341c2f5[Ecofact],{Consulting},{NO},3,12 508,482
https://www.gbif.org/publisher/3ca2ab24-7f53-458e-b4ad-6e88ea6d9628[Econativa Consultores SpA],{Consulting},{CL},1,3,12
https://www.gbif.org/publisher/d5ef14a1-5177-4547-9ce2-46d84a4214eb[Ecopetrol S.A.],{Energy},{CO},89,736 568,204
https://www.gbif.org/publisher/d42b7e5d-a3e5-4fc2-8b3d-105336d70898[Empresas Públicas de Medellín E.S.P.],{Energy},{CO},48,2 278 898,224
https://www.gbif.org/publisher/f442f96e-2017-4cf5-b19f-1f3320ae7577[Enel Colombia],{Energy},{CO},13,31 192,78
https://www.gbif.org/publisher/51818adb-2745-4201-9397-6d6dc433954f[Equinor],{Energy},{NO},2,1 102,7
https://www.gbif.org/publisher/d98d7029-8cb7-44c2-88af-52988adc3a62[Faun Naturforvaltning AS],{Consulting},{NO},1,3 787,428
https://www.gbif.org/publisher/37c1c493-782c-4f53-914d-b1f66cdcf61c[Federación Nacional de Cacaoteros],{Agriculture},{CO},1,17,21
https://www.gbif.org/publisher/fe602f47-b553-4291-b6e5-197b9837e167[Federación Nacional de Cafeteros de Colombia],{Agriculture},{CO},6,26 804,441
https://www.gbif.org/publisher/2977895d-3ce2-4fb9-b62e-a775c8fd9304[Grupo Energía Bogotá],{Energy},{CO},1,61 111,162
"https://www.gbif.org/publisher/946b9adc-5ec0-4d76-a143-8bd43444415f[HBH Projekt spol. s r.o.,Kabátnikova 5, 602 00 Brno,ČR – organizačná zložka Slovensko]",{Engineering},{SK},14,2 024,13
https://www.gbif.org/publisher/90d2e455-c279-4bf1-ba87-806495641e18[Hatovial S.A.S],{Engineering},{CO},1,1 898,153
https://www.gbif.org/publisher/2d7ea901-0128-4a7a-8207-425020c1fd99[Holcim Spain],{Mining},{ES},2,35,46
https://www.gbif.org/publisher/67c63221-0c74-4c18-97f9-e2b2acb739ce[INERCO Consultoría Colombia],{Consulting},{CO},1,1 090,183
https://www.gbif.org/publisher/04ce62dd-30ec-4d98-8b30-b09cafc3ac38[Isagen S.A. E.S.P.],{Energy},{CO},12,41 665,388
https://www.gbif.org/publisher/b1670923-c90b-4420-be96-1db600ed2109[Lake Tanganyika Floating Health Clinic],{Health Care},{CD},3,337,9
https://www.gbif.org/publisher/54eb018e-54d8-49cc-b98b-37733bb70028[Mineros Aluvial S.A.S. BIC],{Mining},{CO},1,7 307,34
https://www.gbif.org/publisher/4d14137b-ce2c-4111-98a9-0078f5d53237[Minería Social Incluyente S.A.S.],{Mining},{CO},1,4 159,78
https://www.gbif.org/publisher/9a21807b-b9c5-4071-b393-764f3cd58abc[Moam Monitoreos Ambientales S.A.S],{Consulting},{CO},1,1 781,69
https://www.gbif.org/dataset/d0a90634-21fb-4c76-9081-98bf3930ad7c[Monitoramento fauna e flora Mineração Vale Verde do Brasil Ltda.],{Materials},{BR},1,299,148
https://www.gbif.org/publisher/359ba517-ca03-46dd-9583-d2be73085c2f[Multiconsult],{Consulting},{NO},1,308,179
https://www.gbif.org/publisher/a1648ebf-7363-4c27-beb0-23271087220f[NNI Resources AS],{Consulting},{NO},2,3 115,104
https://www.gbif.org/publisher/99c6eaae-f15b-4656-a600-d0c50044962e[NaturRestaurering AS],{Consulting},{NO},10,17 609,322
"https://www.gbif.org/dataset/72e23311-b65a-46d0-bc07-ff0a251b47e1[Nature monitoring data, Amphi Consult and Biomedia, Denmark]",{Consulting},{DK},1,47 254,25
"https://www.gbif.org/publisher/52bd9c22-340b-480d-b414-73db37cd9379[Navantia, S.A.]",{Industrials},{ES},6,823,33
https://www.gbif.org/publisher/4e8fae15-2ca7-4493-8c57-573194d29c0f[Nocturne Environmental Surveyors Ltd],{Consulting},{GB},1,32,28
https://www.gbif.org/publisher/c3da1f49-b2c8-4751-b72f-28855546ec4c[Oleoducto Bicentenario],{Energy},{CO},11,4 161,273
https://www.gbif.org/publisher/dbc07e15-c05b-4781-9ec3-59d331a9a4d8[Parex Resources Colombia - AG Sucursal],{Energy},{CO},17,215 099,20
https://www.gbif.org/publisher/9a408a2b-6bbb-4c95-80d9-0dce1fba1c00[Pierre Fabre],{Consumer Staples},{FR},20,4 049,202
https://www.gbif.org/publisher/dbc2ab56-d499-403c-8db5-c1a49cd0b75f[Promigas S.A E.S.P],{Energy},{CO},12,180 937,328
https://www.gbif.org/publisher/815809f1-e6e6-44df-b3fd-b17a9d87eada[Regelink Ecology & Landscape],{Consulting},{NL},1,157 976,160
https://www.gbif.org/publisher/80e15a76-70e8-417d-9111-b2e9e0dd8f18[Rådgivende Biologer],{Consulting},{NO},5,15 214,401
https://www.gbif.org/publisher/c4444b2c-6b07-40c2-8474-6556a195cd40[SWECO Norge AS],{Engineering},{NO},1,1 139,407
https://www.gbif.org/publisher/2c542862-b9dd-40fc-8260-fb434997efa7[Stratos Consultoría Geológica],{Consulting},{CO},2,1 084,35
https://www.gbif.org/publisher/f5db868f-e5bf-4208-bd9d-d4063ae1c825[TERRASOS],{Consulting},{CO},20,39 479,284
https://www.gbif.org/publisher/728e3362-3063-4a43-a6cf-71d61b50025b[TotalEnergies],{Energy},{FR},56,45 783,171
https://www.gbif.org/publisher/04a12c74-4b26-4994-a51a-8b733a57318b[Veolia Colombia],{Energy},{CO},2,672,13
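Any Company value that contains a comma and is not enclosed in double quotes (for example `Navantia, S.A.`) is split into extra cells when the table is rendered with `format=csv`. A minimal base-R sketch for catching such rows before committing, assuming the six-column layout declared in `200.en.adoc`; `check_csv_fields` is a hypothetical helper, not part of this repository:

```r
# Hypothetical helper (not in the repository): return the indices of CSV lines
# whose field count differs from `expected`. Only double quotes are treated as
# quoting characters, so apostrophes in company names are counted correctly,
# and comment handling is disabled because URLs may contain '#'.
# count.fields() skips blank lines, so indices refer to non-blank lines.
check_csv_fields <- function(lines, expected = 6L) {
  con <- textConnection(lines)
  on.exit(close(con))
  n <- count.fields(con, sep = ",", quote = "\"", comment.char = "")
  which(n != expected)
}

# Usage, assuming the working directory is the repository root:
# rows <- readLines("250-private-sector-table.csv", encoding = "UTF-8")
# rows[check_csv_fields(rows)]
```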
4 changes: 0 additions & 4 deletions 260-private-sector-totals.adoc

This file was deleted.

1 change: 1 addition & 0 deletions 260-private-sector-totals.csv
@@ -0,0 +1 @@
595,9 380 476,10 745
81 changes: 81 additions & 0 deletions build-table-script/R/table.R
@@ -0,0 +1,81 @@

library(dplyr)
library(purrr)
library(gbifmt) # provides get_mt(); installed from GitHub (jhnwllr/gbifmt)
# setwd("C:/Users/ftw712/Desktop/doc-private-sector-data-publishing/")

# Harvest private-sector publishers and datasets directly from their GBIF machine tags

# dataset machine tags
ds_mt = get_mt("privateSector.gbif.org",type="dataset",limit=500) %>%
mutate(link = paste0("https://www.gbif.org/dataset/",uuid)) %>%
mutate(pd = "dataset") %>%
select(link, `Activity sector` = value, pd, key = uuid) %>%
glimpse()

# publisher machine tags
pb_mt = get_mt("privateSector.gbif.org",type="organization",limit=500) %>%
mutate(link = paste0("https://www.gbif.org/publisher/",uuid)) %>%
mutate(pd = "publisher") %>%
select(link, `Activity sector` = value, pd, key = uuid) %>%
glimpse()

# combine
ss = list(ds_mt,pb_mt) %>%
bind_rows() %>%
glimpse()

gbif_country = rgbif::enumeration_country() %>% select(Country=title,iso2) %>% glimpse()

pp = ss %>%
dplyr::filter(pd == "publisher") %>%
select(key,`Activity sector`) %>%
mutate(name = map_chr(key,~rgbif::organizations(uuid=.x,limit=1)$data$title)) %>%
mutate(`Occurrence records` = map_dbl(key,~ rgbif::occ_search(publishingOrg = .x,occurrenceStatus=NULL,limit=0)$meta$count)) %>%
mutate(Datasets = map_dbl(key,~rgbif::dataset_search(publishingOrg= .x,limit=0)$meta$count)) %>%
mutate(`Data citations` = map_dbl(key,~rgbif::lit_count(publishingOrg = .x))) %>%
mutate(Company = paste0("https://www.gbif.org/publisher/",key,"[",name,"]")) %>%
mutate(iso2 = map_chr(key,~rgbif::dataset_search(publishingOrg=.x,limit=1)$data$publishingCountry)) %>%
merge(gbif_country,by="iso2") %>%
glimpse()

dd = ss %>%
dplyr::filter(pd == "dataset") %>%
select(key,`Activity sector`) %>%
mutate(name = map_chr(key,~rgbif::dataset_get(uuid=.x)$title)) %>%
mutate(`Occurrence records` = map_dbl(key,~
rgbif::occ_search(datasetKey = .x,occurrenceStatus=NULL,limit=0)$meta$count)) %>%
mutate(Datasets = 1) %>%
mutate(`Data citations` = map_dbl(key,~rgbif::lit_count(datasetKey = .x))) %>%
mutate(Company = paste0("https://www.gbif.org/dataset/",key,"[",name,"]")) %>%
mutate(p_key = map_chr(key,~ rgbif::dataset_get(uuid=.x)$publishingOrganizationKey)) %>%
mutate(iso2 = map_chr(p_key,~rgbif::dataset_search(publishingOrg=.x,limit=1)$data$publishingCountry)) %>%
merge(gbif_country,by="iso2") %>%
select(-p_key) %>%
glimpse()

# combine tables
tt = rbind(pp,dd) %>%
arrange(name) %>%
select(Company, `Activity sector`, Country = iso2, Datasets, `Occurrence records`, `Data citations`)

# Format counts with U+202F (narrow no-break space) as the thousands separator,
# wrap sector and country codes in braces so AsciiDoc treats them as attribute
# references, and write the table rows. The Company column is quoted because
# some names contain commas, which would otherwise split the CSV row.
tt %>%
  mutate(across(c(Datasets, `Occurrence records`, `Data citations`),
                ~ trimws(format(.x, nsmall = 0, big.mark = "\u202F"), which = "left"))) %>%
  mutate(`Activity sector` = paste0("{", `Activity sector`, "}")) %>%
  mutate(Country = paste0("{", Country, "}")) %>%
  write.table(file = "250-private-sector-table.csv", row.names = FALSE, col.names = FALSE,
              sep = ",", quote = 1, qmethod = "double")

# totals table
tt %>%
  summarise(
    Datasets = sum(Datasets),
    `Occurrence records` = sum(`Occurrence records`),
    `Data citations` = sum(`Data citations`)
  ) %>%
  mutate(across(everything(),
                ~ trimws(format(.x, nsmall = 0, big.mark = "\u202F"), which = "left"))) %>%
  write.table(file = "260-private-sector-totals.csv", row.names = FALSE, col.names = FALSE,
              sep = ",", quote = FALSE)
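The count columns are written with U+202F (narrow no-break space) as the thousands separator, which is where values such as `142 194` in the table come from. A minimal sketch of that step as a standalone function, using the same `format()` and `trimws()` calls as `table.R`; the name `fmt_count` is hypothetical:

```r
# Hypothetical helper mirroring the expression table.R applies to each count
# column. format() pads every element of the vector to a common width with
# ordinary spaces; trimws(which = "left") removes that padding. Its default
# whitespace class is "[ \t\r\n]", so the U+202F separators are kept.
fmt_count <- function(x) {
  trimws(format(x, nsmall = 0, big.mark = "\u202F"), which = "left")
}

fmt_count(c(1, 142194, 9380476))
```

Because `format()` works on the whole column at once, the left trim is what keeps short values such as `1` from carrying padding into the CSV.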