Carry prov forward in certain scenarios #151

jeanetteclark · 2019-11-26T17:39:06Z

see issue #131

amoeba

This looks really good. Especially great to see the tests I was looking for to verify the behavior. After looking closer, I see now what you were saying about provenance being forwarded when the data PIDs don't change, even by default. Missed it on the first go-round. I think that works.

Made a few comments, let me know what you think.

amoeba · 2019-11-26T20:43:03Z

R/editing.R

@@ -255,6 +255,7 @@ update_object <- function(mn, pid, path, format_id = NULL, new_pid = NULL, sid =
 #'   Checks that objects exist and are of the right format type. This speeds up the function, especially when `data_pids` has many elements.
 #' @param format_id (character) Optional. When omitted, the updated object will have the same formatId as `metadata_pid`. If set, will attempt
 #'   to use the value instead.
+#'   @param keep_prov (logical) Option to force publish_update to keep prov


Suggest removing two leading spaces before @param and changing to:

Optional. Whether or not to retain provenance statements present in resource_map_pid in the updated resource map. Defaults to FALSE.

in order to match style.

amoeba · 2019-11-26T20:45:37Z

R/editing.R

+  # Get the old resource map so we can extract any statements we need out of it
+  # such as PROV statements
+  old_resource_map_path <- tempfile()
+  writeLines(rawToChar(dataone::getObject(mn, resource_map_pid)), old_resource_map_path)


If the call to dataone::getObject fails, the function will stop and show an error about why. This is probably fine since, at this point, no work has been done that needs to get cleaned up. Sound good?

amoeba · 2019-11-26T20:51:50Z

R/editing.R

+                        other_statements)
+  }
+
+  prov_pids <- gsub("https://cn-stage-2.test.dataone.org/cn/v[0-9]/resolve/|https://cn.dataone.org/cn/v[0-9]/resolve/|https://cn-stage.test.dataone.org/cn/v[0-9]/resolve/",


This works for 99+% of cases, but it might be better if it was generalized to all environments: ^https?\:\/\/.+?\.(test\.)?org\/cn\/v\d+\/resolve\/.
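As a rough illustration of the generalized pattern suggested here, the sketch below strips the resolve prefix from URIs on several coordinating-node hosts. The helper name `strip_resolve_prefix` and the example URIs are invented for illustration and are not part of the package.

```r
# Hypothetical helper: remove the CN resolve prefix from resolve URIs.
strip_resolve_prefix <- function(uris) {
  # Same idea as the suggested pattern, written as an R string literal
  # (slashes and colons need no escaping; backslashes are doubled).
  pattern <- "^https?://.+?\\.(test\\.)?org/cn/v\\d+/resolve/"
  sub(pattern, "", uris, perl = TRUE)
}

# Made-up resolve URIs covering production and two test environments.
uris <- c(
  "https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Aaaa",
  "https://cn-stage-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3Abbb",
  "https://cn-sandbox.test.dataone.org/cn/v1/resolve/urn%3Auuid%3Accc"
)
stripped <- strip_resolve_prefix(uris)

# Strings without a resolve prefix pass through unchanged.
untouched <- strip_resolve_prefix("urn:uuid:ddd")
```

Because the pattern is anchored at the start and no longer lists hosts, a new test environment would not require another alternation branch.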

amoeba · 2019-11-26T20:55:16Z

R/editing.R

+  prov_pids <- gsub("https://cn-stage-2.test.dataone.org/cn/v[0-9]/resolve/|https://cn.dataone.org/cn/v[0-9]/resolve/|https://cn-stage.test.dataone.org/cn/v[0-9]/resolve/",
+                    "",
+                    c(statements$subject, statements$object)) %>%
+    gsub("%3A", ":", .)


Since the PID portion of the resolve URI can have more than just URL-encoded ":" characters (like "/"), should this use utils::URLdecode(x) instead?
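A minimal sketch of what full decoding could look like, assuming the identifiers have already had their resolve prefix removed. `decode_pids` is a hypothetical name and the sample identifiers are made up. `utils::URLdecode` takes the URL as its only argument, and the sketch applies it element by element rather than relying on it being vectorized.

```r
# Hypothetical helper: decode every percent-escape in each identifier,
# not just "%3A".
decode_pids <- function(pids) {
  vapply(pids, utils::URLdecode, character(1), USE.NAMES = FALSE)
}

# Made-up identifiers: one with encoded colons, one with an encoded slash.
encoded <- c("urn%3Auuid%3A1234", "doi%3A10.5065%2FD6ABC")
decoded <- decode_pids(encoded)
```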

amoeba · 2019-11-26T20:58:30Z

R/editing.R

+                    "",
+                    c(statements$subject, statements$object)) %>%
+    gsub("%3A", ":", .)
+  prov_pids <- prov_pids[-(grep("^http", prov_pids))] %>% # might need to catch other things besides URLs


Not sure I'm understanding this part. Above, prov_pids should turn into a list of PIDs, possibly with dupes. Why do we need to filter it further before uniquifying?
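If the filter does stay, one thing worth checking is the negative-index form: when `grep()` finds no matches it returns `integer(0)`, and indexing with `-integer(0)` selects nothing rather than everything. The sketch below uses made-up identifiers to show the difference between that form and a logical mask.

```r
# Made-up PIDs with no URLs among them.
pids <- c("urn:uuid:aaa", "urn:uuid:bbb")

# grep() returns integer(0) here, and pids[-integer(0)] is an empty vector.
dropped_all <- pids[-grep("^http", pids)]

# A logical mask keeps every element when nothing matches.
kept <- pids[!grepl("^http", pids)]
```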

amoeba · 2019-11-26T21:04:06Z

R/editing.R

  # Create the replacement resource map
  if (is.null(identifier)) {
    identifier <- paste0("resource_map_", new_uuid())
  }

-  new_rm_path <- generate_resource_map(metadata_pid = metadata_pid,
+  if (keep_prov == FALSE){


I notice that the call to generate_resource_map shows up four times here, in separate conditional branches. While it totally works, a slightly tidier pattern is to use the conditionals to generate the appropriate values for each parameter of generate_resource_map and call it once after the conditional block. The benefits are simplifying the conditional and reducing any pain that may come later if/when generate_resource_map is refactored, which would cause us to have to update four call sites instead of one.

This'd look something like:

if (CONDITION) {
  other_statements = NULL
}
generate_resource_map(...) # All args, even NULL'd-out ones like `other_statements`
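Fleshed out a little, the single-call-site pattern might look like the sketch below. The `generate_resource_map` defined here is only a stand-in so the example runs on its own, with a simplified argument list that is an assumption. `build_resource_map` and its arguments are likewise hypothetical.

```r
# Stand-in for the real generate_resource_map(), only so this sketch runs.
generate_resource_map <- function(metadata_pid, data_pids, other_statements = NULL) {
  list(metadata_pid = metadata_pid,
       data_pids = data_pids,
       other_statements = other_statements)
}

# Hypothetical wrapper: the conditional only decides argument values,
# and generate_resource_map() is called exactly once afterwards.
build_resource_map <- function(metadata_pid, data_pids, old_statements,
                               keep_prov = FALSE) {
  other_statements <- if (isTRUE(keep_prov)) old_statements else NULL

  generate_resource_map(metadata_pid = metadata_pid,
                        data_pids = data_pids,
                        other_statements = other_statements)
}

# Made-up statements data.frame for the usage example.
old <- data.frame(subject = "a", predicate = "p", object = "b")
with_prov <- build_resource_map("meta", c("d1", "d2"), old, keep_prov = TRUE)
without_prov <- build_resource_map("meta", c("d1", "d2"), old)
```

If the signature of generate_resource_map changes later, only the one call inside the wrapper needs updating.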

amoeba · 2019-11-26T21:08:28Z

R/editing.R

+#' Get a data.frame of prov statements from a resource map pid.
+#'
+#' This is a function that is useful if you need to recover lost prov statements. It returns
+#' a data.frame of statements that can be passed to `update_resource_map` in the `other_statements`


Do we wanna linkify these references?

amoeba · 2019-11-26T21:10:09Z

tests/testthat/test_editing.R


-  expect_true(stringr::str_detect(url_new[2], new_data_pid))
-})
+  prov_pids <- gsub("https://cn-stage-2.test.dataone.org/cn/v[0-9]/resolve/|https://cn.dataone.org/cn/v[0-9]/resolve/|https://cn-stage.test.dataone.org/cn/v[0-9]/resolve/", "", c(t$subject, t$object)) %>%


I see this code is duplicated, might be good to make a function or two?

amoeba · 2019-11-26T21:11:45Z

R/editing.R

+            old_prov <- recover_prov(mn, rm_pid)
+            rm_new <- update_resource_map(mn, rm_pid, metadata_pid, data_pids, other_statements = old_prov, keep_prov = T)")
+
+    new_rm_path <- generate_resource_map(metadata_pid = metadata_pid,


Indentation is a tad off.

amoeba · 2019-11-26T21:12:38Z

R/editing.R

+      warning("Old provenance contains data pids not in new resource map. Provenance information will be removed. \n
+            You can get old provenance statements back using:
+            old_prov <- recover_prov(mn, rm_pid)
+            rm_new <- update_resource_map(mn, rm_pid, metadata_pid, data_pids, other_statements = old_prov, keep_prov = T)")


What do you think about changing keep_prov = T to keep_prov = TRUE, to steer people away from the abbreviations, which are pretty dangerous?
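To make the risk concrete: `T` is an ordinary variable that happens to be bound to `TRUE`, so it can be reassigned, whereas `TRUE` is a reserved word. The sketch below demonstrates both behaviors in isolation. The variable names are illustrative only.

```r
# T can be rebound, which silently changes what `T` means in that scope.
abbrev_value <- local({
  T <- FALSE
  isTRUE(T)
})

# TRUE is reserved; attempting to assign to it raises an error,
# which is captured here as a message string.
assign_error <- tryCatch(
  eval(parse(text = "TRUE <- FALSE")),
  error = function(e) conditionMessage(e)
)
```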

Merge remote-tracking branch 'upstream/master' into carry_prov

isteves and others added 30 commits May 14, 2018 09:29

Try uncaching packages to see if devel/new release will work

8346ada

https://stat.ethz.ch/pipermail/r-sig-debian/2018-April/002829.html r-spatial/sf#729 Rdatatable/data.table#2801

Try update: true

96ae41f

Added `cache: packages` back in

Merge branch 'publish_update2' of https://github.com/isteves/arcticda…

5201e95

…tautils into publish_update2

Merge pull request NCEAS#85 from isteves/publish_update2

ad7c63a

Add EML object as an alternative option for metadata_path

Update maintenance.md with more details on releases

b1a7fc2

created pid_to_eml_entity function

3555412

cleaned up whitespace

879edbc

updated to reflect newest version: v0.6.4

75dddb3

updated example

d9b00b2

Added checks within function so it does not create dummy objects on a…

9623fb8

… production node

Merge pull request NCEAS#91 from maier-m/master

d14e6e8

create pid_to_eml_entity function deprecate old functions

Merge pull request NCEAS#94 from sharisochs/issue-93

a80ce97

Added checks within function so it does not create dummy objects on a…

initial eml_otherEnt_to_dt commit

a70a23b

eml_otherEntity_to_dataTable function

06fd03b

added unit tests

34199e2

added examples

b482759

updated documentation

332d1a7

travis fixes

0ca6da5

more travis fixes

f8e18ec

added mitchell's function

b522831

syntax updates

0012d28

Merge pull request NCEAS#97 from dmullen17/master

4e3473f

eml_otherEntity_to_dataTable function

Update eml.R

946bb03

Merge pull request NCEAS#98 from dmullen17/master

6539ed5

Update eml.R

added checks for current versions

30e572e

switched checks to getSystemMetadata calls

bff972d

added unit tests, moved new checks position in function

f5079e4

travis fixes

e868981

revisions from bryce's comments

686e7bb

Merge pull request NCEAS#100 from dmullen17/publish_update_checks

ed26220

Put the metadata check in the `check_first` block and left the resource map check as non-optional.

jeanetteclark added 18 commits November 14, 2019 14:49

update warning message with correct function name

11aa4c3

start of the NSF function

fd927f2

still some failure cases that aren't working

update to now cover more failure cases

3788836

add tests for nsf to project function

0693996

make namespacing more explicit to make the R checks happier

8a01bd6

update DESCRIPTION, ns, and Rd files

b46cf06

prepend "NSF" to award numbers for funding section

d8bed9c

make the warning output nicer and start clearing a path for EML2.2 su…

d8cc7ed

…pport

add support for EML 2.2

c2f3117

update description and Rd

7090a12

add an eml_version argument check and make the spliting of first and …

adc5659

…last names less likely to discard information

remove a dependency we don't need anymore

5e4ecd5

update namespace file removing rlang dep

a32850f

incorporate some of Dom's suggestions

65ec9b5

Merge pull request NCEAS#149 from NCEAS/nsf_award_helper

a7b4ef4

add NSF award helper

fix bug if co-pis is empty

4568c52

Merge pull request NCEAS#150 from NCEAS/nsf_award_helper

b5f6714

fix bug if co-pis is empty in new helper

Merge branch 'master' of https://github.com/nceas/arcticdatautils int…

448e0a8

…o carry_prov

jeanetteclark requested a review from amoeba November 26, 2019 17:39

documenting new keep_prov argument

4cec070

amoeba suggested changes Nov 26, 2019

dmullen17 and others added 7 commits December 11, 2019 14:20

return eml_validate check in eml_otherEntity_to_dataTable

d9a8613

updated eml_otherEntity_to_dataTable to handle otherEnt lists of leng…

deb06b9

…th 1

Update R/eml.R

c045895

Co-Authored-By: Jeanette Clark <[email protected]>

Merge pull request NCEAS#154 from dmullen17/master

2c371ce

return eml_validate check in eml_otherEntity_to_dataTable

remove duplicate personnel

3282e33

often the same personnel are listed on multiple grants, this prevents them from being listed multiple times in the EML project section

Merge pull request NCEAS#155 from NCEAS/nsf_award_helper

f29d733

remove duplicate personnel from NSF helper

laijasmine force-pushed the master branch from a47bd41 to 6193636 Compare October 2, 2020 18:12

Base automatically changed from master to main February 23, 2021 18:21
