Reviewers using anonymous private URL might learn dataset author's identity from information about the Dataverse installation or collection #8184

jggautier · 2021-10-25T17:31:05Z

Information about the installation and about the Dataverse collection that the dataset is in could help reveal the dataset author's identity to the dataset reviewer.

Information about the repository housing the dataset:

The anonymous private URL page shows the name of the Dataverse repository/installation that the dataset is in, and the reviewer can navigate around the repository website to find more information about the repository. This could be an issue for Dataverse repositories with a narrower, more focused audience, like the repositories that only allow researchers affiliated with a certain institution to deposit datasets.

Information about the Dataverse collection housing the dataset:

The anonymous private URL page shows the name of the Dataverse collection that the dataset is in, even if the Dataverse collection is unpublished. This feature was meant mostly for "Journal Dataverse collections," (#1724) so we should expect that the reviewer would already know, before ever visiting the anonymous private URL page, that the dataset is associated with a particular journal.

But the depositor's dataset could be in a Dataverse collection whose information (such as collection name or description) could be used to identify the author. This point was also brought up in two comments (1, 2) in the original GitHub issue. For example, many collections include the researcher's name because when people create Dataverse collections, the Dataverse software prefills the "Dataverse Name" field with the name of the Dataverse repository account that created the Dataverse collection. This is often the author's name, and the reviewer can see that Dataverse collection name in the breadcrumbs on the anonymous private URL page.
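To make the collection-name risk concrete, here is a minimal sketch of a check a depositor or curator could run before sharing an anonymous URL. It is not part of the Dataverse software: the helper names `_normalize` and `collection_name_reveals_author` are hypothetical, and it assumes author names are written either "Family, Given" or "Given Family".

```python
import re
import unicodedata


def _normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def collection_name_reveals_author(collection_name: str, author_names: list[str]) -> bool:
    """Return True if any author's family name appears in the collection name.

    Hypothetical helper: assumes "Family, Given" or "Given Family" name order.
    """
    words = set(_normalize(collection_name).split())
    for name in author_names:
        if "," in name:
            family = name.split(",", 1)[0]
        else:
            parts = name.split()
            family = parts[-1] if parts else ""
        family_words = _normalize(family).split()
        # Every word of the family name must occur in the collection name.
        if family_words and all(word in words for word in family_words):
            return True
    return False
```

For example, a collection prefilled as "Jane Doe Dataverse" would be flagged for the author "Doe, Jane", while a neutrally named review collection would not. A check like this only catches the collection name; it says nothing about the description or the repository name.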

djbrooke · 2021-10-25T17:49:51Z

Thanks @jggautier!

I mentioned in Slack that I think it would be challenging to implement a programmatic fix for this, as you'd need to obscure the collection name and also potentially obscure names of other datasets, subcollections, parent collections etc. We could also revisit the functionality generally in order to not allow navigation off the dataset, but this would be a big change as well - right now the application just creates a temporary user that allows the access. Food for thought if we're able to prioritize this at some point in the future.

jggautier · 2021-10-25T18:18:15Z

Thanks. Do you think users might share anonymous URLs before they realize that reviewers might see information about the repository or the dataset's Dataverse collection that could give away who the author is?

A careful depositor might check the URL before sharing it and realize this, but I think there are things we could do to increase the chances that most users will realize this, like adding this info in the User Guides or in the popup.

djbrooke · 2021-10-25T18:23:52Z

@jggautier Oh yeah, option three - better explanatory text. :) All for it!

jggautier · 2021-11-05T20:38:48Z

Hi everyone. Changes in the UI that @TaniaSchlatter and I are proposing are in the PDF, Proposed changes to Anonymous Private URL.pdf. The PDF has two boxes, the first describing how the feature works as designed now (v5.7) and the second describing changes based on reviews by the curation team at Harvard Dataverse Repository and @kmika11's review with some researchers who've needed to share their datasets anonymously.

Changes to the User Guides section about this feature are in the Google Doc at https://docs.google.com/document/d/1bn4fIPr_yhOj_DYDldzdKEZjmETV-WLYc98sWTgcg58.

The changes are meant to address the issue described in this GitHub issue as well as address confusion about the differences between the two types of URLs (#8185). (@jeisner brought up other points in an older GitHub issue, particularly about being able to anonymously share a dataset that's already been published, that this feature doesn't address.)

The next steps are:

- Gathering feedback...
  - From the community, particularly people who've worked on or expressed interest in this feature (including @jeisner, @Duhem123, @adam3smith, @philippconzett, @qqmyers, @meghangoodchild and Jonathan Bohan at CISER). Please feel free to download and review the changes described in the PDF, visit the Google Doc with changes to the User Guide section, and leave comments and questions in this GitHub issue, issue #8185, and the Google Doc with revisions to the User Guide.
  - By reaching out to and testing the changes with more depositors who've already created and/or published datasets in the Harvard Dataverse Repository for double-blind review
- Iterating on the proposed changes based on the feedback

philippconzett · 2021-11-06T08:02:43Z

Thanks for sharing the progress on this feature! The proposed changes look all good to me. The term "Prepublish URL" is clearer than "Private URL", and the descriptions in the pop-up windows and in the user guide are all very clear. I think the anonymized version of the Prepublish URL is mainly useful in cases where a dataset is part of a double-blind peer-review process. I have added a note on this in the Google doc.

As mentioned earlier, in DataverseNO, we use a special, unpublished collection for datasets that are part of a double-blind peer review process. Page 12 in this presentation summarizes how DataverseNO currently supports double-blind peer review. See also this fake example of an anonymized dataset in our double-blind peer review collection.

Maybe an easy(?) way to enhance the Prepublish URL feature even more, could work like this:
When the depositor or curator (depending on the access rights) clicks the Prepublish URL button and selects Anonymous Review, a copy of the dataset will be pushed into an anonymized collection like the double-blind peer review collection at DataverseNO, and the copy will be anonymized following the current Anonymous Review feature.

That way, the name of the repository would still be revealed, but the collection would be anonymized.

jggautier · 2021-12-16T14:40:57Z

Thanks @philippconzett :) I think for now we've decided to change the layout and the text on the popup to help the depositor understand the limitations of the feature, like how the collection name can help reveal the authors' identities.

I'm all for opening another issue specifically for discussing ways to remove that limitation. @djbrooke and @TaniaSchlatter, what do you think?

To get more feedback about the redesigned popup, we reviewed it with 6 people - 5 people who I found used workarounds to deposit datasets in Harvard Dataverse Repository for anonymous review and 1 person who manages a journal's Dataverse Collection in the repository and has been interested in support for anonymous review. The redesign seemed to work well and I made small text-based adjustments based on the feedback:

- One depositor who saw the button for disabling the Anonymous Review URL said he worried that if he disabled the link, the people who he gave the link to would then be able to see the metadata that would identify him as an author. (During the review the depositor didn't click the button, which would show a confirmation popup that says that others "will no longer be able to use it to access your unpublished dataset".)
- Another person suggested including that the "General Review" page will display metadata that could identify the author.

These screenshots of the popup show the text changes:

I also split the last block of text in two to improve readability, clarified that the files will be "accessible" if they're not restricted, and changed "data files" to "dataset's files". We heard during the review of the metadata tooltips that "data files" could be interpreted to exclude other types of files like "documentation files" and "code files", so I think it's better to use broader language here.

We learned more about the feature in general, including how discoverable it is (or isn't), and we heard things about the journal review process that I think we need to learn more about, so I'm working on summarizing that feedback and recommending next steps.

TaniaSchlatter · 2022-01-03T16:15:54Z

The wording and layout changes outlined above should help from the UI perspective; however, moving them forward is not a complete programmatic fix.

meghangoodchild · 2022-01-05T19:04:57Z

Thanks for the opportunity to provide feedback. The anonymous review feature is certainly a desired feature.

Based on some discussions with members from our community, we learned about several experiences where researchers have used the private URL in their article's data availability statement (instead of the DOI). We would like to stress the importance of using terminology that emphasizes the temporary nature of the URL, such as "temporary prepublish URL" or "prepublish preview URL".

philippconzett · 2022-01-09T15:06:50Z

We have had the same experience as @meghangoodchild describes - although we have emphasized for the depositors that they must replace the private URL with the DOI before the article is published. Right now, this is the case in a Nature article that was published several months ago and so far, we have not been able to make Nature replace the private URL with the DOI. As a result, we cannot publish the dataset, because that would cause the private URL to be deactivated and the dataset URL in the reference list of the article would thus no longer resolve.

Maybe you could include some explicit wording in the private URL feature that makes depositors aware that the dataset reference in the final article must contain the dataset DOI, not the private URL.

jggautier · 2022-01-12T15:36:11Z

Thanks @meghangoodchild and @philippconzett. We've heard the same thing, and I saw that figshare mentions in their guides that their "private link" shouldn't be used to cite the data in publications. I'm proposing adjusting the name of the feature and adding a line in the popup (and in the User Guides) about how the dataset's PID should be used to cite the data in publications:

Because the name of the feature is in the URL, too, if the name makes the temporary nature of the link more obvious, hopefully it'll be more obvious to researchers and journal editors just by looking at the URL, e.g. https://demo.dataverse.org/previewurl.xhtml?token=0f04F8c2-bcer-4adf-816d-3b950c73ddce
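As a rough illustration of how a recognizable page name in the URL could help, the sketch below flags token-based review links so that they can be replaced with the PID before publication. It is a hypothetical helper, not an existing Dataverse or publisher tool: `is_temporary_review_url` and `TEMPORARY_PAGES` are made-up names, `previewurl.xhtml` is the page name proposed above, and `privateurl.xhtml` is assumed to be the earlier page name.

```python
from urllib.parse import parse_qs, urlparse

# Page names assumed for illustration: "previewurl.xhtml" is the name proposed
# in this issue; "privateurl.xhtml" is assumed to be the earlier name.
TEMPORARY_PAGES = {"previewurl.xhtml", "privateurl.xhtml"}


def is_temporary_review_url(url: str) -> bool:
    """Return True if the URL looks like a token-based preview/private URL."""
    parsed = urlparse(url)
    # The last path segment is the page name, e.g. "previewurl.xhtml".
    page = parsed.path.rsplit("/", 1)[-1].lower()
    # A review link carries a non-empty "token" query parameter.
    has_token = bool(parse_qs(parsed.query).get("token"))
    return page in TEMPORARY_PAGES and has_token
```

A journal's production checklist could run something like this over the URLs in a data availability statement; a DOI link such as the placeholder `https://doi.org/10.5072/FK2/EXAMPLE` would pass, while the demo URL above would be flagged.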

But like I mentioned in emails, we'll be trying to contact journals and publishers to learn more about why authors have been adding this temporary URL to their articles in the first place and why there's friction when that URL needs to be replaced by the PID before the article is published. We've seen journal and publisher policies, like Springer Nature's policies, that I'd think are pretty explicit about using persistent IDs in articles to cite data. Is there anything about a publisher's or journal's processes that contribute to this friction?

We've also seen that sometimes researchers don't realize that the PIDs of unpublished datasets will "work" (lead to the datasets) once the datasets are published. Would making this fact more obvious encourage researchers to cite datasets with PIDs instead of private URLs?

jggautier · 2022-01-25T17:21:13Z

@TaniaSchlatter agrees that the redesign of the feature name, the popup, banner messages, and relevant guide pages is done and can be moved to development when possible.

The changes are illustrated in mockups in an image and in a section of a virtual whiteboard. They include changes to:

- The name of the feature in the Edit Dataset dropdown on the dataset page
- The text, layout, and interaction of the feature's popup
- The text of the "Disable URL" confirmation popup
- The name of the feature in the URL (e.g. previewurl in https://demo.dataverse.org/previewurl.xhtml?token=39b07d51-e0aa-4a89-a179-cacd63c94d72)
- Banner messages shown when using the feature

The changes to the guide pages - pages in the User, API, Installation, Developer and Style guides - are in the Google Doc at https://docs.google.com/document/d/1bn4fIPr_yhOj_DYDldzdKEZjmETV-WLYc98sWTgcg58

The change to the name of the feature will require changes to the names of associated code files, e.g. PrivateUrlUtil.java

mreekie · 2022-09-26T18:18:02Z

Worked on by

Tania
Julian

scolapasta · 2024-02-07T20:35:24Z

Removing information about the Dataverse collection should be relatively straightforward: we would simply not render it (would have to see how it looks), i.e. don't show the Dataverse collection header, don't show breadcrumbs.

Repository name can't be hidden as it's part of the URL.
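A minimal sketch of that rendering rule, for illustration only: the section names and the `sections_to_render` function are hypothetical labels, not identifiers from the Dataverse codebase, and the real page is not written in Python.

```python
# Section names are hypothetical labels for parts of the dataset page.
ALL_SECTIONS = (
    "collection_header",
    "breadcrumbs",
    "dataset_title",
    "metadata",
    "files",
)

# Sections that name the Dataverse collection and are skipped for reviewers.
HIDDEN_FOR_ANONYMOUS = frozenset({"collection_header", "breadcrumbs"})


def sections_to_render(anonymized_access: bool) -> list[str]:
    """Return the page sections to show, in display order."""
    if not anonymized_access:
        return list(ALL_SECTIONS)
    return [s for s in ALL_SECTIONS if s not in HIDDEN_FOR_ANONYMOUS]
```

The repository name is deliberately not modeled here, since it stays visible in the URL whatever the page renders.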

Currently not sure about the past versions - since I'm not sure if the persistent ID is exposed in this; if it is, then an end user could use that to find the dataset. If it's not, then we should be able to not render the version tabs.

Still need to determine how to deal with published versions - would we have published versions of a dataset needing anonymous review? @jggautier If this is the case, then we could code it to not allow the creation of an anonymous link once a dataset has one published version.

Other considerations:
- We still have the lingering issue of small repositories being at risk of identifying information exposure.
  - Julian had suggested not providing a URL (which contains the repository name) but instead providing a PDF of the data, to avoid interacting with the repository's identifying information.
  - What alternatives can we consider?

sbarbosadataverse · 2024-02-07T20:43:54Z

From the comments on Jan 25, 2022: This task needs review (@scolapasta): "Changes to the text, layout, and interaction of the feature's popup"

@jggautier can you explain further what this particular change would fix?

scolapasta · 2024-02-07T21:04:03Z

So reviewing with @qqmyers it does seem that anonymous peer review can only happen for initial drafts - which means there are no previously published versions. That means there's no code/logic to worry about there, but also that the suggested popups above don't even need to mention previous versions in that case.

jggautier · 2024-02-08T14:48:02Z

The changes are meant to:

- Help depositors be aware that reviewers might figure out who they are based on the name of the repository and the name of the Dataverse collection that the dataset is deposited in
- Help depositors be sure about the differences between the "Private URL" and "Anonymous Private URL". We wrote more about this in Some researchers unsure of difference between "Private URL" and "Anonymous Private URL" #8185
- Encourage depositors to use the PID when citing the dataset, instead of using the private URL
- Make depositors aware that their files are not being anonymized or changed in any way

I didn't know that anonymous peer review is available only for initial drafts of the dataset, but that's great!

adam3smith · 2024-02-08T15:54:43Z

So this only changes the guidance, correct? I think changes look good and worthwhile, though ime you shouldn't expect big effects on user behavior from any written text.

jggautier · 2024-02-08T15:59:38Z

Yeah changes to text, and also the popup's layout and interaction

sbarbosadataverse · 2024-02-08T16:55:52Z

My next question is can we get this done and on the list of prioritization @scolapasta @jggautier
Any blockers to the changes we need to make to have this work on HDV?

qqmyers · 2024-03-25T19:41:13Z

FWIW: The text proposed, at least as of #8184 (comment), indicated that restricted files are not available with the anonymized preview URL - that is not currently the case. If this is desired, I suspect it may need to be an option (presumably some review requires looking at the restricted files?). #10403 is proposing to allow this in general (allowing users to be given the ability to view unpublished datasets but not restricted files), but that doesn't necessarily change how anonymized preview URLs work, so that would have to be handled somewhere. In any case, the text change here shouldn't include that unless/until the functionality is changed.

djbrooke added this to Priority Issues Oct 26, 2021

djbrooke self-assigned this Oct 26, 2021

djbrooke moved this to Todo in Priority Issues Oct 26, 2021

djbrooke removed their assignment Oct 26, 2021

djbrooke removed this from Priority Issues Oct 26, 2021

djbrooke mentioned this issue Oct 27, 2021

Some researchers unsure of difference between "Private URL" and "Anonymous Private URL" #8185

Open

jggautier mentioned this issue Oct 31, 2021

Investigate Anonymous Review feature for use on Harvard Dataverse Repository IQSS/dataverse.harvard.edu#119

Closed

TaniaSchlatter added the UX & UI: Design This issue needs input on the design of the UI and from the product owner label Jan 3, 2022

cmbz added this to IQSS Dataverse Project Mar 18, 2024

cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Mar 18, 2024

cmbz added the Size: 3 A percentage of a sprint. 2.1 hours. label Mar 26, 2024

cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Mar 26, 2024

DS-INRAE added this to Recherche Data Gouv Jul 10, 2024

DS-INRAE moved this to ⚠️ Needed/Important in Recherche Data Gouv Jul 10, 2024

sekmiller added the FY25 Sprint 7 FY25 Sprint 7 (2024-09-25 - 2024-10-09) label Oct 3, 2024

sekmiller moved this from SPRINT READY to In Progress 💻 in IQSS Dataverse Project Oct 3, 2024

sekmiller self-assigned this Oct 3, 2024

sekmiller added a commit that referenced this issue Oct 8, 2024

#8184 ui-bundle changes

cd2fa36

sekmiller added a commit that referenced this issue Oct 8, 2024

#8184 fix integration test text

06d4fa5

cmbz added Size: 10 A percentage of a sprint. 7 hours. FY25 Sprint 8 FY25 Sprint 8 (2024-10-09 - 2024-10-23) and removed Size: 3 A percentage of a sprint. 2.1 hours. labels Oct 9, 2024

sekmiller added a commit that referenced this issue Oct 16, 2024

#8184 update popup and Bundle

e85c422

sekmiller added a commit that referenced this issue Oct 16, 2024

#8184 change nominal url add redirect

8084bc6

sekmiller added a commit that referenced this issue Oct 16, 2024

#8184 revert redirect url

bc79834

sekmiller added a commit that referenced this issue Oct 16, 2024

#8184 fix unit tests

53e2f0e

sekmiller added a commit that referenced this issue Oct 16, 2024

#8184 fix existing test

57960c6

sekmiller added a commit that referenced this issue Oct 16, 2024

#8184 update constructor/test urls

1b8a257

sekmiller added a commit that referenced this issue Oct 17, 2024

#8184 add updated api endpoints - deprecate private url

8af5b1c

sekmiller added a commit that referenced this issue Oct 22, 2024

#8184 display url in popup upon creation

c8b9b38

sekmiller added a commit that referenced this issue Oct 22, 2024

#8184 add release notes

5ae208d

sekmiller added a commit that referenced this issue Oct 22, 2024

#8184 add endpoints for get by token

c1acf33

sekmiller linked a pull request Oct 23, 2024 that will close this issue

rename Private URL to Preview URL and other changes #10961

Open

sekmiller removed this from IQSS Dataverse Project Oct 23, 2024

sekmiller added a commit that referenced this issue Oct 29, 2024

#8184 hide breadcrumbs/header for anon access

016fc9a

sekmiller added a commit that referenced this issue Oct 30, 2024

#8184 acceptance testing

fa904b6

sekmiller added a commit that referenced this issue Oct 31, 2024

#8184 update disable button labels

a8dbae4

sekmiller added a commit that referenced this issue Oct 31, 2024

#8184 update popup message per Julian

42ac8c0

sekmiller added a commit that referenced this issue Nov 7, 2024

#8184 remove deprecated code redundancy

d6001c9
