how to push sensitive information to a confidential section #261

alecristia · 2021-07-26T11:53:15Z

alecristia
Jul 26, 2021
Maintainer

It's a good idea to keep a backup that is well organized and complete, including perhaps some sensitive information. To this end, you can have a datalad dataset set up with a structure whereby sensitive information goes into a "confidential" section. We are not going to explain how to set this up here, but instead assume this has already been done, and what you want to know is how to contribute to such a repository.

To make this more concrete, we will use the Soderstrom corpus (the winnipeg repository), which is part of the EL1000 superdataset. Soderstrom is private, and only admins have access to the confidential portion, so you will not be able to reproduce these steps unless you are one of the admins.

1. Make sure you have access to the confidential portion

For the Soderstrom dataset, this means following the EL1000's instructions for "gaining access to the data". Once we have given you the go-ahead, check that you can visit this site while you are logged in. If you can, you have access rights. (If you get a 404 error, you are either not logged in or you don't have access rights.)

2. Make a local version of the dataset (not the confidential part, but the normal part)

For the Soderstrom dataset, this means navigating to where you want to make a local copy of the data, and running:

datalad install git@gin.g-node.org:/EL1000/winnipeg.git
cd winnipeg
datalad run-procedure setup --confidential

3. Add your confidential files in a subfolder that contains "confidential" in its path -- provided the repo has been correctly set up

How these paths work (i.e. whether the contents end up only in the confidential section or in the main repository as well) depends on how your datalad dataset was set up. If you are not sure, do a trial run first without any sensitive information.

If you are adding .its files, then put your confidential .its files inside annotations/its/confidential/raw/.

If you are adding unvetted .eaf files that contain sensitive info, then put your confidential .eaf files inside annotations/eaf/confidential/.

If you are adding metadata, then put it inside metadata/confidential/.

If you are adding unvetted recordings, then put them inside recordings/raw/confidential/. NOTE: This one is not currently set up for the Soderstrom corpus!! So please don't share unvetted recordings just yet. If you are hoping to, let us know!

And for anything else, consider putting them inside extra/ but always using confidential/ in the path. For instance, perhaps you want to put transcripts of a discussion with the families. Then you can name the folder extra/interviews/confidential/, so that one day you can put vetted transcripts in extra/interviews/.
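Before saving, it can help to double-check that every new file really sits under a confidential/ folder. Here is a minimal sketch in plain shell. The function names are made up for illustration, and it assumes that "confidential" means a directory component named exactly confidential. The actual routing is decided by the dataset's setup procedure, so this does not replace a trial run.

```shell
# Illustrative sanity check only; the real rules live in the dataset's
# setup procedure and may differ from this assumption.
# Succeeds if the path has a directory component named exactly "confidential".
is_confidential_path() {
    case "/$1/" in
        */confidential/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print one line per path given as argument and return non-zero
# if at least one of them lacks a "confidential" component.
check_paths() {
    local p status=0
    for p in "$@"; do
        if is_confidential_path "$p"; then
            echo "ok: $p"
        else
            echo "NOT confidential: $p"
            status=1
        fi
    done
    return "$status"
}
```

For instance, you could run check_paths annotations/its/confidential/raw/*.its from the root of the dataset before datalad save.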

4. Submit your changes

Share your changes by:

datalad save . -m "added files"
datalad push --to gin

lucasgautheron · 2021-07-26T12:11:43Z

lucasgautheron
Jul 26, 2021
Maintainer

2. Make a local version of the dataset (not the confidential part, but the normal part)

For the Soderstrom dataset, this means that you navigate where you want to make a local copy of the data, and do:
datalad install git@gin.g-node.org:/EL1000/winnipeg.git
cd winnipeg

There is one crucial step here which is missing:

datalad run-procedure setup --confidential

This will configure the confidential sibling (pointing to winnipeg-confidential) and the rules that decide which content goes where.
Without this step, everything will be pushed to the main repository.

If you are adding unvetted recordings, then put them inside recordings/confidential/raw/.

This would be ignored by ChildProject, which looks for recordings inside of recordings/raw or recordings/converted/something, but not in recordings/whatever.

If you think it is necessary, I'll add more rules to the EL1000 datasets' setup procedure to allow for recordings (as was done for png2019 and tsimane).

1 reply

alecristia Jul 26, 2021
Maintainer Author

oops - I'll fix the missing step!

About the rules, let me fix the wording to say that this needs to be set up properly.

alecristia · 2021-10-11T12:57:47Z

alecristia
Oct 11, 2021
Maintainer Author

update: Trev uploaded the confidential .its and I saw them online, so I know the upload was successful. However, now I'm trying to anonymize (then re-import) the .its, and they don't seem to be there: https://gin.g-node.org/EL1000/winnipeg/src/main/annotations/its/confidential/raw/C004_20090801.its

when I look at the history, I only see his commit:
https://gin.g-node.org/EL1000/winnipeg/commit/d953ab0c170359fd6ef93660fe8b6f4ab3cfc5e8

not sure what we did wrong - help welcome!

1 reply

lucasgautheron Oct 11, 2021
Maintainer

Nothing! Confidential files are stored in winnipeg-confidential. You can see them there: https://gin.g-node.org/EL1000/winnipeg-confidential/src/main/annotations/its/confidential/raw/C004_20090801.its

To access them from your clone of winnipeg, you need to run datalad run-procedure setup --confidential at least once; then you can datalad get them.
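For reference, the two steps above could be wrapped as follows. This is only a sketch: fetch_confidential and the DATALAD variable are hypothetical names, not part of datalad or of the EL1000 setup procedure. The two datalad commands themselves are the ones mentioned above.

```shell
# Hypothetical helper; run it from the root of your winnipeg clone.
# DATALAD is an assumed override hook: set DATALAD=echo to print the
# commands instead of running them.
fetch_confidential() {
    # Configure the confidential sibling (needed at least once per clone).
    "${DATALAD:-datalad}" run-procedure setup --confidential || return 1
    # Retrieve the content of the requested paths.
    "${DATALAD:-datalad}" get "$@"
}
```

For the file discussed here, that would be fetch_confidential annotations/its/confidential/raw/C004_20090801.its, which only works if you have access to the confidential repository.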
