Create a QC namespace #233
Comments
Sometimes iterative and early QC requires the full dataset. Some examples:
Would also welcome thoughts on whether storage of data in the
If we reinforce the policy that all 'final' analysis should still be done in
Access to the
User requesting access to the |
Given the increased security risk, we could also have a mandatory induction/tutorial for accessing this
But the 2nd and 3rd points can be screened in my previous post above - if we require users to specify their use case before granting them access. |
It sounds logistically like a lot more work to implement, but I think it's worth it. There are always going to be interesting outliers or unexpected subsets of the data in any cohort we ingest. The current set-up creates a lot of friction when investigating these types of results. In practical terms we still could investigate them, but the PR system can disincentivize analysts from exploring subsets of the data: there is the time lag between submitting a PR and getting it approved, not to mention the cost of context switching when one briefly has to work on a different analysis while waiting and then switch back once the PR is approved. But this also needs to be balanced with data security and efficient cloud storage use, and aligned with existing CPG principles for |
A few thoughts/questions that might be useful to consider when designing a solution. Some of these may not be "in scope" for this specific user story.
Some context: https://docs.google.com/document/d/1hO4-VAKjul25_lfYrvELocgfag3TIwBzICdHaHVnATk/edit#heading=h.1atj7ihnv188
Relevant user stories:
Its usage with metamist is UNDEFINED in this, and will be more properly resolved later.
This will involve creating:
hail, dataproc, cromwell
users.yaml
qc, qc-analysis, qc-tmp (no web bucket)
$dataset-qc group (of persons) that has read access to the qc-analysis, and list access to qc / qc-tmp
(Not needed if QC can read from main) main-full can APPEND data, but NOT read (to discourage copying results back).
QC service accounts cannot access main level data (based on Hope's feedback below)
depends_on flag here, so QC groups should NOT allow access to transitive datasets.
qc service accounts should be able to access the common-main bucket (for reference data)
Random notes:
@violetbrina, it's worth thinking about what other implications creating a new namespace has. analysis-runner, billing, etc.
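For discussion, the clearer access rules from the issue body could be sketched as a small default-deny lookup table. This is only an illustration, not how the infrastructure is actually configured: the principal labels (`dataset-qc-group`, `qc-service-account`), the `Access` enum, and the `allowed` helper are all hypothetical. It also assumes that read access implies list access and that "access" to common-main means read. The `main-full` append rule is left out because the issue marks it as possibly not needed.

```python
from enum import Enum


class Access(Enum):
    """Ordered access levels; assumption: READ implies LIST."""
    NONE = 0
    LIST = 1
    READ = 2


# Hypothetical principal and bucket labels, loosely based on the issue text.
# Anything not listed here is denied by default.
RULES = {
    ("dataset-qc-group", "qc-analysis"): Access.READ,
    ("dataset-qc-group", "qc"): Access.LIST,
    ("dataset-qc-group", "qc-tmp"): Access.LIST,
    # Assumption: "access" to common-main (reference data) means read.
    ("qc-service-account", "common-main"): Access.READ,
    # QC service accounts cannot access main level data.
    ("qc-service-account", "main"): Access.NONE,
}


def allowed(principal: str, bucket: str, wanted: Access) -> bool:
    """Return True if the principal holds at least the wanted access level."""
    granted = RULES.get((principal, bucket), Access.NONE)
    return wanted is not Access.NONE and granted.value >= wanted.value
```

Writing the rules out this way makes it easier to spot combinations the issue does not cover yet, such as what the analysis-runner or billing accounts should be allowed to do in the new namespace.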