The user.yaml file is one way to get authorization information into Gen3. It is ingested via Fence's usersync script. The format of this file is tightly coupled with the notions of resource, role and policy as defined by Gen3's policy engine, Arborist.
For Gen3 Data Commons that do not use Arborist or that use the Google Data Access method of Google Service Account Registration, refer to the Deprecated format section.
In a fully deployed Gen3 Commons using Cloud Automation, the user.yaml file is usually hosted in S3 and configured via the global.useryaml_s3path setting of the Gen3 Data Commons manifest:
{
  "global": {
    "useryaml_s3path": "s3://bucket-name/path/to/user.yaml",
    ...
  },
  ...
}
A template, ready-to-use user.yaml file can be found here.
When updating your user.yaml file, you should use the gen3users CLI to validate it before use.
Note that the user.yaml example below is minimal, as the goal is only to describe its structure. For a working user.yaml file that contains everything needed to get started, refer to the base user.yaml instead.
authz:
  # policies automatically given to anyone, even if they are not authenticated
  anonymous_policies:
  - open_data_reader
  # policies automatically given to authenticated users (in addition to their other policies)
  all_users_policies: []
  # each group can contain multiple policies and multiple users
  groups:
  - name: program1_readers
    policies:
    - program1_reader
    users:
    - [email protected]
  # resource tree
  resources:
  - name: open
  - name: programs
    subresources:
    - name: program1
  # each policy can contain multiple roles and multiple resources
  policies:
  - id: open_data_reader
    role_ids:
    - reader
    - storage_reader
    resource_paths:
    - /open
  - id: program1_reader
    description: Read access to program1
    role_ids:
    - reader
    - storage_reader
    resource_paths:
    - /programs/program1
  - id: program1_indexd_admin
    description: Admin access to program1
    role_ids:
    - indexd_admin
    resource_paths:
    - /programs/program1
  # currently existing methods are `read`, `create`, `update`,
  # `delete`, `read-storage` and `write-storage`
  roles:
  - id: reader
    permissions:
    - id: reader
      action:
        method: read
        service: '*'
  - id: storage_reader
    permissions:
    - id: storage_reader
      action:
        method: read-storage
        service: '*'
  - id: creator
    permissions:
    - id: creator
      action:
        method: create
        service: '*'
  - id: indexd_admin
    permissions:
    - id: indexd_admin
      action:
        method: '*'
        service: indexd
# OIDC clients
clients:
  client1:
    policies:
    - open_data_reader
# all users must be defined here, even if they are not granted
# any individual permissions outside of the groups they are in.
# additional arbitrary information can be added in `tags`.
users:
  [email protected]: {}
  username2:
    tags:
      name: John Doe
      email: [email protected]
    policies:
    - program1_reader
The resource tree contains, among other resources, the programs and projects created via Sheepdog. If you created a program { "name": "program1" } and a project { "name": "project1", "dbgap_accession_number": "phs1", "code": "P1" }, your resource tree should contain the following:
resources:
- name: programs
  subresources:
  - name: program1
    subresources:
    - name: projects
      subresources:
      - name: P1
Policies would refer to this resource as /programs/program1/projects/P1.
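As an illustration of how a resource tree maps to the paths used in policies, the sketch below walks a tree that has already been parsed into Python lists and dictionaries and yields one path per node. The function name resource_paths is hypothetical and is not part of Fence or Arborist; it only mirrors the naming rule described above.

```python
def resource_paths(resources, prefix=""):
    """Yield the path of every node in a parsed `resources` tree.

    Hypothetical helper for illustration only; not a Fence or Arborist API.
    """
    for node in resources:
        path = f"{prefix}/{node['name']}"
        yield path
        # Recurse into children, if any, using the current path as prefix.
        yield from resource_paths(node.get("subresources", []), path)


# The resource tree from the example above, as parsed YAML would look.
tree = [
    {"name": "programs", "subresources": [
        {"name": "program1", "subresources": [
            {"name": "projects", "subresources": [{"name": "P1"}]},
        ]},
    ]},
]

paths = list(resource_paths(tree))
for p in paths:
    print(p)
```

The last path produced is the one a policy would list under resource_paths to target project P1.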
There are several ways to attach a policy to a user:
- In the users section, under the appropriate username, in the list of policies;
- In the groups section, add the username to the group's users and the policy to the group's policies;
- In the anonymous_policies group, add policies that anyone should have (there is no need to set specific usernames in this case);
- In the all_users_policies group, add policies that all logged in users should have (there is no need to set specific usernames in this case).
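To make the combination of these four sources concrete, here is a minimal sketch that computes the set of policy names a user would end up with, given a user.yaml already parsed into a Python dictionary. It is a simplified model of the rules listed above, not the logic Fence or Arborist actually runs; the function effective_policies and the usernames alice and bob are hypothetical.

```python
def effective_policies(user_yaml, username=None):
    """Return the policy names that apply to `username`.

    `username=None` models an anonymous (not logged in) visitor.
    Simplified illustration only; not the Fence/Arborist implementation.
    """
    authz = user_yaml.get("authz", {})
    # Anonymous policies apply to everyone.
    policies = set(authz.get("anonymous_policies", []))
    if username is None:
        return policies
    # Policies for every authenticated user.
    policies.update(authz.get("all_users_policies", []))
    # Policies listed directly under the user.
    user = user_yaml.get("users", {}).get(username) or {}
    policies.update(user.get("policies", []))
    # Policies inherited from each group the user belongs to.
    for group in authz.get("groups", []):
        if username in group.get("users", []):
            policies.update(group.get("policies", []))
    return policies


# Hypothetical configuration with made-up usernames.
config = {
    "authz": {
        "anonymous_policies": ["open_data_reader"],
        "all_users_policies": [],
        "groups": [
            {"name": "program1_readers",
             "policies": ["program1_reader"],
             "users": ["alice"]},
        ],
    },
    "users": {
        "alice": {},
        "bob": {"policies": ["program1_indexd_admin"]},
    },
}

print(sorted(effective_policies(config)))
print(sorted(effective_policies(config, "alice")))
print(sorted(effective_policies(config, "bob")))
```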
Policies can also be attached to Fence OIDC clients in the clients section. Use the client's name (not client_id) to grant access to a client.
{"message":"You don't have access to this resource: Unauthorized: User must be Sheepdog program admin"}
If you are using Arborist and you get this error message when trying to create a program or a project, you need to add the following to your user.yaml file and grant the services.sheepdog-admin policy to admin users:
- resources:
    - name: services
      subresources:
      - name: sheepdog
        subresources:
        - name: submission
          subresources:
          - name: program
          - name: project
- role:
    # Sheepdog admin role
    - id: sheepdog_admin
      description: sheepdog admin role for program project crud
      permissions:
      - id: sheepdog_admin_action
        action:
          service: sheepdog
          method: '*'
- policy:
    - id: services.sheepdog-admin
      description: CRUD access to programs and projects
      role_ids:
      - sheepdog_admin
      resource_paths:
      - /services/sheepdog/submission/program
      - /services/sheepdog/submission/project
- While Arborist itself allows granular and inherited access through use of its resource tree / paths, granular access control beyond the program and project in the current Gen3 graph is not supported at the moment.
- Arborist does not support policies granting access to a root resource /.
The global cloud_providers and groups sections are deprecated.
The users.admin flag used below is the deprecated way of granting program and project CRUD access.
The users.projects section used below is the deprecated way of providing access. We should now use users.policies for individual access and groups for group access.
users:
  username1:
    admin: true
    projects:
    - auth_id: program1
      privilege:
      - read
      - read-storage
      - write-storage
For Gen3 Data Commons that do not use Arborist or use the Google Data Access method of Google Service Account Registration
When Arborist is not being used (which is when the deprecated acl field of Indexd records is used for access control instead of the newer authz field), or when the Google Data Access method of Google Service Account Registration is used, only the access granted to users through the deprecated user.yaml format will take effect. This is how you should configure your user.yaml if you are not using Arborist:
authz:
  user_project_to_resource:
    program1: /programs/program1
  resources:
  - name: programs
    subresources:
    - name: program1
    - name: program2
users:
  username1:
    projects:
    - auth_id: program1
      privilege:
      - read
    - auth_id: program2
      resource: /programs/program2
      privilege:
      - read
The user_project_to_resource section maps an auth_id to a resource path, so that a resource does not have to be specified in each users.projects entry.
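The lookup described here can be sketched as follows. This is an assumption-laden illustration rather than Fence's code: the helper resolve_resource is hypothetical, it assumes that an explicit resource on the project entry takes precedence over the mapping, and it returns None when neither is available because the text above does not say what happens in that case.

```python
def resolve_resource(project, user_project_to_resource):
    """Return the resource path for one `users.<name>.projects` entry.

    Hypothetical helper. Assumes an explicit `resource` wins over the
    `user_project_to_resource` mapping; returns None if neither applies.
    """
    if "resource" in project:
        return project["resource"]
    return user_project_to_resource.get(project["auth_id"])


# Values taken from the example configuration above.
mapping = {"program1": "/programs/program1"}
projects = [
    {"auth_id": "program1", "privilege": ["read"]},
    {"auth_id": "program2", "resource": "/programs/program2",
     "privilege": ["read"]},
]

resolved = [resolve_resource(p, mapping) for p in projects]
print(resolved)
```

In the example, program1 is resolved through the mapping while program2 carries its own resource path.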
What is involved in making a project "public"; that is, making both the metadata and object files accessible to anyone who visits the Data Commons?
Arborist can be configured to apply a policy to all users who visit the system. This is done via the special user.yaml field anonymous_policies. Note that the same can be done with all_users_policies instead of anonymous_policies if access should be granted to all authenticated users instead of both authenticated and non-authenticated users.
The example below shows the setup for a program PUBLIC_PROGRAM and a project PROJECT_1 under it. Because the policy PUBLIC_PROGRAM_reader, which grants access to this program, is in anonymous_policies, this program and all the subresources under it are accessible to all users.
Structured graph data in program PUBLIC_PROGRAM and data files whose indexd records' authz field includes /programs/PUBLIC_PROGRAM/projects/PROJECT_1 will both be publicly accessible.
authz:
  # policies automatically given to anyone, even if they haven't authenticated
  anonymous_policies:
  - PUBLIC_PROGRAM_reader
  resources:
  - name: programs
    subresources:
    - name: PUBLIC_PROGRAM
      subresources:
      - name: projects
        subresources:
        - name: PROJECT_1
  policies:
  - id: PUBLIC_PROGRAM_reader
    role_ids:
    - reader
    - storage_reader
    resource_paths:
    - /programs/PUBLIC_PROGRAM
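The reason a policy on /programs/PUBLIC_PROGRAM also opens up PROJECT_1 is that access is inherited down the resource tree. A simplified way to picture this is a segment-wise prefix check on paths, as in the sketch below. The function covers is hypothetical and only models the inheritance rule described here; it is not how Arborist is implemented.

```python
def covers(policy_path, resource_path):
    """Return True if `resource_path` is `policy_path` or lies beneath it.

    Compares whole path segments, so /programs/PUBLIC does not cover
    /programs/PUBLIC_PROGRAM. Simplified model for illustration only.
    """
    policy_parts = policy_path.strip("/").split("/")
    resource_parts = resource_path.strip("/").split("/")
    return resource_parts[:len(policy_parts)] == policy_parts


granted = "/programs/PUBLIC_PROGRAM"
print(covers(granted, "/programs/PUBLIC_PROGRAM/projects/PROJECT_1"))
print(covers(granted, "/programs/OTHER_PROGRAM"))
```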
Arborist is very flexible: we could define an open policy per public program, or per public project, or even a single open policy with a list of all open resources.
Note that we may alter the behavior around "/open" in the future so as not to have hard-coded resource logic in Fence, so relying on this behavior is not recommended.
/open is a special resource supported by Gen3. It is only used for data files (in the authz field of indexd records).
An indexd record's authz field containing the resource path /open means that Fence doesn't need to sign presigned URLs. Fence will assume the bucket is public. When a user tries to download the file, Fence will return a non-signed URL.
If the bucket is not public but the data should be publicly accessible, public access should be granted via the user.yaml file but /open should not be added in the authz field.
The example below shows how to set up public access to the /open resource.
authz:
  # policies automatically given to anyone, even if they haven't authenticated
  anonymous_policies:
  - open_data_reader
  resources:
  - name: open
  policies:
  - id: open_data_reader
    role_ids:
    - reader
    - storage_reader
    resource_paths:
    - /open