
Add studio auth to datachain #514

Merged (9 commits) on Oct 17, 2024


@amritghimire (Contributor) commented Oct 16, 2024

Adds a new `datachain studio` command that authenticates DataChain with Studio.

Commands added are:

- `datachain studio login`
- `datachain studio logout`
- `datachain studio token`

## Studio command

```sh
Authenticate Datachain with Studio and set the token. Once this token has been properly configured, Datachain will utilize it for seamlessly sharing
datasets and using Studio features from CLI

positional arguments:
  {login,logout,token}  Use `Datachain studio CMD --help` to display command-specific help.
    login               Authenticate Datachain with Studio host
    logout              Logout user from Studio
    token               View the token datachain uses to contact Studio

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  --ttl TTL             Time-to-live of data source cache. Negative equals forever.
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  --debug-sql           Show All SQL Queries (very verbose output, for debugging only)
  --pdb                 Drop into the pdb debugger on fatal exception
```
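The `[-q | -v]` groups and nested subcommands in the help output map directly onto `argparse`. A minimal sketch of a parser that would accept the commands and login flags described in this PR, assuming a standalone `datachain` parser; it leaves out the global options (such as `--aws-endpoint-url`) and is not DataChain's actual `cli.py`:

```python
import argparse


def add_verbosity(parser: argparse.ArgumentParser) -> None:
    # "[-q | -v]" in the usage lines: the two flags are mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-q", "--quiet", action="store_true", help="Be quiet.")
    group.add_argument("-v", "--verbose", action="store_true", help="Be verbose.")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `datachain studio {login,logout,token}`."""
    parser = argparse.ArgumentParser(prog="datachain")
    commands = parser.add_subparsers(dest="command", required=True)

    studio = commands.add_parser("studio", help="Authenticate DataChain with Studio")
    studio_commands = studio.add_subparsers(dest="cmd", required=True)

    login = studio_commands.add_parser("login", help="Authenticate Datachain with Studio host")
    add_verbosity(login)
    login.add_argument("-H", "--hostname", help="Studio instance to authenticate with.")
    login.add_argument("-s", "--scopes", help="Scopes for the authentication token.")
    login.add_argument("-n", "--name", help="Name shown for the token in the Studio profile.")
    login.add_argument(
        "-d",
        "--use-device-code",
        action="store_true",
        help="Use the user-code flow instead of opening a browser.",
    )

    logout = studio_commands.add_parser("logout", help="Logout user from Studio")
    add_verbosity(logout)
    token = studio_commands.add_parser("token", help="View the token datachain uses to contact Studio")
    add_verbosity(token)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["studio", "login", "-H", "studio.example.com", "-d"])
    print(args.cmd, args.hostname, args.use_device_code)  # login studio.example.com True
```

Here `studio.example.com` is a placeholder hostname.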

## Login

To log in, run `datachain studio login`. It opens Studio in your browser with a device code prefilled. Once you authorize the code, the token is saved to the global config.

```sh
usage: datachain studio login [-h] [-q | -v] [-H HOSTNAME] [-s SCOPES] [-n NAME] [-d]

By default, this command authorize datachain with Studio with
 default scopes and a random  name as token name.

options:
  -h, --help            show this help message and exit
  -q, --quiet           Be quiet.
  -v, --verbose         Be verbose.
  -H HOSTNAME, --hostname HOSTNAME
                        The hostname of the Studio instance to authenticate with.
  -s SCOPES, --scopes SCOPES
                        The scopes for the authentication token.
  -n NAME, --name NAME  The name of the authentication token. It will be used to identify token shown in Studio profile.
  -d, --use-device-code
                        Use authentication flow based on user code. You will be presented with user code to enter in browser. Datachain will also use this if it
                        cannot launch browser on your behalf.
```
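When `-n/--name` is omitted, the help text says a random name is used, and the example output below shows `risen-geum`. A minimal sketch of that fallback, assuming an adjective-noun naming scheme; the word lists and both function names are hypothetical:

```python
import random
from typing import Optional

# Hypothetical vocabulary; the real generator and its word lists are not in this PR.
ADJECTIVES = ["risen", "brisk", "quiet", "amber", "lucid"]
NOUNS = ["geum", "fern", "sparrow", "delta", "cedar"]


def default_token_name(rng: Optional[random.Random] = None) -> str:
    """Return a random "adjective-noun" name such as "risen-geum"."""
    rng = rng or random.Random()
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"


def resolve_token_name(name: Optional[str], rng: Optional[random.Random] = None) -> str:
    # An explicit -n/--name wins; otherwise fall back to a generated name.
    return name if name else default_token_name(rng)
```

For example, `resolve_token_name("ci-runner")` returns `"ci-runner"`, while `resolve_token_name(None)` returns a generated name. Passing a seeded `random.Random` makes the generated name reproducible in tests.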

### Example usages

#### Normal flow

Running `datachain studio login` prints the following in the CLI:

```sh
datachain studio login
A web browser has been opened at
https://studio.iterative.ai/auth/device-login.
Please continue the login in the web browser.
If no web browser is available or if the web browser fails to open,
use device code flow with `datachain studio login --use-device-code`.
```

A web browser then opens Studio with the device code prefilled.
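<img width="632" alt="image" src="https://github.com/iterative/dvc/assets/16842655/8662d52b-9523-4db4-9742-02808d7e3ec4">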
Once the user authorizes the token on the screen above, the token is saved to the user's global config. The CLI then prints the following message and exits:

```sh
Authentication successful. The token will be available as risen-geum in Studio profile.
```
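Between opening the browser and printing the success message, the CLI has to wait for the user to authorize the code. A minimal sketch of that wait, assuming it polls like the OAuth 2.0 device authorization grant (RFC 8628, whose default polling interval is 5 seconds); `request_token` and `AuthorizationPending` are hypothetical stand-ins for Studio's token request, which this PR does not show:

```python
import time
from typing import Callable


class AuthorizationPending(Exception):
    """Raised by the (hypothetical) token request while the code is not yet approved."""


def poll_for_token(
    request_token: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Retry `request_token` every `interval` seconds until it returns a token.

    Raises TimeoutError if the user has not authorized the code before `timeout`.
    """
    deadline = clock() + timeout
    while True:
        try:
            return request_token()
        except AuthorizationPending:
            # Give up rather than sleep past the deadline.
            if clock() + interval > deadline:
                raise TimeoutError("device code was not authorized in time") from None
            sleep(interval)
```

In a real client, `request_token` would presumably send the device code to Studio and translate an RFC 8628 `authorization_pending` error into `AuthorizationPending`; the returned token would then be written to the global config. The injectable `sleep` and `clock` arguments keep the loop testable without real waiting.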

#### Device login flow

If you pass `--use-device-code`, or if DataChain cannot launch a browser (for example, on a remote machine), the CLI prints the following instead:

```
Please open the following url in your browser.
https://studio.iterative.ai/auth/device-login
And enter the user code below 06D21JBK to authorize.
```

The rest of the flow is the same.
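The help text for `-d` says DataChain falls back to the user-code flow when it cannot launch a browser. A minimal sketch of that choice using the standard-library `webbrowser` module; `start_login` is a hypothetical helper, the messages are copied from the example outputs above, and the `?code=` query parameter used to prefill the code is an assumption:

```python
import webbrowser
from typing import Callable

# Taken from the example output above; the "?code=" query parameter used to
# prefill the code is an assumption, not something this PR documents.
DEVICE_LOGIN_URL = "https://studio.iterative.ai/auth/device-login"


def start_login(
    user_code: str,
    use_device_code: bool = False,
    open_browser: Callable[[str], bool] = webbrowser.open,
) -> str:
    """Choose the login flow and return the message to show in the CLI.

    The browser flow is tried first. If --use-device-code was passed, or no
    browser could be launched, fall back to the user-code flow.
    """
    # webbrowser.open() returns False when it could not launch a browser.
    if not use_device_code and open_browser(f"{DEVICE_LOGIN_URL}?code={user_code}"):
        return (
            f"A web browser has been opened at\n{DEVICE_LOGIN_URL}.\n"
            "Please continue the login in the web browser."
        )
    return (
        "Please open the following url in your browser.\n"
        f"{DEVICE_LOGIN_URL}\n"
        f"And enter the user code below {user_code} to authorize."
    )
```

Injecting `open_browser` keeps the function testable on machines without a browser; both branches then continue with the same wait-for-authorization step.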

## Logout

```sh
usage: datachain studio logout [-h] [-q | -v]

This command helps to log out user from  Studio.

options:
  -h, --help     show this help message and exit
  -q, --quiet    Be quiet.
  -v, --verbose  Be verbose.
```

## Token

```sh
usage: datachain studio token [-h] [-q | -v]

View the token datachain uses to contact Studio

options:
  -h, --help     show this help message and exit
  -q, --quiet    Be quiet.
  -v, --verbose  Be verbose.
```
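Taken together, `login` saves the token to the global config, `logout` removes it, and `token` prints it. A minimal sketch of those three operations, assuming an INI-style file with a hypothetical `[studio]` section and `token` key; the real file location and format used by DataChain are not shown in this PR:

```python
import configparser
from pathlib import Path
from typing import Optional

# Hypothetical layout: an INI-style file with a [studio] section holding `token`.
# DataChain's real global-config path and format are not shown in this PR.
SECTION, KEY = "studio", "token"


def _load(path: Path) -> configparser.ConfigParser:
    # interpolation=None so "%" characters in a token are stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path, encoding="utf-8")  # a missing file just yields an empty config
    return config


def _write(config: configparser.ConfigParser, path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        config.write(f)


def save_token(path: Path, token: str) -> None:
    """Roughly what `login` does once the user has authorized the code."""
    config = _load(path)
    if not config.has_section(SECTION):
        config.add_section(SECTION)
    config.set(SECTION, KEY, token)
    _write(config, path)


def show_token(path: Path) -> Optional[str]:
    """Roughly what `token` does: return the stored token, or None."""
    return _load(path).get(SECTION, KEY, fallback=None)


def remove_token(path: Path) -> bool:
    """Roughly what `logout` does: drop the token; False if none was stored."""
    config = _load(path)
    if not config.has_option(SECTION, KEY):
        return False
    config.remove_option(SECTION, KEY)
    _write(config, path)
    return True
```

Each helper takes the config path explicitly, so the same code could target any configuration level.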

Related to https://github.com/iterative/studio/issues/10774

Based on #513


codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 94.18605% with 5 lines in your changes missing coverage. Please review.

Project coverage is 87.43%. Comparing base (f29e034) to head (b1d6ace).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/datachain/studio.py | 92.15% | 2 Missing and 2 partials ⚠️ |
| src/datachain/cli.py | 95.23% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #514      +/-   ##
==========================================
+ Coverage   87.16%   87.43%   +0.26%     
==========================================
  Files          96       97       +1     
  Lines        9991    10069      +78     
  Branches     1367     1374       +7     
==========================================
+ Hits         8709     8804      +95     
+ Misses        928      908      -20     
- Partials      354      357       +3     
```
| Flag | Coverage Δ |
|---|---|
| datachain | `87.40% <94.18%> (+0.26%)` ⬆️ |

Flags with carried forward coverage won't be shown.


@0x2b3bfa0 (Member)

[Screenshot from the PR description]

Why does it say "DvcX operations" instead of DataChain? 😅

Review threads (outdated, resolved):
- src/datachain/cli.py (3 threads)
- src/datachain/studio.py (1 thread)
@shcheklein (Member) left a comment

Thanks @amritghimire. Some minor questions here and there.

@amritghimire (Contributor, Author)

> [Screenshot from the PR description]
>
> Why does it say "DvcX operations" instead of DataChain? 😅

Oh, I copied the description from the DVC PR for the screenshot 😀

cloudflare-workers-and-pages bot commented Oct 17, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: b1d6ace
Status: ✅ Deploy successful!
Preview URL: https://1bce5ccb.datachain-documentation.pages.dev
Branch Preview URL: https://amrit-studio-auth.datachain-documentation.pages.dev

Base automatically changed from amrit/config to main October 17, 2024 14:59
An error occurred while trying to automatically change base from amrit/config to main October 17, 2024 14:59
As part of #10774, this introduces a way to save configuration at the local, global, and system levels.

The levels, from lowest to highest precedence, are:
- system
- global
- local

Local configuration overrides global, and global overrides system.

This borrows the logic of how configuration is managed in DVC.
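Since local overrides global and global overrides system, resolving the effective configuration amounts to merging the levels in that order. A minimal sketch, assuming each level has already been parsed into a nested dict and that sections merge key by key; this is an illustration, not DataChain's or DVC's actual code:

```python
import collections.abc
from typing import Any, Dict, Mapping

# Levels from lowest to highest precedence: later levels override earlier ones.
LEVELS = ("system", "global", "local")


def merge_levels(configs: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Merge per-level configs so that higher-precedence levels win.

    Nested mappings (config sections) are merged key by key instead of being
    replaced wholesale, and the input mappings are left unmodified.
    """
    merged: Dict[str, Any] = {}
    for level in LEVELS:
        _deep_update(merged, configs.get(level, {}))
    return merged


def _deep_update(target: Dict[str, Any], source: Mapping[str, Any]) -> None:
    for key, value in source.items():
        if isinstance(value, collections.abc.Mapping):
            existing = target.get(key)
            if not isinstance(existing, dict):
                # Start a fresh dict so nested input mappings are never mutated.
                existing = target[key] = {}
            _deep_update(existing, value)
        else:
            target[key] = value
```

With this approach, a `[studio]` token set only in the global config still applies inside a project, while a value set in the local config for the same key takes precedence over it.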
@amritghimire merged commit f6445e2 into main on Oct 17, 2024 (38 checks passed)
@amritghimire deleted the amrit/studio-auth branch on October 17, 2024 15:43
@amritghimire self-assigned this on Oct 22, 2024
3 participants