
Commit

Merge pull request #38 from InfuseAI/docs/20220412
Update document
popcornylu authored Apr 12, 2022
2 parents 0475d74 + 0f17b3f commit 6895c74
Showing 17 changed files with 359 additions and 47 deletions.
21 changes: 7 additions & 14 deletions README.md
@@ -1,26 +1,19 @@
# ArtiVC

[ArtiVC](https://artivc.io/) (**Arti**facts **V**ersion **C**ontrol) is a version control system for large files.

To store and share large files, we may use NFS or object storage (e.g. s3, MinIO). However, if we would like to do versioning on top of them, it is not a trivial thing. ArtiVC is a CLI tool to enable you to version files on your storage without pain. You don't need to install any additional server or gateway and we turn your storage into the versioned repository.
[ArtiVC](https://artivc.io/) (**Arti**facts **V**ersion **C**ontrol) is a handy command-line tool for data versioning on cloud storage. With only one command, it helps you neatly snapshot your data and switch between versions. Even better, it integrates seamlessly with your existing cloud environment. ArtiVC supports three major cloud providers (AWS S3, Google Cloud Storage, Azure Blob Storage) and remote filesystems over SSH.

[![asciicast](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257.svg)](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257?autoplay=1)

Try it out with the [Getting Started](https://artivc.io/usage/getting-started/) guide.

# Features

- **Use your own storage**: If you store data in NFS or S3, just use the storage you already use.
- **No additional server required**: ArtiVC is a CLI tool. No server or gateway is required to install or operate.
- **Multiple backend support**: Currently, we support local, NFS (by local repo), and s3. And more in the future

- **Reproducible**: A commit is stored in a single file and cannot be changed. There is no way to add/remove/modify a single file in a commit.
- **Expose your data publicly**: Expose your repository with a public HTTP endpoint, then you can download your data in this way
```
avc get -o /tmp/dataset https://mybucket.s3.ap-northeast-1.amazonaws.com/path/to/my/dataset@<version>
```
- **Smart storage and transfer**: For the same content of files, there is only one instance stored in the artifact repository. If a file has been uploaded by other commits, no upload is required because we know the file is already there in the repository. Under the hood, we use [content-addressable storage](https://en.wikipedia.org/wiki/Content-addressable_storage) to put the objects.

- **Data Versioning**: Version your data like versioning code. ArtiVC supports commit history, commit messages, and version tags. You can diff two commits and pull data from a specific version (see the workflow sketch after this list).
- **Use your own storage**: We are used to putting large files in NFS or S3. To use ArtiVC, you can keep putting your files on the same storage without changes.
- **No additional server is required**: ArtiVC is a CLI tool. No server or gateway needs to be installed or operated.
- **Multiple backends support**: ArtiVC natively supports local filesystem, remote filesystem (by SSH), AWS S3, Google Cloud Storage, and Azure Blob Storage as backends, and 40+ more backends are supported through [Rclone](https://artivc.io/backends/rclone/) integration. [Learn more](https://artivc.io/backends/)
- **Painless Configuration**: No one likes to configure, so we leverage your existing configuration as much as possible: use `.ssh/config` for SSH access, and `aws configure`, `gcloud auth application-default login`, or `az login` for the cloud platforms.
- **Efficient storage and transfer**: The file structure of the repository is stored and transferred efficiently by [design](https://artivc.io/design/how-it-works/). It avoids storing duplicated content and minimizes the number of files to upload when pushing a new version. [Learn more](https://artivc.io/design/benchmark/)
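
A minimal sketch of a typical workflow, assuming an `avc init` command to bind a workspace to a repository (the bucket path below is hypothetical):

```shell
# Bind the current directory to a repository (assumed command; hypothetical bucket path)
avc init s3://mybucket/path/to/mydataset
# Snapshot the current files as a new version
avc push
# Elsewhere: clone the repository and pull the data
avc clone s3://mybucket/path/to/mydataset
cd mydataset/
avc pull
```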

# Documentation

20 changes: 9 additions & 11 deletions docs/content/en/_index.md
@@ -7,12 +7,10 @@ geekdocAnchor: false
---

{{< columns >}}
### ArtiVC (Artifact Version Control) is a version control system for large files.


**rsync** is an ssh-based tool that provides fast incremental file transfer.<br>
**Rclone** is a rsync-like tool for cloud storage.<br>
**ArtiVC** is like Git for files versioning and like Rclone for cloud storage.
<p style="text-align: left">
<b>ArtiVC (Artifact Version Control) is a handy command-line tool for data versioning on cloud storage.</b> With only one command, it helps you neatly snapshot your data and switch between versions. Even better, it integrates seamlessly with your existing cloud environment. ArtiVC supports three major cloud providers (AWS S3, Google Cloud Storage, Azure Blob Storage) and remote filesystems over SSH.
</p>

<--->
[![asciicast](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257.svg)](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257?autoplay=1)
@@ -25,17 +23,17 @@
{{< columns >}}
### Data Versioning

Version your data like versioning code. ArtiVC supports commmit history, commit message, version tag. You can diff two commits, pull data from speciifc version.
Version your data like versioning code. ArtiVC supports commit history, commit messages, and version tags. You can diff two commits and pull data from a specific version.

<--->

### Use your own storage

We are used to putting large files in NFS or S3. To use ArtiVC, you can keep put your files on the same storage without changes.
We are used to putting large files in NFS or S3. To use ArtiVC, you can keep putting your files on the same storage without changes.

<--->

### No additional server required
### No additional server is required

ArtiVC is a CLI tool. No server or gateway needs to be installed or operated.

@@ -45,19 +43,19 @@

### Multiple backends support

ArtiVC natively supports local filesystem, remote filesystem (by SSH), AWS S3, Google Cloud Storage, Azure Blob Storage as backend. And 40+ backends are supported through [Rclone](backends/rclone/) integration. [Learn more](backends/)
ArtiVC natively supports local filesystem, remote filesystem (by SSH), AWS S3, Google Cloud Storage, and Azure Blob Storage as backends, and 40+ more backends are supported through [Rclone](backends/rclone/) integration. [Learn more](backends/)

<--->

### Painless Configuration

No one like to configure. So we leverage the original configuraion as much as possible. Use `.ssh/config` for ssh access, and use `aws configure`, `gcloud auth application-default login`, `az login` for the cloud platforms.
No one likes to configure, so we leverage your existing configuration as much as possible: use `.ssh/config` for SSH access, and `aws configure`, `gcloud auth application-default login`, or `az login` for the cloud platforms.

<--->

### Efficient storage and transfer

The file structure of repository is storage and transfer effiecntly by [design](design/how-it-works/). It prevents from storing duplicated content and minimum the number of files to upload when pushing a new version. [Learn more](design/benchmark/)
The file structure of the repository is stored and transferred efficiently by [design](design/how-it-works/). It avoids storing duplicated content and minimizes the number of files to upload when pushing a new version. [Learn more](design/benchmark/)


{{< /columns >}}
4 changes: 4 additions & 0 deletions docs/content/en/backends/azureblob.md
@@ -3,6 +3,8 @@ title: Azure Blob Storage
weight: 13
---

{{< toc >}}

Use [Azure Blob Storage](https://azure.microsoft.com/services/storage/blobs/) as the repository backend.

## Configuration
@@ -22,6 +24,8 @@ The logged-in account requires **Storage Blob Data Contributor** role to the storage account
For more information, please see https://docs.microsoft.com/azure/storage/blobs/assign-azure-role-data-access
{{< /hint >}}

The Azure Blob Storage backend authenticates using the default procedure defined by the [Azure SDK for Go](https://docs.microsoft.com/azure/developer/go/azure-sdk-authentication).

### Use Azure CLI to login

This backend supports using the [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli) to configure the login account. It will open the browser and start the login process.
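
For example, a minimal sketch of logging in with the Azure CLI and then cloning a repository; the container URL form, storage account, and path below are assumptions for illustration:

```shell
# Log in interactively with the Azure CLI (opens a browser)
az login
# Clone a repository from Azure Blob Storage (hypothetical account, container, and path)
avc clone https://mystorageaccount.blob.core.windows.net/mycontainer/path/to/mydataset
cd mydataset/
```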
11 changes: 10 additions & 1 deletion docs/content/en/backends/gcs.md
@@ -3,6 +3,8 @@ title: Google Cloud Storage
weight: 12
---

{{< toc >}}

Use [Google Cloud Storage (GCS)](https://cloud.google.com/storage) as the repository backend.

Note that Google Cloud Storage is not [Google Drive](https://www.google.com.tw/drive/). They are different Google products.
@@ -30,7 +32,7 @@ Before using the backend, you have to configure the service account credential.
1. Use the service account attached to GCP resources (e.g. GCE, GKE). This is the recommended way if `ArtiVC` runs in the GCP environment. Please see the [default service accounts](https://cloud.google.com/iam/docs/service-accounts#default) document.
The GCS backend finds credentials using the default procedure defined by [Google Cloud](https://cloud.google.com/docs/authentication/production).
@@ -46,3 +48,10 @@ Clone a repository
avc clone gs://mybucket/path/to/mydataset
cd mydataset/
```


## Environment Variables

| Name | Description | Default value |
| --- | --- | --- |
| `GOOGLE_APPLICATION_CREDENTIALS` | The location of the service account key file (JSON) | |
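
For example, a minimal sketch of pointing the GCS backend at a service-account key file (the key path and bucket below are hypothetical):

```shell
# Let the default credential lookup find a service-account key (hypothetical path)
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/artivc-sa.json"
# Clone a repository from GCS (hypothetical bucket and path)
avc clone gs://mybucket/path/to/mydataset
cd mydataset/
```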
23 changes: 22 additions & 1 deletion docs/content/en/backends/s3.md
@@ -3,6 +3,8 @@ title: AWS S3
weight: 11
---

{{< toc >}}

Use AWS S3 as the repository backend.

## Features
@@ -12,7 +14,17 @@ Use the S3 as the repository backend.

## Configuration

Prepare the `~/.aws/credentials` to access the s3 backend. Please see the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
1. Install the [AWS CLI](https://aws.amazon.com/cli/)
2. Configure the AWS CLI. Please see the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
```
aws configure
```
3. Check current config
```
aws configure list
```

The S3 backend loads configuration using the default procedure of the [AWS SDK for Go](https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/#specifying-credentials).

## Usage

@@ -26,3 +38,12 @@ Clone a repository
avc clone s3://mybucket/path/to/mydataset
cd mydataset/
```

## Environment Variables

| Name | Description | Default value |
| --- | --- | --- |
| `AWS_ACCESS_KEY_ID` | The access key | |
| `AWS_SECRET_ACCESS_KEY` | The secret access key | |
| `AWS_PROFILE` | The profile to use in the credential file | `default` |
| `AWS_REGION` | The region to use | The region from the profile |
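
For example, a minimal sketch of selecting a profile and region through environment variables instead of editing config files (the profile name, region, and bucket below are hypothetical):

```shell
# Pick a named profile from ~/.aws/credentials and a region for this shell (hypothetical values)
export AWS_PROFILE=myprofile
export AWS_REGION=ap-northeast-1
# Clone a repository from S3 (hypothetical bucket and path)
avc clone s3://mybucket/path/to/mydataset
cd mydataset/
```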
2 changes: 2 additions & 0 deletions docs/content/en/backends/ssh.md
@@ -3,6 +3,8 @@ title: Remote Filesystem (SSH)
weight: 2
---

{{< toc >}}

Use remote filesystem through SSH as the repository backend.
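
For example, a minimal sketch of cloning over SSH; the scp-style repository address is an assumption, and `myserver` is a hypothetical host alias resolved through `~/.ssh/config`:

```shell
# "myserver" is resolved via ~/.ssh/config (HostName, User, IdentityFile, ...)
# The host:path repository form below is an assumption, not confirmed by this page
avc clone myserver:path/to/mydataset
cd mydataset/
```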

## Features
2 changes: 1 addition & 1 deletion docs/content/en/design/images/benchmark1.svg
2 changes: 1 addition & 1 deletion docs/content/en/design/images/benchmark2.svg
2 changes: 1 addition & 1 deletion docs/content/en/design/images/benchmark3.svg
36 changes: 36 additions & 0 deletions docs/content/en/usage/dryrun.md
@@ -0,0 +1,36 @@
---
title: Dry Run
weight: 11
---

Pushing and pulling data is time-consuming and should be double-checked before any transfer. Dry run is the feature that lists the changeset before anything is sent.


## Push

1. Dry run before pushing
```shell
avc push --dry-run
```

1. Do the actual push
```shell
avc push
```

## Pull

1. Dry run before pulling
```shell
avc pull --dry-run
# or check in delete mode
# avc pull --delete --dry-run
```

1. Do the actual pull

```shell
avc pull
# avc pull --delete
```

@@ -1,9 +1,9 @@
---
title: Expose the dataset
weight: 3
title: Expose the data
weight: 20
---

ArtiVC repository can be exposed as a http endpoint. In S3, we can just make the bucket and give the data consumer the http endpiont of the repository. In this way, we can download data through CDN or other reverse proxies.
An ArtiVC repository can be exposed as an HTTP endpoint. In S3, we can just make the bucket public and give the data consumer the HTTP endpoint of the repository. In this way, we can download data through a CDN or other reverse proxies.

1. [Make your S3 bucket public](https://aws.amazon.com/premiumsupport/knowledge-center/read-access-objects-s3-bucket/?nc1=h_ls)
1. Copy the public URL of your repository. For example

0 comments on commit 6895c74
