From 2b8dc03cb5943d861ffe35aec094df0dd11b03f8 Mon Sep 17 00:00:00 2001 From: popcorny Date: Tue, 12 Apr 2022 14:10:54 +0800 Subject: [PATCH 1/5] Refine the getting started document Signed-off-by: popcorny --- docs/content/en/backends/azureblob.md | 4 + docs/content/en/backends/gcs.md | 11 +- docs/content/en/backends/s3.md | 23 ++- docs/content/en/backends/ssh.md | 2 + docs/content/en/usage/getting-started.md | 210 ++++++++++++++++++++++- 5 files changed, 241 insertions(+), 9 deletions(-) diff --git a/docs/content/en/backends/azureblob.md b/docs/content/en/backends/azureblob.md index 1173e3b..e9ebb76 100644 --- a/docs/content/en/backends/azureblob.md +++ b/docs/content/en/backends/azureblob.md @@ -3,6 +3,8 @@ title: Azure Blob Storage weight: 13 --- +{{< toc >}} + Use [Azure Blob Storage](https://azure.microsoft.com/services/storage/blobs/) as the repository backend. ## Configuration @@ -22,6 +24,8 @@ The logged-in account requires **Storage Blob Data Contributor** role to the sto For more information, please see https://docs.microsoft.com/azure/storage/blobs/assign-azure-role-data-access {{< /hint >}} +The Azure Blob Storage backend authenticates using the default procedure defined by the [Azure SDK for Go](https://docs.microsoft.com/azure/developer/go/azure-sdk-authentication). + ### Use Azure CLI to login This backend supports to use [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli) to configure the login account. It will open the browser and start the login process. diff --git a/docs/content/en/backends/gcs.md b/docs/content/en/backends/gcs.md index 33a347f..d0bae65 100644 --- a/docs/content/en/backends/gcs.md +++ b/docs/content/en/backends/gcs.md @@ -3,6 +3,8 @@ title: Google Cloud Storage weight: 12 --- +{{< toc >}} + Use [Google Cloud Storage (GCS)](https://cloud.google.com/storage) as the repository backend. Note that Google Cloud Storage is not [Google Drive](https://www.google.com.tw/drive/). They are different google product. 
@@ -30,7 +32,7 @@ Before using the backend, you have to configure the service account credential. 1. Use the service account in the GCP resources (e.g. GCE, GKE). It is recommended way if the `ArtiVC` is run in the GCP environment. Please see [default service accounts](https://cloud.google.com/iam/docs/service-accounts#default) document - +The GCS backend finds credentials using the default procedure defined by [Google Cloud](https://cloud.google.com/docs/authentication/production). @@ -46,3 +48,10 @@ Clone a repository avc clone gs://mybucket/path/to/mydataset cd mydataset/ ``` + + +## Environment Variables + +| Name | Description | Default value | +| --- | --- | --- | +| `GOOGLE_APPLICATION_CREDENTIALS` | The path to the service account key file (JSON) | | \ No newline at end of file diff --git a/docs/content/en/backends/s3.md b/docs/content/en/backends/s3.md index 00202aa..8d18ec4 100644 --- a/docs/content/en/backends/s3.md +++ b/docs/content/en/backends/s3.md @@ -3,6 +3,8 @@ title: AWS S3 weight: 11 --- +{{< toc >}} + Use the S3 as the repository backend. ## Features @@ -12,7 +14,17 @@ Use the S3 as the repository backend. ## Configuration -Prepare the `~/.aws/credentials` to access the s3 backend. Please see the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) +1. Install the [AWS CLI](https://aws.amazon.com/cli/) +2. Configure the AWS CLI. Please see the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) + ``` + aws configure + ``` +3. 
Check current config + ``` + aws configure list + ``` + +The S3 backend loads configuration using the default procedure of the [AWS SDK for Go](https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/#specifying-credentials). ## Usage @@ -26,3 +38,12 @@ Clone a repository avc clone s3://mybucket/path/to/mydataset cd mydataset/ ``` + +## Environment Variables + +| Name | Description | Default value | +| --- | --- | --- | +| `AWS_ACCESS_KEY_ID` | The access key ID | | +| `AWS_SECRET_ACCESS_KEY` | The secret access key | | +| `AWS_PROFILE` | The profile to use in the credential file | `default` | +| `AWS_REGION` | The region to use | the region from the profile | diff --git a/docs/content/en/backends/ssh.md b/docs/content/en/backends/ssh.md index faa7b38..5638c5c 100644 --- a/docs/content/en/backends/ssh.md +++ b/docs/content/en/backends/ssh.md @@ -3,6 +3,8 @@ title: Remote Filesystem (SSH) weight: 2 --- +{{< toc >}} + Use remote filesystem through SSH as the repository backend. ## Features diff --git a/docs/content/en/usage/getting-started.md b/docs/content/en/usage/getting-started.md index 988a6ff..3dc3097 100644 --- a/docs/content/en/usage/getting-started.md +++ b/docs/content/en/usage/getting-started.md @@ -21,6 +21,74 @@ brew tap infuseai/artivc brew install artivc ``` +# Configuration +Here we describe how to configure credentials to access the remote backend. The principle of ArtiVC is "Use your tool's config": it reuses the configuration you already have for each platform's own tools, so you can move between toolchains painlessly. + +{{}} +{{}} +No configuration required +{{}} + +{{}} +1. Configure `~/.ssh/config` + ```bash + Host myserver + HostName myserver.hosts + User myname + IdentityFile ~/.ssh/id_ed25519 + ``` +1. Check that you can access the SSH server + ``` + ssh myserver + ``` + +For more information, please see the [Remote Filesystem (SSH) backend](../../backends/ssh) +{{}} + +{{}} +1. Install the [AWS CLI](https://aws.amazon.com/cli/) +2. Configure the AWS CLI + ``` + aws configure + ``` +3. 
Check current config + ``` + aws configure list + ``` + +For more information, please see the [AWS S3 backend](../../backends/s3) + +{{}} + +{{}} +1. Install the [gcloud CLI](https://cloud.google.com/sdk/gcloud) +2. Log in to obtain application default credentials + ``` + gcloud auth application-default login + ``` +3. Check that the credentials are available + ``` + gcloud auth application-default print-access-token + ``` + +For more information, please see the [Google Cloud Storage backend](../../backends/gcs) +{{}} + +{{}} +1. Install the [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli) +2. Log in with the Azure CLI + ``` + az login + ``` +3. Check the login status + ``` + az account show + ``` + +For more information, please see the [Azure Blob Storage](../../backends/azureblob) +{{}} +{{}} + # Quick Start ## Push data 1. Prepare your data. We put data in the folder `/tmp/artivc/workspace` @@ -43,10 +111,43 @@ brew install artivc 1. Init the workspace + {{}} + {{}} ```shell # in /tmp/artivc/workspace avc init /tmp/artivc/repo ``` + {{}} + + {{}} + ```shell + # in /tmp/artivc/workspace + avc init :path/to/repo + ``` + {{}} + + {{}} + ```shell + # in /tmp/artivc/workspace + avc init s3:///path/to/repo + ``` + {{}} + + {{}} + ```shell + # in /tmp/artivc/workspace + avc init gs:///path/to/repo + ``` + {{}} + + {{}} + ```shell + # in /tmp/artivc/workspace + avc init https://.blob.core.windows.net//path/to/repo + ``` + {{}} + + {{}} 1. Push the data ```shell @@ -83,10 +184,45 @@ brew install artivc ## Clone data from exisiting repository 1. 
Go to the folder to clone repository + + {{}} + {{}} ```shell cd /tmp/artivc/ avc clone /tmp/artivc/repo another-workspace ``` + {{}} + + {{}} + ```shell + cd /tmp/artivc/ + avc clone :path/to/repo + ``` + {{}} + + {{}} + ```shell + cd /tmp/artivc/ + avc clone s3:///path/to/repo + ``` + {{}} + + {{}} + ```shell + cd /tmp/artivc/ + avc clone gs:///path/to/repo + ``` + {{}} + + {{}} + ```shell + cd /tmp/artivc/ + avc clone https://.blob.core.windows.net//path/to/repo + ``` + {{}} + + {{}} + Then the workspace is created, and the data is downloaded. 1. See the commit log @@ -98,9 +234,39 @@ brew install artivc ## Download data 1. Download the latest version - ```shell - avc get -o /tmp/artivc/dl-latest /tmp/artivc/repo - ``` + + {{}} + {{}} + ```shell + avc get -o /tmp/artivc/dl-latest /tmp/artivc/repo + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-latest :path/to/repo + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-latest s3:///path/to/repo + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-latest gs:///path/to/repo + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-latest https://.blob.core.windows.net//path/to/repo + ``` + {{}} + + {{}} check the content ```shell @@ -108,11 +274,41 @@ brew install artivc ``` 1. 
Or download the specific version - ```shell - avc get -o /tmp/artivc/dl_v0.1.0 /tmp/artivc/repo@v0.1.0 - ``` + + {{}} + {{}} + ```shell + avc get -o /tmp/artivc/dl-v0.1.0 /tmp/artivc/repo@v0.1.0 + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-v0.1.0 :path/to/repo@v0.1.0 + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-v0.1.0 s3:///path/to/repo@v0.1.0 + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-v0.1.0 gs:///path/to/repo@v0.1.0 + ``` + {{}} + + {{}} + ```shell + avc get -o /tmp/artivc/dl-v0.1.0 https://.blob.core.windows.net//path/to/repo@v0.1.0 + ``` + {{}} + + {{}} check the content ```shell - ls /tmp/artivc/dl_v0.1.0 + ls /tmp/artivc/dl-v0.1.0 ``` \ No newline at end of file From 5683c00a799a7782a598cd8b7fb26ce99354d9ed Mon Sep 17 00:00:00 2001 From: popcorny Date: Tue, 12 Apr 2022 16:58:01 +0800 Subject: [PATCH 2/5] Update document Signed-off-by: popcorny Co-authored-by: wcchang --- docs/content/en/_index.md | 17 +++---- docs/content/en/usage/dryrun.md | 36 +++++++++++++++ .../content/en/{use-cases => usage}/expose.md | 4 +- docs/content/en/usage/ignore-file.md | 2 +- docs/content/en/usage/partial-download.md | 2 +- docs/content/en/use-cases/backup.md | 45 +++++++++++++++++++ docs/content/en/use-cases/dataprep.md | 12 ++++- docs/content/en/use-cases/experiment.md | 2 +- 8 files changed, 103 insertions(+), 17 deletions(-) create mode 100644 docs/content/en/usage/dryrun.md rename docs/content/en/{use-cases => usage}/expose.md (94%) create mode 100644 docs/content/en/use-cases/backup.md diff --git a/docs/content/en/_index.md b/docs/content/en/_index.md index 24e17b7..ca79f3b 100644 --- a/docs/content/en/_index.md +++ b/docs/content/en/_index.md @@ -9,10 +9,7 @@ geekdocAnchor: false {{< columns >}} ### ArtiVC (Artifact Version Control) is a version control system for large files. - -**rsync** is an ssh-based tool that provides fast incremental file transfer.
-**Rclone** is a rsync-like tool for cloud storage.
-**ArtiVC** is like Git for files versioning and like Rclone for cloud storage. +Do you need to backup your data regularly? Does keeping summarizing and organizing various dataset versions take up your day's productivity? ArtiVC is a handy command-line tool. With only one command, it helps you neatly snapshot your data and tidily switch among different versions of the data. Even better, it seamlessly integrates your existing cloud environment. ArtiVC supports three major cloud providers (AWS S3, Google Cloud Storage, Azure Blob Storage) or stores data in the remote filesystem using SSH. ArtiVC unleashes your performance on your most important jobs with no pain. <---> [![asciicast](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257.svg)](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257?autoplay=1) @@ -25,17 +22,17 @@ geekdocAnchor: false {{< columns >}} ### Data Versioning -Version your data like versioning code. ArtiVC supports commmit history, commit message, version tag. You can diff two commits, pull data from speciifc version. +Version your data like versioning code. ArtiVC supports commit history, commit message, and version tag. You can diff two commits, and pull data from the speciifc version. <---> ### Use your own storage -We are used to putting large files in NFS or S3. To use ArtiVC, you can keep put your files on the same storage without changes. +We are used to putting large files in NFS or S3. To use ArtiVC, you can keep putting your files on the same storage without changes. <---> -### No additional server required +### No additional server is required ArtiVC is a CLI tool. No server or gateway is required to install and operate. @@ -45,19 +42,19 @@ ArtiVC is a CLI tool. No server or gateway is required to install and operate. ### Multiple backends support -ArtiVC natively supports local filesystem, remote filesystem (by SSH), AWS S3, Google Cloud Storage, Azure Blob Storage as backend. 
And 40+ backends are supported through [Rclone](backends/rclone/) integration. [Learn more](backends/) +ArtiVC natively supports local filesystem, remote filesystem (by SSH), AWS S3, Google Cloud Storage, and Azure Blob Storage as backend. And 40+ backends are supported through [Rclone](backends/rclone/) integration. [Learn more](backends/) <---> ### Painless Configuration -No one like to configure. So we leverage the original configuraion as much as possible. Use `.ssh/config` for ssh access, and use `aws configure`, `gcloud auth application-default login`, `az login` for the cloud platforms. +No one likes configuring tools, so ArtiVC reuses your existing configuration as much as possible. Use `.ssh/config` for SSH access, and `aws configure`, `gcloud auth application-default login`, or `az login` for the cloud platforms. <---> ### Efficient storage and transfer -The file structure of repository is storage and transfer effiecntly by [design](design/how-it-works/). It prevents from storing duplicated content and minimum the number of files to upload when pushing a new version. [Learn more](design/benchmark/) +The repository's file structure is [designed](design/how-it-works/) for efficient storage and transfer. It avoids storing duplicated content and minimizes the number of files to upload when pushing a new version. [Learn more](design/benchmark/) {{< /columns >}} diff --git a/docs/content/en/usage/dryrun.md b/docs/content/en/usage/dryrun.md new file mode 100644 index 0000000..bad49bd --- /dev/null +++ b/docs/content/en/usage/dryrun.md @@ -0,0 +1,36 @@ +--- +title: Dry Run +weight: 11 +--- + +Pushing and pulling data is time-consuming. And need to be double-check before transfering. Dry run is the feature allows to list the changeset before sending. + + +## Push + +1. Dry run before pushing + ```shell + avc push --dry-run + ``` + +1. Do the actual push + ``` + avc push + ``` + +## Pull + +1. 
Dry run before pulling + ```shell + avc pull --dry-run + # or check in delete mode + # avc pull --delete --dry-run + ``` + +1. Do the actual pull + + ```shell + avc pull + # avc pull --delete + ``` + diff --git a/docs/content/en/use-cases/expose.md b/docs/content/en/usage/expose.md similarity index 94% rename from docs/content/en/use-cases/expose.md rename to docs/content/en/usage/expose.md index bedf874..02b9bb7 100644 --- a/docs/content/en/use-cases/expose.md +++ b/docs/content/en/usage/expose.md @@ -1,6 +1,6 @@ --- -title: Expose the dataset -weight: 3 +title: Expose the data +weight: 20 --- ArtiVC repository can be exposed as a http endpoint. In S3, we can just make the bucket and give the data consumer the http endpiont of the repository. In this way, we can download data through CDN or other reverse proxies. diff --git a/docs/content/en/usage/ignore-file.md b/docs/content/en/usage/ignore-file.md index 59126a9..0d03068 100644 --- a/docs/content/en/usage/ignore-file.md +++ b/docs/content/en/usage/ignore-file.md @@ -1,6 +1,6 @@ --- title: Ignore File -weight: 2 +weight: 12 --- Just like git, you can put a `.avcignore` file at the root of workspace to define the excluding list. The rule is the same as `.gitignore`. For more details, please check the [pattern format](https://git-scm.com/docs/gitignore#_pattern_format) in the git document. diff --git a/docs/content/en/usage/partial-download.md b/docs/content/en/usage/partial-download.md index d3dd08a..5ca9bd7 100644 --- a/docs/content/en/usage/partial-download.md +++ b/docs/content/en/usage/partial-download.md @@ -1,6 +1,6 @@ --- title: Partial Download -weight: 3 +weight: 13 --- By default, ArtiVC download all files of a version. It also supports to download partial of the files in a commit.
diff --git a/docs/content/en/use-cases/backup.md b/docs/content/en/use-cases/backup.md new file mode 100644 index 0000000..e237894 --- /dev/null +++ b/docs/content/en/use-cases/backup.md @@ -0,0 +1,45 @@ +--- +title: Data Backup/Snapshot +weight: 1 +--- + +Data backup is a common requirement in many scenarios. ArtiVC is a simple tool to back up, or even snapshot, your data in cloud storage. + +## Snapshot the data + +1. Init the repository + + ```shell + avc init s3://mybucket/mydocuments + ``` +1. Snapshot + + ``` + avc push + ``` +1. Optionally, tag the current snapshot as a version + ``` + avc tag '2022-Q1' + ``` + +## Rollback + +1. See the snapshot timeline + + ``` + avc log + ``` + +1. Roll back. Use `--delete` to delete local files that are not in the snapshot version. + + ``` + avc pull --delete 49175d02 + ``` + +## Get a file from a version + +1. Get a file from a given version + + ``` + avc pull 49175d02 -- path/to/my/file + ``` diff --git a/docs/content/en/use-cases/dataprep.md b/docs/content/en/use-cases/dataprep.md index 92e0adb..b563565 100644 --- a/docs/content/en/use-cases/dataprep.md +++ b/docs/content/en/use-cases/dataprep.md @@ -1,8 +1,16 @@ --- title: Dataset Preparation +weight: 2 --- -Dataset Preparation use case is the most commmon use case in ArtiVC. You can prepare the unstructured data and push commit to the remote frequently. +Organizing dataset can be a hassle, especially as data is constantly evolving. ArtiVC is the most suitable tool to organize the dataset. There are the following benefits. + +- No need to transfer files with the exisitng content. Even you rename or copy to different folder. ArtiVC knows they are the same content. It is common to move or keep the same images, videos when the dataset is evloving. +- Version tagging. If there is a stable version of dataset, we can tag a commit as the human-readable version. + +## Prepare a dataset + +Here are the common steps to prepare a dataset 1. 
Create a dataset folder and use subfolders as image labels 1. Initiate the workspace. @@ -41,7 +49,7 @@ Dataset Preparation use case is the most commmon use case in ArtiVC. You can pre avc log ``` -## Clone by other users +## Clone the dataset Use the dataset in the other machine diff --git a/docs/content/en/use-cases/experiment.md b/docs/content/en/use-cases/experiment.md index d595d9e..7129995 100644 --- a/docs/content/en/use-cases/experiment.md +++ b/docs/content/en/use-cases/experiment.md @@ -1,6 +1,6 @@ --- title: ML Experiments -weight: 2 +weight: 3 --- Here we use three repositories - Dataset for training From 69296671998d005ecb38467903d15443d230017a Mon Sep 17 00:00:00 2001 From: popcorny Date: Tue, 12 Apr 2022 17:22:10 +0800 Subject: [PATCH 3/5] Update README and landing page Signed-off-by: popcorny Co-authored-by: wcchang --- README.md | 21 +++++++-------------- docs/content/en/_index.md | 5 +++-- 2 files changed, 10 insertions(+), 16 deletions(-) diff --git a/README.md b/README.md index 0cfdc43..3e65f4a 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,6 @@ # ArtiVC -[ArtiVC](https://artivc.io/) (**Arti**facts **V**ersion **C**ontrol) is a version control system for large files. - -To store and share large files, we may use NFS or object storage (e.g. s3, MinIO). However, if we would like to do versioning on top of them, it is not a trivial thing. ArtiVC is a CLI tool to enable you to version files on your storage without pain. You don't need to install any additional server or gateway and we turn your storage into the versioned repository. +[ArtiVC](https://artivc.io/) (**Arti**facts **V**ersion **C**ontrol) is a handy command-line tool for data versioning on cloud storage. With only one command, it helps you neatly snapshot your data and switch data between versions. Even better, it seamlessly integrates with your existing cloud environment.
ArtiVC supports three major cloud providers (AWS S3, Google Cloud Storage, Azure Blob Storage) and the remote filesystem using SSH. [![asciicast](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257.svg)](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257?autoplay=1) @@ -10,17 +8,12 @@ Try it out from the [Getting Started](https://artivc.io/usage/getting-started/) # Features -- **Use your own storage**: If you store data in NFS or S3, just use the storage you already use. -- **No additional server required**: ArtiVC is a CLI tool. No server or gateway is required to install or operate. -- **Multiple backend support**: Currently, we support local, NFS (by local repo), and s3. And more in the future - -- **Reproducible**: A commit is stored in a single file and cannot be changed. There is no way to add/remove/modify a single file in a commit. -- **Expose your data publicly**: Expose your repository with a public HTTP endpoint, then you can download your data in this way - ``` - avc get -o /tmp/dataset https://mybucket.s3.ap-northeast-1.amazonaws.com/path/to/my/data@v0.1.0 - ``` -- **Smart storage and transfer**: For the same content of files, there is only one instance stored in the artifact repository. If a file has been uploaded by other commits, no upload is required because we know the file is already there in the repository. Under the hood, we use [content-addressable storage](https://en.wikipedia.org/wiki/Content-addressable_storage) to put the objects. - +- **Data Versioning**: Version your data like versioning code. ArtiVC supports commit history, commit message, and version tag. You can diff two commits, and pull data from the speciifc version. +- **Use your own storage**: We are used to putting large files in NFS or S3. To use ArtiVC, you can keep putting your files on the same storage without changes. +- **No additional server is required**: ArtiVC is a CLI tool. No server or gateway is required to install and operate. 
+- **Multiple backends support**: ArtiVC natively supports local filesystem, remote filesystem (by SSH), AWS S3, Google Cloud Storage, and Azure Blob Storage as backend. And 40+ backends are supported through [Rclone](https://artivc.io/backends/rclone/) integration. [Learn more](https://artivc.io/backends/) +- **Painless Configuration**: No one likes configuring tools, so ArtiVC reuses your existing configuration as much as possible. Use `.ssh/config` for SSH access, and `aws configure`, `gcloud auth application-default login`, or `az login` for the cloud platforms. +- **Efficient storage and transfer**: The repository's file structure is [designed](https://artivc.io/design/how-it-works/) for efficient storage and transfer. It avoids storing duplicated content and minimizes the number of files to upload when pushing a new version. [Learn more](https://artivc.io/design/benchmark/) # Documentation diff --git a/docs/content/en/_index.md b/docs/content/en/_index.md index ca79f3b..9057078 100644 --- a/docs/content/en/_index.md +++ b/docs/content/en/_index.md @@ -7,9 +7,10 @@ geekdocAnchor: false --- {{< columns >}} -### ArtiVC (Artifact Version Control) is a version control system for large files. -Do you need to backup your data regularly? Does keeping summarizing and organizing various dataset versions take up your day's productivity? ArtiVC is a handy command-line tool. With only one command, it helps you neatly snapshot your data and tidily switch among different versions of the data. Even better, it seamlessly integrates your existing cloud environment. ArtiVC supports three major cloud providers (AWS S3, Google Cloud Storage, Azure Blob Storage) or stores data in the remote filesystem using SSH. ArtiVC unleashes your performance on your most important jobs with no pain. +

+ArtiVC (Artifact Version Control) is a handy command-line tool for data versioning on cloud storage. With only one command, it helps you neatly snapshot your data and switch data between versions. Even better, it seamlessly integrates with your existing cloud environment. ArtiVC supports three major cloud providers (AWS S3, Google Cloud Storage, Azure Blob Storage) and the remote filesystem using SSH. +

<---> [![asciicast](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257.svg)](https://asciinema.org/a/6JEhzpJ5QMiSkiC74s5CyT257?autoplay=1) From 70c465ba7c4f1c703e63833371ab4b0e88de05b3 Mon Sep 17 00:00:00 2001 From: popcorny Date: Tue, 12 Apr 2022 17:27:43 +0800 Subject: [PATCH 4/5] Update the benchmark images Signed-off-by: popcorny Co-authored-by: wcchang --- docs/content/en/design/images/benchmark1.svg | 2 +- docs/content/en/design/images/benchmark2.svg | 2 +- docs/content/en/design/images/benchmark3.svg | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/content/en/design/images/benchmark1.svg b/docs/content/en/design/images/benchmark1.svg index 60de252..2ea6789 100644 --- a/docs/content/en/design/images/benchmark1.svg +++ b/docs/content/en/design/images/benchmark1.svg @@ -1 +1 @@ - \ No newline at end of file + \ No newline at end of file diff --git a/docs/content/en/design/images/benchmark2.svg b/docs/content/en/design/images/benchmark2.svg index 55d879f..aad5438 100644 --- a/docs/content/en/design/images/benchmark2.svg +++ b/docs/content/en/design/images/benchmark2.svg @@ -1 +1 @@ - \ No newline at end of file + \ No newline at end of file diff --git a/docs/content/en/design/images/benchmark3.svg b/docs/content/en/design/images/benchmark3.svg index 8e89f18..ba5a93b 100644 --- a/docs/content/en/design/images/benchmark3.svg +++ b/docs/content/en/design/images/benchmark3.svg @@ -1 +1 @@ - \ No newline at end of file + \ No newline at end of file From 0f17b3f5a9770fbcbeca0a648473998959bc8d37 Mon Sep 17 00:00:00 2001 From: "Wei-Chun, Chang" Date: Tue, 12 Apr 2022 17:50:34 +0800 Subject: [PATCH 5/5] fix typo Signed-off-by: Wei-Chun, Chang --- README.md | 2 +- docs/content/en/_index.md | 2 +- docs/content/en/usage/dryrun.md | 2 +- docs/content/en/usage/expose.md | 2 +- docs/content/en/usage/getting-started.md | 2 +- docs/content/en/use-cases/dataprep.md | 4 ++-- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md 
index 3e65f4a..7df4974 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ Try it out from the [Getting Started](https://artivc.io/usage/getting-started/) # Features -- **Data Versioning**: Version your data like versioning code. ArtiVC supports commit history, commit message, and version tag. You can diff two commits, and pull data from the speciifc version. +- **Data Versioning**: Version your data the way you version code. ArtiVC supports commit history, commit messages, and version tags. You can diff two commits and pull data from a specific version. - **Use your own storage**: We are used to putting large files in NFS or S3. To use ArtiVC, you can keep putting your files on the same storage without changes. - **No additional server is required**: ArtiVC is a CLI tool. No server or gateway is required to install and operate. - **Multiple backends support**: ArtiVC natively supports local filesystem, remote filesystem (by SSH), AWS S3, Google Cloud Storage, and Azure Blob Storage as backend. And 40+ backends are supported through [Rclone](https://artivc.io/backends/rclone/) integration. [Learn more](https://artivc.io/backends/) diff --git a/docs/content/en/_index.md b/docs/content/en/_index.md index 9057078..f53b1b5 100644 --- a/docs/content/en/_index.md +++ b/docs/content/en/_index.md @@ -23,7 +23,7 @@ geekdocAnchor: false {{< columns >}} ### Data Versioning -Version your data like versioning code. ArtiVC supports commit history, commit message, and version tag. You can diff two commits, and pull data from the speciifc version. +Version your data the way you version code. ArtiVC supports commit history, commit messages, and version tags. You can diff two commits and pull data from a specific version.
<---> diff --git a/docs/content/en/usage/dryrun.md b/docs/content/en/usage/dryrun.md index bad49bd..2879050 100644 --- a/docs/content/en/usage/dryrun.md +++ b/docs/content/en/usage/dryrun.md @@ -3,7 +3,7 @@ title: Dry Run weight: 11 --- -Pushing and pulling data is time-consuming. And need to be double-check before transfering. Dry run is the feature allows to list the changeset before sending. +Pushing and pulling data is time-consuming, so changes should be double-checked before transferring. A dry run lists the changeset without sending any data. ## Push diff --git a/docs/content/en/usage/expose.md b/docs/content/en/usage/expose.md index 02b9bb7..0e1e257 100644 --- a/docs/content/en/usage/expose.md +++ b/docs/content/en/usage/expose.md @@ -3,7 +3,7 @@ title: Expose the data weight: 20 --- -ArtiVC repository can be exposed as a http endpoint. In S3, we can just make the bucket and give the data consumer the http endpiont of the repository. In this way, we can download data through CDN or other reverse proxies. +An ArtiVC repository can be exposed as an HTTP endpoint. In S3, we can just make the bucket public and give the data consumer the HTTP endpoint of the repository. In this way, data can be downloaded through a CDN or other reverse proxies. 1. [Make your S3 bucket public](https://aws.amazon.com/premiumsupport/knowledge-center/read-access-objects-s3-bucket/?nc1=h_ls) 1. Copy the public URL of your repository. For example diff --git a/docs/content/en/usage/getting-started.md b/docs/content/en/usage/getting-started.md index 3dc3097..0a8112f 100644 --- a/docs/content/en/usage/getting-started.md +++ b/docs/content/en/usage/getting-started.md @@ -182,7 +182,7 @@ For more information, please see the [Azure Blob Storage](../../backends/azurebl avc log ``` -## Clone data from exisiting repository +## Clone data from an existing repository 1. 
Go to the folder to clone repository {{}} diff --git a/docs/content/en/use-cases/dataprep.md b/docs/content/en/use-cases/dataprep.md index b563565..b7c6086 100644 --- a/docs/content/en/use-cases/dataprep.md +++ b/docs/content/en/use-cases/dataprep.md @@ -5,7 +5,7 @@ weight: 2 Organizing dataset can be a hassle, especially as data is constantly evolving. ArtiVC is the most suitable tool to organize the dataset. There are the following benefits. -- No need to transfer files with the exisitng content. Even you rename or copy to different folder. ArtiVC knows they are the same content. It is common to move or keep the same images, videos when the dataset is evloving. +- No need to transfer files whose content already exists in the repository, even if you rename them or copy them to a different folder; ArtiVC knows they have the same content. This matters since images and videos are often moved or kept unchanged as a dataset evolves. - Version tagging. If there is a stable version of dataset, we can tag a commit as the human-readable version. ## Prepare a dataset @@ -29,7 +29,7 @@ Here are the common steps to prepare a dataset # Push avc push -m 'my second version' ``` -1. If there are new version is pushed by others, sync the data set with remote +1. If others have pushed new versions, sync the dataset with the remote ```shell # Check the difference avc pull --dry-run