doc: setup CRAB documentation (#17)

camel-ai · Aug 25, 2024 · 873f2ef
1 parent ac88118

Showing 18 changed files with 341 additions and 151 deletions.
diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml
@@ -0,0 +1,29 @@
+name: Build and deploy CRAB documents
+on:
+  push:
+    branches: [ "main" ]
+  workflow_dispatch:
+permissions:
+    contents: write
+jobs:
+  docs:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python environment and install dependencies
+      uses: ./.github/actions/camel_install
+      with:
+        python-version: "3.10"
+    - name: Sphinx build
+      run: |
+        cd docs
+        poetry run make html
+    - name: Deploy
+      uses: peaceiris/actions-gh-pages@v3
+      if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main'}}
+      with:
+        publish_branch: gh-pages
+        github_token: ${{ secrets.GITHUB_TOKEN }}
+        publish_dir: docs/_build/html/
+        force_orphan: true
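The `Deploy` step's `if:` expression means a manual `workflow_dispatch` run still builds the HTML but skips publishing to `gh-pages`. The minimal sketch below mirrors that gating logic in Python, using only the two context values the expression reads. The function name `should_deploy` is hypothetical. The case-insensitive comparison reflects the GitHub Actions rule that string comparisons in expressions ignore case.

```python
def should_deploy(event_name: str, ref: str) -> bool:
    """Mirror of the workflow expression:
    github.event_name == 'push' && github.ref == 'refs/heads/main'
    """
    # GitHub Actions expressions compare strings case-insensitively.
    return event_name.casefold() == "push" and ref.casefold() == "refs/heads/main"


if __name__ == "__main__":
    # A push to main builds and deploys.
    print(should_deploy("push", "refs/heads/main"))  # True
    # A manual run on main builds the docs but does not deploy them.
    print(should_deploy("workflow_dispatch", "refs/heads/main"))  # False
```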
diff --git a/crab-benchmark-v0/README.md b/crab-benchmark-v0/README.md
@@ -13,7 +13,7 @@ Our benchmark contains two important parts: **Environments** and **Tasks**.
 Since our Ubuntu environment is built upon KVM, setting it up locally requires you an experienced Linux user to deal with many small and miscellaneous issues. Therefore, we provide two environment setup methods:
 
 * [Local setup](./docs/environment_local_setup.md) provides you a step-by-step guideline to build environments on a Linux Machine with **at least one monitor and 32G memory**, but it doesn't cover details like how to install KVM on your machine because they are various on different Linux distros.
-* For those who want a quicker setup, we also provide a setup through [Google Clould Platform](./docs/environment_gcp_setup.md). Specifically, we publish a disk image contains all required softwares and configurations on google cloud, you can use your own google account to create a cloud computer through this disk image and use [google remote desktop](https://remotedesktop.google.com/access/) to connect to it. This method doesn't have any hardware limitations and when you set it up you can run the experiment immediately. As a tradeoff, the cloud computer that meets the minimum hardware requirement costs around $0.4 per hour (depend on the machine zone).
+* For those who want a quicker setup, we also provide a setup through [Google Cloud Platform](./docs/environment_gcp_setup.md). Specifically, we publish a disk image containing all required software and configurations on Google Cloud. You can use your own Google account to create a cloud computer from this disk image and connect to it with [Google Remote Desktop](https://remotedesktop.google.com/access/). This method has no local hardware requirements, and once it is set up you can run the experiment immediately. As a tradeoff, a cloud computer that meets the minimum hardware requirements costs around $0.4 per hour (depending on the machine zone).
 
 We connect to the Android environment via ADB, so any Android device, from an emulator to a physical smartphone, will work. You should ensure ADB is installed on your system and can be directly called through the command line. In our experiment, we used the built-in emulator of [Android Studio](https://developer.android.com/studio) to create a Google Pixel 8 Pro virtual device with the release name \textit{R} and installed necessary extra Apps.
 

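The README requires ADB to be callable from the command line. The following is a minimal sketch for checking this before an experiment. It assumes the usual `adb devices` output format: a `List of devices attached` header followed by one `serial<TAB>state` line per device. The helper names are hypothetical and not part of CRAB.

```python
import shutil
import subprocess


def parse_adb_devices(output: str) -> dict[str, str]:
    """Map device serials to their state from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and "* daemon ..." status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def connected_devices() -> list[str]:
    """Return serials of devices in the ready (`device`) state."""
    adb = shutil.which("adb")
    if adb is None:
        raise RuntimeError("adb is not on PATH")
    result = subprocess.run([adb, "devices"], capture_output=True, text=True, check=True)
    states = parse_adb_devices(result.stdout)
    return [serial for serial, state in states.items() if state == "device"]


if __name__ == "__main__":
    sample = "List of devices attached\nemulator-5554\tdevice\n\n"
    print(parse_adb_devices(sample))  # {'emulator-5554': 'device'}
```

A device listed as `unauthorized` or `offline` is excluded by `connected_devices`, so an empty result usually means the emulator is not running or has not accepted the host's ADB key.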
diff --git a/docs/_static/CRAB_logo1.png b/docs/_static/CRAB_logo1.png
diff --git a/assets/benchmark_config.png b/docs/_static/benchmark_config.png
diff --git a/assets/crab_overview.png b/docs/_static/crab_overview.png
diff --git a/docs/_static/favicon.png b/docs/_static/favicon.png
diff --git a/docs/conf.py b/docs/conf.py
@@ -29,18 +29,20 @@
 # -- Project information -----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 
-project = 'Crab'
+project = 'CRAB'
 copyright = '2024, CAMEL-AI.org'
 author = 'CAMEL-AI.org'
-release = '0.1.0'
+version = '0.1'
+release = '0.1.2'
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 
 extensions = [
     'sphinx.ext.autodoc',
     'sphinx.ext.viewcode',
-    'sphinx.ext.napoleon'
+    'sphinx.ext.napoleon',
+    'myst_parser',
 ]
 
 templates_path = ['_templates']
@@ -51,5 +53,12 @@
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 
-html_theme = 'sphinx_rtd_theme'
+html_theme = 'sphinx_book_theme'
+html_favicon = '_static/favicon.png'
 html_static_path = ['_static']
+html_logo = "_static/CRAB_logo1.png"
+html_title = "CRAB Documentation"
+html_theme_options = {
+    "repository_url": "https://github.com/camel-ai/crab",
+    "use_repository_button": True,
+}
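In Sphinx, `version` is conventionally the short X.Y version and `release` the full version string. The two values set above therefore have to be kept in sync by hand. One common alternative is to derive `version` from `release` inside `conf.py`. It is sketched here as an assumption, not as what this commit does, and `short_version` is a hypothetical helper.

```python
# conf.py excerpt (sketch): keep a single source of truth for the version.
release = '0.1.2'


def short_version(full: str) -> str:
    """Return the 'X.Y' part of an 'X.Y.Z...' release string."""
    return '.'.join(full.split('.')[:2])


version = short_version(release)  # '0.1'
```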
diff --git a/docs/crab.benchmarks.rst b/docs/crab.benchmarks.rst
@@ -4,14 +4,6 @@ crab.benchmarks package
 Submodules
 ----------
 
-crab.benchmarks.android module
-------------------------------
-
-.. automodule:: crab.benchmarks.android
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 crab.benchmarks.template module
 -------------------------------
 
@@ -20,22 +12,6 @@ crab.benchmarks.template module
    :undoc-members:
    :show-inheritance:
 
-crab.benchmarks.ubuntu module
------------------------------
-
-.. automodule:: crab.benchmarks.ubuntu
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
-crab.benchmarks.ubuntu\_android module
---------------------------------------
-
-.. automodule:: crab.benchmarks.ubuntu_android
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 Module contents
 ---------------
 

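After this change, the submodule section of `crab.benchmarks.rst` documents only `crab.benchmarks.template`. A small helper like the hypothetical `automodule_targets` below, which is not part of the repository, can list the modules an `.rst` file pulls in through `.. automodule::`. This makes it easy to confirm removals like this one.

```python
import re

# Matches lines such as ".. automodule:: crab.benchmarks.template"
AUTOMODULE_RE = re.compile(r"^[ \t]*\.\.[ \t]+automodule::[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def automodule_targets(rst_text: str) -> list[str]:
    """Return module names referenced by automodule directives, in order."""
    return AUTOMODULE_RE.findall(rst_text)


if __name__ == "__main__":
    sample = """crab.benchmarks.template module
-------------------------------

.. automodule:: crab.benchmarks.template
   :members:
"""
    print(automodule_targets(sample))  # ['crab.benchmarks.template']
```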
diff --git a/crab-benchmark-v0/docs/assets/android_1.png b/docs/crab_benchmark_v0/assets/android_1.png
diff --git a/crab-benchmark-v0/docs/assets/android_2.png b/docs/crab_benchmark_v0/assets/android_2.png
diff --git a/...enchmark-v0/docs/environment_gcp_setup.md b/...rab_benchmark_v0/environment_gcp_setup.md
@@ -1,3 +1,5 @@
+# Google Cloud Platform setup
+
 ## Setup and Start the VM Instance
 
 The development image is hosted in the project `capable-vista-420022` with image name `crab-benchmark-v0-1`.
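The image named above can be used from the `gcloud` CLI through the `--image` and `--image-project` flags of `gcloud compute instances create`. The minimal sketch below only assembles the argument list. The instance name, zone, and machine type are hypothetical placeholders that you must choose yourself, because the setup guide's own values are not shown in this diff.

```python
import shlex

IMAGE_PROJECT = "capable-vista-420022"
IMAGE_NAME = "crab-benchmark-v0-1"


def create_instance_cmd(name: str, zone: str, machine_type: str) -> list[str]:
    """Build a `gcloud compute instances create` command for the CRAB image.

    name, zone, and machine_type are caller-chosen placeholders.
    """
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--image={IMAGE_NAME}",
        f"--image-project={IMAGE_PROJECT}",
    ]


if __name__ == "__main__":
    cmd = create_instance_cmd("crab-vm", "us-central1-a", "YOUR_MACHINE_TYPE")
    print(shlex.join(cmd))
```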

diff --git a/...chmark-v0/docs/environment_local_setup.md b/...b_benchmark_v0/environment_local_setup.md
@@ -1,3 +1,5 @@
+# Local setup
+
 ## Install CRAB
 
 First you should install `poetry`, a modern python dependency management tool.
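The local setup begins by installing `poetry`. The sketch below is a quick way to confirm that the command-line tools the guides rely on are available. The tool list (`poetry` from this guide, `adb` from the Android setup) and the helper name are assumptions for illustration.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["poetry", "adb"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools are on PATH.")
```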

diff --git a/docs/crab_benchmark_v0/get_started.md b/docs/crab_benchmark_v0/get_started.md
@@ -0,0 +1,30 @@
+# Get started
+
+`crab-benchmark-v0` is a benchmark released with the CRAB framework to demonstrate its standard usage. It includes two virtual machine environments, an Android smartphone and an Ubuntu desktop computer, with 100 tasks and 59 different evaluator functions in the dataset. It evaluates how well multimodal language model (MLM) agents perform real-world tasks across multiple platforms.
+
+## Concept
+
+Our benchmark contains two important parts: **Environments** and **Tasks**.
+
+### Environment
+
+Since our Ubuntu environment is built on KVM, setting it up locally requires an experienced Linux user who can handle many small, miscellaneous issues. Therefore, we provide two environment setup methods:
+
+* [Local setup](./environment_local_setup.md) provides a step-by-step guide to building the environments on a Linux machine with **at least one monitor and 32 GB of memory**. It doesn't cover details such as installing KVM, because these vary across Linux distros.
+* For those who want a quicker setup, we also provide a setup through [Google Cloud Platform](./environment_gcp_setup.md). Specifically, we publish a disk image containing all required software and configurations on Google Cloud. You can use your own Google account to create a cloud computer from this disk image and connect to it with [Google Remote Desktop](https://remotedesktop.google.com/access/). This method has no local hardware requirements, and once it is set up you can run the experiment immediately. As a tradeoff, a cloud computer that meets the minimum hardware requirements costs around $0.4 per hour (depending on the machine zone).
+
+We connect to the Android environment via ADB, so any Android device, from an emulator to a physical smartphone, will work. Make sure ADB is installed on your system and can be called directly from the command line. In our experiment, we used the built-in emulator of [Android Studio](https://developer.android.com/studio) to create a Google Pixel 8 Pro virtual device with the release name *R* and installed the necessary extra apps.
+
+### Task
+
+We manage our task dataset using a CRAB-recommended method. Sub-tasks are defined through Pydantic models written in Python code, and composed tasks, which typically combine several sub-tasks, are defined in JSON format. The sub-tasks are defined in [android_subtasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/android_subtasks.py) and [ubuntu_subtasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/ubuntu_subtasks.py). The JSON files storing composed tasks are categorized into [android](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/android/), [ubuntu](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/ubuntu/), and [cross-platform](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/cross/). The tasks in the android and ubuntu directories are single-environment tasks, and those in the cross directory are cross-environment tasks. Additionally, we created several tasks by hand instead of composing sub-tasks, to provide semantically more meaningful tasks; these are found in [handmade tasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/handmade_tasks.py).
+
+## Experiment
+
+After setting up the environment, you can start the experiment. A brief overview of the experiment is as follows:
+
+1. Open the Ubuntu environment virtual machine and the Android environment emulator.
+2. Start the CRAB server in the Ubuntu environment and get its IP address and port. Let's say they are `192.168.122.72` and `8000`.
+3. Choose a task. As an example, we take the task with ID `a3476778-e512-40ca-b1c0-d7aab0c7f18b` from [handmade_tasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/handmade_tasks.py). The task is: "Open the 'Tasks' app on Android, check the first incomplete task, then perform the task according to its description."
+4. Run [main.py](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/main.py) with the command `poetry run python -m crab-benchmark-v0.main --model gpt4o --policy single --remote-url http://192.168.122.72:8000 --task-id a3476778-e512-40ca-b1c0-d7aab0c7f18b`. In this command, `--model gpt4o` and `--policy single` determine the agent system, `--remote-url` specifies the Ubuntu environment interface, and `--task-id` indicates the task to be performed.
+
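Step 4 of the new `get_started.md` can also be scripted. The sketch below only assembles the documented command from its four parameters, using the example values from those steps. `build_main_cmd` is a hypothetical helper. Actually launching the command assumes you are in the repository root with the Poetry environment installed.

```python
import shlex


def build_main_cmd(model: str, policy: str, remote_url: str, task_id: str) -> list[str]:
    """Assemble the crab-benchmark-v0 main.py invocation from step 4."""
    return [
        "poetry", "run", "python", "-m", "crab-benchmark-v0.main",
        "--model", model,
        "--policy", policy,
        "--remote-url", remote_url,
        "--task-id", task_id,
    ]


if __name__ == "__main__":
    cmd = build_main_cmd(
        model="gpt4o",
        policy="single",
        remote_url="http://192.168.122.72:8000",
        task_id="a3476778-e512-40ca-b1c0-d7aab0c7f18b",
    )
    # Print a shell-ready string; pass `cmd` to subprocess.run to launch it.
    print(shlex.join(cmd))
```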
diff --git a/docs/build_your_own_benchmark.md b/docs/get_started/build_your_own_benchmark.md
@@ -1,3 +1,5 @@
+# Build your own benchmark
+
 ## Overview
 
 ![](../assets/benchmark_config.png)

diff --git a/docs/get_start_with_benchmark.md b/docs/get_started/quickstart.md
@@ -1,4 +1,4 @@
-# Getting Started with the Benchmark Class
+# Quickstart
 
 The `Benchmark` class is a comprehensive framework for evaluating language model agents across various tasks and environments. It provides a flexible structure to manage multiple environments and tasks, offering single and multi-environment execution modes.
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -6,9 +6,26 @@
 Welcome to Crab's documentation!
 ================================
 
+.. toctree::
+   :maxdepth: 1
+   :caption: Get Started with CRAB:
+   :name: get_started
+
+   get_started/quickstart.md
+   get_started/build_your_own_benchmark.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: CRAB Benchmark-v0:
+   :name: crab_benchmark_v0
+
+   crab_benchmark_v0/get_started.md
+   crab_benchmark_v0/environment_gcp_setup.md
+   crab_benchmark_v0/environment_local_setup.md
+
 .. toctree::
    :maxdepth: 2
-   :caption: Contents:
+   :caption: API Reference:
 
    modules
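The new `index.rst` adds two toctrees that reference Markdown files by path, and Sphinx warns when a toctree entry does not resolve to a document. A quick pre-build check can therefore catch typos in these paths. The minimal sketch below assumes entries are listed one per line after the directive's options, as in this file. It does not handle `Title <path>` entries or glob patterns, and `toctree_entries` and `missing_entries` are hypothetical helpers.

```python
from pathlib import Path


def toctree_entries(rst_text: str) -> list[str]:
    """Collect document entries from all `.. toctree::` blocks."""
    entries = []
    in_toctree = False
    for line in rst_text.splitlines():
        if line.strip() == ".. toctree::":
            in_toctree = True
            continue
        if not in_toctree or not line.strip():
            continue  # outside a toctree, or a blank separator line
        if not line[0].isspace():
            in_toctree = False  # an unindented line ends the block
            continue
        stripped = line.strip()
        if not stripped.startswith(":"):  # skip options like :maxdepth:
            entries.append(stripped)
    return entries


def missing_entries(index_path: Path) -> list[str]:
    """Return toctree entries that exist neither as given nor as .rst/.md."""
    root = index_path.parent
    missing = []
    for entry in toctree_entries(index_path.read_text(encoding="utf-8")):
        candidates = [root / entry, root / f"{entry}.rst", root / f"{entry}.md"]
        if not any(c.is_file() for c in candidates):
            missing.append(entry)
    return missing
```

Running `missing_entries(Path("docs/index.rst"))` from the repository root should return an empty list when every entry, including `modules`, resolves to a file.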