doc: setup CRAB documentation (#17)
dandansamax authored Aug 25, 2024
1 parent ac88118 commit 873f2ef
Showing 18 changed files with 341 additions and 151 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,29 @@
name: Build and deploy CRAB documents
on:
  push:
    branches: [ "main" ]
  workflow_dispatch:
permissions:
  contents: write
jobs:
  docs:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python environment and install dependencies
        uses: ./.github/actions/camel_install
        with:
          python-version: "3.10"
      - name: Sphinx build
        run: |
          cd docs
          poetry run make html
      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main'}}
        with:
          publish_branch: gh-pages
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: docs/_build/html/
          force_orphan: true
2 changes: 1 addition & 1 deletion crab-benchmark-v0/README.md
@@ -13,7 +13,7 @@ Our benchmark contains two important parts: **Environments** and **Tasks**.
Since our Ubuntu environment is built upon KVM, setting it up locally requires you an experienced Linux user to deal with many small and miscellaneous issues. Therefore, we provide two environment setup methods:

* [Local setup](./docs/environment_local_setup.md) provides you a step-by-step guideline to build environments on a Linux Machine with **at least one monitor and 32G memory**, but it doesn't cover details like how to install KVM on your machine because they are various on different Linux distros.
* For those who want a quicker setup, we also provide a setup through [Google Clould Platform](./docs/environment_gcp_setup.md). Specifically, we publish a disk image contains all required softwares and configurations on google cloud, you can use your own google account to create a cloud computer through this disk image and use [google remote desktop](https://remotedesktop.google.com/access/) to connect to it. This method doesn't have any hardware limitations and when you set it up you can run the experiment immediately. As a tradeoff, the cloud computer that meets the minimum hardware requirement costs around $0.4 per hour (depend on the machine zone).
* For those who want a quicker setup, we also provide a setup through [Google Clould Platform](./docs/environment_gcp_setup.md). Specifically, we publish a disk image contains all required software and configurations on google cloud, you can use your own google account to create a cloud computer through this disk image and use [google remote desktop](https://remotedesktop.google.com/access/) to connect to it. This method doesn't have any hardware limitations and when you set it up you can run the experiment immediately. As a tradeoff, the cloud computer that meets the minimum hardware requirement costs around $0.4 per hour (depend on the machine zone).

We connect to the Android environment via ADB, so any Android device, from an emulator to a physical smartphone, will work. You should ensure ADB is installed on your system and can be directly called through the command line. In our experiment, we used the built-in emulator of [Android Studio](https://developer.android.com/studio) to create a Google Pixel 8 Pro virtual device with the release name \textit{R} and installed necessary extra Apps.

Binary file added docs/_static/CRAB_logo1.png
File renamed without changes
File renamed without changes
Binary file added docs/_static/favicon.png
17 changes: 13 additions & 4 deletions docs/conf.py
@@ -29,18 +29,20 @@
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'Crab'
project = 'CRAB'
copyright = '2024, CAMEL-AI.org'
author = 'CAMEL-AI.org'
release = '0.1.0'
version = '0.1'
release = '0.1.2'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'sphinx.ext.napoleon'
    'sphinx.ext.napoleon',
    'myst_parser',
]

templates_path = ['_templates']
@@ -51,5 +53,12 @@
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
html_theme = 'sphinx_book_theme'
html_favicon = '_static/favicon.png'
html_static_path = ['_static']
html_logo = "_static/CRAB_logo1.png"
html_title = "CRAB Documentation"
html_theme_options = {
    "repository_url": "https://github.com/camel-ai/crab",
    "use_repository_button": True,
}
24 changes: 0 additions & 24 deletions docs/crab.benchmarks.rst
@@ -4,14 +4,6 @@ crab.benchmarks package
Submodules
----------

crab.benchmarks.android module
------------------------------

.. automodule:: crab.benchmarks.android
   :members:
   :undoc-members:
   :show-inheritance:

crab.benchmarks.template module
-------------------------------

@@ -20,22 +12,6 @@ crab.benchmarks.template module
   :undoc-members:
   :show-inheritance:

crab.benchmarks.ubuntu module
-----------------------------

.. automodule:: crab.benchmarks.ubuntu
   :members:
   :undoc-members:
   :show-inheritance:

crab.benchmarks.ubuntu\_android module
--------------------------------------

.. automodule:: crab.benchmarks.ubuntu_android
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

File renamed without changes
File renamed without changes
@@ -1,3 +1,5 @@
# Google Cloud Platform setup

## Setup and Start the VM Instance

The development image is hosted in the project `capable-vista-420022` with image name `crab-benchmark-v0-1`.
@@ -1,3 +1,5 @@
# Local setup

## Install CRAB

First you should install `poetry`, a modern python dependency management tool.
30 changes: 30 additions & 0 deletions docs/crab_benchmark_v0/get_started.md
@@ -0,0 +1,30 @@
# Get started

`crab-benchmark-v0` is a benchmark released with the CRAB framework to demonstrate its standard usage. It includes two virtual machine environments, an Android smartphone and an Ubuntu desktop computer, with 100 tasks and 59 different evaluator functions in the dataset. It evaluates how well MLM-based agents perform real-world tasks across multiple platforms.

## Concept

Our benchmark contains two important parts: **Environments** and **Tasks**.

#### Environment

Since our Ubuntu environment is built upon KVM, setting it up locally requires an experienced Linux user who can deal with many small, miscellaneous issues. Therefore, we provide two environment setup methods:

* [Local setup](./environment_local_setup.md) provides a step-by-step guide to building the environments on a Linux machine with **at least one monitor and 32G memory**, but it doesn't cover details such as how to install KVM, because these vary across Linux distros.
* For those who want a quicker setup, we also provide a setup through [Google Cloud Platform](./environment_gcp_setup.md). Specifically, we publish a disk image on Google Cloud that contains all required software and configurations. You can use your own Google account to create a cloud computer from this disk image and use [Google Remote Desktop](https://remotedesktop.google.com/access/) to connect to it. This method places no hardware requirements on your own machine, and once it is set up you can run the experiment immediately. As a tradeoff, a cloud computer that meets the minimum hardware requirement costs around $0.4 per hour (depending on the machine zone).

We connect to the Android environment via ADB, so any Android device, from an emulator to a physical smartphone, will work. You should ensure ADB is installed on your system and can be called directly from the command line. In our experiment, we used the built-in emulator of [Android Studio](https://developer.android.com/studio) to create a Google Pixel 8 Pro virtual device with the release name *R* and installed the necessary extra apps.
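
As a quick sanity check for the ADB requirement, the following is a minimal Python sketch that is not part of CRAB itself. It assumes only the standard `adb devices` command; the helper names `adb_available`, `parse_adb_devices`, and `list_adb_devices` are hypothetical and chosen for illustration.

```python
import shutil
import subprocess


def adb_available() -> bool:
    """Return True if an `adb` executable can be found on PATH."""
    return shutil.which("adb") is not None


def parse_adb_devices(output: str) -> list[str]:
    """Extract serials of devices in the `device` state from `adb devices` output."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device lines look like "<serial>\t<state>"; the header and any
        # daemon start-up messages do not match this two-column shape.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_adb_devices() -> list[str]:
    """Run `adb devices` and return the serials of ready devices."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

If `adb_available()` returns `False`, ADB is not on your `PATH`. If it returns `True` but `list_adb_devices()` is empty, the emulator or phone is probably not running or not authorized yet.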

#### Task

We manage our task dataset using a CRAB-recommended method. Sub-tasks are defined through Pydantic models written in Python code, and composed tasks are defined in JSON format, typically combining several sub-tasks. The sub-tasks are defined in [android_subtasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/android_subtasks.py) and [ubuntu_subtasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/ubuntu_subtasks.py). The JSON files storing composed tasks are categorized into [android](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/android/), [ubuntu](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/ubuntu/), and [cross-platform](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/cross/). The tasks in the android and ubuntu directories are single-environment tasks, and those in the cross directory are cross-environment tasks. Additionally, we create several tasks by hand instead of composing sub-tasks, to provide semantically more meaningful tasks; these are found in [handmade tasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/handmade_tasks.py).
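
To make the split between Python-defined sub-tasks and JSON-defined composed tasks more concrete, here is a rough sketch of the idea. It is not CRAB's actual schema: it uses standard-library dataclasses instead of Pydantic to stay dependency-free, and the field names (`id`, `description`, `subtasks`), the registry, and the JSON layout are all hypothetical. See the linked dataset files for the real definitions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    """Hypothetical stand-in for a sub-task; CRAB itself uses Pydantic models."""

    id: str
    description: str


# Hypothetical registry of sub-tasks defined in Python code.
SUBTASKS = {
    "open_app": SubTask("open_app", "Open the 'Tasks' app on Android."),
    "read_first_task": SubTask("read_first_task", "Read the first incomplete task."),
}

# Hypothetical composed task stored as JSON: an ordered list of sub-task IDs.
COMPOSED_TASK_JSON = (
    '{"id": "example-task", "subtasks": ["open_app", "read_first_task"]}'
)


def resolve_composed_task(raw_json: str, registry: dict[str, SubTask]) -> list[SubTask]:
    """Look up every sub-task referenced by a composed task, preserving order."""
    spec = json.loads(raw_json)
    missing = [ref for ref in spec["subtasks"] if ref not in registry]
    if missing:
        raise KeyError(f"unknown sub-task IDs: {missing}")
    return [registry[ref] for ref in spec["subtasks"]]
```

Keeping sub-tasks in code and compositions in data means a new composed task can be added by editing JSON only, as long as it references sub-tasks that already exist.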

## Experiment

After setting up the environment, you can start the experiment. A brief overview of the experiment is as follows:

1. Open the Ubuntu environment virtual machine and the Android environment emulator.
2. Start the CRAB server in the Ubuntu environment and get its IP address and port. Let's say they are `192.168.122.72` and `8000`.
3. Choose a task. As an example, we take the task with ID `a3476778-e512-40ca-b1c0-d7aab0c7f18b` from [handmade_tasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/handmade_tasks.py). The task is: "Open the 'Tasks' app on Android, check the first incomplete task, then perform the task according to its description."
4. Run [main.py](./main.py) with the command `poetry run python -m crab-benchmark-v0.main --model gpt4o --policy single --remote-url http://192.168.122.72:8000 --task-id a3476778-e512-40ca-b1c0-d7aab0c7f18b`. In this command, `--model gpt4o` and `--policy single` determine the agent system, `--remote-url` specifies the Ubuntu environment interface, and `--task-id` indicates the task to be performed.
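
If you want to launch several tasks from a script, step 4 can be sketched in Python as follows. The flags are the ones shown in the command above; the `build_command` helper is hypothetical, and the snippet only assembles the argument list rather than assuming anything about how `main.py` behaves.

```python
import shlex


def build_command(model: str, policy: str, remote_url: str, task_id: str) -> list[str]:
    """Assemble the benchmark command from step 4 as an argument list."""
    return [
        "poetry", "run", "python", "-m", "crab-benchmark-v0.main",
        "--model", model,
        "--policy", policy,
        "--remote-url", remote_url,
        "--task-id", task_id,
    ]


command = build_command(
    model="gpt4o",
    policy="single",
    remote_url="http://192.168.122.72:8000",  # example address from step 2
    task_id="a3476778-e512-40ca-b1c0-d7aab0c7f18b",
)
# Print the shell form for inspection; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` from the repository root would start the run; building the list first keeps the arguments free of shell-quoting issues.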

@@ -1,3 +1,5 @@
# Build your own benchmark

## Overview

![](../assets/benchmark_config.png)
@@ -1,4 +1,4 @@
# Getting Started with the Benchmark Class
# Quickstart

The `Benchmark` class is a comprehensive framework for evaluating language model agents across various tasks and environments. It provides a flexible structure to manage multiple environments and tasks, offering single and multi-environment execution modes.

19 changes: 18 additions & 1 deletion docs/index.rst
@@ -6,9 +6,26 @@
Welcome to Crab's documentation!
================================

.. toctree::
   :maxdepth: 1
   :caption: Get Started with CRAB:
   :name: get_started

   get_started/quickstart.md
   get_started/build_your_own_benchmark.md

.. toctree::
   :maxdepth: 1
   :caption: CRAB Benchmark-v0:
   :name: crab_benchmark_v0

   crab_benchmark_v0/get_started.md
   crab_benchmark_v0/environment_gcp_setup.md
   crab_benchmark_v0/environment_local_setup.md

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :caption: API Reference:

   modules
