docs: update document structure. Add PR content (#171)

* Add main
* update docs
* rename and reorder
* update documents structure
* remove useless
* Update docker
microsoft · Aug 6, 2024 · dcf9cd8
1 parent 610abe3
commit dcf9cd8

Showing 10 changed files with 126 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -18,7 +18,7 @@
 
 # 🌟 Introduction
 
-![](docs/_static/scen.jpg)
+![Our focused scenario](docs/_static/scen.jpg)
 
 RDAgent aims to automate the most critical and valuable aspects of the industrial R&D process, and we begins with focusing on the data-driven scenarios to streamline the development of models and data. 
 Methodologically, we have identified a framework with two key components: 'R' for proposing new ideas and 'D' for implementing them.
@@ -114,7 +114,9 @@ In this project, we are aiming to build a Agent to automate Data-Driven R\&D tha
 
 ## 📈 Scenarios/Demos
 
-In the two key areas of data-driven scenarios, model implementation and data building, our system aims to serve two main roles: 🦾copilot and 🤖agent. The 🦾copilot follows human instructions to automate repetitive tasks. The 🤖agent, being more autonomous, actively proposes ideas for better results in the future.
+In the two key areas of data-driven scenarios, model implementation and data building, our system aims to serve two main roles: 🦾copilot and 🤖agent. 
+- The 🦾copilot follows human instructions to automate repetitive tasks. 
+- The 🤖agent, being more autonomous, actively proposes ideas for better results in the future.
 
 The supported scenarios are listed below:
 
@@ -164,7 +166,11 @@ We believe that the key to delivering high-quality solutions lies in the ability
 
 ## Research
 
-- We have implements agents equiped with  Evolvable Research ability to propose and refine ideas in our repo. [Demos](#📈 Scenarios/Demos) are released.
+In a data mining expert's daily research and development process, they propose a hypothesis (e.g., a model structure such as an RNN can capture patterns in time-series data), design experiments (e.g., financial data contains time series, so the hypothesis can be verified in this scenario), implement the experiment as code (e.g., a PyTorch model structure), and then execute the code to get feedback (e.g., metrics, loss curves). The experts learn from the feedback and improve in the next iteration.
+
+Based on these principles, we have established a basic framework that continuously proposes hypotheses, verifies them, and gets feedback from real-world practice. This is the first scientific research automation framework that supports linking with real-world verification.
+
+[Demos](#-scenariosdemos) have been released.
 
 ## Development
 
@@ -191,11 +197,9 @@ This project welcomes contributions and suggestions.
 You can find issues in the issues list or simply running `grep -r "TODO:"`.
 
 Making contributions is not a hard thing. Solving an issue(maybe just answering a question raised in issues list ), fixing/issuing a bug, improving the documents and even fixing a typo are important contributions to RDAgent.
-
-# Disclaimer
-**The RD-agent is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. The RD-agent is aimed to facilitate research and development process in the financial industry and not ready-to-use for any financial investment or advice. Users shall independently assess and test the risks of the RD-agent in a specific use scenario, ensure the responsible use of AI technology, including but not limited to developing and integrating risk mitigation measures, and comply with all applicable laws and regulations in all applicable jurisdictions. The RD-agent does not provide financial opinions or reflect the opinions of Microsoft, nor is it designed to replace the role of qualified financial professionals in formulating, assessing, and approving finance products. The inputs and outputs of the RD-agent belong to the users and users shall assume all liability under any theory of liability, whether in contract, torts, regulatory, negligence, products liability, or otherwise, associated with use of the RD-agent and any inputs and outputs thereof.**
-
 <img src="https://img.shields.io/github/contributors-anon/microsoft/RD-Agent"/>
 
 <a href="https://github.com/microsoft/RD-Agent/graphs/contributors"><img src="https://contrib.rocks/image?repo=microsoft/RD-Agent&max=240&columns=18" /></a>
 
+# Disclaimer
+**The RD-agent is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. The RD-agent is aimed to facilitate research and development process in the financial industry and not ready-to-use for any financial investment or advice. Users shall independently assess and test the risks of the RD-agent in a specific use scenario, ensure the responsible use of AI technology, including but not limited to developing and integrating risk mitigation measures, and comply with all applicable laws and regulations in all applicable jurisdictions. The RD-agent does not provide financial opinions or reflect the opinions of Microsoft, nor is it designed to replace the role of qualified financial professionals in formulating, assessing, and approving finance products. The inputs and outputs of the RD-agent belong to the users and users shall assume all liability under any theory of liability, whether in contract, torts, regulatory, negligence, products liability, or otherwise, associated with use of the RD-agent and any inputs and outputs thereof.**
diff --git a/docs/index.rst b/docs/index.rst
@@ -10,11 +10,12 @@ Welcome to RDAgent's documentation!
    :maxdepth: 3
    :caption: Doctree:
 
-   demo_and_introduction
+   introduction
    installation_and_configuration
    scens/catalog
    project_framework_introduction
-   research/research
+   ui
+   research/catalog
    development
    api_reference
    policy

diff --git a/docs/installation_and_configuration.rst b/docs/installation_and_configuration.rst
@@ -5,10 +5,13 @@ Installation and Configuration
 Installation
 ============
 
-For different scenarios
-- for purely users:
+**Install RDAgent**: the installation path depends on your scenario:
+
+- for users: run ``pip install rdagent`` to install RDAgent
 - for dev users: `See development <development.html>`_
 
+**Install Docker**: RDAgent is designed for research and development, acting like a human researcher and developer. It can write and run code in various environments, primarily using Docker for code execution. This keeps the remaining dependencies simple. Users must ensure Docker is installed before attempting most scenarios. Please refer to the `official 🐳Docker page <https://docs.docker.com/engine/install/>`_ for installation instructions.
+
 Configuration
 =============
 

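The Docker requirement in the new installation text can be checked before running a scenario. The snippet below is a minimal sketch in Python, assuming only that the `docker` CLI is on `PATH` and that `docker info` exits with status 0 when the daemon is reachable; `docker_available` is a hypothetical helper, not part of RDAgent.

```python
import shutil
import subprocess


def docker_available() -> bool:
    """Return True if a `docker` CLI is on PATH and the daemon responds.

    Hypothetical helper (not an RDAgent API); it assumes `docker info`
    exits with status 0 only when the daemon is reachable.
    """
    docker = shutil.which("docker")
    if docker is None:
        # No docker executable found on PATH.
        return False
    try:
        result = subprocess.run(
            [docker, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The CLI could not be executed or the daemon did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_available():
        print("Docker is installed and the daemon is reachable.")
    else:
        print("Docker not found or not running; see https://docs.docker.com/engine/install/")
```

Checking the daemon, not just the executable, catches the common case where Docker is installed but not started.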
diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -0,0 +1,18 @@
+=========================
+Introduction
+=========================
+
+
+
+In modern industry, research and development (R&D) is crucial for enhancing industrial productivity, especially in the AI era, where the core aspects of R&D focus mainly on data and models. We are committed to automating these high-value, generic R&D processes through our open-source R&D automation tool RDAgent, which lets AI drive data-driven AI.
+
+.. image:: _static/scen.jpg
+   :alt: Our focused scenario
+
+
+Our RDAgent is designed to automate the most critical industrial R&D processes, focusing first on data-driven scenarios, to greatly boost the development productivity of models and data. 
+
+Methodologically, we propose an autonomous agent framework that consists of two key parts: (R)esearch, which actively explores by proposing new ideas, and (D)evelopment, which realizes these ideas. Both components ultimately receive feedback from practice, so research and development capabilities can continuously learn and grow in the process.
+
+
+For a quick start, visit `our GitHub home page <https://github.com/microsoft/RD-Agent>`_ ⚡. If you've already checked it out and want more details, please keep reading.
diff --git a/docs/project_framework_introduction.rst b/docs/project_framework_introduction.rst
@@ -12,11 +12,15 @@ Framework & Components
 
 The image above shows the overall framework of RDAgent.
 
+In a data mining expert's daily research and development process, they propose a hypothesis (e.g., a model structure such as an RNN can capture patterns in time-series data), design experiments (e.g., financial data contains time series, so the hypothesis can be verified in this scenario), implement the experiment as code (e.g., a PyTorch model structure), and then execute the code to get feedback (e.g., metrics, loss curves). The experts learn from the feedback and improve in the next iteration.
+
+We have established a basic framework that continuously proposes hypotheses, verifies them, and gets feedback from the real world. This is the first scientific research automation framework that supports linking with real-world verification.
+
 
 .. image:: https://github.com/user-attachments/assets/60cc2712-c32a-4492-a137-8aec59cdc66e
     :alt: Class Level Figure
 
-For those interested in the detailed code, the figure above illustrates the main classes and aligns them with the workflow.
+For those interested in the detailed code, the figure above shows the main classes and how they fit into the workflow.
 
 
 Detailed Design

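The hypothesis, experiment, implementation, and feedback cycle described in the new `project_framework_introduction.rst` paragraph can be sketched as a loop. This is a minimal illustration only: `Hypothesis`, `Feedback`, and `RDLoop` are hypothetical stand-ins defined here, not RDAgent's actual classes (those appear in the class-level figure), and the toy proposer simply sweeps a window size instead of reasoning over the feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    """Hypothetical stand-in for an idea proposed by the Research step."""
    description: str
    params: dict


@dataclass
class Feedback:
    """Hypothetical record of one executed experiment and its outcome."""
    hypothesis: Hypothesis
    score: float
    accepted: bool


@dataclass
class RDLoop:
    # (R) proposes the next hypothesis from all feedback gathered so far.
    propose: Callable[[List[Feedback]], Hypothesis]
    # (D) turns a hypothesis into runnable code; calling the result runs it and returns a metric.
    implement: Callable[[Hypothesis], Callable[[], float]]
    history: List[Feedback] = field(default_factory=list)
    best_score: Optional[float] = None

    def step(self) -> Feedback:
        hypothesis = self.propose(self.history)  # propose a hypothesis
        experiment = self.implement(hypothesis)  # implement the experiment as code
        score = experiment()                     # execute it to get feedback
        accepted = self.best_score is None or score > self.best_score
        if accepted:
            self.best_score = score
        feedback = Feedback(hypothesis, score, accepted)
        self.history.append(feedback)            # history is available to the next proposal
        return feedback

    def run(self, n_loops: int) -> List[Feedback]:
        return [self.step() for _ in range(n_loops)]


# Toy usage: the "hypothesis" is a window size; the toy metric peaks at window=3.
def propose_window(history: List[Feedback]) -> Hypothesis:
    window = len(history) + 1
    return Hypothesis(f"window={window}", {"window": window})


def implement_window(h: Hypothesis) -> Callable[[], float]:
    return lambda: -abs(h.params["window"] - 3)


loop = RDLoop(propose_window, implement_window)
loop.run(5)
print(loop.best_score)  # → 0
```

Keeping `propose` (R) and `implement` (D) as separate callables mirrors the framework's split between Research and Development; each iteration's `Feedback` is appended to the history that the next proposal can consult.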
diff --git a/docs/research/benchmark.rst b/docs/research/benchmark.rst
@@ -97,3 +97,22 @@ A png file will be saved to the designated path as shown below.
 
 .. image:: ../_static/benchmark.png
 
+
+Related Paper
+-------------
+
+- `Towards Data-Centric Automatic R&D <https://arxiv.org/abs/2404.11276>`_:
+  We developed RD2Bench, a comprehensive benchmark for assessing data and model R&D capabilities. It includes a series of tasks that describe model features or structures, which are used to evaluate how well LLM agents can implement them.
+
+.. code-block:: bibtex
+
+    @misc{chen2024datacentric,
+        title={Towards Data-Centric Automatic R&D},
+        author={Haotian Chen and Xinjie Shen and Zeqi Ye and Wenjun Feng and Haoxue Wang and Xiao Yang and Xu Yang and Weiqing Liu and Jiang Bian},
+        year={2024},
+        eprint={2404.11276},
+        archivePrefix={arXiv},
+        primaryClass={cs.AI}
+    }
+
+.. image:: https://github.com/user-attachments/assets/494f55d3-de9e-4e73-ba3d-a787e8f9e841
diff --git a/docs/research/catalog.rst b/docs/research/catalog.rst
@@ -0,0 +1,34 @@
+===========
+Research
+===========
+
+To achieve good results and improve R&D capabilities, we face multiple challenges, the most important of which is continuous evolution. Existing large language models (LLMs) struggle to keep growing their capabilities once training is complete. Moreover, LLM training focuses on general knowledge, and the lack of depth in specialized knowledge is an obstacle to solving professional R&D problems in industry. This specialized knowledge must be learned from in-depth industry practice.
+
+
+RD-Agent, by contrast, can continuously acquire in-depth domain knowledge by exploring during the R&D phase, so its R&D capabilities keep growing.
+
+To address these key challenges and deliver industrial value, several lines of research are needed.
+
+
+.. list-table:: Research Areas and Descriptions
+   :header-rows: 1
+
+   * - Research Area
+     - Description
+   * - :doc:`Benchmark <benchmark>`
+     - Benchmark the R&D abilities
+   * - Research
+     - Idea proposal: Explore new ideas or refine existing ones
+   * - :doc:`Development <dev>`
+     - Ability to realize ideas: Implement and execute ideas
+
+
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Doctree:
+   :hidden:
+
+   benchmark
+   dev
diff --git a/docs/research/dev.rst b/docs/research/dev.rst
@@ -0,0 +1,25 @@
+==============================
+Development
+==============================
+
+
+Related Paper
+-------------
+
+- `Collaborative Evolving Strategy for Automatic Data-Centric Development <https://arxiv.org/abs/2407.18690>`_:
+  Co-STEER tackles data-centric development (AD2) tasks and highlights their main challenges: they require expert-like implementation (i.e., learning domain knowledge from practice) and task scheduling capability (e.g., starting with easier tasks for better overall efficiency), areas that previous work has largely overlooked. Our Co-STEER agent enhances its domain knowledge through our evolving strategy and improves both its scheduling and implementation skills by gathering and using domain-specific practical experience. With a better schedule, implementation becomes faster. At the same time, as implementation feedback becomes more detailed, scheduling accuracy improves. These two capabilities grow together through practical feedback, enabling a collaborative evolution process.
+
+.. code-block:: bibtex
+
+    @misc{yang2024collaborative,
+        title={Collaborative Evolving Strategy for Automatic Data-Centric Development},
+        author={Xu Yang and Haotian Chen and Wenjun Feng and Haoxue Wang and Zeqi Ye and Xinjie Shen and Xiao Yang and Shizhao Sun and Weiqing Liu and Jiang Bian},
+        year={2024},
+        eprint={2407.18690},
+        archivePrefix={arXiv},
+        primaryClass={cs.AI}
+    }
+
+.. image:: https://github.com/user-attachments/assets/75d9769b-0edd-4caf-9d45-57d1e577054b
+   :alt: Collaborative Evolving Strategy for Automatic Data-Centric Development
+
diff --git a/docs/research/research.rst b/docs/research/research.rst
diff --git a/docs/demo_and_introduction.rst b/docs/ui.rst
@@ -1,11 +1,12 @@
-=========================
-Demo and Introduction
-=========================
+==============
+User Interface
+==============
+
 
 Introduction
 ============
 
-RD-Agent will generate some logs during the R&D process. These logs are very useful for debugging and understanding the R&D process. However, just viewing the terminal log is not intuitive enough. RD-Agent provides a web app to visualize the R&D process. You can easily view the R&D process and understand the R&D process better.
+RD-Agent generates logs during the R&D process. These logs are useful for debugging and for understanding the process, but terminal output alone is not very intuitive. RD-Agent therefore provides a web app as a UI to visualize the R&D process, making it easier to follow and understand.
 
 A Quick Demo
 ============
@@ -40,4 +41,4 @@ Use Web App
     - All Loops: Show complete scenario execution process.
     - Next Loop: Show one success **R&D Loop**.
     - One Evolving: Show one **evolving** step of **development** part.
-    - refresh logs: clear shown logs.
+    - refresh logs: clear shown logs.