Merge pull request modelscope#19 from FredericW/dev/as_copilot_fei

address comments
FredericW · Jun 7, 2024 · c1c0814
2 parents e6849e5 + f497985

Showing 5 changed files with 71 additions and 7 deletions.
diff --git a/docs/sphinx_doc/en/source/tutorial/209-rag.md b/docs/sphinx_doc/en/source/tutorial/209-rag.md
@@ -66,6 +66,45 @@ The Knowledge modules (now only `LlamaIndexKnowledge`; support for LangChain wil
 
   </details>
 
+#### More about knowledge configurations
+The configuration described above is usually saved as a JSON file. It must
+contain the following key attributes:
+* `knowledge_id`: a unique identifier of the knowledge;
+* `emb_model_config_name`: the name of the embedding model;
+* `chunk_size`: default chunk size for the document transformation (node parser);
+* `chunk_overlap`: default chunk overlap for each chunk (node);
+* `data_processing`: a list of data processing methods.
+
+##### Using LlamaIndexKnowledge as an example
+
+For the last attribute, `data_processing`, each entry of the list (a dict) configures a data
+loader object that loads the needed data (`load_data`)
+and a transformation object that processes the loaded data (`store_and_index`).
+Accordingly, one may load data from multiple sources (with different data loaders),
+process each source in its own way (with its own transformation, or node parser),
+and merge the processed data into a single index for later retrieval.
+For more information about the components, please refer to
+[LlamaIndex-Loading Data](https://docs.llamaindex.ai/en/stable/module_guides/loading/).
+For both the data loader and the transformation, we need to set the following attributes:
+* `create_object`: indicates whether to create a new object, must be true in this case;
+* `module`: where the class is located;
+* `class`: the name of the class.
+
+More specifically, when setting `load_data`, you can choose from a wide range of data loaders provided by LlamaIndex,
+    such as `SimpleDirectoryReader` (set in `class`), to load various data types
+    (e.g. txt, pdf, html, py, md). For this data loader, you can set the following attributes:
+* `input_dir`: the path to the data directory;
+* `required_exts`: the file extensions that the data loader will load.
+
+For more information about the data loaders, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+
+The `store_and_index` field is optional. If it is not specified, the default transformation (a.k.a. node parser) is `SentenceSplitter`. For specific node parsers such as `CodeSplitter`, users can set the following attributes:
+* `language`: the language of the code;
+* `chunk_lines`: the number of lines in each code chunk.
+
+For more information about the node parsers, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).
+
+
 If users want to avoid the detailed configuration, we also provide a quick way in `KnowledgeBank` (see the following).
 
 #### How to use a Knowledge object
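
The knowledge configuration described in the `209-rag.md` hunk above can be sketched as a Python dict that mirrors the JSON file. This is only an illustration: the top-level keys, `create_object`, `module`, `class`, `input_dir`, `required_exts`, `language`, and `chunk_lines` come from the documentation text, while the nesting keys `loader`, `init_args`, and `transformations`, the identifier values, the paths, and the `summarize` helper are assumptions made for this example and are not confirmed by the diff.

```python
# Hypothetical sketch of a knowledge configuration and a small sanity check.
# Nesting keys ("loader", "init_args", "transformations") are assumptions.

REQUIRED_KEYS = (
    "knowledge_id",
    "emb_model_config_name",
    "chunk_size",
    "chunk_overlap",
    "data_processing",
)
OBJECT_KEYS = ("create_object", "module", "class")
DEFAULT_PARSER = "SentenceSplitter"  # default named in the documentation

knowledge_config = {
    "knowledge_id": "example_code_rag",        # hypothetical identifier
    "emb_model_config_name": "my_emb_config",  # hypothetical model config name
    "chunk_size": 2048,
    "chunk_overlap": 40,
    "data_processing": [
        {   # source 1: Python files, split with CodeSplitter
            "load_data": {
                "loader": {
                    "create_object": True,
                    "module": "llama_index.core",
                    "class": "SimpleDirectoryReader",
                    "init_args": {
                        "input_dir": "./src",  # hypothetical path
                        "required_exts": [".py"],
                    },
                },
            },
            "store_and_index": {
                "transformations": [
                    {
                        "create_object": True,
                        "module": "llama_index.core.node_parser",
                        "class": "CodeSplitter",
                        "init_args": {"language": "python", "chunk_lines": 100},
                    },
                ],
            },
        },
        {   # source 2: Markdown files, no store_and_index (default parser)
            "load_data": {
                "loader": {
                    "create_object": True,
                    "module": "llama_index.core",
                    "class": "SimpleDirectoryReader",
                    "init_args": {
                        "input_dir": "./docs",  # hypothetical path
                        "required_exts": [".md"],
                    },
                },
            },
        },
    ],
}


def summarize(config: dict) -> list:
    """Check the documented keys and list (loader, parsers) per entry."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    summary = []
    for entry in config["data_processing"]:
        loader = entry["load_data"]["loader"]
        parsers = entry.get("store_and_index", {}).get("transformations", [])
        for obj in [loader, *parsers]:
            absent = [key for key in OBJECT_KEYS if key not in obj]
            if absent or obj["create_object"] is not True:
                raise ValueError(f"invalid object spec: {obj}")
        parser_names = [p["class"] for p in parsers] or [DEFAULT_PARSER]
        summary.append((loader["class"], parser_names))
    return summary


for loader_name, parser_names in summarize(knowledge_config):
    print(loader_name, "->", parser_names)
```

The second entry omits `store_and_index`, so the helper reports the documented default, `SentenceSplitter`, for it. Both entries would end up in the same index, which is the multi-source case the documentation describes.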

diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md b/docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md
@@ -66,6 +66,35 @@
 
   </details>
 
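+#### More about knowledge configurations
+The configuration mentioned above is usually saved as a JSON file, which must contain the following key attributes:
+* `knowledge_id`: the unique identifier of each knowledge module;
+* `emb_model_config_name`: the name of the embedding model;
+* `chunk_size`: the default size of document chunks;
+* `chunk_overlap`: the default overlap between document chunks;
+* `data_processing`: a list of data processing methods.
+
+##### Using LlamaIndexKnowledge as an example
+
+When using `llama_index_knowledge`, for the last item above, `data_processing`, each entry (a `dict`) of this `list` parameter configures a data loader object that loads the needed data (the information in the `load_data` field) and a transformation object that processes the loaded data (`store_and_index`). In other words, a single loading pass can load data from multiple sources at once, process it, and merge it under one index for later retrieval. For more information about this component, please refer to [LlamaIndex-Loading](https://docs.llamaindex.ai/en/stable/module_guides/loading/).
+
+Here, for both data loading and data processing, we need to configure the following attributes:
+* `create_object`: indicates whether to create a new object; it must be true in this case;
+* `module`: where the object's class is located;
+* `class`: the name of the class.
+
+More specifically, when configuring `load_data`, you can choose from a wide variety of loaders, for example `SimpleDirectoryReader` (configured in the `class` field) to read various types of data (e.g. txt, pdf, html, py, md). For this data loader, you also need to configure the following key attributes:
+* `input_dir`: the path from which data is loaded;
+* `required_exts`: the file extensions of the data to be loaded.
+
+For more information about data loaders, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+
+The `store_and_index` configuration is optional. If the user does not specify a transformation, the system uses the default transformation (also called node parser), named `SentenceSplitter`. Other transformations can be used for specific needs, for example `CodeSplitter` for parsing code; for this particular node parser, users can set the following attributes:
+* `language`: the name of the language of the code to be processed;
+* `chunk_lines`: the number of lines in each code chunk after splitting.
+
+For more information about node parsers, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).
+
 If users want to avoid the detailed configuration, we also provide a quick way in `KnowledgeBank` (see the following).
 
 #### How to use a Knowledge object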

diff --git a/examples/conversation_with_RAG_agents/rag_example.py b/examples/conversation_with_RAG_agents/rag_example.py
@@ -120,7 +120,7 @@ def main() -> None:
     while True:
         # The workflow is the following:
         # 1. user input a message,
-        # 2. if it mentions one of the agents, then the agent will be called
+        # 2. if it mentions (@) one of the agents, the agent will be called
         # 3. otherwise, the guide agent will decide which agent to call
         # 4. the called agent will respond to the user
         # 5. repeat
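
The routing rule in steps 2 and 3 of the comment above can be sketched as follows. This is a minimal illustration, not the code used in `rag_example.py`: the function names `mentioned_agents` and `route`, the agent names, and the `guide` callable are all hypothetical, and the sketch assumes a mention is written as `@` followed directly by the agent's exact name.

```python
import re
from typing import Callable


def mentioned_agents(message: str, agent_names: list[str]) -> list[str]:
    """Return the agents mentioned as @name, in order of first appearance."""
    if not agent_names:
        return []
    # Try longer names first so "Code-Helper" is preferred over "Code".
    ordered = sorted(agent_names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # The lookahead rejects matches that are only a prefix of a longer word.
    pattern = rf"@({alternatives})(?![\w-])"
    found = re.findall(pattern, message)
    return list(dict.fromkeys(found))  # de-duplicate, keep order


def route(
    message: str,
    agent_names: list[str],
    guide: Callable[[str], str],
) -> list[str]:
    """Step 2: call mentioned agents; step 3: otherwise ask the guide."""
    speakers = mentioned_agents(message, agent_names)
    if speakers:
        return speakers
    return [guide(message)]


names = ["Code-Helper", "Doc-Helper"]  # hypothetical agent names
print(route("@Code-Helper how is the index built?", names, lambda m: "Doc-Helper"))
print(route("how do I load pdf files?", names, lambda m: "Doc-Helper"))
```

Here the guide is a stand-in lambda that always picks one agent; in the example's workflow that decision is made by a guide agent.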

diff --git a/src/agentscope/agents/rag_agents.py b/src/agentscope/agents/rag_agents.py
@@ -6,7 +6,7 @@
 Notice, this is a Beta version of RAG agent.
 """
 
-from typing import Optional, Any
+from typing import Any
 from loguru import logger
 
 from agentscope.agents.agent import AgentBase
@@ -31,7 +31,6 @@ def __init__(
         name: str,
         sys_prompt: str,
         model_config_name: str,
-        memory_config: Optional[dict] = None,
         knowledge_list: list[Knowledge] = None,
         knowledge_id_list: list[str] = None,
         similarity_top_k: int = None,
@@ -48,8 +47,6 @@ def __init__(
                 system prompt for the RAG agent
             model_config_name (str):
                 language model for the agent
-            memory_config (dict):
-                memory configuration
             knowledge_list (list[Knowledge]):
                 a list of knowledge.
                 User can choose to pass a list knowledge object
@@ -77,7 +74,6 @@ def __init__(
             name=name,
             sys_prompt=sys_prompt,
             model_config_name=model_config_name,
-            memory_config=memory_config,
         )
         self.knowledge_list = knowledge_list or []
         self.knowledge_id_list = knowledge_id_list or []

diff --git a/src/agentscope/rag/llama_index_knowledge.py b/src/agentscope/rag/llama_index_knowledge.py
@@ -392,7 +392,7 @@ def _set_transformations(self, config: dict) -> Any:
         Set the transformations as needed, or just use the default setting.
 
         Args:
-            config (dict): a dictionary containing configurations
+            config (dict): a dictionary containing configurations.
         """
         if "store_and_index" in config:
             temp = self._prepare_args_from_config(