Merge pull request modelscope#19 from FredericW/dev/as_copilot_fei
address comments
FredericW authored Jun 7, 2024
2 parents e6849e5 + f497985 commit c1c0814
Showing 5 changed files with 71 additions and 7 deletions.
39 changes: 39 additions & 0 deletions docs/sphinx_doc/en/source/tutorial/209-rag.md
@@ -66,6 +66,45 @@ The Knowledge modules (now only `LlamaIndexKnowledge`; support for LangChain wil

</details>

#### More about knowledge configurations
The aforementioned configuration is usually saved as a JSON file. It must
contain the following key attributes:
* `knowledge_id`: a unique identifier of the knowledge;
* `emb_model_config_name`: the name of the embedding model;
* `chunk_size`: default chunk size for the document transformation (node parser);
* `chunk_overlap`: default chunk overlap for each chunk (node);
* `data_processing`: a list of data processing methods.
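
As a minimal sketch, the five keys above can be assembled and checked in Python before being written to a JSON file. The concrete values (`my_knowledge`, `my_emb_config`, the sizes) and the helper `missing_keys` are hypothetical and exist only for illustration; they are not part of AgentScope.

```python
import json

# Keys that the list above names as required in a knowledge configuration.
REQUIRED_KEYS = (
    "knowledge_id",
    "emb_model_config_name",
    "chunk_size",
    "chunk_overlap",
    "data_processing",
)


def missing_keys(config: dict) -> list:
    """Return the required keys absent from `config` (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in config]


# All values below are placeholders chosen for illustration.
knowledge_config = {
    "knowledge_id": "my_knowledge",
    "emb_model_config_name": "my_emb_config",
    "chunk_size": 1024,
    "chunk_overlap": 40,
    "data_processing": [],  # to be filled with load_data / store_and_index entries
}

# Serialize to the JSON text that would be saved to a file.
config_json = json.dumps(knowledge_config, indent=2)
```

Checking for missing keys up front gives a clearer error than a `KeyError` raised later while the index is being built.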

##### Using LlamaIndexKnowledge as an example

Regarding the last attribute, `data_processing`, each entry of the list (a dict) configures a data
loader object that loads the needed data (`load_data`)
and a transformation object that processes the loaded data (`store_and_index`).
Accordingly, one may load data from multiple sources (with different data loaders),
process each in an individually defined manner (i.e. with its own transformation or node parser),
and merge the processed data into a single index for later retrieval.
For more information about the components, please refer to
[LlamaIndex-Loading Data](https://docs.llamaindex.ai/en/stable/module_guides/loading/).
For both the data loader and the transformation, the following attributes need to be set:
* `create_object`: indicates whether to create a new object, must be true in this case;
* `module`: where the class is located;
* `class`: the name of the class.
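
To illustrate how these three attributes could be resolved into an object, here is a minimal sketch using `importlib`. It is not AgentScope's actual implementation: the helper `build_object` and the `init_args` key are assumptions, and a standard-library class stands in for a data loader so the example runs without LlamaIndex installed.

```python
import importlib
from typing import Any


def build_object(entry: dict) -> Any:
    """Instantiate the class named by `module` and `class` (hypothetical helper)."""
    if not entry.get("create_object", False):
        raise ValueError("create_object must be true to build an object")
    module = importlib.import_module(entry["module"])  # where the class is located
    cls = getattr(module, entry["class"])              # the name of the class
    # `init_args` is an assumed key holding constructor keyword arguments.
    return cls(**entry.get("init_args", {}))


# A standard-library class stands in for a data loader or node parser.
entry = {
    "create_object": True,
    "module": "fractions",
    "class": "Fraction",
    "init_args": {"numerator": 3, "denominator": 4},
}
obj = build_object(entry)
```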

More specifically, to set `load_data`, you can use a wide collection of data loaders provided by LlamaIndex,
such as `SimpleDirectoryReader` (set in `class`), to load a variety of data types
(e.g. txt, pdf, html, py, md). For this data loader, you can set the following attributes:
* `input_dir`: the path to the data directory;
* `required_exts`: the file extensions that the data loader will load.

For more information about the data loaders, please refer to [the SimpleDirectoryReader documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
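
A `load_data` entry for `SimpleDirectoryReader` might look like the sketch below, written as a Python dict that serializes to JSON. The nesting under `loader` and `init_args` is an assumption about the layout, and the directory path is a placeholder; only `create_object`, `module`, `class`, `input_dir`, and `required_exts` come from the text above.

```python
import json

# Assumed layout: the loader sits under "loader", and its constructor
# arguments under "init_args". The path below is a placeholder.
load_data_entry = {
    "load_data": {
        "loader": {
            "create_object": True,
            "module": "llama_index.core",
            "class": "SimpleDirectoryReader",
            "init_args": {
                "input_dir": "path/to/your/data",
                "required_exts": [".md", ".py"],
            },
        }
    }
}

# JSON text suitable for one element of the `data_processing` list.
load_data_json = json.dumps(load_data_entry, indent=2)
```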

`store_and_index` is optional; if it is not specified, the default transformation (a.k.a. node parser) is `SentenceSplitter`. For specific node parsers such as `CodeSplitter`, users can set the following attributes:
* `language`: the language of the code;
* `chunk_lines`: the number of lines in each code chunk.

For more information about the node parsers, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).
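
The optional nature of `store_and_index` can be sketched as follows. The `transformations` and `init_args` keys and the helper `parser_class_names` are assumptions made for illustration, and the `chunk_lines` value is a placeholder; the fallback to `SentenceSplitter` mirrors the default described above.

```python
DEFAULT_PARSER = "SentenceSplitter"

# Assumed layout: node parsers are listed under "transformations".
code_entry = {
    "store_and_index": {
        "transformations": [
            {
                "create_object": True,
                "module": "llama_index.core.node_parser",
                "class": "CodeSplitter",
                "init_args": {"language": "python", "chunk_lines": 100},
            }
        ]
    }
}


def parser_class_names(entry: dict) -> list:
    """List the node parser classes an entry asks for (hypothetical helper)."""
    if "store_and_index" not in entry:
        # Nothing configured: fall back to the default node parser.
        return [DEFAULT_PARSER]
    transformations = entry["store_and_index"].get("transformations", [])
    return [item["class"] for item in transformations]


configured = parser_class_names(code_entry)  # entry with a CodeSplitter
fallback = parser_class_names({})            # entry without store_and_index
```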


To avoid the detailed configuration, users can also take a quicker route through `KnowledgeBank` (described below).

#### How to use a Knowledge object
29 changes: 29 additions & 0 deletions docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md
@@ -66,6 +66,35 @@

</details>

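#### More about knowledge configurations
The configuration mentioned above is usually saved as a JSON file, which must contain the following key attributes:
* `knowledge_id`: the unique identifier of each knowledge module;
* `emb_model_config_name`: the name of the embedding model;
* `chunk_size`: the default size for chunking files;
* `chunk_overlap`: the default overlap between file chunks;
* `data_processing`: a list of data processing methods.

##### Configuring LlamaIndexKnowledge as an example

When using `llama_index_knowledge`, for the last item above, `data_processing`, each entry (a `dict`) of this `list` parameter configures a data loader object, which loads the required data (i.e. the information in the `load_data` field), and a transformation object that processes the loaded data (`store_and_index`). In other words, a single load can pull data from multiple sources at once, process it, and merge it under one index for later retrieval. For more information about these components, see [LlamaIndex-Loading](https://docs.llamaindex.ai/en/stable/module_guides/loading/).

Here, for both data loading and data processing, the following attributes need to be configured:
* `create_object`: indicates whether to create a new object; must be true in this case;
* `module`: the location of the class corresponding to the object;
* `class`: the name of this class.

More specifically, when configuring `load_data`, you can choose from a wide variety of loaders, for example `SimpleDirectoryReader` (configured in the `class` field) to read various types of data (e.g. txt, pdf, html, py, md). For this data loader, you also need to configure the following key attributes:
* `input_dir`: the path from which data is loaded;
* `required_exts`: the file extensions of the data to be loaded.

For more information about data loaders, see [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).

`store_and_index` is optional. If the user does not specify a particular transformation, the system uses the default transformation (also called a node parser), named `SentenceSplitter`. Different transformations can be used for specific needs, for example `CodeSplitter` for parsing code. For this special node parser, users can set the following attributes:
* `language`: the name of the language of the code to process;
* `chunk_lines`: the number of lines in each code chunk after splitting.

For more information about node parsers, see [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).

If users want to avoid the detailed configuration, we also provide a quick way in `KnowledgeBank` (see below).

#### How to use a Knowledge object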
2 changes: 1 addition & 1 deletion examples/conversation_with_RAG_agents/rag_example.py
@@ -120,7 +120,7 @@ def main() -> None:
while True:
# The workflow is the following:
# 1. user input a message,
- # 2. if it mentions one of the agents, then the agent will be called
+ # 2. if it mentions (@) one of the agents, the agent will be called
# 3. otherwise, the guide agent will decide which agent to call
# 4. the called agent will respond to the user
# 5. repeat
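
The workflow in the comment above can be sketched as a small routing function. This is not the code in `rag_example.py`: the helpers `find_mentions` and `route`, the agent names, and the `guide` callable are all hypothetical stand-ins.

```python
import re
from typing import Callable, List


def find_mentions(text: str, agent_names: List[str]) -> List[str]:
    """Return the known agent names that appear as @name in `text`."""
    mentioned = re.findall(r"@([\w-]+)", text)
    return [name for name in agent_names if name in mentioned]


def route(text: str, agent_names: List[str], guide: Callable[[str], str]) -> List[str]:
    """Pick agents by @-mention; otherwise let the guide decide."""
    mentions = find_mentions(text, agent_names)
    if mentions:
        return mentions
    return [guide(text)]


# Hypothetical agent names; the guide here always picks the first agent.
agents = ["Tutorial-Assistant", "Code-Search-Assistant"]
by_mention = route("@Code-Search-Assistant where is AgentBase?", agents, lambda t: agents[0])
by_guide = route("How do I get started?", agents, lambda t: agents[0])
```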
6 changes: 1 addition & 5 deletions src/agentscope/agents/rag_agents.py
@@ -6,7 +6,7 @@
Notice, this is a Beta version of RAG agent.
"""

- from typing import Optional, Any
+ from typing import Any
from loguru import logger

from agentscope.agents.agent import AgentBase
@@ -31,7 +31,6 @@ def __init__(
name: str,
sys_prompt: str,
model_config_name: str,
- memory_config: Optional[dict] = None,
knowledge_list: list[Knowledge] = None,
knowledge_id_list: list[str] = None,
similarity_top_k: int = None,
@@ -48,8 +47,6 @@ def __init__(
system prompt for the RAG agent
model_config_name (str):
language model for the agent
- memory_config (dict):
- memory configuration
knowledge_list (list[Knowledge]):
a list of knowledge.
User can choose to pass a list knowledge object
@@ -77,7 +74,6 @@ def __init__(
name=name,
sys_prompt=sys_prompt,
model_config_name=model_config_name,
- memory_config=memory_config,
)
self.knowledge_list = knowledge_list or []
self.knowledge_id_list = knowledge_id_list or []
2 changes: 1 addition & 1 deletion src/agentscope/rag/llama_index_knowledge.py
@@ -392,7 +392,7 @@ def _set_transformations(self, config: dict) -> Any:
Set the transformations as needed, or just use the default setting.
Args:
- config (dict): a dictionary containing configurations
+ config (dict): a dictionary containing configurations.
"""
if "store_and_index" in config:
temp = self._prepare_args_from_config(