Merge pull request #67 from alipay/dev

feat: Release version 0.0.8
alipay · Jun 6, 2024 · 343c179
2 parents 2d34ee9 + 68bb1d5
commit 343c179
Showing 75 changed files with 1,582 additions and 221 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -24,6 +24,38 @@ Note - Additional remarks regarding the version.
 ***************************************************
 
 # Version Update History
+## [0.0.8] - 2024-06-06
+### Added
+- Introduced a new monitor module
+  - Runtime data from any agentUniverse instance can be collected and observed
+- Added webserver post_fork functionality
+  - Provides multi-node process intervention capabilities after starting the webserver in agentUniverse
+- Introduced SQLDB_WRAPPER wrapper class, offering typical database connection methods
+  - Through the SQLDB_WRAPPER wrapper class, you can conveniently connect to various databases and storage technologies including SQLite, MySQL, Oracle, PostgreSQL, and others
+- Added connection support for Milvus vector database component
+
+For more on using these features, see the agentUniverse guidebook.
+
+### Changed
+- Flask is set as the default webserver startup method across all platforms, with gunicorn and gRPC capabilities disabled by default
+  - In the previous version, we found slight compatibility differences with gunicorn and gRPC across different operating systems. Thus, we have made Flask the primary startup method for all platforms. You can enable gunicorn and gRPC in the configuration as needed.
+
+### Security
+- Security vulnerabilities were identified in some third-party packages that aU depends on. We have upgraded the affected versions; the main changes are:
+  - requests (^2.31.0 -> ^2.32.0)
+  - flask (^2.2 -> ^2.3.2)
+  - werkzeug (^2.2.2 -> ^3.0.3)
+  - langchain (0.0.352 -> 0.1.20)
+  - langchain-core (0.1.3 -> 0.1.52)
+  - langchain-community (no version lock -> 0.0.38)
+  - gunicorn (21.2.0 -> ^22.0.0)
+  - Jinja2 (no version lock -> ^3.1.4)
+  - tqdm (no version lock -> ^4.66.3)
+If your system is externally accessible, we strongly recommend upgrading to agentUniverse v0.0.8 to mitigate the security risks posed by these third-party packages. For more detail, see https://security.snyk.io.
+
+### Note
+- Some code optimizations and documentation updates.
+
 ## [0.0.7] - 2024-05-29
 ### Added
 - LLM component supports multimodal parameter invocation.
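
The Security entries above mix caret constraints (for example `^2.32.0`) with exact pins (for example `0.1.20`). As a rough illustration of what a caret constraint allows, here is a minimal sketch that assumes Poetry-style caret semantics; the helper name `caret_range` is hypothetical and is not part of agentUniverse or Poetry. It handles plain numeric versions only.

```python
from typing import Tuple


def caret_range(constraint: str) -> Tuple[str, str]:
    """Return (inclusive lower, exclusive upper) for a caret constraint.

    Assumes Poetry-style semantics: the left-most non-zero component
    that was specified may not change.
    """
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(p) for p in constraint[1:].split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected one to three numeric components")

    # Pad to major.minor.patch so "^2.2" is treated as 2.2.0.
    padded = parts + [0] * (3 - len(parts))

    # Bump the first non-zero specified component; if all are zero,
    # bump the last specified one (so "^0.0" allows < 0.1.0).
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = padded[:bump] + [padded[bump] + 1] + [0] * (2 - bump)

    def fmt(nums):
        return ".".join(str(n) for n in nums)

    return fmt(padded), fmt(upper)


if __name__ == "__main__":
    for spec in ("^2.31.0", "^2.32.0", "^2.2.2", "^3.0.3"):
        low, high = caret_range(spec)
        print(f"{spec}: >={low}, <{high}")
```

Under these assumed semantics, the old werkzeug range (`^2.2.2`) stops below 3.0.0, so the move to `^3.0.3` is a major-version change rather than a patch-level bump.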

diff --git a/CHANGELOG_zh.md b/CHANGELOG_zh.md
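@@ -24,6 +24,38 @@ Note - Additional remarks regarding the version.
 ***************************************************
 
 # Version Update History
+## [0.0.8] - 2024-06-06
+### Added
+- Added a monitor module
+  - Runtime data from any agentUniverse instance can be collected and observed
+- Added webserver post_fork functionality
+  - Opens up multi-node process intervention after the webserver starts in agentUniverse
+- Added the SQLDB_WRAPPER wrapper class, which provides typical database connection methods
+  - With the SQLDB_WRAPPER wrapper class you can conveniently connect to dozens of database and storage technologies such as SQLite, MySQL, Oracle, and PostgreSQL.
+- Added connection support for the Milvus vector database component
+
+For more on using these features, see the agentUniverse guidebook.
+
+### Changed
+- Flask is the default webserver startup method on all platforms; gunicorn and gRPC are disabled by default
+  - In the previous version we found that gunicorn and gRPC compatibility differs slightly across operating systems, so we made Flask the primary startup method on all platforms. You can enable gunicorn and gRPC in the configuration as needed.
+
+### Security
+- Security vulnerabilities were identified in some third-party packages that aU depends on. We have upgraded the affected versions; the main changes are:
+  - requests (^2.31.0 -> ^2.32.0)
+  - flask (^2.2 -> ^2.3.2)
+  - werkzeug (^2.2.2 -> ^3.0.3)
+  - langchain (0.0.352 -> 0.1.20)
+  - langchain-core (0.1.3 -> 0.1.52)
+  - langchain-community (no version lock -> 0.0.38)
+  - gunicorn (21.2.0 -> ^22.0.0)
+  - Jinja2 (no version lock -> ^3.1.4)
+  - tqdm (no version lock -> ^4.66.3)
+If your system is externally accessible, we strongly recommend installing aU v0.0.8 to avoid the security risks of these third-party packages. For more detail, see https://security.snyk.io.
+
+### Note
+- Some code optimizations and documentation updates
+
 ## [0.0.7] - 2024-05-29
 ### Added
 - LLM component supports multimodal parameter invocation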

diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@ Language version: [English](./README.md) | [中文](./README_zh.md)
 ![](https://img.shields.io/badge/framework-agentUniverse-pink)
 ![](https://img.shields.io/badge/python-3.10%2B-blue?logo=Python)
 [![](https://img.shields.io/badge/%20license-Apache--2.0-yellow)](LICENSE)
-[![Static Badge](https://img.shields.io/badge/pypi-v0.0.7-blue?logo=pypi)](https://pypi.org/project/agentUniverse/)
+[![Static Badge](https://img.shields.io/badge/pypi-v0.0.8-blue?logo=pypi)](https://pypi.org/project/agentUniverse/)
 
 ![](docs/guidebook/_picture/logo_bar.jpg)
 ****************************************

diff --git a/README_zh.md b/README_zh.md
@@ -5,7 +5,7 @@
 ![](https://img.shields.io/badge/framework-agentUniverse-pink)
 ![](https://img.shields.io/badge/python-3.10%2B-blue?logo=Python)
 [![](https://img.shields.io/badge/%20license-Apache--2.0-yellow)](LICENSE)
-[![Static Badge](https://img.shields.io/badge/pypi-v0.0.7-blue?logo=pypi)](https://pypi.org/project/agentUniverse/)
+[![Static Badge](https://img.shields.io/badge/pypi-v0.0.8-blue?logo=pypi)](https://pypi.org/project/agentUniverse/)
 
 ![](docs/guidebook/_picture/logo_bar.jpg)
 ****************************************

diff --git a/agentuniverse/agent/action/knowledge/store/milvus_store.py b/agentuniverse/agent/action/knowledge/store/milvus_store.py
@@ -0,0 +1,228 @@
+# !/usr/bin/env python3
+# -*- coding:utf-8 -*-
+
+# @Time    : 2024/5/30 10:22
+# @Author  : fanen.lhy
+# @Email   : [email protected]
+# @FileName: milvus_store.py
+from typing import List, Optional
+
+try:
+    from pymilvus import connections, Collection, CollectionSchema, \
+        FieldSchema, DataType, utility
+except ImportError as e:
+    raise ImportError(
+        "pymilvus is not installed. Please install it with 'pip install pymilvus'") from e
+
+from agentuniverse.agent.action.knowledge.store.document import Document
+from agentuniverse.agent.action.knowledge.store.query import Query
+from agentuniverse.agent.action.knowledge.store.store import Store
+
+DEFAULT_CONNECTION_ARGS = {
+    "host": "localhost",
+    "port": "19530"
+}
+
+DEFAULT_SEARCH_ARGS = {
+    "metric_type": "L2",
+    "params": {"nprobe": 10}
+}
+
+DEFAULT_INDEX_PARAMS = {
+    "metric_type": "L2",
+    "index_type": "HNSW",
+    "params": {"M": 8, "efConstruction": 64},
+}
+
+
+class MilvusStore(Store):
+    collection_name: Optional[str] = 'milvus_db'
+    collection: Collection = None
+    connection_name: str = 'default_connection'
+
+    def __init__(
+            self,
+            connection_args: dict = None,
+            **kwargs
+    ):
+        """Initialize the Milvus store class."""
+        super().__init__(**kwargs)
+        if not connection_args:
+            connection_args = DEFAULT_CONNECTION_ARGS
+        host = connection_args["host"]
+        port = connection_args["port"]
+        db_name = connection_args.get("db_name", "default")
+        self.connection_name = f"{host}_{port}_{db_name}"
+        self._connect_to_milvus(connection_args)
+        if utility.has_collection(self.collection_name,
+                                  using=self.connection_name):
+            self.collection = Collection(
+                self.collection_name, using=self.connection_name
+            )
+            self.collection.load()
+
+
+    def _connect_to_milvus(self, connection_args: dict):
+        """Connect to Milvus server."""
+        if not connections.has_connection(self.connection_name):
+            connections.connect(
+                alias=self.connection_name, **connection_args
+            )
+
+    def _create_or_load_collection(self,
+                                   dim: int = 128,
+                                   max_length: int = 65535,
+                                   index_params: dict = None):
+        """
+        Create a new collection or load an existing collection.
+
+        This method handles the creation of a new collection with specified parameters
+        or loads an existing collection if it already exists. Collections are used
+        to store data with specific dimensional attributes and indexing parameters.
+
+        Parameters:
+        - dim (int): The dimension of the embedding vectors, default is 128.
+        - max_length (int): The maximum length of the text field, default is 65535.
+        - index_params (dict, optional): Additional parameters for indexing. This
+          dictionary can include specific configurations for the index creation or loading.
+
+        Returns:
+        None
+        """
+        if utility.has_collection(self.collection_name,
+                                  using=self.connection_name):
+            self.collection = Collection(
+                self.collection_name, using=self.connection_name
+            )
+        else:
+            if not index_params:
+                index_params = DEFAULT_INDEX_PARAMS
+            fields = [
+                FieldSchema(name="id", dtype=DataType.VARCHAR, max_length=100,
+                            is_primary=True),
+                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR,
+                            dim=dim),
+                FieldSchema(name="text", dtype=DataType.VARCHAR,
+                            max_length=max_length),
+                FieldSchema(name="metadata", dtype=DataType.JSON)
+            ]
+            schema = CollectionSchema(fields, "Milvus collection schema")
+            self.collection = Collection(self.collection_name, schema,
+                                         using=self.connection_name)
+            self.collection.create_index(
+                field_name="embedding",
+                index_params=index_params
+            )
+            self.collection.load()
+
+    def query(self,
+              query: Query,
+              search_args: dict = None,
+              **kwargs) -> List[Document]:
+        """
+        Query the Milvus collection with the given query and return the top k results.
+
+        Parameters:
+        - query (Query): The query object that contains the parameters and data for the search.
+        - search_args (dict, optional): A dictionary of additional arguments for the search.
+          This can include parameters such as the number of results to return, specific search
+          algorithms, or other configurations.
+
+        Returns:
+        - List[Document]: A list of Document objects that are the top k results from the query.
+        """
+        if not self.collection:
+            return self.to_documents([])
+        embedding = query.embedding
+        if self.embedding_model is not None and len(embedding) == 0:
+            embedding = self.embedding_model.get_embeddings([query.query_str])[
+                0]
+        if not search_args:
+            search_args = DEFAULT_SEARCH_ARGS
+        if len(embedding) > 0:
+            query_result = self.collection.search(
+                data=[embedding],
+                anns_field="embedding",
+                param=search_args,
+                limit=query.similarity_top_k,
+                output_fields=["id", "text", "embedding", "metadata"]
+            )
+        else:
+            query_result = []
+        return self.to_documents(query_result)
+
+
+    def upsert_document(self,
+                        documents: List[Document],
+                        max_length: int = 65535,
+                        index_params: dict = None,
+                        **kwargs):
+        """
+        This method inserts new documents or updates existing documents in the collection.
+        The upsert operation ensures that the documents are either added if they do not
+        already exist or updated if they do.
+
+        Parameters:
+        - documents (List[Document]): A list of Document objects to be upserted into the collection.
+        - max_length (int): The maximum length of the text field, default is 65535.
+        - index_params (dict, optional): Additional parameters for indexing. This dictionary
+          can include specific configurations for the index creation or updating.
+
+        Returns:
+        None
+        """
+        for document in documents:
+            embedding = document.embedding
+            if self.embedding_model is not None and len(embedding) == 0:
+                embedding = self.embedding_model.get_embeddings([document.text])[0]
+            if not self.collection:
+                self._create_or_load_collection(
+                    dim=len(embedding),
+                    max_length=max_length,
+                    index_params=index_params
+                )
+            expr = f'id == "{document.id}"'
+            existing_docs = self.collection.query(expr)
+            entities = [
+                [document.id],
+                [embedding],
+                [document.text],
+                [document.metadata]
+            ]
+            if existing_docs:
+                self.collection.delete(expr)
+                self.collection.insert(entities)
+            else:
+                self.collection.insert(entities)
+            self.collection.load()
+
+    def insert_documents(self,
+                         documents: List[Document],
+                         max_length: int = 65535,
+                         index_params: dict = None,
+                         ):
+        """Insert documents to the Milvus collection."""
+        self.upsert_document(documents, max_length, index_params)
+
+    def update_document(self, documents: List[Document], **kwargs):
+        """Update document into the store."""
+        self.upsert_document(documents)
+
+    @staticmethod
+    def to_documents(query_result) -> List[Document]:
+        """Convert the query results of Milvus to the AgentUniverse(AU)
+        document format."""
+        if query_result is None:
+            return []
+        documents = []
+        for result in query_result:
+            for res in result:
+                documents.append(Document(
+                    id=res.fields.get("id"),
+                    text=res.fields.get("text"),
+                    embedding=res.fields.get("embedding")
+                    if res.fields.get("embedding") else [],
+                    metadata=res.fields.get("metadata")
+                    if res.fields.get("metadata") else None)
+                )
+        return documents
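
The `upsert_document` method above looks up each document by id, deletes any existing row, and then inserts the new one. The sketch below isolates that delete-then-insert pattern without a Milvus server. `Doc` and `InMemoryCollection` are hypothetical stand-ins written for this illustration; they are not pymilvus or agentUniverse classes, and their method signatures are simplified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for agentUniverse's Document."""
    id: str
    text: str
    embedding: List[float] = field(default_factory=list)
    metadata: Optional[dict] = None


class InMemoryCollection:
    """Hypothetical stand-in for a collection keyed by primary id."""

    def __init__(self) -> None:
        self.rows: Dict[str, Doc] = {}
        self.deletes = 0  # counts delete calls, for inspection only

    def query(self, doc_id: str) -> List[Doc]:
        # Return the matching row as a list, or an empty list.
        return [self.rows[doc_id]] if doc_id in self.rows else []

    def delete(self, doc_id: str) -> None:
        self.rows.pop(doc_id, None)
        self.deletes += 1

    def insert(self, doc: Doc) -> None:
        self.rows[doc.id] = doc


def upsert(collection: InMemoryCollection, documents: List[Doc]) -> None:
    """Insert each document, replacing any existing row with the same id."""
    for doc in documents:
        if collection.query(doc.id):
            collection.delete(doc.id)
        collection.insert(doc)


if __name__ == "__main__":
    col = InMemoryCollection()
    upsert(col, [Doc("a", "first version"), Doc("b", "other")])
    upsert(col, [Doc("a", "second version")])
    print(col.rows["a"].text, len(col.rows), col.deletes)
```

Because the insert happens whether or not a row existed, the two branches in the original method differ only in the preceding delete call.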
diff --git a/agentuniverse/agent/default/planning_agent/planning_agent.py b/agentuniverse/agent/default/planning_agent/planning_agent.py
@@ -5,7 +5,8 @@
 # @Email   : [email protected]
 # @FileName: planning_agent.py
 """Planning agent module."""
-import json
+from langchain.output_parsers.json import parse_json_markdown
+
 from agentuniverse.agent.agent import Agent
 from agentuniverse.agent.input_object import InputObject
 
@@ -44,7 +45,7 @@ def parse_result(self, planner_result: dict) -> dict:
             dict: Agent result object.
         """
         output = planner_result.get('output')
-        output = json.loads(output)
+        output = parse_json_markdown(output)
         planner_result['framework'] = output['framework']
         planner_result['thought'] = output['thought']
         return planner_result
diff --git a/agentuniverse/agent/default/reviewing_agent/reviewing_agent.py b/agentuniverse/agent/default/reviewing_agent/reviewing_agent.py
@@ -5,7 +5,8 @@
 # @Email   : [email protected]
 # @FileName: reviewing_agent.py
 """Reviewing Agent class."""
-import json
+from langchain.output_parsers.json import parse_json_markdown
+
 from agentuniverse.agent.agent import Agent
 from agentuniverse.agent.input_object import InputObject
 
@@ -46,7 +47,7 @@ def parse_result(self, planner_result: dict) -> dict:
         agent_result = dict()
 
         output = planner_result.get('output')
-        output = json.loads(output)
+        output = parse_json_markdown(output)
         is_useful = output.get('is_useful')
         if is_useful is None:
             is_useful = False
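
Both agents above switch from `json.loads` to `parse_json_markdown`. The diff does not state the motivation; one plausible reason is that LLM output is often wrapped in a Markdown code fence, which `json.loads` rejects. The sketch below shows that failure mode with a simplified stand-in, `parse_fenced_json`, which is a hypothetical helper written for this illustration and is not LangChain's implementation.

```python
import json
import re
from typing import Any

# Built programmatically so this example contains no literal fence.
FENCE = "`" * 3

# Matches an optional "json" language tag and captures the fenced body.
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_fenced_json(text: str) -> Any:
    """Parse JSON that may be wrapped in a Markdown code fence."""
    match = _FENCED.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


if __name__ == "__main__":
    body = '{"framework": ["sub-question 1"], "thought": "plan first"}'
    raw = FENCE + "json\n" + body + "\n" + FENCE

    try:
        json.loads(raw)
    except json.JSONDecodeError:
        print("json.loads rejects the fenced output")

    print(parse_fenced_json(raw)["framework"])
    print(parse_fenced_json(body)["thought"])
```

Unfenced JSON still parses, so a helper of this kind accepts the same plain output that `json.loads` handled before.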

diff --git a/agentuniverse/agent/memory/default/default_memory.py b/agentuniverse/agent/memory/default/default_memory.py
@@ -6,7 +6,7 @@
 # @Email   : [email protected]
 # @FileName: default_memory.py
 from agentuniverse.agent.memory.chat_memory import ChatMemory
-from agentuniverse.llm.openai_llm import OpenAILLM
+from agentuniverse.llm.default.default_openai_llm import DefaultOpenAILLM
 
 
 class DefaultMemory(ChatMemory):
@@ -24,4 +24,4 @@ def __init__(self, **kwargs):
             default memory uses OpenAILLM(gpt-3.5-turbo) object as the memory llm.
         """
         super().__init__(**kwargs)
-        self.llm = OpenAILLM(model_name="gpt-3.5-turbo")
+        self.llm = DefaultOpenAILLM(model_name="gpt-4o")