Merge branch 'main' into async_run

ShoggothAI · Jun 3, 2024 · 7a03448
2 parents 73356b2 + 05c3a32
commit 7a03448
Showing 13 changed files with 223 additions and 131 deletions.
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -12,8 +12,8 @@ jobs:
   build:
     strategy:
       matrix:
-        python-version: ["3.10", "3.11", "3.12"]
-        os: [ubuntu-latest, macos-latest, windows-latest]
+        python-version: ["3.12"]
+        os: [ubuntu-latest]
     runs-on: ${{ matrix.os }}
     steps:
     - uses: actions/checkout@v4
@@ -40,12 +40,18 @@ jobs:
         path: .venv
         key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
 
-    - name: Install dependencies
-      if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
-      run: poetry install --no-interaction --no-root
-
     - name: Install project
-      run: poetry install --no-interaction
+      run: poetry install --no-interaction --with dev --all-extras
 
-    - name: Run tests
+    - name: Run build
       run: poetry build
+
+    - name: Install pandoc
+      working-directory: ./docs/source
+      run: poetry run python install_pandoc.py
+
+    - name: Run docs build
+      env:
+        TZ: UTC
+      working-directory: ./docs
+      run: poetry run make html
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 motleycrew
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/docs/source/_autosummary/motleycrew.rst b/docs/source/_autosummary/motleycrew.rst
diff --git a/docs/source/images/crew_diagram.png b/docs/source/images/crew_diagram.png
diff --git a/docs/source/install_pandoc.py b/docs/source/install_pandoc.py
@@ -0,0 +1,11 @@
+import os
+import shutil
+from pypandoc.pandoc_download import download_pandoc
+
+pandoc_location = os.path.abspath("../../.venv/_pandoc")
+
+with open(os.environ["GITHUB_PATH"], "a") as path:
+    path.write(str(pandoc_location) + "\n")
+
+if not shutil.which("pandoc"):
+    download_pandoc(targetfolder=pandoc_location)
diff --git a/docs/source/key_concepts.rst b/docs/source/key_concepts.rst
@@ -0,0 +1,124 @@
+Key Concepts and API
+====================
+
+This is an overview of motleycrew's key concepts.
+
+If you want to see them in action, see our `research agent example <examples/research_agent.html>`_.
+
+For a basic introduction, you can check out the `quickstart <quickstart.html>`_.
+
+
+Crew and knowledge graph
+------------------------
+
+The crew is a central concept in motleycrew. It is the orchestrator that knows which tasks should be done in which order,
+and manages the execution of those tasks.
+
+The crew has an underlying knowledge graph, in which it stores all information relevant to the execution of the tasks.
+Besides storing the tasks themselves, the knowledge graph can act as a universal storage for any kind of context
+that is relevant to the tasks. You can find more info on how to use the knowledge graph in the `tutorial <knowledge_graph.html>`_.
+
+We currently use `Kùzu <https://kuzudb.com/>`_ as a knowledge graph backend because it's embeddable,
+available under an MIT license, and is one of the LlamaIndex-supported KG backends.
+Please raise an issue on GitHub if you'd like us to support others.
+
+The relationships between tasks are automatically stored in the KG backend, but the agents working
+on the tasks can also read and write any other context they want to share.
+
+.. code-block:: python
+
+    from motleycrew import MotleyCrew
+
+    crew = MotleyCrew()
+    crew.graph_store
+    # MotleyKuzuGraphStore(path=/path/to/kuzu_db)
+
+
+If you want to persist the data or otherwise customize the graph store, you can pass a graph store instance to the crew.
+
+.. code-block:: python
+
+    import kuzu
+    from motleycrew import MotleyCrew
+    from motleycrew.storage import MotleyKuzuGraphStore
+
+    database = kuzu.Database(database_path="kuzu_db")
+    graph_store = MotleyKuzuGraphStore(database=database)
+    crew = MotleyCrew(graph_store=graph_store)
+
+
+Tasks, task units, and workers
+------------------------------
+
+In motleycrew, a **task** is a body of work that is carried out according to certain rules. The task provides the crew
+with a description of what needs to be done in the form of **task units**, and who must do it - that's called a
+**worker**. A worker can be an agent, a tool, or for that matter any Runnable (in the LangChain sense).
+
+The worker receives a task unit as an input, processes it, and returns a result.
+
+In a simple case, a task has a single task unit and is completed as soon as that unit is done.
+For such cases, motleycrew provides a ``SimpleTask`` class, which basically contains an agent and a prompt.
+Refer to the `blog with images <examples/blog_with_images.html>`_ example for a more elaborate illustration.
+
+.. code-block:: python
+
+    from motleycrew import MotleyCrew
+    from motleycrew.tasks import SimpleTask
+
+    crew = MotleyCrew()
+    agent = ...
+    task = SimpleTask(crew=crew, agent=agent, name="example task", description="Do something")
+
+    crew.run()
+    print(task.output)
+
+This task is basically a prompt ("Do something") that is fed to the provided agent. The task will be completed as
+soon as the agent finishes processing the only task unit.
+
+For describing more complex tasks, you should subclass the ``Task`` class. It has two abstract
+methods that you should implement: ``get_next_unit`` and ``get_worker``, as well as some optional methods
+that you can override to customize the task's behavior.
+
+#. ``get_next_unit`` should return the next task unit to be processed. If there are no units to do at the moment, it should return ``None``.
+#. ``get_worker`` should return the worker (typically an agent) that will process the task's units.
+#. `optional` ``register_started_unit`` is called by the crew when a task unit is dispatched. By default, it just connects the unit to the task in the graph.
+#. `optional` ``register_completed_unit`` is called by the crew when a task unit is completed. By default, it does nothing.
+
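The contract described in the list above can be illustrated without motleycrew itself. The following is only a sketch: ``ToyTask`` and ``UppercaseTask`` are hypothetical stand-ins invented for this example, not the library's ``Task`` class, and the real signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class ToyTask(ABC):
    """Hypothetical stand-in for a task; NOT motleycrew's Task class."""

    def __init__(self, name: str):
        self.name = name
        self.done = False

    @abstractmethod
    def get_next_unit(self) -> Optional[dict]:
        """Return the next unit of work, or None if nothing is available."""

    @abstractmethod
    def get_worker(self) -> Callable[[dict], str]:
        """Return the callable that processes this task's units."""

    def register_completed_unit(self, unit: dict) -> None:
        """Optional hook; does nothing by default."""


class UppercaseTask(ToyTask):
    """Emits one unit per input string; the worker upper-cases it."""

    def __init__(self, name: str, texts: list):
        super().__init__(name)
        self.pending = list(texts)
        self.results = []

    def get_next_unit(self) -> Optional[dict]:
        if not self.pending:
            return None  # nothing to do at the moment
        return {"text": self.pending.pop(0)}

    def get_worker(self) -> Callable[[dict], str]:
        return lambda unit: unit["text"].upper()

    def register_completed_unit(self, unit: dict) -> None:
        self.results.append(unit["result"])
        if not self.pending:
            self.done = True  # the task decides for itself when it is complete


# Drive the task by hand, the way an orchestrator might.
task = UppercaseTask("shout", ["a", "b"])
worker = task.get_worker()
while (unit := task.get_next_unit()) is not None:
    unit["result"] = worker(unit)
    task.register_completed_unit(unit)
```

In this sketch the task marks itself as done inside ``register_completed_unit``, which is one way a custom task could signal completion.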
+
+Task hierarchy
+--------------
+
+Tasks can be set to depend on other tasks, forming a directed acyclic graph. This is done by either calling a
+task's ``set_upstream`` method or by using the ``>>`` operator. The crew will then make sure that the upstream
+tasks are completed before starting the dependent task, and pass the former's output to the latter.
+
+.. code-block:: python
+
+    task1 = SimpleTask(crew=crew, agent=agent, name="first task", description="Do something")
+    task2 = SimpleTask(crew=crew, agent=agent, name="second task", description="Do something else")
+
+    task1 >> task2
+    crew.run()
+
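For readers curious how a ``>>`` operator can express dependencies, here is a minimal sketch using Python's ``__rshift__`` hook. ``ToyTask`` is a hypothetical class written for this illustration; whether motleycrew's own implementation returns the right-hand task (which is what makes chaining possible here) is an assumption.

```python
class ToyTask:
    """Hypothetical task that only tracks its upstream dependencies."""

    def __init__(self, name: str):
        self.name = name
        self.upstream = []

    def set_upstream(self, other: "ToyTask") -> "ToyTask":
        # Record that `other` must finish before this task starts.
        self.upstream.append(other)
        return self

    def __rshift__(self, other: "ToyTask") -> "ToyTask":
        # `a >> b` makes `a` an upstream dependency of `b`.
        other.set_upstream(self)
        # Returning the right operand allows chains such as a >> b >> c.
        return other


a, b, c = ToyTask("a"), ToyTask("b"), ToyTask("c")
a >> b >> c  # evaluated left to right: (a >> b) >> c
```

After this runs, ``b`` depends on ``a`` and ``c`` depends on ``b``, so the three tasks form a simple chain.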
+
+How the crew handles tasks
+--------------------------
+
+The crew queries the tasks for task units and dispatches them in a loop. The crew will keep running until either all
+tasks are completed or available tasks stop providing task units.
+
+A task is considered completed when its ``done`` attribute is set to ``True``. For example, in the case of ``SimpleTask``,
+this happens when its only task unit is completed and the crew calls the task's ``register_completed_unit`` method.
+In the case of a custom task, this behavior is up to the task's implementation.
+
+Available tasks are defined as tasks that have not been completed and have no incomplete
+upstream tasks. On each iteration, available tasks are queried for task units one by one,
+and the crew will dispatch the task unit to the worker that the task provides.
+
+When a task unit is dispatched, the crew adds it to the knowledge graph and calls the task's ``register_started_unit``
+method. When the worker finishes processing the task unit, the crew calls the task's ``register_completed_unit`` method.
+
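The loop described above can be sketched in plain Python. This is a simplified, synchronous illustration under stated assumptions: ``ToyTask`` and ``run`` are hypothetical names, the knowledge graph is left out entirely, and the real crew may differ in its details.

```python
class ToyTask:
    """Hypothetical single-unit task, loosely modelled on the description of SimpleTask."""

    def __init__(self, name, worker):
        self.name = name
        self.worker = worker
        self.upstream = []
        self.started = False
        self.done = False
        self.output = None

    def get_next_unit(self):
        if self.started:
            return None  # the only unit has already been handed out
        return {"inputs": [t.output for t in self.upstream]}

    def register_started_unit(self, unit):
        self.started = True

    def register_completed_unit(self, unit):
        self.output = unit["result"]
        self.done = True


def run(tasks):
    """Dispatch units until all tasks are done or no available task yields a unit."""
    order = []
    while True:
        # Available: not completed, and no incomplete upstream tasks.
        available = [
            t for t in tasks
            if not t.done and all(u.done for u in t.upstream)
        ]
        dispatched = False
        for task in available:
            unit = task.get_next_unit()
            if unit is None:
                continue
            task.register_started_unit(unit)
            unit["result"] = task.worker(unit)  # the worker processes the unit
            task.register_completed_unit(unit)
            order.append(task.name)
            dispatched = True
        if not dispatched:
            return order


first = ToyTask("first", lambda unit: "draft")
second = ToyTask("second", lambda unit: unit["inputs"][0] + " polished")
second.upstream.append(first)

# The list order does not matter: dependencies decide what runs first.
order = run([second, first])
```

Both stopping conditions fall out of the same check: once every task is done the list of available tasks is empty, and if the remaining tasks return no units, nothing is dispatched and the loop exits.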
+.. image:: images/crew_diagram.png
+    :alt: Crew main loop
+    :align: center
+
+Now that you know the basics, we suggest you check out the `research agent example <examples/research_agent.html>`_
+to see how it all works together.
diff --git a/examples/Key Concepts and API.ipynb b/examples/Key Concepts and API.ipynb
diff --git a/examples/Multi-step research agent.ipynb b/examples/Multi-step research agent.ipynb
@@ -20,6 +20,26 @@
     "When we decide we've done this for long enough (currently just a constraint on the number of nodes), we then walk back up the graph, first answering the leaf questions, then using these answers (along with the context retrieved for their parent question) to answer the parent question, etc. "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "98c1418b",
+   "metadata": {},
+   "source": [
+    "Technically speaking, the flow consists of two tasks, `QuestionTask` and `AnswerTask`. The `QuestionTask` starts with a user question, and embeds this into the graph as the first un-answered question. Its `get_next_unit()` method looks up all the as yet un-answered questions, and chooses the one that's most salient to the original question (so that question is the `TaskUnit` it returns). Its worker then retrieves the context (RAG-style) for that question, but instead of answering it, creates up to 3 further questions that would be most helpful to answer in order to answer the original question. We thus build up a tree of questions, where each non-leaf node has a retrieval context attached to it - all stored in the knowledge graph for easy retrieval. This goes on until we have enough questions (currently just a fixed number of iterations).\n",
+    "\n",
+    "The `AnswerTask` then rolls the tree back up. It ignores all the questions without a retrieved context; and the `TaskUnit` that its `get_next_unit()` returns is then any question that has no un-answered children. Its worker then proceeds to answer that question using its retrieved context and the answers from its children, if any. This goes on until we've worked our way back up to answering the original question."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6cbe252b",
+   "metadata": {},
+   "source": [
+    "This shows how the tasks can create `TaskUnit`s for themselves and for each other, which enables a whole new level of self-organization. \n",
+    "\n",
+    "The different `Task`s don't have to all form part of a connected DAG either. For example, two tasks could take turns creating `TaskUnit`s for one another - just one of many interaction patterns possible within the architecture."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,

diff --git a/examples/Quickstart.ipynb b/examples/Quickstart.ipynb
@@ -79,12 +79,11 @@
    "source": [
     "The functionality so far is convenient, allowing us to mix all the popular agents and tools, but otherwise fairly vanilla, little different from, for example, the CrewAI semantics. Fortunately, the above introduction just scratched the surface of the motleycrew `Task` API.\n",
     "\n",
-    "Each crew is automatically given an embedded [knowledge graph backend](knowledge_graph.html). We currently use [kuzu](https://github.com/kuzudb/kuzu) because it's embeddable, available under an MIT license, and is one of the LlamaIndex-supported KG backends - please raise an issue on GitHub if you'd like us to support others.\n",
-    "The relationships between tasks are automatically stored in the KG backend; but the agents that are working on the tasks can also read and write any other context they want to share.\n",
+    "In motleycrew, a task is basically a set of rules describing how to perform actions. It provides a **worker** (e.g. an agent) and sets of input data called **task units**. This allows defining workflows of any complexity concisely using crew semantics. For a deeper dive, check out the page on [key concepts](key_concepts.html).\n",
     "\n",
-    "A `Task` object must implement only two methods: `get_next_unit()` and `get_worker()`. The former returns a data object (`TaskUnit`) describing the next part of the task to be done (or `None` if there is nothing to be done for that particular `Task` at the moment), and the latter returns the worker (typically an agent) that this data object can be given to for execution. The crew keeps querying all available `Task`s for `TaskUnits` and dispatching them, until done.\n",
+    "The crew queries and dispatches available task units in a loop, managing task states using an embedded [knowledge graph](knowledge_graph.html).\n",
     "\n",
-    "You can see how this dispatch method easily supports different execution backends, from synchronous to asyncio, threaded, etc.\n"
+    "This dispatch method easily supports different execution backends, from synchronous to asyncio, threaded, etc.\n"
    ]
   },
   {
@@ -94,29 +93,18 @@
    "source": [
     "### Example: Recursive question-answering in the research agent\n",
     "\n",
-    "An example of the power of this approach is the [research agent](examples/research_agent.html). It consists of two tasks, `QuestionTask` and `AnswerTask`. The `QuestionTask` starts with a user question, and embeds this into the graph as the first un-answered question. Its `get_next_unit()` method looks up all the as yet un-answered questions, and chooses the one that's most salient to the original question (so that question is the `TaskUnit` it returns). Its worker then retrieves the context (RAG-style) for that question, but instead of answering it, creates up to 3 further questions that would be most helpful to answer in order to answer the original question. We thus build up a tree of questions, where each non-leaf node has a retrieval context attached to it - all stored in the knowledge graph for easy retrieval. This goes on until we have enough questions (currently just a fixed number of iterations).\n",
-    "\n",
-    "The `AnswerTask` then rolls the tree back up. It ignores all the questions without a retrieved context; and the `TaskUnit` that its `get_next_unit()` returns is then any question that has no un-answered children. Its worker then proceeds to answer that question using its retrieved context and the answers from its children, if any. This goes on until we've worked our way back up to answering the original question.\n",
+    "The motleycrew architecture described above makes it easy to generate task units on the fly when needed. An example of the power of this approach is the [research agent](examples/research_agent.html), which dynamically generates new questions based on the context retrieved for previous questions.  \n",
+    "This example also shows how workers can collaborate via the shared knowledge graph, storing all necessary data in a way that is natural to the task.\n",
     "\n"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "1762f89f-96c3-4e93-ba2f-4aa8accfb14a",
+   "id": "2cafa282-2111-4051-bf0f-7046048648bd",
    "metadata": {},
    "source": [
-    "This shows how the tasks can create `TaskUnit`s for themselves and for each other, which enables a whole new level of self-organization. \n",
-    "\n",
-    "The different `Task`s don't have to all form part of a connected DAG either. For example, two tasks could take turns creating `TaskUnit`s for one another - just one of many interaction patterns possible within the architecture."
+    "  "
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "220c0707-64f8-4415-b9b5-2b730672b5b7",
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {

diff --git a/motleycrew/agents/langchain/langchain.py b/motleycrew/agents/langchain/langchain.py
@@ -105,7 +105,7 @@ def from_function(
             llm = init_llm(llm_framework=LLMFramework.LANGCHAIN)
 
         if require_tools and not tools:
-            raise ValueError("You must provide at least one tool to the ReactMotleyAgent")
+            raise ValueError("You must provide at least one tool to the LangchainMotleyAgent")
 
         def agent_factory(tools: dict[str, MotleyTool]):
             langchain_tools = [t.to_langchain_tool() for t in tools.values()]

diff --git a/motleycrew/common/enums.py b/motleycrew/common/enums.py
@@ -6,6 +6,7 @@ class LLMFamily:
 
     """
     OPENAI = "openai"
+    ANTHROPIC = "anthropic"
 
 
 class LLMFramework: