diff --git a/CHANGELOG.md b/CHANGELOG.md index b700a4cf..3f2e282d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -21,9 +21,9 @@ - `ScenarioList` method `give_valid_names()` allows you to automatically generate valid Pythonic identifiers for scenario keys. -- `ScenarioList` method `group_by()` allows you to group scenarios by specified identifies and apply a function to the values of the specified variables. +- `ScenarioList` method `group_by()` allows you to group scenarios by specified identities and apply a function to the values of the specified variables. -- `ScenarioList` method `from_wikipedia_table()` allows you to convert a Wikipedia table into a scenario list. Example usage: https://www.expectedparrot.com/content/247589dd-ad1e-45f4-9c82-e71dbeac8c96 (Notebook: *Using an LLM to Augment Existing Tabular Data*) +- `ScenarioList` method `from_wikipedia_table()` allows you to convert a Wikipedia table into a scenario list. Example usage: https://docs.expectedparrot.com/en/latest/notebooks/scenario_list_wikipedia.html - `ScenarioList` method `to_docx()` allows you to export scenario lists as structured Docx documents. @@ -35,7 +35,7 @@ - `Results` methods `generate_html` and `save_html` can be called to generate and save HTML code for displaying results. -- Ability to run a `Model` with a boolean parameter `raise_validation_errors = False` or `raise_validation_errors = True`. If False, exceptions will only be raised (interrupting survey execution) when the model returns nothing at all. +- Ability to run a `Model` with a boolean parameter `raise_validation_errors = False` or `raise_validation_errors = True`. If False, exceptions will only be raised (interrupting survey execution) when the model returns nothing at all. Another optional parameter `print_exceptions = False` can be passed to not print exceptions at all. ### Changed - Improvements to exceptions reports. diff --git a/docs/conf.py b/docs/conf.py index a75af2b2..a9479cb8 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -82,4 +82,6 @@ def setup(app): "github_user": "", "github_repo": "", "github_version": "", -} \ No newline at end of file +} + +nbsphinx_allow_errors = True \ No newline at end of file diff --git a/docs/exceptions.rst b/docs/exceptions.rst index 1ab41869..c7e8c252 100644 --- a/docs/exceptions.rst +++ b/docs/exceptions.rst @@ -4,67 +4,7 @@ Exceptions & Debugging ====================== An exception is an error that occurs during the execution of a question or survey. -When an exception is raised, EDSL will display a message about the error that includes a link to a report with more details. - -Example -------- - -Here's an example of a poorly written question that is likely to raise an exception: - -.. code-block:: python - - from edsl.questions import QuestionMultipleChoice - - q = QuestionMultipleChoice( - question_name = "bad_instruction", - question_text = "What is your favorite color?", - question_options = ["breakfast", "lunch", "dinner"] # Non-sensical options for the question - ) - - results = q.run() - - -The above code will likely raise a `QuestionAnswerValidationError` exception because the question options are not related to the question text. -Output: - -.. code-block:: text - - Attempt 1 failed with exception:Answer code must be a string, a bytes-like object or a real number (got Invalid). now waiting 1.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. 
- - - Attempt 2 failed with exception:Answer code must be a string, a bytes-like object or a real number (got The question asks for a favorite color, but the options provided are meal times, not colors. Therefore, I cannot select an option that accurately reflects a favorite color.). now waiting 2.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. - - - Attempt 3 failed with exception:Answer code must be a string, a bytes-like object or a real number (got The question does not match the provided options as they pertain to meals, not colors.). now waiting 4.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. - - - Attempt 4 failed with exception:Answer code must be a string, a bytes-like object or a real number (got This is an invalid question since colors are not listed as options. The options provided are meals, not colors.). now waiting 8.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. - - - Exceptions were raised in 1 out of 1 interviews. - - Open report to see details. - - -Exceptions report ------------------ - -The exceptions report can be accessed by clicking on the link provided in the exceptions message. -It contains details on the exceptions that were raised: - -.. image:: /static/exceptions_message.png - :width: 800 - :align: center - - -Performance plot -^^^^^^^^^^^^^^^^ - -The report includes a Performance Plot with graphical details about the API calls that were made (started, failed, in progress, canceled, etc.; scroll to the end of the report to view it): - -.. image:: /static/exceptions_performance_plot.png - :width: 800 - :align: center +When an exception is raised, EDSL will display a message about the error and an interactive report with more details in a new browser tab. Help debugging @@ -77,16 +17,12 @@ You can use the following code to generate a link to your notebook: .. code-block:: python - from edsl import Coop, notebook - - coop = Coop() - - notebook = Notebook(path="path/to/your/notebook.ipynb") + from edsl import notebook - coop.create(notebook, description="Notebook with code that raises an exception", visibility="private") + n = Notebook(path="path/to/your/notebook.ipynb") + n.push(description="Notebook with code that raises an exception", visibility="private") -A notebook showing the above example question and exception message is available at the Coop: https://www.expectedparrot.com/content/f6a19c77-3f57-4900-b0c9-436058a2ad27 Common exceptions @@ -113,14 +49,6 @@ The default settings (which can be modified) are as follows: MAX_QUESTION_LENGTH = 100000 -JSON errors -^^^^^^^^^^^ - -Some exceptions may indicate that the response from the language model is not properly formatted JSON. -This can be caused by a problem with the inference provider or the way that the question has been constructed (e.g., the model is not capable of following the question prompts as written). -A useful starting point for debugging these exceptions is to check the `Settings` class for the `Questions` model (see *Answer validation errors* above) and try variations in the question prompts and types (e.g., does `QuestionFreeText` produce an answer to the same question formatted as a different question type). - - Missing API key ^^^^^^^^^^^^^^^ diff --git a/docs/index.rst b/docs/index.rst index 5181a847..5958e431 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -66,7 +66,7 @@ Working with Results - :ref:`results`: Access built-in methods for analyzing and utilizing survey results as datasets. 
- :ref:`caching`: Learn about caching and sharing results. - :ref:`exceptions`: Identify and handle exceptions in your survey design. -- :ref:`token_limits`: Manage token limits for language models. +- :ref:`token_usage`: Manage token limits for language models, and monitor and reduce token usage as desired. Coop ---- @@ -147,7 +147,7 @@ Information about additional functionality for developers. results data exceptions - token_limits + token_usage .. toctree:: :maxdepth: 2 @@ -171,6 +171,7 @@ Information about additional functionality for developers. :caption: How-to Guides :hidden: + notebooks/edsl_intro.ipynb notebooks/data_labeling_example.ipynb notebooks/image_scenario_example.ipynb notebooks/question_loop_scenario.ipynb @@ -190,6 +191,7 @@ Information about additional functionality for developers. :caption: Notebooks :hidden: + notebooks/next_token_probs.ipynb notebooks/scenariolist_unpivot.ipynb notebooks/nps_survey.ipynb notebooks/agentifying_responses.ipynb diff --git a/docs/notebooks/edsl_intro.ipynb b/docs/notebooks/edsl_intro.ipynb new file mode 100644 index 00000000..4ff497ac --- /dev/null +++ b/docs/notebooks/edsl_intro.ipynb @@ -0,0 +1,1019 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e3d6e645-2d59-42f5-9b01-252abff36f4c", + "metadata": {}, + "source": [ + "# Intro to EDSL\n", + "This notebook provides example code for base components of [EDSL, an open-source libary](https://github.com/expectedparrot/edsl) for simulating surveys, experiments and other research with AI agents and large language models. Details on the code below are provided in accompanying [slides: How to use EDSL](https://docs.google.com/presentation/d/10GxXhzu_TD09vN0gJhfne0Zum-GF5R-ppzTXb5IUKlU/edit?usp=sharing).\n", + "\n", + "## Technical setup\n", + "Before running the code below, please ensure that you have [installed the EDSL library](https://docs.expectedparrot.com/en/latest/installation.html) and either [activated remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) from your [Coop account](https://docs.expectedparrot.com/en/latest/coop.html) or [stored API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use with EDSL. \n", + "\n", + "## Documentation\n", + "Please also see our [documentation page](https://docs.expectedparrot.com/) for tips, tutorials and more demo notebooks on using EDSL." + ] + }, + { + "cell_type": "markdown", + "id": "943c4147-7ea8-4953-9c8c-07f3c12d4726", + "metadata": {}, + "source": [ + "## Simple example\n", + "We start by [selecting a question type](https://docs.expectedparrot.com/en/latest/questions.html) and constructing a question in the relevant template:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "6179718e-0add-4c41-b690-3eb81ce6e3ca", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionMultipleChoice\n", + "\n", + "q = QuestionMultipleChoice(\n", + " question_name = \"marvel_movies\",\n", + " question_text = \"Do you enjoy Marvel movies?\",\n", + " question_options = [\"Yes\", \"No\", \"I do not know\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f988b55f-1569-445a-87bc-0d1602b4ba14", + "metadata": {}, + "source": [ + "We administer a question by calling the `run()` method. 
\n", + "This generates a dataset of `Results` including the model's response to the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "04a6ce5d-d1a9-48d6-862f-a818c0e3486c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━┓\n",
+       "┃ answer         ┃\n",
+       "┃ .marvel_movies ┃\n",
+       "┡━━━━━━━━━━━━━━━━┩\n",
+       "│ I do not know  │\n",
+       "└────────────────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.marvel_movies\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mI do not know \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results = q.run()\n", + "\n", + "results.select(\"marvel_movies\").print(format=\"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "b85c5d99-06e3-4ce9-a266-3448c58fb77e", + "metadata": {}, + "source": [ + "## Designing AI agents\n", + "We can [create personas for agents](https://docs.expectedparrot.com/en/latest/agents.html) to answer the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "fb3bae2b-2aa0-4cdd-9acb-efe89bb409be", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import AgentList, Agent\n", + "\n", + "personas = [\"comic book collector\", \"movie critic\"]\n", + "\n", + "a = AgentList(\n", + " Agent(traits = {\"persona\": p}) for p in personas\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "aafdf590-0996-478c-b568-6dba2f45c3a3", + "metadata": {}, + "source": [ + "## Selecting language models\n", + "We can [select language models](https://docs.expectedparrot.com/en/latest/language_models.html) to generate the responses (in the example above we did not specify a model, so GPT 4 preview was used by default):" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "741e202a-7a90-4bc7-891d-11dbe489da9b", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import ModelList, Model\n", + "\n", + "models = [\"gpt-4o\", \"claude-3-5-sonnet-20240620\"]\n", + "\n", + "m = ModelList(\n", + " Model(m) for m in [\"gpt-4o\", \"claude-3-5-sonnet-20240620\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ae58cb3f-6088-4070-b85a-736dbca5cb31", + "metadata": {}, + "source": [ + "## Generating results\n", + "We add agents and models to a question when running it:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "a3ccf07e-b4f9-4b85-9618-4358e874c35c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓\n",
+       "┃ model                       agent                 answer         ┃\n",
+       "┃ .model                      .persona              .marvel_movies ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩\n",
+       "│ gpt-4o                      comic book collector  Yes            │\n",
+       "├────────────────────────────┼──────────────────────┼────────────────┤\n",
+       "│ claude-3-5-sonnet-20240620  comic book collector  Yes            │\n",
+       "├────────────────────────────┼──────────────────────┼────────────────┤\n",
+       "│ gpt-4o                      movie critic          Yes            │\n",
+       "├────────────────────────────┼──────────────────────┼────────────────┤\n",
+       "│ claude-3-5-sonnet-20240620  movie critic          Yes            │\n",
+       "└────────────────────────────┴──────────────────────┴────────────────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35magent \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.persona \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.marvel_movies\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mmovie critic \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mmovie critic \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴──────────────────────┴────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results = q.by(a).by(m).run()\n", + "\n", + "results.select(\"model\", \"persona\", \"marvel_movies\").print(format=\"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "c1817b6e-341b-493c-8156-7fe634e2bc61", + "metadata": {}, + "source": [ + "## Parameterizing questions\n", + "We can use `Scenario` objects to [add data or content to questions](https://docs.expectedparrot.com/en/latest/scenarios.html):" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "72825f4e-31fd-47f2-9bcc-d3e8b88a5196", + "metadata": {}, + "outputs": [], + "source": [ + "q1 = QuestionMultipleChoice(\n", + " question_name = \"politically_motivated\",\n", + " question_text = \"\"\"\n", + " Read the following movie review and determine whether it is politically motivated.\n", + " Movie: {{ title }}\n", + " Review: {{ review }}\n", + " \"\"\",\n", + " question_options = [\"Yes\", \"No\", \"I do not know\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "7feb7f9f-d48b-436f-8b93-b976973e1964", + "metadata": {}, + "source": [ + "EDSL comes with [methods for generating scenarios from many data sources](https://docs.expectedparrot.com/en/latest/scenarios.html), including PDFs, CSVs, docs, images, tables, lists, dicts:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "47d17b5f-4847-4f4b-82dd-b76045204961", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import Scenario\n", + "\n", + "example_review = {\n", + " \"year\": 2014,\n", + " \"title\": \"Captain America: The Winter Soldier\",\n", + " \"review\": \"\"\"\n", + " Part superhero flick, part 70s political thriller. 
\n", + " It's a bold mix that pays off, delivering a scathing \n", + " critique of surveillance states wrapped in spandex \n", + " and shield-throwing action. \n", + " \"\"\"\n", + "}\n", + "\n", + "s = Scenario.from_dict(example_review)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "3cf68fa7-f276-4008-9201-27f8269a651c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
+       "┃ model                       scenario  scenario                             answer                 ┃\n",
+       "┃ .model                      .year     .title                               .politically_motivated ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
+       "│ claude-3-5-sonnet-20240620  2014      Captain America: The Winter Soldier  No                     │\n",
+       "├────────────────────────────┼──────────┼─────────────────────────────────────┼────────────────────────┤\n",
+       "│ gpt-4o                      2014      Captain America: The Winter Soldier  Yes                    │\n",
+       "└────────────────────────────┴──────────┴─────────────────────────────────────┴────────────────────────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mscenario\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mscenario \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.year \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.title \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.politically_motivated\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m2014 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCaptain America: The Winter Soldier\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNo \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼──────────┼─────────────────────────────────────┼────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m2014 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCaptain America: The Winter Soldier\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴──────────┴─────────────────────────────────────┴────────────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results = q1.by(s).by(a).by(m).run()\n", + "\n", + "(\n", + " results.filter(\"persona == 'movie critic'\")\n", + " .sort_by(\"model\")\n", + " .select(\"model\", \"year\", \"title\", \"politically_motivated\")\n", + " .print(format=\"rich\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "2661c963-05ca-4e8c-ad76-360ed72b5680", + "metadata": {}, + "source": [ + "## Comments\n", + "Questions automatically include a \"comment\" field.\n", + "This can be useful for understanding the context of a response, or debugging a non-response." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "7ed7720d-5c2e-4437-b7ae-3b881620bbaa", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
+       "┃ model                       answer                  comment                                                   ┃\n",
+       "┃ .model                      .politically_motivated  .politically_motivated_comment                            ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
+       "│ claude-3-5-sonnet-20240620  No                      This review appears to be a straightforward critique of   │\n",
+       "│                                                     the film's genre-blending and themes, without any overt   │\n",
+       "│                                                     political agenda or bias influencing the assessment.      │\n",
+       "├────────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────┤\n",
+       "│ gpt-4o                      Yes                     The review mentions a \"scathing critique of surveillance  │\n",
+       "│                                                     states,\" which indicates that the film's themes and the   │\n",
+       "│                                                     review itself have political undertones.                  │\n",
+       "└────────────────────────────┴────────────────────────┴───────────────────────────────────────────────────────────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcomment \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.politically_motivated\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.politically_motivated_comment \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNo \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mThis review appears to be a straightforward critique of \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mthe film's genre-blending and themes, without any overt \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mpolitical agenda or bias influencing the assessment. \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mThe review mentions a \"scathing critique of surveillance \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mstates,\" which indicates that the film's themes and the \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mreview itself have political undertones. 
\u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴────────────────────────┴───────────────────────────────────────────────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "(\n", + " results.filter(\"persona == 'movie critic'\")\n", + " .sort_by(\"model\")\n", + " .select(\"model\", \"politically_motivated\", \"politically_motivated_comment\")\n", + " .print(format=\"rich\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "27e864b1-1651-4f95-bb94-73d4699192b8", + "metadata": {}, + "source": [ + "## Combining questions in a survey\n", + "We can [combine questions in a `Survey`](https://docs.expectedparrot.com/en/latest/surveys.html) to administer them together.\n", + "Here we create some variations on the above question to compare responses:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "c50bc0cd-e18e-4985-9e0e-5d92302356d0", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionYesNo\n", + "\n", + "q2 = QuestionYesNo(\n", + " question_name = \"yn\",\n", + " question_text = \"\"\"\n", + " Read the following movie review and determine whether it is politically motivated.\n", + " Movie: {{ title }}\n", + " Review: {{ review }}\n", + " \"\"\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "42c7f350-699a-47a6-a216-09d6db15af8f", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionLinearScale\n", + "\n", + "q3 = QuestionLinearScale(\n", + " question_name = \"ls\",\n", + " question_text = \"\"\"\n", + " Read the following movie review and indicate whether it is politically motivated.\n", + " Movie: {{ title }}\n", + " Review: {{ review }}\n", + " \"\"\",\n", + " question_options = [0,1,2,3,4,5],\n", + " option_labels = {0:\"Not at all\", 5:\"Very much\"}\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "7afa788b-d4fe-4cf3-abc8-0554707b6ca8", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionList\n", + "\n", + "q4 = QuestionList(\n", + " question_name = \"favorites\",\n", + " question_text = \"List your favorite Marvel movies.\",\n", + " max_list_items = 3\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9415c681-f1d6-423f-bb23-d6cacff8f2e5", + "metadata": {}, + "source": [ + "## Survey rules & logic\n", + "We can [add skip/stop and other rules](https://docs.expectedparrot.com/en/latest/surveys.html), and \"memory\" of other questions in a survey:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "6cf2afdf-0d13-4a38-8620-72fc302a92ad", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import Survey\n", + "\n", + "survey = Survey(questions = [q2, q3, q4])\n", + "\n", + "survey = survey.add_stop_rule(q3, \"ls < 3\")" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "9c8cf39f-db9a-4cf0-bc62-6d04ab51a6aa", + "metadata": {}, + "outputs": [], + "source": [ + "results = survey.by(s).by(a).by(m).run()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "367f8bc5-0936-4fc1-8416-b8b4eafae804", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
+       "┃ model.model             agent.persona         Yes/No version  Linear scale version  Favorites               ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
+       "│ claude-3-5-sonnet-202…  comic book collector  No              2                     None                    │\n",
+       "├────────────────────────┼──────────────────────┼────────────────┼──────────────────────┼─────────────────────────┤\n",
+       "│ gpt-4o                  comic book collector  Yes             3                     ['The Avengers',        │\n",
+       "│                                                                                     'Guardians of the       │\n",
+       "│                                                                                     Galaxy', 'Spider-Man:   │\n",
+       "│                                                                                     Into the Spider-Verse'] │\n",
+       "└────────────────────────┴──────────────────────┴────────────────┴──────────────────────┴─────────────────────────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35magent.persona \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mYes/No version\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mLinear scale version\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mFavorites \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-202…\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNo \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m2 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNone \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────┼──────────────────────┼────────────────┼──────────────────────┼─────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m3 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['The Avengers', \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m'Guardians of the \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mGalaxy', 'Spider-Man: \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mInto the Spider-Verse']\u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────┴──────────────────────┴────────────────┴──────────────────────┴─────────────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "(\n", + " results.filter(\"persona == 'comic book collector'\")\n", + " .select(\"model\", \"persona\", \"yn\", \"ls\", \"favorites\")\n", + " .print(pretty_labels = {\n", + " \"answer.yn\": \"Yes/No version\",\n", + " \"answer.ls\": \"Linear scale version\",\n", + " \"answer.favorites\": \"Favorites\"\n", + " }, format=\"rich\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "4c5d032d-87df-40d4-9e01-d7e86bf33010", + "metadata": {}, + "source": [ + "## Working with results as datasets\n", + "EDSL provides [built-in methods for analyzing results](https://docs.expectedparrot.com/en/latest/results.html), e.g., as SQL tables, dataframes:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "261e72cc-1621-4370-bf39-e6163dd9b192", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
model | persona | yn | ls | favorites
0 | claude-3-5-sonnet-20240620 | comic book collector | No | 2 | None
1 | claude-3-5-sonnet-20240620 | movie critic | No | 2 | None
2 | gpt-4o | comic book collector | Yes | 3 | ['The Avengers', 'Guardians of the Galaxy', 'S...
3 | gpt-4o | movie critic | Yes | 3 | ['Iron Man', 'Black Panther', 'Avengers: Endga...
\n", + "
" + ], + "text/plain": [ + " model persona yn ls \\\n", + "0 claude-3-5-sonnet-20240620 comic book collector No 2 \n", + "1 claude-3-5-sonnet-20240620 movie critic No 2 \n", + "2 gpt-4o comic book collector Yes 3 \n", + "3 gpt-4o movie critic Yes 3 \n", + "\n", + " favorites \n", + "0 None \n", + "1 None \n", + "2 ['The Avengers', 'Guardians of the Galaxy', 'S... \n", + "3 ['Iron Man', 'Black Panther', 'Avengers: Endga... " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.sql(\"select model, persona, yn, ls, favorites from self\", shape=\"wide\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "402d49ea-c040-482b-a223-9c8cdcaf1da5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
answer.ls | answer.yn | answer.favorites | scenario.year | scenario.review | scenario.title | agent.persona | agent.agent_instruction | agent.agent_name | model.temperature | ... | question_options.favorites_question_options | question_type.favorites_question_type | question_type.ls_question_type | question_type.yn_question_type | comment.ls_comment | comment.yn_comment | comment.favorites_comment | generated_tokens.yn_generated_tokens | generated_tokens.favorites_generated_tokens | generated_tokens.ls_generated_tokens
0 | 2 | No | NaN | 2014 | \n Part superhero flick, part 70s political... | Captain America: The Winter Soldier | comic book collector | You are answering questions as if you were a h... | Agent_1 | 0.5 | ... | NaN | list | linear_scale | yes_no | As a comic book collector, I don't see this re... | Comment: As a comic book collector, I don't se... | Task was cancelled. | No\n\nComment: As a comic book collector, I do... | NaN | 2\n\nAs a comic book collector, I don't see th...
1 | 2 | No | NaN | 2014 | \n Part superhero flick, part 70s political... | Captain America: The Winter Soldier | movie critic | You are answering questions as if you were a h... | Agent_2 | 0.5 | ... | NaN | list | linear_scale | yes_no | While the review mentions political themes lik... | Comment: This review does not appear to be pol... | Task was cancelled. | No\n\nComment: This review does not appear to ... | NaN | 2\n\nWhile the review mentions political theme...
2 | 3 | Yes | ['The Avengers', 'Guardians of the Galaxy', 'S... | 2014 | \n Part superhero flick, part 70s political... | Captain America: The Winter Soldier | comic book collector | You are answering questions as if you were a h... | Agent_1 | 0.5 | ... | NaN | list | linear_scale | yes_no | The review mentions the movie's critique of su... | The review mentions a \"scathing critique of su... | These movies capture the essence of Marvel's s... | Yes\n\nThe review mentions a \"scathing critiqu... | [\"The Avengers\", \"Guardians of the Galaxy\", \"S... | 3\n\nThe review mentions the movie's critique ...
3 | 3 | Yes | ['Iron Man', 'Black Panther', 'Avengers: Endga... | 2014 | \n Part superhero flick, part 70s political... | Captain America: The Winter Soldier | movie critic | You are answering questions as if you were a h... | Agent_2 | 0.5 | ... | NaN | list | linear_scale | yes_no | The review highlights a \"scathing critique of ... | The review mentions that the movie delivers \"a... | These films stand out for their groundbreaking... | Yes\n\nThe review mentions that the movie deli... | [\"Iron Man\", \"Black Panther\", \"Avengers: Endga... | 3\n\nThe review highlights a \"scathing critiqu...
\n", + "

4 rows × 48 columns

\n", + "
" + ], + "text/plain": [ + " answer.ls answer.yn answer.favorites \\\n", + "0 2 No NaN \n", + "1 2 No NaN \n", + "2 3 Yes ['The Avengers', 'Guardians of the Galaxy', 'S... \n", + "3 3 Yes ['Iron Man', 'Black Panther', 'Avengers: Endga... \n", + "\n", + " scenario.year scenario.review \\\n", + "0 2014 \\n Part superhero flick, part 70s political... \n", + "1 2014 \\n Part superhero flick, part 70s political... \n", + "2 2014 \\n Part superhero flick, part 70s political... \n", + "3 2014 \\n Part superhero flick, part 70s political... \n", + "\n", + " scenario.title agent.persona \\\n", + "0 Captain America: The Winter Soldier comic book collector \n", + "1 Captain America: The Winter Soldier movie critic \n", + "2 Captain America: The Winter Soldier comic book collector \n", + "3 Captain America: The Winter Soldier movie critic \n", + "\n", + " agent.agent_instruction agent.agent_name \\\n", + "0 You are answering questions as if you were a h... Agent_1 \n", + "1 You are answering questions as if you were a h... Agent_2 \n", + "2 You are answering questions as if you were a h... Agent_1 \n", + "3 You are answering questions as if you were a h... Agent_2 \n", + "\n", + " model.temperature ... question_options.favorites_question_options \\\n", + "0 0.5 ... NaN \n", + "1 0.5 ... NaN \n", + "2 0.5 ... NaN \n", + "3 0.5 ... NaN \n", + "\n", + " question_type.favorites_question_type question_type.ls_question_type \\\n", + "0 list linear_scale \n", + "1 list linear_scale \n", + "2 list linear_scale \n", + "3 list linear_scale \n", + "\n", + " question_type.yn_question_type \\\n", + "0 yes_no \n", + "1 yes_no \n", + "2 yes_no \n", + "3 yes_no \n", + "\n", + " comment.ls_comment \\\n", + "0 As a comic book collector, I don't see this re... \n", + "1 While the review mentions political themes lik... \n", + "2 The review mentions the movie's critique of su... \n", + "3 The review highlights a \"scathing critique of ... \n", + "\n", + " comment.yn_comment \\\n", + "0 Comment: As a comic book collector, I don't se... \n", + "1 Comment: This review does not appear to be pol... \n", + "2 The review mentions a \"scathing critique of su... \n", + "3 The review mentions that the movie delivers \"a... \n", + "\n", + " comment.favorites_comment \\\n", + "0 Task was cancelled. \n", + "1 Task was cancelled. \n", + "2 These movies capture the essence of Marvel's s... \n", + "3 These films stand out for their groundbreaking... \n", + "\n", + " generated_tokens.yn_generated_tokens \\\n", + "0 No\\n\\nComment: As a comic book collector, I do... \n", + "1 No\\n\\nComment: This review does not appear to ... \n", + "2 Yes\\n\\nThe review mentions a \"scathing critiqu... \n", + "3 Yes\\n\\nThe review mentions that the movie deli... \n", + "\n", + " generated_tokens.favorites_generated_tokens \\\n", + "0 NaN \n", + "1 NaN \n", + "2 [\"The Avengers\", \"Guardians of the Galaxy\", \"S... \n", + "3 [\"Iron Man\", \"Black Panther\", \"Avengers: Endga... \n", + "\n", + " generated_tokens.ls_generated_tokens \n", + "0 2\\n\\nAs a comic book collector, I don't see th... \n", + "1 2\\n\\nWhile the review mentions political theme... \n", + "2 3\\n\\nThe review mentions the movie's critique ... \n", + "3 3\\n\\nThe review highlights a \"scathing critiqu... 
\n", + "\n", + "[4 rows x 48 columns]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.to_pandas()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "bbf612a1-c8f1-4680-b154-53b7ff6865f0", + "metadata": {}, + "outputs": [], + "source": [ + "results.to_csv(\"marvel_movies_survey.csv\")" + ] + }, + { + "cell_type": "markdown", + "id": "fbbd6d8b-16c0-4d3c-827e-d5403420434a", + "metadata": {}, + "source": [ + "## Posting to the Coop" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "4f9772c7-f7e6-4e15-a360-dabe84529f55", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [], + "source": [ + "from edsl import Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "ab6a7452-3a82-4722-9d08-4d8e6a9230e9", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [], + "source": [ + "n = Notebook(path = \"edsl_intro.ipynb\")" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "5737e568-9129-4eb4-88c3-edbfc0d68d8b", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'description': 'Example survey: Using EDSL to analyze content',\n", + " 'object_type': 'notebook',\n", + " 'url': 'https://www.expectedparrot.com/content/b3c45b82-5d3a-4d79-9e00-0ce1f8b1ff65',\n", + " 'uuid': 'b3c45b82-5d3a-4d79-9e00-0ce1f8b1ff65',\n", + " 'version': '0.1.33.dev1',\n", + " 'visibility': 'public'}" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n.push(description = \"Example survey: Using EDSL to analyze content\", visibility = \"public\")" + ] + }, + { + "cell_type": "markdown", + "id": "0b09751d-6c73-4dc8-8c73-427c5c0d190f", + "metadata": {}, + "source": [ + "To update an object at the Coop:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "46a618fc-2935-45b2-9384-f3836d02085e", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [], + "source": [ + "n = Notebook(path = \"edsl_intro.ipynb\") # resave" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "a882006f-1393-4d41-84e2-71bf4ac709d9", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'status': 'success'}" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n.patch(uuid = \"b3c45b82-5d3a-4d79-9e00-0ce1f8b1ff65\", value = n)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/notebooks/next_token_probs.ipynb b/docs/notebooks/next_token_probs.ipynb new file mode 100644 index 00000000..8e778620 --- /dev/null +++ b/docs/notebooks/next_token_probs.ipynb @@ -0,0 +1,1273 @@ +{ + "cells": [ + { + 
"cell_type": "markdown", + "id": "d9c10634-d54f-489b-b80f-b37d330b3006", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "# Calculating next token probabilities\n", + "This notebook provides sample [EDSL](https://docs.expectedparrot.com/) code for using language models to simulate a survey and calculating next token probabilities for models' responses to survey questions.\n", + "\n", + "[EDSL is an open-source libary](https://github.com/expectedparrot/edsl) for simulating surveys, experiments and other research with AI agents and large language models. \n", + "Before running the code below, please ensure that you have [installed the EDSL library](https://docs.expectedparrot.com/en/latest/installation.html) and either [activated remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) from your [Coop account](https://docs.expectedparrot.com/en/latest/coop.html) or [stored API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use with EDSL. Please also see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started using EDSL. " + ] + }, + { + "cell_type": "markdown", + "id": "3ef451ab-cd37-4b5d-b386-9a1b38b7c1c6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Research question" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "024ee414-3c10-433d-a9c7-0cd8492ac6ca", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "

Aspirational wealth...doing better than you parents...an "Opportunity Economy!"
NO!
All are late 20th century neoliberal tropes.
Americans today seek financial security.
Decent jobs and government policy that will pay for the needs of life and old age.
Understand that Democrats! pic.twitter.com/eR3hbx4wbX

— Dan Alpert (@DanielAlpert) September 10, 2024
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import HTML\n", + "HTML(\"\"\"

Aspirational wealth...doing better than you parents...an "Opportunity Economy!"
NO!
All are late 20th century neoliberal tropes.
Americans today seek financial security.
Decent jobs and government policy that will pay for the needs of life and old age.
Understand that Democrats! pic.twitter.com/eR3hbx4wbX

— Dan Alpert (@DanielAlpert) September 10, 2024
\"\"\")" + ] + }, + { + "cell_type": "markdown", + "id": "517beb9d-6330-4905-bfcf-d9d010eedab7", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Simulating survey responses\n", + "In the steps below we demonstrate how to use EDSL to simulate responses to the above question: \n", + "\n", + "#### *\"Which of the following is more important to you: Financial stability / Moving up the income ladder\"* " + ] + }, + { + "cell_type": "markdown", + "id": "cd892e03-0a1d-424a-af23-9e2dc82508f5", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Creating questions\n", + "We start by selecting a question type and constructing a question in the relevant template.\n", + "[EDSL comes with many common question types](https://docs.expectedparrot.com/en/latest/questions.html) that we can choose from based on the desired form of the response:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "8c2e8416-32ea-4ce5-94c2-a8af7022d1c1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from edsl import QuestionMultipleChoice" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "0e182ca7-46ac-4661-8182-160cb09f31b4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "q = QuestionMultipleChoice(\n", + " question_name = \"income_pref\",\n", + " question_text = \"Which of the following is more important to you: \",\n", + " question_options = [\"Financial stability\", \"Moving up the income ladder\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ab0b5222-af12-438c-ade2-4fd48f81a4e3", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Designing AI agents\n", + "We can design AI agents with relevant `traits` to answer the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "16a064c1-fe04-4797-b958-39c64647db9b", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from edsl import Agent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4d67f43b-088c-4672-a939-bbacea52adb3", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "a = Agent(traits = {\"persona\": \"You are an American answering a poll from Pew.\"})" + ] + }, + { + "cell_type": "markdown", + "id": "1263a421-6749-410b-9b3e-a1c60db6f216", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Selecting language models\n", + "[EDSL works with many popular models](https://docs.expectedparrot.com/en/latest/language_models.html) that we can use to generate responses:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "a03320e2-dba7-4ab6-80a5-8f8fb1f5f527", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from edsl import Model" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "62bc5526-b035-4599-943b-2c037a6a9a38", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "m = Model(\"gpt-4o\", temperature = 1)" + ] + }, + { + 
"cell_type": "markdown", + "id": "65faeaff-483a-4f6a-bc0f-19e0eeae9908", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Running a survey\n", + "We administer the question by adding the agent and model and then running it.\n", + "We can specify the number of times to administer the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1abbcee9-721f-4397-9689-7684db5a2472", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "results = q.by(a).by(m).run(n = 20)" + ] + }, + { + "cell_type": "markdown", + "id": "7e8021e1-a7cf-4965-b9d9-3483707aeb5d", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "EDSL comes with [built-in methods for analyzing the dataset of `Results`](https://docs.expectedparrot.com/en/latest/results.html) that is generated:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "9479e0b6-9088-442f-8695-6f0abf5c77e6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n",
+       "┃ value                count ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n",
+       "│ Financial stability  20    │\n",
+       "└─────────────────────┴───────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mvalue \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcount\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mFinancial stability\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m20 \u001b[0m\u001b[2m \u001b[0m│\n", + "└─────────────────────┴───────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results.select(\"income_pref\").tally().print(format=\"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "bb3b1662-9a7b-4d77-93ab-7968ee1071d1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Calculating token probabilities\n", + "In the above example we specified ***n = 20*** to run the question (with the agent and model) 20 times.\n", + "\n", + "We can also get the probabilities from the model by passing ***logprobs = True*** to the `Model`.\n", + "\n", + "To simplify the token probabilities calculation, we can also specify ***use_code = True*** in the `Question` parameters. \n", + "This will cause the question to be presented to the model with coded options: 0 for \"Financial stability\" and 1 for \"Moving up the income ladder\":" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "849bd67f-52a0-4b3c-9e3b-0f86cfd676a4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "m = Model(\"gpt-4o\", temperature = 1, logprobs = True)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "d01b21e4-5e5b-4e9e-b72b-d6df1d90ac98", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "q = QuestionMultipleChoice(\n", + " question_name = \"income_pref\", \n", + " question_text = \"Which of the following is more important to you: \", \n", + " question_options = [\"Financial stability\", \"Moving up the income ladder\"], \n", + " use_code = True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "760adcd3-df98-4ab9-a3c3-196148a49763", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "new_results = q.by(a).by(m).run(n = 20)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "dc213bbb-f962-43d7-bfce-b8dc759308fe", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n",
+       "┃ value                count ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n",
+       "│ Financial stability  20    │\n",
+       "└─────────────────────┴───────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mvalue \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcount\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mFinancial stability\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m20 \u001b[0m\u001b[2m \u001b[0m│\n", + "└─────────────────────┴───────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "new_results.select(\"income_pref\").tally().print(format = \"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "b04e14f5-567e-4f73-8f07-587e28ae39fb", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Inspecting results\n", + "The `Results` include information about all the inputs and outputs relating to the question and response. \n", + "\n", + "To see a list of all the components that can be accessed and analyzed: " + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "64369291-3993-44f3-8011-6e7e3a039dd1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['agent.agent_instruction',\n", + " 'agent.agent_name',\n", + " 'agent.persona',\n", + " 'answer.income_pref',\n", + " 'comment.income_pref_comment',\n", + " 'generated_tokens.income_pref_generated_tokens',\n", + " 'iteration.iteration',\n", + " 'model.frequency_penalty',\n", + " 'model.logprobs',\n", + " 'model.max_tokens',\n", + " 'model.model',\n", + " 'model.presence_penalty',\n", + " 'model.temperature',\n", + " 'model.top_logprobs',\n", + " 'model.top_p',\n", + " 'prompt.income_pref_system_prompt',\n", + " 'prompt.income_pref_user_prompt',\n", + " 'question_options.income_pref_question_options',\n", + " 'question_text.income_pref_question_text',\n", + " 'question_type.income_pref_question_type',\n", + " 'raw_model_response.income_pref_cost',\n", + " 'raw_model_response.income_pref_one_usd_buys',\n", + " 'raw_model_response.income_pref_raw_model_response']" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.columns" + ] + }, + { + "cell_type": "markdown", + "id": "f098538f-d02b-4335-bcef-d26e7c6a57a6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "We can inspect the `raw_model_response.income_pref_raw_model_response` component to identify next token probabilities:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "7f24658d-25bc-47e3-9348-9415788e6d3d", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "example = new_results.select(\"raw_model_response.income_pref_raw_model_response\").to_list()[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "7aeda54b-3569-4ee0-9689-f1df41dd0559", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'token': '0', 'bytes': [48], 'logprob': -0.00018506382},\n", + " {'token': '1', 'bytes': [49], 'logprob': -8.750185},\n", + " {'token': '\\n', 'bytes': [10], 'logprob': -11.625185}]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next_token_probs = 
example['choices'][0]['logprobs']['content'][0]['top_logprobs']\n", + "next_token_probs" + ] + }, + { + "cell_type": "markdown", + "id": "3b056519-b81b-4482-a844-a274a676ab0c", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Translating the information" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "cf84aecd-60e7-4b9b-88fe-fb9ecfedd592", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Probability of selecting 'Financial stability' was 1.000\n", + "Probability of selecting 'Moving up the income ladder' was 0.000\n", + "Probability of selecting 'Skipped' was 0.000\n" + ] + } + ], + "source": [ + "import math\n", + "\n", + "# Specifying the codes for the answer options and non-responses:\n", + "options = {'0': \"Financial stability\", '1':\"Moving up the income ladder\", '\\n': \"Skipped\"}\n", + "\n", + "for token_info in next_token_probs:\n", + " option = options[token_info['token']]\n", + " p = math.exp(token_info['logprob'])\n", + " \n", + " print(f\"Probability of selecting '{option}' was {p:.3f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "bacfc8d7-0262-4e98-93b2-c1de1077d70a", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Comparing models\n", + "We can rerun the survey with other available models.\n", + "\n", + "To see a list of all available models:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1f502108-fb71-4de3-94d7-f60aa7fa15a4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Model.available()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "0ae756f8-c55f-45c3-b959-acf23e568812", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "models = [Model(model_name) for model_name, _, _ in Model.available()]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "94291287-c56e-4a70-91e4-4bc2ff9e6cf6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "153" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(models)" + ] + }, + { + "cell_type": "markdown", + "id": "bc840f66-bac0-405b-a93d-99d876a32e5f", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "We know some models will not be appropriate; we can add `print_exceptions = False` to skip the error report:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "6738fade-55fb-43f3-b4e3-abfdd0ae8f0b", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "results_with_many_models = q.by(a).by(models).run(print_exceptions = False)" + ] + }, + { + "cell_type": "markdown", + "id": "a5950c39-7d68-4e4e-8b1f-5a05a9e90e8b", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Performance\n", + "We can check which models did/not answer the question, and filter out the non-responses:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": 
"258af61a-8d1d-472a-b84b-cfdc8b13eb87", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n",
+       "┃ value                       ┃ count ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n",
+       "│ Financial stability         │ 86    │\n",
+       "├─────────────────────────────┼───────┤\n",
+       "│ Moving up the income ladder │ 8     │\n",
+       "└─────────────────────────────┴───────┘\n",
+       "
\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mvalue \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcount\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mFinancial stability \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m86 \u001b[0m\u001b[2m \u001b[0m│\n", + "├─────────────────────────────┼───────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mMoving up the income ladder\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m8 \u001b[0m\u001b[2m \u001b[0m│\n", + "└─────────────────────────────┴───────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "(\n", + " results_with_many_models\n", + " .filter('income_pref is not None')\n", + " .select('income_pref')\n", + " .tally()\n", + " .print(format = \"rich\")\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "d3691406-e3d4-472a-ac71-2fa3bfa58421", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", 
+ " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "
model.model
01-ai/Yi-34B-Chat
Austism/chronos-hermes-13b-v2
Gryphe/MythoMax-L2-13b
Gryphe/MythoMax-L2-13b-turbo
HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
Phind/Phind-CodeLlama-34B-v2
Qwen/Qwen2-72B-Instruct
Qwen/Qwen2-7B-Instruct
Qwen/Qwen2.5-72B-Instruct
Sao10K/L3-70B-Euryale-v2.1
Sao10K/L3.1-70B-Euryale-v2.2
bigcode/starcoder2-15b
bigcode/starcoder2-15b-instruct-v0.1
chatgpt-4o-latest
claude-3-5-sonnet-20240620
claude-3-haiku-20240307
claude-3-opus-20240229
claude-3-sonnet-20240229
codellama/CodeLlama-34b-Instruct-hf
codellama/CodeLlama-70b-Instruct-hf
codestral-2405
codestral-latest
codestral-mamba-2407
cognitivecomputations/dolphin-2.6-mixtral-8x7b
cognitivecomputations/dolphin-2.9.1-llama-3-70b
databricks/dbrx-instruct
deepinfra/airoboros-70b
gemini-1.0-pro
gemini-1.5-flash
gemini-1.5-pro
gemini-pro
gemma-7b-it
gemma2-9b-it
google/codegemma-7b-it
google/gemma-1.1-7b-it
google/gemma-2-27b-it
google/gemma-2-9b-it
gpt-3.5-turbo-0125
gpt-3.5-turbo-16k
gpt-4
gpt-4-0125-preview
gpt-4-0613
gpt-4-1106-preview
gpt-4-turbo
gpt-4-turbo-2024-04-09
gpt-4-turbo-preview
gpt-4o
gpt-4o-2024-05-13
gpt-4o-2024-08-06
gpt-4o-mini
gpt-4o-mini-2024-07-18
lizpreciatior/lzlv_70b_fp16_hf
llama-3.1-70b-versatile
llama-3.1-8b-instant
llama3-70b-8192
llama3-8b-8192
llama3-groq-70b-8192-tool-use-preview
llama3-groq-8b-8192-tool-use-preview
mattshumer/Reflection-Llama-3.1-70B
meta-llama/Llama-2-13b-chat-hf
meta-llama/Llama-2-70b-chat-hf
meta-llama/Llama-2-7b-chat-hf
meta-llama/Meta-Llama-3-70B-Instruct
meta-llama/Meta-Llama-3-8B-Instruct
meta-llama/Meta-Llama-3.1-405B-Instruct
meta-llama/Meta-Llama-3.1-70B-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct
microsoft/Phi-3-medium-4k-instruct
mistral-large-2407
mistral-large-latest
mistral-medium
mistral-medium-2312
mistral-medium-latest
mistral-small-2402
mistral-small-2409
mistral-small-latest
mistral-tiny
mistral-tiny-2312
mistral-tiny-2407
mistral-tiny-latest
mistralai/Mistral-Nemo-Instruct-2407
mistralai/Mixtral-8x22B-v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1
nvidia/Nemotron-4-340B-Instruct
open-mistral-7b
open-mistral-nemo
open-mistral-nemo-2407
open-mixtral-8x22b
open-mixtral-8x22b-2404
openchat/openchat-3.6-8b
pixtral
pixtral-12b
pixtral-12b-2409
pixtral-latest
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results_with_many_models.filter(\"income_pref is not None\").select(\"model\").print()" + ] + }, + { + "cell_type": "markdown", + "id": "30800f33-9da6-46dd-9340-842f7ff22f2d", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Posting to the Coop\n", + "The [Coop](https://www.expectedparrot.com/explore) is a platform for creating, storing and sharing LLM-based research.\n", + "It is fully integrated with EDSL and accessible from your workspace or Coop account page.\n", + "Learn more about [creating an account](https://www.expectedparrot.com/login) and [using the Coop](https://docs.expectedparrot.com/en/latest/coop.html).\n", + "\n", + "Here we demonstrate how to post this notebook:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "e122ab38-2a3e-482c-8848-35ae00ca0502", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [], + "source": [ + "from edsl import Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "fbfdeec6-a71a-4131-a72c-0afe721328c3", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [], + "source": [ + "n = Notebook(path = \"next_token_probs.ipynb\")" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "eea788b6-699b-4197-94cd-b5766f92dea4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'description': 'Example code for calculating next token probabilities',\n", + " 'object_type': 'notebook',\n", + " 'url': 'https://www.expectedparrot.com/content/8be8de45-006c-484a-b677-8e3bb25f8ff7',\n", + " 'uuid': '8be8de45-006c-484a-b677-8e3bb25f8ff7',\n", + " 'version': '0.1.33.dev1',\n", + " 'visibility': 'public'}" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n.push(description = \"Example code for calculating next token probabilities\", visibility = \"public\")" + ] + }, + { + "cell_type": "markdown", + "id": "12902cd7-9137-45c2-8e83-ec71cea7d5b9", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "To update an object at the Coop:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "eb70666f-ad60-40a5-abf2-da9d75c2a13e", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [], + "source": [ + "n = Notebook(path = \"next_token_probs.ipynb\") # resave" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "6c7997a7-7d96-4de5-927d-6442923a7384", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'status': 'success'}" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n.patch(uuid = \"8be8de45-006c-484a-b677-8e3bb25f8ff7\", value = n)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/scenarios.rst b/docs/scenarios.rst index d70e35f5..b755e03c 100644 --- a/docs/scenarios.rst +++ b/docs/scenarios.rst @@ -10,7 +10,7 @@ Purpose ------- Scenarios allow you create variations and versions of questions efficiently. -For example, we could create a question `"What is your favorite {{ item }}?"` and use scenarios to replace the parameter `item` with `color` or `food` or other items. +For example, we could create a question `"How much do you enjoy {{ activity }}?"` and use scenarios to replace the parameter `activity` with `running` or `reading` or other activities. When we add the scenarios to the question, the question will be asked multiple times, once for each scenario, with the parameter replaced by the value in the scenario. This allows us to straightforwardly administer multiple versions of the question together in a survey, either asynchronously (by default) or according to :ref:`surveys` rules that we can specify (e.g., skip/stop logic). @@ -31,11 +31,12 @@ To use a scenario, we start by creating a question that takes a parameter in dou .. code-block:: python - from edsl import QuestionFreeText + from edsl import QuestionMultipleChoice - q = QuestionFreeText( - question_name = "favorite_item", - question_text = "What is your favorite {{ item }}?", + q = QuestionMultipleChoice( + question_name = "enjoy", + question_text = "How much do you enjoy {{ activity }}?", + question_options = ["Not at all", "Somewhat", "Very much"] ) @@ -45,7 +46,7 @@ Next we create a dictionary for a value that will replace the parameter and stor from edsl import Scenario - scenario = Scenario({"item": "color"}) + scenario = Scenario({"activity": "running"}) We can inspect the scenario and see that it consists of the key/value pair that we created: @@ -60,7 +61,7 @@ This will return: .. code-block:: python { - "item": "color" + "activity": "running" } @@ -71,7 +72,7 @@ If multiple values will be used, we can create a list of `Scenario` objects: .. code-block:: python - scenarios = [Scenario({"item": item}) for item in ["color", "weekday"]] + scenarios = [Scenario({"activity": a}) for a in ["running", "reading"]] We can inspect the scenarios: @@ -85,7 +86,7 @@ This will return: .. code-block:: python - [Scenario({'item': 'color'}), Scenario({'item': 'weekday'})] + [Scenario({'activity': 'running'}), Scenario({'activity': 'reading'})] We can also create a `ScenarioList` object to store multiple scenarios: @@ -94,7 +95,7 @@ We can also create a `ScenarioList` object to store multiple scenarios: from edsl import ScenarioList - scenariolist = ScenarioList([Scenario({"item": item}) for item in ["color", "weekday"]]) + scenariolist = ScenarioList([Scenario({"activity": a}) for a in ["running", "reading"]]) We can inspect it: @@ -111,10 +112,10 @@ This will return: { "scenarios": [ { - "item": "color" + "activity": "running" }, { - "item": "weekday" + "activity": "reading" } ] } @@ -129,55 +130,64 @@ We use the `by()` method to add a scenario to a question when running it: .. 
code-block:: python - from edsl import QuestionFreeText, Scenario + from edsl import QuestionMultipleChoice, Scenario, Agent - q = QuestionFreeText( - question_name = "favorite_item", - question_text = "What is your favorite {{ item }}?", + q = QuestionMultipleChoice( + question_name = "enjoy", + question_text = "How much do you enjoy {{ activity }}?", + question_options = ["Not at all", "Somewhat", "Very much"] ) - scenario = Scenario({"item": "color"}) + s = Scenario({"activity": "running"}) - results = q.by(scenario).run() + a = Agent(traits = {"persona":"You are a human."}) + + results = q.by(s).by(a).run() We can check the results to verify that the scenario has been used correctly: .. code-block:: python - results.select("item", "favorite_item").print(format="rich") + results.select("activity", "enjoy").print(format="rich") This will print a table of the selected components of the results: .. code-block:: text - ┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ scenario ┃ answer ┃ - ┃ .item ┃ .favorite_item ┃ - ┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ - │ color │ My favorite color is blue. │ - └──────────┴────────────────────────────┘ + ┏━━━━━━━━━━━┳━━━━━━━━━━┓ + ┃ scenario ┃ answer ┃ + ┃ .activity ┃ .enjoy ┃ + ┡━━━━━━━━━━━╇━━━━━━━━━━┩ + │ running │ Somewhat │ + └───────────┴──────────┘ + +Looping +------- -We use the `loop()` method To add a scenario to a question when constructing it, passing a `ScenarioList`. -This will create a list containing a new question for each scenario that was passed. +We use the `loop()` method to add a scenario to a question when constructing it, passing it a `ScenarioList`. +This creates a list containing a new question for each scenario that was passed. Note that we can optionally include the scenario key in the question name as well; otherwise a unique identifies is automatically added to each question name. +For example: + .. code-block:: python - from edsl import QuestionFreeText, ScenarioList + from edsl import QuestionMultipleChoice, ScenarioList, Scenario - q = QuestionFreeText( - question_name = "favorite_{{ item }}", - question_text = "What is your favorite {{ item }}?", + q = QuestionMultipleChoice( + question_name = "enjoy_{{ activity }}", + question_text = "How much do you enjoy {{ activity }}?", + question_options = ["Not at all", "Somewhat", "Very much"] ) - scenariolist = ScenarioList( - Scenario({"item": item}) for item in ["color", "weekday"] + sl = ScenarioList( + Scenario({"activity": a}) for a in ["running", "reading"] ) - questions = q.loop(scenariolist) + questions = q.loop(sl) We can inspect the questions to see that they have been created correctly: @@ -191,31 +201,35 @@ This will return: .. code-block:: python - [Question('free_text', question_name = """favorite_color""", question_text = """What is your favorite color?"""), - Question('free_text', question_name = """favorite_weekday""", question_text = """What is your favorite weekday?""")] + [Question('multiple_choice', question_name = """enjoy_running""", question_text = """How much do you enjoy running?""", question_options = ['Not at all', 'Somewhat', 'Very much']), + Question('multiple_choice', question_name = """enjoy_reading""", question_text = """How much do you enjoy reading?""", question_options = ['Not at all', 'Somewhat', 'Very much'])] We can pass the questions to a survey and run it: .. 
code-block:: python - results = Survey(questions = questions).run() + from edsl import Survey, Agent + + survey = Survey(questions = questions) + + a = Agent(traits = {"persona": "You are a human."}) + + results = survey.by(a).run() results.select("answer.*").print(format="rich") -This will print a table of the response for each question: +This will print a table of the response for each question (note that "activity" is no longer in a separate scenario field): .. code-block:: text - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ answer ┃ answer ┃ - ┃ .favorite_color ┃ .favorite_weekday ┃ - ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ - │ My favorite color is blue. │ My favorite weekday is Friday because it marks the end of the workweek and the │ - │ │ beginning of the weekend, offering a sense of relief and anticipation for leisure │ - │ │ time. │ - └────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────┘ + ┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ + ┃ answer ┃ answer ┃ + ┃ .enjoy_reading ┃ .enjoy_running ┃ + ┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ + │ Very much │ Somewhat │ + └────────────────┴────────────────┘ Multiple parameters @@ -416,7 +430,7 @@ Say we have some results from a survey where we asked agents to choose a random question_text = "Choose a random number between 1 and 1000." ) - agents = [Agent({"persona":p}) for p in ["Dog catcher", "Magician", "Spy"]] + agents = [Agent({"persona":p}) for p in ["Child", "Magician", "Olympic breakdancer"]] results = q_random.by(agents).run() results.select("persona", "random").print(format="rich") @@ -426,23 +440,23 @@ Our results are: .. code-block:: text - ┏━━━━━━━━━━━━━┳━━━━━━━━━┓ - ┃ agent ┃ answer ┃ - ┃ .persona ┃ .random ┃ - ┡━━━━━━━━━━━━━╇━━━━━━━━━┩ - │ Dog catcher │ 472 │ - ├─────────────┼─────────┤ - │ Magician │ 537 │ - ├─────────────┼─────────┤ - │ Spy │ 528 │ - └─────────────┴─────────┘ + ┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓ + ┃ agent ┃ answer ┃ + ┃ .persona ┃ .random ┃ + ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩ + │ Child │ 7 │ + ├─────────────────────┼─────────┤ + │ Magician │ 472 │ + ├─────────────────────┼─────────┤ + │ Olympic breakdancer │ 529 │ + └─────────────────────┴─────────┘ We can use the `to_scenario_list()` method turn components of the results into a list of scenarios to use in a new survey: .. code-block:: python - scenarios = results.to_scenario_list() + scenarios = results.select("persona", "random").to_scenario_list() # excluding other columns of the results scenarios @@ -451,9 +465,22 @@ We can inspect the scenarios to see that they have been created correctly: .. code-block:: text - [Scenario({'persona': 'Dog catcher', 'random': 472}), - Scenario({'persona': 'Magician', 'random': 537}), - Scenario({'persona': 'Spy', 'random': 528})] + { + "scenarios": [ + { + "persona": "Child", + "random": 7 + }, + { + "persona": "Magician", + "random": 472 + }, + { + "persona": "Olympic breakdancer", + "random": 529 + } + ] + } PDFs as textual scenarios @@ -500,43 +527,91 @@ See a demo notebook of this method in the notebooks section of the docs index: " Image scenarios ^^^^^^^^^^^^^^^ -The `Scenario` method `from_image('path/to/image_file')` turns a PNG into into a scenario to be used with an image model (e.g., GPT-4o). -The scenario has the following keys: `file_path`, `encoded_image`. 
- -Note that we do *not* need to use a placeholder `{{ text }}` in the question text in order to add the scenario to the question. -Instead, we simply write the question with no parameters and add the scenario to the survey when running it as usual. +The `Scenario` method `from_image('.png')` converts a PNG into into a scenario that can be used with an image model (e.g., `gpt-4o`). +This method generates a scenario with a single key - `` - that can be used in a question text the same as scenarios from other data sources. Example usage: .. code-block:: python - from edsl import QuestionFreeText, QuestionList, Scenario, Survey, Model + from edsl import Scenario + + s = Scenario.from_image("logo.png") # Replace with your own local file + + +Here we use the example scenario, which is the Expected Parrot logo: + +.. code-block:: python + + from edsl import Scenario + + s = Scenario.example(has_image = True) + + +We can verify the scenario key (the filepath for the image from which the scenario was generated): + +.. code-block:: python + + s.keys() + + +Output: + +.. code-block:: text + + ['logo'] + + +We can add the key to questions as we do scenarios from other data sources: + +.. code-block:: python + + from edsl import Model, QuestionFreeText, QuestionList, Survey - m = Model("gpt-4o") # Need to use a vision model for image scenarios + m = Model("gpt-4o") # This is the default model; we specify it for demonstration purposes to highlight that a vision model is needed q1 = QuestionFreeText( - question_name = "show", - question_text = "What does this image show?", + question_name = "identify", + question_text = "What animal is in this picture: {{ logo }}" # The scenario key is the filepath ) q2 = QuestionList( - question_name = "count", - question_text = "How many things are in this image?", + question_name = "colors", + question_text = "What colors do you see in this picture: {{ logo }}" ) survey = Survey([q1, q2]) - scenario = Scenario.from_image("path/to/image_file") + results = survey.by(s).run() + + results.select("logo", "identify", "colors").print(format="rich") + + +Output using the Expected Parrot logo: + +.. code-block:: text + + ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + ┃ answer ┃ answer ┃ + ┃ .identify ┃ .colors ┃ + ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ + │ The image shows a large letter "E" followed by a pair of │ ['gray', 'green', 'orange', 'pink', 'blue', 'black'] │ + │ square brackets containing an illustration of a parrot. │ │ + │ The parrot is green with a yellow beak and some red and │ │ + │ blue coloring on its body. This combination suggests the │ │ + │ mathematical notation for the expected value, often │ │ + │ denoted as "E" followed by a random variable in │ │ + │ brackets, commonly used in probability and statistics. │ │ + └──────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────┘ - results = survey.by(scenario).run() - results.select("file_path", "answer.*").print(format="rich") +See an example of this method in the notebooks section of the docs index: `Using images in a survey `_. Creating a scenario list from a list ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The `ScenarioList` method `from_list()` can be used to create a list of scenarios for a specified key and list of values that is passed. 
+The `ScenarioList` method `from_list()` creates a list of scenarios for a specified key and list of values that is passed to it. Example usage: @@ -571,15 +646,45 @@ This will return: Creating a scenario list from a dictionary ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The `ScenarioList` method `from_dict()` can be used to create a list of scenarios for a specified key and dictionary of values that is passed. +The `Scenario` method `from_dict()` creates a scenario for a dictionary that is passed to it. + +The `ScenarioList` method `from_nested_dict()` creates a list of scenarios for a specified key and nested dictionary. Example usage: .. code-block:: python + # Example dictionary + d = {"item": ["color", "food", "animal"]} + + + from edsl import Scenario + + scenario = Scenario.from_dict(d) + + scenario + + +This will return a single scenario for the list of items in the dict: + +.. code-block:: text + + { + "item": [ + "color", + "food", + "animal" + ] + } + + +If we instead want to create a scenario for each item in the list individually: + +.. code-block:: python + from edsl import ScenarioList - scenariolist = ScenarioList.from_dict({"item": ["color", "food", "animal"]}) + scenariolist = ScenarioList.from_nested_dict(d) scenariolist @@ -614,7 +719,7 @@ Example usage: from edsl import ScenarioList - scenarios = ScenarioList.from_wikipedia("https://en.wikipedia.org/wiki/1990s_in_film", 0) + scenarios = ScenarioList.from_wikipedia("https://en.wikipedia.org/wiki/1990s_in_film", 3) scenarios.print(format="rich") @@ -761,10 +866,11 @@ The scenarios can be used to ask questions about the data in the table: results = q_leads.by(scenarios).run() - (results - .sort_by("Title") - .select("Title", "leads") - .print(format="rich") + ( + results + .sort_by("Title") + .select("Title", "leads") + .print(format="rich") ) @@ -906,13 +1012,10 @@ Output: Creating a scenario list from a CSV ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The `ScenarioList` method `from_csv('path/to/csv')` can be used to create a list of scenarios from a CSV file. +The `ScenarioList` method `from_csv('.csv')` creates a list of scenarios from a CSV file. The method reads the CSV file and creates a scenario for each row in the file, with the keys as the column names and the values as the row values. - -Example usage: - -Say we have a CSV file with the following data: +For example, say we have a CSV file containing the following data: .. code-block:: text @@ -929,12 +1032,12 @@ We can create a list of scenarios from the CSV file: from edsl import ScenarioList - scenariolist = ScenarioList.from_csv("path/to/csv_file.csv") + scenariolist = ScenarioList.from_csv(".csv") scenariolist -This will return a list consisting of a scenario for each row with the keys as the column names and the values as the row values: +This will return a scenario for each row: .. code-block:: text @@ -987,7 +1090,7 @@ We can create a list of scenarios from the CSV file: from edsl import ScenarioList - scenariolist = ScenarioList.from_csv("path/to/csv_file.csv") + scenariolist = ScenarioList.from_csv(".csv") scenariolist = scenariolist.give_valid_names() @@ -1076,21 +1179,20 @@ Methods for un/pivoting and grouping scenarios There are a variety of methods for modifying scenarios and scenario lists. + Unpivoting a scenario list ^^^^^^^^^^^^^^^^^^^^^^^^^^ The `ScenarioList` method `unpivot()` can be used to unpivot a scenario list based on one or more specified identifiers. 
It takes a list of `id_vars` which are the names of the key/value pairs to keep in each new scenario, and a list of `value_vars` which are the names of the key/value pairs to unpivot. -Example usage: - -Say we have a scenario list for the above CSV file: +For example, say we have a scenario list for the above CSV file: .. code-block:: python from edsl import ScenarioList - scenariolist = ScenarioList.from_csv("path/to/csv_file.csv") + scenariolist = ScenarioList.from_csv(".csv") scenariolist @@ -1268,19 +1370,18 @@ This will return a list of scenarios with the `a` and `b` key/value pairs groupe Data labeling tasks ------------------- -Scenarios are particularly useful for conducting data labeling or data coding tasks, where we can design the task as a question or series of questions about each piece of data in our dataset. +Scenarios are particularly useful for conducting data labeling or data coding tasks, where the task can be designed as a survey of questions about each piece of data in a dataset. + For example, say we have a dataset of text messages that we want to sort by topic. -We could perform this task by running multiple choice questions such as `"What is the primary topic of this message: {{ message }}?"` or `"Does this message mention a safety issue? {{ message }}"` where each text message is inserted in the `message` placeholder of the question text. +We can perform this task by using a language model to answer questions such as `"What is the primary topic of this message: {{ message }}?"` or `"Does this message mention a safety issue? {{ message }}"`, where each text message is inserted in the `message` placeholder of the question text. -The following code demonstrates how to use scenarios to conduct this task. -For more step-by-step details, please see the next section below: `Constructing a Scenario`. +Here we use scenarios to conduct the task: .. code-block:: python - from edsl.questions import QuestionMultipleChoice - from edsl import Survey, Scenario + from edsl import QuestionMultipleChoice, Survey, Scenario - # Create a question with a parameter + # Create a question with that takes a parameter q1 = QuestionMultipleChoice( question_name = "topic", question_text = "What is the topic of this message: {{ message }}?", @@ -1313,7 +1414,7 @@ We can then analyze the results to see how the agent answered the questions for .. code-block:: python - results.select("message", "topic", "safety").print(format="rich") + results.select("message", "safety", "topic").print(format="rich") This will print a table of the scenarios and the answers to the questions for each scenario: @@ -1326,11 +1427,11 @@ This will print a table of the scenarios and the answers to the questions for ea ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │ I can't log in... │ No │ Login issue │ ├───────────────────────────────┼─────────┼─────────────────┤ + │ I need help with a product... │ No │ Product support │ + ├───────────────────────────────┼─────────┼─────────────────┤ │ I need help with my bill... │ No │ Billing │ ├───────────────────────────────┼─────────┼─────────────────┤ │ I have a safety concern... │ Yes │ Safety │ - ├───────────────────────────────┼─────────┼─────────────────┤ - │ I need help with a product... │ Unclear │ Product support │ └───────────────────────────────┴─────────┴─────────────────┘ @@ -1344,8 +1445,7 @@ Note that the question texts are unchanged: .. 
code-block:: python - from edsl.questions import QuestionMultipleChoice - from edsl import Survey, Scenario + from edsl import QuestionMultipleChoice, Survey, ScenarioList, Scenario # Create a question with a parameter q1 = QuestionMultipleChoice( @@ -1366,11 +1466,11 @@ Note that the question texts are unchanged: {"message": "I need help with my bill...", "user": "Bob", "source": "Phone", "date": "2022-01-02"}, {"message": "I have a safety concern...", "user": "Charlie", "source": "Email", "date": "2022-01-03"}, {"message": "I need help with a product...", "user": "David", "source": "Chat", "date": "2022-01-04"} - ] - scenarios = [Scenario({"message": msg["message"], - "user": msg["user"], - "source": msg["source"], - "date": msg["date"]}) for msg in user_messages] + ] + + scenarios = ScenarioList( + Scenario.from_dict(m) for m in user_messages + ) # Create a survey with the question survey = Survey(questions = [q1, q2]) @@ -1378,23 +1478,26 @@ Note that the question texts are unchanged: # Run the survey with the scenarios results = survey.by(scenarios).run() + # Inspect the results + results.select("scenario.*", "answer.*").print(format="rich") + -We can then analyze the results to see how the agent answered the questions for each scenario, including the metadata: +We can see how the agent answered the questions for each scenario, together with the metadata that was not included in the question text: .. code-block:: text - ┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ - ┃ scenario ┃ scenario ┃ scenario ┃ scenario ┃ answer ┃ answer ┃ - ┃ .user ┃ .source ┃ .date ┃ .message ┃ .safety ┃ .topic ┃ - ┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ - │ Alice │ Customer support │ 2022-01-01 │ I can't log in... │ No │ Login issue │ - ├──────────┼──────────────────┼────────────┼───────────────────────────────┼─────────┼─────────────────┤ - │ Bob │ Phone │ 2022-01-02 │ I need help with my bill... │ No │ Billing │ - ├──────────┼──────────────────┼────────────┼───────────────────────────────┼─────────┼─────────────────┤ - │ Charlie │ Email │ 2022-01-03 │ I have a safety concern... │ Yes │ Safety │ - ├──────────┼──────────────────┼────────────┼───────────────────────────────┼─────────┼─────────────────┤ - │ David │ Chat │ 2022-01-04 │ I need help with a product... │ Unclear │ Product support │ - └──────────┴──────────────────┴────────────┴───────────────────────────────┴─────────┴─────────────────┘ + ┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓ + ┃ scenario ┃ scenario ┃ scenario ┃ scenario ┃ answer ┃ answer ┃ + ┃ .user ┃ .source ┃ .message ┃ .date ┃ .topic ┃ .safety ┃ + ┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩ + │ Alice │ Customer support │ I can't log in... │ 2022-01-01 │ Login issue │ No │ + ├──────────┼──────────────────┼───────────────────────────────┼────────────┼─────────────────┼─────────┤ + │ Bob │ Phone │ I need help with my bill... │ 2022-01-02 │ Billing │ No │ + ├──────────┼──────────────────┼───────────────────────────────┼────────────┼─────────────────┼─────────┤ + │ Charlie │ Email │ I have a safety concern... │ 2022-01-03 │ Safety │ Yes │ + ├──────────┼──────────────────┼───────────────────────────────┼────────────┼─────────────────┼─────────┤ + │ David │ Chat │ I need help with a product... 
│ 2022-01-04 │ Product support │ No │ + └──────────┴──────────────────┴───────────────────────────────┴────────────┴─────────────────┴─────────┘ To learn more about accessing, analyzing and visualizing survey results, please see the :ref:`results` section. @@ -1418,10 +1521,11 @@ Example usage: text_scenario = Scenario({"my_text": my_haiku}) - word_chunks_scenariolist = text_scenario.chunk("my_text", - num_words = 5, # use num_words or num_lines but not both - include_original = True, # optional - hash_original = True # optional + word_chunks_scenariolist = text_scenario.chunk( + "my_text", + num_words = 5, # use num_words or num_lines but not both + include_original = True, # optional + hash_original = True # optional ) word_chunks_scenariolist diff --git a/docs/token_usage.rst b/docs/token_usage.rst new file mode 100644 index 00000000..e0fefeb0 --- /dev/null +++ b/docs/token_usage.rst @@ -0,0 +1,180 @@ +.. _token_usage: + +Token usage +=========== + +EDSL comes with a variety of features for monitoring token usage. +These include: + +* A method for setting the requests per minute (RPM) and tokens per minute (TPM) for a model that you are using. +* Methods for turning off default prompt features to reduce token usage. +* Features for calculating next token probabilities. + + +Token limits +------------ + +Token limits refer to the maximum number of tokens that a language model can process in a single input prompt or output generation. +A token limit affects how much text you can send to a model in one go. +A language model provider should provide information about the token limits for each model that is associated with your account and API key. +When running a big job in EDSL, you may encounter token limits, which can be managed by adjusting the token limits for a model. + + +RPM: Requests Per Minute +^^^^^^^^^^^^^^^^^^^^^^^^ +RPM stands for Requests Per Minute, which measures the number of API requests that a user can make to a language model within a minute. +This is a metric for managing the load and traffic that a model can handle. + + +TPM: Tokens Per Minute +^^^^^^^^^^^^^^^^^^^^^^ +TPM stands for Tokens Per Minute, which is a metric for tracking the volume of tokens processed by a language model within a minute. +This metric typically tracks usage for billing purposes. + + +Default token limits +-------------------- +Here we inspect the default language model and its parameters, including the token limits: + +.. code-block:: python + + from edsl import Model + + model = Model() + model + + +This will show the following information: + +.. code-block:: python + + { + "model": "gpt-4o", + "parameters": { + "temperature": 0.5, + "max_tokens": 1000, + "top_p": 1, + "frequency_penalty": 0, + "presence_penalty": 0, + "logprobs": false, + "top_logprobs": 3 + } + } + + +We can also inspect the RPM and TPM for the model: + +.. code-block:: python + + [model.RPM, model.TPM] + + +This will show the following information: + +.. code-block:: python + + [100, 480000.0] + + + +Modifying token limits +---------------------- + +We can reset the default RPM and TPM and then check the new values. +Note that the new RPM and TPM are automatically offset by 20% of the specified values to ensure that the model does not exceed the token limits: + +.. code-block:: python + + model.set_rate_limits(rpm=10, tpm=10) + + [model.RPM, model.TPM] + + +This will show the following information: + +.. code-block:: python + + [8.0, 8.0] + + +Here we change it again: + +.. 
code-block:: python + + model = Model() + + model.set_rate_limits(rpm=100, tpm=1000) + + [model.RPM, model.TPM] + + +This will again show the specified values have been reset with a 20% offset: + +.. code-block:: python + + [80.0, 800.0] + + +Please note that the token limits are subject to the constraints of the model and the API key associated with the model. +Let us know if you have any questions or need further assistance with token limits. + + +Methods for reducing token usage +-------------------------------- + +There are several ways to reduce the tokens required to run a question or survey. + + +Turning off question commments +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Each question type (other than `free_text`) automatically includes a `comment` field that gives the answering model a place to put additional information about its response to a question. +This serves as an outlet for a chatty model to return context about an answer without violating formatting instructions (e.g., a model may want to provide an explanation for a mutiple choice response but the answer to the question must only be one of the answer options). +Question comments can also be useful when used with survey "memory" rules, giving a model an opportunity to simulate a "chain of thought" across multiple survey questions. +(By default, questions are administered asynchronously; a model does not have context of other questions and answers in a survey unless memory rules are applied.) +Comments can also provide insight into non-responsive (`None`) answers: a model may use the comments field to describe a point of confusion about a question. + +Because the question `comment` field requires additional tokens, it can sometimes be cost-effective to exclude the field from question prompts. +This is done by passing a boolean parameter `include_comment = False` when constructing a question. +For example: + +.. code-block:: python + + from edsl import QuestionNumerical, ScenarioList + + q = QuestionNumerical( + question_name = "sum", + question_text = "What is the sum of {{ number_1 }} and {{ number_2 }}?", + include_comment = False + ) + + some_numbers = { + "number_1": [0,1,2,3,4], + "number_2": [5,4,3,2,1] + } + + s = ScenarioList.from_nested_dict(some_numbers) + + results = q.by(s).run() + + results.select("number_1", "number_2", "sum").print(format="rich") + + +Output: + +.. code-block:: text + + ┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┓ + ┃ scenario ┃ scenario ┃ answer ┃ + ┃ .number_1 ┃ .number_2 ┃ .sum ┃ + ┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━┩ + │ 0 │ 5 │ 5 │ + ├───────────┼───────────┼────────┤ + │ 1 │ 4 │ 5 │ + ├───────────┼───────────┼────────┤ + │ 2 │ 3 │ 5 │ + ├───────────┼───────────┼────────┤ + │ 3 │ 2 │ 5 │ + ├───────────┼───────────┼────────┤ + │ 4 │ 1 │ 5 │ + └───────────┴───────────┴────────┘ \ No newline at end of file
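+
+
+Calculating next token probabilities
+-------------------------------------
+
+The raw model response can also be used to calculate next token probabilities, as demonstrated in the `next_token_probs` notebook.
+The example below is a minimal sketch of that workflow; the question is illustrative, and it assumes the model is constructed with log probabilities enabled (`logprobs = True`, `top_logprobs = 3`) so that the inference provider returns them:
+
+.. code-block:: python
+
+    import math
+
+    from edsl import Model, QuestionMultipleChoice
+
+    # Assumed model settings: log probabilities must be enabled for them to be returned
+    m = Model("gpt-4o", logprobs = True, top_logprobs = 3)
+
+    # An illustrative multiple choice question
+    q = QuestionMultipleChoice(
+        question_name = "income_pref",
+        question_text = "Which of the following is more important to you?",
+        question_options = ["Financial stability", "Moving up the income ladder"]
+    )
+
+    results = q.by(m).run()
+
+    # The raw model response stores the provider payload, including the top token log probabilities
+    raw = results.select("raw_model_response.income_pref_raw_model_response").to_list()[0]
+    next_token_probs = raw["choices"][0]["logprobs"]["content"][0]["top_logprobs"]
+
+    # Map the answer codes back to the answer options ('\n' is treated as a skipped response)
+    options = {"0": "Financial stability", "1": "Moving up the income ladder", "\n": "Skipped"}
+
+    for token_info in next_token_probs:
+        option = options.get(token_info["token"], token_info["token"])
+        p = math.exp(token_info["logprob"])
+
+        print(f"Probability of selecting '{option}' was {p:.3f}")
+
+
+Note that the path into the raw response (`choices` / `logprobs` / `content` / `top_logprobs`) follows the OpenAI-style payload shown in the notebook; other inference providers may return a different structure.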