diff --git a/CHANGELOG.md b/CHANGELOG.md index b700a4cf..3f2e282d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -21,9 +21,9 @@ - `ScenarioList` method `give_valid_names()` allows you to automatically generate valid Pythonic identifiers for scenario keys. -- `ScenarioList` method `group_by()` allows you to group scenarios by specified identifies and apply a function to the values of the specified variables. +- `ScenarioList` method `group_by()` allows you to group scenarios by specified identities and apply a function to the values of the specified variables. -- `ScenarioList` method `from_wikipedia_table()` allows you to convert a Wikipedia table into a scenario list. Example usage: https://www.expectedparrot.com/content/247589dd-ad1e-45f4-9c82-e71dbeac8c96 (Notebook: *Using an LLM to Augment Existing Tabular Data*) +- `ScenarioList` method `from_wikipedia_table()` allows you to convert a Wikipedia table into a scenario list. Example usage: https://docs.expectedparrot.com/en/latest/notebooks/scenario_list_wikipedia.html - `ScenarioList` method `to_docx()` allows you to export scenario lists as structured Docx documents. @@ -35,7 +35,7 @@ - `Results` methods `generate_html` and `save_html` can be called to generate and save HTML code for displaying results. -- Ability to run a `Model` with a boolean parameter `raise_validation_errors = False` or `raise_validation_errors = True`. If False, exceptions will only be raised (interrupting survey execution) when the model returns nothing at all. +- Ability to run a `Model` with a boolean parameter `raise_validation_errors = False` or `raise_validation_errors = True`. If False, exceptions will only be raised (interrupting survey execution) when the model returns nothing at all. Another optional parameter `print_exceptions = False` can be passed to not print exceptions at all. ### Changed - Improvements to exceptions reports. 
diff --git a/docs/conf.py b/docs/conf.py index a75af2b2..a9479cb8 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -82,4 +82,6 @@ def setup(app): "github_user": "", "github_repo": "", "github_version": "", -} \ No newline at end of file +} + +nbsphinx_allow_errors = True \ No newline at end of file diff --git a/docs/exceptions.rst b/docs/exceptions.rst index 1ab41869..c7e8c252 100644 --- a/docs/exceptions.rst +++ b/docs/exceptions.rst @@ -4,67 +4,7 @@ Exceptions & Debugging ====================== An exception is an error that occurs during the execution of a question or survey. -When an exception is raised, EDSL will display a message about the error that includes a link to a report with more details. - -Example -------- - -Here's an example of a poorly written question that is likely to raise an exception: - -.. code-block:: python - - from edsl.questions import QuestionMultipleChoice - - q = QuestionMultipleChoice( - question_name = "bad_instruction", - question_text = "What is your favorite color?", - question_options = ["breakfast", "lunch", "dinner"] # Non-sensical options for the question - ) - - results = q.run() - - -The above code will likely raise a `QuestionAnswerValidationError` exception because the question options are not related to the question text. -Output: - -.. code-block:: text - - Attempt 1 failed with exception:Answer code must be a string, a bytes-like object or a real number (got Invalid). now waiting 1.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. - - - Attempt 2 failed with exception:Answer code must be a string, a bytes-like object or a real number (got The question asks for a favorite color, but the options provided are meal times, not colors. Therefore, I cannot select an option that accurately reflects a favorite color.). now waiting 2.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. 
- - - Attempt 3 failed with exception:Answer code must be a string, a bytes-like object or a real number (got The question does not match the provided options as they pertain to meals, not colors.). now waiting 4.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. - - - Attempt 4 failed with exception:Answer code must be a string, a bytes-like object or a real number (got This is an invalid question since colors are not listed as options. The options provided are meals, not colors.). now waiting 8.00 seconds before retrying.Parameters: start=1.0, max=60.0, max_attempts=5. - - - Exceptions were raised in 1 out of 1 interviews. - - Open report to see details. - - -Exceptions report ------------------ - -The exceptions report can be accessed by clicking on the link provided in the exceptions message. -It contains details on the exceptions that were raised: - -.. image:: /static/exceptions_message.png - :width: 800 - :align: center - - -Performance plot -^^^^^^^^^^^^^^^^ - -The report includes a Performance Plot with graphical details about the API calls that were made (started, failed, in progress, canceled, etc.; scroll to the end of the report to view it): - -.. image:: /static/exceptions_performance_plot.png - :width: 800 - :align: center +When an exception is raised, EDSL will display a message about the error and an interactive report with more details in a new browser tab. Help debugging @@ -77,16 +17,12 @@ You can use the following code to generate a link to your notebook: .. 
code-block:: python - from edsl import Coop, notebook - - coop = Coop() - - notebook = Notebook(path="path/to/your/notebook.ipynb") + from edsl import Notebook - coop.create(notebook, description="Notebook with code that raises an exception", visibility="private") + n = Notebook(path="path/to/your/notebook.ipynb") + n.push(description="Notebook with code that raises an exception", visibility="private") -A notebook showing the above example question and exception message is available at the Coop: https://www.expectedparrot.com/content/f6a19c77-3f57-4900-b0c9-436058a2ad27 Common exceptions @@ -113,14 +49,6 @@ The default settings (which can be modified) are as follows: MAX_QUESTION_LENGTH = 100000 -JSON errors -^^^^^^^^^^^ - -Some exceptions may indicate that the response from the language model is not properly formatted JSON. -This can be caused by a problem with the inference provider or the way that the question has been constructed (e.g., the model is not capable of following the question prompts as written). -A useful starting point for debugging these exceptions is to check the `Settings` class for the `Questions` model (see *Answer validation errors* above) and try variations in the question prompts and types (e.g., does `QuestionFreeText` produce an answer to the same question formatted as a different question type). - - Missing API key ^^^^^^^^^^^^^^^ diff --git a/docs/index.rst b/docs/index.rst index 5181a847..5958e431 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -66,7 +66,7 @@ Working with Results - :ref:`results`: Access built-in methods for analyzing and utilizing survey results as datasets. - :ref:`caching`: Learn about caching and sharing results. - :ref:`exceptions`: Identify and handle exceptions in your survey design. -- :ref:`token_limits`: Manage token limits for language models. +- :ref:`token_usage`: Manage token limits for language models, and monitor and reduce token usage as desired. 
Coop ---- @@ -147,7 +147,7 @@ Information about additional functionality for developers. results data exceptions - token_limits + token_usage .. toctree:: :maxdepth: 2 @@ -171,6 +171,7 @@ Information about additional functionality for developers. :caption: How-to Guides :hidden: + notebooks/edsl_intro.ipynb notebooks/data_labeling_example.ipynb notebooks/image_scenario_example.ipynb notebooks/question_loop_scenario.ipynb @@ -190,6 +191,7 @@ Information about additional functionality for developers. :caption: Notebooks :hidden: + notebooks/next_token_probs.ipynb notebooks/scenariolist_unpivot.ipynb notebooks/nps_survey.ipynb notebooks/agentifying_responses.ipynb diff --git a/docs/notebooks/edsl_intro.ipynb b/docs/notebooks/edsl_intro.ipynb new file mode 100644 index 00000000..4ff497ac --- /dev/null +++ b/docs/notebooks/edsl_intro.ipynb @@ -0,0 +1,1019 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e3d6e645-2d59-42f5-9b01-252abff36f4c", + "metadata": {}, + "source": [ + "# Intro to EDSL\n", + "This notebook provides example code for base components of [EDSL, an open-source library](https://github.com/expectedparrot/edsl) for simulating surveys, experiments and other research with AI agents and large language models. Details on the code below are provided in accompanying [slides: How to use EDSL](https://docs.google.com/presentation/d/10GxXhzu_TD09vN0gJhfne0Zum-GF5R-ppzTXb5IUKlU/edit?usp=sharing).\n", + "\n", + "## Technical setup\n", + "Before running the code below, please ensure that you have [installed the EDSL library](https://docs.expectedparrot.com/en/latest/installation.html) and either [activated remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) from your [Coop account](https://docs.expectedparrot.com/en/latest/coop.html) or [stored API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use with EDSL. 
\n", + "\n", + "## Documentation\n", + "Please also see our [documentation page](https://docs.expectedparrot.com/) for tips, tutorials and more demo notebooks on using EDSL." + ] + }, + { + "cell_type": "markdown", + "id": "943c4147-7ea8-4953-9c8c-07f3c12d4726", + "metadata": {}, + "source": [ + "## Simple example\n", + "We start by [selecting a question type](https://docs.expectedparrot.com/en/latest/questions.html) and constructing a question in the relevant template:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "6179718e-0add-4c41-b690-3eb81ce6e3ca", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionMultipleChoice\n", + "\n", + "q = QuestionMultipleChoice(\n", + " question_name = \"marvel_movies\",\n", + " question_text = \"Do you enjoy Marvel movies?\",\n", + " question_options = [\"Yes\", \"No\", \"I do not know\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f988b55f-1569-445a-87bc-0d1602b4ba14", + "metadata": {}, + "source": [ + "We administer a question by calling the `run()` method. \n", + "This generates a dataset of `Results` including the model's response to the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "04a6ce5d-d1a9-48d6-862f-a818c0e3486c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━┓\n", + "┃ answer ┃\n", + "┃ .marvel_movies ┃\n", + "┡━━━━━━━━━━━━━━━━┩\n", + "│ I do not know │\n", + "└────────────────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.marvel_movies\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mI do not know \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results = q.run()\n", + "\n", + "results.select(\"marvel_movies\").print(format=\"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "b85c5d99-06e3-4ce9-a266-3448c58fb77e", + "metadata": {}, + "source": [ + "## Designing AI agents\n", + "We can [create personas for agents](https://docs.expectedparrot.com/en/latest/agents.html) to answer the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "fb3bae2b-2aa0-4cdd-9acb-efe89bb409be", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import AgentList, Agent\n", + "\n", + "personas = [\"comic book collector\", \"movie critic\"]\n", + "\n", + "a = AgentList(\n", + " Agent(traits = {\"persona\": p}) for p in personas\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "aafdf590-0996-478c-b568-6dba2f45c3a3", + "metadata": {}, + "source": [ + "## Selecting language models\n", + "We can [select language models](https://docs.expectedparrot.com/en/latest/language_models.html) to generate the responses (in the example above we did not specify a model, so GPT 4 preview was used by default):" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "741e202a-7a90-4bc7-891d-11dbe489da9b", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import ModelList, Model\n", + "\n", + "models = [\"gpt-4o\", \"claude-3-5-sonnet-20240620\"]\n", + "\n", + "m = ModelList(\n", + " Model(m) for m 
in [\"gpt-4o\", \"claude-3-5-sonnet-20240620\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ae58cb3f-6088-4070-b85a-736dbca5cb31", + "metadata": {}, + "source": [ + "## Generating results\n", + "We add agents and models to a question when running it:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "a3ccf07e-b4f9-4b85-9618-4358e874c35c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓\n", + "┃ model ┃ agent ┃ answer ┃\n", + "┃ .model ┃ .persona ┃ .marvel_movies ┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩\n", + "│ gpt-4o │ comic book collector │ Yes │\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│ claude-3-5-sonnet-20240620 │ comic book collector │ Yes │\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│ gpt-4o │ movie critic │ Yes │\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│ claude-3-5-sonnet-20240620 │ movie critic │ Yes │\n", + "└────────────────────────────┴──────────────────────┴────────────────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35magent \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.persona \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.marvel_movies\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + 
"├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mmovie critic \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼──────────────────────┼────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mmovie critic \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴──────────────────────┴────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results = q.by(a).by(m).run()\n", + "\n", + "results.select(\"model\", \"persona\", \"marvel_movies\").print(format=\"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "c1817b6e-341b-493c-8156-7fe634e2bc61", + "metadata": {}, + "source": [ + "## Parameterizing questions\n", + "We can use `Scenario` objects to [add data or content to questions](https://docs.expectedparrot.com/en/latest/scenarios.html):" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "72825f4e-31fd-47f2-9bcc-d3e8b88a5196", + "metadata": {}, + "outputs": [], + "source": [ + "q1 = QuestionMultipleChoice(\n", + " question_name = \"politically_motivated\",\n", + " question_text = \"\"\"\n", + " Read the following movie review and determine whether it is politically motivated.\n", + " Movie: {{ title }}\n", + " Review: {{ review }}\n", + " \"\"\",\n", + " question_options = [\"Yes\", \"No\", \"I do not know\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "7feb7f9f-d48b-436f-8b93-b976973e1964", + "metadata": {}, + "source": [ + "EDSL comes with [methods for generating scenarios from many data sources](https://docs.expectedparrot.com/en/latest/scenarios.html), including PDFs, CSVs, docs, images, tables, lists, dicts:" + ] + }, + { + 
"cell_type": "code", + "execution_count": 7, + "id": "47d17b5f-4847-4f4b-82dd-b76045204961", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import Scenario\n", + "\n", + "example_review = {\n", + " \"year\": 2014,\n", + " \"title\": \"Captain America: The Winter Soldier\",\n", + " \"review\": \"\"\"\n", + " Part superhero flick, part 70s political thriller. \n", + " It's a bold mix that pays off, delivering a scathing \n", + " critique of surveillance states wrapped in spandex \n", + " and shield-throwing action. \n", + " \"\"\"\n", + "}\n", + "\n", + "s = Scenario.from_dict(example_review)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "3cf68fa7-f276-4008-9201-27f8269a651c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃ model ┃ scenario ┃ scenario ┃ answer ┃\n", + "┃ .model ┃ .year ┃ .title ┃ .politically_motivated ┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│ claude-3-5-sonnet-20240620 │ 2014 │ Captain America: The Winter Soldier │ No │\n", + "├────────────────────────────┼──────────┼─────────────────────────────────────┼────────────────────────┤\n", + "│ gpt-4o │ 2014 │ Captain America: The Winter Soldier │ Yes │\n", + "└────────────────────────────┴──────────┴─────────────────────────────────────┴────────────────────────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mscenario\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mscenario \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.year \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.title \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.politically_motivated\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m2014 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCaptain America: The Winter Soldier\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNo \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼──────────┼─────────────────────────────────────┼────────────────────────┤\n", 
+ "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m2014 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCaptain America: The Winter Soldier\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴──────────┴─────────────────────────────────────┴────────────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results = q1.by(s).by(a).by(m).run()\n", + "\n", + "(\n", + " results.filter(\"persona == 'movie critic'\")\n", + " .sort_by(\"model\")\n", + " .select(\"model\", \"year\", \"title\", \"politically_motivated\")\n", + " .print(format=\"rich\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "2661c963-05ca-4e8c-ad76-360ed72b5680", + "metadata": {}, + "source": [ + "## Comments\n", + "Questions automatically include a \"comment\" field.\n", + "This can be useful for understanding the context of a response, or debugging a non-response." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "7ed7720d-5c2e-4437-b7ae-3b881620bbaa", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃ model ┃ answer ┃ comment ┃\n", + "┃ .model ┃ .politically_motivated ┃ .politically_motivated_comment ┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│ claude-3-5-sonnet-20240620 │ No │ This review appears to be a straightforward critique of │\n", + "│ │ │ the film's genre-blending and themes, without any overt │\n", + "│ │ │ political agenda or bias influencing the assessment. │\n", + "├────────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────┤\n", + "│ gpt-4o │ Yes │ The review mentions a \"scathing critique of surveillance │\n", + "│ │ │ states,\" which indicates that the film's themes and the │\n", + "│ │ │ review itself have political undertones. │\n", + "└────────────────────────────┴────────────────────────┴───────────────────────────────────────────────────────────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcomment \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.politically_motivated\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.politically_motivated_comment \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNo \u001b[0m\u001b[2m \u001b[0m│\u001b[2m 
\u001b[0m\u001b[2mThis review appears to be a straightforward critique of \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mthe film's genre-blending and themes, without any overt \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mpolitical agenda or bias influencing the assessment. \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mThe review mentions a \"scathing critique of surveillance \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mstates,\" which indicates that the film's themes and the \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mreview itself have political undertones. 
\u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴────────────────────────┴───────────────────────────────────────────────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "(\n", + " results.filter(\"persona == 'movie critic'\")\n", + " .sort_by(\"model\")\n", + " .select(\"model\", \"politically_motivated\", \"politically_motivated_comment\")\n", + " .print(format=\"rich\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "27e864b1-1651-4f95-bb94-73d4699192b8", + "metadata": {}, + "source": [ + "## Combining questions in a survey\n", + "We can [combine questions in a `Survey`](https://docs.expectedparrot.com/en/latest/surveys.html) to administer them together.\n", + "Here we create some variations on the above question to compare responses:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "c50bc0cd-e18e-4985-9e0e-5d92302356d0", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionYesNo\n", + "\n", + "q2 = QuestionYesNo(\n", + " question_name = \"yn\",\n", + " question_text = \"\"\"\n", + " Read the following movie review and determine whether it is politically motivated.\n", + " Movie: {{ title }}\n", + " Review: {{ review }}\n", + " \"\"\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "42c7f350-699a-47a6-a216-09d6db15af8f", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionLinearScale\n", + "\n", + "q3 = QuestionLinearScale(\n", + " question_name = \"ls\",\n", + " question_text = \"\"\"\n", + " Read the following movie review and indicate whether it is politically motivated.\n", + " Movie: {{ title }}\n", + " Review: {{ review }}\n", + " \"\"\",\n", + " question_options = [0,1,2,3,4,5],\n", + " option_labels = {0:\"Not at all\", 5:\"Very much\"}\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "7afa788b-d4fe-4cf3-abc8-0554707b6ca8", + "metadata": {}, + 
"outputs": [], + "source": [ + "from edsl import QuestionList\n", + "\n", + "q4 = QuestionList(\n", + " question_name = \"favorites\",\n", + " question_text = \"List your favorite Marvel movies.\",\n", + " max_list_items = 3\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9415c681-f1d6-423f-bb23-d6cacff8f2e5", + "metadata": {}, + "source": [ + "## Survey rules & logic\n", + "We can [add skip/stop and other rules](https://docs.expectedparrot.com/en/latest/surveys.html), and \"memory\" of other questions in a survey:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "6cf2afdf-0d13-4a38-8620-72fc302a92ad", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import Survey\n", + "\n", + "survey = Survey(questions = [q2, q3, q4])\n", + "\n", + "survey = survey.add_stop_rule(q3, \"ls < 3\")" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "9c8cf39f-db9a-4cf0-bc62-6d04ab51a6aa", + "metadata": {}, + "outputs": [], + "source": [ + "results = survey.by(s).by(a).by(m).run()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "367f8bc5-0936-4fc1-8416-b8b4eafae804", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃ model.model ┃ agent.persona ┃ Yes/No version ┃ Linear scale version ┃ Favorites ┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│ claude-3-5-sonnet-202… │ comic book collector │ No │ 2 │ None │\n", + "├────────────────────────┼──────────────────────┼────────────────┼──────────────────────┼─────────────────────────┤\n", + "│ gpt-4o │ comic book collector │ Yes │ 3 │ ['The Avengers', │\n", + "│ │ │ │ │ 'Guardians of the │\n", + "│ │ │ │ │ Galaxy', 'Spider-Man: │\n", + "│ │ │ │ │ Into the Spider-Verse'] │\n", + "└────────────────────────┴──────────────────────┴────────────────┴──────────────────────┴─────────────────────────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35magent.persona \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mYes/No version\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mLinear scale version\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mFavorites \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-202…\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNo \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m2 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNone \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────┼──────────────────────┼────────────────┼──────────────────────┼─────────────────────────┤\n", + 
"│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcomic book collector\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mYes \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m3 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['The Avengers', \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m'Guardians of the \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mGalaxy', 'Spider-Man: \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mInto the Spider-Verse']\u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────┴──────────────────────┴────────────────┴──────────────────────┴─────────────────────────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "(\n", + " results.filter(\"persona == 'comic book collector'\")\n", + " .select(\"model\", \"persona\", \"yn\", \"ls\", \"favorites\")\n", + " .print(pretty_labels = {\n", + " \"answer.yn\": \"Yes/No version\",\n", + " \"answer.ls\": \"Linear scale version\",\n", + " \"answer.favorites\": \"Favorites\"\n", + " }, format=\"rich\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "4c5d032d-87df-40d4-9e01-d7e86bf33010", + "metadata": {}, + "source": [ + "## Working with results as datasets\n", + "EDSL provides [built-in methods for analyzing results](https://docs.expectedparrot.com/en/latest/results.html), e.g., as SQL tables, dataframes:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "261e72cc-1621-4370-bf39-e6163dd9b192", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " | model | \n", + "persona | \n", + "yn | \n", + "ls | \n", + "favorites | \n", + "
---|---|---|---|---|---|
0 | \n", + "claude-3-5-sonnet-20240620 | \n", + "comic book collector | \n", + "No | \n", + "2 | \n", + "None | \n", + "
1 | \n", + "claude-3-5-sonnet-20240620 | \n", + "movie critic | \n", + "No | \n", + "2 | \n", + "None | \n", + "
2 | \n", + "gpt-4o | \n", + "comic book collector | \n", + "Yes | \n", + "3 | \n", + "['The Avengers', 'Guardians of the Galaxy', 'S... | \n", + "
3 | \n", + "gpt-4o | \n", + "movie critic | \n", + "Yes | \n", + "3 | \n", + "['Iron Man', 'Black Panther', 'Avengers: Endga... | \n", + "
\n", + " | answer.ls | \n", + "answer.yn | \n", + "answer.favorites | \n", + "scenario.year | \n", + "scenario.review | \n", + "scenario.title | \n", + "agent.persona | \n", + "agent.agent_instruction | \n", + "agent.agent_name | \n", + "model.temperature | \n", + "... | \n", + "question_options.favorites_question_options | \n", + "question_type.favorites_question_type | \n", + "question_type.ls_question_type | \n", + "question_type.yn_question_type | \n", + "comment.ls_comment | \n", + "comment.yn_comment | \n", + "comment.favorites_comment | \n", + "generated_tokens.yn_generated_tokens | \n", + "generated_tokens.favorites_generated_tokens | \n", + "generated_tokens.ls_generated_tokens | \n", + "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", + "2 | \n", + "No | \n", + "NaN | \n", + "2014 | \n", + "\\n Part superhero flick, part 70s political... | \n", + "Captain America: The Winter Soldier | \n", + "comic book collector | \n", + "You are answering questions as if you were a h... | \n", + "Agent_1 | \n", + "0.5 | \n", + "... | \n", + "NaN | \n", + "list | \n", + "linear_scale | \n", + "yes_no | \n", + "As a comic book collector, I don't see this re... | \n", + "Comment: As a comic book collector, I don't se... | \n", + "Task was cancelled. | \n", + "No\\n\\nComment: As a comic book collector, I do... | \n", + "NaN | \n", + "2\\n\\nAs a comic book collector, I don't see th... | \n", + "
1 | \n", + "2 | \n", + "No | \n", + "NaN | \n", + "2014 | \n", + "\\n Part superhero flick, part 70s political... | \n", + "Captain America: The Winter Soldier | \n", + "movie critic | \n", + "You are answering questions as if you were a h... | \n", + "Agent_2 | \n", + "0.5 | \n", + "... | \n", + "NaN | \n", + "list | \n", + "linear_scale | \n", + "yes_no | \n", + "While the review mentions political themes lik... | \n", + "Comment: This review does not appear to be pol... | \n", + "Task was cancelled. | \n", + "No\\n\\nComment: This review does not appear to ... | \n", + "NaN | \n", + "2\\n\\nWhile the review mentions political theme... | \n", + "
2 | \n", + "3 | \n", + "Yes | \n", + "['The Avengers', 'Guardians of the Galaxy', 'S... | \n", + "2014 | \n", + "\\n Part superhero flick, part 70s political... | \n", + "Captain America: The Winter Soldier | \n", + "comic book collector | \n", + "You are answering questions as if you were a h... | \n", + "Agent_1 | \n", + "0.5 | \n", + "... | \n", + "NaN | \n", + "list | \n", + "linear_scale | \n", + "yes_no | \n", + "The review mentions the movie's critique of su... | \n", + "The review mentions a \"scathing critique of su... | \n", + "These movies capture the essence of Marvel's s... | \n", + "Yes\\n\\nThe review mentions a \"scathing critiqu... | \n", + "[\"The Avengers\", \"Guardians of the Galaxy\", \"S... | \n", + "3\\n\\nThe review mentions the movie's critique ... | \n", + "
3 | \n", + "3 | \n", + "Yes | \n", + "['Iron Man', 'Black Panther', 'Avengers: Endga... | \n", + "2014 | \n", + "\\n Part superhero flick, part 70s political... | \n", + "Captain America: The Winter Soldier | \n", + "movie critic | \n", + "You are answering questions as if you were a h... | \n", + "Agent_2 | \n", + "0.5 | \n", + "... | \n", + "NaN | \n", + "list | \n", + "linear_scale | \n", + "yes_no | \n", + "The review highlights a \"scathing critique of ... | \n", + "The review mentions that the movie delivers \"a... | \n", + "These films stand out for their groundbreaking... | \n", + "Yes\\n\\nThe review mentions that the movie deli... | \n", + "[\"Iron Man\", \"Black Panther\", \"Avengers: Endga... | \n", + "3\\n\\nThe review highlights a \"scathing critiqu... | \n", + "
4 rows × 48 columns
\n", + "" + ], + "text/plain": [ + "Aspirational wealth...doing better than you parents...an "Opportunity Economy!"
— Dan Alpert (@DanielAlpert) September 10, 2024
NO!
All are late 20th century neoliberal tropes.
Americans today seek financial security.
Decent jobs and government policy that will pay for the needs of life and old age.
Understand that Democrats! pic.twitter.com/eR3hbx4wbX
\"\"\")" + ] + }, + { + "cell_type": "markdown", + "id": "517beb9d-6330-4905-bfcf-d9d010eedab7", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Simulating survey responses\n", + "In the steps below we demonstrate how to use EDSL to simulate responses to the above question: \n", + "\n", + "#### *\"Which of the following is more important to you: Financial stability / Moving up the income ladder\"* " + ] + }, + { + "cell_type": "markdown", + "id": "cd892e03-0a1d-424a-af23-9e2dc82508f5", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Creating questions\n", + "We start by selecting a question type and constructing a question in the relevant template.\n", + "[EDSL comes with many common question types](https://docs.expectedparrot.com/en/latest/questions.html) that we can choose from based on the desired form of the response:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "8c2e8416-32ea-4ce5-94c2-a8af7022d1c1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from edsl import QuestionMultipleChoice" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "0e182ca7-46ac-4661-8182-160cb09f31b4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "q = QuestionMultipleChoice(\n", + " question_name = \"income_pref\",\n", + " question_text = \"Which of the following is more important to you: \",\n", + " question_options = [\"Financial stability\", \"Moving up the income ladder\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ab0b5222-af12-438c-ade2-4fd48f81a4e3", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Designing AI agents\n", + "We can design AI agents with relevant 
`traits` to answer the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "16a064c1-fe04-4797-b958-39c64647db9b", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from edsl import Agent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4d67f43b-088c-4672-a939-bbacea52adb3", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "a = Agent(traits = {\"persona\": \"You are an American answering a poll from Pew.\"})" + ] + }, + { + "cell_type": "markdown", + "id": "1263a421-6749-410b-9b3e-a1c60db6f216", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Selecting language models\n", + "[EDSL works with many popular models](https://docs.expectedparrot.com/en/latest/language_models.html) that we can use to generate responses:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "a03320e2-dba7-4ab6-80a5-8f8fb1f5f527", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from edsl import Model" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "62bc5526-b035-4599-943b-2c037a6a9a38", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "m = Model(\"gpt-4o\", temperature = 1)" + ] + }, + { + "cell_type": "markdown", + "id": "65faeaff-483a-4f6a-bc0f-19e0eeae9908", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Running a survey\n", + "We administer the question by adding the agent and model and then running it.\n", + "We can specify the number of times to administer the question:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": 
"1abbcee9-721f-4397-9689-7684db5a2472", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "results = q.by(a).by(m).run(n = 20)" + ] + }, + { + "cell_type": "markdown", + "id": "7e8021e1-a7cf-4965-b9d9-3483707aeb5d", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "EDSL comes with [built-in methods for analyzing the dataset of `Results`](https://docs.expectedparrot.com/en/latest/results.html) that is generated:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "9479e0b6-9088-442f-8695-6f0abf5c77e6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "Aspirational wealth...doing better than you parents...an "Opportunity Economy!"
— Dan Alpert (@DanielAlpert) September 10, 2024
NO!
All are late 20th century neoliberal tropes.
Americans today seek financial security.
Decent jobs and government policy that will pay for the needs of life and old age.
Understand that Democrats! pic.twitter.com/eR3hbx4wbX
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃ value ┃ count ┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│ Financial stability │ 20 │\n", + "└─────────────────────┴───────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mvalue \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcount\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mFinancial stability\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m20 \u001b[0m\u001b[2m \u001b[0m│\n", + "└─────────────────────┴───────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results.select(\"income_pref\").tally().print(format=\"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "bb3b1662-9a7b-4d77-93ab-7968ee1071d1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Calculating token probabilities\n", + "In the above example we specified ***n = 20*** to run the question (with the agent and model) 20 times.\n", + "\n", + "We can also get the probabilities from the model by passing ***logprobs = True*** to the `Model`.\n", + "\n", + "To simplify the token probabilities calculation, we can also specify ***use_code = True*** in the `Question` parameters. 
\n", + "This will cause the question to be presented to the model with coded options: 0 for \"Financial stability\" and 1 for \"Moving up the income ladder\":" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "849bd67f-52a0-4b3c-9e3b-0f86cfd676a4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "m = Model(\"gpt-4o\", temperature = 1, logprobs = True)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "d01b21e4-5e5b-4e9e-b72b-d6df1d90ac98", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "q = QuestionMultipleChoice(\n", + " question_name = \"income_pref\", \n", + " question_text = \"Which of the following is more important to you: \", \n", + " question_options = [\"Financial stability\", \"Moving up the income ladder\"], \n", + " use_code = True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "760adcd3-df98-4ab9-a3c3-196148a49763", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "new_results = q.by(a).by(m).run(n = 20)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "dc213bbb-f962-43d7-bfce-b8dc759308fe", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃ value ┃ count ┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│ Financial stability │ 20 │\n", + "└─────────────────────┴───────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mvalue \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcount\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mFinancial stability\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m20 \u001b[0m\u001b[2m \u001b[0m│\n", + "└─────────────────────┴───────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "new_results.select(\"income_pref\").tally().print(format = \"rich\")" + ] + }, + { + "cell_type": "markdown", + "id": "b04e14f5-567e-4f73-8f07-587e28ae39fb", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Inspecting results\n", + "The `Results` include information about all the inputs and outputs relating to the question and response. 
\n", + "\n", + "To see a list of all the components that can be accessed and analyzed: " + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "64369291-3993-44f3-8011-6e7e3a039dd1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['agent.agent_instruction',\n", + " 'agent.agent_name',\n", + " 'agent.persona',\n", + " 'answer.income_pref',\n", + " 'comment.income_pref_comment',\n", + " 'generated_tokens.income_pref_generated_tokens',\n", + " 'iteration.iteration',\n", + " 'model.frequency_penalty',\n", + " 'model.logprobs',\n", + " 'model.max_tokens',\n", + " 'model.model',\n", + " 'model.presence_penalty',\n", + " 'model.temperature',\n", + " 'model.top_logprobs',\n", + " 'model.top_p',\n", + " 'prompt.income_pref_system_prompt',\n", + " 'prompt.income_pref_user_prompt',\n", + " 'question_options.income_pref_question_options',\n", + " 'question_text.income_pref_question_text',\n", + " 'question_type.income_pref_question_type',\n", + " 'raw_model_response.income_pref_cost',\n", + " 'raw_model_response.income_pref_one_usd_buys',\n", + " 'raw_model_response.income_pref_raw_model_response']" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.columns" + ] + }, + { + "cell_type": "markdown", + "id": "f098538f-d02b-4335-bcef-d26e7c6a57a6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "We can inspect the `raw_model_response.income_pref_raw_model_response` component to identify next token probabilities:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "7f24658d-25bc-47e3-9348-9415788e6d3d", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "example = 
new_results.select(\"raw_model_response.income_pref_raw_model_response\").to_list()[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "7aeda54b-3569-4ee0-9689-f1df41dd0559", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'token': '0', 'bytes': [48], 'logprob': -0.00018506382},\n", + " {'token': '1', 'bytes': [49], 'logprob': -8.750185},\n", + " {'token': '\\n', 'bytes': [10], 'logprob': -11.625185}]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next_token_probs = example['choices'][0]['logprobs']['content'][0]['top_logprobs']\n", + "next_token_probs" + ] + }, + { + "cell_type": "markdown", + "id": "3b056519-b81b-4482-a844-a274a676ab0c", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Translating the information" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "cf84aecd-60e7-4b9b-88fe-fb9ecfedd592", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Probability of selecting 'Financial stability' was 1.000\n", + "Probability of selecting 'Moving up the income ladder' was 0.000\n", + "Probability of selecting 'Skipped' was 0.000\n" + ] + } + ], + "source": [ + "import math\n", + "\n", + "# Specifying the codes for the answer options and non-responses:\n", + "options = {'0': \"Financial stability\", '1':\"Moving up the income ladder\", '\\n': \"Skipped\"}\n", + "\n", + "for token_info in next_token_probs:\n", + " option = options[token_info['token']]\n", + " p = math.exp(token_info['logprob'])\n", + " \n", + " print(f\"Probability of selecting '{option}' was {p:.3f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "bacfc8d7-0262-4e98-93b2-c1de1077d70a", + "metadata": 
{ + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Comparing models\n", + "We can rerun the survey with other available models.\n", + "\n", + "To see a list of all available models:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1f502108-fb71-4de3-94d7-f60aa7fa15a4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Model.available()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "0ae756f8-c55f-45c3-b959-acf23e568812", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "models = [Model(model_name) for model_name, _, _ in Model.available()]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "94291287-c56e-4a70-91e4-4bc2ff9e6cf6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "153" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(models)" + ] + }, + { + "cell_type": "markdown", + "id": "bc840f66-bac0-405b-a93d-99d876a32e5f", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "We expect some of these models to fail on this question; passing `print_exceptions = False` to `run()` keeps the exceptions from being printed:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "6738fade-55fb-43f3-b4e3-abfdd0ae8f0b", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "results_with_many_models = q.by(a).by(models).run(print_exceptions = False)" + ] + }, + { + "cell_type": "markdown", + "id": "a5950c39-7d68-4e4e-8b1f-5a05a9e90e8b", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, +
"source": [ + "### Performance\n", + "We can check which models did or did not answer the question, and filter out the non-responses:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "258af61a-8d1d-472a-b84b-cfdc8b13eb87", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃ value ┃ count ┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│ Financial stability │ 86 │\n", + "├─────────────────────────────┼───────┤\n", + "│ Moving up the income ladder │ 8 │\n", + "└─────────────────────────────┴───────┘\n", + "\n" + ], + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mvalue \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mcount\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mFinancial stability \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m86 \u001b[0m\u001b[2m \u001b[0m│\n", + "├─────────────────────────────┼───────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mMoving up the income ladder\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m8 \u001b[0m\u001b[2m \u001b[0m│\n", + "└─────────────────────────────┴───────┘\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "(\n", + " results_with_many_models\n", + " .filter('income_pref is not None')\n", + " .select('income_pref')\n", + " .tally()\n", + " .print(format = \"rich\")\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "d3691406-e3d4-472a-ac71-2fa3bfa58421", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + "
model.model | \n", + "
---|
01-ai/Yi-34B-Chat | \n", + "
Austism/chronos-hermes-13b-v2 | \n", + "
Gryphe/MythoMax-L2-13b | \n", + "
Gryphe/MythoMax-L2-13b-turbo | \n", + "
HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 | \n", + "
Phind/Phind-CodeLlama-34B-v2 | \n", + "
Qwen/Qwen2-72B-Instruct | \n", + "
Qwen/Qwen2-7B-Instruct | \n", + "
Qwen/Qwen2.5-72B-Instruct | \n", + "
Sao10K/L3-70B-Euryale-v2.1 | \n", + "
Sao10K/L3.1-70B-Euryale-v2.2 | \n", + "
bigcode/starcoder2-15b | \n", + "
bigcode/starcoder2-15b-instruct-v0.1 | \n", + "
chatgpt-4o-latest | \n", + "
claude-3-5-sonnet-20240620 | \n", + "
claude-3-haiku-20240307 | \n", + "
claude-3-opus-20240229 | \n", + "
claude-3-sonnet-20240229 | \n", + "
codellama/CodeLlama-34b-Instruct-hf | \n", + "
codellama/CodeLlama-70b-Instruct-hf | \n", + "
codestral-2405 | \n", + "
codestral-latest | \n", + "
codestral-mamba-2407 | \n", + "
cognitivecomputations/dolphin-2.6-mixtral-8x7b | \n", + "
cognitivecomputations/dolphin-2.9.1-llama-3-70b | \n", + "
databricks/dbrx-instruct | \n", + "
deepinfra/airoboros-70b | \n", + "
gemini-1.0-pro | \n", + "
gemini-1.5-flash | \n", + "
gemini-1.5-pro | \n", + "
gemini-pro | \n", + "
gemma-7b-it | \n", + "
gemma2-9b-it | \n", + "
google/codegemma-7b-it | \n", + "
google/gemma-1.1-7b-it | \n", + "
google/gemma-2-27b-it | \n", + "
google/gemma-2-9b-it | \n", + "
gpt-3.5-turbo-0125 | \n", + "
gpt-3.5-turbo-16k | \n", + "
gpt-4 | \n", + "
gpt-4-0125-preview | \n", + "
gpt-4-0613 | \n", + "
gpt-4-1106-preview | \n", + "
gpt-4-turbo | \n", + "
gpt-4-turbo-2024-04-09 | \n", + "
gpt-4-turbo-preview | \n", + "
gpt-4o | \n", + "
gpt-4o-2024-05-13 | \n", + "
gpt-4o-2024-08-06 | \n", + "
gpt-4o-mini | \n", + "
gpt-4o-mini-2024-07-18 | \n", + "
lizpreciatior/lzlv_70b_fp16_hf | \n", + "
llama-3.1-70b-versatile | \n", + "
llama-3.1-8b-instant | \n", + "
llama3-70b-8192 | \n", + "
llama3-8b-8192 | \n", + "
llama3-groq-70b-8192-tool-use-preview | \n", + "
llama3-groq-8b-8192-tool-use-preview | \n", + "
mattshumer/Reflection-Llama-3.1-70B | \n", + "
meta-llama/Llama-2-13b-chat-hf | \n", + "
meta-llama/Llama-2-70b-chat-hf | \n", + "
meta-llama/Llama-2-7b-chat-hf | \n", + "
meta-llama/Meta-Llama-3-70B-Instruct | \n", + "
meta-llama/Meta-Llama-3-8B-Instruct | \n", + "
meta-llama/Meta-Llama-3.1-405B-Instruct | \n", + "
meta-llama/Meta-Llama-3.1-70B-Instruct | \n", + "
meta-llama/Meta-Llama-3.1-8B-Instruct | \n", + "
microsoft/Phi-3-medium-4k-instruct | \n", + "
mistral-large-2407 | \n", + "
mistral-large-latest | \n", + "
mistral-medium | \n", + "
mistral-medium-2312 | \n", + "
mistral-medium-latest | \n", + "
mistral-small-2402 | \n", + "
mistral-small-2409 | \n", + "
mistral-small-latest | \n", + "
mistral-tiny | \n", + "
mistral-tiny-2312 | \n", + "
mistral-tiny-2407 | \n", + "
mistral-tiny-latest | \n", + "
mistralai/Mistral-Nemo-Instruct-2407 | \n", + "
mistralai/Mixtral-8x22B-v0.1 | \n", + "
mistralai/Mixtral-8x7B-Instruct-v0.1 | \n", + "
nvidia/Nemotron-4-340B-Instruct | \n", + "
open-mistral-7b | \n", + "
open-mistral-nemo | \n", + "
open-mistral-nemo-2407 | \n", + "
open-mixtral-8x22b | \n", + "
open-mixtral-8x22b-2404 | \n", + "
openchat/openchat-3.6-8b | \n", + "
pixtral | \n", + "
pixtral-12b | \n", + "
pixtral-12b-2409 | \n", + "
pixtral-latest | \n", + "