diff --git a/docs/language_models.rst b/docs/language_models.rst index c705a17f..e23d8c8f 100644 --- a/docs/language_models.rst +++ b/docs/language_models.rst @@ -2,12 +2,15 @@ Language Models =============== + Language models are used to generate agent responses to questions and can be specified when running a survey. API keys are required in order to access the available models, and should be stored in your private `.env` file. See the :ref:`api_keys` page for instructions on storing your API keys. + Available services ------------------ + We can see all of the available services (model providers) by calling the `services()` method of the `Model` class: .. code-block:: python @@ -21,11 +24,21 @@ This will return a list of the services we can choose from: .. code-block:: python - ['openai', 'anthropic', 'deep_infra', 'google'] + ['openai', + 'anthropic', + 'deep_infra', + 'google', + 'groq', + 'bedrock', + 'azure', + 'ollama', + 'test', + 'mistral'] Available models ---------------- + We can see all of the available models by calling the `available()` method of the `Model` class: .. code-block:: python @@ -35,77 +48,13 @@ We can see all of the available models by calling the `available()` method of th Model.available() -This will return a list of the models we can choose from: - -.. 
code-block:: python - - [['01-ai/Yi-34B-Chat', 'deep_infra', 0], - ['Austism/chronos-hermes-13b-v2', 'deep_infra', 1], - ['Gryphe/MythoMax-L2-13b', 'deep_infra', 2], - ['Gryphe/MythoMax-L2-13b-turbo', 'deep_infra', 3], - ['HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1', 'deep_infra', 4], - ['Phind/Phind-CodeLlama-34B-v2', 'deep_infra', 5], - ['Qwen/Qwen2-72B-Instruct', 'deep_infra', 6], - ['Qwen/Qwen2-7B-Instruct', 'deep_infra', 7], - ['bigcode/starcoder2-15b', 'deep_infra', 8], - ['bigcode/starcoder2-15b-instruct-v0.1', 'deep_infra', 9], - ['claude-3-5-sonnet-20240620', 'anthropic', 10], - ['claude-3-haiku-20240307', 'anthropic', 11], - ['claude-3-opus-20240229', 'anthropic', 12], - ['claude-3-sonnet-20240229', 'anthropic', 13], - ['codellama/CodeLlama-34b-Instruct-hf', 'deep_infra', 14], - ['codellama/CodeLlama-70b-Instruct-hf', 'deep_infra', 15], - ['cognitivecomputations/dolphin-2.6-mixtral-8x7b', 'deep_infra', 16], - ['databricks/dbrx-instruct', 'deep_infra', 17], - ['deepinfra/airoboros-70b', 'deep_infra', 18], - ['gemini-pro', 'google', 19], - ['google/codegemma-7b-it', 'deep_infra', 20], - ['google/gemma-1.1-7b-it', 'deep_infra', 21], - ['gpt-3.5-turbo', 'openai', 22], - ['gpt-3.5-turbo-0125', 'openai', 23], - ['gpt-3.5-turbo-0301', 'openai', 24], - ['gpt-3.5-turbo-0613', 'openai', 25], - ['gpt-3.5-turbo-1106', 'openai', 26], - ['gpt-3.5-turbo-16k', 'openai', 27], - ['gpt-3.5-turbo-16k-0613', 'openai', 28], - ['gpt-3.5-turbo-instruct', 'openai', 29], - ['gpt-3.5-turbo-instruct-0914', 'openai', 30], - ['gpt-4', 'openai', 31], - ['gpt-4-0125-preview', 'openai', 32], - ['gpt-4-0613', 'openai', 33], - ['gpt-4-1106-preview', 'openai', 34], - ['gpt-4-1106-vision-preview', 'openai', 35], - ['gpt-4-turbo', 'openai', 36], - ['gpt-4-turbo-2024-04-09', 'openai', 37], - ['gpt-4-turbo-preview', 'openai', 38], - ['gpt-4-vision-preview', 'openai', 39], - ['gpt-4o', 'openai', 40], - ['gpt-4o-2024-05-13', 'openai', 41], - ['lizpreciatior/lzlv_70b_fp16_hf', 'deep_infra', 42], - 
['llava-hf/llava-1.5-7b-hf', 'deep_infra', 43], - ['meta-llama/Llama-2-13b-chat-hf', 'deep_infra', 44], - ['meta-llama/Llama-2-70b-chat-hf', 'deep_infra', 45], - ['meta-llama/Llama-2-7b-chat-hf', 'deep_infra', 46], - ['meta-llama/Meta-Llama-3-70B-Instruct', 'deep_infra', 47], - ['meta-llama/Meta-Llama-3-8B-Instruct', 'deep_infra', 48], - ['microsoft/Phi-3-medium-4k-instruct', 'deep_infra', 49], - ['microsoft/WizardLM-2-7B', 'deep_infra', 50], - ['microsoft/WizardLM-2-8x22B', 'deep_infra', 51], - ['mistralai/Mistral-7B-Instruct-v0.1', 'deep_infra', 52], - ['mistralai/Mistral-7B-Instruct-v0.2', 'deep_infra', 53], - ['mistralai/Mistral-7B-Instruct-v0.3', 'deep_infra', 54], - ['mistralai/Mixtral-8x22B-Instruct-v0.1', 'deep_infra', 55], - ['mistralai/Mixtral-8x22B-v0.1', 'deep_infra', 56], - ['mistralai/Mixtral-8x7B-Instruct-v0.1', 'deep_infra', 57], - ['nvidia/Nemotron-4-340B-Instruct', 'deep_infra', 58], - ['openchat/openchat-3.6-8b', 'deep_infra', 59], - ['openchat/openchat_3.5', 'deep_infra', 60]] +This will return a list of the models we can choose from (not shown below--run the code on your own to see an up-to-date list). Adding a model -------------- + Newly available models for these services are added automatically. -A current list is also viewable at :py:class:`edsl.enums.LanguageModelType`. If you do not see a publicly available model that you want to work with, please send us a feature request to add it or add it yourself by calling the `add_model()` method: .. code-block:: python @@ -114,6 +63,7 @@ If you do not see a publicly available model that you want to work with, please Model.add_model(service_name = "anthropic", model_name = "new_model") + This will add the model `new_model` to the `anthropic` service. You can then see the model in the list of available models, and search by service name: @@ -122,25 +72,17 @@ You can then see the model in the list of available models, and search by servic Model.available("anthropic") -Output: - -.. 
code-block:: python - - [['claude-3-5-sonnet-20240620', 'anthropic', 10], - ['claude-3-haiku-20240307', 'anthropic', 11], - ['claude-3-opus-20240229', 'anthropic', 12], - ['claude-3-sonnet-20240229', 'anthropic', 13], - ['new_model', 'anthropic', 61]] - Check models ------------ -We can check the models that for which we have already properly stored API keys by calling the `check_models()` method: + +Check the models for which you have already properly stored API keys by calling the `check_models()` method: .. code-block:: python Model.check_models() + This will return a list of the available models and a confirmation message whether a valid key exists. The output will look like this (note that the keys are not shown): @@ -148,12 +90,16 @@ The output will look like this (note that the keys are not shown): Checking all available models... - Now checking: 01-ai/Yi-34B-Chat + Now checking: OK! +Etc. + + Specifying a model ------------------ + We specify a model to use with a survey by creating a `Model` object and passing it the name of an available model. We can optionally set other model parameters as well (temperature, etc.). For example, the following code creates a `Model` object for Claude 3.5 Sonnet with default model parameters: @@ -162,7 +108,7 @@ For example, the following code creates a `Model` object for Claude 3.5 Sonnet w from edsl import Model - model = Model('claude-3-5-sonnet-20240620') + model = Model('gpt-4o') We can see that the object consists of a model name and a dictionary of parameters: @@ -177,7 +123,7 @@ This will show the default parameters of the model: .. code-block:: python { - "model": "claude-3-5-sonnet-20240620", + "model": "gpt-4o", "parameters": { "temperature": 0.5, "max_tokens": 1000, @@ -199,11 +145,9 @@ For example, the following code specifies that a survey be run with each of GPT .. 
code-block:: python - from edsl import Model - - models = [Model('gpt-4'), Model('gemini-pro')] + from edsl import Model, Survey - from edsl import Survey + models = [Model('gpt-4o'), Model('gemini-pro')] survey = Survey.example() @@ -214,11 +158,9 @@ This code uses `ModelList` instead of a list of `Model` objects: .. code-block:: python - from edsl import Model, ModelList + from edsl import Model, ModelList, Survey - models = ModelList([Model('gpt-4'), Model('gemini-pro')]) - - from edsl import Survey + models = ModelList(Model(m) for m in ['gpt-4o', 'gemini-pro']) survey = Survey.example() @@ -240,19 +182,22 @@ The following commands are equivalent: Default model ------------- -If no model is specified, a survey is automatically run with the default model (GPT 4) (if an API key for OpenAI has been stored). -For example, the following code runs a survey with the default model (and no agents or scenarios) without needing to import the `Model` class: + +If no model is specified, a survey is automatically run with the default model. +Run `Model()` to check the current default model. +For example, the following code runs the example survey with the default model (and no agents or scenarios) without needing to import the `Model` class: .. code-block:: python from edsl import Survey - results = survey.run() + results = Survey.example().run() Inspecting model details in results ----------------------------------- -After running a survey, we can inspect the models used by calling the `models` method on the result object. + +If a survey has been run, we can inspect the models that were used by calling the `models` method on the `Results` object. For example, we can verify the default model when running a survey without specifying a model: .. 
code-block:: python @@ -266,71 +211,45 @@ For example, we can verify the default model when running a survey without speci results.models -This will return the following information about the default model that was used: +This will return the following information about the default model that was used (note the default model may have changed since this page was last updated): -.. code-block:: python +.. code-block:: text + + [Model(model_name = 'gpt-4o', temperature = 0.5, max_tokens = 1000, top_p = 1, frequency_penalty = 0, presence_penalty = 0, logprobs = False, top_logprobs = 3)] - { - "model": "gpt-4-1106-preview", - "parameters": { - "temperature": 0.5, - "max_tokens": 1000, - "top_p": 1, - "frequency_penalty": 0, - "presence_penalty": 0, - "logprobs": false, - "top_logprobs": 3 - } - } To learn more about all the components of a `Results` object, please see the :ref:`results` section. Printing model attributes ------------------------- + If multiple models were used to generate results, we can print the attributes in a table. For example, the following code prints a table of the model names and temperatures for some results: .. code-block:: python - from edsl import Model - - models = [Model('gpt-4-1106-preview'), Model('llama-2-70b-chat-hf')] + from edsl import Survey, ModelList, Model - from edsl.questions import QuestionMultipleChoice, QuestionFreeText - - q1 = QuestionMultipleChoice( - question_name = "favorite_day", - question_text = "What is your favorite day of the week?", - question_options = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] - ) - - q2 = QuestionFreeText( - question_name = "favorite_color", - question_text = "What is your favorite color?" 
+ models = ModelList( + Model(m) for m in ['gpt-4o', 'gemini-1.5-pro'] ) - from edsl import Survey - - survey = Survey([q1, q2]) + survey = Survey.example() results = survey.by(models).run() - results.select("model.model", "model.temperature").print() + results.select("model", "temperature").print() # This is equivalent to: results.select("model.model", "model.temperature").print() + +Output: -The table will look like this: +.. code-block:: text -.. list-table:: - :widths: 10 10 - :header-rows: 1 + model.model model.temperature + gpt-4o 0.5 + gemini-1.5-pro 0.5 - * - model.model - - model.temperature - * - gpt-4-1106-preview - - 0.5 - * - llama-2-70b-chat-hf - - 0.5 We can also print model attributes together with other components of results. We can see a list of all components by calling the `columns` method on the results: @@ -339,57 +258,83 @@ We can see a list of all components by calling the `columns` method on the resul results.columns -For the above example, this will display the following list of components (note that no agents were specified, so there are no agent fields listed other than the default `agent_name` that is generated when a job is run): + +Output: .. 
code-block:: python - ['agent.agent_name', - 'answer.favorite_color', - 'answer.favorite_day', - 'answer.favorite_day_comment', - 'iteration.iteration', - 'model.frequency_penalty', - 'model.logprobs', - 'model.max_new_tokens', - 'model.max_tokens', - 'model.model', - 'model.presence_penalty', - 'model.stopSequences', - 'model.temperature', - 'model.top_k', - 'model.top_logprobs', - 'model.top_p', - 'prompt.favorite_color_system_prompt', - 'prompt.favorite_color_user_prompt', - 'prompt.favorite_day_system_prompt', - 'prompt.favorite_day_user_prompt', - 'raw_model_response.favorite_color_raw_model_response', - 'raw_model_response.favorite_day_raw_model_response'] + ['agent.agent_instruction', + 'agent.agent_name', + 'answer.q0', + 'answer.q1', + 'answer.q2', + 'comment.q0_comment', + 'comment.q1_comment', + 'comment.q2_comment', + 'generated_tokens.q0_generated_tokens', + 'generated_tokens.q1_generated_tokens', + 'generated_tokens.q2_generated_tokens', + 'iteration.iteration', + 'model.frequency_penalty', + 'model.logprobs', + 'model.maxOutputTokens', + 'model.max_tokens', + 'model.model', + 'model.presence_penalty', + 'model.stopSequences', + 'model.temperature', + 'model.topK', + 'model.topP', + 'model.top_logprobs', + 'model.top_p', + 'prompt.q0_system_prompt', + 'prompt.q0_user_prompt', + 'prompt.q1_system_prompt', + 'prompt.q1_user_prompt', + 'prompt.q2_system_prompt', + 'prompt.q2_user_prompt', + 'question_options.q0_question_options', + 'question_options.q1_question_options', + 'question_options.q2_question_options', + 'question_text.q0_question_text', + 'question_text.q1_question_text', + 'question_text.q2_question_text', + 'question_type.q0_question_type', + 'question_type.q1_question_type', + 'question_type.q2_question_type', + 'raw_model_response.q0_cost', + 'raw_model_response.q0_one_usd_buys', + 'raw_model_response.q0_raw_model_response', + 'raw_model_response.q1_cost', + 'raw_model_response.q1_one_usd_buys', + 
'raw_model_response.q1_raw_model_response', + 'raw_model_response.q2_cost', + 'raw_model_response.q2_one_usd_buys', + 'raw_model_response.q2_raw_model_response'] The following code will display a table of the model names together with the simulated answers: .. code-block:: python - (results - .select("model.model", "answer.favorite_day", "answer.favorite_color") - .print() + ( + results + .select("model", "answer.*") + .print(format="rich") ) -The table will look like this: - -.. list-table:: - :widths: 30 40 40 - :header-rows: 1 - - * - model.model - - answer.favorite_day - - answer.favorite_color - * - gpt-4-1106-preview - - Sat - - My favorite color is blue. - * - llama-2-70b-chat-hf - - Sat - - My favorite color is blue. It reminds me of the ocean on a clear summer day, full of possibilities and mystery. +Output: + +.. code-block:: text + + ┏━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓ + ┃ model ┃ answer ┃ answer ┃ answer ┃ + ┃ .model ┃ .q2 ┃ .q1 ┃ .q0 ┃ + ┡━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩ + │ gpt-4o │ other │ None │ yes │ + ├────────────────┼────────┼────────┼────────┤ + │ gemini-1.5-pro │ other │ other │ no │ + └────────────────┴────────┴────────┴────────┘ + To learn more about methods of inspecting and printing results, please see the :ref:`results` section. diff --git a/docs/notebooks/explore_llm_biases.ipynb b/docs/notebooks/explore_llm_biases.ipynb index 9ab888e4..dfa1952c 100644 --- a/docs/notebooks/explore_llm_biases.ipynb +++ b/docs/notebooks/explore_llm_biases.ipynb @@ -13,9 +13,10 @@ }, "source": [ "# Cognitive testing & LLM biases\n", - "This notebook shows some ways of using [EDSL](https://docs.expectedparrot.com) to investigate whether LLMs demonstrate bias towards content that they generate or improve compared with content generated by other LLMs. \n", + "This notebook provides example code for using [EDSL](https://docs.expectedparrot.com) to investigate biases of large language models. 
\n", "\n", - "Please see our docs for details on [installing EDSL](https://docs.expectedparrot.com/en/latest/installation.html) and [getting started](https://docs.expectedparrot.com/en/latest/tutorial_getting_started.html)." + "[EDSL is an open-source library](https://github.com/expectedparrot/edsl) for simulating surveys, experiments and other research with AI agents and large language models. \n", + "Before running the code below, please ensure that you have [installed the EDSL library](https://docs.expectedparrot.com/en/latest/installation.html) and either [activated remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) from your [Coop account](https://docs.expectedparrot.com/en/latest/coop.html) or [stored API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use with EDSL. Please also see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started using EDSL." ] }, { @@ -52,7 +53,7 @@ "source": [ "from edsl import ModelList, Model\n", "\n", - "# Model.available()" + "# Model.available() # uncomment and run this code" ] }, { @@ -67,12 +68,30 @@ "tags": [] }, "source": [ - "We select models to use by creating `Model` objects that we will add to our survey when we run it later. If we do not specify a model, GPT 4 preview will be used by default. Here we select several models to compare their responses:" + "We select models to use by creating `Model` objects that can be added to a survey when it is run. 
If we do not specify a model, the default model is used with the survey.\n", + "\n", + "To check the current default model:" ] }, { "cell_type": "code", "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Model() # uncomment and run this code" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we select several models to compare their responses for the survey that we create in the steps below:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, "metadata": { "cell_id": "747d40bea4eb41b5a89d8b374216837e", "deepnote_cell_type": "code", @@ -100,12 +119,12 @@ }, "source": [ "## Generating content\n", - "EDSL comes with a variety of standard survey question types, such as multiple choice, free text, etc. These can be selected based on the desired format of the response. See details about all types [here](https://docs.expectedparrot.com/en/latest/questions.html#question-type-classes). We can use `QuestionFreeText` to prompt the models to generate some content for our experiment (a mock resume):" + "EDSL comes with a variety of standard survey question types, such as multiple choice, free text, etc. These can be selected based on the desired format of the response. See details about all types [here](https://docs.expectedparrot.com/en/latest/questions.html#question-type-classes). We can use `QuestionFreeText` to prompt the models to generate some content for our experiment:" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": { "cell_id": "1325605571cc41a194255b80b2fb2f87", "deepnote_cell_type": "code", @@ -135,12 +154,12 @@ "tags": [] }, "source": [ - "We generate a response to the question by calling the `run` method, after specifying the models to use with the `by` method. 
This will generate a `Results` object with a `Result` for each response to the question:" + "We generate a response to the question by adding the models to use with the `by` method and then calling the `run` method. This generates a `Results` object with a `Result` for each response to the question:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": { "cell_id": "724ca2c7a38f4164a225ed4a8dcc2b1f", "deepnote_cell_type": "code", @@ -170,7 +189,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": { "cell_id": "054ec708d2f84854b971127f64ff2054", "deepnote_cell_type": "code", @@ -182,7 +201,7 @@ }, "outputs": [], "source": [ - "# results.columns" + "# results.columns # uncomment and run this code" ] }, { @@ -200,7 +219,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": { "cell_id": "c68d3be8bada402ea17184b978abfa70", "deepnote_cell_type": "code", @@ -214,35 +233,35 @@ { "data": { "text/html": [ - "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
-       "┃ model                       answer                            ┃\n",
-       "┃ .model                      .haiku                            ┃\n",
-       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
-       "│ gemini-pro                  Snow and rain, then sun           │\n",
-       "│                             New England's fickle weather      │\n",
-       "├────────────────────────────┼───────────────────────────────────┤\n",
-       "│ gpt-4o                      Crisp leaves dance on wind,       │\n",
-       "│                             Whispers of frost kiss the dawn,  │\n",
-       "├────────────────────────────┼───────────────────────────────────┤\n",
-       "│ claude-3-5-sonnet-20240620  Fickle winds whisper              │\n",
-       "│                             Maple leaves dance, snow then sun │\n",
-       "└────────────────────────────┴───────────────────────────────────┘\n",
+       "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
+       "┃ model                       answer                             ┃\n",
+       "┃ .model                      .haiku                             ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
+       "│ gpt-4o                      Maple leaves flutter,              │\n",
+       "│                             Mist dances on cool breeze,        │\n",
+       "├────────────────────────────┼────────────────────────────────────┤\n",
+       "│ gemini-pro                  Snow falls soft and white,         │\n",
+       "│                             Spring brings rain, summer's heat, │\n",
+       "├────────────────────────────┼────────────────────────────────────┤\n",
+       "│ claude-3-5-sonnet-20240620  Fickle winds whisper               │\n",
+       "│                             Maple leaves dance, snow then sun  │\n",
+       "└────────────────────────────┴────────────────────────────────────┘\n",
        "
\n" ], "text/plain": [ - "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", - "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", - "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.haiku \u001b[0m\u001b[1;35m \u001b[0m┃\n", - "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", - "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow and rain, then sun \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNew England's fickle weather \u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCrisp leaves dance on wind, \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mWhispers of frost kiss the dawn, \u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun\u001b[0m\u001b[2m \u001b[0m│\n", - "└────────────────────────────┴───────────────────────────────────┘\n" + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mmodel \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.haiku \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", 
+ "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves flutter, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMist dances on cool breeze, \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow falls soft and white, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSpring brings rain, summer's heat,\u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴────────────────────────────────────┘\n" ] }, "metadata": {}, @@ -266,12 +285,12 @@ }, "source": [ "## Conducting a review\n", - "Next we create new questions for improving the resumes and then critiquing the improvements:" + "Next we create a question that has a model evaluate a response, which we pass as an input to the new question:" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "metadata": { "editable": true, "slideshow": { @@ -302,12 +321,12 @@ }, "source": [ "## Parameterizing questions\n", - "We can use `Scenario` objects to add the contents of each haiku to the scoring question. EDSL comes with many methods for creating scenarios from different data sources (PDFs, CSVs, docs, images, lists, etc.), as well as `Results` objects:" + "We use `Scenario` objects to add each response to the new question. 
EDSL comes with many methods for creating scenarios from different data sources (PDFs, CSVs, docs, images, lists, etc.), as well as `Results` objects:" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "metadata": { "editable": true, "slideshow": { @@ -322,12 +341,12 @@ "
{\n",
        "    "scenarios": [\n",
        "        {\n",
-       "            "drafting_model": "gemini-pro",\n",
-       "            "haiku": "Snow and rain, then sun\\nNew England's fickle weather"\n",
+       "            "drafting_model": "gpt-4o",\n",
+       "            "haiku": "Maple leaves flutter,\\nMist dances on cool breeze,"\n",
        "        },\n",
        "        {\n",
-       "            "drafting_model": "gpt-4o",\n",
-       "            "haiku": "Crisp leaves dance on wind,  \\nWhispers of frost kiss the dawn,"\n",
+       "            "drafting_model": "gemini-pro",\n",
+       "            "haiku": "Snow falls soft and white,\\nSpring brings rain, summer's heat,"\n",
        "        },\n",
        "        {\n",
        "            "drafting_model": "claude-3-5-sonnet-20240620",\n",
@@ -338,19 +357,20 @@
        "
\n" ], "text/plain": [ - "ScenarioList([Scenario({'drafting_model': 'gemini-pro', 'haiku': \"Snow and rain, then sun\\nNew England's fickle weather\"}), Scenario({'drafting_model': 'gpt-4o', 'haiku': 'Crisp leaves dance on wind, \\nWhispers of frost kiss the dawn,'}), Scenario({'drafting_model': 'claude-3-5-sonnet-20240620', 'haiku': 'Fickle winds whisper\\nMaple leaves dance, snow then sun'})])" + "ScenarioList([Scenario({'drafting_model': 'gpt-4o', 'haiku': 'Maple leaves flutter,\\nMist dances on cool breeze,'}), Scenario({'drafting_model': 'gemini-pro', 'haiku': \"Snow falls soft and white,\\nSpring brings rain, summer's heat,\"}), Scenario({'drafting_model': 'claude-3-5-sonnet-20240620', 'haiku': 'Fickle winds whisper\\nMaple leaves dance, snow then sun'})])" ] }, - "execution_count": 8, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "scenarios = (results.to_scenario_list()\n", - " .select(\"model\", \"haiku\")\n", - " .rename({\"model\":\"drafting_model\"}) # renaming the 'model' field to distinguish the evaluating model \n", - " )\n", + "scenarios = (\n", + " results.to_scenario_list()\n", + " .select(\"model\", \"haiku\")\n", + " .rename({\"model\":\"drafting_model\"}) # renaming the 'model' field to distinguish the evaluating model \n", + ")\n", "scenarios" ] }, @@ -364,12 +384,12 @@ "tags": [] }, "source": [ - "Finally, we conduct the review of the resumes where we prompt each agent to improve each resume, and then critique each of the improved versions, using each of the models that we specified:" + "Finally, we conduct the evaluation by having each model score each haiku that was generated (without information about whether the model itself was the source):" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 10, "metadata": { "editable": true, "slideshow": { @@ -384,7 +404,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 11, "metadata": { "editable": true, 
"slideshow": { @@ -426,7 +446,7 @@ " 'scenario.haiku']" ] }, - "execution_count": 10, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -437,7 +457,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 12, "metadata": { "editable": true, "slideshow": { @@ -449,69 +469,69 @@ { "data": { "text/html": [ - "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
-       "┃ Drafting model              Scoring model               Score  Haiku                             ┃\n",
-       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
-       "│ claude-3-5-sonnet-20240620  claude-3-5-sonnet-20240620  6      Fickle winds whisper              │\n",
-       "│                                                                Maple leaves dance, snow then sun │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ claude-3-5-sonnet-20240620  gemini-pro                  9      Fickle winds whisper              │\n",
-       "│                                                                Maple leaves dance, snow then sun │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ claude-3-5-sonnet-20240620  gpt-4o                      7      Fickle winds whisper              │\n",
-       "│                                                                Maple leaves dance, snow then sun │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ gemini-pro                  claude-3-5-sonnet-20240620  5      Snow and rain, then sun           │\n",
-       "│                                                                New England's fickle weather      │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ gemini-pro                  gemini-pro                  7      Snow and rain, then sun           │\n",
-       "│                                                                New England's fickle weather      │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ gemini-pro                  gpt-4o                      6      Snow and rain, then sun           │\n",
-       "│                                                                New England's fickle weather      │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ gpt-4o                      claude-3-5-sonnet-20240620  7      Crisp leaves dance on wind,       │\n",
-       "│                                                                Whispers of frost kiss the dawn,  │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ gpt-4o                      gemini-pro                  8      Crisp leaves dance on wind,       │\n",
-       "│                                                                Whispers of frost kiss the dawn,  │\n",
-       "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n",
-       "│ gpt-4o                      gpt-4o                      8      Crisp leaves dance on wind,       │\n",
-       "│                                                                Whispers of frost kiss the dawn,  │\n",
-       "└────────────────────────────┴────────────────────────────┴───────┴───────────────────────────────────┘\n",
+       "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
+       "┃ Drafting model              Scoring model               Score  Haiku                              ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
+       "│ claude-3-5-sonnet-20240620  claude-3-5-sonnet-20240620  7      Fickle winds whisper               │\n",
+       "│                                                                Maple leaves dance, snow then sun  │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ claude-3-5-sonnet-20240620  gemini-pro                  8      Fickle winds whisper               │\n",
+       "│                                                                Maple leaves dance, snow then sun  │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ claude-3-5-sonnet-20240620  gpt-4o                      7      Fickle winds whisper               │\n",
+       "│                                                                Maple leaves dance, snow then sun  │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ gemini-pro                  claude-3-5-sonnet-20240620  6      Snow falls soft and white,         │\n",
+       "│                                                                Spring brings rain, summer's heat, │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ gemini-pro                  gemini-pro                  5      Snow falls soft and white,         │\n",
+       "│                                                                Spring brings rain, summer's heat, │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ gemini-pro                  gpt-4o                      4      Snow falls soft and white,         │\n",
+       "│                                                                Spring brings rain, summer's heat, │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ gpt-4o                      claude-3-5-sonnet-20240620  6      Maple leaves flutter,              │\n",
+       "│                                                                Mist dances on cool breeze,        │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ gpt-4o                      gemini-pro                  5      Maple leaves flutter,              │\n",
+       "│                                                                Mist dances on cool breeze,        │\n",
+       "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n",
+       "│ gpt-4o                      gpt-4o                      9      Maple leaves flutter,              │\n",
+       "│                                                                Mist dances on cool breeze,        │\n",
+       "└────────────────────────────┴────────────────────────────┴───────┴────────────────────────────────────┘\n",
        "
\n" ], "text/plain": [ - "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", - "┃\u001b[1;35m \u001b[0m\u001b[1;35mDrafting model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mScoring model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mScore\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mHaiku \u001b[0m\u001b[1;35m \u001b[0m┃\n", - "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", - "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m6 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun\u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m9 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun\u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m7 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", 
- "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun\u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m5 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow and rain, then sun \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNew England's fickle weather \u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m7 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow and rain, then sun \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNew England's fickle weather \u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m6 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow and rain, then sun \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mNew England's fickle weather \u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m 
\u001b[0m│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m7 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCrisp leaves dance on wind, \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mWhispers of frost kiss the dawn, \u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m8 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCrisp leaves dance on wind, \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mWhispers of frost kiss the dawn, \u001b[0m\u001b[2m \u001b[0m│\n", - "├────────────────────────────┼────────────────────────────┼───────┼───────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m8 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mCrisp leaves dance on wind, \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mWhispers of frost kiss the dawn, \u001b[0m\u001b[2m \u001b[0m│\n", - "└────────────────────────────┴────────────────────────────┴───────┴───────────────────────────────────┘\n" + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35mDrafting model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mScoring model \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35mScore\u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m 
\u001b[0m\u001b[1;35mHaiku \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m7 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m8 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m7 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mFickle winds whisper \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves dance, snow then sun \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m 
\u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m6 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow falls soft and white, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSpring brings rain, summer's heat,\u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m5 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow falls soft and white, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSpring brings rain, summer's heat,\u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m4 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSnow falls soft and white, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSpring brings rain, summer's heat,\u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mclaude-3-5-sonnet-20240620\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m6 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves flutter, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMist 
dances on cool breeze, \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgemini-pro \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m5 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves flutter, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMist dances on cool breeze, \u001b[0m\u001b[2m \u001b[0m│\n", + "├────────────────────────────┼────────────────────────────┼───────┼────────────────────────────────────┤\n", + "│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mgpt-4o \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m9 \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMaple leaves flutter, \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mMist dances on cool breeze, \u001b[0m\u001b[2m \u001b[0m│\n", + "└────────────────────────────┴────────────────────────────┴───────┴────────────────────────────────────┘\n" ] }, "metadata": {}, @@ -554,7 +574,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 13, "metadata": { "editable": true, "slideshow": { @@ -571,7 +591,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 14, "metadata": { "editable": true, "slideshow": { @@ -588,7 +608,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 15, "metadata": { "editable": true, "slideshow": { @@ -604,13 +624,13 @@ "text/plain": [ "{'description': 'Example code for comparing model responses and biases',\n", " 'object_type': 'notebook',\n", - " 'url': 'https://www.expectedparrot.com/content/d6b943f9-dcf3-4de1-aa70-f542e46adc18',\n", - " 'uuid': 
'd6b943f9-dcf3-4de1-aa70-f542e46adc18',\n", + " 'url': 'https://www.expectedparrot.com/content/07ec8176-c07e-4f83-acd5-791e3d9324d2',\n", + " 'uuid': '07ec8176-c07e-4f83-acd5-791e3d9324d2',\n", " 'version': '0.1.33.dev1',\n", " 'visibility': 'public'}" ] }, - "execution_count": 14, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -634,7 +654,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 16, "metadata": { "editable": true, "slideshow": { @@ -651,7 +671,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 17, "metadata": { "editable": true, "slideshow": { @@ -668,13 +688,13 @@ "{'status': 'success'}" ] }, - "execution_count": 16, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "n.patch(uuid = \"d6b943f9-dcf3-4de1-aa70-f542e46adc18\", value = n)" + "n.patch(uuid = \"07ec8176-c07e-4f83-acd5-791e3d9324d2\", value = n)" ] } ], diff --git a/docs/notebooks/starter_tutorial.ipynb b/docs/notebooks/starter_tutorial.ipynb index 35edbec3..83f2a0b9 100644 --- a/docs/notebooks/starter_tutorial.ipynb +++ b/docs/notebooks/starter_tutorial.ipynb @@ -46,35 +46,21 @@ { "data": { "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "\n", - " \n", - " \n", - " \n", - "\n", - "
answer.example_question
Good
" + "
┏━━━━━━━━━━━━━━━━━━━┓\n",
+       "┃ answer            ┃\n",
+       "┃ .example_question ┃\n",
+       "┡━━━━━━━━━━━━━━━━━━━┩\n",
+       "│ Good              │\n",
+       "└───────────────────┘\n",
+       "
\n" ], "text/plain": [ - "" + "┏━━━━━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┃\u001b[1;35m \u001b[0m\u001b[1;35m.example_question\u001b[0m\u001b[1;35m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━┩\n", + "│\u001b[2m \u001b[0m\u001b[2mGood \u001b[0m\u001b[2m \u001b[0m│\n", + "└───────────────────┘\n" ] }, "metadata": {}, @@ -96,7 +82,7 @@ "results = q.run()\n", "\n", "# Inspect the results\n", - "results.select(\"example_question\").print()" + "results.select(\"example_question\").print(format=\"rich\")" ] }, { @@ -110,7 +96,7 @@ "tags": [] }, "source": [ - "*Note:* The default language model is currently GPT 4 preview; you will need an API key for OpenAI to use this model and run this example locally.\n", + "*Note:* The default language model at the time this notebook was last updated was gpt-4o; you will need an API key for OpenAI to use this model and run this example locally.\n", "See instructions on storing your [API Keys](https://docs.expectedparrot.com/en/latest/api_keys.html). \n", "Alternatively, you can activate [Remote Inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) at your [Coop](https://docs.expectedparrot.com/en/latest/coop.html) account to run the example on the Expected Parrot server.\n", "\n", @@ -121,6 +107,7 @@ "\n", "We also show how to filter, sort, select and print components of the dataset of results.\n", "\n", + "#### Question types\n", "To see examples of all EDSL question types, run:" ] }, @@ -175,6 +162,7 @@ "tags": [] }, "source": [ + "#### Language models\n", "Newly released language models are automatically added to EDSL when they become available. 
\n", "To see a current list of available models, run:" ] @@ -190,168 +178,42 @@ }, "tags": [] }, - "outputs": [ - { - "data": { - "text/plain": [ - "[['01-ai/Yi-34B-Chat', 'deep_infra', 0],\n", - " ['Austism/chronos-hermes-13b-v2', 'deep_infra', 1],\n", - " ['Gryphe/MythoMax-L2-13b', 'deep_infra', 2],\n", - " ['Gryphe/MythoMax-L2-13b-turbo', 'deep_infra', 3],\n", - " ['HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1', 'deep_infra', 4],\n", - " ['Phind/Phind-CodeLlama-34B-v2', 'deep_infra', 5],\n", - " ['Qwen/Qwen2-72B-Instruct', 'deep_infra', 6],\n", - " ['Qwen/Qwen2-7B-Instruct', 'deep_infra', 7],\n", - " ['Sao10K/L3-70B-Euryale-v2.1', 'deep_infra', 8],\n", - " ['amazon.titan-text-express-v1', 'bedrock', 9],\n", - " ['amazon.titan-text-lite-v1', 'bedrock', 10],\n", - " ['amazon.titan-tg1-large', 'bedrock', 11],\n", - " ['anthropic.claude-3-5-sonnet-20240620-v1:0', 'bedrock', 12],\n", - " ['anthropic.claude-3-haiku-20240307-v1:0', 'bedrock', 13],\n", - " ['anthropic.claude-3-opus-20240229-v1:0', 'bedrock', 14],\n", - " ['anthropic.claude-3-sonnet-20240229-v1:0', 'bedrock', 15],\n", - " ['anthropic.claude-instant-v1', 'bedrock', 16],\n", - " ['anthropic.claude-v2', 'bedrock', 17],\n", - " ['anthropic.claude-v2:1', 'bedrock', 18],\n", - " ['bigcode/starcoder2-15b', 'deep_infra', 19],\n", - " ['bigcode/starcoder2-15b-instruct-v0.1', 'deep_infra', 20],\n", - " ['chatgpt-4o-latest', 'openai', 21],\n", - " ['claude-3-5-sonnet-20240620', 'anthropic', 22],\n", - " ['claude-3-haiku-20240307', 'anthropic', 23],\n", - " ['claude-3-opus-20240229', 'anthropic', 24],\n", - " ['claude-3-sonnet-20240229', 'anthropic', 25],\n", - " ['codellama/CodeLlama-34b-Instruct-hf', 'deep_infra', 26],\n", - " ['codellama/CodeLlama-70b-Instruct-hf', 'deep_infra', 27],\n", - " ['codestral-2405', 'mistral', 28],\n", - " ['codestral-latest', 'mistral', 29],\n", - " ['codestral-mamba-2407', 'mistral', 30],\n", - " ['codestral-mamba-latest', 'mistral', 31],\n", - " 
['cognitivecomputations/dolphin-2.6-mixtral-8x7b', 'deep_infra', 32],\n", - " ['cognitivecomputations/dolphin-2.9.1-llama-3-70b', 'deep_infra', 33],\n", - " ['cohere.command-light-text-v14', 'bedrock', 34],\n", - " ['cohere.command-r-plus-v1:0', 'bedrock', 35],\n", - " ['cohere.command-r-v1:0', 'bedrock', 36],\n", - " ['cohere.command-text-v14', 'bedrock', 37],\n", - " ['curie:ft-emeritus-2022-11-30-12-58-24', 'openai', 38],\n", - " ['curie:ft-emeritus-2022-12-01-01-04-36', 'openai', 39],\n", - " ['curie:ft-emeritus-2022-12-01-01-51-20', 'openai', 40],\n", - " ['curie:ft-emeritus-2022-12-01-14-16-46', 'openai', 41],\n", - " ['curie:ft-emeritus-2022-12-01-14-28-00', 'openai', 42],\n", - " ['curie:ft-emeritus-2022-12-01-14-49-45', 'openai', 43],\n", - " ['curie:ft-emeritus-2022-12-01-15-29-32', 'openai', 44],\n", - " ['curie:ft-emeritus-2022-12-01-15-42-25', 'openai', 45],\n", - " ['curie:ft-emeritus-2022-12-01-15-52-24', 'openai', 46],\n", - " ['curie:ft-emeritus-2022-12-01-16-40-12', 'openai', 47],\n", - " ['databricks/dbrx-instruct', 'deep_infra', 48],\n", - " ['davinci:ft-emeritus-2022-11-30-14-57-33', 'openai', 49],\n", - " ['deepinfra/airoboros-70b', 'deep_infra', 50],\n", - " ['gemini-1.0-pro', 'google', 51],\n", - " ['gemini-1.5-flash', 'google', 52],\n", - " ['gemini-1.5-pro', 'google', 53],\n", - " ['gemini-pro', 'google', 54],\n", - " ['gemma-7b-it', 'groq', 55],\n", - " ['gemma2-9b-it', 'groq', 56],\n", - " ['google/codegemma-7b-it', 'deep_infra', 57],\n", - " ['google/gemma-1.1-7b-it', 'deep_infra', 58],\n", - " ['google/gemma-2-27b-it', 'deep_infra', 59],\n", - " ['google/gemma-2-9b-it', 'deep_infra', 60],\n", - " ['gpt-3.5-turbo', 'openai', 61],\n", - " ['gpt-3.5-turbo-0125', 'openai', 62],\n", - " ['gpt-3.5-turbo-1106', 'openai', 63],\n", - " ['gpt-3.5-turbo-16k', 'openai', 64],\n", - " ['gpt-4', 'openai', 65],\n", - " ['gpt-4-0125-preview', 'openai', 66],\n", - " ['gpt-4-0613', 'openai', 67],\n", - " ['gpt-4-1106-preview', 'openai', 68],\n", - " 
['gpt-4-turbo', 'openai', 69],\n", - " ['gpt-4-turbo-2024-04-09', 'openai', 70],\n", - " ['gpt-4-turbo-preview', 'openai', 71],\n", - " ['gpt-4o', 'openai', 72],\n", - " ['gpt-4o-2024-05-13', 'openai', 73],\n", - " ['gpt-4o-2024-08-06', 'openai', 74],\n", - " ['gpt-4o-mini', 'openai', 75],\n", - " ['gpt-4o-mini-2024-07-18', 'openai', 76],\n", - " ['lizpreciatior/lzlv_70b_fp16_hf', 'deep_infra', 77],\n", - " ['llama-3.1-70b-versatile', 'groq', 78],\n", - " ['llama-3.1-8b-instant', 'groq', 79],\n", - " ['llama-guard-3-8b', 'groq', 80],\n", - " ['llama3-70b-8192', 'groq', 81],\n", - " ['llama3-8b-8192', 'groq', 82],\n", - " ['llama3-groq-70b-8192-tool-use-preview', 'groq', 83],\n", - " ['llama3-groq-8b-8192-tool-use-preview', 'groq', 84],\n", - " ['llava-v1.5-7b-4096-preview', 'groq', 85],\n", - " ['mattshumer/Reflection-Llama-3.1-70B', 'deep_infra', 86],\n", - " ['meta-llama/Llama-2-13b-chat-hf', 'deep_infra', 87],\n", - " ['meta-llama/Llama-2-70b-chat-hf', 'deep_infra', 88],\n", - " ['meta-llama/Llama-2-7b-chat-hf', 'deep_infra', 89],\n", - " ['meta-llama/Meta-Llama-3-70B-Instruct', 'deep_infra', 90],\n", - " ['meta-llama/Meta-Llama-3-8B-Instruct', 'deep_infra', 91],\n", - " ['meta-llama/Meta-Llama-3.1-405B-Instruct', 'deep_infra', 92],\n", - " ['meta-llama/Meta-Llama-3.1-70B-Instruct', 'deep_infra', 93],\n", - " ['meta-llama/Meta-Llama-3.1-8B-Instruct', 'deep_infra', 94],\n", - " ['meta.llama3-1-405b-instruct-v1:0', 'bedrock', 95],\n", - " ['meta.llama3-1-70b-instruct-v1:0', 'bedrock', 96],\n", - " ['meta.llama3-1-8b-instruct-v1:0', 'bedrock', 97],\n", - " ['meta.llama3-70b-instruct-v1:0', 'bedrock', 98],\n", - " ['meta.llama3-8b-instruct-v1:0', 'bedrock', 99],\n", - " ['microsoft/Phi-3-medium-4k-instruct', 'deep_infra', 100],\n", - " ['microsoft/WizardLM-2-7B', 'deep_infra', 101],\n", - " ['microsoft/WizardLM-2-8x22B', 'deep_infra', 102],\n", - " ['mistral-embed', 'mistral', 103],\n", - " ['mistral-large-2402', 'mistral', 104],\n", - " ['mistral-large-2407', 
'mistral', 105],\n", - " ['mistral-large-latest', 'mistral', 106],\n", - " ['mistral-medium', 'mistral', 107],\n", - " ['mistral-medium-2312', 'mistral', 108],\n", - " ['mistral-medium-latest', 'mistral', 109],\n", - " ['mistral-small', 'mistral', 110],\n", - " ['mistral-small-2312', 'mistral', 111],\n", - " ['mistral-small-2402', 'mistral', 112],\n", - " ['mistral-small-latest', 'mistral', 113],\n", - " ['mistral-tiny', 'mistral', 114],\n", - " ['mistral-tiny-2312', 'mistral', 115],\n", - " ['mistral-tiny-2407', 'mistral', 116],\n", - " ['mistral-tiny-latest', 'mistral', 117],\n", - " ['mistral.mistral-7b-instruct-v0:2', 'bedrock', 118],\n", - " ['mistral.mistral-large-2402-v1:0', 'bedrock', 119],\n", - " ['mistral.mistral-large-2407-v1:0', 'bedrock', 120],\n", - " ['mistral.mixtral-8x7b-instruct-v0:1', 'bedrock', 121],\n", - " ['mistralai/Mistral-7B-Instruct-v0.1', 'deep_infra', 122],\n", - " ['mistralai/Mistral-7B-Instruct-v0.2', 'deep_infra', 123],\n", - " ['mistralai/Mistral-7B-Instruct-v0.3', 'deep_infra', 124],\n", - " ['mistralai/Mistral-Nemo-Instruct-2407', 'deep_infra', 125],\n", - " ['mistralai/Mixtral-8x22B-Instruct-v0.1', 'deep_infra', 126],\n", - " ['mistralai/Mixtral-8x22B-v0.1', 'deep_infra', 127],\n", - " ['mistralai/Mixtral-8x7B-Instruct-v0.1', 'deep_infra', 128],\n", - " ['mixtral-8x7b-32768', 'groq', 129],\n", - " ['nvidia/Nemotron-4-340B-Instruct', 'deep_infra', 130],\n", - " ['open-codestral-mamba', 'mistral', 131],\n", - " ['open-mistral-7b', 'mistral', 132],\n", - " ['open-mistral-nemo', 'mistral', 133],\n", - " ['open-mistral-nemo-2407', 'mistral', 134],\n", - " ['open-mixtral-8x22b', 'mistral', 135],\n", - " ['open-mixtral-8x22b-2404', 'mistral', 136],\n", - " ['open-mixtral-8x7b', 'mistral', 137],\n", - " ['openbmb/MiniCPM-Llama3-V-2_5', 'deep_infra', 138],\n", - " ['openchat/openchat-3.6-8b', 'deep_infra', 139],\n", - " ['openchat/openchat_3.5', 'deep_infra', 140],\n", - " ['test', 'test', 141]]" - ] - }, - "execution_count": 3, - 
"metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "from edsl import Model\n", "\n", - "Model.available()" + "# Model.available() # uncomment this line and run it" + ] + }, + { + "cell_type": "markdown", + "id": "391a62e9-ce89-40f3-b43a-bea3d7b8782c", + "metadata": {}, + "source": [ + "To confirm the current default model:" ] }, { "cell_type": "code", "execution_count": 4, + "id": "847fd577-078a-4502-8112-97ee3699cd11", + "metadata": {}, + "outputs": [], + "source": [ + "# Model() # uncomment this line and run it" + ] + }, + { + "cell_type": "markdown", + "id": "4eecad61-9e6d-4b7e-9a70-0bf5546e2f49", + "metadata": {}, + "source": [ + "#### Example survey" + ] + }, + { + "cell_type": "code", + "execution_count": 5, "id": "17cc2398-55be-4865-88f0-e66104c115a2", "metadata": { "editable": true, @@ -482,17 +344,20 @@ "results = survey.by(scenarios).by(agents).by(models).run()\n", "\n", "# Filter, sort, select and print components of the results to inspect\n", - "(results\n", - ".filter(\"activity == 'reading' and persona == 'chef'\")\n", - ".sort_by(\"model\")\n", - ".select(\"model\", \"activity\", \"persona\", \"answer.*\")\n", - ".print(format=\"rich\",\n", - " pretty_labels = ({\"model.model\":\"Model\",\n", - " \"scenario.activity\":\"Activity\",\n", - " \"agent.persona\":\"Agent persona\",\n", - " \"answer.enjoy\":\"Enjoy\",\n", - " \"answer.recent\":\"Recent\"})\n", - " )\n", + "(\n", + " results\n", + " .filter(\"activity == 'reading' and persona == 'chef'\")\n", + " .sort_by(\"model\")\n", + " .select(\"model\", \"activity\", \"persona\", \"answer.*\")\n", + " .print(format=\"rich\",\n", + " pretty_labels = ({\n", + " \"model.model\":\"Model\",\n", + " \"scenario.activity\":\"Activity\",\n", + " \"agent.persona\":\"Agent persona\",\n", + " \"answer.enjoy\":\"Enjoy\",\n", + " \"answer.recent\":\"Recent\"\n", + " })\n", + " )\n", ")" ] }, @@ -514,7 +379,7 @@ }, { "cell_type": "code", - "execution_count": 5, + 
"execution_count": 6, "id": "1ab2cc32-015c-49bc-8e53-cc1c70f6d783", "metadata": { "editable": true, @@ -743,17 +608,18 @@ "17 Sure! The most recent time I was reading was j... 4 " ] }, - "execution_count": 5, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert the Results object to a pandas dataframe\n", - "(results\n", - " .sort_by(\"model\", \"activity\", \"persona\")\n", - " .select(\"model\", \"activity\", \"persona\", \"recent\", \"enjoy\")\n", - " .to_pandas(remove_prefix=True)\n", + "(\n", + " results\n", + " .sort_by(\"model\", \"activity\", \"persona\")\n", + " .select(\"model\", \"activity\", \"persona\", \"recent\", \"enjoy\")\n", + " .to_pandas(remove_prefix=True)\n", ")" ] }, @@ -773,7 +639,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "id": "7c3f63d0-bc79-4caf-991e-69b92ff29b69", "metadata": { "editable": true, @@ -823,7 +689,7 @@ " 'scenario.activity']" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -848,7 +714,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "id": "8bdca6c4-0ef6-4daa-ae4f-8b9bdd4a9043", "metadata": { "editable": true, @@ -1077,7 +943,7 @@ "17 Sure! The most recent time I was reading was j... 
4 " ] }, - "execution_count": 7, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -1110,7 +976,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "id": "a6f9233b-5ddc-4850-8ec9-6dd2d6647ecc", "metadata": { "editable": true, @@ -1127,13 +993,13 @@ "text/plain": [ "{'description': None,\n", " 'object_type': 'results',\n", - " 'url': 'https://www.expectedparrot.com/content/05dd1e85-3633-4bba-a964-a2e3fe79cf49',\n", - " 'uuid': '05dd1e85-3633-4bba-a964-a2e3fe79cf49',\n", + " 'url': 'https://www.expectedparrot.com/content/f674ba78-17d5-4628-9b57-ec7c5a96718c',\n", + " 'uuid': 'f674ba78-17d5-4628-9b57-ec7c5a96718c',\n", " 'version': '0.1.33.dev1',\n", " 'visibility': 'public'}" ] }, - "execution_count": 8, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -1152,8 +1018,8 @@ }, { "cell_type": "code", - "execution_count": 9, - "id": "e650fd0b-a0e1-4ddb-8eef-e012737af02a", + "execution_count": 10, + "id": "257c7a6e-a7e8-4b15-9936-afa18c623b21", "metadata": { "editable": true, "slideshow": { @@ -1169,25 +1035,23 @@ "text/plain": [ "{'description': 'Starter Tutorial',\n", " 'object_type': 'notebook',\n", - " 'url': 'https://www.expectedparrot.com/content/41918601-7865-49bf-9cfe-3f48e1f4b1f4',\n", - " 'uuid': '41918601-7865-49bf-9cfe-3f48e1f4b1f4',\n", + " 'url': 'https://www.expectedparrot.com/content/d11a525e-d454-4eb1-bd96-0ab9d771249e',\n", + " 'uuid': 'd11a525e-d454-4eb1-bd96-0ab9d771249e',\n", " 'version': '0.1.33.dev1',\n", " 'visibility': 'public'}" ] }, - "execution_count": 9, + "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "from edsl import Coop, Notebook\n", - "\n", - "coop = Coop()\n", + "from edsl import Notebook\n", "\n", "notebook = Notebook(path=\"starter_tutorial.ipynb\")\n", "\n", - "coop.create(notebook, description=\"Starter Tutorial\", visibility=\"public\")" + "notebook.push(description=\"Starter Tutorial\", 
visibility=\"public\")" ] } ], diff --git a/docs/notebooks/summarizing_transcripts.ipynb b/docs/notebooks/summarizing_transcripts.ipynb index e9605268..2f9b0726 100644 --- a/docs/notebooks/summarizing_transcripts.ipynb +++ b/docs/notebooks/summarizing_transcripts.ipynb @@ -290,8 +290,7 @@ }, "source": [ "## Selecting a language model\n", - "We can select one or more specific models to generate the responses.\n", - "(If no model is specified, GPT 4 preview is used by default).\n", + "We can select one or more specific models to generate the responses (if no model is specified the default model is used).\n", "\n", "To see a list of all available models:" ] @@ -314,6 +313,36 @@ "# Model.available()" ] }, + { + "cell_type": "markdown", + "id": "0cffdca2-24ae-436e-9d7b-775b72913128", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "To check the current default model:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fe48f229-047f-4f71-8511-7021e9c799a6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Model()" + ] + }, { "cell_type": "markdown", "id": "73aa528d-6102-40e5-8e7d-bf9d2981e471", @@ -330,7 +359,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "id": "6168aee3-3c21-4720-bd45-01e735acc591", "metadata": { "editable": true, @@ -361,7 +390,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 10, "id": "16d7e03b-ea6b-42ec-aceb-295901b3ae21", "metadata": { "editable": true, @@ -393,7 +422,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 11, "id": "d64e159e-2ab1-40fb-9573-00151330f3a0", "metadata": { "editable": true, @@ -568,7 +597,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 12, "id": "4a2f43db-7128-4918-97c1-d55c8b7e7f53", "metadata": { "editable": true, @@ -584,7 +613,7 @@ "22" ] }, - 
"execution_count": 11, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -596,7 +625,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 13, "id": "58057d4b-2c12-4de6-a56e-a9f308ed33f5", "metadata": { "editable": true, @@ -612,7 +641,7 @@ "24" ] }, - "execution_count": 12, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -624,7 +653,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 14, "id": "20179f02-a7b1-4388-8c2f-632dc523d63f", "metadata": { "editable": true, @@ -656,7 +685,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 15, "id": "e080107d-40a5-4439-90ce-a67c1aaa9c44", "metadata": { "editable": true, @@ -673,14 +702,17 @@ "┃ scenario answer ┃\n", "┃ .topic .condense ┃\n", "┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", - "│ caller questions ['Technical issues with features', 'Account access and password issues', 'Subscription and │\n", - "│ upgrade options', 'Project export problems', 'Collaboration and team management', 'Cost │\n", - "│ estimation and calibration', 'Guides and instructional resources', 'General account │\n", - "│ assistance', 'Feature suggestions and feedback', 'Trial period inquiries'] │\n", + "│ caller questions ['Technical issues with software features', 'Account access and password issues', │\n", + "│ 'Subscription and upgrade options', 'Project exporting and file issues', 'Team collaboration │\n", + "│ and project sharing', 'Cost estimation and calculation concerns', 'User guides and │\n", + "│ instructional materials', 'Support for adjusting settings', 'Feature suggestions and │\n", + "│ feedback', 'Trial period and general inquiries'] │\n", "├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────┤\n", - "│ caller requests ['Technical Support', 'Account Access', 
'Subscription and Trial Information', 'Software │\n", - "│ Updates', 'Feature Tutorials', 'Project Management Tools', 'User Collaboration', 'Export and │\n", - "│ Synchronization Issues', 'Follow-Up Information', 'Cost Estimation'] │\n", + "│ caller requests ['Technical support and troubleshooting', 'Account access and password issues', 'Software │\n", + "│ updates and upgrades', 'Subscription and trial information', 'Feature usage tutorials and │\n", + "│ instructions', 'Project management tools and features', 'Team collaboration and user │\n", + "│ management', 'Exporting and synchronization issues', 'Cost estimation and budgeting tools', │\n", + "│ 'Follow-up and contact information'] │\n", "└──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────┘\n", "
\n" ], @@ -689,14 +721,17 @@ "┃\u001b[1;35m \u001b[0m\u001b[1;35mscenario \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", "┃\u001b[1;35m \u001b[0m\u001b[1;35m.topic \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.condense \u001b[0m\u001b[1;35m \u001b[0m┃\n", "┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", - "│\u001b[2m \u001b[0m\u001b[2mcaller questions\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical issues with features', 'Account access and password issues', 'Subscription and \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mupgrade options', 'Project export problems', 'Collaboration and team management', 'Cost \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mestimation and calibration', 'Guides and instructional resources', 'General account \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2massistance', 'Feature suggestions and feedback', 'Trial period inquiries'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2mcaller questions\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical issues with software features', 'Account access and password issues', \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m'Subscription and upgrade options', 'Project exporting and file issues', 'Team collaboration\u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mand project sharing', 'Cost estimation and calculation concerns', 'User guides and \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2minstructional materials', 'Support for adjusting settings', 'Feature suggestions and \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mfeedback', 
'Trial period and general inquiries'] \u001b[0m\u001b[2m \u001b[0m│\n", "├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2mcaller requests \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical Support', 'Account Access', 'Subscription and Trial Information', 'Software \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mUpdates', 'Feature Tutorials', 'Project Management Tools', 'User Collaboration', 'Export and\u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mSynchronization Issues', 'Follow-Up Information', 'Cost Estimation'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2mcaller requests \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical support and troubleshooting', 'Account access and password issues', 'Software \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mupdates and upgrades', 'Subscription and trial information', 'Feature usage tutorials and \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2minstructions', 'Project management tools and features', 'Team collaboration and user \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mmanagement', 'Exporting and synchronization issues', 'Cost estimation and budgeting tools', \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m'Follow-up and contact information'] \u001b[0m\u001b[2m \u001b[0m│\n", "└──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────┘\n" ] }, @@ -724,7 +759,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 16, "id": "dbd4702c-e983-449d-a135-de9434798fda", "metadata": { "editable": true, @@ -737,19 +772,19 @@ { "data": { "text/plain": [ - "['Technical 
issues with features',\n", + "['Technical issues with software features',\n", " 'Account access and password issues',\n", " 'Subscription and upgrade options',\n", - " 'Project export problems',\n", - " 'Collaboration and team management',\n", - " 'Cost estimation and calibration',\n", - " 'Guides and instructional resources',\n", - " 'General account assistance',\n", + " 'Project exporting and file issues',\n", + " 'Team collaboration and project sharing',\n", + " 'Cost estimation and calculation concerns',\n", + " 'User guides and instructional materials',\n", + " 'Support for adjusting settings',\n", " 'Feature suggestions and feedback',\n", - " 'Trial period inquiries']" + " 'Trial period and general inquiries']" ] }, - "execution_count": 15, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -763,7 +798,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 17, "id": "702fe3f9-bedc-4f46-a3bc-b83147e6bfc0", "metadata": { "editable": true, @@ -776,19 +811,19 @@ { "data": { "text/plain": [ - "['Technical Support',\n", - " 'Account Access',\n", - " 'Subscription and Trial Information',\n", - " 'Software Updates',\n", - " 'Feature Tutorials',\n", - " 'Project Management Tools',\n", - " 'User Collaboration',\n", - " 'Export and Synchronization Issues',\n", - " 'Follow-Up Information',\n", - " 'Cost Estimation']" + "['Technical support and troubleshooting',\n", + " 'Account access and password issues',\n", + " 'Software updates and upgrades',\n", + " 'Subscription and trial information',\n", + " 'Feature usage tutorials and instructions',\n", + " 'Project management tools and features',\n", + " 'Team collaboration and user management',\n", + " 'Exporting and synchronization issues',\n", + " 'Cost estimation and budgeting tools',\n", + " 'Follow-up and contact information']" ] }, - "execution_count": 16, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -802,7 +837,7 @@ }, { "cell_type": 
"code", - "execution_count": 17, + "execution_count": 18, "id": "83f61405-e3c5-430f-b170-72b63dbb3974", "metadata": { "editable": true, @@ -852,7 +887,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 19, "id": "940495f8-bacf-4bde-9c1a-4593a3781679", "metadata": { "editable": true, @@ -876,7 +911,7 @@ "Scenario({'name': 'Emily Davis', 'email': 'emily.davis@example.com', 'transcript': '\"Agent: Good morning, thank you for calling Renovation Software Solutions. How can I assist you today? Customer: Hi, I\\'m having trouble with the 3D rendering feature. It seems to crash every time I try to add a new room. Agent: I\\'m sorry to hear that. Let me check if there are any known issues with the 3D rendering feature. Can you tell me which version of the software you\\'re using? Customer: I\\'m using version 5.3.2 on a Windows 10 PC. Agent: Thank you. There was a recent update that might resolve this issue. Please make sure your software is updated to the latest version. If the problem persists, we can arrange a remote support session to troubleshoot further. Could I have your name and email address to send you further instructions? Customer: Sure, it\\'s Emily Davis, emily.davis@example.com. Agent: Great, I\\'ll send the instructions to your email. I\\'ll update the software and try again. If it still crashes, I\\'ll call back. Thanks for your help. Agent: You\\'re welcome. Have a great day! 
[Caller sounded frustrated]\",'})" ] }, - "execution_count": 18, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -890,7 +925,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 20, "id": "3ba3b401-31b8-42f6-a2af-5e252105dde8", "metadata": { "editable": true, @@ -906,7 +941,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 21, "id": "e32389c3-7d31-4f95-b566-dd0a95bf617a", "metadata": { "editable": true, @@ -958,7 +993,7 @@ " 'scenario.transcript']" ] }, - "execution_count": 20, + "execution_count": 21, "metadata": {}, "output_type": "execute_result" } @@ -969,7 +1004,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 22, "id": "11b32d96-6911-4190-945f-cc254c395c23", "metadata": { "editable": true, @@ -986,29 +1021,37 @@ "┃ answer answer ┃\n", "┃ .questions_agg .requests_agg ┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", - "│ ['Technical issues with features'] ['Technical Support', 'Software Updates', 'Follow-Up │\n", - "│ Information'] │\n", + "│ ['Technical issues with software features'] ['Technical support and troubleshooting', 'Exporting │\n", + "│ and synchronization issues'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Guides and instructional resources'] ['Feature Tutorials'] │\n", + "│ ['Technical issues with software features'] ['Technical support and troubleshooting', 'Software │\n", + "│ updates and upgrades', 'Follow-up and contact │\n", + "│ information'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Account access and password issues'] ['Technical Support', 'Account Access'] │\n", + "│ ['User guides and instructional materials'] ['Feature usage tutorials and instructions', │\n", + "│ 'Follow-up 
and contact information'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Subscription and upgrade options'] ['Subscription and Trial Information'] │\n", + "│ ['Subscription and upgrade options'] ['Subscription and trial information', 'Follow-up and │\n", + "│ contact information'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Technical issues with features', 'Project export ['Technical Support', 'Export and Synchronization │\n", - "│ problems'] Issues', 'Follow-Up Information'] │\n", + "│ ['Technical issues with software features', 'Cost ['Technical support and troubleshooting', 'Cost │\n", + "│ estimation and calculation concerns', 'Support for estimation and budgeting tools', 'Follow-up and │\n", + "│ adjusting settings'] contact information'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Collaboration and team management'] ['Feature Tutorials', 'User Collaboration'] │\n", + "│ ['Account access and password issues'] ['Account access and password issues'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Technical issues with features', 'Cost estimation ['Technical Support', 'Cost Estimation'] │\n", - "│ and calibration'] │\n", + "│ ['Feature suggestions and feedback'] ['Project management tools and features', 'Follow-up │\n", + "│ and contact information'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Technical issues with features', 'General account ['Technical Support', 'Export and Synchronization │\n", - "│ assistance'] Issues'] │\n", + "│ ['Team collaboration and project sharing'] ['Feature usage tutorials and instructions', 'Team 
│\n", + "│ collaboration and user management'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Feature suggestions and feedback'] ['Project Management Tools', 'Something else'] │\n", + "│ ['Trial period and general inquiries'] ['Subscription and trial information', 'Follow-up and │\n", + "│ contact information'] │\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│ ['Trial period inquiries'] ['Subscription and Trial Information'] │\n", + "│ ['Technical issues with software features', 'Project ['Technical support and troubleshooting', 'Software │\n", + "│ exporting and file issues'] updates and upgrades', 'Exporting and synchronization │\n", + "│ issues', 'Follow-up and contact information'] │\n", "└────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────┘\n", "
\n" ], @@ -1017,29 +1060,37 @@ "┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35manswer \u001b[0m\u001b[1;35m \u001b[0m┃\n", "┃\u001b[1;35m \u001b[0m\u001b[1;35m.questions_agg \u001b[0m\u001b[1;35m \u001b[0m┃\u001b[1;35m \u001b[0m\u001b[1;35m.requests_agg \u001b[0m\u001b[1;35m \u001b[0m┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", - "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with features'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical Support', 'Software Updates', 'Follow-Up \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mInformation'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with software features'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical support and troubleshooting', 'Exporting \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mand synchronization issues'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Guides and instructional resources'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Feature Tutorials'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with software features'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical support and troubleshooting', 'Software \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mupdates and upgrades', 'Follow-up and contact \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2minformation'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - 
"│\u001b[2m \u001b[0m\u001b[2m['Account access and password issues'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical Support', 'Account Access'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['User guides and instructional materials'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Feature usage tutorials and instructions', \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m'Follow-up and contact information'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Subscription and upgrade options'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Subscription and Trial Information'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Subscription and upgrade options'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Subscription and trial information', 'Follow-up and \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcontact information'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with features', 'Project export \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical Support', 'Export and Synchronization \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m\u001b[2mproblems'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mIssues', 'Follow-Up Information'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with software features', 'Cost \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical support and troubleshooting', 'Cost \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2mestimation and calculation concerns', 'Support for 
\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mestimation and budgeting tools', 'Follow-up and \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2madjusting settings'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcontact information'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Collaboration and team management'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Feature Tutorials', 'User Collaboration'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Account access and password issues'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Account access and password issues'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with features', 'Cost estimation \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical Support', 'Cost Estimation'] \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m\u001b[2mand calibration'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Feature suggestions and feedback'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Project management tools and features', 'Follow-up \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mand contact information'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with features', 'General account \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical Support', 'Export and Synchronization \u001b[0m\u001b[2m \u001b[0m│\n", - "│\u001b[2m \u001b[0m\u001b[2massistance'] 
\u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mIssues'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Team collaboration and project sharing'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Feature usage tutorials and instructions', 'Team \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcollaboration and user management'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Feature suggestions and feedback'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Project Management Tools', 'Something else'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Trial period and general inquiries'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Subscription and trial information', 'Follow-up and \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mcontact information'] \u001b[0m\u001b[2m \u001b[0m│\n", "├────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┤\n", - "│\u001b[2m \u001b[0m\u001b[2m['Trial period inquiries'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Subscription and Trial Information'] \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2m['Technical issues with software features', 'Project \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2m['Technical support and troubleshooting', 'Software \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m\u001b[2mexporting and file issues'] \u001b[0m\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2mupdates and upgrades', 'Exporting and synchronization \u001b[0m\u001b[2m \u001b[0m│\n", + "│\u001b[2m \u001b[0m│\u001b[2m \u001b[0m\u001b[2missues', 'Follow-up and contact information'] \u001b[0m\u001b[2m \u001b[0m│\n", 
"└────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────┘\n" ] }, @@ -1068,7 +1119,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 23, "id": "5426e180-d5d0-4d51-a8b8-e2691948147f", "metadata": { "editable": true, @@ -1085,7 +1136,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 24, "id": "35162f8b-d28b-4974-a172-1fedf5416a2f", "metadata": { "editable": true, @@ -1101,7 +1152,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 25, "id": "647ec30b-f5ce-4ecd-96a7-bdcabd3163bc", "metadata": { "editable": true, @@ -1130,7 +1181,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 26, "id": "e9fd5f3c-bae6-4b9c-a8ec-fe77122b48d4", "metadata": { "editable": true, @@ -1142,7 +1193,7 @@ "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -1173,7 +1224,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 27, "id": "ab83ed70-1e7b-491c-8351-3ff1e4a1e42c", "metadata": { "editable": true, @@ -1188,31 +1239,30 @@ "output_type": "stream", "text": [ "Questions Count Table:\n", - " Question Count\n", - "0 Technical issues with features 4\n", - "1 Guides and instructional resources 1\n", - "2 Account access and password issues 1\n", - "3 Subscription and upgrade options 1\n", - "4 Project export problems 1\n", - "5 Collaboration and team management 1\n", - "6 Cost estimation and calibration 1\n", - "7 General account assistance 1\n", - "8 Feature suggestions and feedback 1\n", - "9 Trial period inquiries 1\n", + " Question Count\n", + "0 Technical issues with software features 4\n", + "1 User guides and instructional materials 1\n", + "2 Subscription and upgrade options 1\n", + "3 Cost estimation and calculation concerns 1\n", + "4 Support for adjusting settings 1\n", + "5 Account access and password issues 1\n", + "6 Feature suggestions and feedback 1\n", + "7 Team collaboration and project sharing 1\n", + "8 Trial period and general inquiries 1\n", + "9 Project exporting and file issues 1\n", "\n", "Requests Count Table:\n", - " Request Count\n", - "0 Technical Support 5\n", - "1 Follow-Up Information 2\n", - "2 Feature Tutorials 2\n", - "3 Subscription and Trial Information 2\n", - "4 Export and Synchronization Issues 2\n", - "5 Software Updates 1\n", - "6 Account Access 1\n", - "7 User Collaboration 1\n", - "8 Cost Estimation 1\n", - "9 Project Management Tools 1\n", - "10 Something else 1\n" + " Request Count\n", + "0 Follow-up and contact information 7\n", + "1 Technical support and troubleshooting 4\n", + "2 Exporting and synchronization issues 2\n", + "3 Software updates and upgrades 2\n", + "4 Feature usage tutorials and instructions 2\n", + "5 Subscription and trial information 2\n", + "6 Cost estimation and budgeting tools 1\n", + "7 Account access and password 
issues 1\n", + "8 Project management tools and features 1\n", + "9 Team collaboration and user management 1\n" ] } ], @@ -1242,7 +1292,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 28, "id": "ca2530c9-6f83-457b-8db0-ab5227a9730d", "metadata": { "editable": true, @@ -1286,7 +1336,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 29, "id": "0fb1be94-0f4b-41ee-aa1d-3c9aa0a01590", "metadata": { "editable": true, @@ -1304,8 +1354,8 @@ }, { "cell_type": "code", - "execution_count": 29, - "id": "4944cec2-90ff-478f-9c4e-c77db2bec4f4", + "execution_count": 30, + "id": "67aff71f-4edd-4a44-a4aa-92b4a3fca7b1", "metadata": { "editable": true, "slideshow": { @@ -1322,8 +1372,8 @@ }, { "cell_type": "code", - "execution_count": 30, - "id": "afd1dce7-c058-422b-9880-abea0b755648", + "execution_count": 31, + "id": "4576d646-6ffc-49f5-9678-4df0a3007acc", "metadata": { "editable": true, "slideshow": { @@ -1339,13 +1389,13 @@ "text/plain": [ "{'description': 'Example code for summarizing transcripts',\n", " 'object_type': 'notebook',\n", - " 'url': 'https://www.expectedparrot.com/content/a5cd8b20-b4d6-4856-95a9-90076ec36682',\n", - " 'uuid': 'a5cd8b20-b4d6-4856-95a9-90076ec36682',\n", + " 'url': 'https://www.expectedparrot.com/content/0e6c9fa1-402e-41e7-8730-e35d45284383',\n", + " 'uuid': '0e6c9fa1-402e-41e7-8730-e35d45284383',\n", " 'version': '0.1.33.dev1',\n", " 'visibility': 'public'}" ] }, - "execution_count": 30, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } @@ -1370,8 +1420,8 @@ }, { "cell_type": "code", - "execution_count": 31, - "id": "ed6d3cec-5f50-4f6c-a53c-76a912323396", + "execution_count": 32, + "id": "2b84bbe9-7b7f-44b2-82bc-fcfee18d4c7f", "metadata": { "editable": true, "slideshow": { @@ -1388,8 +1438,8 @@ }, { "cell_type": "code", - "execution_count": 32, - "id": "f6745835-c228-432d-8f06-2b8d3e4b3c25", + "execution_count": 33, + "id": "9bc2562a-b1be-445c-90ab-d1af67c7723e", 
"metadata": { "editable": true, "slideshow": { @@ -1406,13 +1456,13 @@ "{'status': 'success'}" ] }, - "execution_count": 32, + "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "n.patch(uuid = \"a5cd8b20-b4d6-4856-95a9-90076ec36682\", value = n)" + "n.patch(uuid = \"0e6c9fa1-402e-41e7-8730-e35d45284383\", value = n)" ] } ], diff --git a/docs/prompts.rst b/docs/prompts.rst index bc1df6ef..45e74829 100644 --- a/docs/prompts.rst +++ b/docs/prompts.rst @@ -5,242 +5,173 @@ Prompts Overview -------- + Prompts are texts that are sent to a language model in order to guide it on how to generate responses to questions. -They consist of `agent instructions` and `question instructions`, and can include questions, instructions or any other text to be displayed to the language model. +Agent instructions are contained in a `system_prompt` and question instructions are contained in a `user_prompt`. +These texts can include questions, instructions or any other text to be displayed to the language model. Typically, prompts are created using the `Prompt` class, a subclass of the `PromptBase` class which is an abstract class that defines the basic structure of a prompt. Default prompts are provided in the `edsl.prompts.library` module. These prompts can be used as is or customized to suit specific requirements by creating new classes that inherit from the `Prompt` class. - -Default prompts -^^^^^^^^^^^^^^^ -The `edsl.prompts.library` module contains default prompts for agent instructions and question instructions (shown below). -If custom prompts are not specified, the default prompts used to generate results can be readily inspected by selecting the **prompt** columns in the results. -For example, we can inspect the prompts for the sample results generated in the `edsl.results` section: - -.. 
code-block:: python - - results.select("prompt.*").print(pretty_labels={ - "prompt.tomorrow_user_prompt": "Tomorrow: question instruction", - "prompt.tomorrow_system_prompt": "Tomorrow: agent instruction", - "prompt.yesterday_user_prompt": "Yesterday: question instruction", - "prompt.yesterday_system_prompt": "Yesterday: agent instruction" - }) - -.. code-block:: text - - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ Yesterday: question ┃ Tomorrow: agent ┃ Yesterday: agent ┃ Tomorrow: question ┃ - ┃ instruction ┃ instruction ┃ instruction ┃ instruction ┃ - ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ - │ {'text': 'You are being │ {'text': "You are │ {'text': "You are │ {'text': 'You are being │ - │ asked the following │ answering questions as if │ answering questions as if │ asked the following │ - │ question: How did you feel │ you were a human. Do not │ you were a human. Do not │ question: How do you │ - │ yesterday morning?\nThe │ break character. You are │ break character. You are │ expect to feel tomorrow │ - │ options are\n\n0: │ an agent with the │ an agent with the │ morning?\nReturn a valid │ - │ Good\n\n1: OK\n\n2: │ following │ following │ JSON formatted like │ - │ Terrible\n\nReturn a valid │ persona:\n{'status': │ persona:\n{'status': │ this:\n{"answer": ""}', 'class_name': │ - │ of the option:\n{"answer": │ │ │ 'FreeText'} │ - │ , │ │ │ │ - │ "comment": ""}\nOnly │ │ │ │ - │ 1 option may be │ │ │ │ - │ selected.', 'class_name': │ │ │ │ - │ 'MultipleChoiceTurbo'} │ │ │ │ - ├────────────────────────────┼───────────────────────────┼────────────────────────────┼───────────────────────────┤ - - ... +Note: If an `Agent` is not used with a survey the `system_prompt` base text is not sent to the model. Showing prompts ^^^^^^^^^^^^^^^ -Before you run a survey, EDSL creates a `Jobs` object. 
You can see the prompts it will use by calling `prompts()` on the `Jobs` object. + +Before a survey is run, EDSL creates a `Jobs` object. +You can see the prompts it will use by calling `prompts()` on it. For example: .. code-block:: python - from edsl import Model, Survey - j = Survey.example().by(Model()) - j.prompts().print() - -This will display the prompts that will be used in the survey: + from edsl import Survey, Agent, Model -.. code-block:: text + survey = Survey.example() + agent = Agent(traits = {"persona": "School teacher"}) + model = Model() # default model - ┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ interview_index ┃ question_index ┃ user_prompt ┃ scenario_index ┃ system_prompt ┃ - ┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩ - │ 0 │ q0 │ You are being asked the │ Scenario Attributes │ You are answering │ - │ │ │ following question: Do │ ┏━━━━━━━━━━━━┳━━━━━━━┓ │ questions as if you were │ - │ │ │ you like school? │ ┃ Attribute ┃ Value ┃ │ a human. Do not break │ - │ │ │ The options are │ ┡━━━━━━━━━━━━╇━━━━━━━┩ │ character. You are an │ - │ │ │ │ │ data │ {} │ │ agent with the following │ - │ │ │ 0: yes │ │ name │ None │ │ persona: │ - │ │ │ │ │ _has_image │ False │ │ {} │ - │ │ │ 1: no │ └────────────┴───────┘ │ │ - │ │ │ │ │ │ - │ │ │ Return a valid JSON │ │ │ - │ │ │ formatted like this, │ │ │ - │ │ │ selecting only the │ │ │ - │ │ │ number of the option: │ │ │ - │ │ │ {"answer": , "comment": │ │ │ - │ │ │ ""} │ │ │ - │ │ │ Only 1 option may be │ │ │ - │ │ │ selected. │ │ │ - ├─────────────────┼────────────────┼──────────────────────────┼────────────────────────┼──────────────────────────┤ - │ 0 │ q1 │ You are being asked the │ Scenario Attributes │ You are answering │ - │ │ │ following question: Why │ ┏━━━━━━━━━━━━┳━━━━━━━┓ │ questions as if you were │ - │ │ │ not? │ ┃ Attribute ┃ Value ┃ │ a human. 
Do not break │ - │ │ │ The options are │ ┡━━━━━━━━━━━━╇━━━━━━━┩ │ character. You are an │ - │ │ │ │ │ data │ {} │ │ agent with the following │ - │ │ │ 0: killer bees in │ │ name │ None │ │ persona: │ - │ │ │ cafeteria │ │ _has_image │ False │ │ {} │ - │ │ │ │ └────────────┴───────┘ │ │ - │ │ │ 1: other │ │ │ - │ │ │ │ │ │ - │ │ │ Return a valid JSON │ │ │ - │ │ │ formatted like this, │ │ │ - │ │ │ selecting only the │ │ │ - │ │ │ number of the option: │ │ │ - │ │ │ {"answer": , "comment": │ │ │ - │ │ │ ""} │ │ │ - │ │ │ Only 1 option may be │ │ │ - │ │ │ selected. │ │ │ - ├─────────────────┼────────────────┼──────────────────────────┼────────────────────────┼──────────────────────────┤ - │ 0 │ q2 │ You are being asked the │ Scenario Attributes │ You are answering │ - │ │ │ following question: Why? │ ┏━━━━━━━━━━━━┳━━━━━━━┓ │ questions as if you were │ - │ │ │ The options are │ ┃ Attribute ┃ Value ┃ │ a human. Do not break │ - │ │ │ │ ┡━━━━━━━━━━━━╇━━━━━━━┩ │ character. You are an │ - │ │ │ 0: **lack*** of killer │ │ data │ {} │ │ agent with the following │ - │ │ │ bees in cafeteria │ │ name │ None │ │ persona: │ - │ │ │ │ │ _has_image │ False │ │ {} │ - │ │ │ 1: other │ └────────────┴───────┘ │ │ - │ │ │ │ │ │ - │ │ │ Return a valid JSON │ │ │ - │ │ │ formatted like this, │ │ │ - │ │ │ selecting only the │ │ │ - │ │ │ number of the option: │ │ │ - │ │ │ {"answer": , "comment": │ │ │ - │ │ │ ""} │ │ │ - │ │ │ Only 1 option may be │ │ │ - │ │ │ selected. │ │ │ - └─────────────────┴────────────────┴──────────────────────────┴────────────────────────┴──────────────────────────┘ - - -Agent instructions -^^^^^^^^^^^^^^^^^^ -The `AgentInstruction` class provides guidance to a language model on how an agent should be represented. -As shown in the example above, the default agent instructions are: + job = survey.by(agent).by(model) # Creating a job for the example survey using the agent and the default model -.. 
code-block:: python + job.prompts().print(format="rich") - class AgentInstruction(PromptBase): - \"\"\"Agent instructions for a human agent.\"\"\" - model = LanguageModelType.GPT_3_5_Turbo.value - component_type = ComponentTypes.AGENT_INSTRUCTIONS - default_instructions = textwrap.dedent( - \"\"\"\ - You are playing the role of a human answering survey questions. - Do not break character. - \"\"\" - ) +This will display the prompts that will be used when the survey is run: +.. code-block:: text -Question instructions -^^^^^^^^^^^^^^^^^^^^^ -The `QuestionInstruction` class provides guidance to a language model on how a question should be answered. -As shown in the example above, the following question instructions are: + ┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + ┃ interview_index ┃ question_index ┃ user_prompt ┃ scenario_index ┃ system_prompt ┃ + ┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ + │ 0 │ q0 │ │ Scenario Attributes │ You are answering │ + │ │ │ Do you like school? │ ┏━━━━━━━━━━━┳━━━━━━━┓ │ questions as if you were │ + │ │ │ │ ┃ Attribute ┃ Value ┃ │ a human. Do not break │ + │ │ │ │ ┡━━━━━━━━━━━╇━━━━━━━┩ │ character. You are an │ + │ │ │ yes │ │ data │ {} │ │ agent with the following │ + │ │ │ │ │ name │ None │ │ persona: │ + │ │ │ no │ └───────────┴───────┘ │ {'persona': 'School │ + │ │ │ │ │ teacher'} │ + │ │ │ │ │ │ + │ │ │ Only 1 option may be │ │ │ + │ │ │ selected. │ │ │ + │ │ │ │ │ │ + │ │ │ Respond only with a │ │ │ + │ │ │ string corresponding to │ │ │ + │ │ │ one of the options. │ │ │ + │ │ │ │ │ │ + │ │ │ │ │ │ + │ │ │ After the answer, you │ │ │ + │ │ │ can put a comment │ │ │ + │ │ │ explaining why you chose │ │ │ + │ │ │ that option on the next │ │ │ + │ │ │ line. 
│ │ │ + ├─────────────────┼────────────────┼──────────────────────────┼───────────────────────┼───────────────────────────┤ + │ 0 │ q1 │ │ Scenario Attributes │ You are answering │ + │ │ │ Why not? │ ┏━━━━━━━━━━━┳━━━━━━━┓ │ questions as if you were │ + │ │ │ │ ┃ Attribute ┃ Value ┃ │ a human. Do not break │ + │ │ │ │ ┡━━━━━━━━━━━╇━━━━━━━┩ │ character. You are an │ + │ │ │ killer bees in cafeteria │ │ data │ {} │ │ agent with the following │ + │ │ │ │ │ name │ None │ │ persona: │ + │ │ │ other │ └───────────┴───────┘ │ {'persona': 'School │ + │ │ │ │ │ teacher'} │ + │ │ │ │ │ │ + │ │ │ Only 1 option may be │ │ │ + │ │ │ selected. │ │ │ + │ │ │ │ │ │ + │ │ │ Respond only with a │ │ │ + │ │ │ string corresponding to │ │ │ + │ │ │ one of the options. │ │ │ + │ │ │ │ │ │ + │ │ │ │ │ │ + │ │ │ After the answer, you │ │ │ + │ │ │ can put a comment │ │ │ + │ │ │ explaining why you chose │ │ │ + │ │ │ that option on the next │ │ │ + │ │ │ line. │ │ │ + ├─────────────────┼────────────────┼──────────────────────────┼───────────────────────┼───────────────────────────┤ + │ 0 │ q2 │ │ Scenario Attributes │ You are answering │ + │ │ │ Why? │ ┏━━━━━━━━━━━┳━━━━━━━┓ │ questions as if you were │ + │ │ │ │ ┃ Attribute ┃ Value ┃ │ a human. Do not break │ + │ │ │ │ ┡━━━━━━━━━━━╇━━━━━━━┩ │ character. You are an │ + │ │ │ **lack*** of killer bees │ │ data │ {} │ │ agent with the following │ + │ │ │ in cafeteria │ │ name │ None │ │ persona: │ + │ │ │ │ └───────────┴───────┘ │ {'persona': 'School │ + │ │ │ other │ │ teacher'} │ + │ │ │ │ │ │ + │ │ │ │ │ │ + │ │ │ Only 1 option may be │ │ │ + │ │ │ selected. │ │ │ + │ │ │ │ │ │ + │ │ │ Respond only with a │ │ │ + │ │ │ string corresponding to │ │ │ + │ │ │ one of the options. │ │ │ + │ │ │ │ │ │ + │ │ │ │ │ │ + │ │ │ After the answer, you │ │ │ + │ │ │ can put a comment │ │ │ + │ │ │ explaining why you chose │ │ │ + │ │ │ that option on the next │ │ │ + │ │ │ line. 
│ │ │ + └─────────────────┴────────────────┴──────────────────────────┴───────────────────────┴───────────────────────────┘ + + +After we run the survey, we can verify the prompts that were used by inspecting the `prompt.*` fields of the results: .. code-block:: python - class QuestionInstruction(PromptBase): - \"\"\"Question instructions for a multiple choice question.\"\"\" - - model = LanguageModelType.GPT_3_5_Turbo.value - component_type = ComponentTypes.QUESTION_INSTRUCTIONS - default_instructions = textwrap.dedent( - \"\"\"\ - You are answering a multiple choice question. - \"\"\" - ) - - -Customizing prompts -^^^^^^^^^^^^^^^^^^^ -We can customize prompts by creating new classes that inherit from the `Prompt` class. -For example, consider the following custom agent instructions: + results = job.run() # This is equivalent to: results = survey.by(agent).by(model).run() -.. code-block:: python + # To select all the `prompt` columns at once: + # results.select("prompt.*").print(format="rich") - applicable_prompts = get_classes( - component_type="agent_instructions", - model=self.model.model, + # Or to specify the order in the table we can name them individually: + ( + results.select( + "q0_system_prompt", "q0_user_prompt", + "q1_system_prompt", "q1_user_prompt", + "q2_system_prompt", "q2_user_prompt" + ) + .print(format="rich") ) +Output: -Prompt class ------------- - -.. automodule:: edsl.prompts.Prompt - :members: - :undoc-members: - :show-inheritance: - - -Agent Instructions ------------------- - -.. automodule:: edsl.prompts.library.agent_instructions - :members: - :undoc-members: - :show-inheritance: - -.. automodule:: edsl.prompts.library.agent_persona - :members: - :undoc-members: - :show-inheritance: - - -Question Instructions ---------------------- - -.. automodule:: edsl.prompts.library.question_multiple_choice - :members: - :undoc-members: - :show-inheritance: - -.. 
automodule:: edsl.prompts.library.question_numerical - :members: - :undoc-members: - :show-inheritance: - -.. automodule:: edsl.prompts.library.question_budget - :members: - :undoc-members: - :show-inheritance: - -.. automodule:: edsl.prompts.library.question_freetext - :members: - :undoc-members: - :show-inheritance: - +.. code-block:: text -QuestionInstructionBase class ------------------------------ + ┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ + ┃ prompt ┃ prompt ┃ prompt ┃ prompt ┃ prompt ┃ prompt ┃ + ┃ .q0_system_prom… ┃ .q0_user_prompt ┃ .q1_system_prom… ┃ .q1_user_prompt ┃ .q2_system_prom… ┃ .q2_user_prompt ┃ + ┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ + │ You are │ │ You are │ │ You are │ │ + │ answering │ Do you like │ answering │ Why not? │ answering │ Why? │ + │ questions as if │ school? │ questions as if │ │ questions as if │ │ + │ you were a │ │ you were a │ │ you were a │ │ + │ human. Do not │ │ human. Do not │ killer bees in │ human. Do not │ **lack*** of │ + │ break character. │ yes │ break character. │ cafeteria │ break character. │ killer bees in │ + │ You are an agent │ │ You are an agent │ │ You are an agent │ cafeteria │ + │ with the │ no │ with the │ other │ with the │ │ + │ following │ │ following │ │ following │ other │ + │ persona: │ │ persona: │ │ persona: │ │ + │ {'persona': │ Only 1 option │ {'persona': │ Only 1 option │ {'persona': │ │ + │ 'School │ may be selected. │ 'School │ may be selected. │ 'School │ Only 1 option │ + │ teacher'} │ │ teacher'} │ │ teacher'} │ may be selected. │ + │ │ Respond only │ │ Respond only │ │ │ + │ │ with a string │ │ with a string │ │ Respond only │ + │ │ corresponding to │ │ corresponding to │ │ with a string │ + │ │ one of the │ │ one of the │ │ corresponding to │ + │ │ options. │ │ options. │ │ one of the │ + │ │ │ │ │ │ options. 
│ + │ │ │ │ │ │ │ + │ │ After the │ │ After the │ │ │ + │ │ answer, you can │ │ answer, you can │ │ After the │ + │ │ put a comment │ │ put a comment │ │ answer, you can │ + │ │ explaining why │ │ explaining why │ │ put a comment │ + │ │ you chose that │ │ you chose that │ │ explaining why │ + │ │ option on the │ │ option on the │ │ you chose that │ + │ │ next line. │ │ next line. │ │ option on the │ + │ │ │ │ │ │ next line. │ + └──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┘ -.. automodule:: edsl.prompts.QuestionInstructionBase - :members: - :undoc-members: - :show-inheritance: \ No newline at end of file diff --git a/docs/questions.rst b/docs/questions.rst index aef0baf3..8db60735 100644 --- a/docs/questions.rst +++ b/docs/questions.rst @@ -211,18 +211,19 @@ We can combine multiple questions into a survey by passing them as a list to a ` .. code-block:: python - from edsl import QuestionLinearScale, QuestionFreeText, QuestionNumerical, Survey + from edsl import QuestionLinearScale, QuestionList, QuestionNumerical, Survey q1 = QuestionLinearScale( - question_name = "likely_to_vote", - question_text = "On a scale from 1 to 5, how likely are you to vote in the upcoming U.S. election?", + question_name = "dc_state", + question_text = "How likely is Washington, D.C. to become a U.S. state?", question_options = [1, 2, 3, 4, 5], option_labels = {1: "Not at all likely", 5: "Very likely"} ) - q2 = QuestionFreeText( - question_name = "largest_us_city", - question_text = "What is the largest U.S. city?" + q2 = QuestionList( + question_name = "largest_us_cities", + question_text = "What are the largest U.S. 
cities by population?",
+      max_list_items = 3
    )

    q3 = QuestionNumerical(
@@ -232,8 +233,6 @@ We can combine multiple questions into a survey by passing them as a list to a `

    survey = Survey(questions = [q1, q2, q3])

-   results = survey.run()
-

 This allows us to administer multiple questions at once, either asynchronously (by default) or according to specified logic (e.g., skip or stop rules).
 To learn more about designing surveys with conditional logic, please see the :ref:`surveys` section.
@@ -247,29 +246,37 @@ This is done by calling the `run` method for the question:

 .. code-block:: python

+   from edsl import QuestionCheckBox
+
+   q = QuestionCheckBox(
+      question_name = "primary_colors",
+      question_text = "Which of the following colors are primary?",
+      question_options = ["Red", "Orange", "Yellow", "Green", "Blue", "Purple"]
+   )
+
    results = q.run()


 This will generate a `Results` object that contains a single `Result` representing the response to the question and information about the model used.
-If the model to be used has not been specified (as in the above example), the `run` method delivers the question to the default LLM (GPT 4).
+If the model to be used has not been specified (as in the above example), the `run` method delivers the question to the default LLM (run `Model()` to check the current default LLM).
 We can inspect the response and model used by calling the `select` and `print` methods on the components of the results that we want to display.
 For example, we can print just the `answer` to the question:

 .. code-block:: python

-   results.select("answer.favorite_primary_color").print(format="rich")
+   results.select("primary_colors").print(format="rich")


 Output:

 ..
code-block:: text

-   ┏━━━━━━━━━━━━━━━━━━━━━━━━━┓
-   ┃ answer                  ┃
-   ┃ .favorite_primary_color ┃
-   ┡━━━━━━━━━━━━━━━━━━━━━━━━━┩
-   │ blue                    │
-   └─────────────────────────┘
+   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+   ┃ answer                    ┃
+   ┃ .primary_colors           ┃
+   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+   │ ['Red', 'Yellow', 'Blue'] │
+   └───────────────────────────┘


 Or to inspect the model:
@@ -283,18 +290,40 @@ Output:

 .. code-block:: text

-   ┏━━━━━━━━━━━━━━━━━━━━┓
-   ┃ model              ┃
-   ┃ .model             ┃
-   ┡━━━━━━━━━━━━━━━━━━━━┩
-   │ gpt-4-1106-preview │
-   └────────────────────┘
+   ┏━━━━━━━━┓
+   ┃ model  ┃
+   ┃ .model ┃
+   ┡━━━━━━━━┩
+   │ gpt-4o │
+   └────────┘


 If questions have been combined in a survey, the `run` method is called directly on the survey instead:

 .. code-block:: python

+   from edsl import QuestionLinearScale, QuestionList, QuestionNumerical, Survey
+
+   q1 = QuestionLinearScale(
+      question_name = "dc_state",
+      question_text = "How likely is Washington, D.C. to become a U.S. state?",
+      question_options = [1, 2, 3, 4, 5],
+      option_labels = {1: "Not at all likely", 5: "Very likely"}
+   )
+
+   q2 = QuestionList(
+      question_name = "largest_us_cities",
+      question_text = "What are the largest U.S. cities by population?",
+      max_list_items = 3
+   )
+
+   q3 = QuestionNumerical(
+      question_name = "us_pop",
+      question_text = "What was the U.S. population in 2020?"
+   )
+
+   survey = Survey(questions = [q1, q2, q3])
+
    results = survey.run()

    results.select("answer.*").print(format="rich")
@@ -304,12 +333,12 @@ Output:

 .. code-block:: text

-   ┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
-   ┃ answer          ┃ answer                                                ┃ answer    ┃
-   ┃ .likely_to_vote ┃ .largest_us_city                                      ┃ .us_pop   ┃
-   ┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
-   │ 4               │ The largest U.S. city by population is New York City.
│ 331449281 │
-   └─────────────────┴───────────────────────────────────────────────────────┴───────────┘
+   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
+   ┃ answer                                 ┃ answer    ┃ answer    ┃
+   ┃ .largest_us_cities                     ┃ .dc_state ┃ .us_pop   ┃
+   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
+   │ ['New York', 'Los Angeles', 'Chicago'] │ 2         │ 331449281 │
+   └────────────────────────────────────────┴───────────┴───────────┘


 For a survey, each `Result` represents a response for the set of survey questions.
@@ -474,7 +503,8 @@ To learn more about designing agents, please see the :ref:`agents` section.

 Specifying language models
 --------------------------
-In the above examples we did not specify a language model for the question or survey, so the default model (GPT 4) was used.
+
+In the above examples we did not specify a language model for the question or survey, so the default model was used (run `Model()` to check the current default model).
 Similar to the way that we optionally passed scenarios to a question and added AI agents, we can also use the `by` method to specify one or more LLMs to use in generating results.
 This is done by creating `Model` objects for desired models and optionally specifying model parameters, such as temperature.

@@ -486,6 +516,7 @@ To check available models:

    Model.available()

+
 This will return a list of names of models that we can choose from.
 We can also check the models for which we have already added API keys:

@@ -494,6 +525,7 @@ We can also check the models for which we have already added API keys:

    Model.check_models()

+
 See instructions on storing :ref:`api_keys` for the models that you want to use, or activating :ref:`remote_inference` to use the Expected Parrot server to access available models.
 To specify models for a survey we first create `Model` objects:
@@ -503,7 +535,7 @@ To specify models for a survey we first create `Model` objects:
    from edsl import ModelList, Model

    models = ModelList(
-      Model(m) for m in ['claude-3-opus-20240229', 'llama-2-70b-chat-hf']
+      Model(m) for m in ['gpt-4o', 'gemini-1.5-pro']
    )


@@ -573,7 +605,7 @@ An example can also created using the `example` method:
    :show-inheritance:
    :special-members: __init__
    :exclude-members: purpose, question_type, question_options, main
-   
+
 QuestionCheckBox class
 ^^^^^^^^^^^^^^^^^^^^^^

diff --git a/docs/scenarios.rst b/docs/scenarios.rst
index b755e03c..61f21f91 100644
--- a/docs/scenarios.rst
+++ b/docs/scenarios.rst
@@ -568,8 +568,8 @@ We can add the key to questions as we do scenarios from other data sources:

    from edsl import Model, QuestionFreeText, QuestionList, Survey

-   m = Model("gpt-4o") # This is the default model; we specify it for demonstration purposes to highlight that a vision model is needed
-   
+   m = Model("gpt-4o")
+
    q1 = QuestionFreeText(
       question_name = "identify",
       question_text = "What animal is in this picture: {{ logo }}" # The scenario key is the filepath
diff --git a/docs/token_usage.rst b/docs/token_usage.rst
index e0fefeb0..fe2b6325 100644
--- a/docs/token_usage.rst
+++ b/docs/token_usage.rst
@@ -157,24 +157,28 @@ For example:

    results = q.by(s).run()

-   results.select("number_1", "number_2", "sum").print(format="rich")
+
+We can check the responses and also confirm that the `comment` is `None`:
+
+   results.select("number_1", "number_2", "sum", "sum_comment").print(format="rich")


 Output:

 ..
code-block:: text

-   ┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┓
-   ┃ scenario  ┃ scenario  ┃ answer ┃
-   ┃ .number_1 ┃ .number_2 ┃ .sum   ┃
-   ┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━┩
-   │ 0         │ 5         │ 5      │
-   ├───────────┼───────────┼────────┤
-   │ 1         │ 4         │ 5      │
-   ├───────────┼───────────┼────────┤
-   │ 2         │ 3         │ 5      │
-   ├───────────┼───────────┼────────┤
-   │ 3         │ 2         │ 5      │
-   ├───────────┼───────────┼────────┤
-   │ 4         │ 1         │ 5      │
-   └───────────┴───────────┴────────┘
\ No newline at end of file
+   ┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┓
+   ┃ scenario  ┃ scenario  ┃ answer ┃ comment      ┃
+   ┃ .number_1 ┃ .number_2 ┃ .sum   ┃ .sum_comment ┃
+   ┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━┩
+   │ 0         │ 5         │ 5      │ None         │
+   ├───────────┼───────────┼────────┼──────────────┤
+   │ 1         │ 4         │ 5      │ None         │
+   ├───────────┼───────────┼────────┼──────────────┤
+   │ 2         │ 3         │ 5      │ None         │
+   ├───────────┼───────────┼────────┼──────────────┤
+   │ 3         │ 2         │ 5      │ None         │
+   ├───────────┼───────────┼────────┼──────────────┤
+   │ 4         │ 1         │ 5      │ None         │
+   └───────────┴───────────┴────────┴──────────────┘
+
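
Reviewer's sketch for the `docs/prompts.rst` hunk in this diff. The revised text says that agent instructions go in a `system_prompt`, question instructions go in a `user_prompt`, the system prompt base text is not sent when no `Agent` is used, and the results expose columns such as `prompt.q0_system_prompt`. The plain-Python model below only illustrates that layout. It is not EDSL's implementation: `system_prompt`, `user_prompt`, `prompt_columns`, and `select` are hypothetical helper names, the wildcard matching via `fnmatchcase` is an assumption, and the prompt wording is copied from the example tables with approximate spacing.

```python
from fnmatch import fnmatchcase

# Wording copied from the example output in the diff; spacing is approximate.
BASE_SYSTEM_TEXT = (
    "You are answering questions as if you were a human. Do not break character. "
    "You are an agent with the following persona:"
)


def system_prompt(traits=None):
    """Hypothetical helper: agent instructions, or "" when no agent is attached."""
    if traits is None:
        return ""
    return f"{BASE_SYSTEM_TEXT}\n{traits!r}"


def user_prompt(question_text, options):
    """Hypothetical helper: question instructions for a multiple choice question."""
    parts = [
        question_text,
        *options,
        "Only 1 option may be selected.",
        "Respond only with a string corresponding to one of the options.",
        "After the answer, you can put a comment explaining why you chose "
        "that option on the next line.",
    ]
    return "\n\n".join(parts)


def prompt_columns(questions, traits=None):
    """Build one system/user prompt pair per question, keyed like the results columns."""
    columns = {}
    for name, (text, options) in questions.items():
        columns[f"prompt.{name}_system_prompt"] = system_prompt(traits)
        columns[f"prompt.{name}_user_prompt"] = user_prompt(text, options)
    return columns


def select(columns, pattern):
    """Assumed wildcard selection over column names, in the spirit of select("prompt.*")."""
    return {key: value for key, value in columns.items() if fnmatchcase(key, pattern)}


questions = {
    "q0": ("Do you like school?", ["yes", "no"]),
    "q1": ("Why not?", ["killer bees in cafeteria", "other"]),
}
columns = prompt_columns(questions, traits={"persona": "School teacher"})

for name in select(columns, "prompt.*_system_prompt"):
    print(name)
```

Calling `prompt_columns(questions)` without `traits` leaves every system prompt empty in this model, which mirrors the note added in the hunk about surveys run without an `Agent`.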