EDSL comes with a variety of features for monitoring token usage.
These include:

* A method for setting the requests per minute (RPM) and tokens per minute (TPM) for a model that you are using.
* Methods for turning off default prompt features to reduce token usage.
* Features for calculating next token probabilities.


Token limits
------------
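
A model's default rate limits can be inspected directly on the model object.
The following is a minimal sketch; the `rpm` and `tpm` attribute names are assumptions about the EDSL `Model` API and should be treated as illustrative:

.. code-block:: python

    from edsl import Model

    # Inspect the requests-per-minute and tokens-per-minute limits
    # (attribute names are illustrative):
    m = Model("gpt-4o")

    [m.rpm, m.tpm]
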
This will show the following information:

.. code-block:: text

    [100, 2000000]

Modifying token limits
----------------------
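
Token limits can be modified before running a survey.
A hedged sketch follows; the `set_rate_limits` method name and signature are assumptions about the EDSL `Model` API and should be treated as illustrative:

.. code-block:: python

    from edsl import Model

    # Reduce the requests-per-minute and tokens-per-minute limits for a model
    # (method name and signature are illustrative):
    m = Model("gpt-4o")
    m.set_rate_limits(rpm = 10, tpm = 20000)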

Please note that the token limits are subject to the constraints of the model and the service provider.
Let us know if you have any questions or need further assistance with token limits.



Methods for reducing token usage
--------------------------------

Question comments can also be useful when used with survey "memory" rules, giving a model context of prior questions and answers when answering a later question.
(By default, questions are administered asynchronously; a model does not have context of other questions and answers in a survey unless memory rules are applied.)
Comments can also provide insight into non-responsive (`None`) answers: a model may use the comments field to describe a point of confusion about a question.
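
For reference, here is a minimal sketch of applying a memory rule to a survey; the `add_targeted_memory` method is drawn from EDSL's surveys documentation and its exact signature should be treated as illustrative:

.. code-block:: python

    from edsl import QuestionFreeText, Survey

    q1 = QuestionFreeText(question_name = "color", question_text = "What is your favorite color?")
    q2 = QuestionFreeText(question_name = "why", question_text = "Why do you like that color?")

    # Include q1 and its answer in the prompt for q2
    # (method name per the EDSL surveys documentation; illustrative):
    survey = Survey([q1, q2]).add_targeted_memory(q2, q1)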

Because the question `comment` field requires additional tokens, it can sometimes be cost-effective to exclude the field from question prompts, especially when the comment is unlikely to be useful.
This is done by passing a boolean parameter `include_comment = False` when constructing a question.
For example, here we compare a question with comments left on and turned off:

.. code-block:: python

    from edsl import QuestionNumerical, Survey, ScenarioList

    # A question with the default comment field included:
    q1 = QuestionNumerical(
        question_name = "sum",
        question_text = "What is the sum of {{ number_1 }} and {{ number_2 }}?"
    )

    # The same question with the comment field excluded from the prompt:
    q2 = QuestionNumerical(
        question_name = "sum_silent",
        question_text = "What is the sum of {{ number_1 }} and {{ number_2 }}?",
        include_comment = False
    )

    survey = Survey([q1, q2])

    # Parameterizing the questions with pairs of numbers:
    some_numbers = {
        "number_1": [0,1,2,3,4],
        "number_2": [5,4,3,2,1]
    }

    s = ScenarioList.from_nested_dict(some_numbers)

    results = survey.by(s).run()

We can check the responses and confirm that the `comment` field for the `sum_silent` question is `None`:

.. code-block:: python

    results.select("number_1", "number_2", "sum", "sum_comment", "sum_silent", "sum_silent_comment").print(format="rich")

Output:

.. code-block:: text

    ┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
    ┃ scenario  ┃ scenario  ┃ answer ┃ comment                  ┃ answer      ┃ comment             ┃
    ┃ .number_1 ┃ .number_2 ┃ .sum   ┃ .sum_comment             ┃ .sum_silent ┃ .sum_silent_comment ┃
    ┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
    │ 0         │ 5         │ 5      │ The sum of 0 and 5 is 5. │ 5           │ None                │
    ├───────────┼───────────┼────────┼──────────────────────────┼─────────────┼─────────────────────┤
    │ 1         │ 4         │ 5      │ The sum of 1 and 4 is 5. │ 5           │ None                │
    ├───────────┼───────────┼────────┼──────────────────────────┼─────────────┼─────────────────────┤
    │ 2         │ 3         │ 5      │ The sum of 2 and 3 is 5. │ 5           │ None                │
    ├───────────┼───────────┼────────┼──────────────────────────┼─────────────┼─────────────────────┤
    │ 3         │ 2         │ 5      │ The sum of 3 and 2 is 5. │ 5           │ None                │
    ├───────────┼───────────┼────────┼──────────────────────────┼─────────────┼─────────────────────┤
    │ 4         │ 1         │ 5      │ The sum of 4 and 1 is 5. │ 5           │ None                │
    └───────────┴───────────┴────────┴──────────────────────────┴─────────────┴─────────────────────┘
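
To quantify the savings, we can compare the token usage recorded in the raw model responses for the two questions.
This is a hedged sketch: it assumes the `raw_model_response` result fields and the `usage` structure shown in the next-token-probabilities example at the end of this page.

.. code-block:: python

    # Token usage for the first (scenario) result of each question
    # (field names follow the raw_model_response structure shown below):
    with_comment = results.select("raw_model_response.sum_raw_model_response").to_list()[0]
    without_comment = results.select("raw_model_response.sum_silent_raw_model_response").to_list()[0]

    print(with_comment["usage"]["total_tokens"])
    print(without_comment["usage"]["total_tokens"])
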
Coding question options
^^^^^^^^^^^^^^^^^^^^^^^
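
Multiple choice question options can be coded as integers instead of their full text, so that a model can answer with a single short token.
A minimal sketch, based on the `use_code` parameter demonstrated in the next-token-probabilities example below:

.. code-block:: python

    from edsl import QuestionMultipleChoice

    # With use_code = True the answer options are presented with integer
    # codes (0, 1, ...) and the model responds with a code:
    q = QuestionMultipleChoice(
        question_name = "income_pref",
        question_text = "Which of the following is more important to you: ",
        question_options = ["Financial stability", "Moving up the income ladder"],
        use_code = True
    )
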
No agent instructions
^^^^^^^^^^^^^^^^^^^^^

If no agents are used with the survey, the base agent instructions are not sent to the model, reducing overall tokens.
(This is a change from prior versions of EDSL.)
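
As a simple illustration (a sketch; the point is only the absence of a `.by(agent)` call):

.. code-block:: python

    from edsl import QuestionFreeText, Survey

    q = QuestionFreeText(
        question_name = "color",
        question_text = "What is your favorite color?"
    )

    # No .by(agent) call, so the base agent instructions are not sent
    # to the model:
    results = Survey([q]).run()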


Calculating next token probabilities
------------------------------------

We can also inspect the probability that a model assigned to each answer option by calculating next token probabilities.
This is done by setting model `logprobs = True` and then accessing the `raw_model_response` information in the results that are generated.
For example:

.. code-block:: python

    from edsl import QuestionMultipleChoice, Model

    # Request log probabilities with the model's responses:
    m = Model("gpt-4o", temperature = 1, logprobs = True)

    q = QuestionMultipleChoice(
        question_name = "income_pref_coded",
        question_text = "Which of the following is more important to you: ",
        question_options = ["Financial stability", "Moving up the income ladder"],
        use_code = True,
        include_comment = False
    )

    results = q.by(m).run()

    # Inspect the raw model response for the question:
    example = results.select("raw_model_response.income_pref_coded_raw_model_response").to_list()[0]
    example

Output:

.. code-block:: python

    {'id': 'chatcmpl-A9cawzuAcQJ2xygIcQziMc4kqR4lp',
     'choices': [{'finish_reason': 'stop',
       'index': 0,
       'logprobs': {'content': [{'token': '0',
          'bytes': [48],
          'logprob': -0.00063428195,
          'top_logprobs': [{'token': '0',
            'bytes': [48],
            'logprob': -0.00063428195},
           {'token': '1', 'bytes': [49], 'logprob': -7.375634},
           {'token': ' ', 'bytes': [32], 'logprob': -12.250634}]}],
        'refusal': None},
       'message': {'content': '0',
        'refusal': None,
        'role': 'assistant',
        'function_call': None,
        'tool_calls': None}}],
     'created': 1726856674,
     'model': 'gpt-4o-2024-05-13',
     'object': 'chat.completion',
     'service_tier': None,
     'system_fingerprint': 'fp_52a7f40b0b',
     'usage': {'completion_tokens': 1,
      'prompt_tokens': 66,
      'total_tokens': 67,
      'completion_tokens_details': {'reasoning_tokens': 0}}}

We can use the information to calculate next token probabilities:

.. code-block:: python

    next_token_probs = example['choices'][0]['logprobs']['content'][0]['top_logprobs']
    next_token_probs

Output:

.. code-block:: text

    [{'token': '0', 'bytes': [48], 'logprob': -0.00063428195},
     {'token': '1', 'bytes': [49], 'logprob': -7.375634},
     {'token': ' ', 'bytes': [32], 'logprob': -12.250634}]

Translating this information into probabilities for each answer option:

.. code-block:: python

    import math

    # Specifying the codes for the answer options and non-responses:
    options = {'0': "Financial stability", '1': "Moving up the income ladder", ' ': "Skipped"}

    for token_info in next_token_probs:
        option = options[token_info['token']]
        p = math.exp(token_info['logprob'])
        print(f"Probability of selecting '{option}' was {p:.3f}")

Output:

.. code-block:: text

    Probability of selecting 'Financial stability' was 0.999
    Probability of selecting 'Moving up the income ladder' was 0.001
    Probability of selecting 'Skipped' was 0.000
