What Is a Prompt Template

A prompt template is a reproducible way to generate a prompt. It contains a text string (the "template") that takes a set of parameters from the end user and produces a prompt.

A template therefore consists of three parts:

  • instructions for the language model
  • a few examples
  • the question for the language model

Here is some example code:

from langchain.prompts import PromptTemplate

template = """
I want you to act as a naming consultant for new companies.
Here are some examples of good company names:

- search engine, Google
- social media, Facebook
- video sharing, YouTube

The name should be short, catchy and easy to remember.

What is a good name for a company that makes {product}?
"""

prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)
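
Formatting the template fills {product} into the prompt text:

print(prompt.format(product="colorful socks"))
# -> "... What is a good name for a company that makes colorful socks?"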


Basic Template Usage

from langchain.prompts import PromptTemplate

## No input variables
no_input_prompt = PromptTemplate(
    input_variables=[],
    template="Tell me a joke.")
print(no_input_prompt.format())

## One input variable
one_input_prompt = PromptTemplate(
    input_variables=["adjective"],
    template="Tell me a {adjective} joke."
)
print(one_input_prompt.format(adjective="funny"))

## Multiple input variables
multiple_input_prompt = PromptTemplate(
    input_variables=["adjective", "content"],
    template="Tell me a {adjective} joke about {content}."
)
print(multiple_input_prompt.format(adjective="funny", content="chickens"))

Loading a Template from LangChainHub

from langchain.prompts import load_prompt

prompt = load_prompt("lc://prompts/conversation/prompt.json")
prompt.format(history="", input="What is 1 + 1?")

Putting Examples into a Prompt Template

from langchain.prompts import PromptTemplate, FewShotPromptTemplate

## First, create the list of few-shot examples
examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
]

## Next, specify the template used to format the examples we have provided.
## We can use the PromptTemplate class for this.
example_formatter_template = """
Word: {word}
Antonym: {antonym}
"""

example_prompt = PromptTemplate(
    input_variables=["word", "antonym"],
    template=example_formatter_template,
)

## Finally, create the FewShotPromptTemplate
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    # The prefix is some text that goes before the examples in the prompt.
    prefix="Give the antonym of every input",
    # The suffix is some text that goes after the examples in the prompt.
    suffix="Word: {input}\nAntonym:",
    input_variables=["input"],
    example_separator="\n\n",
)

print(few_shot_prompt.format(input="big"))
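
For reference, the printed prompt looks roughly like this (exact blank lines depend on the whitespace in the example template):

Give the antonym of every input

Word: happy
Antonym: sad

Word: tall
Antonym: short

Word: big
Antonym: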

Selecting Examples Dynamically

from langchain.prompts import PromptTemplate, FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
    {"word": "energetic", "antonym": "lethargic"},
    {"word": "sunny", "antonym": "gloomy"},
    {"word": "windy", "antonym": "calm"},
]

example_formatter_template = """
Word: {word}
Antonym: {antonym}\n
"""
example_prompt = PromptTemplate(
    input_variables=["word", "antonym"],
    template=example_formatter_template,
)
example_selector = LengthBasedExampleSelector(
    # These are the examples it has available to choose from.
    examples=examples,
    # This is the PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # This is the maximum length that the formatted examples should be.
    # Length is measured by the selector's get_text_length function
    # (word count by default).
    max_length=25,
)

dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Word: {input}\nAntonym:",
    input_variables=["input"],
    example_separator="\n",
)

print(dynamic_prompt.format(input="big"))
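
Because the selection is length-based, a long input leaves room for fewer examples. For instance, with the long input below the selector keeps only one example:

long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(input=long_string))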

Custom Prompt Templates

To define a custom template, there are two requirements:

  • it must have an input_variables attribute that exposes the input variables the prompt template expects
  • it must expose a format method that accepts keyword arguments corresponding to the expected input variables and returns the formatted prompt

Below we create a custom prompt template that takes a function name as input and formats the prompt to include that function's source code. To do this, we first write a helper that returns the source code of a given function.

import inspect
from langchain.prompts import StringPromptTemplate
from pydantic import BaseModel, validator

def get_source_code(function_name):
    # Return the source code of the given function object.
    return inspect.getsource(function_name)

class FunctionExplainerPromptTemplate(StringPromptTemplate, BaseModel):
    @validator("input_variables")
    def validate_input_variables(cls, v):
        if len(v) != 1 or "function_name" not in v:
            raise ValueError("function_name must be the only input_variable.")
        return v

    def format(self, **kwargs) -> str:
        source_code = get_source_code(kwargs["function_name"])

        ## Generate the prompt to be sent to the language model
        prompt = f"""
        Given the function name and source code, generate an English language explanation of the function.
        Function Name: {kwargs["function_name"].__name__}
        Source Code:
        {source_code}
        Explanation:
        """

        return prompt

    @property
    def _prompt_type(self):
        return "function-explainer"

if __name__ == '__main__':
    fn_explainer = FunctionExplainerPromptTemplate(
        input_variables=["function_name"]
    )

    prompt = fn_explainer.format(function_name=get_source_code)
    print(prompt)

Custom Example Selectors

An example selector must implement two methods:

  • an add_example method, which takes an example and adds it to the ExampleSelector
  • a select_examples method, which takes the input variables and returns a list of examples

The following example illustrates this:


## Definition

from langchain.prompts.example_selector.base import BaseExampleSelector
from typing import Dict, List
import numpy as np

class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples: List[Dict[str, str]]):
        self.examples = examples

    def add_example(self, example: Dict[str, str]) -> None:
        """Add a new example to the pool."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Randomly pick two examples; a real implementation puts its selection logic here."""
        return np.random.choice(self.examples, size=2, replace=False)

## Usage
examples = [
    {"foo": "1"},
    {"foo": "2"},
    {"foo": "3"}
]

example_selector = CustomExampleSelector(examples)

# Select examples
example_selector.select_examples({"foo": "foo"})


# Add new example to the set of examples
example_selector.add_example({"foo": "4"})
example_selector.examples
# -> [{'foo': '1'}, {'foo': '2'}, {'foo': '3'}, {'foo': '4'}]

# Select examples
example_selector.select_examples({"foo": "foo"})
# -> array([{'foo': '1'}, {'foo': '4'}], dtype=object)

The Few-Shot Workflow

  1. Define the format of a shot (example)
  2. Define the example_prompt that formats each shot
  3. Define the ExampleSelector
  4. Pass the example selector to the FewShotPromptTemplate

Here is an example:

from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer":
        """
        Are follow up questions needed here: Yes.
        Follow up: How old was Muhammad Ali when he died?
        Intermediate answer: Muhammad Ali was 74 years old when he died.
        Follow up: How old was Alan Turing when he died?
        Intermediate answer: Alan Turing was 41 years old when he died.
        So the final answer is: Muhammad Ali
        """
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer":
        """
        Are follow up questions needed here: Yes.
        Follow up: Who was the founder of craigslist?
        Intermediate answer: Craigslist was founded by Craig Newmark.
        Follow up: When was Craig Newmark born?
        Intermediate answer: Craig Newmark was born on December 6, 1952.
        So the final answer is: December 6, 1952
        """
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer":
        """
        Are follow up questions needed here: Yes.
        Follow up: Who was the mother of George Washington?
        Intermediate answer: The mother of George Washington was Mary Ball Washington.
        Follow up: Who was the father of Mary Ball Washington?
        Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
        So the final answer is: Joseph Ball
        """
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer":
        """
        Are follow up questions needed here: Yes.
        Follow up: Who is the director of Jaws?
        Intermediate Answer: The director of Jaws is Steven Spielberg.
        Follow up: Where is Steven Spielberg from?
        Intermediate Answer: The United States.
        Follow up: Who is the director of Casino Royale?
        Intermediate Answer: The director of Casino Royale is Martin Campbell.
        Follow up: Where is Martin Campbell from?
        Intermediate Answer: New Zealand.
        So the final answer is: No
        """
    }
]
example_prompt = PromptTemplate(
    input_variables=["question", "answer"],
    template="Question: {question}\n{answer}"
)

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class used to store the embeddings.
    Chroma,
    # The number of examples to produce.
    k=1
)

prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"]
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Prompt Template Serialization

It is usually better to store prompts as files rather than as Python code. This makes prompts easy to share, store, and version.

At a high level, the following design principles apply to serialization:

  1. Both JSON and YAML are supported.
  2. Everything can be specified in one file, or different components can be stored in separate files.

Simple Prompt Templates

Create a file simple_prompt.yaml that holds the prompt template:

_type: prompt
input_variables:
    ["adjective", "content"]
template:
    Tell me a {adjective} joke about {content}.

Then load it from Python:

from langchain.prompts import load_prompt

if __name__ == "__main__":
    prompt = load_prompt("simple_prompt.yaml")
    print(prompt.format(adjective="funny", content="chickens"))
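
Per design principle 1 above, the same prompt could equally be stored as JSON (a hypothetical simple_prompt.json, loadable with the same load_prompt call):

{
    "_type": "prompt",
    "input_variables": ["adjective", "content"],
    "template": "Tell me a {adjective} joke about {content}."
}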

Few-Shot Prompt Templates

Create a file examples.json that holds the shots:

[
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"}
]

Create a file few_shot_prompt.yaml that holds the prompt template configuration:

_type: few_shot
input_variables:
    ["adjective"]
prefix:
    Write antonyms for the following words.
example_prompt:
    _type: prompt
    input_variables:
        ["input", "output"]
    template:
        "Input: {input}\nOutput: {output}"
examples:
    examples.json
suffix:
    "Input: {adjective}\nOutput:"

Finally, load it from Python:

from langchain.prompts import load_prompt

if __name__ == "__main__":
    prompt = load_prompt("few_shot_prompt.yaml")
    print(prompt.format(adjective="funny"))

Example Selectors

If you have a large number of examples, you need to select which ones to include in the prompt. The ExampleSelector class is responsible for doing this, and LangChain ships several built-in example selectors. The base interface is defined as follows:

from abc import ABC, abstractmethod
from typing import Dict, List

class BaseExampleSelector(ABC):
    """Interface for selecting examples to include in prompts."""

    @abstractmethod
    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""

The only method it must expose is select_examples, which takes the input variables and returns a list of examples. How those examples are selected is up to each concrete implementation.

LengthBasedExampleSelector

This example selector chooses which examples to use based on length. It is useful when you are worried about constructing a prompt that would exceed the context window: for longer inputs it selects fewer examples, while for shorter inputs it selects more. (See the dynamic_prompt example earlier for a full demonstration.)

SemanticSimilarityExampleSelector

This example selector chooses examples based on which ones are most similar to the input. It does this by finding the examples whose embeddings have the greatest cosine similarity with the input.

Maximal Marginal Relevance

This example selector picks examples based on a combination of similarity to the input and diversity. It finds the examples whose embeddings have the greatest cosine similarity with the input, then adds them iteratively while penalizing closeness to examples that have already been selected.
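
A minimal sketch, reusing the word/antonym examples and example_prompt from the dynamic selection section and assuming a local FAISS installation as the vector store:

from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The word/antonym examples defined earlier.
    examples,
    # The embedding class used to measure similarity.
    OpenAIEmbeddings(),
    # The vector store class used to index the example embeddings.
    FAISS,
    # The number of examples to select.
    k=2,
)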

NGramOverlapExampleSelector

This example selector chooses and orders examples by n-gram overlap score, according to which examples are most similar to the input. The n-gram overlap score is a float between 0.0 and 1.0, inclusive.

The selector also allows you to set a threshold: examples whose n-gram overlap score is less than or equal to the threshold are excluded.
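
A minimal sketch, again reusing the earlier word/antonym examples and example_prompt (this selector lives in a submodule and requires the nltk package):

from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

example_selector = NGramOverlapExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    # Examples with an overlap score <= threshold are excluded;
    # a negative threshold excludes nothing and only reorders.
    threshold=-1.0,
)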

Output Parsing

Language models output text. But often you want more structured information than just text. This is where output parsers come in.

An output parser is a class that helps structure language model responses. There are two main methods an output parser must implement:

  • get_format_instructions() -> str: returns a string containing instructions for how the language model's output should be formatted
  • parse(str) -> Any: takes a string (assumed to be the response from a language model) and parses it into some structure

There is also one optional method:

  • parse_with_prompt(str) -> Any: takes a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that produced that response) and parses them into some structure
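
To make the interface concrete, here is a minimal hand-rolled parser. This is a toy sketch only; LangChain ships a real CommaSeparatedListOutputParser for this job:

from langchain.schema import BaseOutputParser

class CommaSeparatedParser(BaseOutputParser):
    """Toy parser: splits a comma-separated response into a Python list."""

    def get_format_instructions(self) -> str:
        return "Your response should be a list of comma separated values, e.g. `foo, bar, baz`"

    def parse(self, text: str) -> list:
        # Assumes the model replied like "happy, joyful, glad".
        return [item.strip() for item in text.strip().split(",")]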

PydanticOutputParser

This output parser lets you specify an arbitrary JSON schema and query the LLM for JSON output that conforms to that schema.

Keep in mind that large language models are leaky abstractions! You must use an LLM with enough capability to generate well-formed JSON. In the OpenAI family, Davinci can do this reliably, but Curie's ability drops off dramatically.

Use Pydantic to declare your data model. Pydantic's BaseModel is like a Python dataclass, but with actual type checking and coercion.

from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field, validator


model_name = 'text-davinci-003'
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)

# Define the data structure we expect back.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        if field[-1] != '?':
            raise ValueError("Badly formed question!")
        return field

joke_query = "Tell me a joke."

parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

_input = prompt.format_prompt(query=joke_query)

print(_input)

output = model(_input.to_string())

print(parser.parse(output))

The code above simply tries to parse the LLM response; if the response cannot be parsed correctly, an error is raised. But raising an error is not the only option. Specifically, we can pass the misformatted output, together with the format instructions, back to the model and ask it to fix it.

For this example we use the OutputParser defined above. Here is what happens when we hand it a result that does not conform to the schema:

from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import OutputFixingParser

# An illustrative misformatted response: single quotes instead of valid JSON.
misformatted = "{'setup': 'Why did the chicken cross the road?', 'punchline': 'To get to the other side!'}"

new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

new_parser.parse(misformatted)

Fixing Parse Errors with the Original Prompt

While in some cases a parsing error can be fixed by looking only at the output, in other cases it cannot, because the output alone does not say what the model was supposed to produce.
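
A sketch of one way to handle this, using LangChain's RetryOutputParser, which passes the original prompt back to the model alongside the bad completion (bad_response and prompt_value here are illustrative placeholders built from the Joke example above):

from langchain.llms import OpenAI
from langchain.output_parsers import RetryOutputParser

# The PromptValue that originally produced the bad completion.
prompt_value = prompt.format_prompt(query=joke_query)

# An illustrative incomplete response: the punchline field is missing,
# so the output alone is not enough to recover a valid Joke.
bad_response = '{"setup": "Why did the chicken cross the road?"}'

retry_parser = RetryOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))

# parse_with_prompt sees both the bad completion and the original prompt.
retry_parser.parse_with_prompt(bad_response, prompt_value)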