Automated Prompt Engineering #82

Open · wants to merge 21 commits into base: main
29 changes: 29 additions & 0 deletions bibliography.bib
@@ -210,6 +210,35 @@ @misc{zhou2022large
primaryClass={cs.LG}
}

@misc{zhang2022tempera,
title={TEMPERA: Test-Time Prompting via Reinforcement Learning},
author={Tianjun Zhang and Xuezhi Wang and Denny Zhou and Dale Schuurmans and Joseph E. Gonzalez},
year={2022},
eprint={2211.11890},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

@misc{deng2022rlprompt,
title={RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning},
author={Mingkai Deng and Jianyu Wang and Cheng-Ping Hsieh and Yihan Wang and Han Guo and Tianmin Shu and Meng Song and Eric P. Xing and Zhiting Hu},
year={2022},
eprint={2205.12548},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

@misc{guo2021efficient,
title={Efficient (Soft) Q-Learning for Text Generation with Limited Good Data},
author={Han Guo and Bowen Tan and Zhengzhong Liu and Eric P. Xing and Zhiting Hu},
year={2021},
eprint={2106.07704},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

% Models

% Language Model Guides

@book{jurafsky2009,
8 changes: 8 additions & 0 deletions docs/automated_pe/_category_.json
@@ -0,0 +1,8 @@
{
"label": "⚙️ Automated Prompting",
"position": 70,
"link": {
"type": "generated-index",
"description": "Methods that automate prompt engineering"
}
}
47 changes: 47 additions & 0 deletions docs/automated_pe/ape.md
@@ -0,0 +1,47 @@
---
sidebar_position: 1
---

# 🟢 APE

Automatic Prompt Engineer (APE)(@zhou2022large) is an approach to automating the generation and
selection of prompts. The basic idea of APE is to give an LLM a prompt containing
a few exemplars and ask it to generate a prompt that could have produced those exemplars.

## Example

For example, if we give the LLM the following prompt:

```text
Is a banana a fruit?
Yes
Is a tomato a fruit?
No
Is a fish a fruit?
No

What would be a good prompt to generate an answer to the above questions?
```

The LLM might then generate a prompt like the highlighted line in the following example (shown here with a simpler exemplar format):

```text
banana
Yes

tomato
No

fish
No

watermelon
Yes

What would be a good prompt to generate an answer to the above questions?
// highlight-start
Is the following item a fruit:
// highlight-end
```
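
To make the generate-and-score idea concrete, here is a minimal sketch of an APE-style search loop. The `generate` function is a hypothetical placeholder for any LLM completion call; candidate prompts are scored by how often they reproduce the exemplar answers.

```python
# Minimal APE-style sketch; `generate` is a placeholder for any LLM completion call.
exemplars = [("banana", "Yes"), ("tomato", "No"), ("fish", "No")]

def generate(prompt: str) -> str:
    """Placeholder for an LLM API call (plug in your own client here)."""
    raise NotImplementedError

def propose_prompts(n_candidates: int = 5) -> list[str]:
    """Ask the LLM to propose instructions that explain the exemplars."""
    demo = "\n".join(f"{x}\n{y}" for x, y in exemplars)
    meta_prompt = (
        demo + "\n\nWhat would be a good prompt to generate an answer "
        "to the above questions?"
    )
    return [generate(meta_prompt) for _ in range(n_candidates)]

def score(prompt: str) -> float:
    """Score a candidate prompt by how many exemplars it answers correctly."""
    hits = sum(generate(f"{prompt} {x}").strip() == y for x, y in exemplars)
    return hits / len(exemplars)

# APE keeps the highest-scoring candidate (uncomment once `generate` is implemented):
# best_prompt = max(propose_prompts(), key=score)
```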

## Notes

Another simple automated prompt engineering strategy is to give GPT-3 your prompt and ask it to improve the prompt.
File renamed without changes.
7 changes: 7 additions & 0 deletions docs/automated_pe/more.md
@@ -0,0 +1,7 @@
---
sidebar_position: 200
---

# More

Other methods exist, such as AutoPrompt(@shin2020autoprompt), which uses gradient-based search to build prompts for masked language models (MLMs).
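
As a rough sketch of the gradient-based search idea (a HotFlip-style first-order approximation, not AutoPrompt's exact procedure): given the gradient of the loss with respect to a trigger token's embedding, candidate replacement tokens can be ranked by how much they are predicted to lower the loss.

```python
import torch

def top_candidate_tokens(embedding_matrix: torch.Tensor,
                         trigger_grad: torch.Tensor,
                         k: int = 5) -> torch.Tensor:
    """First-order estimate of loss change for swapping in each vocabulary token.

    embedding_matrix: (vocab, dim) token embeddings
    trigger_grad:     (dim,) gradient of the loss w.r.t. one trigger token's embedding
    Returns the k token ids predicted to decrease the loss the most.
    """
    predicted_change = embedding_matrix @ trigger_grad   # (vocab,)
    return torch.topk(-predicted_change, k).indices

# Toy usage with random tensors standing in for a real MLM's embeddings and gradients.
E = torch.randn(1000, 64)
g = torch.randn(64)
print(top_candidate_tokens(E, g))
```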
7 changes: 7 additions & 0 deletions docs/automated_pe/overview.md
@@ -0,0 +1,7 @@
---
sidebar_position: 0
---

# Overview

Can prompt engineering really be automated? Sometimes.
68 changes: 68 additions & 0 deletions docs/automated_pe/rl.md
@@ -0,0 +1,68 @@
---
sidebar_position: 130
---

# 🟣 Reinforcement Learning

This section covers reinforcement learning methods that optimize discrete prompts (not soft prompts). <br/>These methods are quite complex.

## RLPrompt

RLPrompt(@deng2022rlprompt) is a method that takes an input and trains a language model (the policy)
to generate a good prompt for that input.

More formally, given an input sequence $x$, the policy builds a prompt $\hat{z} = [z_1, z_2, \ldots, z_T]$ by selecting tokens from the vocabulary one at a time.

After creating the prompt, RLPrompt combines it with $x$ and uses another language model to
generate the completion. The output of the LM for $x$ prompted by $\hat{z}$ is written $y_{LM}(\hat{z}, x)$.

The policy then receives a reward based on this output: $R(y_{LM}(\hat{z}, x))$.

### Example

Suppose we have partially trained RLPrompt on classifying movie reviews, and the next
training example is `x = "I hate this movie."`. RLPrompt might generate a prompt like
`z = "Movie review bad or good:"`. It then combines the prompt with the input to get
`x' = "Movie review bad or good: I hate this movie."` and uses a language model
to generate the completion. Say the LM generates `bad`. The reward is then computed as
$R(y_{LM}(\hat{z}, x))$. Note that Deng et al. do not use a simple 0/1 reward.
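
As a rough illustration (not the authors' implementation), one step of this loop might look like the following. All helper names here are hypothetical stand-ins for the policy network, the frozen downstream LM, and the reward function.

```python
# Illustrative RLPrompt-style step; all helpers are simplified, hypothetical stand-ins.

def policy_generate_prompt(x: str) -> str:
    """Stand-in for the policy LM that emits prompt tokens z_1..z_T."""
    return "Movie review bad or good:"  # a trained policy would generate this

def lm_generate(prompt_and_input: str) -> str:
    """Stand-in for the frozen downstream LM producing y_LM(z, x)."""
    return "bad"

def compute_reward(completion: str, label: str) -> float:
    """Simplified reward; Deng et al. use a shaped reward, not a plain 0/1 score."""
    return 1.0 if completion.strip() == label else -1.0

def rlprompt_step(x: str, label: str) -> float:
    z = policy_generate_prompt(x)        # prompt z chosen by the policy
    y = lm_generate(f"{z} {x}")          # completion y_LM(z, x) from the frozen LM
    return compute_reward(y, label)      # R(y_LM(z, x)), used to update the policy

print(rlprompt_step("I hate this movie.", "bad"))  # -> 1.0
```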

## Training

RLPrompt embeds a task-specific MLP inside a frozen LM. The MLP is trained with Soft Q-Learning(@guo2021efficient).
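
A minimal sketch of that architecture, assuming PyTorch and a generic frozen backbone: only the small MLP adapter receives gradients, while the backbone and output head stay frozen. This is illustrative only; RLPrompt's actual architecture differs in detail.

```python
import torch
import torch.nn as nn

class PromptPolicy(nn.Module):
    """Frozen LM backbone with a small trainable MLP inserted before the output head."""

    def __init__(self, backbone: nn.Module, lm_head: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone          # frozen module producing hidden states
        self.lm_head = lm_head            # frozen projection to vocabulary logits
        for p in self.backbone.parameters():
            p.requires_grad = False
        for p in self.lm_head.parameters():
            p.requires_grad = False
        self.adapter = nn.Sequential(     # the only trainable part
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)        # (batch, seq, hidden)
        hidden = hidden + self.adapter(hidden)   # residual MLP adjustment
        return self.lm_head(hidden)              # logits over the next prompt token

# Toy usage with a dummy backbone (an embedding layer standing in for a transformer).
vocab, hidden = 100, 32
policy = PromptPolicy(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab), hidden)
logits = policy(torch.randint(0, vocab, (1, 5)))  # (1, 5, vocab)
```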

## TEMPERA

**TE**st-ti**M**e **P**rompt **E**diting using **R**einforcement le**A**rning
(TEMPERA)(@zhang2022tempera) is a method for automatically generating
interpretable prompts.

At a high level, instead of building a prompt from scratch like RLPrompt, TEMPERA starts from an initial prompt and edits different parts of it, learning which edits improve performance the most.

## Action Space

TEMPERA is allowed to edit 3 parts of the prompt:

### 1) The instruction

Given the instruction $i$, TEMPERA parses it with `nltk.tokenize.treebank` into a set of phrases; the actions then allow swapping, adding, and deleting phrases within that set. For example, the sentence `"Given text, classify whether it is good or bad."` is first parsed into `["Given text", "classify", "whether", "it is", "good", "or", "bad"]`. We can then apply different editing strategies (e.g., swapping two phrases, deleting a phrase, or repeating a phrase) to this set of phrases.
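
A rough sketch of those phrase-level edits (the phrase list is hard-coded here for illustration; TEMPERA derives it with `nltk.tokenize.treebank`):

```python
import random

phrases = ["Given text", "classify", "whether", "it is", "good", "or", "bad"]

def swap(p: list[str], i: int, j: int) -> list[str]:
    out = p.copy()
    out[i], out[j] = out[j], out[i]
    return out

def delete(p: list[str], i: int) -> list[str]:
    return p[:i] + p[i + 1:]

def repeat(p: list[str], i: int) -> list[str]:
    return p[:i + 1] + [p[i]] + p[i + 1:]

# One random edit applied to the instruction, as the RL agent might propose.
edit = random.choice([lambda p: swap(p, 0, 1),
                      lambda p: delete(p, 2),
                      lambda p: repeat(p, 4)])
print(" ".join(edit(phrases)))
```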

### 2) In-context examples

Given an example pool of $K$ examples (aka %%exemplars|exemplars%%), we want to select $k$ of them to formulate the final prompt. The action space allows swapping the positions of examples $i$ and $j$ with $1 \le i < j \le k$. It also supports replacing example $i$ ($1 \le i \le k$) with any candidate $j$ from the pool ($k < j \le K$).
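
For instance, the two kinds of actions could be sketched as follows (indices are 0-based here for code clarity; the pool and example names are hypothetical):

```python
# Sketch of the in-context example actions.
pool = [f"example_{n}" for n in range(8)]   # K = 8 candidate exemplars
selected = list(range(4))                   # indices of the k = 4 in-context examples

def swap_positions(sel: list[int], i: int, j: int) -> list[int]:
    out = sel.copy()
    out[i], out[j] = out[j], out[i]
    return out

def replace_example(sel: list[int], i: int, pool_idx: int) -> list[int]:
    out = sel.copy()
    out[i] = pool_idx                       # pool_idx is a candidate outside `sel`
    return out

selected = swap_positions(selected, 0, 2)   # reorder two in-context examples
selected = replace_example(selected, 1, 6)  # swap in a candidate from the pool
print([pool[i] for i in selected])
```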

### 3) The verbalizers

The editing space simply allows changing the current verbalizer to any other verbalizer from the `promptsource` collections. For example, changing from `["positive", "negative"]` to `["great", "terrible"]`.

## Reward

The reward is the difference in score between the prompt before and after an edit.

TEMPERA's reward is dense: at each edit step, the reward is computed from the accuracy improvement of the current prompt (after editing) over the previous prompt (before editing).
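
In code, the per-step reward could be sketched as the accuracy delta between consecutive prompts. Here `classify` is a hypothetical placeholder for querying the downstream LM, and `evaluate` scores a prompt on a small labeled batch.

```python
def classify(prompt: str, x: str) -> str:
    """Placeholder for querying the downstream LM with `prompt` + `x`."""
    raise NotImplementedError("plug in an LM call here")

def evaluate(prompt: str) -> float:
    """Accuracy of a prompt on a small batch of labeled examples."""
    batch = [("I hate this movie.", "bad"), ("Loved every minute.", "good")]
    return sum(classify(prompt, x) == y for x, y in batch) / len(batch)

def step_reward(prompt_before: str, prompt_after: str) -> float:
    # Dense reward: score difference caused by this single edit.
    return evaluate(prompt_after) - evaluate(prompt_before)
```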

## Training

TEMPERA uses a GPT architecture and is trained with Proximal Policy Optimization (PPO).

File renamed without changes.
6 changes: 5 additions & 1 deletion docs/bibliography.md
@@ -56,7 +56,11 @@ cite them as such.

#### AutoPrompt(@shin2020autoprompt) 🔵

#### Automatic Prompt Engineer(@zhou2022large)
#### Automatic Prompt Engineer(@zhou2022large) 🔵

#### TEMPERA(@zhang2022tempera) 🔵

#### RLPrompt(@deng2022rlprompt)

## Models

8 changes: 0 additions & 8 deletions docs/trainable/_category_.json

This file was deleted.