Commit

Update TRAIN.md
mgalley authored May 22, 2023
1 parent e3c7a4b commit 8db61db
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion TRAIN.md
@@ -1,6 +1,6 @@
## Retraining full models

-*Important:* 5/22/2023: It is no longer possible to retrain GODEL models from scratch as the [dump files](https://files.pushshift.io/reddit) of the Pushshift Reddit Dataset have been recently deleted. If you would like to recreate Reddit data, please consider using the Pushshift [API](https://github.com/pushshift/api) instead, but please note that the API is not supported by the GODEL codebase. We left the instructions below for historical reasons (e.g., for users who still have the Reddit dump files), but these instructions no longer work.
+**Important:** 5/22/2023: It is no longer possible to retrain GODEL models from scratch as the [dump files](https://files.pushshift.io/reddit) of the Pushshift Reddit Dataset have been recently deleted. If you would like to recreate Reddit data, please consider using the Pushshift [API](https://github.com/pushshift/api) instead, but please note that the API is not supported by the GODEL codebase. We left the instructions below for historical reasons (e.g., for users who still have the Reddit dump files), but these instructions no longer work without the dump files.

### Data preparation
GODEL is pre-trained with three phases 1) Linguistic pre-training on public web documents to gain the capability of text generation. 2) Dialog pre-training on public dialog data to learn to chat like a human. 3) Grounded dialog pre-training to enable a dialog model to generate responses grounding on specific goals.