I'm learning how to fine-tune RuGPT3 models on my own dataset to generate similar texts. Is there any documentation that describes the expected dataset format and lists the special tokens?
My specific questions are as follows:
1. My dataset contains both single-line and multiline samples. How should I separate samples from each other, given that a newline seems to be treated as the separator by default?
2. All the texts in my corpus are of the same type (say, jokes, although they cannot be grouped into a few broad topics), and I want to generate a new text without any specific input, i.e. I don't intend to provide the beginning of a text. Should I prepend a keyword such as "Анекдот" ("Joke") to every sample, e.g. "Анекдот: <text_1>", and then use that keyword as the prompt? If so, do I need a special token for that word?
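To make the two questions concrete, the layout I have in mind can be sketched as below. This is only a sketch under assumptions: the `SEPARATOR` string and the `build_corpus` helper are hypothetical names of my own, and I do not know whether RuGPT3 recognizes this separator as a special token or expects something else.

```python
# Hypothetical markers: neither is confirmed to be a RuGPT3 special token.
PREFIX = "Анекдот:"            # keyword that would later serve as the prompt
SEPARATOR = "<|endoftext|>"    # assumed end-of-sample marker


def build_corpus(samples, prefix=PREFIX, separator=SEPARATOR):
    """Join samples into one training string.

    Each sample keeps its internal newlines; the separator, not the
    newline, marks where one sample ends and the next begins.
    """
    records = []
    for text in samples:
        body = text.strip()
        if not body:
            continue  # skip empty samples
        records.append(f"{prefix} {body}\n{separator}")
    return "\n".join(records)


samples = [
    "Single-line joke.",
    "First line of a joke.\nSecond line of the same joke.",
]
corpus = build_corpus(samples)
```

With this layout, generation would start from the prompt "Анекдот:" and stop at the separator, if the model treats it as an end marker.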
I would be grateful for any information on the data format.