Questions about the meaning of data set attribute representation #30

Open
zh57398 opened this issue Apr 6, 2021 · 2 comments


zh57398 commented Apr 6, 2021

About your dataset: does the "length" attribute represent the length of the "text" attribute, or something else? I don't think it is the length of "text". For example, in the file "medium-345M-k40.train.jsonl" a record has "length" = 1024, but the length of its text, as I calculated it, is 4750. So I would like to know what the "length" attribute means. I look forward to your reply. Thank you very much.

@DaveXanatos

If you're referring to the length parameter as per this:

def interact_model(
    model_name='345M', #345M/774M on Pi4B 8G only (memory allocation issue) 1558 too big for Pi4b8G
    seed=None,
    nsamples=1,
    batch_size=1,
    length=140,
    temperature=1.2,
    top_k=48,
    top_p=0.7,
    models_dir='models',
):

Then length is the maximum number of tokens the output will contain. GPT-2 counts byte-pair-encoded tokens rather than words, so the word count of the output is usually somewhat lower. I keep mine short and sweet at a maximum length of 140 because I use GPT-2 to give my robots conversational responses. But if you want it to write an article, it certainly can...


zh57398 commented Apr 9, 2021

First of all, thank you very much for your reply, but I still don't understand. I can see that 1024 is the maximum length, and I take the "text" attribute to be the text generated by GPT-2. Is that understanding correct? If so, the "length" attribute should equal the length of "text". However, when I calculated the length of the "text" attribute in the dataset you provided, it did not match the given value of the "length" attribute. What does the "length" attribute stand for? I look forward to your help and reply.
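
The mismatch described above can be checked by putting the stored "length" next to other ways of measuring "text". The sketch below is only illustrative. The sample record is made up, `describe_record` is a hypothetical helper, and the idea that "length" is counted in model tokens rather than characters is an assumption that the maintainers do not confirm in this thread.

```python
import json

# Hypothetical one-record JSONL sample. The field names follow those mentioned
# in the question ("text", "length"); the values are invented for illustration.
SAMPLE_JSONL = '{"text": "The quick brown fox jumps over the lazy dog.", "length": 10}\n'


def describe_record(line):
    """Return the stored length next to two other measurements of the text."""
    record = json.loads(line)
    text = record["text"]
    return {
        "length_field": record["length"],        # value stored in the file
        "characters": len(text),                 # what len(text) measures
        "whitespace_words": len(text.split()),   # rough word count
    }


stats = describe_record(SAMPLE_JSONL)
print(stats)

# Figures quoted in the question: 4750 characters against "length" = 1024.
chars_per_unit = 4750 / 1024
print(round(chars_per_unit, 2))
```

With the numbers from the question, each unit of "length" covers between four and five characters on average. That is compatible with "length" being a token count, but an exact check would need the GPT-2 tokenizer, which this sketch deliberately leaves out.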
