Ok so perhaps this might be a use of data contracts #20

netskink · 2023-10-28T22:56:57Z

Hello

So, I'm wondering about data contracts again.

Please bear with me as I frame this question. I am in the process of trying to fine-tune an LLM on a specific programming language. While it sounds simple, it's hard to find out which programming languages were used in a curated dataset. Each time, the info is specified in a different way. Here are a few examples.

Google Codey (codechat-bison, code-bison)
- The input dataset consisted of the type of code listed in the URL, right?
Kaggle Code Dataset
- Reading through the text, it might be just Python and R Jupyter notebooks.
Kaggle user contribution
- This one states right up front which programming languages were used.
ChatGPT
- This one specifies that it "can understand and generate natural language or code", but damn if I can find what type of code it was trained on.

So my question to you: is the open data contract meant to address this kind of info? If a dataset (or the API serving it) supported this contract, would I be able to locate the contract file in the dataset and read it to find out what the dataset contains? If so, could you point me to the relevant part of the spec?
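To make the question concrete, here is a minimal sketch of the workflow I have in mind: walk a downloaded dataset, find a contract file, and read a list of programming languages from it. Everything specific here is my own assumption for illustration and not taken from the spec: the file name `datacontract.json`, the JSON format, and the `languages` key are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical file name, chosen only for this illustration (not from any spec).
CONTRACT_FILENAME = "datacontract.json"


def find_contract(dataset_root: Path) -> Optional[Path]:
    """Return the first contract file found anywhere under dataset_root, or None."""
    matches = sorted(dataset_root.rglob(CONTRACT_FILENAME))
    return matches[0] if matches else None


def read_languages(contract_path: Path) -> List[str]:
    """Read the hypothetical 'languages' key; return an empty list if it is absent."""
    with contract_path.open(encoding="utf-8") as f:
        contract = json.load(f)
    return list(contract.get("languages", []))


def demo() -> List[str]:
    """Build a throwaway dataset directory containing a contract, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "meta").mkdir()
        # Made-up contract content, standing in for whatever a real contract would hold.
        (root / "meta" / CONTRACT_FILENAME).write_text(
            json.dumps({"languages": ["python", "r"]}), encoding="utf-8"
        )
        path = find_contract(root)
        return read_languages(path) if path else []


if __name__ == "__main__":
    print(demo())
```

If the contract is meant to cover this, I would expect the real file name and field names to come from the spec rather than from my guesses above.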
