Hello,
So, I'm wondering about data contracts again.
Please bear with me as I frame this as a question. I am trying to fine-tune an LLM for a specific programming language. While that sounds simple, it's hard to find out what data, and in particular which programming languages, went into a curated dataset. Each source specifies this information differently. Here are a few examples.
GOOG Codey/codechat-bison, code-bison
Kaggle Code Dataset
Kaggle user contribution
ChatGPT
One of these states that it "can understand and generate natural language or code", but damn if I can find what kind of code it was trained on.
So, my question to you: is the open data contract meant to address this kind of information? If an API supported this contract, would I be able to locate this file in a dataset and read it to learn what the dataset contains? If so, could you point me to the relevant part of the spec?
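For concreteness, this is roughly what I'd hope to be able to do if a dataset shipped such a file. It's only a sketch under my own assumptions: the file name `datacontract.json` and the `programmingLanguages` custom property are hypothetical names I made up, and I'm not claiming the open data contract defines them. I used JSON so the example needs only the standard library; contracts like this are often written in YAML, which would need a YAML parser instead.

```python
import json
from pathlib import Path
from typing import List, Optional

# Hypothetical file name, not taken from any published spec.
CONTRACT_NAME = "datacontract.json"

# Hypothetical custom property that would list the languages in the dataset.
LANGUAGES_PROPERTY = "programmingLanguages"


def find_contract(dataset_root: str) -> Optional[Path]:
    """Return the first contract file found anywhere under dataset_root, or None."""
    matches = sorted(Path(dataset_root).rglob(CONTRACT_NAME))
    return matches[0] if matches else None


def languages_in_contract(contract_text: str) -> List[str]:
    """Return the programming languages declared in a contract, or [] if absent.

    Assumes (hypothetically) a top-level "customProperties" list whose entries
    look like {"property": <name>, "value": <value>}.
    """
    contract = json.loads(contract_text)
    for prop in contract.get("customProperties", []):
        if prop.get("property") == LANGUAGES_PROPERTY:
            return list(prop.get("value", []))
    return []


if __name__ == "__main__":
    # Made-up contract text, used only to show the lookup.
    example = json.dumps({
        "description": "Curated code dataset",
        "customProperties": [
            {"property": LANGUAGES_PROPERTY, "value": ["Python", "Java", "Go"]}
        ],
    })
    print(languages_in_contract(example))  # ['Python', 'Java', 'Go']
```

In other words, what I'm after is one agreed-upon place and field name, so that the same few lines work for every dataset instead of digging through each source's own description.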