Missing Data Dictionary in readme #300
We do have a pretty extensive data catalog, but I take your point that it's not exactly accessible. Do you have a schema or format in mind for a data dictionary that would be most helpful? We can try to generate one automatically from the data catalog.
@dfsnow I have already shared a schema and format; please refer to condo-avm issue #72. See also: https://help.osf.io/article/217-how-to-make-a-data-dictionary. A data dictionary needs to cover all variables, along with the variable names as they appear in the data; otherwise it is not a data dictionary, just a list of features. There is currently no way to connect the features to their variable names. Furthermore, please make sure the information is correct (I have not bothered to check res-avm, but condo-avm is not internally consistent). That data catalog is nice, but useless to someone who hopes to use this repo. It is confounding to me that the CCAO expects volunteer contributors to individually parse AWS and dbt infrastructure to build a data dictionary, when a data dictionary is a basic best-practice requirement and an initial task when building a README for any open source project. This is not a big ask: please reread my initial submission, add variable names to the feature table to create a data dictionary, and then make sure the information is correct.
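To make the request concrete, here is a minimal sketch of what one data dictionary row could hold, loosely following the OSF guide's advice that every variable be listed by its actual name alongside a human-readable description. The field names (`variable_name`, `label`, `description`, `data_type`, `units`) and the example variable `bldg_sf` are hypothetical, not the schema from condo-avm #72. The `missing_variables` helper checks the "reflect all variables" requirement by listing data columns that have no entry.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DictEntry:
    # Hypothetical schema: field names are illustrative assumptions.
    variable_name: str  # column name exactly as it appears in the data
    label: str          # human-readable name, e.g. as shown in a feature table
    description: str
    data_type: str
    units: str = ""


def missing_variables(entries, data_columns):
    """Return data columns that have no dictionary entry (sorted)."""
    documented = {e.variable_name for e in entries}
    return sorted(set(data_columns) - documented)


def to_csv(entries):
    """Serialize entries to CSV text, one row per variable."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(DictEntry)])
    writer.writeheader()
    for e in entries:
        writer.writerow(asdict(e))
    return buf.getvalue()


# Usage with a hypothetical variable name:
entries = [DictEntry("bldg_sf", "Building Square Feet", "Building area", "numeric", "sq ft")]
print(missing_variables(entries, ["bldg_sf", "land_sf"]))
print(to_csv(entries))
```

Keeping the variable name and the human-readable label in the same row is what links each listed feature back to a column in the data.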
Alright, we'll work on constructing a machine-readable data dictionary similar to your schema. We'll plan to include it in the Getting Data subsection. @ccao-data/core-team Let's modify the code that constructs the "Features Used" table so that it creates a data dictionary. We should be able to pull all the info we need directly from dbt. Once created, data dictionaries can live in the …
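One way to "pull all the info we need directly from dbt" is to read the `manifest.json` artifact that dbt writes to its `target/` directory. In that file, each entry under `nodes` carries a `resource_type`, a `name`, and a `columns` mapping whose values include `name`, `description`, and `data_type`. The sketch below assumes that layout. The model names you pass in and the output columns are hypothetical, and this is not the team's actual "Features Used" code.

```python
import json
from pathlib import Path


def load_manifest(path="target/manifest.json"):
    """Load dbt's compiled manifest artifact (default location assumed)."""
    return json.loads(Path(path).read_text())


def dictionary_from_manifest(manifest, model_names):
    """Build data dictionary rows for the given dbt models.

    Only columns declared in the models' YAML properties appear in the
    manifest, so undocumented columns will be absent from the output.
    """
    rows = []
    for node in manifest.get("nodes", {}).values():
        # Skip tests, seeds, etc., and models we were not asked about.
        if node.get("resource_type") != "model" or node.get("name") not in model_names:
            continue
        for col_name, col in node.get("columns", {}).items():
            rows.append({
                "model": node["name"],
                "variable_name": col.get("name", col_name),
                "description": col.get("description", ""),
                # data_type is often null unless set explicitly in YAML.
                "data_type": col.get("data_type") or "",
            })
    return sorted(rows, key=lambda r: (r["model"], r["variable_name"]))
```

Because the manifest only lists documented columns, comparing these rows against the columns actually present in the training data (as in a "missing variables" check) would show where the dbt docs still have gaps.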
Hey all, this is pretty low-hanging fruit: we need to include data dictionaries in READMEs. As it currently stands, there is no way for an outsider to make heads or tails of this dataset, and data dictionaries are standard best practice when creating a README. This is a mirror of condo-avm #72. I have not bothered to try to organize this feature table. As it stands, these models are not truly open source, because users have no way to know which parameters the model uses.