Missing Data Dictionary in readme #300

Open
ssaurbier opened this issue Dec 22, 2024 · 3 comments · May be fixed by #315

ssaurbier commented Dec 22, 2024

Hey all, this is pretty low-hanging fruit - we need to include data dictionaries in the readmes. As it currently stands, there is no way for an outsider to make heads or tails of this dataset, and data dicts are standard best practice when creating a readme. This is a mirror of condo-avm #72. I have not bothered to try to organize this feature table. As it stands, these models are not open source if users have no way to know which params are used in the model.

Member

dfsnow commented Dec 22, 2024

We do have a pretty extensive data catalog, but I take your point that it's not exactly accessible. Do you have a schema/format in mind for a data dictionary that would be the most helpful? We can try to automatically generate one from the data catalog.

Author

ssaurbier commented Dec 22, 2024

@dfsnow I have shared a schema and format already - please refer to condo-avm issue #72. See also: https://help.osf.io/article/217-how-to-make-a-data-dictionary

Data dictionaries need to reflect all variables, as well as variable names in the data - otherwise it is not a data dict, just a list of features. There is currently no way to connect the features to their variable names.

Furthermore, please ensure the information is correct (I have not bothered to check res-avm, but condo-avm is not internally consistent). That data catalog is nice, but useless to someone who hopes to use this repo. It is confounding to me that the CCAO expects volunteer contributors to individually parse AWS and dbt infrastructure to build a data dict, when a data dict is a basic best-practice requirement and an initial task when building a readme on any open source project.

This is not a big ask - please reread my initial submission, include variable names in the feature table to create a data dictionary, and then make sure the information is correct.
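To make the "all variables, with their variable names" requirement concrete, here is a minimal sketch of the kind of check a data dictionary should pass. Everything in it is an assumption for illustration: the `variable_name` key, the `check_data_dict` helper, and the example column names are hypothetical and are not taken from this repo or from condo-avm #72.

```python
def check_data_dict(data_columns, data_dict_rows):
    """Compare the columns in the data against the data dictionary entries.

    data_columns: iterable of column names as they appear in the dataset.
    data_dict_rows: list of dicts, each with a "variable_name" key
        (a hypothetical schema, used only for this sketch).
    """
    documented = {row["variable_name"] for row in data_dict_rows}
    actual = set(data_columns)
    return {
        # Columns present in the data but missing from the dictionary
        "undocumented": sorted(actual - documented),
        # Dictionary entries that match no column in the data
        "stale": sorted(documented - actual),
    }


# Hypothetical example: one column is undocumented, one entry is stale.
example_columns = ["example_sqft", "example_age", "example_rooms"]
example_dict = [
    {"variable_name": "example_sqft", "description": "Building square footage"},
    {"variable_name": "example_age", "description": "Age of the building"},
    {"variable_name": "example_beds", "description": "Number of bedrooms"},
]
report = check_data_dict(example_columns, example_dict)
print(report)
# {'undocumented': ['example_rooms'], 'stale': ['example_beds']}
```

A feature list with no variable names cannot pass a check like this, which is the gap being described.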

Member

dfsnow commented Dec 22, 2024

Alright, we'll work on constructing a machine-readable data dictionary that's similar to your schema. We'll plan to include it in the Getting Data subsection.

@ccao-data/core-team Let's modify the code that constructs the "Features Used" table to create a data dict. We should be able to pull all the info we need directly from dbt. Once created, dicts can live in the docs/ section under version control.
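A rough sketch of what pulling that info from dbt could look like, assuming the column metadata is read from dbt's `manifest.json` artifact, where each node carries a `columns` mapping with `name`, `description`, and `data_type`. The model name, column names, output field names, and the `build_data_dict` / `to_csv` helpers are placeholders for illustration. This is not the code that builds the "Features Used" table, and the output schema is not the one from condo-avm #72.

```python
import csv
import io

FIELDS = ["variable_name", "description", "data_type"]


def build_data_dict(manifest: dict, model_name: str) -> list[dict]:
    """Return one data dictionary row per documented column of a dbt model."""
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            return [
                {
                    "variable_name": col_name,
                    "description": col.get("description", ""),
                    # data_type may be absent or null in the manifest
                    "data_type": col.get("data_type") or "",
                }
                for col_name, col in node.get("columns", {}).items()
            ]
    raise KeyError(f"model {model_name!r} not found in manifest")


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text so the dict can be committed under docs/."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical manifest fragment; a real one would be loaded with
# json.load() from the target/manifest.json file that dbt writes.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.example_project.example_model": {
            "resource_type": "model",
            "name": "example_model",
            "columns": {
                "example_sqft": {
                    "name": "example_sqft",
                    "description": "Building square footage",
                    "data_type": "double",
                },
                "example_age": {
                    "name": "example_age",
                    "description": "Age of the building in years",
                    "data_type": None,
                },
            },
        }
    }
}

rows = build_data_dict(SAMPLE_MANIFEST, "example_model")
print(to_csv(rows))
```

Only columns that are documented in the dbt project show up in the manifest, so a dictionary built this way is only as complete as the underlying column docs.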

@jeancochrane jeancochrane linked a pull request Jan 8, 2025 that will close this issue
3 participants