[pdp] Map feature columns to human-friendly names #43

Merged (14 commits) on Dec 23, 2024

bdewilde (Member) commented Dec 21, 2024

changes

  • adds `tomli` as an optional project dependency (needed only on Python 3.10, since `tomllib` is in the standard library from Python 3.11 onward)
  • adds an "assets" directory containing a "features table" TOML file that maps PDP column names to human-friendly feature names (and, eventually, fuller descriptions)
  • adds a utility function to load and parse features tables from disk
  • adds feature values to the inference top-n feature display function (ported from https://github.com/datakind/student-success-intervention/pull/120), plus optional feature-name mapping via a features table

context

PDP's internal feature column naming is a bit... technical. Advisors need friendlier names to help them interpret model outputs correctly.

questions

  • Does this seem like a reasonable implementation?
  • How do y'all feel about TOML? Personally I love it, and much prefer it over YAML. :D

bdewilde marked this pull request as ready for review December 21, 2024 18:36

vishpillai123 left a comment


I think we may want to update the unit test for inference as well to reflect the new output format. Once we do that, I think we're good to merge.


Having human-friendly names in our output feature file is a great idea. Definitely on board with this. I'm also open to a .toml file; I don't have a strong preference either way.


Should we also update the unit test under tests/modeling/test_inference.py?

Other than that, this looks good to me and similar to the changes in the private repo.

bdewilde (Member, Author) commented

Yeah, good call. I ported those tests over as-is from the private repo; will modernize and extend here.

bdewilde merged commit f6fc68a into develop Dec 23, 2024
5 checks passed
bdewilde deleted the pdp-feature-name-translation branch December 23, 2024 16:58