Clean Up "Synonyms" in the Lookup Table #48

Open
gabrielodom opened this issue Jul 13, 2021 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@gabrielodom
Contributor

gabrielodom commented Jul 13, 2021

  1. We have some issues with the free-text entries of lookup_df$synonym. For example, some drug names have a trailing space (e.g. "blanca ", "monos ", "nieve "). I have noticed that most of these are Spanish words.
  2. There are symbols in some of the drug names, like "c & m", "m-cat", or "el perico ("parrot")". I think parse() removes these symbols, which means that these drug names will never be matched if we call parse() first (which is bad, because this is our recommended workflow).
  3. The string "mixed with" shows up 25 times. Can this pattern of "drug a (mixed with drug b)" be re-expressed?
  4. The word "and" is a stop word, but " and " shows up 40 times. We can't match to these drug synonyms either.
  5. There are 20 synonyms that include one or more ".", for example "l.a. ice" or "m.j.". We remove all periods in parse(), and lookup("m j") returns no matches.
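
A rough sketch of how the five patterns above could be flagged in one pass. It is written in Python purely for illustration (the package itself is R), and it is not the package's code. The function name `audit_synonyms`, the check names, and the sample list are all hypothetical. The sample holds only strings quoted in this issue plus two placeholders, and it does not reproduce lookup_df$synonym or the counts reported above.

```python
import re

# Hypothetical sample: strings quoted in this issue, plus two placeholder
# entries ("drug a ...") standing in for the "mixed with" and " and " cases.
# This is NOT the real lookup_df$synonym column.
synonyms = [
    "blanca ", "monos ", "nieve ",
    "c & m", "m-cat", 'el perico ("parrot")',
    "drug a (mixed with drug b)",
    "drug a and drug b",
    "l.a. ice", "m.j.",
]

# One predicate per numbered item in the list above.
CHECKS = {
    "trailing_space": lambda s: s != s.rstrip(),
    # Any character that is not a letter, digit, underscore, whitespace, or period.
    "symbol": lambda s: re.search(r"[^\w\s.]", s) is not None,
    "mixed_with": lambda s: "mixed with" in s,
    "and_stop_word": lambda s: " and " in s,
    "period": lambda s: "." in s,
}


def audit_synonyms(entries):
    """Return a dict mapping each check name to the entries it flags."""
    flagged = {name: [] for name in CHECKS}
    for entry in entries:
        for name, check in CHECKS.items():
            if check(entry):
                flagged[name].append(entry)
    return flagged


report = audit_synonyms(synonyms)
for name, hits in report.items():
    print(f"{name}: {len(hits)} flagged -> {hits}")
```

An entry can land in more than one bucket (the "mixed with" placeholder also contains parentheses), so the buckets are best read as overlapping review lists, not a partition of the table.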

@RaymondBalise, @labouz, what do you recommend we do here?

@gabrielodom gabrielodom added the bug Something isn't working label Jul 13, 2021
@gabrielodom gabrielodom self-assigned this Jul 13, 2021