Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine genders and plurals into comma separated strings for noun que… #564

Merged
merged 8 commits into from
Mar 2, 2025

Conversation

you-think-you-know-me
Copy link
Contributor

Contributor checklist


Description

Modified the SPARQL query for nouns for spanish language such that all gender and plurals associated with a lexmme are shown in a single row as comma separated string.

I have tested the query by running it at https://query.wikidata.org/.

Screenshot from 2025-01-30 14-27-31
Above image shows that for one lexmme there is only one result having combined comma separated string for gender and plural.

Related issue

Combine genders into comma separated strings for noun queries #544

Copy link

github-actions bot commented Jan 30, 2025

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflow within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First PR Commit Check

  • The commit messages for the remote branch of a new contributor should be checked to make sure their email is set up correctly so that they receive credit for their contribution
    • The contributor's name and icon in remote commits should be the same as what appears in the PR
    • If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Data repo (can be set with git config --global user.email "GITHUB_EMAIL")

@andrewtavis
Copy link
Member

Giving this another look, @you-think-you-know-me, I'm wondering if it might make more sense to handle this in post formatting. If we add this kind of group by functionality, then this will also need to go into the query generation functionality, and then it's also going to mean that the queries are going to be a lot slower. Would you have interest in reverting the changes to the query and then updating src/scribe_data/wikidata/format_data.py so that if the value already exists in the dictionary, that a new one will be appended with such that the new value is the old and new in alphabetical order separated by commas?

Let me know how this sounds! This will also mean that we won't have to change the queries one by one 😊

@you-think-you-know-me
Copy link
Contributor Author

Giving this another look, @you-think-you-know-me, I'm wondering if it might make more sense to handle this in post formatting. If we add this kind of group by functionality, then this will also need to go into the query generation functionality, and then it's also going to mean that the queries are going to be a lot slower. Would you have interest in reverting the changes to the query and then updating src/scribe_data/wikidata/format_data.py so that if the value already exists in the dictionary, that a new one will be appended with such that the new value is the old and new in alphabetical order separated by commas?

Let me know how this sounds! This will also mean that we won't have to change the queries one by one 😊

Sounds good, I will make the required changes.

@andrewtavis
Copy link
Member

Great! Looking forward to them :)

@you-think-you-know-me
Copy link
Contributor Author

@andrewtavis I have made the required changes. Please review them.

@andrewtavis
Copy link
Member

@you-think-you-know-me: Sorry there were pressing changes that needed to be brought in :) Would you be able to solve the merge conflicts on this PR? I can help as well if need be :) After that this PR will be the next to be brought in 🚀

@andrewtavis andrewtavis reopened this Feb 10, 2025
@andrewtavis
Copy link
Member

Sorry, closed on accident! Let me know on the rebase, and again happy to help 😊

@andrewtavis
Copy link
Member

Discussing this with @axif0 right now: We may need to comma separate all responses that have the same form, not just gender and plurals :)

@andrewtavis
Copy link
Member

Related issue for this is #573 :)

@andrewtavis
Copy link
Member

@axif0 Will be doing a first review to combine the changes that were recently made to this file :)

@axif0
Copy link
Member

axif0 commented Mar 2, 2025

image

image

Copy link
Member

@andrewtavis andrewtavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great contribution here @you-think-you-know-me and also for the amazing work to connect all of this and resolve #573 as well, @axif0! 🚀

@andrewtavis andrewtavis merged commit 327942d into scribe-org:main Mar 2, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants