Format Query Data adding Last Modified Date #566

Merged (2 commits) on Feb 2, 2025
6 changes: 4 additions & 2 deletions src/scribe_data/wikidata/format_data.py
@@ -51,16 +51,18 @@ def format_data(

Apart from the changes already in the PR, should we also modify the new workflow that's being worked on, check_and_update_missing_query_forms.yaml, so that the generated queries include the additional lastModified segments as well?
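For illustration only, here is a rough sketch of what adding the lastModified segments to a generated query could look like. The helper name `add_last_modified` and the regex-based approach are assumptions made for this example; the workflow's actual query generation code is not shown in this PR and may work quite differently.

```python
import re

# Hypothetical helper, not part of the workflow discussed above.
# Matches the closing lemma triple, e.g. "        wikibase:lemma ?adverb ."
LEMMA_RE = re.compile(
    r"^(?P<indent>[ \t]*)wikibase:lemma (?P<var>\?\w+) \.$", re.MULTILINE
)


def add_last_modified(query: str) -> str:
    """Add ?lastModified to the SELECT and WHERE parts of a query string."""
    if "?lastModified" in query:
        return query  # Already updated, so leave the query untouched.

    match = LEMMA_RE.search(query)
    if match is None:
        return query  # Unknown layout: do not guess.

    lemma_var = match["var"]

    # WHERE: continue the triple with ';' and close it after dateModified.
    query = LEMMA_RE.sub(
        lambda m: (
            f"{m['indent']}wikibase:lemma {m['var']} ;\n"
            f"{m['indent']}schema:dateModified ?lastModified ."
        ),
        query,
        count=1,
    )

    # SELECT: add ?lastModified right after the lemma variable's own line.
    select_re = re.compile(
        rf"^([ \t]*){re.escape(lemma_var)}[ \t]*$", re.MULTILINE
    )
    return select_re.sub(
        lambda m: f"{m.group(0)}\n{m.group(1)}?lastModified", query, count=1
    )


sample_query = """SELECT
    (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
    ?adverb

WHERE {
    ?lexeme dct:language wd:Q13955 ;
        wikibase:lexicalCategory wd:Q380057 ;
        wikibase:lemma ?adverb .
}
"""

updated_query = add_last_modified(sample_query)
```

The early return makes the helper safe to run repeatedly over queries that already include the field.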

True yes :)

We also need the gender concatenations for #564 to be included in the query generations 🤔

But then I worry a bit about putting GROUP BY into all of the queries like this. Maybe the thing to do is handle this in the formatting script, where values are concatenated if they're already present in the resulting data. Spanish noun gender just isn't really a useful field with the way the data has been structured.

And also, I just remembered that @axif0 mentioned that this has been added to the other PR in e363282 😊
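As a minimal sketch of the "concatenate in the formatting script" idea mentioned above (as opposed to a GROUP BY in every query): the function name `merge_rows`, the `" | "` separator, and the sample rows are all assumptions made for illustration, not code from either PR.

```python
def merge_rows(data_list, separator=" | "):
    """Merge rows that share a lexemeID, concatenating differing values."""
    merged = {}

    for row in data_list:
        entry = merged.setdefault(row["lexemeID"], {})

        for key, value in row.items():
            if key == "lexemeID" or not value:
                continue

            if key not in entry:
                # First value seen for this field.
                entry[key] = value

            elif value not in entry[key].split(separator):
                # New value for an existing field: append instead of overwrite.
                entry[key] = f"{entry[key]}{separator}{value}"

    return merged


# Made-up rows: one lexeme returned twice because it has two genders.
sample_rows = [
    {"lexemeID": "L1", "noun": "mar", "gender": "masculine"},
    {"lexemeID": "L1", "noun": "mar", "gender": "feminine"},
]

merged = merge_rows(sample_rows)
```

With this approach the queries stay simple and the duplicate rows collapse into a single entry whose gender field holds both values.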

So we're good to go here :) :)

> And also, I just remembered that @axif0 mentioned that this has been added to the other PR

Ah nice!
I hadn't seen that before - good stuff! 👍 way to go

     for data_vals in data_list:
         lexeme_id = data_vals["lexemeID"]
+        modified_date = data_vals["lastModified"]

         if lexeme_id not in data_formatted:
-            data_formatted[lexeme_id] = {}
+            data_formatted[lexeme_id] = {modified_date: {}}

         # Reverse to make sure that we're getting the same order as the query.
         query_identifiers = list(reversed(data_vals.keys()))
         query_identifiers.remove("lexemeID")
+        query_identifiers.remove("lastModified")

         for k in query_identifiers:
-            data_formatted[lexeme_id][k] = data_vals[k]
+            data_formatted[lexeme_id][modified_date][k] = data_vals[k]

     data_formatted = collections.OrderedDict(sorted(data_formatted.items()))
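To make the resulting shape concrete, here is a small standalone sketch of the nesting this change introduces: lexeme ID, then last-modified timestamp, then the form fields. It is not the project's `format_data` function. The name `format_rows`, the use of `setdefault`, and the sample rows are assumptions for illustration, and the sketch skips the key reversal done in the real code.

```python
from collections import OrderedDict


def format_rows(data_list):
    """Group query rows by lexeme ID, then by last-modified timestamp."""
    data_formatted = {}

    for data_vals in data_list:
        lexeme_id = data_vals["lexemeID"]
        modified_date = data_vals["lastModified"]

        # setdefault returns the existing nested dict when one is present,
        # so repeated IDs or timestamps reuse it instead of replacing it.
        by_date = data_formatted.setdefault(lexeme_id, {})
        forms = by_date.setdefault(modified_date, {})

        for key, value in data_vals.items():
            if key not in ("lexemeID", "lastModified"):
                forms[key] = value

    # Sort by lexeme ID, as the original code does.
    return OrderedDict(sorted(data_formatted.items()))


# Made-up rows standing in for query results.
sample_rows = [
    {"lexemeID": "L2", "lastModified": "2025-01-30T08:00:00Z", "adverb": "b"},
    {"lexemeID": "L1", "lastModified": "2025-01-15T12:00:00Z", "adverb": "a"},
]

result = format_rows(sample_rows)
```

Each lexeme ID now maps to a dictionary keyed by timestamp rather than directly to its forms, so any consumer of the formatted data has to go one level deeper than before.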

@@ -33,11 +33,13 @@ SELECT
     ?pausalMasculineIndefiniteSingular
     ?pausalMasculineIndefinitePlural
     ?pausalMasculineIndefiniteDual
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q13955 ;
         wikibase:lexicalCategory wd:Q34698 ;
-        wikibase:lemma ?adjective .
+        wikibase:lemma ?adjective ;
+        schema:dateModified ?lastModified .

     # MARK: Nominative

@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?adverb
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q13955 ;
         wikibase:lexicalCategory wd:Q380057 ;
-        wikibase:lemma ?adverb .
+        wikibase:lemma ?adverb ;
+        schema:dateModified ?lastModified .
 }
@@ -5,6 +5,7 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?noun
+    ?lastModified

     ?nominativeFeminineIndefiniteSingular
     ?nominativeFeminineIndefinitePlural
@@ -37,7 +38,8 @@ SELECT
 WHERE {
     ?lexeme dct:language wd:Q13955 ;
         wikibase:lexicalCategory wd:Q1084 ;
-        wikibase:lemma ?noun .
+        wikibase:lemma ?noun ;
+        schema:dateModified ?lastModified .

     # MARK: Nominative

@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?properNoun
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q13955 ;
         wikibase:lexicalCategory wd:Q147276 ;
-        wikibase:lemma ?properNoun .
+        wikibase:lemma ?properNoun ;
+        schema:dateModified ?lastModified .
 }
@@ -5,6 +5,7 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?verb
+    ?lastModified

     ?indicativeFirstPersonSingularFiilMudari
     ?indicativeFirstPersonPluralFiilMudari
@@ -23,7 +24,8 @@ SELECT
 WHERE {
     ?lexeme dct:language wd:Q13955 ;
         wikibase:lexicalCategory wd:Q24905 ;
-        wikibase:lemma ?verb .
+        wikibase:lemma ?verb ;
+        schema:dateModified ?lastModified .

     # MARK: Indicative Present
     OPTIONAL {
@@ -4,26 +4,28 @@

 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
-    ?verb
+    ?verb
+    ?lastModified

-    ?activePerformativeFirstPersonSingular
-    ?activePerformativeFirstPersonPlural
-    ?activePerformativeSecondPersonDual
+    ?activePerformativeFirstPersonSingular
+    ?activePerformativeFirstPersonPlural
+    ?activePerformativeSecondPersonDual

-    ?feminineActivePerformativeSecondPersonSingular
-    ?feminineActivePerformativeSecondPersonPlural
-    ?feminineActivePerformativeThirdPersonSingular
-    ?feminineActivePerformativeThirdPersonDual
+    ?feminineActivePerformativeSecondPersonSingular
+    ?feminineActivePerformativeSecondPersonPlural
+    ?feminineActivePerformativeThirdPersonSingular
+    ?feminineActivePerformativeThirdPersonDual

-    ?masculineActivePerformativeSecondPersonSingular
-    ?masculineActivePerformativeSecondPersonPlural
-    ?masculineActivePerformativeThirdPersonSingular
-    ?masculineActivePerformativeThirdPersonDual
+    ?masculineActivePerformativeSecondPersonSingular
+    ?masculineActivePerformativeSecondPersonPlural
+    ?masculineActivePerformativeThirdPersonSingular
+    ?masculineActivePerformativeThirdPersonDual

 WHERE {
     ?lexeme dct:language wd:Q13955 ;
         wikibase:lexicalCategory wd:Q24905 ;
-        wikibase:lemma ?verb .
+        wikibase:lemma ?verb ;
+        schema:dateModified ?lastModified .

     # MARK: Performative Past

@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?adjective
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q8752 ;
         wikibase:lexicalCategory wd:Q34698 ;
-        wikibase:lemma ?adjective .
+        wikibase:lemma ?adjective ;
+        schema:dateModified ?lastModified .
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?adverb
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q8752;
         wikibase:lexicalCategory wd:Q380057 ;
-        wikibase:lemma ?adverb .
+        wikibase:lemma ?adverb ;
+        schema:dateModified ?lastModified .
 }
@@ -7,11 +7,13 @@ SELECT
     ?absIndefinite
     ?absolutiveSingular
     ?absolutivePlural
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q8752 ;
         wikibase:lexicalCategory wd:Q1084 ;
-        wikibase:lemma ?absIndefinite .
+        wikibase:lemma ?absIndefinite ;
+        schema:dateModified ?lastModified .

     # MARK: Absolutive Singular

@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?preposition
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q8752 ;
         wikibase:lexicalCategory wd:Q4833830 ;
-        wikibase:lemma ?preposition .
+        wikibase:lemma ?preposition ;
+        schema:dateModified ?lastModified .
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?properNoun
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q8752 ;
         wikibase:lexicalCategory wd:Q147276 ;
-        wikibase:lemma ?properNoun .
+        wikibase:lemma ?properNoun ;
+        schema:dateModified ?lastModified .
 }
@@ -10,13 +10,15 @@ SELECT
     ?imperfective
     ?nominalized
     ?participle
+    ?lastModified

 WHERE {
     # MARK: Infinitive

     ?lexeme dct:language wd:Q8752 ;
         wikibase:lexicalCategory wd:Q24905 ;
-        wikibase:lemma ?infinitive .
+        wikibase:lemma ?infinitive ;
+        schema:dateModified ?lastModified .

     # MARK: Future

@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?adjective
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q9610 ;
         wikibase:lexicalCategory wd:Q34698 ;
-        wikibase:lemma ?adjective .
+        wikibase:lemma ?adjective ;
+        schema:dateModified ?lastModified .
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?adverb
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q9610 ;
         wikibase:lexicalCategory wd:Q380057 ;
-        wikibase:lemma ?adverb .
+        wikibase:lemma ?adverb ;
+        schema:dateModified ?lastModified .
 }
@@ -8,10 +8,12 @@ SELECT
     ?genitive
     ?accusative
     ?locative
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q9610 ;
         wikibase:lexicalCategory wd:Q1084 ;
+        schema:dateModified ?lastModified .

     # MARK: Nominative

@@ -7,11 +7,13 @@ SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?preposition
     ?grammaticalCase
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q9610 ;
         wikibase:lexicalCategory wd:Q161873 ;
-        wikibase:lemma ?preposition .
+        wikibase:lemma ?preposition ;
+        schema:dateModified ?lastModified .

     # MARK: Case

@@ -6,11 +6,13 @@ SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?preposition
     ?grammaticalCase
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q9610 ;
         wikibase:lexicalCategory wd:Q4833830 ;
-        wikibase:lemma ?preposition .
+        wikibase:lemma ?preposition ;
+        schema:dateModified ?lastModified .

     # MARK: Case

@@ -8,10 +8,12 @@ SELECT
     ?genitive
     ?accusative
     ?locative
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q9610 ;
         wikibase:lexicalCategory wd:Q147276 ;
+        schema:dateModified ?lastModified .

     # MARK: Nominative

@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?verb
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q9610 ;
         wikibase:lexicalCategory wd:Q24905 ;
-        wikibase:lemma ?verb .
+        wikibase:lemma ?verb ;
+        schema:dateModified ?lastModified .
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?adjective
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q727694 ;
         wikibase:lexicalCategory wd:Q34698 ;
-        wikibase:lemma ?adjective .
+        wikibase:lemma ?adjective ;
+        schema:dateModified ?lastModified .
 }
@@ -5,10 +5,12 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?adverb
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q727694 ;
         wikibase:lexicalCategory wd:Q380057 ;
-        wikibase:lemma ?adverb .
+        wikibase:lemma ?adverb ;
+        schema:dateModified ?lastModified .
     FILTER(LANG(?adverb) = "zh")
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?noun
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q727694 ;
         wikibase:lexicalCategory wd:Q1084 ;
-        wikibase:lemma ?noun .
+        wikibase:lemma ?noun ;
+        schema:dateModified ?lastModified .
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?preposition
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q727694 ;
         wikibase:lexicalCategory wd:Q4833830 ;
-        wikibase:lemma ?preposition .
+        wikibase:lemma ?preposition ;
+        schema:dateModified ?lastModified .
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?properNoun
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q727694 ;
         wikibase:lexicalCategory wd:Q147276 ;
-        wikibase:lemma ?properNoun .
+        wikibase:lemma ?properNoun ;
+        schema:dateModified ?lastModified .
 }
@@ -5,9 +5,11 @@
 SELECT
     (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
     ?verb
+    ?lastModified

 WHERE {
     ?lexeme dct:language wd:Q727694 ;
         wikibase:lexicalCategory wd:Q24905 ;
-        wikibase:lemma ?verb .
+        wikibase:lemma ?verb ;
+        schema:dateModified ?lastModified .
 }
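
One possible use of the new field, sketched here under assumptions: finding lexemes edited after a given date in data shaped as lexeme ID, then timestamp, then forms. The function names, the sample data, and the assumption that `schema:dateModified` values arrive as ISO 8601 strings ending in `Z` are illustrative and not taken from this PR.

```python
from datetime import datetime, timezone


def parse_last_modified(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-02-01T09:30:00Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11,
    # so swap it for an explicit UTC offset to stay portable.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def modified_since(data_formatted, cutoff):
    """Return the lexeme IDs with at least one timestamp after the cutoff."""
    return [
        lexeme_id
        for lexeme_id, by_date in data_formatted.items()
        if any(parse_last_modified(ts) > cutoff for ts in by_date)
    ]


# Made-up data in the nested shape: lexeme ID -> timestamp -> forms.
sample_data = {
    "L1": {"2025-01-15T12:00:00Z": {"verb": "a"}},
    "L2": {"2025-02-01T09:30:00Z": {"verb": "b"}},
}

cutoff = datetime(2025, 1, 31, tzinfo=timezone.utc)
recent = modified_since(sample_data, cutoff)
```

The cutoff is timezone-aware on purpose, since comparing an aware timestamp against a naive one raises a TypeError in Python.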