
BBIQ push command & JSON testdata #50

Merged: 12 commits merged into metabrainz:master from bbiq-push-file on Aug 6, 2024

Conversation

kellnerd (Contributor) commented on Jul 22, 2024

The BBIQ CLI got a new command which lets you push the contents of a JSON file directly into the queue, which is useful for testing and debugging.
Especially with the upcoming entity import schema change in mind, I wanted to make the current BB importer setup more battle-proofed before I go and break everything again.

So I've produced a few files with test data which cover all currently available properties of the following entity types:

  • Author
  • Edition (requires an edition group BBID)
  • Edition Group
  • Publisher
  • Series
  • Work

I am going to mark this PR as a draft until I have tested the remaining entity types as much as possible (but without making changes which would have to be redone after the schema change).

The main issue which I discovered with these tools was the lack of support for annotations on imported entities; see the linked PRs for the individual issues.

kellnerd marked this pull request as ready for review on July 24, 2024, 21:25
kellnerd (Contributor, Author) commented on Jul 24, 2024

I think this is ready now; I don't want to get it stuck on the failing approval of imported editions and series. All other entity types can be imported, viewed, edited and approved.


For series imports, approval only fails at the SQL level; everything else should be working:

error: insert into "bookbrainz"."series" ("alias_set_id", "annotation_id", "bbid", "disambiguation_id", "entity_type", "identifier_set_id", "ordering_type_id", "revision_id") values ($1, $2, $3, $4, $5, $6, $7, $8) returning * - cannot insert into view "series"

I have no clue what is wrong here; such an insert statement works for the views of all other entity types which I could test. Only for series does it fail, via both code paths and directly in psql, so something about this view must be different from the rest?

Solution: The triggers for series were missing in the minimal DB; creating them solves the issue:

cat sql/scripts/create_triggers.sql | docker exec -i postgres psql -U bookbrainz -d bookbrainz_min
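For context on why the plain insert fails: the entity "tables" here are views over multiple underlying tables, and such views are not auto-updatable in PostgreSQL, so an INSERT only works when an INSTEAD OF trigger routes the row into the real tables. Below is a minimal sketch of that mechanism, using made-up table and column names rather than the actual definitions from create_triggers.sql:

```sql
-- Sketch with hypothetical names (not the real BookBrainz schema): a view
-- joining two tables is not auto-updatable, so inserting into it raises
-- "cannot insert into view" unless an INSTEAD OF trigger handles the row.
CREATE TABLE series_header (bbid uuid PRIMARY KEY);
CREATE TABLE series_data (bbid uuid REFERENCES series_header (bbid), name text);
CREATE VIEW series_view AS
    SELECT h.bbid, d.name
    FROM series_header h
    JOIN series_data d USING (bbid);

CREATE FUNCTION series_view_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO series_header (bbid) VALUES (NEW.bbid);
    INSERT INTO series_data (bbid, name) VALUES (NEW.bbid, NEW.name);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER series_view_insert_trg
    INSTEAD OF INSERT ON series_view
    FOR EACH ROW EXECUTE FUNCTION series_view_insert();
```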

For edition imports, the approval fails during the automatic creation of an edition group (for both the "Approve" and the "Edit & Approve" action):

error: insert into "bookbrainz"."edition_group" ("alias_set_id", "author_credit_id", "bbid", "credit_section", "revision_id") values ($1, DEFAULT, $2, DEFAULT, $3) returning * - column "author_credit_id" of relation "edition_group" does not exist
From previous event: [...]
     at createEditionGroupForNewEdition (/home/bookbrainz/bookbrainz-site/node_modules/bookbrainz-data/lib/util.js:236:6)
     at autoCreateNewEditionGroup (/home/bookbrainz/bookbrainz-site/node_modules/bookbrainz-data/lib/models/entities/edition.js:41:31)
     at Child.<anonymous> (/home/bookbrainz/bookbrainz-site/node_modules/bookbrainz-data/lib/models/entities/edition.js:74:11)

Solution:

cat sql/migrations/2021-author-credits/up.sql | docker exec -i postgres psql -U bookbrainz -d bookbrainz_min
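If someone hits the same error on another minimal DB, here is a hypothetical sanity check (not part of this PR) to confirm whether the author credit migration has already been applied:

```sql
-- Hypothetical check: the 2021 author credit migration should make this column
-- visible on the edition_group relation; an empty result means the migration
-- above still needs to be applied.
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'bookbrainz'
  AND table_name = 'edition_group'
  AND column_name = 'author_credit_id';
```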

The section below is already outdated; the remaining issues have been addressed in metabrainz/bookbrainz-site#1109.

I didn't have time to fully investigate that error so far, but my test import data definitely had a valid edition group BBID, so there should have been no need to create an EG automatically. It doesn't show up in the edition editor though, so it might have been lost at an intermediate transformation step.

But this still doesn't explain why the automatic creation fails.
It seems that the error occurs because the minimal DB has an outdated edition_group view (and edition_group_data table) from the days before author credits existed...
As I already said, time to use a full DB dump 😅

kellnerd (Contributor, Author) commented:
I've enhanced the author testdata with identifiers and more aliases. With both of these working fine, I think I have reached the limit of what data can currently be imported.

P.S. We should replace the numeric IDs at the parsed entity level soon; it is cumbersome to obtain them in every producer implementation.
Additionally, specifying an invalid ID can lead to obscure errors which do not surface in the logs.

Example: Try changing an author alias's language ID from 134 to 34 and see how the real issue is hidden by the following error message:

error: Transaction for 'Test Author' (Author A1234) failed: error: insert into "bookbrainz"."identifier" ("type_id", "value") values ($1, $2) returning * - current transaction is aborted, commands ignored until end of transaction block

Rather than the last(?) statement of a failing transaction, the log should show the statement that caused the failure. I only discovered the real cause by gradually removing the newly added identifiers:

error: Transaction for 'Test Author' (Author A1234) failed: error: insert into "bookbrainz"."alias" ("language_id", "name", "primary", "sort_name") values ($1, $2, $3, $4) returning * - insert or update on table "alias" violates foreign key constraint "alias_language_id_fkey"
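For anyone wondering why the misleading message appears at all: this is standard PostgreSQL behaviour inside a transaction, roughly as in the following illustration (the statements and values are made up for demonstration, not taken from the importer):

```sql
-- Illustration only: once one statement in a transaction fails, PostgreSQL
-- rejects every subsequent statement with "current transaction is aborted",
-- so the logged error points at a later statement instead of the one that
-- actually caused the failure.
BEGIN;
INSERT INTO bookbrainz.alias (name, sort_name, language_id, "primary")
    VALUES ('Test Author', 'Test Author', 34, true);  -- fails: FK violation (language 34 does not exist)
INSERT INTO bookbrainz.identifier (type_id, value)
    VALUES (1, 'test');  -- this one is reported as "current transaction is aborted"
ROLLBACK;
```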

MonkeyDo (Member) commented on Aug 6, 2024

> I've enhanced the author testdata with identifiers and more aliases. With both of these working fine, I think I have reached the limit of what data can currently be imported.

On this topic, I just found out (or forgot and found out again) that OL publishes JSON-schema files for their dumps, which could be used to cross-reference whether we are missing any data points; perhaps they could even be useful for parsing and validating OL imports and giving us useful error messages?
https://github.com/internetarchive/openlibrary-client/tree/master/olclient/schemata

> Rather than the last(?) statement of a failing transaction, the log should show the statement that caused the failure. I only discovered the real cause by gradually removing the newly added identifiers:

That's a pickle. I'm trying to see if it's a known issue, but apart from 10-year-old issues related to Promise.all I can't find anything specifically similar to our case.
Maybe we are using nested try-catch blocks in a way that knex doesn't like, but it's going to be a PITA to test that hypothesis.

MonkeyDo (Member) left a review comment:


Thanks for making useful debugging commands!
Sure beats manually entering sample data in the database :)

MonkeyDo merged commit 734abc0 into metabrainz:master on Aug 6, 2024
kellnerd deleted the bbiq-push-file branch on August 12, 2024
kellnerd (Contributor, Author) commented:
> On this topic, I just found out (or forgot and found out again) that OL publishes JSON-schema files for their dumps, which could be used to cross-reference whether we are missing any data points; perhaps they could even be useful for parsing and validating OL imports and giving us useful error messages?

Yeah, I am already aware of these; I had converted them into type definitions a month ago before I focused on other tasks.
That should be sufficient to handle all data points from the dumps which we are missing so far.
We will see whether all entries follow that schema or whether we need additional runtime validation once I get back to the OL importer.
