-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port schema to MySQL 5.7 / MariaDB 10.2 #203
Conversation
87b2fb5
to
7bf3c7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you spin this up for a test without any other compatibility issues?
Good catch! I completely forgot the GHA! |
1c25cc3
to
74d69b5
Compare
74d69b5
to
0e55777
Compare
0e55777
to
1758cb9
Compare
Then, why is the |
Because it's not mandatory for porting the schema. Just an accidental finding. |
Then open a bug, referencing the Web issue. Don't reference it here, if you don't plan to change it here. |
Whether I change something here depends on whether this PR or Icinga/icinga-notifications-web#196 is completed first. |
That's fine. But then create a separate issue! It doesn't magically change otherwise. This PR is about the MySQL port, not about fixes in the Postgres schema. |
1758cb9
to
e0f1e14
Compare
09175f1
to
058eb59
Compare
50e0d37
to
1690d69
Compare
1690d69
to
011fa09
Compare
Continuing #203 (review). Turns out, the icinga-notifications/internal/object/object.go Lines 118 to 121 in edaa61c
To be continued... again. |
Continuing #203 (review), #203 (comment). The A
While I am not completely certain, I would guess based on
After some tinkering, I was able to reproduce this with to conflicting sessions. First, I selected the maximum
Furthermore, I tried adding a new For comparison, the query plan generated by PostgreSQL looks identical, at least for the selection part.
|
If MySQL isn't able to handle what we do in concurrent transactions, we have to change something anyways. Even without understanding the exact cause yet, I think there are some obvious things that we can try:
|
9653e73
to
0406d24
Compare
Even though I am not the biggest fan of retrying everything, it might be a strategy here. Especially since situations like this could be lurking anywhere in the codebase.
After brainstorming together with @yhabteab, he suggested the same. Unfortunately, this does not worked either. Thus, I just moved the logic from the database into the code, which worked, but isn't something I am proud of. diff --git a/internal/object/object.go b/internal/object/object.go
index fc15c26..3a3e9c9 100644
--- a/internal/object/object.go
+++ b/internal/object/object.go
@@ -114,14 +114,26 @@ func FromEvent(ctx context.Context, db *database.DB, ev *event.Event) (*Object,
return nil, fmt.Errorf("failed to upsert object id tags: %w", err)
}
- extraTag := &ExtraTagRow{ObjectId: newObject.ID}
- _, err = tx.NamedExecContext(ctx, `DELETE FROM "object_extra_tag" WHERE "object_id" = :object_id`, extraTag)
+ var tags []string
+ stmt = db.Rebind(`SELECT "tag" FROM "object_extra_tag" WHERE "object_id" = ?`)
+ err = tx.SelectContext(ctx, &tags, stmt, newObject.ID)
if err != nil {
- return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
+ return nil, fmt.Errorf("failed to fetch object_extra_tags: %w", err)
+ }
+
+ stmt = db.Rebind(`DELETE FROM "object_extra_tag" WHERE "object_id" = ? AND "tag" = ?`)
+ for _, tag := range tags {
+ if _, ok := ev.ExtraTags[tag]; ok {
+ continue
+ }
+ _, err := tx.ExecContext(ctx, stmt, newObject.ID, tag)
+ if err != nil {
+ return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
+ }
}
if len(ev.ExtraTags) > 0 {
- stmt, _ := db.BuildInsertStmt(extraTag)
+ stmt, _ := db.BuildUpsertStmt(&ExtraTagRow{ObjectId: newObject.ID})
_, err = tx.NamedExecContext(ctx, stmt, mapToTagRows(newObject.ID, ev.ExtraTags))
if err != nil {
return nil, fmt.Errorf("failed to insert object extra tags: %w", err) Edit: This might just work because the |
b686876
to
d6c3668
Compare
In general, those should stay unchanged most of the time. So in combination with retrying on deadlocks, this might be good enough. |
d6c3668
to
737b302
Compare
Out of curiosity, I changed the code to
While starting to question what "database engine" even means1, I would second the idea to retry deadlocks. Footnotes
|
737b302
to
51c815e
Compare
I don't know exactly how that's even possible but here you're. 2024-07-17T16:02:11.160+0200 INFO incident Contact "Icinga Admin" role changed from subscriber to manager {"object": "docker-master!swap", "incident": "#5"}
2024-07-17T16:02:11.185+0200 ERROR incident Can't insert incident event to the database {"object": "docker-master!swap", "incident": "#5", "error": "Error 1062 (23000): Duplicate entry '5-348' for key 'PRIMARY'"}
2024-07-17T16:02:11.185+0200 ERROR icinga2 Cannot process event {"source-id": 1, "event": "[time=2024-07-17 16:02:11.150006 +0200 CEST m=+0.912499501 type=\"acknowledgement-set\" severity=]", "error": "can't insert incident event to the database"}
---
2024-07-17T16:08:16.259+0200 ERROR incident Can't insert incident event to the database {"object": "docker-master!swap", "incident": "#5", "error": "Error 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (\"notifications\".\"incident_event\", CONSTRAINT \"fk_incident_event_event\" FOREIGN KEY (\"event_id\") REFERENCES \"event\" (\"id\"))"}
2024-07-17T16:08:16.260+0200 ERROR icinga2 Cannot process event {"source-id": 1, "event": "[time=2024-07-17 16:08:16.231294 +0200 CEST m=+0.969004418 type=\"acknowledgement-set\" severity=]", "error": "can't insert incident event to the database"} |
Fixed in #243 🙈 |
Regarding this error, @oxzi pointed the following out to me: It should currently only be possible to get this error within the test cases. The code issuing the queries running into a conflict are actually done while holding a mutex: icinga-notifications/internal/object/object.go Lines 75 to 133 in edaa61c
Thus, an Icinga Notifications daemon won't deadlock with itself here. However, when running the tests, So I took the liberty to prepare a commit (8a824f1) that would be ready to use here that disables this behavior. And indeed, with it, the tests work reliably: https://github.com/Icinga/icinga-notifications/actions/runs/9977349345 Only running a single test case against a given database sounds like a good idea in general, so I consider setting (There's still the issue that we probably want to be able to perform these database queries in parallel at some point, but that's a story for another day and PR.) @Al2Klimov Feel free to adopt this commit in this PR, that will save you a rebase (otherwise, I'd create it as an independent PR). |
Apart from #203 (comment) looks fine for me, I don't know exactly why you replaced |
By default, multiple packages may be tested in parallel in different processes. Passing -p 1 reduces this to one process to prevent test cases in different packages from accessing the same database. Note that t.Parallel() only affects parallelism within one process, i.e. within the tests of one package. Co-authored-by: Alvar Penning <[email protected]>
8a824f1
to
20717c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LFTM.
One last thought regarding #203 (comment): I just had the idea to explicitly use index hints and gave it a last try. First, I have added another CREATE INDEX idx_object_extra_tag_object_id_tag ON object_extra_tag(object_id, tag); Then, I crafted a
As one can see, this query would now be of the Click here to see the abandoned diff
diff --git a/internal/object/object.go b/internal/object/object.go
index 9e54253..be0591e 100644
--- a/internal/object/object.go
+++ b/internal/object/object.go
@@ -114,8 +114,22 @@ func FromEvent(ctx context.Context, db *database.DB, ev *event.Event) (*Object,
return nil, fmt.Errorf("failed to upsert object id tags: %w", err)
}
+ if tx.DriverName() == database.MySQL {
+ // Without referring an index, the query will choose the suboptimal "range" join type query strategy. Doing so
+ // will result in a next-key lock, resulting in a potential deadlock.
+ // https://github.com/Icinga/icinga-notifications/pull/203#issuecomment-2232740619
+ //
+ // Note: MySQL/MariaDB only supports indexes for DELETEs when using multi-table statements.
+ // https://dev.mysql.com/doc/refman/8.0/en/index-hints.html
+ stmt = `DELETE object_extra_tag.* FROM "object_extra_tag" USE INDEX (idx_object_extra_tag_object_id_tag)`
+ } else {
+ // PostgreSQL just does the right thing by default :)
+ stmt = `DELETE FROM "object_extra_tag"`
+ }
+ stmt += ` WHERE "object_id" = :object_id`
+
extraTag := &ExtraTagRow{ObjectId: newObject.ID}
- _, err = tx.NamedExecContext(ctx, `DELETE FROM "object_extra_tag" WHERE "object_id" = :object_id`, extraTag)
+ _, err = tx.NamedExecContext(ctx, stmt, extraTag)
if err != nil {
return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
}
diff --git a/schema/mysql/schema.sql b/schema/mysql/schema.sql
index 37040cf..8810f14 100644
--- a/schema/mysql/schema.sql
+++ b/schema/mysql/schema.sql
@@ -269,6 +269,10 @@ CREATE TABLE object_extra_tag (
CONSTRAINT fk_object_extra_tag_object FOREIGN KEY (object_id) REFERENCES object(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
+-- This index is identical to the primary key, but is allowed to be used as an index hint in both MySQL and MariaDB.
+-- Using it for DELETE statements optimizes the query's "join type" from "range" to "ref".
+CREATE INDEX idx_object_extra_tag_object_id_tag ON object_extra_tag(object_id, tag);
+
CREATE TABLE event (
id bigint NOT NULL AUTO_INCREMENT,
time bigint NOT NULL, Footnotes
|
I'm not entirely sure what the purpose of that second identical index would have been here. Should it have some or did you just miss the following sentence in the documentation? 😅
|
Honestly, I was not quite sure if this would work in MariaDB, thus I wanted to be certain. For example, MariaDB's "documentation" does not mention I just tested it, and both |
fixes #154
Required
Preferred
Optional