Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/icingadb-migrate: if Downtime cancel time is missing, fall back to scheduled end #622

Conversation

Al2Klimov
Copy link
Member

@Al2Klimov Al2Klimov commented Jul 27, 2023

sla_history_downtime#downtime_end is required. In case of Downtime cancellation it's the cancel time. If it's (unexpectedly) missing, assume the scheduled end.

fixes #621

Reproduction

  • Cancel a Downtime using IDO
  • update icinga_downtimehistory set actual_end_time=null, actual_end_time_usec=0;
  • Migrate the history

Before

2023-07-27T12:14:30.458+0200	FATAL	icingadb-migrate/main.go:449	Error 1048 (23000): Column 'downtime_end' cannot be null
can't perform "INSERT INTO \"sla_history_downtime\" (\"endpoint_id\", \"object_type\", \"host_id\", \"service_id\", \"downtime_end\", \"downtime_start\", \"downtime_id\", \"environment_id\") VALUES (:endpoint_id, :object_type, :host_id, :service_id, :downtime_end, :downtime_start, :downtime_id, :environment_id) ON DUPLICATE KEY UPDATE \"endpoint_id\" = \"endpoint_id\""

After

2023-07-27T12:15:17.868+0200	INFO	icingadb-migrate/main.go:106	Actually migrating
ack & comment   0 %    0s  0/s
downtime 100 %    0s  0/s
flapping   0 %    0s  0/s
notification   0 %    0s  0/s
state   0 %    0s  0/s
2023-07-27T12:15:17.887+0200	INFO	icingadb-migrate/main.go:109	Cleaning up cache

…o scheduled end

sla_history_downtime#downtime_end is required. In case of Downtime cancellation
it's the cancel time. If it's (unexpectedly) missing, assume the scheduled end.
@Al2Klimov Al2Klimov added the bug Something isn't working label Jul 27, 2023
@Al2Klimov Al2Klimov added this to the 1.2.0 milestone Jul 27, 2023
@cla-bot cla-bot bot added the cla/signed label Jul 27, 2023
Copy link
Contributor

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out where these rows are coming from: schedule a flexible downtime in the future and remove it immediately afterwards (before it started/triggered). This resulted in a IDO history entry like this one:

MariaDB [icingaido]> SELECT * FROM icinga_downtimehistory WHERE was_cancelled = 1 AND actual_end_time IS NULL ORDER BY downtimehistory_id DESC LIMIT 1 \G
*************************** 1. row ***************************
    downtimehistory_id: 27670
           instance_id: 1
         downtime_type: 1
             object_id: 55556
            entry_time: 2023-07-27 14:28:26
           author_name: icingaadmin
          comment_data: hello
  internal_downtime_id: 18
       triggered_by_id: NULL
              is_fixed: 0
              duration: 7200
  scheduled_start_time: 2023-08-27 14:28:14
    scheduled_end_time: 2023-09-27 15:28:14
           was_started: 0
     actual_start_time: NULL
actual_start_time_usec: 0
       actual_end_time: NULL
  actual_end_time_usec: 0
         was_cancelled: 1
          is_in_effect: 0
          trigger_time: NULL
                  name: master-1!keep-state!7bbfd6b9-4824-4428-a471-788d5c00240f
    endpoint_object_id: 239
1 row in set (0.000 sec)

Note was_started = 0. From a quick look, this column seems to be completely ignored by the migration tool so with this change, the migration would add some bogus downtime in the future that was actually never in effect.

So rows with was_started = 0 should probably just be ignored (I think Icinga DB would write no history in this case). And then the question is if there are any remaining cases where we should fix/ignore rows.

@Al2Klimov
Copy link
Member Author

And then the question is if there are any remaining cases where we should fix/ignore rows.

Let's keep this and let the reporter test just the successor PR in 1.1.1, then we'll see.

@julianbrost
Copy link
Contributor

Do we still want this PR after #623?

If yes: do we want to try to fix up things or would it be better to say "well, the downtime was cancelled, we don't know when, so if we'd try to fix it up, it would be wrong for sure, so consider it broken and discard it"?

@Al2Klimov
Copy link
Member Author

Do we still want this PR after #623?

I don't see a reason for it, but I kept it open for the case you do.

@julianbrost julianbrost removed this from the 1.1.1 milestone Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cla/signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants