Skip to content

Commit

Permalink
Fix issue that would cause the poller not to panic correctly (#1030)
Browse files Browse the repository at this point in the history
  • Loading branch information
farhatahmad authored Oct 18, 2023
1 parent 1aa21c6 commit f2deb3e
Showing 1 changed file with 17 additions and 1 deletion.
18 changes: 17 additions & 1 deletion lib/tasks/poll.rake
Original file line number Diff line number Diff line change
Expand Up @@ -96,10 +96,26 @@ namespace :poll do
next if server.increment_unhealthy < Rails.configuration.x.server_unhealthy_threshold

Rails.logger.warn("Server id=#{server.id} is unhealthy. Panicking and setting offline...")
Rake::Task['servers:panic'].invoke(server.id, true) # Panic server to clear meetings

meetings = Meeting.all.select { |m| m.server_id == server.id }
meetings.each do |meeting|
puts("Clearing Meeting id=#{meeting.id}")
moderator_pw = meeting.try(:moderator_pw)
meeting.destroy!
get_post_req(encode_bbb_uri('end', server.url, server.secret, meetingID: meeting.id, password: moderator_pw))
rescue ApplicationRedisRecord::RecordNotDestroyed => e
raise("ERROR: Could not destroy meeting id=#{meeting.id}: #{e}")
rescue StandardError => e
puts("WARNING: Could not end meeting id=#{meeting.id}: #{e}")
end

server.reset_counters
server.load = nil
server.online = false
server.meetings = 0
server.users = 0
server.largest_meeting = 0
server.videos = 0
ensure
begin
server.save!
Expand Down

1 comment on commit f2deb3e

@defnull
Copy link
Contributor

@defnull defnull commented on f2deb3e Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It I understand the code right, there are still two potential race conditions in that code path:

  1. Meetings can be created between the Meeting.all.select line, and the meeting.save! line that finally marks the server as offline and prevents new meetings. Those new meetings will not be destroyed.
  2. Meetings could be ended via the normal API while the meetings.each loop still runs. The meeting.destroy! line will then raise a RecordNotDestroyed error which is re-raised and ends the loop early. All remaining meetings will not be destroyed.

Both are actually not that unlikely, because the loop tries to send lots of end-meeting API calls to a BBB server that we already know is unresponsive.

To avoid those race condition, the server should be marked as offline first, then all meetings should be stopped and destroyed. Any errors should be suppressed, not re-raised.

Please sign in to comment.