Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

push_async executing forever #75

Closed
brlo opened this issue Aug 24, 2018 · 4 comments
Closed

push_async executing forever #75

brlo opened this issue Aug 24, 2018 · 4 comments

Comments

@brlo
Copy link

brlo commented Aug 24, 2018

Hello.
I use Apnotic push_async in Sidekiq.
The fix from this issue #68 resolve problem when Sidekiq is crushed because of exception in main thread. But from time to time one of my 10 Sidekiq workers is stuck forever with job where we need to send a push async.
I've explored this problem and found what if SocketError happened, for example, here in socket_loop https://github.com/ostinelli/net-http2/blob/master/lib/net-http2/client.rb#L142 (to reproduce just raise it here) next try of push_async will stuck here https://github.com/ostinelli/apnotic/blob/master/lib/apnotic/connection.rb#L83
streams_available? will always be false because SocketError resetting @client.remote_settings to default value from http-2 gem https://github.com/igrigorik/http-2/blob/master/lib/http/2/connection.rb#L11-L19
and remote_max_concurrent_streams always return zero because of that condition https://github.com/ostinelli/apnotic/blob/master/lib/apnotic/connection.rb#L94-L98.

To find this bug I've used monkey patch:

module Apnotic
  class Connection
    private
    def delayed_push_async(push)
      i = 1
      until streams_available? do
        if i < 5000
          sleep 0.001
          i+=1
        else
          raise StandardError.new("Timeout 5s on Apnotic::Connection#delayed_push_async: #{ @client.remote_settings[:settings_max_concurrent_streams] }/#{ @client.stream_count }")
        end
      end
      @client.call_async(push.http2_request)
    end
  end
end

And, as workaround - this monkey patch:

module Apnotic
  class Connection
    private
    def remote_max_concurrent_streams
      # 0x7fffffff is the default value from http-2 gem (2^31)
      if @client.remote_settings[:settings_max_concurrent_streams] == 0x7fffffff
        1
      else
        @client.remote_settings[:settings_max_concurrent_streams]
      end
    end
  end
end

So, I just changed 0 to 1. But actually I didn't understand why we use 0 if we have default value from http-2 gem. If it means some connection troubles and because of that we say that we can't use any concurrent connection, so maybe better raise some kind of exception.

Also, important to know, what another Sidekiq workers continue to successfully use push_async until next SocketError happened in another worker and it also will stuck. If I apply monkey patch, which break loop after 5 seconds, the next push_async in same worker will also trapped in this loop. So, only that worker, which raise SocketError goes into degraded state.

useful comment: #64 (comment)

@ostinelli
Copy link
Owner

Please see if latest 1.6.0 solves your issue.

@wogg
Copy link

wogg commented Oct 14, 2019

We appear to be having this exact issue as well, with this combination (yes, 1.6.0):

   apnotic (1.6.0)
      connection_pool (~> 2)
      net-http2 (>= 0.18, < 2)

Correction: on further investigation, it seems our batch sometimes gets stuck waiting forever on .join of the items that have been pushed by push_async:

.../vendor/ruby/2.3.0/gems/net-http2-0.18.2/lib/net-http2/client.rb:59:in `sleep'
.../vendor/ruby/2.3.0/gems/net-http2-0.18.2/lib/net-http2/client.rb:59:in `join'
.../vendor/ruby/2.3.0/gems/apnotic-1.6.0/lib/apnotic/connection.rb:76:in `join'

are where the processes are when I send TERM
I'll move this to a separate topic... sorry for noise,

@brlo
Copy link
Author

brlo commented Oct 14, 2019

Please see if latest 1.6.0 solves your issue.

Thank you. My issue was resolved in 1.6.0.
Tested in production about 2 weeks.

@benubois
Copy link
Collaborator

benubois commented Sep 7, 2021

Please open a new issue if you're still seeing this on 1.6.1.

@benubois benubois closed this as completed Sep 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants