Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTP: Update ssl to make receiving data before handshake easier #243

Open
essen opened this issue Jun 12, 2019 · 11 comments
Open

OTP: Update ssl to make receiving data before handshake easier #243

essen opened this issue Jun 12, 2019 · 11 comments
Labels
OTP Work on Erlang/OTP itself.

Comments

@essen
Copy link
Member

essen commented Jun 12, 2019

Currently we are using a hack and that's fine but should eventually be made better. See https://bugs.erlang.org/browse/ERL-757 for a short discussion on the topic.

@essen essen changed the title Update OTP's ssl application to make receiving data before handshake easier OTP: Update ssl to make receiving data before handshake easier Jun 21, 2019
@essen essen added the OTP Work on Erlang/OTP itself. label Jul 18, 2019
@essen
Copy link
Member Author

essen commented Feb 2, 2020

Maybe update ranch_ssl to listen using TCP and have TCP/TLS options separate. Alternatively provide TLS options in handshake only, provide a separate option for validation, or something. Anyway the idea is to keep the options separated.

@essen
Copy link
Member Author

essen commented Feb 2, 2020

Ranch 3.0 I guess?

@bullno1
Copy link

bullno1 commented Sep 22, 2021

I am having a very strange bug while using ssl with PROXY protocol.
The ssl process hangs in waiting while the controlling_process is already dead.
As a result, the underlying socket is still kept open.

Can't confirm it yet but it seems this could be the cause since it could trigger some unexpected state/race condition in ssl.

There are too many moving parts in my infra so I can't create a minimal example yet.

@essen
Copy link
Member Author

essen commented Sep 22, 2021

It's possible the hack is no longer working. Can you check which OTP version you are starting to have the problem with?

@bullno1
Copy link

bullno1 commented Sep 22, 2021

OTP-23, cowboy 2.9.0.
The app was built based on erlang:23-alpine image.

I can reliably trigger this by:

  1. Setup ssl in cowboy with proxy_header => true
  2. Connect directly with curl. Ranch will complain with: 'The PROXY protocol header signature was not recognized. (PP 2.1, PP 2.2)'
  3. curl will hang but I close it with Ctrl+C
  4. With recon and length(recon:tcp()) I can confirm that the number of sockets goes up.
    ranch:info() however, shows {active_connections,0} since ranch already "let go" of the connection.

Then I dig around with recon:port_info and recon:get_state to inspect the ssl process handling the gen_tcp socket to find out that the process held in the record connection_env (registered in ssl_gen_statem:new_user/2 as part of calling ssl:controlling_process) was already dead.

I know ranch is expecting a PROXY header but I think ssl should also get to teardown the connection.

All this happens in kubernetes so there's also that piece of the puzzle...
Will try to create a minimal example later now that it's narrowed down.

Update 2: I can confirm that the load balancer is being dumb and it ... randomly injects a null byte at the beginning of the PROXY header.
That said, I'd still expect a clean teardown of the connection instead of hanging.
It seems to be because cowboy_tls.erl crash on badmatch (flatlog is being used):

when=2021-09-22T18:59:55.117203+00:00 level=error pid=<0.682.0> at=: label={error_logger,error_msg} format="Ranch listener ~p had connection process started with ~p:start_link/4 at ~p exit with reason: ~999999p~n" args="[overlord_https,cowboy_tls,<0.3515.8>,{{badmatch,{error,protocol_error,'The PROXY protocol header signature was not recognized. (PP 2.1, PP 2.2)'}},[{cowboy_tls,connection_process,4,[{file,\"/opt/app/_build/default/lib/cowboy/src/cowboy_tls.erl\"},{line,38}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,226}]}]}]"

It didn't get to call handshake.

Update 3:

Inspecting the stuck ssl process:

> sys:get_status(list_to_pid("<0.11451.10>")).

{status,<0.11451.10>,
    {module,gen_statem},
    [[{'$initial_call',{ssl_gen_statem,init,1}},
      {ssl_pem_cache,ssl_pem_cache},
      {'$ancestors',
          [tls_connection_sup,tls_sup,ssl_connection_sup,ssl_sup,
           <0.587.0>]},
      {ssl_manager,ssl_manager}],
     running,<0.598.0>,[],
     [{header,"Status for state machine <0.11451.10>"},
      {data,
          [{"Status",running},
           {"Parent",<0.598.0>},
           {"Modules",[tls_connection]},
           {"Time-outs",{0,[]}},
           {"Logged Events",[]},
           {"Postponed",
            [{info,
                 {'DOWN',#Ref<0.1238859887.2693005313.234482>,process,
                     <0.11454.10>,{...}}}]}]},
      {data,
          [{"State",
            {initial_hello,
                {state,
                    {static_env,server,gen_tcp,tls_gen_connection,tcp,
                        tcp_closed,tcp_error,tcp_passive,...},
                    {connection_env,
                        {#Ref<0.1238859887.2693005313.234482>,<0.11454.10>},
                        undefined,false,undefined,undefined,{...}},
                    #{honor_cipher_order => false,cacerts => undefined,
                      session_tickets => disabled,key => undefined,
                      use_ticket => undefined,psk_identity => undefined,...},
                    {socket_options,binary,raw,0,0,...},
                    {handshake_env,undefined,0,{...},...},
                    [],false,
                    #{active_n => 100,...},
                    {...},...}}}]}]]}

The down message is postponed when it's in the initial_hello state.
Therefore, the ssl process keeps hanging in that state.

It seems using ssl:transport_accept and then not calling ssl:handshake is unsafe.

So now there are 3 blame targets:

  • Is ssl API safe if misuse and resource leak can happen? Erlang's philosophy is to link resource to process and this is clearly a violation of that.
    Note that calling recv didn't break ssl, it's calling transport_accept and then proceed to crash that did the trick.
  • Should cowboy call handshake even on an invalid proxy header?
  • Is ranch hack appropriate given that this can happen?

I don't have the answer right now.

@bullno1
Copy link

bullno1 commented Sep 22, 2021

This is clearly an OTP bug, if controlling_process, dies, the ssl process should die too.
Still, I think ranch should work around this in mean time.

I wonder if a well-placed link would help.
And, of course, the ssl process traps exit.

I think I'll just do a manual close or something dumb like that on proxy header error.

@bullno1
Copy link

bullno1 commented Sep 24, 2021

Filed a bug with OTP: erlang/otp#5239

@bullno1
Copy link

bullno1 commented Sep 24, 2021

Back to ranch, IMO, the cleanest fix would be to listen on gen_tcp and upgrade with ssl:handshake and force options to be separate: {tcp => TcpOpts, ssl => SslOpts}.

This would, of course, break all existing code. But it will remain safe for the foreseeable future.

On my side, I need my app to work ASAP so I'll write custom ranch_ssl and cowboy_tls.
Will share later.

@essen
Copy link
Member Author

essen commented Sep 24, 2021

Separating options is something that I had to do in Gun and would likely be a good idea in Ranch as well. But it's unlikely to come any time soon as it would for example require reworking RabbitMQ listener configuration. Or at least it means we can't replace ranch_ssl but we can most likely introduce a new module or option to trigger the new behavior.

@bullno1
Copy link

bullno1 commented Sep 25, 2021

There's also another way: hardcode a list of ssl options. There are not that many... We can filter at runtime.

Alternatively, I think it's possible to extract the type definition from the module at runtime (or how does dialyzer work?). It's possible to filter out all ssl options.

Edit: I just looked into dialyzer, it grabs core erlang from debug info embedded in the module and manipulate it. Seems complicated. A dialyzer dependency is too much and it won't work with OTP compiled without debug info either (but that's rare). However, it could be done at compile time to generate an option list for every ssl version...

@essen
Copy link
Member Author

essen commented Sep 25, 2021

No the ssl options change too much.

What can work is exporting the function in ssl that splits ssl/non-ssl options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OTP Work on Erlang/OTP itself.
Projects
None yet
Development

No branches or pull requests

2 participants