Feature request: exchangeable internal HTTP clients (from Net::HTTP to Curb, for example) #658

Open
adrianodennanni opened this issue Dec 6, 2024 · 2 comments

Comments

@adrianodennanni
Contributor

The objective of this issue is to propose a new feature that lets the user replace the inner HTTP client (currently being Net::HTTP) with another client (CURL, via the curb gem, for example).

First, let me introduce my personal need for this. Some websites check your user agent before allowing the connection to proceed, for example to verify that the client is a real browser. This is easily handled by setting a real browser user agent with agent.user_agent = 'real user agent'.

However, some websites are using a more sophisticated approach to this:

  • Servers can check the TLS version used by the client and compare it to the version that the browser named in the user agent header would actually use;
  • Servers can check the list of TLS ciphers supported by the client and compare it to the list that the browser named in the user agent header would actually offer. You can check yours at https://browserleaks.com/tls ; Chrome has a different list from Firefox, for example. Net::HTTP has a different list. CURL also has a different list, but you can use modified versions of CURL (such as curl-impersonate) to get the same ciphers as Chrome/Firefox.
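To see what a Ruby client offers on its side, the cipher list can be read from OpenSSL directly. This is a minimal sketch, assuming that a default `OpenSSL::SSL::SSLContext` with `set_params` applied is a reasonable approximation of what Net::HTTP negotiates; the helper name `default_cipher_names` is made up for illustration, and the actual list depends on the OpenSSL build Ruby is linked against.

```ruby
require "openssl"

# Hypothetical helper (not part of Mechanize or Net::HTTP): returns the names
# of the ciphers a default OpenSSL context would offer on this machine.
def default_cipher_names
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.set_params # apply OpenSSL's Ruby-side default parameters
  # Each entry of #ciphers is [name, protocol_version, bits, algorithm_bits].
  ctx.ciphers.map(&:first)
end

puts default_cipher_names.first(5)
```

Comparing this output with the list reported by a real browser on the page above shows the kind of mismatch a server can detect.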

We at Infosimples developed a fork of Mechanize called Mechanize-Curl that replaces Net::HTTP with Curl::Easy. It's available at https://github.com/infosimples/mechanize_curl . We use it in an environment where curl-impersonate is installed, and we can set the user agent, the TLS version, and the ciphers used by a Mechanize-Curl instance to match a real Chrome/Firefox.

We think it would be a good idea to have this feature in the main Mechanize gem. Not only is the fork hard for us to maintain (we couldn't figure out how to use a monkey-patched version of mechanize alongside the original mechanize gem), but we also think it would be a good feature for the community.

So, we propose the following changes to the Mechanize gem, which can be developed in steps:

  1. Change the project so the Net::HTTP client can be "detached" and replaced by any other client. Ruby doesn't have interface types, but we can think of this feature as something similar to interfaces. We could have "attachable backends" that implement all the methods Mechanize needs.

    I'm not sure if I was clear enough, so I will show with some examples:

    # lib/mechanize/http/agent.rb#190
    @http = Net::HTTP::Persistent.new(name: connection_name)

    The example above is inflexible. @http will always be a Net::HTTP::Persistent instance. We could change it to:

     # User sets the backend to be used
     agent = Mechanize.new
     agent.backend = :net_http_persistent # or :curb, for example
    
     # ...internally
     # lib/mechanize/http/agent.rb#190
     @backend_client = Mechanize::HTTP::BACKENDS[@backend]
     @http = @backend_client.new(name: connection_name)

    In this example, all methods that are called on @http should be implemented by the Mechanize::HTTP::BACKENDS[:net_http_persistent] class and the Mechanize::HTTP::BACKENDS[:curb] class (and others). This way, we can change the backend used by Mechanize without changing the code in the rest of the project. New backends can be added by the community, or easily added in a monkey patch for a specific private project.

    In this first step, we could only detach Net::HTTP and implement it as a backend.

  2. Add a CURL backend to the project. This is an easier step, since the backend feature would already be implemented. It would be a matter of implementing the Mechanize::HTTP::BACKENDS[:curb] class.
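The two steps above could be sketched roughly as follows. None of these names exist in Mechanize today: `BackendSketch`, its `BACKENDS` registry, `register`, `build`, and both stand-in backend classes are hypothetical, and the single `request` method stands in for whatever set of methods Mechanize actually calls on @http.

```ruby
# Hypothetical sketch of a backend registry; not Mechanize's API.
module BackendSketch
  # Maps a backend name to the class that implements it.
  BACKENDS = {}

  def self.register(name, klass)
    BACKENDS[name] = klass
  end

  # Looks up the backend class by name and instantiates it.
  def self.build(name, **options)
    klass = BACKENDS.fetch(name) do
      raise ArgumentError, "unknown backend: #{name.inspect}"
    end
    klass.new(**options)
  end

  # Stand-in for a backend wrapping Net::HTTP::Persistent (step 1).
  class FakeNetHttpPersistent
    def initialize(name:)
      @name = name
    end

    def request(uri)
      "net_http_persistent(#{@name}) -> #{uri}"
    end
  end

  # Stand-in for a backend wrapping Curl::Easy (step 2).
  class FakeCurb
    def initialize(name:)
      @name = name
    end

    def request(uri)
      "curb(#{@name}) -> #{uri}"
    end
  end

  register(:net_http_persistent, FakeNetHttpPersistent)
  register(:curb, FakeCurb)
end

http = BackendSketch.build(:curb, name: "mechanize")
puts http.request("https://example.com")
```

Because callers only depend on the methods both classes share, a private project could register its own backend without touching the rest of the code.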

I know this would cause a lot of changes internally, and it would be harder to guarantee that no new bugs would be introduced. But I can develop it and add tests to it.

Before starting this project, I would like to know whether this feature would be accepted by the community and merged into the main project once it is ready. It will be time-consuming, so I'd like to know whether it is likely to be merged before committing myself to it.

Thank you for your attention. I hope to hear from the community soon.

@flavorjones
Member

Hi @adrianodennanni, thanks for starting this conversation.

the inner HTTP client (currently being Net::HTTP)

Nit: the default HTTP client is Net::HTTP::Persistent, which is a separate library found at https://github.com/drbrain/net-http-persistent

Back in 2010, @drbrain swapped the default Net::HTTP to Net::HTTP::Persistent in 4d074f4, and he did it in a way that makes the agent library almost swappable via the Mechanize::HTTP::Agent#http accessor that you mention above.

If we simply changed:

-  attr_reader :http
+  attr_accessor :http

Would that be sufficient to unblock you? Then you might be able to simply write:

m = Mechanize.new
m.agent.http = YourBackend.new
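To illustrate what the accessor change would allow, here is a minimal sketch that does not depend on the mechanize gem. `AgentSketch` is a hypothetical stand-in for Mechanize::HTTP::Agent reduced to the one accessor under discussion, and `DefaultBackend`, `YourBackend`, and the `request` and `fetch` methods are assumptions made for the example; a real backend would need to respond to every method the agent calls on its HTTP object.

```ruby
# Stand-in for the default client; hypothetical, for illustration only.
class DefaultBackend
  def request(uri)
    "default backend fetched #{uri}"
  end
end

# A user-supplied replacement that responds to the same method.
class YourBackend
  def request(uri)
    "custom backend fetched #{uri}"
  end
end

# Hypothetical stand-in for Mechanize::HTTP::Agent.
class AgentSketch
  # attr_accessor (rather than attr_reader) is what makes the swap possible.
  attr_accessor :http

  def initialize
    @http = DefaultBackend.new
  end

  # The agent only relies on the methods it calls on #http (duck typing).
  def fetch(uri)
    http.request(uri)
  end
end

agent = AgentSketch.new
agent.http = YourBackend.new
puts agent.fetch("https://example.com")
```

The swap works only as long as the replacement object covers the full set of calls the agent makes, which is the part that would need validating against the real gem.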

@adrianodennanni
Contributor Author

Hello @flavorjones. Sorry for the late reply. I will validate this today and get back to you soon.
