This issue proposes a new feature: letting the user replace the inner HTTP client (currently Net::HTTP) with another client (for example CURL, via the curb gem).
First, let me explain my own need for this. Some websites inspect the User-Agent header before allowing the connection to proceed, to check whether the client is a real browser. This is easily worked around by setting a real browser's user agent, for example with agent.user_agent = 'real user agent'.
However, some websites are using a more sophisticated approach to this:
Servers can check the TLS version used by the client and compare it with the version that the browser named in the User-Agent header would actually use;
Servers can check the list of TLS ciphers offered by the client and compare it with the list that the browser named in the User-Agent header would actually offer. You can check it at https://browserleaks.com/tls ; Chrome has a different list from Firefox, for example. Net::HTTP has a different list. CURL also has a different list, but you can use modified versions of CURL (such as curl-impersonate) to get the same ciphers as Chrome/Firefox.
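To make the cipher point concrete, here is a minimal sketch using only Ruby's standard library. It lists the cipher suites a default OpenSSL context enables, then shows the TLS settings that Net::HTTP exposes. The host and the cipher string are illustrative assumptions, and no connection is opened. Whether these settings alone can make a handshake look like a browser's is not something this sketch demonstrates.

```ruby
require "openssl"
require "net/http"

# Each entry returned by SSLContext#ciphers is
# [name, protocol_version, bits, algorithm_bits]; keep only the names.
default_ciphers = OpenSSL::SSL::SSLContext.new.ciphers.map(&:first)
puts "#{default_ciphers.size} cipher suites enabled by default"
puts default_ciphers.first(5)

# Illustrative cipher string (an assumption, not any browser's real list).
EXAMPLE_CIPHERS = "ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384"

# Net::HTTP lets you constrain the handshake, but only within what the
# linked OpenSSL build offers. Nothing is sent until a request is made.
http = Net::HTTP.new("example.com", 443) # hypothetical host
http.use_ssl = true
http.min_version = :TLS1_2
http.ciphers = EXAMPLE_CIPHERS
```

Comparing the printed list with what a browser reports on the page above shows the kind of mismatch a server can detect.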
We at Infosimples developed a fork of Mechanize called Mechanize-Curl, which replaces Net::HTTP with Curl::Easy. It's available at https://github.com/infosimples/mechanize_curl . We use it in an environment where curl-impersonate is installed, so we can set the user agent, the TLS version, and the ciphers used by a Mechanize-Curl instance to match those of a real Chrome/Firefox.
We think it would be a good idea to have this feature in the main Mechanize gem. Not only is the fork hard for us to maintain (we couldn't figure out how to use a monkey-patched version of mechanize alongside the original mechanize gem), but we also think it would be a good feature for the community.
So we propose the following changes to the Mechanize gem, which can be developed in steps:
Change the project so that the Net::HTTP client can be "detached" and replaced by any other client. Ruby doesn't have interface types, but this feature can be thought of in similar terms: we could have "attachable backends" that implement all the methods Mechanize needs.
I'm not sure if I was clear enough, so I will show with some examples:
The example above is inflexible. @http will always be a Net::HTTP::Persistent instance. We could change it to:
    # User sets the backend to be used
    agent = Mechanize.new
    agent.backend = :net_http_persistent # or :curb, for example

    # ... internally
    # lib/mechanize/http/agent.rb#190
    @backend_client = Mechanize::HTTP::BACKENDS[@backend]
    @http = @backend_client.new(name: connection_name)
In this example, every method called on @http would have to be implemented by the Mechanize::HTTP::BACKENDS[:net_http_persistent] class, the Mechanize::HTTP::BACKENDS[:curb] class, and any others. This way, we can change the backend used by Mechanize without changing code in the rest of the project. New backends can be added by the community, or easily added via a monkey patch in a specific private project.
In this first step, we would only detach Net::HTTP and reimplement it as a backend.
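To show how the lookup could fit together end to end, here is a self-contained sketch of a backend registry. Every name in it (HTTPBackends, FakeNetHttpBackend, FakeCurbBackend, StubAgent) is a hypothetical stand-in, not Mechanize's real code. The fake backends only return strings, so nothing touches the network.

```ruby
# Hypothetical registry standing in for Mechanize::HTTP::BACKENDS.
module HTTPBackends
  REGISTRY = {}

  # Lets the gem, the community, or a private project add a backend.
  def self.register(name, klass)
    REGISTRY[name] = klass
  end

  def self.fetch(name)
    REGISTRY.fetch(name) { raise ArgumentError, "unknown backend: #{name.inspect}" }
  end
end

# Fake backends: same constructor and method shape, different behaviour.
class FakeNetHttpBackend
  def initialize(name:)
    @name = name
  end

  def request(uri)
    "net_http GET #{uri}"
  end
end

class FakeCurbBackend
  def initialize(name:)
    @name = name
  end

  def request(uri)
    "curb GET #{uri}"
  end
end

HTTPBackends.register(:net_http_persistent, FakeNetHttpBackend)
HTTPBackends.register(:curb, FakeCurbBackend)

# Stand-in for the agent: switching backend rebuilds @http from the registry.
class StubAgent
  attr_reader :backend, :http

  def initialize(connection_name = "mechanize")
    @connection_name = connection_name
    self.backend = :net_http_persistent
  end

  def backend=(name)
    @backend = name
    @http = HTTPBackends.fetch(name).new(name: @connection_name)
  end

  def get(uri)
    @http.request(uri)
  end
end

agent = StubAgent.new
puts agent.get("https://example.com/")
agent.backend = :curb
puts agent.get("https://example.com/")
```

The rest of the agent only ever talks to @http, so adding a backend is a matter of registering one more class.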
Add a CURL backend to the project. This is an easier step, since the backend feature would already be in place; it would be a matter of implementing the Mechanize::HTTP::BACKENDS[:curb] class.
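Since Ruby has no interface types, one way to catch an incomplete backend early is a small duck-typing check run when the backend is registered or in its tests. The sketch below is an assumption about how that might look. BackendContract and the two example classes are hypothetical. The required method names (request, shutdown) are borrowed from Net::HTTP::Persistent purely for illustration. The real list would come from auditing what Mechanize calls on @http.

```ruby
# Hypothetical contract listing the methods every backend must provide.
module BackendContract
  REQUIRED_METHODS = %i[request shutdown].freeze

  # Returns the class if it defines every required instance method,
  # otherwise raises and names the missing ones.
  def self.verify!(klass)
    missing = REQUIRED_METHODS.reject { |m| klass.method_defined?(m) }
    return klass if missing.empty?

    raise NotImplementedError, "#{klass} is missing: #{missing.join(', ')}"
  end
end

class CompleteBackend
  def request(uri, req = nil); end
  def shutdown; end
end

class IncompleteBackend
  def request(uri, req = nil); end
end

BackendContract.verify!(CompleteBackend)

begin
  BackendContract.verify!(IncompleteBackend)
rescue NotImplementedError => e
  puts e.message
end
```

A check like this only confirms that the methods exist. It says nothing about whether their return values behave like Net::HTTP's, so tests per backend would still be needed.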
I know this would cause a lot of internal changes, and it would be harder to guarantee that no new bugs are introduced. But I can develop it and add tests for it.
Before starting this project, I would like to know whether the community would accept this feature and whether it would be merged into the main project once it is ready. It will be time-consuming, so I'd like to know if a merge is likely before committing myself to it.
Thank you for your attention. I hope to hear from the community soon.
Back in 2010, @drbrain swapped the default Net::HTTP to Net::HTTP::Persistent in 4d074f4, and he did it in a way that makes the agent library almost swappable via the Mechanize::HTTP::Agent#http accessor that you mention above.
If we simply changed:
    - attr_reader :http
    + attr_accessor :http
Would that be sufficient to unblock you? Then you might be able to simply write:
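As a rough illustration of what the accessor change enables, here is a sketch that uses stand-in classes rather than Mechanize itself. StubHTTPAgent plays the role of Mechanize::HTTP::Agent, DefaultClient the role of Net::HTTP::Persistent, and CustomClient the role of a replacement client. All three names are hypothetical.

```ruby
# Stand-ins only: neither class performs real HTTP.
class DefaultClient
  def request(uri)
    "default client fetched #{uri}"
  end
end

class CustomClient
  def request(uri)
    "custom client fetched #{uri}"
  end
end

# Stand-in for the HTTP agent with the one-line change applied.
class StubHTTPAgent
  attr_accessor :http # attr_reader before the change

  def initialize
    @http = DefaultClient.new
  end

  def fetch(uri)
    http.request(uri)
  end
end

http_agent = StubHTTPAgent.new
puts http_agent.fetch("https://example.com/")

# Only possible once the writer exists.
http_agent.http = CustomClient.new
puts http_agent.fetch("https://example.com/")
```

With this approach the replacement object still has to respond to every method that gets called on @http, because nothing else in the agent changes.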