-
-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do no accept invalid HTTPs connections #166
Comments
|
@rgaudin This seems straighforward to implement. Per default it should not be insecure, but in the context of youzim.it I would activate it (not need to allow user to change this IMO). Doing this would allow to reduce easily the number of scraping just failing. |
I found new errors (for Zimit scrapes) because actually the https configuration we were using is pretty picky. We should really implement this ASAP. |
This is definitely an important issue BUT not a "good first issue" AFAIK, since this change must be done in python |
And from a higher PoV, I'm really not sure it is a good idea to make this a default. As mentioned by another issue, security of content grabbed into the ZIM is quite important since we are somehow responsible of this. If some viruses achieves to make its way into the ZIM, we will at least have our share of responsibility. If someone achieves to man-in-the-middle any of our Zimit worker, he will very easily be able to inject all kind of payloads into the ZIM. With insecure HTTP accepted by default, it means that the kind of attacks will be even easier to perform. |
@kelson42 did not propose to make it a default but to use it by default in youzim.it Responsibility chain is very different there as users request a particular URL to be turned into a ZIM. TLS only ensures encryption between zimit and the server. Bad content can already be served and thus included into ZIM. While I'm in favor of an option to disable security, I'm not comfortable enabling it on youzim.it because we'd be breaking the chain of trust. |
I agree I did not get the point very precisely. And even if the chain of responsibility is different on youzim.it, while the end-user can check that the URL he is entering has no bad content (are at least that this is what he intend to package), he won't be aware of any man-in-the-middle between zimit and his intended website manipulating the ZIM content. Ensuring there is minimal risk of such a man-in-the-middle is our responsibility. Ensuring no-one can package bad content into a ZIM (e.g. by setting up a server with bad content and requesting it in youzim.it) is something obviously way more debatable. |
We have regularly case because of current strict TLS configuration, see https://github.com/kiwix/k8s/issues?q=is%3Aissue+is%3Aclosed+sort%3Aupdated-desc+routine. I have removed the question tag of this issue. |
Correct me if you are not ok with proposition below. Part 1
Part 2
Some notes:
|
(stupid) question: what is the risk of having |
For Zimit only.
I don't care, but "yes".
Yes for Zimfarm, I don't think this is necessary for Zimit-frontend.
Configure software with proper configuration files, configuration via ENV variable is simply bad in 90% of the cases.
99% of people have no clue how to deal with a SSL error, here does not make sense IMHO. Keep it simple, the user risk is really almost null. |
The flip side is that we will have all recipes running an insecure configuration. Meaning a bigger attack surface. More chances to put unsolicited content in a ZIM. And this will be our responsibility since we took the decision to run an insecure configuration by default. The secure configuration / HTTPS ensures that the website which is responding is the proper one. If an attacker achieves to modify our network to respond with another server than the one which is supposed to respond, he won't have the proper HTTPS setup, and in the secure context the connection will fail, we won't create the ZIM with unexpected/bad content. When a user requests an zimit scrape, he probably hopes we will put inside the ZIM content from the real website he is targeting. If an attacker has modified our network to reply to web requests with whatever he likes and we are running in the insecure configuration, we will put this illegitimate content inside the ZIM. The user will never be warned about it. We will never be warned about it. The ZIM might now contain a defaced website, harmful payloads, ... The risk of an attacker being able to modify the network setup is probably limited in the zimit setup where we have full control on the worker. It is bigger on the opemZIM farm, because we have no control on the worker (and the worker owner could be the attacker). We do not share the same sensitivity on this risk with Emmanuel. Once the risk will materialize (if it does, we have significant chances it won't) and if the insecure configuration has been used by default, we will have very little arguments but "we are sorry, we decided to run an insecure configuration to have less work/errors". Most users won't care. Technical users will be very angry. If the insecure configuration is not applied by default at all and it is the youzim.it user which explicitly decides to run the recipe in an insecure configuration, as said, risk is transferred, we honored our responsibilities in the chain of trust. Trust is always very difficult to gain and very easy to loose. But it is definitely your responsibility. I can definitely implement whatever you decide. I just cannot let you decide without warning you about the risks as I perceive them (and my perception might be totally wrong). |
Yes
OK
This is the current way of doing things in zimfarm (and most other tools we have AFAIK). But I could say that the ENV variable is configured in a k8s configuration file, so we match somehow your requirement.
I disagree, I don't see why we can't say to the user something like "oh, we are sorry but it looks like the website you are targeting is not running a secure configuration ; do you allow to us to proceed with an insecure one". All browsers already do it. And yes, in 99% of the cases the user decides to take the risk, but the responsibility has been explicitly transferred. |
#285 somehow resolved the point for Zimit2:
Should we open another issue to "force" the validity of HTTPS connection before proceeding? |
I'm going to open issues then |
Or more exactly, I will reopen this one and open one in Browsertrix Crawler. |
Upstream issue is here: webrecorder/browsertrix-crawler#510 |
Is this mandatory for 2.0? |
We've never allowed invalid HTTPs as start URL for zimit but I frequently see failed attempts in youzim.it logs about self-signed certificates.
I am not sure about the crawler's behavior regarding this. There doesn't seem to be a related flag on the crawler and there is no issue regarding certificates so maybe the default is to allow insecure connection…
If it's not, we could still pass
--allow-running-insecure-content
via theCHROME_FLAGS
environment.Currently, it fails before we start the crawler, when we check the URL in zimit (python).
The text was updated successfully, but these errors were encountered: