feature request : md5 on a page #351

Open
sachaz opened this issue Aug 27, 2018 · 18 comments · May be fixed by #1354

Comments

@sachaz

sachaz commented Aug 27, 2018

Hi,

It's all in the title: this is a feature request for a new probe that compares a page's MD5 checksum against a specified value to verify the integrity of the page.

@brian-brazil
Contributor

There's already regex support to verify that an HTTP response contains given output. Is this not sufficient for your use case?

@sachaz
Author

sachaz commented Aug 28, 2018

The regex feature is really cool (thanks for this), but it is not enough to verify that the page has not changed. Matching a regex is not the same thing as verifying the integrity of a page with a checksum.

@brian-brazil
Contributor

What are you actually trying to test here?

The presumption is that you've got some form of web app whose content changes over time, so you want to look for e.g. key phrases rather than exact content which can change from release to release.

@sachaz
Author

sachaz commented Aug 28, 2018

OK, let's have some concrete examples :)
I have a web app whose content reports the results of several tests, and those results must always be the same. There is too much content to cover with a regex; an MD5 is easier in this case.
Another example is verifying the integrity of a site's pages: the developers can provide an MD5 for each page, and the test can validate that your site has not been defaced.

@brian-brazil
Contributor

Checking the integrity of an entire website is a bit out of scope. This exporter is more for determining if a website is working at all, and integrity checking is not something you want to be doing once a minute. A tool specifically designed for this may be better here.

@SuperQ
Member

SuperQ commented Aug 28, 2018

It might be an option to expose the sha256sum of the page as an info metric.

@brian-brazil
Contributor

That could vary from scrape to scrape, and thus would be too high cardinality.
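A minimal sketch of the cardinality concern, assuming the checksum is exposed as a label on an info-style metric. The metric name `probe_http_body_info` and the helper functions are hypothetical and exist only for illustration. Every distinct label value creates a new time series, so a page whose body changes on each scrape creates one series per scrape.

```python
import hashlib


def series_key(name, labels):
    """Identify a time series by its metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def count_series(bodies):
    """Count the distinct series created when each scraped body's SHA-256 is a label."""
    seen = set()
    for body in bodies:
        labels = {"sha256": hashlib.sha256(body).hexdigest()}
        # "probe_http_body_info" is a hypothetical metric name.
        seen.add(series_key("probe_http_body_info", labels))
    return len(seen)


# Five scrapes of a page that never changes.
static_page = [b"<h1>hello</h1>"] * 5
# Five scrapes of a page that embeds a per-request counter.
dynamic_page = [b"<h1>hello</h1><p>request %d</p>" % i for i in range(5)]

print(count_series(static_page))   # 1
print(count_series(dynamic_page))  # 5
```

The static page stays at a single series, while the dynamic page adds a new series on every scrape.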

@sachaz
Author

sachaz commented Sep 4, 2018

Let's be clear: the request is to validate a single HTTP web page, not to check a whole site.

@discordianfish
Member

I like that feature. You could even expose it just as a metric value. I think it would be useful for monitoring all kinds of assets for consistency, e.g. to check that your public key on a third-party service wasn't modified, or your shasum file for a binary release on a package mirror, etc.

@sachaz
Author

sachaz commented Sep 8, 2018

@discordianfish absolutely

@silkeh
Contributor

silkeh commented Oct 12, 2019

I've taken a look into implementing this, and based on the comments I see the following options:

  • Add a setting to check for a match in a set of SHA-256 checksums, similar to the regex checks.
    This only really works for static data, as the probe config would need to be updated for every update on the page.
  • Add a setting to export the SHA-256 checksum as a metric with a note about cardinality.
    This only works for fairly static, rarely changing data, because of the cardinality.
  • Export the CRC32 of the page as a metric value.
    This works for any page, but cannot be used for security purposes. Just using probe_http_last_modified_timestamp_seconds is probably better.
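The first and third options above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exporter's implementation or configuration: the function names `sha256_hex`, `body_matches_any`, and `crc32_value` are hypothetical.

```python
import hashlib
import zlib
from typing import Iterable


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def body_matches_any(body: bytes, allowed: Iterable[str]) -> bool:
    """Option 1: the probe passes only if the body's digest is in the configured set."""
    return sha256_hex(body) in {digest.lower() for digest in allowed}


def crc32_value(body: bytes) -> float:
    """Option 3: expose the CRC32 of the body as a plain numeric metric value."""
    return float(zlib.crc32(body))


expected = [sha256_hex(b"known good page")]
print(body_matches_any(b"known good page", expected))  # True
print(body_matches_any(b"defaced page", expected))     # False
```

As the list notes, the allow-list must be updated whenever the page legitimately changes, and the CRC32 detects accidental changes only, since it is easy to forge.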

@discordianfish
Member

Why not SHA as metric value? That's what I would do. But it looks like @brian-brazil doesn't want it anyway so this issue should probably be closed.

@silkeh
Contributor

silkeh commented Oct 16, 2019

> Why not SHA as metric value? That's what I would do.

Because metric values are double-precision floating point (float64), and a SHA is >64 bits. 64 bits is not sufficient to ensure that the content has not been tampered with. This limitation is why I suggested CRC32 above.

Concerning label values: this will result in high cardinality (see caution in the documentation), so it would need to be opt-in (and even then would not be a great idea). Play around with this branch if you really want to try it.
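To make the float64 limitation concrete, here is a small sketch that checks which integers survive a round trip through a double. The helper name `survives_float64` is made up for this example. A float64 has a 53-bit significand, so a 32-bit CRC fits exactly, while a 256-bit SHA-256 digest generally does not.

```python
import hashlib
import zlib


def survives_float64(n: int) -> bool:
    """Return True if the integer n is exactly representable as a float64."""
    return int(float(n)) == n


body = b""  # an empty response body keeps the example deterministic

crc = zlib.crc32(body)
sha = int.from_bytes(hashlib.sha256(body).digest(), "big")

print(survives_float64(2**32 - 1))  # True: the largest CRC32 value fits exactly
print(survives_float64(2**53 + 1))  # False: just past the 53-bit significand
print(survives_float64(crc))        # True
print(survives_float64(sha))        # False: the low bits of the digest are lost
```

Once the low bits are rounded away, many different digests map to the same metric value, which defeats the purpose of using a cryptographic hash.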

@platan

platan commented Oct 16, 2019

An HTTP response does not have to be text. It can be binary data. In that case a regex will not work, and a content checksum seems to be a good idea for such data. In my case I want to check that the content of a dynamically generated PNG file does not change. Currently I can only check the status code and content length.

@brian-brazil
Contributor

> In this case a regex will not work.

Why do you think that? What problems did you encounter?

@znerol

znerol commented Apr 19, 2023

> I've taken a look into implementing this, and based on the comments I see the following options:
>
>   • Add a setting to check for a match in a set of SHA-256 checksums, similar to the regex checks.
>     This only really works for static data, as the probe config would need to be updated for every update on the page.

This approach would be a perfect fit for monitoring the integrity of security.txt files and PGP public keys linked from there.

@discordianfish
Member

Yeah, I still think this would make a good feature. Now that Brian is no longer a maintainer, I think it's likely that this would get merged. /cc @roidelapluie

@electron0zero
Member

> Regex feature is really cool (thanks for this) but it is not enough to verify the page is not changed, that's not the same thing to verify the integrity of a page like with a checksum.

Not exactly what's being asked here, but we do have probe_http_uncompressed_body_length and probe_http_content_length, which can be used to detect when the content changes by watching the content length and the body length.
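One caveat with length-based monitoring, sketched below with made-up page bodies and hypothetical helper names: a change that keeps the body the same size is invisible to a length check, while a checksum still catches it.

```python
import hashlib


def length_changed(old: bytes, new: bytes) -> bool:
    """Change detection based only on the body length."""
    return len(old) != len(new)


def digest_changed(old: bytes, new: bytes) -> bool:
    """Change detection based on the SHA-256 digest of the body."""
    return hashlib.sha256(old).digest() != hashlib.sha256(new).digest()


original = b"status: all 42 tests passed"
tampered = b"status: all 42 tests failed"  # same number of bytes as the original

print(length_changed(original, tampered))  # False: the length is identical
print(digest_changed(original, tampered))  # True: the content differs
```

So the length metrics work as a cheap first signal, but they cannot stand in for an integrity check.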

@silkeh silkeh linked a pull request Jan 11, 2025 that will close this issue

8 participants