Add versioning information to "public_suffix_list.dat" file #1808

Open
TurtleWilly opened this issue Jul 23, 2023 · 9 comments
Labels
❔❔ question Open question, please look / answer / respond

Comments

@TurtleWilly

It would be nice to have some sort of (automatic) versioning information directly inside the "public_suffix_list.dat" file. Currently it is practically impossible to determine which file is the most current when several copies of "public_suffix_list.dat" exist on disk. This could probably also be useful for libpsl to determine which copy is the "latest".

With CVS or SVN we could add // $Id$ as the first line of the file and the problem would solve itself (svn may need a propset depending on the configuration). The source control system would then automatically insert the current version and/or date during checkout. (I'm not too familiar with git, so I don't know whether it has a similar feature.)

@eli-schwartz

You can do the same thing in git with $Format:%cs$ where %cs is the formatter code to embed a YYYY-MM-DD style timestamp of the commit date (not the checkout date).

There are no tags so git describe can't be used with any degree of accuracy.

@dnsguru
Member

dnsguru commented Jul 26, 2023

@smarnach is this possible?

@dnsguru dnsguru added the ❔❔ question Open question, please look / answer / respond label Jul 26, 2023
@weppos
Member

weppos commented Aug 1, 2023

Git doesn't ship with an $id$ equivalent feature. Instead, you are encouraged to leverage SHAs generated by Git itself.

To embed external information, like the SHA or any other ID, we would need to pre-process the file before it is committed. This is generally the responsibility of a CI/pipeline, which we don't have.

I am not inclined to add such complexity in the file itself when this is within the repo, as it would be redundant since we can leverage git.

Ideally, the tagging should happen in the pipeline that processes the list for distribution at
https://publicsuffix.org/list/public_suffix_list.dat

Although these days I even question whether we still need such a distribution mechanism, or whether we should instead just rely on Git hosting.

For consumers that need or want version tagging, the current solution would be to switch to pulling the list directly from the repo. I've actually been doing this for years in the library I maintain. Here's an example:

weppos/publicsuffix-go@a20f9ab

https://github.com/weppos/publicsuffix-go/blob/a20f9abcc222b049ef9b7a28845bac88e0155ae3/publicsuffix/generator/gen.go#L24-L49
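As a rough illustration of taking the version from the repository rather than from the file, the sketch below asks the GitHub REST API for the most recent commit touching public_suffix_list.dat and returns its SHA and commit date. It is not the linked Go generator. The function names, and the choice of the "list commits" endpoint with the path and per_page query parameters, are assumptions made for this example.

```python
import json
import urllib.request

# Assumed endpoint: GitHub's "list commits" API, filtered to one file.
COMMITS_URL = (
    "https://api.github.com/repos/publicsuffix/list/commits"
    "?path=public_suffix_list.dat&per_page=1"
)


def parse_latest_commit(payload: str) -> tuple[str, str]:
    """Return (sha, committer date) from a "list commits" JSON response."""
    commits = json.loads(payload)
    if not commits:
        raise ValueError("no commits returned for this path")
    latest = commits[0]  # the API lists the newest commit first
    return latest["sha"], latest["commit"]["committer"]["date"]


def fetch_latest_commit(url: str = COMMITS_URL) -> tuple[str, str]:
    """Fetch the newest commit for the list file (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_latest_commit(response.read().decode("utf-8"))
```

A consumer could store the returned SHA next to its downloaded copy and compare SHAs or dates later, without the list file itself having to change.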

@dnsguru
Member

dnsguru commented Aug 1, 2023 via email

@smarnach
Contributor

smarnach commented Aug 3, 2023

Cloud Storage returns the date the list was last modified in the Last-Modified header, so anyone is free to post-process the file when downloading it via the CDN. It would also be easy to modify the deployment workflow to include the date in the file when uploading the data. From an operational point of view, I don't have any concerns about doing this, so it's up to you to make the call here, @weppos and @dnsguru. I'm happy to make the required changes if you want me to.
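For the post-processing option, a minimal sketch could look like the following. It takes the value of the Last-Modified header and the downloaded text, and prepends a dated comment line. The function name and the "// Last-Modified:" comment format are made up for this example; nothing in the thread fixes a format.

```python
from email.utils import parsedate_to_datetime


def stamp_with_last_modified(body: str, last_modified: str) -> str:
    """Prepend a dated comment line derived from a Last-Modified header."""
    # HTTP dates look like "Thu, 03 Aug 2023 12:00:00 GMT".
    modified = parsedate_to_datetime(last_modified)
    # The "// Last-Modified:" comment format is hypothetical.
    header = f"// Last-Modified: {modified.date().isoformat()}\n"
    return header + body
```

When downloading with urllib, the header value would be available as response.headers["Last-Modified"] on the response object.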

@eli-schwartz

> Git doesn't ship with an $id$ equivalent feature. Instead, you are encouraged to leverage SHAs generated by Git itself.

I specifically pointed out that it does indeed do precisely this. It's part of the git-archive(1) machinery, which is, for example, what GitHub uses to generate https://github.com/publicsuffix/list/archive/refs/heads/master.tar.gz

It doesn't affect git clones, although you could invoke that machinery pretty easily:

git archive HEAD <filename> | bsdtar -x -C path/to/output/directory -f -
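For context, $Format:...$ placeholders are expanded only in git archive output, and only for paths that carry the export-subst attribute, for instance via a .gitattributes line such as "public_suffix_list.dat export-subst". The sketch below is one way to run the same export from Python and read the file in memory instead of piping to bsdtar. The function names are hypothetical, and it assumes git is installed and that repo_dir is the root of a clone.

```python
import io
import subprocess
import tarfile


def archive_command(ref: str, path: str) -> list[str]:
    """Build the `git archive` invocation for one path at one ref."""
    return ["git", "archive", "--format=tar", ref, path]


def read_archived_file(repo_dir: str, ref: str, path: str) -> bytes:
    """Return the file as `git archive` exports it (placeholders expanded)."""
    result = subprocess.run(
        archive_command(ref, path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
    )
    # Unpack the tar stream in memory instead of piping it to bsdtar.
    with tarfile.open(fileobj=io.BytesIO(result.stdout)) as archive:
        member = archive.extractfile(path)
        if member is None:
            raise FileNotFoundError(path)
        return member.read()
```

Calling read_archived_file(".", "HEAD", "public_suffix_list.dat") inside a clone would return the exported bytes; without the export-subst attribute, any placeholder text comes back unchanged.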

@dnsguru
Member

dnsguru commented Aug 14, 2023

Because the gTLD list from ICANN's JSON has a timestamp in it, and that's the most often updated element, I'd assert that "Solution Exists" if one were to track that as the last date. It does not account for deltas that occur between auto-pulls from ICANN, but due to the frequency of those, and their priority of processing ahead of subdomain projects, this works itself out relatively well.
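Tracking that timestamp could be sketched as below. The exact wording of the comment line that records the gTLD import is an assumption here (something like "// List of new gTLDs imported from <url> on 2023-08-14T15:13:18Z"), so the pattern may need adjusting against a real copy of the list; the function name is hypothetical as well.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed comment format; adjust the pattern if the real line differs.
GTLD_STAMP = re.compile(
    r"^// List of new gTLDs imported from \S+ on "
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z",
    re.MULTILINE,
)


def gtld_import_time(text: str) -> Optional[datetime]:
    """Return the gTLD import timestamp found in the list, if any."""
    match = GTLD_STAMP.search(text)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return stamp.replace(tzinfo=timezone.utc)
```

As noted above, this only reflects the last gTLD import, so changes merged between imports would not move the value.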

This was referenced Sep 15, 2023
@dnsguru
Member

dnsguru commented Oct 2, 2023

> Cloud Storage returns the date the list was last modified in the Last-Modified header, so anyone is free to post-process the file when downloading it via the CDN. It would also be easy to modify the deployment workflow to include the date in the file when uploading the data. From an operational point of view, I don't have any concerns about doing this, so it's up to you to make the call here, @weppos and @dnsguru. I'm happy to make the required changes if you want me to.

In reviewing #1855 / #1856: to avoid confusion about which version of the list a security report refers to, which would further drain disposable volunteer resources in hunting it down, we may want to tie these two things together:

  • Add Date in file
  • Implement Security Policy

I have seen salient arguments for doing both and also for doing neither, but it seems a datestamp would be a prerequisite for a security policy, were that to proceed.

@eli-schwartz

Would you be interested in an implementation of the git-archive side of this on the theory that it causes no harm to have this literal text in the file:

// this is not guaranteed to be updated, but will contain either "$Format" or else a YYYY-MM-DD timestamp
// Date updated: $Format:%cs$

and under some conditions, at least, it would be a benefit since it would actually contain:

// this is not guaranteed to be updated, but will contain either "$Format" or else a YYYY-MM-DD timestamp
// Date updated: 2023-10-02
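On the consumer side, reading such a line could be sketched as follows. The parser treats an unexpanded "$Format" placeholder as "no date available" and otherwise parses the YYYY-MM-DD value, and a small helper picks the most recent of several copies. The function names are hypothetical, and the "// Date updated:" prefix is taken from the proposal above rather than from anything that currently exists in the file.

```python
import re
from datetime import date
from typing import Optional

DATE_LINE = re.compile(r"^// Date updated: (\S+)\s*$", re.MULTILINE)


def date_updated(text: str) -> Optional[date]:
    """Return the embedded date, or None if the placeholder was never expanded."""
    match = DATE_LINE.search(text)
    if match is None or match.group(1).startswith("$Format"):
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:
        return None


def newest(texts: dict[str, str]) -> Optional[str]:
    """Return the name of the copy with the latest embedded date, if any."""
    dated = {}
    for name, text in texts.items():
        found = date_updated(text)
        if found is not None:
            dated[name] = found
    return max(dated, key=dated.get) if dated else None
```

Copies that come from a plain clone would still carry the literal placeholder, so they are simply skipped when comparing.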

Projects
Status: awaiting feedback
Development

No branches or pull requests

5 participants