-
Notifications
You must be signed in to change notification settings - Fork 459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spotlessSetLicenseHeaderYearsFromGitHistory should respect existing headers and current date #783
Comments
I'm worried about a combinatoric explosion of features in the license header step, but I'm open to new features if necessary. For now, it looks to me like there's a misunderstanding. First off, any project which wants to enforce headers can be in one of two states:
If the headers are accurate, then updating them is easy. You can either use If the headers are inaccurate, then the only option is a one-time manual process to make them accurate. Once they are accurate, then you can use one of the simple workflows above. One starting point for this manual process is "what does git say", which is nice because although it's not perfect, it is nonetheless at least a verifiable lower bound. Another starting point is "parse what is in there now". The problem is that if the headers have not been enforced in the past, then it is guaranteed that there is going to be a lot of noise. And that means that a lot of the headers cannot be accurately parsed automatically (e.g. If I understand correctly, you want to combine "git" with "what is in there now", but you want to do that automatically. I think that is an unavoidably manual process, because parsing "what is in there now" is unavoidably manual if the existing headers are dirty. So I am inclined to not merge a feature to support this behavior, because I think the specification is inconsistent. Manual steps are unavoidable, and simple tools can help with the manual process. I am suspicious of complex tools, because I think the combination of new-user-confusion, power-user-overconfidence, and implementation complexity leads to net-negative value. |
Obviously I disagree with your assessment 😄
I think this statement over-simplifies things by assuming that all files within a project are in the same consistent state. The truth is that when onboarding projects there are likely to be a mix of files with accurate headers, missing headers, once accurate but now outdated headers, and some nonsense oddities too. Some of these files will have copyright statements pre-dating the git repository they currently reside in because not all files start life in git.
(FYI Spotless will establish a date range of You're right that there is a "garbage in - garbage out" problem here, but the reality is that often the "garbage in" is only a little inconsistent, and not totally irreparable nonsense. The projects I've been working with of late have a few minor variations of license format between files but primarily vary between simply having a header vs not. Whichever mode you use Spotless's license header functionality in, if existing headers and the new template are of vastly different structure then the tool will produce considerable changes to review and likely correct (more on that below).
Given a file claiming copyright 2016-2020, but not seen in git until 2019, what are the chances that we should ever change that initial claim year to 2019? The chances seem very slim to me. The claim that copyright started in 2016 can be parsed from the file content and is just as verifiable as the as the date associated with the git commit - each could be faked but we have no reason to assume either has been.
Review of the tools output is surely unavoidable anyway. When onboarding a project nobody should be blindly accepting the copyright claims generated by the tool. The thing is that you want to the tool to have done the reasonably right thing so that when reviewing said output the number of corrections is minimised. As you point out, the tool's "use git history" mode is clearly the best way to add verifiable copyright statements to files with missing copyright headers. The problem is that it assumes it knows best and blindly overwrites the claims of pre-existing files, sometimes shortening their claimed range, which I don't think is ever acceptable. This resulting reduction in copyright claim is the part that really troubles me about the "use git history" mode. Currently I'm left with the dissatisfactory the choice between:
If I've persuaded you on any of these points then #784 is available for review, otherwise I'm happy to drop it. For the projects I've been working with my version produces output which, of course, needs review but is correct the vast majority of the time and only needs minimal corrections - all of which suits me perfectly. I'll likely end up onboarding using my fork, but the nice thing is that it shouldn't be needed beyond the onboarding process and we'll be able to switch to a stable maintained version of Spotless for ongoing ratcheting. Beyond this issue Spotless looks to be a pretty spot-on tool for our team! |
If there are a ton of files like this, could you find-replace
Great! Thanks very much for the PR. At least for now, I am closing this as won't-implement. I added a comment to your PR #784 so that others can run it. If other people use/need this and feel that I made a mistake, I hope you'll leave a comment. The history of this license header step is that it's been buried in combinatoric explosion-of-features, so I'm protective of adding more complexity to it, perhaps overly so. In particular, there are other open issues like #774 and #323 which I think are more common, and I don't want to make implementing them harder based on what I perceive to be a very rare case. |
I'm trying use Spotless 2.7.0 to onboard a repository and inject license headers, via the maven plugin. Existing sources are in a variety of states, many without any header, some with a outdated header that hasn't been maintained, some with a history pre-dating the git history.
For example, with the following files, running
mvn spotless:apply
I get the "Default Result", whereas runningmvn spotless:apply -DspotlessSetLicenseHeaderYearsFromGitHistory=true
I get the "FromGit Result", but I'd really like to get the "Desired Result".new.java
old.java
outdated.java
prehistory.java
For
old.java
,outdated.java
,prehistory.java
with the-DspotlessSetLicenseHeaderYearsFromGitHistory=true
option I think the Git Range should be extended to include the Current Year, as the change will be committed in the current year. This effect can be achieved manually by rerunning without the-DspotlessSetLicenseHeaderYearsFromGitHistory=true
option but I don't think that should be necessary.For
prehistory.java
with the-DspotlessSetLicenseHeaderYearsFromGitHistory=true
option I think the date range should also be extended to incorporate the date range claimed in the existing license Header Range. To ignore the existing license header range results in data loss and a narrower copyright claim which is a problem.Would it be reasonable to extend
spotlessSetLicenseHeaderYearsFromGitHistory
functionality to incorporate these new behaviours?Would it be necessary to add further options so that current behaviour can be maintained too?
The text was updated successfully, but these errors were encountered: