ISC title variations #497

Merged
merged 4 commits into from
Dec 15, 2017
mlinksva
Contributor

@mlinksva mlinksva commented Nov 28, 2017

The colon in `ISC License:` is probably copied from https://www.isc.org/downloads/software-support-policy/isc-license/ where it is part of a longer string that is more clearly not meant to be included in the text ("Text of the ISC License:"), so I think it makes sense to support matching the colon but not have it be canonical. I'm not sure I've seen it in the wild.

I have seen `The ISC License` plenty, e.g. in one of the most starred ISC repos on github.com: https://github.com/isaacs/node-glob/blob/master/LICENSE

I have also seen `ISC License (ISC)`, probably because that's what seems to be on https://opensource.org/licenses/isc-license.txt if you take the heading to be part of the license text, which people often do.

Note the version at https://choosealicense.com/licenses/isc/ just has `ISC License`, and that version is of course popular.

So I'm proposing that `ISC License` be canonical, optionally preceded by `The`, optionally followed by `(ISC)`, optionally followed by `:`.

Separately, I also think it would be a good idea to make ISC not the canonical copyright holder. The license is now very widely used, generally with AUTHOR in the disclaimer fields and starting with a fill-in copyright line like MIT's (see above links). If there's any interest I could add that here or make a new PR to cover these non-title aspects.

cc @waldyrious as an ISC expert

Contributor

@wking wking left a comment

Looks good to me. Some previous discussion here and the following comments. I'll buy your assertion that the ISC page does not consider their current “Text of the ISC License:” as part of the license itself, and once you accept that, your title makes sense.

@wking
Contributor

wking commented Nov 28, 2017

I'm also ok dropping the entire title from the canonical version. For example, DHCP v4.0.0 includes no title. Markup for that would be:

<titleText>
  <alt match="(The )?ISC License( \(ISC\))?:?" name="title"></alt>
</titleText>

or something like that.

@mlinksva
Contributor Author

I weakly prefer to see a title, but happy to change PR to use alt for title if that makes sense for the project.

Speaking of which, I realize I'm not sure what the objective here for canonical versions is. It's the text that'll show up on spdx.org/licenses/$ID ... but should that text be as true as possible to 1) the first known use of a license, 2) the license as currently published by its steward if it has one, or 3) reflecting most common use in the wild, or 4) reflecting useful defaults for people who wish to copy and use the license text, or 5) something else? This especially has bearing on ISC ... if 3 or 4, probably ISC should not be the copyright holder in the canonical/published-on-spdx.org version.

@waldyrious
Contributor

waldyrious commented Nov 28, 2017

> I'm proposing that `ISC License` be canonical, optionally preceded by `The`, optionally followed by `(ISC)`, optionally followed by `:`.

The ISCL acronym should also be supported, as it's the canonical form for Python packages. See jab/bidict#38, PyPI's list of Trove classifiers, and pkgbase_schema.sql#L2246.

> I'm also ok dropping the entire title from the canonical version.

While that's obviously an acceptable option, I'd point out that whatever canonical version ends up being picked will have a significant effect in the text people will end up using in the wild (case in point, the title shown in https://choosealicense.com/licenses/isc/ has been getting popular, as @mlinksva points out).

The short, permissive licenses are quite easily confused: I ran into several people who were using the ISC license mistakenly believing it to be MIT (possibly having copied it from elsewhere, or inherited the repo maintainership). Sane defaults would make this kind of confusion less likely.

[EDIT] examples of ISC/MIT confusion: umbrae/jsonlintdotcom#32, jung-kurt/gofpdf#77, danreeves/react-tether#36.

@waldyrious
Contributor

waldyrious commented Nov 28, 2017

> Separately, I also think it would be a good idea to make ISC not the canonical copyright holder. (...) If there's any interest I could add here or make a new PR to cover these non-title aspects.

👍. There's some discussion about this in licensee/licensee#139. I quite like the wording used for BSD-2 ("the copyright holders and/or contributors"), but I agree it's best to discuss this in a separate thread.

@waldyrious
Contributor

> 4) reflecting useful defaults for people who wish to copy and use the license text

IMHO this is the most reasonable choice. The other ones risk propagating mistakes (e.g. typos) or sub-optimal choices, but I can imagine that 2) could be considered legally safer.

@wking
Contributor

wking commented Nov 28, 2017 via email

As pointed out at spdx#497 (comment) ISCL is used in some communities, and sometimes appears in license files
https://github.com/search?utf8=%E2%9C%93&q=ISCL+filename%3ALICENSE&type=Code
@mlinksva
Contributor Author

I included "if it has one" with (2) for a reason...

Is ISC steward of the ISC license? I think it's questionable. They're the creator/first user, and they have a web page up about what licenses their software is under (some ISC, some MPL-2.0). Contrast with FSF, CC, Eclipse, Mozilla, and Larry Rosen (off the top of my head), who either actively maintain their licenses, or their licenses stipulate that they are the steward.

@wking
Contributor

wking commented Nov 28, 2017

> Is ISC steward of the ISC license? I think it's questionable. They're the creator/first user...

I think that makes them the steward unless:

  • they hand the role over to someone else, or
  • you make some sort of fair use argument for your edits.

If the ISC doesn't want to steward the ISC license, can we get them to designate an official successor?

@mlinksva
Contributor Author

I don't think that being creator/first user makes one steward. Not providing a reusable version is pretty clear indication of non-stewardship. ISC is the steward of their software, not of the license they happened to make up. Same as MIT, UC Berkeley, and others. Those licenses have become widely used in spite of non-stewardship of their creators. (This is just a conjecture, not strongly held.)

@wking
Contributor

wking commented Nov 28, 2017 via email

@mlinksva
Contributor Author

I'd rather license texts be public domain (as Creative Commons ones are), but I don't think that has any bearing on whether a license is reusable, and thus in my opinion, potentially stewarded.

As far as influencing practice, ISC, MIT, UC Berkeley, and similar have all gone dark. The cat is out of the bag, the licenses aren't frozen (obviously, but also see doi:10.1007/978-3-319-17837-0_14 though I think it only covers BSD and MIT) in the state promulgated by those entities, and thus usage is subject to what entities like OSI (and probably increasingly SPDX) recommend.

@waldyrious
Contributor

waldyrious commented Nov 29, 2017

I tend to agree with @mlinksva here. The points raised by @wking are strong, but ultimately I believe organizations/projects aiming to curate licenses (OSI, SPDX, choosealicense, etc.) would do more good overall by ensuring sane defaults are available and can proliferate, than by increasing the complexity of an already fragmented field out of respect for actors who, for all we know, didn't make a conscious decision to make licenses hard to reuse in the first place.

That said, I think it's reasonable to assume that consensus can be easily reached, in a one-by-one case, regarding what changes are acceptable (formatting/templating vs. rewording à la "and/or"); so as long as these decisions are made by the interested members of the community, following a reasonable discussion period, in an open forum, with permanent documentation, I don't think it's problematic to make these changes.

Which, going back to the origin of this sub-thread, is the reason I support option 4 ("reflecting useful defaults for people who wish to copy and use the license text") for SPDX's canonical versions of licenses.

@wking
Contributor

wking commented Dec 1, 2017

> I'd rather license texts be public domain (as Creative Commons ones are)…

That seems orthogonal. You still can't edit, for example, CC-BY-4.0 without also removing all instances of the trademarked “Creative Commons”. You're free to create spin-off licenses with different branding, but that's not a “stewardship of CC-BY-4.0 issue”.

> The cat is out of the bag, the licenses aren't frozen … in the state promulgated by those entities, and thus usage is subject to what entities like OSI (and probably increasingly SPDX) recommend.

So formally appoint a new steward in charge of vetting these things? What happens if the OSI decides a particular variation matches the ISC but SPDX does not agree? Or vice versa? For example, SPDX currently accepts the and<optional>/or</optional> in the ISC (#423, based on a 2015 SPDX decision). But the OSI contains no analogous variable markup:

$ curl -s https://opensource.org/licenses/ISC | grep and/or
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

SPDX claims the ISC is OSI-approved, but is the and-only form actually OSI approved? Once you have multiple parties editing the templates, this gets sticky. But in order to have a single party editing templates, the OSI (and others) would have to delegate that authority to SPDX (or vice versa). Or things like “OSI approved” will need to become conditional on the exact wording (in which case, collapsing to an SPDX ID may be too lossy to be useful).
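
To make the `and<optional>/or</optional>` point concrete, here is a minimal Python sketch of what treating `/or` as optional means for a matcher. It is an illustration only, not the SPDX implementation: `GRANT_PATTERN` and `matches_grant` are hypothetical names, and the only license text used is the sentence returned by the `curl` command above.

```python
import re

# Sentence as printed by the curl/grep command quoted in this comment.
OSI_SENTENCE = (
    "Permission to use, copy, modify, and/or distribute this software for any "
    "purpose with or without fee is hereby granted, provided that the above "
    "copyright notice and this permission notice appear in all copies."
)

# Escape the literal text, then make the "/or" after "and" optional,
# mirroring and<optional>/or</optional> in the SPDX template.
_prefix, _suffix = OSI_SENTENCE.split("and/or", 1)
GRANT_PATTERN = re.compile(
    re.escape(_prefix) + r"and(?:/or)?" + re.escape(_suffix)
)


def matches_grant(text: str) -> bool:
    """Return True if text is the grant sentence, with or without '/or'."""
    collapsed = " ".join(text.split())  # ignore line wrapping
    return GRANT_PATTERN.fullmatch(collapsed) is not None


if __name__ == "__main__":
    print(matches_grant(OSI_SENTENCE))
    print(matches_grant(OSI_SENTENCE.replace("and/or", "and")))
```

Both wordings pass this check, which is exactly why a single SPDX identifier cannot tell a consumer which of the two was actually reviewed by another body.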

> That said, I think it's reasonable to assume that consensus can be easily reached, in a one-by-one case, regarding what changes are acceptable…

Right, as long as folks keep in mind that the consensus impacts the many parties consuming SPDX.

> Which, going back to the origin of this sub-thread, is the reason I support option 4 ("reflecting useful defaults for people who wish to copy and use the license text") for SPDX's canonical versions of licenses.

With title text excluded from SPDX matching (and presumably from OSI/FSF/… analysis), I'm ok having a title in the canonical ISC text. And I'm also fine using dummy placeholders in copyright statements and other “you must replace this variable text to use the license” cases; users will be replacing those anyway. I don't think leaving those out makes the license particularly hard to use, but I don't think making those changes is likely to break compatibility with folks consuming the SPDX identifier either.

But there's a very similar title issue with the MIT, where we claim (possibly based on the same mistaken reading of the OSI page) that there's a title which belongs in the license. The FSF folks would presumably prefer “The Expat License”, although they include no title in their text. Would you recommend the SPDX breaks that tie and picks one as the canonical license title?

@waldyrious
Contributor

> Would you recommend the SPDX breaks that tie and picks one as the canonical license title?

Wouldn't an <alt match=...> expression be able to support both options? Note, I don't actually see this as being a tie in practical terms, considering the vastly more common usage of the MIT title than the Expat one. So as I see it, it would make sense to use the "MIT" title as part of the canonical template, while supporting the "Expat" title.

@wking
Contributor

wking commented Dec 1, 2017 via email

@jlovejoy
Member

jlovejoy commented Dec 2, 2017

Getting back to the original purpose of this issue and pull request and a question for @mlinksva as he opened this: do I understand that you simply wanted to reflect that the presence or absence of a colon in the title can still be a match to the license? (even though as per the matching guidelines, the title does not need to be an exact match)

src/ISC.xml Outdated
@@ -6,7 +6,7 @@
 <crossRef>http://www.opensource.org/licenses/ISC</crossRef>
 </crossRefs>
 <titleText>
-<p>ISC License:</p>
+<p><alt match="The " name="titleThe"></alt>ISC License<alt match=" \(ISC[L]{0,1}\)" name="titleID"></alt><alt match=":" name="titleColon"></alt></p>
Member

Here's an alternative alt tag that uses a single alt which should improve performance on the matching:

<alt match="(The )?ISC License( \(ISC\))?:?"></alt>

Member

I tested the recommended match with the following:

ISC License:
ISC License
The ISC License
The ISC License (ISC)
The ISC License (ISC):
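
For anyone who wants to repeat that check outside the SPDX tools, here is a minimal Python sketch. It assumes the `match` attribute behaves like an ordinary regular expression applied to the whole title line, which may differ in detail from the Java matcher; `TITLE_PATTERN`, `TITLES`, and `matches_title` are illustrative names.

```python
import re

# Pattern copied from the single <alt> tag suggested above.
TITLE_PATTERN = re.compile(r"(The )?ISC License( \(ISC\))?:?")

# The five title variants listed in the test above.
TITLES = [
    "ISC License:",
    "ISC License",
    "The ISC License",
    "The ISC License (ISC)",
    "The ISC License (ISC):",
]


def matches_title(line: str) -> bool:
    """Return True if the whole line is an accepted ISC title variant."""
    return TITLE_PATTERN.fullmatch(line.strip()) is not None


if __name__ == "__main__":
    for title in TITLES:
        print(f"{title!r}: {matches_title(title)}")
```

Using `fullmatch` rather than `search` keeps the check strict: a line that merely contains one of the variants, with extra text around it, is not treated as a title.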

@mlinksva
Contributor Author

mlinksva commented Dec 3, 2017

@jlovejoy yes. I was not aware of:

11.1.1 Guideline: Ignore the license name or title for matching purposes, so long as what ignored is the title only and there is no additional substantive text added here. Templates do not include markup for this guideline.

What counts as substantive text? I don't see that defined in https://spdx.org/spdx-license-list/matching-guidelines ... would love a pointer to matcher code so I can see intention manifest. 😄 I had noticed that punctuation is generally significant for matching purposes, which is one of the reasons I opened this PR.

But the main reason I opened it doesn't have much to do with matching, which perhaps makes it uninteresting or off-topic for this repo: ideally for me, the texts published by SPDX would be as close as possible to what is used and useful in the wild so that eventually projects like choosealicense.com and thus github.com can just consume what SPDX publishes. I've never seen a colon used in the title of ISC in the wild. Yes, trivial. I should've left ISC as the canonical copyright holder to a wholly separate issue, but this main reason is also why I mentioned that topic.

@goneall
Member

goneall commented Dec 3, 2017

@mlinksva There is a Java implementation of the matcher code in the spdx tools repo. It is rather complex (probably more complex than it needs to be ;) There were a number of interesting technical challenges in implementing the algorithm, such as handling nested optional and variable blocks and greedy regular expression matches.

The starting point is isStandardLicense.

The algorithm works off of the license template, which describes variable and optional text.

The License-List-XML source is translated to the license templates when the license-list-data and website HTML pages are generated.

<titleText> is translated to an optional block of text. This effectively will skip the title only if it matches the text within the <titleText> element.

Note that this doesn't exactly implement the matching guidelines, since there is no attempt to programmatically determine substantive text. This could be considered a bug in the matching algorithm; however, if we tried to implement a more generous matching algorithm, we would run the risk of allowing a match to substantive text unless we define substantive text more precisely.

For the purposes of the License-List-XML source, if we include the additional matching hints as we did here, we will be able to match more of the license titles.
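
The "optional block" behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not the Java matcher: `normalize` is a crude stand-in for the matching guidelines, `matches_with_optional_title` is a hypothetical helper, and `BODY` is just the grant sentence quoted earlier in this thread standing in for the full license body.

```python
import re

# Title pattern taken from the <alt> discussed in this pull request.
TITLE_PATTERN = re.compile(r"(The )?ISC License( \(ISC[L]?\))?:?")

# Stand-in for the license body: the grant sentence quoted in this thread.
BODY = (
    "Permission to use, copy, modify, and/or distribute this software for any "
    "purpose with or without fee is hereby granted, provided that the above "
    "copyright notice and this permission notice appear in all copies."
)


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase (simplified normalization)."""
    return " ".join(text.split()).lower()


def matches_with_optional_title(candidate: str, body_template: str) -> bool:
    """Compare candidate to the template body.

    The first line is skipped only when it matches the known title pattern;
    any other leading text is treated as part of the body.
    """
    lines = candidate.strip().splitlines()
    if lines and TITLE_PATTERN.fullmatch(lines[0].strip()):
        lines = lines[1:]  # recognised title: treat as optional and drop it
    return normalize("\n".join(lines)) == normalize(body_template)


if __name__ == "__main__":
    print(matches_with_optional_title("The ISC License (ISC)\n\n" + BODY, BODY))
    print(matches_with_optional_title("Some Other Heading\n\n" + BODY, BODY))
```

An unrecognised heading makes the whole comparison fail in this sketch, which mirrors the trade-off described above: the matcher skips only titles it has been told about rather than guessing what counts as substantive text.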

@mlinksva
Contributor Author

mlinksva commented Dec 3, 2017

@goneall thanks for your patient reviews and explanation. I might open up other issues or PRs for tangents raised here, but am happy the title is sorted out. 😄

src/ISC.xml Outdated
@@ -6,7 +6,7 @@
 <crossRef>http://www.opensource.org/licenses/ISC</crossRef>
 </crossRefs>
 <titleText>
-<p>ISC License:</p>
+<p><alt match="(The )?ISC License( \(ISC[L]?\))?:?"></alt></p>
Contributor

This is back to having no canonical title in the license body. That seems reasonable to me, but it has had a fair bit of discussion in this PR and I don't know if there is a consensus yet. If the intention is to have no canonical title in the license body, I think we want to drop the <p> tags too. No need to render an empty <p></p>.

If the intention is to have a canonical title in the license body, then I think you want:

<p><alt match="(The )?ISC License( \(ISC[L]?\))?:?">ISC License</alt></p>

or whatever you decide to use for the canonical title inside. I'm also fine with <alt …><titleText><p>…</p></titleText></alt> if folks prefer that order.
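
The difference between an empty `<alt>` and one carrying a canonical title can be checked with a short Python sketch using the standard library XML parser. It assumes the element's text is what gets rendered as the canonical title and that `match` is the pattern for accepted variants; `read_alt` is a hypothetical helper, not part of any SPDX tool.

```python
import re
import xml.etree.ElementTree as ET
from typing import Tuple

# Fragment from the suggestion above, with the canonical title inside <alt>.
WITH_TITLE = (
    r'<p><alt match="(The )?ISC License( \(ISC[L]?\))?:?">ISC License</alt></p>'
)
# Same fragment with an empty <alt>, i.e. no canonical title to render.
WITHOUT_TITLE = r'<p><alt match="(The )?ISC License( \(ISC[L]?\))?:?"></alt></p>'


def read_alt(xml_text: str) -> Tuple[str, str]:
    """Return (match pattern, canonical text) for the first <alt> child."""
    alt = ET.fromstring(xml_text).find("alt")
    if alt is None:
        raise ValueError("no <alt> element found")
    return alt.get("match", ""), alt.text or ""


if __name__ == "__main__":
    for fragment in (WITH_TITLE, WITHOUT_TITLE):
        pattern, canonical = read_alt(fragment)
        # The canonical text, when present, should itself satisfy the pattern.
        consistent = canonical == "" or re.fullmatch(pattern, canonical) is not None
        print(repr(canonical), consistent)
```

A consistency check like the one in the loop is a cheap way to catch a canonical title that its own `match` expression would reject.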

Contributor Author

I'd prefer the title be in the canonical version, so I'll go ahead and add it back. @goneall can either approve or ask it to be removed.
