Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix incorrect parse failure for a valid pre-release semver #70

Merged
merged 1 commit into from
Jul 20, 2024

Conversation

dylan-bourque
Copy link

This PR addresses an incorrect parse failure when a pre-release token has a leading zero and consists of only digits and a hyphen ('-'), like 0.0.0-00010101000000-000000000000.

Using v0.31.2 of the extension, semver('0.0.0-00010101000000-000000000000') reports the following error:

ERROR: bad semver value '0.0.0-00010101000000-000000000000': semver prerelease numbers can't start with 0

The portion after the first hyphen constitutes a valid pre-release token (ASCII alphanumerics and hyphens) but is not a number so the Numeric identifiers MUST NOT include leading zeroes clause in the specification does not apply.

The submitted fix is to extend the isalpha() check in the look-ahead portion of parse_semver() to also check if the current character is a hyphen.

@theory
Copy link
Owner

theory commented Jul 17, 2024

Oh wow, yeah, that dates from SemVer 1.0.0-beta. Thanks for the patch!

@theory
Copy link
Owner

theory commented Jul 17, 2024

Hrm. This actually breaks test/sql/base.sql:

not ok 31 - "1.0.0-02799" is not a valid semver
# Failed test 31: ""1.0.0-02799" is not a valid semver"
#       caught: no exception
#       wanted: an exception

That tests for the statement in semver 2.0.0 paragraph 9 that "Numeric identifiers MUST NOT include leading zeroes." Note that 1.0.0-02799 is rejected by the official semver 2.0 regex. I wonder if it is starting on the - that comes before the leading zero…

@theory theory self-assigned this Jul 17, 2024
@theory theory added the bug label Jul 17, 2024
@dylan-bourque
Copy link
Author

dylan-bourque commented Jul 17, 2024

Thanks for the patch!

Thank you for the great product. We've been using this in github.com/CrowdStrike/perseus for a bit over 2 years now and this is the first issue we've run into. It only cropped up because someone tagged a Go module with that 0.0.0-00010101000000-000000000000 version.

I apologize for not running the tests before submitting this PR but I'm a Go developer these days and don't have a machine set up to do C builds.

@theory
Copy link
Owner

theory commented Jul 17, 2024

I'm also mainly a Go developer; I only pretend to be a C programmer when I have no other choice :-)

theory added a commit to dylan-bourque/pg-semver that referenced this pull request Jul 20, 2024
Turns out `patch` wasn't populated yet, so my last commit was scanning
nothing. Finally found the bug: it need to start scanning at `atchar`,
not `len - atchar`. Wild that this never came up before! Only found it
just now by, at the last second, testing the example from theory#70:
`0.0.0-00010101000000-000000000000`. Now included in the test corpus.
A pre-release "Numeric identifiers MUST NOT include leading zeroes,"
according to SemVer 2.0. The existing code had accounted for this
limitation: when it found a pre-release starting with a 0 followed by a
digit, it would scan ahead and allow the value if it found an alphabetic
character. There were two bugs in the Implementation.

First, in addition to alphabetic characters, SemVer allows dashes. The
[project regex] matches `[a-zA-Z-]`. So add a check for a dash in
addition to `isalpha()`. Thanks to Dylan Bourque for the pull request
(theory#70).

The other bug was that the scanning was was not starting from the
character after the zero and following digit. It was, instead, starting
from that position subtracted from the length.

For example, if the pre-release started at character 7 and the semver
was 12 characters long, it would start scanning at character 13, well
*after* the start of the pre-release. If the pre-release started at
character 8 and the semver was 12 characters long, it would start
scanning from character 4, well *before* the start of the pre-release.

It's a wonder no errors were previously discovered.

Fix it by starting from the current character, and include some details
in Changes to help find existing bad values.
@theory theory merged commit 02af21f into theory:main Jul 20, 2024
12 checks passed
@theory
Copy link
Owner

theory commented Jul 20, 2024

Bah, what an annoying bug! Will release v0.40.0 shortly. Many thanks for the report and fix!

@dylan-bourque
Copy link
Author

I appreciate you closing the loop on this. 🍻

@theory
Copy link
Owner

theory commented Jul 21, 2024

I appreciate you closing the loop on this. 🍻

Glad to do it. Just be aware that you might have some existing prerelease versions that would be invalid after an upgrade.

@dylan-bourque
Copy link
Author

dylan-bourque commented Jul 21, 2024

What's the pattern for the existing versions that are now invalid? We have ~65K rows in our database with pre-release versions, all of which came from Go modules. I feel like we'll be fine, but it's easy enough to check.

Follow-up: I have 0 records in my database with a non-empty pre-release version that doesn't match the official regex.

@theory
Copy link
Owner

theory commented Jul 22, 2024

It's difficult to say, because it depends on the length of the semver and the pre-release. You're probably safe, but you can use a regex to check. From Changes, it will be something like:

SELECT name, version FROM packages
 WHERE version::text !~ '^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$';

If that returns no rows you should be fine.

@dylan-bourque
Copy link
Author

That's pretty much what I did, except I only checked the pre-release section

SELECT count(*) FROM module_versions
WHERE
    get_semver_prerelease(version) != '' AND
   get_semver_prerelease(version) ~ '^(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*$';

@theory
Copy link
Owner

theory commented Jul 22, 2024

Oh yeah, I forgot about get_semver_prerelease(). You could probably have called it just once, though.

FWIW, this applies only to prereleases that start with a 0 and contains only digits. So something like this would probably work:

SELECT count(*) FROM module_versions
 WHERE get_semver_prerelease(version) ~ '^0\[0-9]+($|\+)';

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants