use a URI-based scheme for package names like Java #20183
Comments
Is there any way we could avoid having to read package names backwards and forwards to be able to find the actual URL? We could try a restricted version of Go's approach and use |
In this proposed system, there is no actual URL corresponding to a package name. URLs are only a fetch mechanism, and the same package could be fetched via an arbitrary number of different URLs, each with different domain names and schemes. Why would you want such an "actual URL" anyway? |
What's the benefit of the domain name over just a |
One criticism I would have of this is it functionally requires you to own a [sub]domain in order to create a package. I recognise this is pretty easy with services like GitHub Pages, but it's an extra roadblock which I don't think ought to exist. If Layperson Bob wants to make a high-quality package that others can use, which conforms to our typical naming convention, I don't think we should require him to set up a personal domain. Perhaps they could use |
How do you ensure that |
In this example, what URL will users of Layperson Bob's package use to fetch it? |
I'm not implying that it is better. I just find it an inconvenience to mentally turn your format into a URL. We can say that this format is not restricted to being a valid URL, but in almost every case (and example you wrote in this proposal) I'd argue it'd be one. |
I imagine a GitHub archive URL, like many people already use today (or perhaps a Git URI). However, there's a big difference between using this as the fetch URL and the package name: if Bob later decides that GitHub's pivot to AI is leading to too many poor business decisions, and moves his repos over to GitLab, I don't think he should be stuck with this legacy package name. |
Why does it need to be unique, or, alternatively, how is the uniqueness of the domain enforced/what is the uniqueness meant to represent? Couldn't I create my own |
@squeek502 The idea isn't that it would be technically enforced, but that it would just give a unique name you could use. This is effectively a convention (the only enforcement would be the leading TLD requirement), but the benefit of the convention is that when you follow it, by using a [sub]domain you own, you'll get a package name which is unique amongst everyone else following the convention. Packages need some kind of unique identifier for overrides, deduplication, and version management. This proposal is an alternative to the |
I think this is the main downside of the proposal; people will be tempted to pointlessly rename their projects, causing completely unnecessary churn for package users.
The build system needs a way to find out when the dependency tree has multiple package versions of the same project, so that it can implement features such as:
Imagine the situation that would occur if two different projects were both named "abc" and the build system swapped the higher version number in for the lower version one. This is not a problem for centralized package managers because they have a single, global namespace. The only alternative to this proposal is to generate a unique random identifier and either attach it to the name, or have another |
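To make the "abc" scenario concrete, here is a minimal sketch, not the build system's actual logic. It assumes a naive deduplication pass that groups fetched packages by name and keeps the highest semantic version. `parse_semver`, `pick_latest`, and the `com.example.alice` / `org.example.bob` names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def parse_semver(version):
    """Turn "1.2.8" into (1, 2, 8) so versions compare numerically."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))


def pick_latest(packages):
    """Group (name, version) pairs by name and keep the highest version per name.

    Hypothetical stand-in for a deduplication pass; it trusts the name to
    identify the project, which is exactly what goes wrong below.
    """
    by_name = defaultdict(list)
    for name, version in packages:
        by_name[name].append(version)
    return {name: max(versions, key=parse_semver) for name, versions in by_name.items()}


# Two unrelated projects that both chose the unqualified name "abc".
unqualified = [("abc", "1.2.0"), ("abc", "3.0.1")]

# The same two projects under (hypothetical) fully-qualified names.
qualified = [("com.example.alice.abc", "1.2.0"), ("org.example.bob.abc", "3.0.1")]

print(pick_latest(unqualified))  # one entry: the unrelated 3.0.1 replaces 1.2.0
print(pick_latest(qualified))    # two entries: nothing is swapped
```

With unqualified names the pass silently collapses two different projects into one. With distinct names it leaves them alone.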
As an example of using this in the wild, Zigmod went with the edit: it got this idea from the original package manager thread [2] and has been using it for over 3 years, the entire life of the project. |
What type of enforcement would this be out of curiosity? Would this mean a dependency on something like the public suffix list data (a moving target)? |
Basically just that you have at least one byte followed by And then an error message that suggests to use a domain name, when there are no dots. In other words, just enough to make the path of least resistance to be to follow the convention. |
it's unclear to me what problem this proposal solves over using a random id. i'm also concerned along the same lines as @mlugg. My own experience as a young aspiring software developer in college looking at java's convention was that i thought i had to buy a domain name to be a "real programmer", which we don't want to communicate with this. i wouldn't want the legitimacy of someone's opensource contributions to humanity to be implicitly tied to a domain name someone has to pay for. Either you pay 10USD per year, or thereabouts, or you decide which sugardaddy host platform is going to pay for you, like github. but your hosting provider is just an implementation detail, unrelated to the identity of your project. sure using ICANN and related registries to resolve naming conflicts works, but why implicitly endorse any centralized authority when RNG is the only authority you really need? and just back to my original point, why is having a long name desirable over having a short name plus a random number? |
The issue I anticipate with a random number is unintentional duplication -- for instance, someone copying A check which could help here is storing an association between known package IDs and names in the global cache; that way, when you first try to build your new project, Zig can notice that the ID is already known but under a different name, and emit an appropriate error. Ideally we'd also have a command to just regenerate the ID, perhaps An alternative, which would certainly work for me personally, is something like Duplicate IDs are a particularly big problem because they can be quite hard to catch. Ignoring the system I proposed above, if two packages are unintentionally given the same ID, then we might see no problems whatsoever -- up until those packages are used by the same project at some point. We'd really like to avoid this. I've just come up with another idea to perhaps solve this duplicate
This has a similar effect to the global package ID <-> name association idea, but removes unnecessary global state, and consequently doesn't require the package with that ID to have been used on this system before. Thinking about it, I quite like this idea. |
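For reference, the global-cache check described earlier in this comment (remembering which name each package ID was first seen under) could look roughly like the sketch below. It is only an illustration under assumed names: `known_ids` stands in for whatever the global cache would store, and `check_id` and `DuplicateIdError` are hypothetical.

```python
class DuplicateIdError(Exception):
    """Raised when an ID is already associated with a different package name."""


def check_id(known_ids, package_id, name):
    """Record package_id -> name, or fail if the ID is known under another name.

    known_ids is a plain dict standing in for the global cache (an assumption).
    """
    existing = known_ids.setdefault(package_id, name)
    if existing != name:
        raise DuplicateIdError(
            f"package ID {package_id:#x} is already used by '{existing}'; "
            f"'{name}' probably needs a freshly generated ID"
        )


known_ids = {}
check_id(known_ids, 0x1234, "hello")      # first sighting: recorded
check_id(known_ids, 0x1234, "hello")      # same pair again: fine
try:
    check_id(known_ids, 0x1234, "world")  # copied manifest, renamed package
except DuplicateIdError as err:
    print(err)
```

As noted above, this only catches a duplicate when the original package has already been seen on the same machine.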
Oh, I have another alternative: Introduce a

```zig
.{
    .dependencies = .{
        .hello = .{
            .url = "https://example.com/hello/v$version.tar.gz",
            .version = "1.2.8",
            .hash = "...",
        },
    },
}
```

This would enable the build system to use the latest semantic version of This will work nicely with GitHub tags as well: |
That solution disallows having multiple mirrors for one package, enforces a URL scheme, and in fact even disallows upstreams from ever changing their archive URLs - it's a non-starter. |
Not sure why this makes it a "non-starter" instead of "an idea that we can expand and build on top of and then evaluate":
EDIT: I realized that I might have misunderstood the current proposal here and that the full package ID is also part of the dependency definition. That would of course simplify matters quite a lot. |
@mlugg This means you can never rename a package, which seems fine maybe? renaming a package is indistinguishable from making the mistake you're trying to avoid. if we really wanted to support renaming, i can imagine something like an additional field called "original_name" and then hopefully the copypaste workflow would remember to remove that when making a new file. but that seems pretty low stakes by that point, and i agree that your suggestion to prevent the typo is pretty important. |
Why not use an ID (UUID?) as a namespace, i.e. <package_name>.<namespace_id>? This way it's not tied to a specific URL, and a fork or some completely unrelated package with the same name will just have a different namespace_id. Of course there can still be a collision if someone is intentionally using the same values, but there is little you can do about that without a centralised system, unless you want to use something like a crypto key hash as the ID and use the key for signing packages at the same time? |
I agree with @mlugg on this. Since the naming mechanism is purely cooperative anyway, the best we can do is to prevent accidental name clashes. We don't need a lot of entropy for that, though, so a 128 bit base_ID may be overkill. We could settle for a 64-bit checksum, the first half being a random seed, followed by 32 bits from sha256(seed ++ name). This way the package manager can enforce a freshly generated ID for any given package name without causing too much friction. |
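A minimal sketch of the 64-bit scheme described above: a 32-bit random seed followed by 32 bits taken from sha256(seed ++ name). The comment does not specify the byte order, which digest bytes are used, or the name encoding, so big-endian, the first four digest bytes, and UTF-8 are assumptions here. `make_package_id` and `id_matches_name` are hypothetical names.

```python
import hashlib
import secrets
from typing import Optional


def make_package_id(name: str, seed: Optional[bytes] = None) -> int:
    """Build a 64-bit ID: 4 random seed bytes, then 4 bytes of sha256(seed + name)."""
    if seed is None:
        seed = secrets.token_bytes(4)  # fresh 32-bit random seed
    if len(seed) != 4:
        raise ValueError("seed must be exactly 4 bytes")
    digest = hashlib.sha256(seed + name.encode("utf-8")).digest()
    # Assumption: big-endian layout, check bits are the first 4 digest bytes.
    return int.from_bytes(seed + digest[:4], "big")


def id_matches_name(package_id: int, name: str) -> bool:
    """Check that the low 32 bits are the checksum of (seed, name)."""
    raw = package_id.to_bytes(8, "big")
    seed, check = raw[:4], raw[4:]
    return hashlib.sha256(seed + name.encode("utf-8")).digest()[:4] == check


package_id = make_package_id("hello")
print(f"{package_id:#018x}")
print(id_matches_name(package_id, "hello"))  # True: ID was generated for this name
```

Because the checksum is bound to the name, a manifest copied from another project and then renamed would fail `id_matches_name` (barring a 1-in-2^32 coincidence), which is how the package manager could insist on a freshly generated ID.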
This is incorrect. Package A can declare a mirror for Package B's URL, and then the build proceeds as before. |
Related issues:
$name-$semver-$hash
#20178

In order to detect that two different fetched packages represent different versions of the same project, with one intended to supersede the other, we need some kind of stable identifier. Zig package management is decentralized, so there is no global namespace that serves this purpose.
We have the `name` field of packages, but it is not globally unique... or is it? This proposal takes inspiration from Java's solution to this problem, which is to use the domain name system and canonical URIs, so you end up with names such as `com.sun.foo.bar.ArrayList`. Although Zig supports URL mirrors, canonical URIs can still be used as globally unique package identifiers.

Example fully-qualified package names would be:
com.github.scottredig.JavascriptBridge
me.andrewkelley.groovebasin
org.codeberg.river.river
org.ziglang.zig
In order to discourage people from fighting over unqualified names and causing general chaos, Zig would emit an error if a package fetched by URL was missing a top level domain name.
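As a rough sketch of how light this check could be: the version below only requires a non-empty first segment followed by a dot, and otherwise produces an error that points toward the convention. The exact rule and the error wording are assumptions, and `check_package_name` is a hypothetical helper, not an existing Zig API.

```python
def check_package_name(name: str) -> str:
    """Return name unchanged if it looks qualified, otherwise raise ValueError.

    Assumed rule: at least one byte before the first dot. Domain ownership is
    deliberately not verified, matching the proposal.
    """
    if name.find(".") <= 0:
        raise ValueError(
            f"package name '{name}' is missing a top level domain name; "
            "consider prefixing it with a domain you control, "
            "for example 'org.ziglang.zig'"
        )
    return name


for candidate in ["org.ziglang.zig", "me.andrewkelley.groovebasin", "groovebasin"]:
    try:
        print("ok:", check_package_name(candidate))
    except ValueError as err:
        print("error:", err)
```

The check makes following the convention the path of least resistance without enforcing who owns which domain.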
Whether the person owns the domain names they use is not enforced. When you choose to add a dependency on a package, you can notice whether they use their own domain name, or someone else's and factor that into the human decision of whether to trust them. Maybe someone decides to take over an abandoned project, and continue using the old maintainer's domain name for the project, for a seamless transition. If that new maintainer has built trust in the community, people would generally find this acceptable.
As for #14288, there would no longer be an `id` field introduced. The `name` field acts as the globally unique identifier that persists across versions.

As for #20178, only the last path segment in the name would be used. For example:
Finally, as for #20180, this could potentially provide a way to override an entire project across the full dependency tree, to quickly find out if a particular patch, for example, could be used to make all the different usage sites of the package satisfied, thus being a candidate for a global override.