Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: identifiers in a file #511

Merged
merged 8 commits into from
Jan 9, 2021
Merged

Conversation

henryiii
Copy link
Contributor

@henryiii henryiii commented Jan 2, 2021

Fixes #431, should make #496 easier. I did not provide a way to override the files. I am now using importlib.resources here. instead of messing with the backport, we can wait until #508 and then just use the stdlib. Edit: the 3.9 version is much nicer and less intrusive, using that.

@YannickJadoul
Copy link
Member

To nitpick, pinned_docker_images.cfg is read with configparser.ConfigParser. Do we have a reason to pick toml for this?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 2, 2021

See #496 (comment)

And TOML is a real format, while .cfg is a rather poorly defined thing. And we already require toml (YAML would have been much more natural - I'm kind of working around the fact that TOML/cfg has to have keys, while YAML could have just been a list. Really is ugly when you have the dual Python 2.7 builds with matching identifiers.

@YannickJadoul
Copy link
Member

See #496 (comment)

Right! I also says

similar to the manylinux one at cibuildwheel/resources/pinned_docker_images.cfg

but fair enough ;-)

Would it then make sense to move pinned_docker_images.cfg to toml as well (maybe in a follow-up)? These formats might look too similar to both keep around, no?

Really is ugly when you have the dual Python 2.7 builds with matching identifiers.

"ugly" and "Python 2.7" in the same sentence? You must be joking! ;-)

But yes, the overview is completely gone in these files :-(

@henryiii henryiii force-pushed the refactor/identifiers branch 2 times, most recently from c8e697d to 828aa8a Compare January 2, 2021 17:08
cibuildwheel/util.py Outdated Show resolved Hide resolved
@henryiii
Copy link
Contributor Author

henryiii commented Jan 2, 2021

Would it then make sense to move pinned_docker_images.cfg to toml as well

I think we should match them up, yes. If we don't want to add yaml as a dependency, what about using json?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 2, 2021

TOML (basically the same as cfg):

[cp39-manylinux_aarch64]
version="3.9"
path_str="/opt/python/cp39-cp39"

YAML:

- identifier: cp39-manylinux_aarch64
  version: 3.9
  path_str: /opt/python/cp39-cp39

JSON (technically valid YAML too):

{"identifier": "cp39-manylinux_aarch64", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},

Currently CPython 2.7 is the only one with two builds from one build identifier (m and mu).

We could also combine the files into one file, as OS's don't overlap.

@YannickJadoul
Copy link
Member

The JSON looks best to me, indeed, the nice thing being that if you have a list of those, they will immediately show you the versions without scrolling between blocks.

Funnily enough, none of them look as good as our Python namedtuple.

(This might of course also just be me, being used to the current format in Python code, and not yet to the TOML version.)

@henryiii
Copy link
Contributor Author

henryiii commented Jan 2, 2021

This seems more like data to me, always seems odd to have this sitting inside the .py files, especially having to edit them for updates. And if we add the ability to update these files programmatically, that’s much better as a data file!

YAML would allow:

- {identifier: cp39-manylinux_aarch64, version: 3.9, path_str: /opt/python/cp39-cp39}

not a fan of mixing yaml and json styles but would probably be cleanest. But adds a dependency. And likely hard to auto generate for an update.

we could shorten identifier or path_str.

@YannickJadoul
Copy link
Member

not a fan of mixing yaml and json styles but would probably be cleanest.

Me neither, but this looks nice.

But adds a dependency. And likely hard to auto generate for an update.

Well, with something like this we're pretty close, no?

{
"linux": [
    ...
    {"identifier": "cp39-manylinux_x86_64", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},
    {"identifier": "cp39-manylinux_i686", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},
    ...
    {"identifier": "cp39-manylinux_aarch64", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},
],
...
}

I'm indeed not sure whether adding yaml is worth it (though it should be easy enough); up to @joerick, that one, I'd say.

@YannickJadoul
Copy link
Member

YannickJadoul commented Jan 2, 2021

For the record, I did ask about json/yaml in #256, but seems we got caught up in another discussion: #256 (comment)

@henryiii
Copy link
Contributor Author

henryiii commented Jan 2, 2021

Questions for @joerick:

  • One file or three
  • Format (JSON, YAML, TOML with current workaround, cfg with same workaround)
  • Should we change pinned_docker_images.cfg to match if we pick one of the others?

If you want it to be a vote, I ordered the format list above by my preferences, json first.

@joerick
Copy link
Contributor

joerick commented Jan 4, 2021

  • One file or three

The PythonConfiguration objects for each platform have different schemas, so I think they should be stored in different files.

  • Format (JSON, YAML, TOML with current workaround, cfg with same workaround)

I don't think we should be adding another runtime dependency to parse an internal config file. TOML is currently only needed for the cp35 windows workaround, so it might be dropped again after that. I'd be more in favour of configparser or JSON. Generally speaking I don't much enjoy JSON as a user-facing format, it's very brittle to edit directly.

I think we should be using a list of configs, not the dict of 'identifier': {...attributes}. Ordering is important, and as you've found, on linux there are two builds with the same identifier. That rules out configparser, because it doesn't support lists. TOML does with [[array_name]], but it's probably not worth it here.

So, it's probably back to JSON. But, for a bit of fun, I tried out https://github.com/Instagram/LibCST to see if there was a pure-python way to do it. And honestly, it's not too crazy. Here's a script that patches the urls in macos.py:

import libcst as cst


class UpgradeTransformer(cst.CSTTransformer):
    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.BaseExpression:
        node = updated_node

        if not (isinstance(node.func, cst.Name) and node.func.value == 'PythonConfiguration'):
            return node

        for arg in node.args:
            assert arg.keyword is not None

            if arg.keyword.value == 'url':
                assert isinstance(arg.value, cst.SimpleString)
                url = arg.value.evaluated_value

                print(url)
                # patch the url here
                url += '#patched'

                node = node.with_deep_changes(arg.value, value=f"'{url}'")

        return node


macos_cst = cst.parse_module(open('cibuildwheel/macos.py').read())
macos_cst = macos_cst.visit(UpgradeTransformer())
print(macos_cst.code)

This could be used to update the NamedTuples in-place, or another option could be to use something like the above with a standalone configs file, like macos_python_configs.py.

I'm probably too close to this to tell if it's a good idea, what do you guys think?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2021

I'd rather a config format (JSON) over editing Python files. I'm not a fan of Python files for config; Python generally should be code, and data should be in a data format. I would expect to be able to take a Python file and add arbitrary Python code, which such a parser as you've shown would choke on.

I think we should be using a list of configs, not the dict of 'identifier': {...attributes}. Ordering is important, and as you've found, on linux there are two builds with the same identifier.

That's what I mean with "the current workaround" - I was referring to the fact that it was a dict rather than a list. TOML allows an array inside a section, yes, but the values of that array are not allowed to be dicts, I believe - only simple values. Then you'd have to parse again with some other format on the strings.

@YannickJadoul
Copy link
Member

YannickJadoul commented Jan 4, 2021

I definitely like the "fun" part, but I'm not entirely sure it's the right match here? Also, if you didn't want to introduce another dependency, libcst is a bit annoying, there ;-)

Not sure I'm convinced of that myself, but we don't need to per se follow a fancy "everything's possible" kind of standard, ofc. Something like

cp39-manylinux_x86_64,  3.9, /opt/python/cp39-cp39
cp39-manylinux_i686,    3.9, /opt/python/cp39-cp39
cp39-manylinux_aarch64, 3.9, /opt/python/cp39-cp39

might perfectly server our purpose for now to populate the existing NamedTuple, won't be anything hard to parse (some split() and strip()?), and can be easily replaced by something more fancy once we do suddenly want separators (could be ; or \t or anything else) or spaces in our strings.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2021

This is called csv, and it's got a parser in the standard library. ;)

@YannickJadoul
Copy link
Member

I know, but I didn't want to pin us to csv per se ;-)
Also, never been a great fan of the CSV parser in the standard library, since it's still pretty verbose. But yes.

@YannickJadoul
Copy link
Member

YannickJadoul commented Jan 4, 2021

Actually, I stand corrected. It's just a oneliner:

list(csv.DictReader(open("test.csv", 'r'), fieldnames=["identifier", "version", "path_str"]))

(Except for those spaces behind the comma; we'd have to write cp39-manylinux_x86_64,3.9,/opt/python/cp39-cp39)

Nevermind, this works:

list(csv.DictReader(open("test.csv", 'r'), fieldnames=["identifier", "version", "path_str"], skipinitialspace=True))

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2021

Actually, maybe toml does support inline dicts, I just noticed this: https://python-poetry.org/docs/dependency-specification/#expanded-dependency-specification-syntax

@YannickJadoul
Copy link
Member

YannickJadoul commented Jan 4, 2021

Seems this also works with pytoml yes:

[x86_64]
cp38 = {identifier = "cp38-manylinux_x86_64", version = "3.8", path_str = "/opt/python/cp38-cp38"}
cp39 = {identifier = "cp39-manylinux_x86_64", version = "3.9", path_str = "/opt/python/cp39-cp39"}

Just a pity of those cp38 and cp39 keys (since toml doesn't have arrays).

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2021

Yes it does. Just not at top level.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2021

Try:

[linux]
vals = [
    {identifier = "cp38-manylinux_x86_64", version = "3.8", path_str = "/opt/python/cp38-cp38"},
    {identifier = "cp39-manylinux_x86_64", version = "3.9", path_str = "/opt/python/cp39-cp39"},
]

I can check later.

@YannickJadoul
Copy link
Member

YannickJadoul commented Jan 4, 2021

Yes it does. Just not at top level.

Once again, you're right. I hadn't tired that!

Well, I'd be up for TOML, then, actually, if @joerick doesn't mind the extra dependency. If not, I do kind of like the simplicity of the CSV (or similar) version.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2021

I like JSON, CSV, and this form of TOML pretty equally, with a minor preference on the JSON still.

Currently working on a cookie cutter and the specs for it all all JSON. It's not unusable as long as you don't want to add comments to whatever you are doing. :)

I'd probably do something like:

[tool.cibw.build-platforms]
linux = [
    {...},

That way, you could imagine them being specifiable in a single file if someone wanted to.

@henryiii henryiii force-pushed the refactor/identifiers branch 2 times, most recently from 3152675 to f2d4de2 Compare January 4, 2021 22:11
@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2021

CSV wouldn't work, because they are variable length. That leaves JSON and TOML. The TOML I have right now is not bad at all, IMO; I dropped the workaround. Though JSON would also work.

@YannickJadoul
Copy link
Member

YannickJadoul commented Jan 4, 2021

CSV wouldn't work, because they are variable length.

It will if you have multiple files per OS, as @joerick said he preferred anyway? The url on Windows can just stay empty, no?

But fair enough, all my concerns about readability and getting an overview are gone, indeed, in the current form! So that just leaves the dependency concern.

@joerick
Copy link
Contributor

joerick commented Jan 4, 2021

This TOML is looking very nice, now :) I think I might be swayed over. I also really like the way it slots into the namedtuple using the PythonConfiguration(**items) syntax. Does this way of writing it survive a round-trip through toml.load(), toml.dump()?

If those headers are required, (I know this is me changing my mind), perhaps we should put them all in one file? Then the headers actually mean something.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 5, 2021

Does this way of writing it survive a round-trip

Excellent question. With some customization and a few extra spaces, it can be done. Will commit an example tomorrow.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 5, 2021

One benefit of json + one file here, by the way, it would make it easy to write a parser for it in javascript that would allow a user to type into a box and see the results live in the docs (never tried that, but should be possible, I think?). TOML or any other format would be harder to parse in JS (though you could always just read it with toml and stick a simple list JSON into the docs during generation, so really, not a big issue).

Going to push my toml version soon.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 5, 2021

I've put an example roundtrip (the linux one and a test for now). What do you think? Should I go with this, in a single file?

@YannickJadoul
Copy link
Member

Now that we have this nice formatting, I'm up for a single file. I think it could enhance updates, when now, you need to pass 3 files. Though one file is not a must-have for me, either.

@henryiii henryiii mentioned this pull request Jan 6, 2021
@henryiii henryiii force-pushed the refactor/identifiers branch 2 times, most recently from e6fa915 to 42621aa Compare January 6, 2021 01:08
@henryiii
Copy link
Contributor Author

henryiii commented Jan 6, 2021

Touched up types a bit in my last comment, adding exhaustiveness checking on the PlatStr literal, and worked around the fact that toml doesn't have py.typed or .pyi files in the package, even though it does have them in the original source. (uiri/toml#347)

Copy link
Contributor

@joerick joerick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we're nearly there with this one! Just a few minor comments on naming etc.

cibuildwheel/__main__.py Outdated Show resolved Hide resolved
cibuildwheel/__main__.py Show resolved Hide resolved
cibuildwheel/__main__.py Outdated Show resolved Hide resolved
cibuildwheel/__main__.py Show resolved Hide resolved
cibuildwheel/resources/build-platforms.toml Outdated Show resolved Hide resolved
from importlib.resources import files


resources_dir = files('cibuildwheel') / 'resources'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what does importlib.resources.files do that Path(__file__).parent doesn't? It can't be .egg compatibility, can it? :P

Copy link
Contributor Author

@henryiii henryiii Jan 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It supports directories that are zip files (see https://github.com/scikit-hep/particle/releases/tag/v0.14.0 - that .pyz file runs directly without installing, python3 particle.pyz just works), and namespace packages that are split across directories.

Though, however, I'm using it here because it's more elegant and the "right' thing to do. We are accessing a resource.

PS: There's a few tricks when running from a zip file, you can open the path or access the filename, but cannot use the file name to open the file without a little workaround which I didn't attempt to implement.

cibuildwheel/__main__.py Show resolved Hide resolved
@henryiii henryiii merged commit 5eb5ea1 into pypa:master Jan 9, 2021
@henryiii henryiii deleted the refactor/identifiers branch January 9, 2021 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configuration files for download links
3 participants