Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

which_package to yield unique collection of package records #5108

Merged
merged 10 commits into from
Dec 14, 2023
5 changes: 2 additions & 3 deletions conda_build/inspect_pkg.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,8 @@ def which_package(
path = prefix / path

for prec in PrefixData(str(prefix)).iter_records():
for file in prec["files"]:
if samefile(prefix / file, path):
yield prec
if any(samefile(prefix / file, path) for file in prec["files"]):
Copy link
Contributor

@dholth dholth Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(n * m) ? I see this version skips work if true.

Too bad to iterate over all installed files to do the search

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worst case yes, same as before

any returns on the first True: https://docs.python.org/3/library/functions.html#any

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what avoids the duplicated records to begin with, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a subsequent PR, we could see if implementing a cached path -> artifact map would speed this up. It can be a veeery slow part of the post processing stage in conda-build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what avoids the duplicated records to begin with, right?

correct, this is essentially what was happening pre #5041:

for dist in fn(prefix):
# dfiles = set(dist.get('files', []))
dfiles = dist_files(prefix, dist)
# TODO :: This is completely wrong when the env is on a case-sensitive FS!
if any(norm_ipp == normcase(w) for w in dfiles):
yield dist

cached path -> artifact map

definitely something to consider

yield prec


def print_object_info(info, key):
Expand Down
19 changes: 6 additions & 13 deletions conda_build/post.py
Original file line number Diff line number Diff line change
Expand Up @@ -1187,9 +1187,7 @@ def _lookup_in_prefix_packages(
if len(precs_in_reqs) == 1:
_print_msg(
errors,
"{}: {} found in {}{}".format(
info_prelude, n_dso_p, precs_in_reqs[0], and_also
),
f"{info_prelude}: {n_dso_p} found in {precs_in_reqs[0]}{and_also}",
verbose=verbose,
)
elif in_whitelist:
Expand All @@ -1201,25 +1199,20 @@ def _lookup_in_prefix_packages(
elif len(precs_in_reqs) == 0 and len(precs) > 0:
_print_msg(
errors,
"{}: {} found in {}{}".format(
msg_prelude, n_dso_p, [prec.name for prec in precs], and_also
),
f"{msg_prelude}: {n_dso_p} found in {[str(prec) for prec in precs]}{and_also}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This upgrades from name to the full str representation. i.e. from python to defaults::python-3.12.0-h123abc_0, I think? Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is now more detailed (but still less verbose than str(precs)) than before

the change was inspired by the fix below on line 1214-1215, where precs_in_reqs: List[PackageRecord] was cast to string resulting in a very verbose output, see #5108 (review)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good that this is now more consistent with the full spec which was already output at https://github.com/conda/conda-build/pull/5108/files#diff-5073339a2e88f80714f82a001af334869aa5df9627b2f00ffd18d7a2035f537eR1190 .

verbose=verbose,
)
_print_msg(
errors,
"{}: .. but {} not in reqs/run, (i.e. it is overlinking)"
" (likely) or a missing dependency (less likely)".format(
msg_prelude, [prec.name for prec in precs]
),
f"{msg_prelude}: .. but {[str(prec) for prec in precs]} not in reqs/run, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I find the change to use str(prec) in the len(precs_in_reqs) == 0 and len(precs) > 0 case above a welcome change, one could argue that we might only want to output the name here so users get a more succinct "you might've missed package in run requirements" message.

But I don't have a strong opinion on this one.

"(i.e. it is overlinking) (likely) or a missing dependency (less likely)",
verbose=verbose,
)
elif len(precs_in_reqs) > 1:
_print_msg(
errors,
"{}: {} found in multiple packages in run/reqs: {}{}".format(
warn_prelude, in_prefix_dso, precs_in_reqs, and_also
),
f"{warn_prelude}: {in_prefix_dso} found in multiple packages in run/reqs: "
f"{[str(prec) for prec in precs_in_reqs]}{and_also}",
verbose=verbose,
)
else:
Expand Down
19 changes: 19 additions & 0 deletions news/5108-fix-which_package
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
### Enhancements

* <news item>

### Bug fixes

* Fix `conda_build.inspect_pkg.which_package` so it does not return duplicate package records. (#5108)
kenodegard marked this conversation as resolved.
Show resolved Hide resolved

### Deprecations

* <news item>

### Docs

* <news item>

### Other

* <news item>
2 changes: 2 additions & 0 deletions tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@
from conda_build.utils import check_call_env, copy_into, prepend_bin_path
from conda_build.variants import get_default_variant

pytest_plugins = ["conda.testing"]

kenodegard marked this conversation as resolved.
Show resolved Hide resolved

@pytest.fixture(scope="function")
def testing_workdir(monkeypatch: MonkeyPatch, tmp_path: Path) -> Iterator[str]:
Expand Down
100 changes: 100 additions & 0 deletions tests/test_inspect_pkg.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# Copyright (C) 2014 Anaconda, Inc
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations

import json
from pathlib import Path

import pytest
from conda import __version__ as conda_version
from conda.core.prefix_data import PrefixData
from packaging.version import Version, parse

from conda_build.inspect_pkg import which_package


@pytest.mark.skipif(
parse(conda_version) < Version("23.5.0"),
reason="tmp_env fixture first available in conda 23.5.0",
)
kenodegard marked this conversation as resolved.
Show resolved Hide resolved
def test_which_package(tmp_path: Path):
# create a dummy environment
(tmp_path / "conda-meta").mkdir()
(tmp_path / "conda-meta" / "history").touch()
# a dummy package with a unique file (softlink) and a shared file (shared)
kenodegard marked this conversation as resolved.
Show resolved Hide resolved
(tmp_path / "conda-meta" / "packageA-1-0.json").write_text(
json.dumps(
{
"build": "0",
"build_number": 0,
"channel": "packageA-channel",
"files": ["softlink", "shared"],
"name": "packageA",
"paths_data": {
"paths": [
{
"_path": "softlink",
"path_type": "softlink",
"size_in_bytes": 0,
},
{
"_path": "shared",
"path_type": "hardlink",
"size_in_bytes": 0,
},
],
"paths_version": 1,
},
"version": "1",
}
)
)
# a dummy package with a unique file (hardlink) and a shared file (shared)
kenodegard marked this conversation as resolved.
Show resolved Hide resolved
(tmp_path / "conda-meta" / "packageB-1-0.json").write_text(
json.dumps(
{
"build": "0",
"build_number": 0,
"channel": "packageB-channel",
"files": ["hardlink", "shared"],
"name": "packageB",
"paths_data": {
"paths": [
{
"_path": "hardlink",
"path_type": "hardlink",
"size_in_bytes": 0,
},
{
"_path": "shared",
"path_type": "hardlink",
"size_in_bytes": 0,
},
],
"paths_version": 1,
},
"version": "1",
}
)
)

# fetch package records
pd = PrefixData(tmp_path)
precA = pd.get("packageA")
precB = pd.get("packageB")

# test returned package records given a path
precs_missing = list(which_package(tmp_path / "missing", tmp_path))
assert not precs_missing

precs_softlink = list(which_package(tmp_path / "softlink", tmp_path))
assert len(precs_softlink) == 1
assert precs_softlink[0] == precA

precs_hardlink = list(which_package(tmp_path / "hardlink", tmp_path))
assert len(precs_hardlink) == 1
assert precs_hardlink[0] == precB

precs_shared = list(which_package(tmp_path / "shared", tmp_path))
assert len(precs_shared) == 2
assert precs_shared == [precB, precA]
Loading