Solving mlflow=2.12.2=py39hb3b8efb_0 takes a lot of time #684
Comments
Thanks for these! I can confirm that these are horribly slow and instant with libsolv. We will investigate! |
Ok, thanks for confirming! If you are curious, I found these by iterating through every single record in the channel and trying to resolve them individually. |
I've found a performance improvement that fixes these cases (they resolve instantly) but makes all other solves about 50% slower... Need more investigation. 😅 |
We found a solution for this case by changing our selection heuristic. Currently we decide packages with few options first, in order to minimize backtracking. But that does not seem to play well with Python constraints, since apparently we end up doing excessive backtracking. If we reverse the heuristic (i.e. decide packages with many options first), things are super fast. Did you write some kind of benchmark script @JeanChristopheMorinPerso? I think we'll adjust the heuristic, but it would be great to have more extensive benchmarks to check that we don't introduce big regressions. For the "fix", I just changed the |
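To make the idea of a selection heuristic concrete, here is a toy sketch of a backtracking search where the only knob is which undecided package gets decided next: the one with the fewest candidates or the one with the most. This is not the solver's actual implementation; `pick_next`, `search`, `compatible`, and the tiny candidate table are all hypothetical names and data invented for illustration. Which ordering rejects fewer candidates depends entirely on the problem, so the counts from this toy say nothing about the real cases in this issue.

```python
from typing import Callable, Dict, List, Optional

Candidates = Dict[str, List[str]]
Assignment = Dict[str, str]


def pick_next(undecided: Candidates, many_first: bool) -> str:
    """Pick the next package to decide, by number of remaining candidates."""
    def size(name: str) -> int:
        return len(undecided[name])

    return max(undecided, key=size) if many_first else min(undecided, key=size)


def search(
    candidates: Candidates,
    compatible: Callable[[Assignment], bool],
    many_first: bool,
    chosen: Optional[Assignment] = None,
    stats: Optional[Dict[str, int]] = None,
) -> Optional[Assignment]:
    """Plain backtracking search; `stats["rejected"]` counts discarded candidates."""
    chosen = dict(chosen or {})
    stats = stats if stats is not None else {"rejected": 0}
    undecided = {n: c for n, c in candidates.items() if n not in chosen}
    if not undecided:
        return chosen
    name = pick_next(undecided, many_first)
    for candidate in undecided[name]:
        trial = {**chosen, name: candidate}
        if compatible(trial):
            result = search(candidates, compatible, many_first, trial, stats)
            if result is not None:
                return result
        stats["rejected"] += 1  # candidate conflicted, or its subtree had no solution
    return None


# Hypothetical toy problem: each candidate is labelled only by the Python build
# tag it needs, and a selection is valid when every chosen tag agrees.
CANDIDATES: Candidates = {
    "python": ["py312", "py311", "py310", "py39"],
    "lib": ["py312", "py39"],
    "app": ["py39"],
}


def compatible(chosen: Assignment) -> bool:
    return len(set(chosen.values())) <= 1


stats_few = {"rejected": 0}
stats_many = {"rejected": 0}
solution_few = search(CANDIDATES, compatible, many_first=False, stats=stats_few)
solution_many = search(CANDIDATES, compatible, many_first=True, stats=stats_many)
print("few options first :", solution_few, stats_few)
print("many options first:", solution_many, stats_many)
```

Both orderings reach the same solution here; only the amount of rejected work differs, which is the quantity a broader benchmark would need to track before the heuristic is changed.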
Great news!
I don't think what I have can be called a benchmark... I basically take repodata and try to resolve every single "latest" record one by one (including all its variants). Something like:

    import asyncio
    import collections

    import rattler

    # Stand-in exposing only a `microseconds` attribute for the `timeout` argument.
    timedelta = collections.namedtuple("timedelta", ["microseconds"])

    channels = [rattler.Channel("main")]
    platforms = [rattler.Platform("osx-arm64"), rattler.Platform("noarch")]
    virtual_packages = [p.into_generic() for p in rattler.VirtualPackage.current()]

    # Top-level `await`: run this in an asyncio-aware REPL or a notebook.
    repo_datas = await rattler.fetch_repo_data(
        channels=channels,
        platforms=platforms,
        cache_path="/tmp/py-rattler-cache/repodata",
        callback=None,
    )

    for subdir in repo_datas:
        for package_name in subdir.package_names():
            all_records = sorted(
                subdir.load_records(rattler.PackageName(package_name)),
                key=lambda x: (x.version, x.build_number),
            )
            # Keep every variant of the latest version and build number.
            latest = all_records[-1]
            records = [
                r
                for r in all_records
                if r.version == latest.version and r.build_number == latest.build_number
            ]
            # Queue one solve per variant, then await them together.
            futures = []
            for record in records:
                futures.append(
                    rattler.solve(
                        channels,
                        [rattler.MatchSpec(str(record))],
                        platforms=platforms,
                        virtual_packages=virtual_packages,
                        timeout=timedelta(microseconds=30 * 1000000),
                    )
                )
            await asyncio.gather(*futures)

(I did not test this specific code)

I'm doing this to see if a channel is solvable and also to do some channel analysis, and thought it would be simpler and faster to use rattler instead of either spawning subprocesses that call conda create or using libmamba. |
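One way to turn a loop like the one above into a rough benchmark is to time each solve on its own and record which ones hit a time limit. The sketch below is a minimal, library-agnostic harness: `time_solve`, `benchmark`, and `fake_solve` are hypothetical names, and `fake_solve` is only a stand-in for a real solve call. It runs the jobs sequentially so that one slow solve does not distort the timing of another. It assumes that cancelling the awaited call via `asyncio.wait_for` is acceptable; whether a real solver actually stops working when cancelled is not something this sketch verifies.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional

SolveFactory = Callable[[], Awaitable[object]]


async def time_solve(make_solve: SolveFactory, timeout_s: float) -> Optional[float]:
    """Await one solve; return elapsed seconds, or None if it timed out."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(make_solve(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None
    return time.perf_counter() - start


async def benchmark(
    jobs: Dict[str, SolveFactory], timeout_s: float = 30.0
) -> Dict[str, Optional[float]]:
    """Time every job one after another, keyed by its label (e.g. a spec string)."""
    results: Dict[str, Optional[float]] = {}
    for label, make_solve in jobs.items():
        results[label] = await time_solve(make_solve, timeout_s)
    return results


async def fake_solve(delay_s: float) -> str:
    """Hypothetical stand-in for a real solve call; it only sleeps."""
    await asyncio.sleep(delay_s)
    return "solved"


if __name__ == "__main__":
    demo_jobs = {
        "fast-spec": lambda: fake_solve(0.01),
        "slow-spec": lambda: fake_solve(1.0),
    }
    for label, seconds in asyncio.run(benchmark(demo_jobs, timeout_s=0.2)).items():
        print(label, "timed out" if seconds is None else f"{seconds:.3f}s")
```

Keeping the per-spec timings in a dictionary makes it easy to compare two solver builds by diffing the results for the same set of specs.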
Hello, while trying rattler (via py-rattler), I noticed that some packages take ages to resolve. For example:
mlflow=2.12.2=py39hb3b8efb_0
takes a long time to resolve with rattler, but is fast to resolve with conda-libmamba-solver. The same is true of all these packages (on osx-arm64):
orange3=3.36.2=py39h46d7db6_0
ray-dashboard=2.6.3=py39hca03da5_2
ray-default=2.6.3=py39hca03da5_2
spark-nlp=5.1.2=py39hca03da5_0
spyder=5.5.1=py38hca03da5_0
spyder=5.5.1=py39hca03da5_0
spyder=5.5.1=py310hca03da5_0
streamlit-faker=0.0.2=py39hca03da5_0