Include .conda packages #45
Comments
Looking into this with @cappadona |
@jezdez Did that go anywhere? I was working on collecting some download numbers for my library and right now 2023 shows minimal downloads due to the transition to |
@jakirkham @dopplershift. Apologies for the delay. We have not yet addressed |
Thanks Nick! 🙏 |
Hi @jakirkham @dopplershift. Quick update on the status of this issue. We're working on finalizing a new pipeline that will source this public data set and include |
Hi @cappadona Thanks for the update! Q: Would it be possible to also update the past statistics when the new pipeline is up? |
@leofang At the moment we're not planning to replace any existing files in the bucket and only implement the fix for future data. |
cc @aterrel @chenghlee (as we discussed this earlier) |
Hi @cappadona @jezdez Friendly nudge for updates 🙂 This has impacted several statistics tracking tools and caused confusion. I've heard jabbering about "no one is using conda" as they looked at the download counts from, say, |
Hi @leofang. Thanks for checking in. We are on track to include |
Just wanted to check in, @cappadona how are things looking here? |
Still looks reaaaally flat: https://prefix.dev/channels/conda-forge/packages/aesara (picked a random package) |
To be fair, Nick said end of the month originally. So end of next week. Though it would be good to learn if that is still true or if this is likely to slip |
@cappadona how are things looking? |
@jakirkham Sorry I missed your earlier message. Thanks for checking in. We're looking good and the March 2024 data published to the s3 bucket later this week will include I will post an update to this thread once the March data is available. |
Thanks Nick! 🙏 |
Thanks Nick! 🙏 With
|
Hi @jakirkham. The screenshot is an aggregation of multiple channels, which are usually identified in the final dataset via the |
How are things looking @cappadona ? |
@cappadona are there any updates here? Also as a side note, users are also asking about March data in this issue: #51 |
Hi @jakirkham. Monthly and hourly data for March and April 2024, which includes Thank you all for your patience. |
@cappadona Do you think we could update the old files as well, since .conda files had been hosted for a while? Should we keep this ticket open until we fix that? |
So just to get it right, the format of the parquet files changed? |
Thanks Jannis! 🙏 Please let us know if you need more info from us or need us to test anything 🙂 |
The 1970 issues were actually issues in our code. Sorry about that! |
We just fixed things on our end, but it appears that the pipeline to produce this data is not really working anymore? The latest data is 2024-06... |
Huh, I'd check with @cappadona about it, he was working on an analysis |
Hi all. We've been running some analysis on the dataset in response to everyone's feedback and will share our findings when this is complete. In the interim, responding to some of the recent questions in this thread...
The latest data available in the s3 bucket is for
Thank you. This is one issue that we haven't been able to reproduce.
False alarm -- addressed by Wolf
This is the main focus of our QA effort and we're tentatively planning to replace data beginning in
Temporarily paused publishing new data (see my response above)
We still need to dig into the download counter displayed on |
We've dropped the faulty data from our end. Any chance you are going to backfill data from the past? it looks pretty weird now, because some packages that had releases only have 1 measuring point. |
Lastly, while it appears you fixed the https://prefix.dev/channels/conda-forge/packages/_libgcc_mutex |
Hi Wolf, yes we are planning to backfill past data and we will be sharing details at this week's conda community sync. |
Hi @wolfv I'm unable to reproduce this dropoff for |
OK, then we might have an issue on our end again :) Thanks! |
@cappadona is this working correctly for other channels? Think it would be good to double check these are all handled correctly (others may have suggestions):
|
Also worth noting RAPIDS is switching to publishing |
Any updates on the backfill? I tried to run the by-the-numbers binder again, and dd.read_parquet("s3://anaconda-package-data/conda/hourly/2024/06/2024-06-*.parquet",storage_options={'anon': True}) returns an empty data frame, and so do all months after June (whereas the months up until May 2024 are fine). I've loosened the match to dd.read_parquet("s3://anaconda-package-data/conda/hourly/2024/06/*.parquet",storage_options={'anon': True}) and still nothing. |
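For anyone wanting to repeat the check described above across several months, here is a minimal sketch. It reuses the bucket path and the anonymous-access option quoted in the comment; the helper names (`hourly_glob`, `read_month`, `empty_months`) are hypothetical and not part of any existing tool, and treating a failed read as "no data" is an assumption.

```python
# Bucket prefix taken from the comment above.
BUCKET = "s3://anaconda-package-data/conda"


def hourly_glob(year, month):
    """Build the glob that matches every hourly parquet file for one month."""
    return f"{BUCKET}/hourly/{year:04d}/{month:02d}/*.parquet"


def read_month(year, month):
    """Lazily open one month of hourly data (needs dask and s3fs installed)."""
    import dask.dataframe as dd  # imported here so the helpers above work without dask

    return dd.read_parquet(hourly_glob(year, month), storage_options={"anon": True})


def empty_months(year, months, reader=read_month):
    """Return the months whose data frame has no rows.

    `reader` is injectable (hypothetical design choice) so the logic can be
    exercised without network access.
    """
    empty = []
    for month in months:
        try:
            n_rows = len(reader(year, month))
        except OSError:
            # Assumption: a missing or unreadable month counts as "no data".
            n_rows = 0
        if n_rows == 0:
            empty.append(month)
    return empty
```

Calling `empty_months(2024, range(1, 13))` would then list the months that come back empty, which is a quick way to see where the published data stops.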
Asked about this at the Conda community meeting earlier this week and it sounds like they are working through some issues |
Any updates? |
@jakirkham @h-vetinari @wolfv We are finalizing the work to generate new hourly and monthly data beginning with |
@cappadona could you please let us know what the status is on the updated download statistics? |
Hi @jakirkham. 2024 data is now available in the public bucket through November and includes We are backfilling the remaining prior months ( |
oh my god, finally! The graph looks a little better again, e.g. for bzip2 (https://prefix.dev/channels/conda-forge/packages/bzip2) |
The backfill of |
Thank you very much! 🙏 Not to rain on the parade, but just to double-check: Regarding the graph that wolf showed, the trough between oct '23 and june '24 still looks quite suspicious...? 🤔 |
Yeah we need to drop the faulty data from our database. Will take care of it shortly |
Update for 2020-2024 (minus Dec. '24 data, which gets downloaded but fails processing): Success 🥳 |
Happy New Year everyone! 🥳 Thank you Nick! 🙏 Think the next step will be for all of us to go through this data and make sure things are looking reasonable |
It would be helpful to include both .conda & .tar.bz2 packages. Particularly as more of the former and less of the latter are produced. May also help to track these separately to track the transition to the newer format.
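As a rough illustration of tracking the two formats separately, the sketch below totals downloads by file extension. The `(filename, count)` record layout and the function names are hypothetical and are not the schema of the published dataset; only the two extensions come from this issue.

```python
from collections import Counter


def package_format(filename):
    """Classify a package file name by its archive format."""
    if filename.endswith(".conda"):
        return "conda"
    if filename.endswith(".tar.bz2"):
        return "tar.bz2"
    return "other"


def downloads_by_format(records):
    """Sum download counts per format.

    `records` is an iterable of (filename, count) pairs, a hypothetical
    layout chosen only for this example.
    """
    totals = Counter()
    for filename, count in records:
        totals[package_format(filename)] += count
    return dict(totals)
```

Keeping the totals apart this way would make the shift from .tar.bz2 to .conda visible instead of showing up as a drop in overall downloads.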