Dependency Graph #2
Comments
@Fazel94 Thank you for offering your help. Could you please tell me which blog article you are referring to and which data I should upload?
Sorry, I would be glad to mine the PyPI data, but I would appreciate it if I could avoid scraping PyPI myself. Thank you for your attention.
There is no such thing as a dependency list of each package in the PyPI metadata. You could only download all the packages (completely), look for a […]
I can upload the data. However, it is quite a bit. I'm currently running the script again. The scripts beginning with "c" are currently running and even a 7z-compressed csv version of the […]
Would that still be of use for you? If you really want to build the dependency graph, you have to download quite a massive amount of data. Estimating with the query SELECT sum(size)/1000000000 FROM `urls` it is currently about 3.3GB. I can give you a better approximation tomorrow. Where should I upload it?
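As a rough illustration of the size estimate above, the same query can be run from Python. This is only a sketch: it uses an in-memory SQLite database instead of the MySQL database mentioned in the thread, and the `filename` column, the helper name `total_size_gb`, and the two sample rows are made up for the example. Only the table name `urls` and the column `size` come from the query quoted above.

```python
import sqlite3


def total_size_gb(conn):
    """Sum the `size` column of `urls` and return it in GB (10**9 bytes)."""
    row = conn.execute("SELECT SUM(size) / 1000000000.0 FROM urls").fetchone()
    # SUM() yields NULL on an empty table, so fall back to 0.0.
    return row[0] or 0.0


# Hypothetical stand-in for the real database: two invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (filename TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [("a-1.0.tar.gz", 1_200_000_000), ("b-2.0.zip", 800_000_000)],
)
print(total_size_gb(conn))
```

Dividing by `1000000000.0` rather than an integer keeps the fractional part in SQLite, which would otherwise perform integer division.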
Currently it is at […] I've added a script to check for imports in a package. TODOs are:
Done:
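For readers wondering what "checking for imports in a package" can look like, here is a minimal sketch using Python's standard `ast` module. It is not the script from this repository; the function name `imported_modules` and the sample source string are assumptions made for illustration.

```python
import ast


def imported_modules(source):
    """Return the set of top-level module names imported by Python source text."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" counts as a dependency on "os".
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0): they point inside the package itself.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


example = "import os.path\nfrom collections import Counter\nfrom . import sibling\n"
print(sorted(imported_modules(example)))
```

Parsing instead of executing the files avoids running untrusted package code. One caveat: `ast.parse` raises `SyntaxError` on source that the running interpreter cannot parse (for example Python 2 only syntax), so a real crawler would need to catch that.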
Ok, I've just put some more work into it:
If you really want to make the dependency graph, you still have to:
This will fill your database with all possible dependencies. Even if you don't implement […]
@Fazel94 I've just set up the script to run over the complete PyPI database. That will take quite a while. It currently ignores setuptools, which is a major issue (a secure and fast implementation was too complicated to write within just a couple of hours; you could add that if you want). How would you like to visualize the graph? It has 67582 nodes and a lot more than 4600 edges (I'm still downloading and building the graph, which takes a while). You cannot use graphviz for that. (By the way, do we know each other? Are you a student from KIT, too?) By now, the most imported module is […]
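Finding the most imported module amounts to counting incoming edges in the dependency graph. The sketch below assumes the graph is available as a list of `(importing_package, imported_module)` pairs; that representation, the function name `most_imported`, and the sample edges are hypothetical and not taken from the repository.

```python
from collections import Counter


def most_imported(edges, n=3):
    """Count how many distinct packages import each module; return the top n."""
    # set() drops duplicate edges so each importing package counts once.
    counts = Counter(target for _, target in set(edges))
    return counts.most_common(n)


# Invented sample edges: (importing package, imported module).
edges = [
    ("pkg_a", "os"),
    ("pkg_b", "os"),
    ("pkg_b", "sys"),
    ("pkg_c", "os"),
    ("pkg_a", "os"),  # duplicate, ignored
]
print(most_imported(edges, n=1))
```

With tens of thousands of nodes this stays cheap, since it is a single pass over the edge list.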
If you upload all the metadata, or just the dependencies, in some easy-to-use format such as XML, JSON, or even a full MySQL database dump, I can implement a dependency graph and thus answer the questions from your blog post.
I can implement an adaptation of PageRank or a similar algorithm to find the impact factor of packages.
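To make the PageRank idea concrete, here is a minimal power-iteration sketch in plain Python. It assumes edges point from the importing package to the imported one, so rank flows toward widely used dependencies. The damping factor 0.85, the iteration count, the function name `pagerank`, and the sample graph are illustrative assumptions, not something settled in this thread.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over (source, target) edge pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: set() for n in nodes}
    for src, dst in edges:
        out_links[src].add(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Invented sample: two packages import "lib", which imports "core".
sample_edges = [("app", "lib"), ("tool", "lib"), ("lib", "core")]
ranks = pagerank(sample_edges)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Unlike a plain import count, this also rewards packages that are imported by other heavily imported packages, which is closer to an "impact factor".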