Fetch data for all PyPI packages #41

Merged 3 commits on Nov 30, 2024
Changes from 2 commits
4 changes: 4 additions & 0 deletions .gitignore
@@ -99,3 +99,7 @@ ENV/

# mypy
.mypy_cache/

# Big unzipped files
top-pypi-packages-30-days-all.csv
top-pypi-packages-30-days-all.json
18 changes: 16 additions & 2 deletions .pre-commit-config.yaml
@@ -1,25 +1,39 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.8.0
hooks:
- id: ruff
args: [--exit-non-zero-on-fix]

- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.10.0
hooks:
- id: black

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-added-large-files
exclude: top-pypi-packages-30-days-all.*
- id: check-case-conflict
- id: check-merge-conflict
- id: check-json
- id: check-toml
- id: check-yaml
- id: debug-statements
- id: end-of-file-fixer
- id: forbid-submodules
- id: trailing-whitespace

- repo: https://github.com/python-jsonschema/check-jsonschema
-    rev: 0.29.3
+    rev: 0.29.4
hooks:
- id: check-github-workflows


- repo: meta
hooks:
- id: check-hooks-apply
- id: check-useless-excludes

ci:
autoupdate_schedule: quarterly
29 changes: 29 additions & 0 deletions .ruff.toml
@@ -0,0 +1,29 @@
fix = true

lint.select = [
"C4", # flake8-comprehensions
"E", # pycodestyle
"EM", # flake8-errmsg
"F", # pyflakes
"I", # isort
"ICN", # flake8-import-conventions
"ISC", # flake8-implicit-str-concat
"LOG", # flake8-logging
"PGH", # pygrep-hooks
"PT", # flake8-pytest-style
"PYI", # flake8-pyi
"RUF022", # unsorted-dunder-all
"RUF100", # unused noqa (yesqa)
"S", # flake8-bandit
"UP", # pyupgrade
"W", # pycodestyle
"YTT", # flake8-2020
]
lint.ignore = [
"E203", # Whitespace before ':'
"E221", # Multiple spaces before operator
"E226", # Missing whitespace around arithmetic operator
"E241", # Multiple spaces after ','
"UP038", # Makes code slower and more verbose
]
lint.isort.required-imports = [ "from __future__ import annotations" ]
4 changes: 2 additions & 2 deletions README.md
@@ -18,12 +18,12 @@ Old versions can be found in [releases](https://github.com/hugovk/top-pypi-packa

From cron, it runs pypinfo to dump JSON and commit back to this repo.

-### Install jq
+### Install jq and zip

For example on Ubuntu 22.04:

```bash
-sudo apt-get install jq
+sudo apt-get install jq zip
```

### Install and set up pypinfo
6 changes: 5 additions & 1 deletion build.sh
@@ -4,14 +4,18 @@
set -e

# Timestamp for logs
-echo "$(date)"
+date

# Update
git pull origin main

# Generate the files
bash generate.sh

+# Remove big unzipped files
+rm top-pypi-packages-30-days-all.csv
+rm top-pypi-packages-30-days-all.json
+
# Make output directory, don't fail if it exists
# mkdir -p build

4 changes: 2 additions & 2 deletions deploy.sh
@@ -7,7 +7,7 @@
set -e

# Gets commit hash as message
-REV=`git rev-parse HEAD`
+REV=$(git rev-parse HEAD)

# git checkout gh-pages # Step 3

@@ -27,6 +27,6 @@ git push # Step 9

# CalVer YYYY.0M
date=$(date '+%Y.%m')
-echo $date
+echo "$date"
git tag -a "$date" -m "Release $date"
git push --tags
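
The tag follows a CalVer `YYYY.0M` scheme: a four-digit year and a zero-padded month, which is what `date '+%Y.%m'` prints. For illustration only, the same string can be computed in Python. The helper name `calver_tag` is hypothetical and not part of this repository.

```python
from __future__ import annotations

import datetime as dt


def calver_tag(today: dt.date | None = None) -> str:
    """Format a date as YYYY.0M, mirroring `date '+%Y.%m'` in deploy.sh."""
    today = today or dt.date.today()
    # %Y is the four-digit year, %m the zero-padded month
    return today.strftime("%Y.%m")


print(calver_tag(dt.date(2024, 11, 30)))  # 2024.11
```

Because the tag has only month resolution, a second deploy in the same month would produce the same tag name.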
16 changes: 14 additions & 2 deletions generate.sh
@@ -15,8 +15,20 @@ python3 -m pip install -U pypinfo
python3 -m pip --version
/home/botuser/.local/bin/pypinfo --version

+# Check if zip is installed
+if ! command -v zip &> /dev/null
+then
+    echo "zip could not be found, consider: apt install zip"
+    exit 1
+fi
+
# Generate and minify for 30 days
-/home/botuser/.local/bin/pypinfo --all --json --indent 0 --limit 8000 --days 30 "" project > top-pypi-packages-30-days.json
+/home/botuser/.local/bin/pypinfo --all --json --indent 0 --limit 10000000 --days 30 "" project > top-pypi-packages-30-days-all.json
+python3 trim.py > top-pypi-packages-30-days.json
jq -c . < top-pypi-packages-30-days.json > top-pypi-packages-30-days.min.json
+echo 'download_count,project' > top-pypi-packages-30-days-all.csv
echo 'download_count,project' > top-pypi-packages-30-days.csv
-jq -r '.rows[] | [.download_count, .project] | @csv' top-pypi-packages-30-days.json >> top-pypi-packages-30-days.csv
+jq -r '.rows[] | [.download_count, .project] | @csv' top-pypi-packages-30-days-all.json >> top-pypi-packages-30-days-all.csv
+jq -r '.rows[] | [.download_count, .project] | @csv' top-pypi-packages-30-days.json >> top-pypi-packages-30-days.csv
+zip top-pypi-packages-30-days-all.csv.zip top-pypi-packages-30-days-all.csv
+zip top-pypi-packages-30-days-all.json.zip top-pypi-packages-30-days-all.json
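
`generate.sh` calls `python3 trim.py`, but that script is not shown in this diff. The following is only a minimal sketch of what such a trimming step could look like, not the repository's actual code. It assumes the pypinfo JSON has a top-level `rows` list (as the `jq` filters imply), that the rows are already sorted by download count in descending order, and that the trimmed file keeps the earlier 8,000-row size. The names `trim_rows`, `main` and `LIMIT` are hypothetical.

```python
from __future__ import annotations

import json
import sys

# Assumption: keep as many rows as the previous `--limit 8000` run produced.
LIMIT = 8000


def trim_rows(data: dict, limit: int = LIMIT) -> dict:
    """Return a copy of the pypinfo output with at most `limit` rows."""
    trimmed = dict(data)  # shallow copy, so the caller's dict is not modified
    trimmed["rows"] = data["rows"][:limit]
    return trimmed


def main(path: str = "top-pypi-packages-30-days-all.json") -> None:
    """Read the full dump and write the trimmed JSON to stdout."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # generate.sh redirects stdout into top-pypi-packages-30-days.json
    json.dump(trim_rows(data), sys.stdout, indent=0)
    sys.stdout.write("\n")
```

In this sketch, a script would call `main()` at the end so that `python3 trim.py > top-pypi-packages-30-days.json` receives the trimmed JSON on stdout. Any other top-level keys in the dump are copied through unchanged.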
3 changes: 2 additions & 1 deletion index.html
@@ -129,7 +129,8 @@ <h2 id="changelog">Changelog</h2>
<li>2021-07: Fetch data for 5,000 packages over only 30 days (<a href="https://github.com/hugovk/top-pypi-packages/pull/20">#20</a>)</li>
<li>2021-09: Fetch data for 8,000 packages (<a href="https://github.com/hugovk/top-pypi-packages/pull/30">#30</a>)</li>
<li>2024-05: Provide data in CSV in addition to JSON (<a href="https://github.com/hugovk/top-pypi-packages/issues/31">#31</a>)</li>
-<li>2024-11: Fetch data for all installers, not only pip (<a href="https://github.com/hugovk/top-pypi-packages/issues/39">#39</a>)</li>
+<li>2024-11: Fetch data for all PyPI packages (<a href="https://github.com/hugovk/top-pypi-packages/issues/41">#41</a>)
+and for installers, not only pip (<a href="https://github.com/hugovk/top-pypi-packages/issues/39">#39</a>)</li>
</ul>
</div>
<div class="col-sm-6">
Binary file added top-pypi-packages-30-days-all.csv.zip
Binary file not shown.
Binary file added top-pypi-packages-30-days-all.json.zip
Binary file not shown.
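
Since the full dataset is only committed in zipped form, consumers need to unzip it before use. A minimal sketch of reading the zipped CSV directly in Python follows. It assumes the archive contains a single member named `top-pypi-packages-30-days-all.csv` with the `download_count,project` header, which is what the `zip` and `echo` commands in `generate.sh` suggest. The function name `read_zipped_csv` is hypothetical.

```python
from __future__ import annotations

import csv
import io
import zipfile
from collections.abc import Iterator
from typing import IO

MEMBER = "top-pypi-packages-30-days-all.csv"


def read_zipped_csv(
    zip_path: str | IO[bytes], member: str = MEMBER
) -> Iterator[tuple[str, int]]:
    """Yield (project, download_count) pairs from the zipped CSV."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # Decode the binary member as text; csv handles the quoting
            # that jq's @csv applies to project names.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                yield row["project"], int(row["download_count"])
```

For example, `next(read_zipped_csv("top-pypi-packages-30-days-all.csv.zip"))` would return the first row without extracting the archive to disk.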