Allow analyzing manifests and particular versions (#71)
* wip

* rm tar

* clean up

* more formatting

* add test

* fix

* make code more robust

* messy messy

* wip

* wip

* wip

* wip

* wip

* fix identation

* update show and docs

* fix test; wrap in outer testset to run all in CI

* add auth

* add compat

* wip

* add fast path to avoid downloads, analyze_manifest

* wip

* re-organize

* wip

* wip

* wip

* wip

* wip

* don't run CI on draft

* wip

* wip

* wip

* JET pass to fix bugs and throw better errors

* get tests passing

* rm dead code

* add `analyze()`

* fallback in git api errors

* git stuff

* add comment

* more comment

* add some more tests

* test `find_packages_in_manifest`

* wip

* add multiregistry test

* thread safety

* update docs

* fix

* get ci running post-draft

* compat

* deep clone for tests

* make test failure more clear

* debug

* oops, debug continued

* urg

* more detail

* oh, `==` isn't nice with PkgEntries. Skip it.

* simplify `Release` struct

* make tests nicer

* tweaks

* Update test/runtests.jl

Co-authored-by: Mosè Giordano <[email protected]>

* Update test/runtests.jl

Co-authored-by: Mosè Giordano <[email protected]>
ericphanson and giordano authored Nov 24, 2022
1 parent 1f500e0 commit f71ba76
Showing 13 changed files with 1,362 additions and 672 deletions.
7 changes: 6 additions & 1 deletion .github/workflows/ci.yml
@@ -5,10 +5,13 @@ on:
 branches: "main"
 tags: ["*"]
 pull_request:
+types: [opened, synchronize, reopened, ready_for_review]
 release:

 jobs:
 test:
+# Run on push's or non-draft PRs
+if: (github.event_name == 'push') || (github.event.pull_request.draft == false) || (github.event_name == 'workflow_dispatch')
 name: Julia ${{ matrix.julia-version }} - ${{ matrix.os }} - ${{ matrix.julia-arch }}
 runs-on: ${{ matrix.os }}
 strategy:
@@ -24,7 +27,9 @@ jobs:
 julia-arch:
 - x64
 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v3
+with:
+fetch-depth: 0
 - uses: julia-actions/setup-julia@v1
 with:
 version: ${{ matrix.julia-version }}
4 changes: 3 additions & 1 deletion .github/workflows/docs.yml
@@ -9,7 +9,7 @@ on:
 - 'docs/**'
 - 'Project.toml'
 pull_request:
-types: [opened, synchronize, reopened]
+types: [opened, synchronize, reopened, ready_for_review]
 paths:
 - '.github/workflows/docs.yml'
 - 'src/**'
@@ -18,6 +18,8 @@ on:

 jobs:
 Documentation:
+# Run on push's or non-draft PRs
+if: (github.event_name == 'push') || (github.event.pull_request.draft == false) || (github.event_name == 'workflow_dispatch')
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v2
4 changes: 4 additions & 0 deletions Project.toml
@@ -4,6 +4,8 @@ authors = ["Mosè Giordano <[email protected]>"]
 version = "1.0.0"

 [deps]
+CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
+Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
 Git = "d7ba0133-e1db-5d97-8f8c-041e4b3a1eb2"
 GitHub = "bc5e4493-9b4d-5f90-b8aa-2b2bcaad7a26"
 JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
@@ -12,10 +14,12 @@ Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 RegistryInstances = "2792f1a3-b283-48e8-9a74-f99dce5104f3"
 TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
+Tar = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"
 Tokei_jll = "3ac119c9-1236-5556-b556-adc8150b0244"
 UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

 [compat]
+CodecZlib = "0.7"
 Git = "1.2.1"
 GitHub = "5.4"
 JSON3 = "1.5.1"
96 changes: 51 additions & 45 deletions docs/src/index.md
@@ -9,50 +9,47 @@ julia> analyze("Flux")
 Package Flux:
 * repo: https://github.com/FluxML/Flux.jl.git
 * uuid: 587475ba-b771-5e3f-ad9e-33799f191a9c
+* version: 0.13.6
 * is reachable: true
-* Julia code in `src`: 5496 lines
-* Julia code in `test`: 2432 lines (30.7% of `test` + `src`)
-* documentation in `docs`: 1533 lines (21.8% of `docs` + `src`)
-* documentation in README: 10 lines
+* tree hash: 76ca02c7c0cb7b8337f7d2d0eadb46ed03c1e843
+* Julia code in `src`: 5299 lines
+* Julia code in `test`: 3030 lines (36.4% of `test` + `src`)
+* documentation in `docs`: 1856 lines (25.9% of `docs` + `src`)
+* documentation in README: 14 lines
 * has license(s) in file: MIT
 * filename: LICENSE.md
 * OSI approved: true
-* number of contributors: 159 (and 7 anonymous contributors)
-* number of commits: 3794
 * has `docs/make.jl`: true
 * has `test/runtests.jl`: true
 * has continuous integration: true
 * GitHub Actions
 * Buildkite
 ```

-The argument is a string pointing towards a local path or the name of
-a package in a locally-installed registry (the General registry is checked by default).
+The argument is a string, which can be the name of a package, a local path or a URL.

-*NOTE*: the Git repository of the package will be cloned, in order to inspect
+*NOTE*: the Git repository of the package may be cloned, in order to inspect
 its content.

-You can also pass a [`PkgEntry`](@ref) from RegistryInstances.jl.
-The function [`find_package`](@ref) gives you the
-[`PkgEntry`](@ref) of a package in your local copy of any registry, by
-default the [General registry](https://github.com/JuliaRegistries/General).
-`find_package` is invoked automatically when you pass the name of a package.
+You can also pass the output of [`find_package`](@ref) which is used under-the-hood
+to look up package names in any installed registries. `find_package` also allows one to specify
+a package by `UUID`.

 ```julia
-julia> analyze(find_package("JuMP"))
+julia> analyze(find_package("JuMP"; version=v"1"))
 Package JuMP:
 * repo: https://github.com/jump-dev/JuMP.jl.git
 * uuid: 4076af6c-e467-56ae-b986-b466b2749572
+* version: 1.0.0
 * is reachable: true
-* Julia code in `src`: 16418 lines
-* Julia code in `test`: 11388 lines (41.0% of `test` + `src`)
-* documentation in `docs`: 10970 lines (40.1% of `docs` + `src`)
-* documentation in README: 78 lines
+* tree hash: 936e7ebf6c84f0c0202b83bb22461f4ebc5c9969
+* Julia code in `src`: 16906 lines
+* Julia code in `test`: 12777 lines (43.0% of `test` + `src`)
+* documentation in `docs`: 15978 lines (48.6% of `docs` + `src`)
+* documentation in README: 79 lines
 * has license(s) in file: MPL-2.0
 * filename: LICENSE.md
 * OSI approved: true
-* number of contributors: 106 (and 4 anonymous contributors)
-* number of commits: 4128
 * has `docs/make.jl`: true
 * has `test/runtests.jl`: true
 * has continuous integration: true
@@ -68,11 +65,13 @@ julia> analyze(PackageAnalyzer)
 Package PackageAnalyzer:
 * repo:
 * uuid: e713c705-17e4-4cec-abe0-95bf5bf3e10c
+* version: nothing
 * is reachable: true
-* Julia code in `src`: 574 lines
-* Julia code in `test`: 142 lines (19.8% of `test` + `src`)
-* documentation in `docs`: 267 lines (31.7% of `docs` + `src`)
-* documentation in README: 41 lines
+* tree hash: 7bfd2ab7049d92809eb18eed1b0548c7e07ec150
+* Julia code in `src`: 912 lines
+* Julia code in `test`: 276 lines (23.2% of `test` + `src`)
+* documentation in `docs`: 263 lines (22.4% of `docs` + `src`)
+* documentation in README: 44 lines
 * has license(s) in file: MIT
 * filename: LICENSE
 * OSI approved: true
@@ -82,9 +81,6 @@ Package PackageAnalyzer:
 * GitHub Actions
 ```

-You use the inplace version [`analyze!`](@ref), e.g. as `analyze!(root, find_package("Flux"))` to clone
-the package to a particular directory `root` which is not cleaned up afterwards, and likewise can pass a vector of paths instead of a single path employ use a threaded loop to analyze each package.
-
 You can also directly analyze the source code of a package via [`analyze`](@ref)
 by passing in the path to it, for example with the `pkgdir` function:

@@ -95,10 +91,12 @@ julia> analyze(pkgdir(DataFrames))
 Package DataFrames:
 * repo:
 * uuid: a93c6f00-e57d-5684-b7b6-d8193f3e46c0
+* version: 0.0.0
 * is reachable: true
-* Julia code in `src`: 15809 lines
-* Julia code in `test`: 17512 lines (52.6% of `test` + `src`)
-* documentation in `docs`: 3885 lines (19.7% of `docs` + `src`)
+* tree hash: db2a9cb664fcea7836da4b414c3278d71dd602d2
+* Julia code in `src`: 15628 lines
+* Julia code in `test`: 21089 lines (57.4% of `test` + `src`)
+* documentation in `docs`: 6270 lines (28.6% of `docs` + `src`)
 * documentation in README: 21 lines
 * has license(s) in file: MIT
 * filename: LICENSE.md
@@ -109,6 +107,8 @@ Package DataFrames:
 * GitHub Actions
 ```

+You can pass the keyword argument `root` to specify a directory to store downloaded code.
+
 ## The `Package` struct

 The returned values from [`analyze`](@ref), and [`analyze!`](@ref) are objects of the type `Package`, which has the following fields:
@@ -135,39 +135,45 @@ struct Package
licenses_in_project::Vector{String} # any licenses in the `license` key of the Project.toml
lines_of_code::Vector{@NamedTuple{directory::String, language::Symbol, sublanguage::Union{Nothing, Symbol}, files::Int, code::Int, comments::Int, blanks::Int}} # table of lines of code
contributors::Vector{@NamedTuple{login::Union{String,Missing}, id::Union{Int,Missing}, name::Union{String,Missing}, type::String, contributions::Int}} # table of contributor data
tree_hash::String # `git_tree_sha1` hash of the analyzed code
version::Union{VersionNumber, Nothing} # the version number, if a release was analyzed
tree_hash::String # the tree hash of the code that was analyzed
end
```

Adding additional fields to `Package` is *not* considered breaking, and may occur in feature releases of PackageAnalyzer.jl.

Removing or altering the meaning of existing fields *is* considered breaking and will only occur in major releases of PackageAnalyzer.jl


## Analyzing multiple packages

To run the analysis for multiple packages you can either use broadcasting
```julia
analyze.(registry_entries)
analyze.(pkg_entries)
```
or use the method `analyze(pkg_entries::AbstractVector{<:PkgEntry})` which
runs the analysis with multiple threads.
or use the function `analyze_packages(pkg_entries)` which
runs the analysis with multiple threads. Here, `pkg_entries` may be any valid input to `analyze`.

You can use the function [`find_packages`](@ref) to find all packages in a given
registry:

```julia
julia> find_packages(; registry=general_registry())
4632-element Vector{PackageAnalyzer.RegistryEntry}:
PackageAnalyzer.RegistryEntry("/Users/eph/.julia/registries/General/C/CitableImage")
PackageAnalyzer.RegistryEntry("/Users/eph/.julia/registries/General/T/Trixi2Img")
PackageAnalyzer.RegistryEntry("/Users/eph/.julia/registries/General/I/ImPlot")
PackageAnalyzer.RegistryEntry("/Users/eph/.julia/registries/General/S/StableDQMC")
PackageAnalyzer.RegistryEntry("/Users/eph/.julia/registries/General/S/Strapping")
[...]
julia> result = find_packages(; registry=general_registry());

julia> summary(result)
"7213-element Vector{PkgSource}"
```
Do not abuse this function! Consider using the in-place function `analyze!(root, registry_entries)` to avoid re-cloning packages if you might run the analysis more than once.

Do not abuse this function!

!!! warning
Cloning all the repos in General will take more than 20 GB of disk space and can take up to a few hours to complete.

You can use RegistryInstance's `reachable_registries()` function to find other `RegistryInstance` objects to use for the `registry` keyword argument.

You can also use `find_packages_in_manifest` to use a Manifest.toml to lookup
packages and their versions. Besides handling release dependencies, this should also correctly handle
dev'd dependencies, and non-released `Pkg.add`'d dependencies. The helper `analyze_manifest` is provided
as a convenience to composing `find_packages_in_manifest` and `analyze_packages`.

## License information

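The percentages in the example output of this diff appear to be a directory's line count divided by that count plus the `src` count. The updated Flux figures (3030 `test` lines against 5299 `src` lines, printed as 36.4%) are consistent with that reading. The Julia sketch below checks the arithmetic against a table shaped like the `lines_of_code` field of `Package`. The names `LocRow`, `julia_lines` and `share_of` are hypothetical, and filtering on `language == :Julia` is an assumption about how rows are tagged, not something this commit shows.

```julia
using Printf

# Row type mirroring the `lines_of_code` field documented in the `Package` struct.
const LocRow = @NamedTuple{directory::String, language::Symbol,
                           sublanguage::Union{Nothing, Symbol},
                           files::Int, code::Int, comments::Int, blanks::Int}

# Sum the `code` column over rows for one directory.
# Assumption: Julia source rows carry `language == :Julia`.
function julia_lines(rows::Vector{LocRow}, dir::AbstractString)
    total = 0
    for row in rows
        if row.directory == dir && row.language === :Julia
            total += row.code
        end
    end
    return total
end

# Share of `part` in `part + src`, formatted with one decimal place.
function share_of(part::Integer, src::Integer)
    denom = part + src
    denom == 0 && return "0.0%"
    return @sprintf("%.1f%%", 100 * part / denom)
end

# Hypothetical rows that reuse the Flux line counts from the updated docs.
rows = LocRow[
    (directory="src", language=:Julia, sublanguage=nothing,
     files=1, code=5299, comments=0, blanks=0),
    (directory="test", language=:Julia, sublanguage=nothing,
     files=1, code=3030, comments=0, blanks=0),
]

src_lines = julia_lines(rows, "src")
test_lines = julia_lines(rows, "test")
println("test share: ", share_of(test_lines, src_lines))  # prints "test share: 36.4%"
```

The same formula reproduces the `docs` figure for Flux: `share_of(1856, 5299)` gives `"25.9%"`.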
2 changes: 1 addition & 1 deletion docs/src/serialization.md
@@ -5,7 +5,7 @@ to save the results as a table. For example,

 ```@repl 1
 using DataFrames, Arrow, PackageAnalyzer
-results = analyze(find_packages("DataFrames", "Flux"));
+results = analyze_packages(find_packages("DataFrames", "Flux"));
 Arrow.write("packages.arrow", results)
 roundtripped_results = DataFrame(Arrow.Table("packages.arrow"))
 rm("packages.arrow") # hide
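The docs changed in this commit say that `analyze_packages` runs the analysis with multiple threads. As a rough illustration of that pattern, and not PackageAnalyzer's actual implementation, the sketch below spawns one task per input and collects the results in input order. `threaded_map` and `fake_analyze` are hypothetical names; `fake_analyze` is a stand-in so the example runs without network access or the package installed.

```julia
# Run `f` on every input in its own task and return results in input order.
# `threaded_map` is a hypothetical helper, not part of PackageAnalyzer.
function threaded_map(f, inputs::AbstractVector)
    tasks = map(x -> Threads.@spawn(f(x)), inputs)
    return map(fetch, tasks)  # `fetch` waits for each task and returns its value
end

# Stand-in for `analyze`: returns the length of the package name.
fake_analyze(name::AbstractString) = length(name)

results = threaded_map(fake_analyze, ["DataFrames", "Flux"])
println(results)  # prints [10, 4]
```

Tasks only run in parallel when Julia is started with more than one thread (for example `julia -t auto`); with a single thread the code still returns the same results.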

2 comments on commit f71ba76

@ericphanson

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/72818

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v1.0.0 -m "<description of version>" f71ba76a6a076bfa4538eb2a8ed87b0bbea239aa
git push origin v1.0.0
