From 654a5a0b935154789d7f9bbe88b3d0b1e976a7d0 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Mon, 19 Dec 2016 15:01:58 -0500 Subject: [PATCH 01/15] Proposal for default crate recommendation ranking --- text/0000-crates.io-default-ranking.md | 1440 ++++++++++++++++++++++++ 1 file changed, 1440 insertions(+) create mode 100644 text/0000-crates.io-default-ranking.md diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md new file mode 100644 index 00000000000..13bb2b6566a --- /dev/null +++ b/text/0000-crates.io-default-ranking.md @@ -0,0 +1,1440 @@ +- Feature Name: crates_io_default_ranking +- Start Date: 2016-12-19 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Crates.io has many useful libraries for a variety of purposes, but it's +difficult to find which crates are meant for a particular purpose and then to +decide among the available crates which one is most suitable in a particular +context. [Categorization][cat-pr] and [badges][badge-pr] are coming to +crates.io; categories help with finding a set of crates to consider and badges +help communicate attributes of crates. The question of how to order crates +within a category, or within the list of crates that have a particular keyword, +is still open. This RFC proposes a method of ranking crates combining number of +downloads, version, and other attributes in order to help people decide what +crate to use. + +[cat-pr]: https://github.com/rust-lang/crates.io/pull/473 +[badge-pr]: https://github.com/rust-lang/crates.io/pull/481 + +# Motivation +[motivation]: #motivation + +Finding and evaluating crates can be time consuming. People already familiar +with the Rust ecosystem often know which crates are best for which purposes, but +we want to share that knowledge with everyone.
For example, someone looking for +a crate to help create a parser should be able to navigate to a category +for that purpose and get a list of crates to consider. This list would include +crates such as [nom][] and [peresil][], and the order in which they appear +should be significant and should help make the decision between the crates in +this category easier. + +[nom]: https://crates.io/crates/nom +[peresil]: https://github.com/docopt/docopt.rs + +This helps address the goal of "Rust should provide easy access to high quality +crates" as stated in the [Rust 2017 Roadmap][roadmap]. + +[roadmap]: https://github.com/rust-lang/rfcs/pull/1774 + +# Detailed design +[design]: #detailed-design + +Please see the [Appendix: Comparative Research][comparative-research] section +for ways that other package manager websites have solved this problem, and the +[Appendix: User Research][user-research] section for results of a user research +survey we did on how people evaluate crates by hand today. + +A few assumptions we made: + +- Measures that can be made automatically are preferred over measures that + would need administrators, curators, or the community to spend time on + manually. +- Measures that can be made for any crate regardless of that crate's choice of + version control, repository host, or CI service are preferred over measures + that would only be available or would be more easily available with git, + GitHub, Travis, and Appveyor. Our thinking is that when this additional + information is available, it would be better to display a badge indicating it + since this is valuable information, but it should not influence the ranking + of the crates. +- There are some measures, like "suitability for the current task" or "whether + I like the way the crate is implemented" that crates.io shouldn't even + attempt to assess, since those could potentially differ across situations for + the same person looking for a crate. 
+- We assume we will be able to calculate these in a reasonable amount of time + either on-demand or by a background job initiated on crate publish and saved + in the database as appropriate. We think the measures we have proposed can be + done without impacting the performance of either publishing or browsing + crates noticeably. If this does not turn out to be the case, we will have to + adjust the formula. + +## Factors + +Through [the survey we conducted][user-research], we found that when people +evaluate crates, they are looking primarily for approximate signals of: + +- Ease of use +- Maintenance +- Quality + +Feeding those signals are related measures of: + +- Popularity +- Credibility + +We detail how we propose to address each of these in turn, plus a rating of the +five crates from the user research survey as examples. + +We'd like to provide a coarse binning of the scores in each category, to avoid +over-analyzing the difference between, say, 72% and 78% and seeing significance +where there isn't really one. We've considered using letter grades, but those +often have emotional associations (F means you're a failure), when it should be +just an indicator of reality and not a value judgment. So we're also proposing +an option of an emoji scale and are open to other proposals: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

| Percentage | Letter grade | Emoji |
|---|---|---|
| >= 90% | A | ☀️ |
| 80-89% | B | 🌤 |
| 70-79% | C | ⛅️ |
| 60-69% | D | 🌥 |
| 50-59% | E | ☁️ |
| <= 49% | F | 🌧 |

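
The binning above can be sketched in Rust as follows. This is only an illustration, not code from crates.io: the function name `grade` is hypothetical, and rounding to the nearest whole percent before binning is an assumption, since the table lists whole-number ranges only.

```rust
/// Map a percentage score to the (letter grade, emoji) pair from the table.
/// Bonuses can push a score above 100; anything at or above 90 is an A.
/// Assumption: the score is rounded to a whole percent before binning.
fn grade(score_percent: f64) -> (char, &'static str) {
    let pct = score_percent.round() as i64;
    match pct {
        p if p >= 90 => ('A', "☀️"),
        80..=89 => ('B', "🌤"),
        70..=79 => ('C', "⛅️"),
        60..=69 => ('D', "🌥"),
        50..=59 => ('E', "☁️"),
        // Everything at or below 49% lands in the lowest bin.
        _ => ('F', "🌧"),
    }
}
```

Keeping the bins in one function means the letter and emoji scales cannot drift apart if the thresholds are later adjusted.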
+ +### Ease of use + +By far, the most common attribute people said they considered in the survey was +whether a crate had good documentation. Frequently mentioned when discussing +documentation was the desire to quickly find an example of how to use the crate. + +- Percentage of top-level items that have documentation + - We have created a proof-of-concept [cargo doc-coverage][] tool to count the + number of public items and the percentage of those that have/don't have + documentation. The overall documentation coverage didn't match our human + perceptions of well-documentedness from looking at the front page of + documentation, so we decided top-level items are more important than items + in submodules. For example, nom is 48% documented overall, but the + top-level items are extremely well documented, 170/195 or 87%. Our + definition of "top-level" counts the overall crate as an item. We think our + doc coverage POC can be modified to report this number. + - Would need to unpack and run this on each package version in a background + job started by a publish; then save the percentage in crates.io's database. + +- In the crate root documentation, presence of a section headed with the word + "Example" and containing a codeblock + - Existing issue, seen in the survey results is that people look in both the + README of the repo and the front page of the docs for examples. We have an + opportunity to encourage at least one to be present reliably. + - Increases the doc percentage score by 5% + +- Presence of files in `/examples` + - Future improvement: [render and link to examples in documentation][examples] + - Increases the doc percentage score by 5% + +[cargo doc-coverage]: https://crates.io/crates/cargo-doc-coverage +[examples]: https://github.com/rust-lang/cargo/issues/2760 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

| Crate | Doc coverage of top-level items | Example in the crate root docs bonus | `/examples` bonus | Overall Ease of Use score |
|---|---|---|---|---|
| peresil | 10/10, 100% | 5% | 0% | 105%, ☀️ |
| combine | 43/44, 98% | 5% | 0% | 103%, ☀️ |
| nom | 170/195, 87% | 5% | 0% | 92%, ☀️ |
| lalrpop | 4/5, 80% | 0% | 0% | 80%, 🌤 |
| peg | 2/3, 66% | 0% | 0% | 66%, 🌥 |

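
The ease of use calculation described above could look roughly like this. The function and parameter names are hypothetical, and rounding the coverage to the nearest whole percent is an assumption; the RFC does not state a rounding rule.

```rust
/// Ease of use score: top-level doc coverage as a whole percent, plus a
/// 5-point bonus for an "Example" section in the crate root docs and a
/// 5-point bonus for files in `/examples`.
/// Assumption: coverage is rounded to the nearest whole percent.
fn ease_of_use_score(
    documented_items: u32,
    total_items: u32,
    has_root_example: bool,
    has_examples_dir: bool,
) -> u32 {
    // A crate with no top-level items gets zero coverage rather than NaN.
    let coverage = if total_items == 0 {
        0.0
    } else {
        100.0 * f64::from(documented_items) / f64::from(total_items)
    };
    let mut score = coverage.round() as u32;
    if has_root_example {
        score += 5;
    }
    if has_examples_dir {
        score += 5;
    }
    score
}
```

With nom's numbers, `ease_of_use_score(170, 195, true, false)` gives 92, matching the table. Under this rounding, 2/3 comes out as 67 rather than the 66 shown for peg, so the table may truncate in places.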
+ +### Maintenance + +- Last released version date: newer is better. This information is already + available in crates.io's database; could be stored in the database and + updated per-publish. Combined as follows, then reported as a percentage + relative to the most released crate. + - Number of releases in the last year - 10% + - Number of releases in the last 6 mo - 30% + - Number of releases in the last month - 60% + - Yanked versions are not counted. + +- Stable version number + - >= 1.0.0 ranks higher than < 1.0.0 + - >= 1.0.0 increases the maintenance score by 5%. + +- Number of owners: more is better. + - A GitHub group owner would count as 1. + - Future improvement: count # of people in the github group at version + publish time + - >= 3 owners increases the maintenance score by 5%. + + +We don't have the overall most actively released crate to compute a relative +release score, so for this analysis we're using the one out of these five +crates that has the most release activity, peg. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
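
| Crate | Releases in last year | Releases in last 6 mo | Releases in last 1 mo | Release score | Relative Release score | Stable bonus | # owners bonus | Overall Maintenance score |
|---|---|---|---|---|---|---|---|---|
| peg | 13 | 7 | 1 | 4 | 100% | 0% | 0% | 100%, ☀️ |
| nom | 8 | 3 | 2 | 2.9 | 73% | 5% | 0% | 78%, ⛅️ |
| combine | 7 | 4 | 1 | 2.5 | 63% | 5% | 0% | 68%, 🌥 |
| lalrpop | 6 | 3 | 1 | 2.1 | 53% | 0% | 0% | 53%, ☁️ |
| peresil | 1 | 0 | 0 | 0.1 | 3% | 0% | 0% | 3%, 🌧 |
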
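
As a sketch of the maintenance calculation, the 10% / 30% / 60% weights can be applied in integer tenths (1 / 3 / 6) so that ties such as 72.5% round the same way every time. All names here are hypothetical, and rounding half up is an assumption chosen because it reproduces the relative scores in the table.

```rust
/// Weighted release score in tenths: the 10% / 30% / 60% weights become
/// 1 / 3 / 6, so peg's 13, 7, and 1 releases give 40 tenths (4.0).
fn release_score_tenths(last_year: u32, last_6_months: u32, last_month: u32) -> u32 {
    last_year + 3 * last_6_months + 6 * last_month
}

/// `value` as a whole percentage of `max`, rounded half up.
fn relative_percent(value: u32, max: u32) -> u32 {
    if max == 0 {
        return 0;
    }
    (100 * value + max / 2) / max
}

/// Maintenance score: relative release score plus a 5-point bonus for a
/// version >= 1.0.0 and a 5-point bonus for three or more owners.
fn maintenance_score(
    release_tenths: u32,
    max_release_tenths: u32,
    is_stable: bool,
    owners: u32,
) -> u32 {
    let mut score = relative_percent(release_tenths, max_release_tenths);
    if is_stable {
        score += 5;
    }
    if owners >= 3 {
        score += 5;
    }
    score
}
```

For nom, `release_score_tenths(8, 3, 2)` is 29, and `maintenance_score(29, 40, true, 1)` gives 78, matching the table when peg's 40 tenths is used as the maximum.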
+ +### Quality + +Given that so much of "quality" is subjective, we do not have a proposed +quality measure at this time. Involving CI might be useful, but that would +require taking a stand on supported 3rd party CI providers. The same problem +would exist with test coverage percentage. + +Measures we have considered but that we do not have tools to compute at this +time: + +- Number of unit and/or integration tests +- Ratio of test code to implementation code + +If the community feels the effort to create these tools would be worth the +information, we would investigate these further. + +### Popularity + +- Number of downloads weighted by time across all versions. Combined as + follows, then reported as a percentage relative to the most downloaded crate. + Can be calculated as part of the [update-downloads][] background job. + - Number of downloads in the last year - 10% + - Number of downloads in the last 6 mo - 30% + - Number of downloads in the last month - 60% + +[update-downloads]: https://github.com/rust-lang/crates.io/blob/master/src/bin/update-downloads.rs + + +Due to the data that the crates.io API currently exposes, we're approximating +our proposed formula. We're using downloads over all time to approximate +downloads in the last year, and downloads over the last 90 days to approximate +downloads in the last 6 months. + +Since we don't have the overall most downloaded crate to compute a relative +download score, for this analysis we're using the one out of these five +crates that has the highest download score, nom. + +Given the exponential nature of popular crates' downloads, we think percentile +is a more appropriate measure here. We are presenting both relative percentage +and percentile here for your consideration. +

| Crate | Downloads all time (~year) | Downloads in last 90 days (~6 mo) | Downloads in last 1 mo | Downloads score | Relative Downloads score % | Relative Downloads score percentile |
|---|---|---|---|---|---|---|
| nom | 274,715 | 82,975 | 21,335 | 65165 | 100%, ☀️ | 100%, ☀️ |
| peg | 12,735 | 2,190 | 693 | 11301 | 17%, 🌧 | 80%, 🌤 |
| combine | 10,809 | 4,252 | 1,115 | 3026 | 5%, 🌧 | 60%, 🌥 |
| lalrpop | 7,108 | 1,928 | 796 | 1767 | 3%, 🌧 | 40%, 🌧 |
| peresil | 8,960 | 1,859 | 427 | 1710 | 3%, 🌧 | 20%, 🌧 |

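
The two popularity measures above can be compared with a short sketch. The names are hypothetical, the weights are again applied in integer tenths, and the percentile definition (share of crates whose score is at or below this one) is an assumption that happens to reproduce the percentile column for these five crates.

```rust
/// Weighted download score in tenths: 10% / 30% / 60% become 1 / 3 / 6.
fn download_score_tenths(last_year: u64, last_6_months: u64, last_month: u64) -> u64 {
    last_year + 3 * last_6_months + 6 * last_month
}

/// `value` as a whole percentage of `max`, rounded half up.
fn relative_percent(value: u64, max: u64) -> u64 {
    if max == 0 {
        return 0;
    }
    (100 * value + max / 2) / max
}

/// Percentile of `score` within `all_scores`: the share of crates whose
/// score is less than or equal to it, as a whole percent.
fn percentile(score: u64, all_scores: &[u64]) -> u64 {
    if all_scores.is_empty() {
        return 0;
    }
    let at_or_below = all_scores.iter().filter(|&&s| s <= score).count() as u64;
    let n = all_scores.len() as u64;
    (100 * at_or_below + n / 2) / n
}
```

Because download counts are so skewed, the relative percentage squeezes every crate but the leader into the lowest bin, while the percentile spreads the same crates evenly across the scale.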
+ +### Credibility + +We think credibility is an even more subjective measure than quality. We +considered using the number of other crates an author has, but that would skew +heavily towards [retep998][]. Highlighting Rust team members is also a +possibility since people tend to regard them more highly, but there are many +crate authors who are not on any Rust team who are releasing excellent crates. +We have [an idea for a more personal "favorite authors" list][favs] that we +think would help indicate credibility. With this proposed feature, each person +can define credibility for themselves, which makes this measure less gameable +and less of a popularity contest. + +[retep998]: https://crates.io/users/retep998 +[favs]: https://github.com/rust-lang/crates.io/issues/494 + +### Overall + +Since documentation/ease of use was such a highly mentioned factor in people's decisions, we propose that, instead of averaging the three scores, we weight ease of use by 2x and divide by 4 instead of 3. +
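
| Crate | Ease of use | Maintenance | Popularity | Overall |
|---|---|---|---|---|
| nom | 92%, ☀️ | 78%, ⛅️ | 100%, ☀️ | 91%, ☀️ |
| combine | 103%, ☀️ | 68%, 🌥 | 60%, 🌥 | 84%, 🌤 |
| peg | 66%, 🌥 | 100%, ☀️ | 80%, 🌤 | 78%, ⛅️ |
| lalrpop | 80%, 🌤 | 53%, ☁️ | 40%, 🌧 | 63%, 🌥 |
| peresil | 105%, ☀️ | 3%, 🌧 | 20%, 🌧 | 58%, ☁️ |
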
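
The overall weighting can be written as a one-line function. The name `overall_score` is hypothetical, and rounding half up is an assumption that matches the overall column, which uses the percentile figure for popularity.

```rust
/// Overall score: ease of use counts twice, so the weighted sum is divided
/// by 4 rather than 3. Inputs and output are whole percentages; the `+ 2`
/// rounds the division by 4 half up.
fn overall_score(ease_of_use: u32, maintenance: u32, popularity: u32) -> u32 {
    (2 * ease_of_use + maintenance + popularity + 2) / 4
}
```

For nom this is (2 × 92 + 78 + 100) / 4 = 90.5, which rounds to 91.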
+ + +## Display + +On a list of crates, the letter representing the score in each category plus +the overall score would be displayed using text, color, symbols, and detail on +hover, with a link to a more thorough explanation. We like the information +density in the way [npms][] displays scores: + + + +## Out of scope + +This proposal is not advocating to change the order of **search results**; those +should still be ordered by relevancy to the query based on the indexed content. +We may want to have an option to sort search results by "recommended" or +whatever we want to call this sorting, but probably not change the default. + +# How do we teach this? + +A criticism we anticipate and that would be totally fair is that this formula +is too complex. If we go with this formula, we think it's important to make +available a clear explanation of why a crate has the score it does, for +transparency to both crate users and crate authors. [Ruby toolbox][ruby] has a +great example of what we'd like to provide. + +[ruby]: #ruby-toolbox + +A possible benefit of having multiple measures influence the ranking is making +it less likely that crate owners will go to the effort of gaming the formula in +order to have a higher ranking. + +# Drawbacks +[drawbacks]: #drawbacks + +We might create a system that incentivizes attributes that are not useful, or +worse, actively harmful to the Rust ecosystem. For example, the documentation +percentage could be gamed by having one line of uninformative documentation for +all public items, thus giving a score of 100% without the value that would come +with a fully documented library. We hope the community at large will agree +these attributes are valuable to approach in good faith, and that trying to +game the ranking will be easily discoverable. 
We could have a reporting +mechanism for crates that are attempting to inflate their ranking artificially, +and implement a way for administrators to impose a ranking penalty on these +crates instead. + +# Alternatives +[alternatives]: #alternatives + +## Manual curation + +1. We could keep the default ranking as number of downloads, and leave further +curation to sites like [Awesome Rust][]. + +[Awesome Rust]: https://github.com/kud1ing/awesome-rust + +2. We could build entirely manual ranking into crates.io, as [Ember Observer][] +does. This would be a lot of work that would need to be done by someone, but +would presumably result in higher quality evaluations and be less vulnerable to +gaming. + +[Ember Observer]: https://emberobserver.com/about + +3. We could add user ratings or reviews in the form of upvote/downvote, 1-5 +stars, and/or free text, and weight more recent ratings higher than older +ratings. This could have the usual problems that come with online rating +systems, such as spam, paid reviews, ratings influenced by personal +disagreements, etc. + +## More options instead of a default + +1. We could add filtering options for metadata, so that each user could choose, +for example, "show me only crates that work on stable" or "show me only crates +that have a version greater than 1.0". + +2. We could add independent axes of sorting criteria in addition to the existing +alphabetical and number of downloads, such as by number of owners or most +recent version release date. + +These sorting and filtering options would let each user choose exactly what's +important to them, which gives them more freedom, but this also pushes more +work onto the user. Crates.io would avoid taking a position on what "best" +means, which could prevent gaming of the system since crate authors wouldn't +know how users are ultimately sorting and filtering. 
We would probably want to +implement saved search configurations per user, so that people wouldn't have to +re-enter their criteria every time they wanted to do a similar search. + +# Unresolved questions +[unresolved]: #unresolved-questions + +- There might be metadata about crates that we haven't thought of yet that would +be useful. +- How do we change the ranking if we try something for a while and decide it's +not what we want? Would we need another RFC? +- How will we know this algorithm is working? + - We could do another survey + - We could ask for reports on an issue on crates.io of crates not being + ordered as people would expect + - Crates.io does have Google Analytics. We could compare the "funnels" of + navigating to crate pages after searches that are similar to categories. + This could potentially tell us if people start using categories at all + instead of searching, if searches for terms that have categories go down + and use of the categories go up. It might also be possible to see what + crate pages people end up on from search and from categories, to see if + they end up on "better" crates as a result of the ordering in categories. + It might be difficult to get the right data in a significant quantity for + this to be useful, though. + - We could wait and see if there are complaints on the various Rust forums + +# Appendix: Comparative Research +[comparative-research]: #appendix-comparative-research + +This is how other package hosting websites handle default sorting within +categories. + +## Django Packages + +[Django Packages][django] has the concept of [grids][], which are large tables +of packages in a particular category. Each package is a column, and each row is +some attribute of packages. The default ordering from left to right appears to +be GitHub stars. 
+ +[django]: https://djangopackages.org/ +[grids]: https://djangopackages.org/grids/ + +Example of a Django Packages grid + +## Libhunt + +[Libhunt][libhunt] pulls libraries and categories from [Awesome Rust][], then +adds some metadata and navigation. + +The default ranking is relative popularity, measured by GitHub stars and scaled +to be a number out of 10 as compared to the most popular crate. The other +ordering offered is dev activity, which again is a score out of 10, relative to +all other crates, and calculated by giving a higher weight to more recent +commits. + +[libhunt]: https://rust.libhunt.com/ + +Example of a Libhunt category + +You can also choose to compare two libraries on a number of attributes: + +Example of comparing two crates on Libhunt + +## Maven Repository + +[Maven Repository][mvn] appears to order by the number of reverse dependencies +("# usages"): + +[mvn]: http://mvnrepository.com + +Example of a maven repository category + +## Pypi + +[Pypi][pypi] lets you choose multiple categories, which are not only based on +topic but also other attributes like library stability and operating system: + +[pypi]: https://pypi.python.org/pypi?%3Aaction=browse + +Example of filtering by Pypi categories + +Once you've selected categories and click the "show all" packages in these +categories link, the packages are in alphabetical order... but the alphabet +starts over multiple times... it's unclear from the interface why this is the +case. + +Example of Pypi ordering + +## GitHub Showcases + +To get incredibly meta, GitHub has the concept of [showcases][] for a variety +of topics, and they have [a showcase of package managers][show-pkg]. The +default ranking is by GitHub stars (cargo is 17/27 currently). 
+ +[showcases]: https://github.com/showcases +[show-pkg]: https://github.com/showcases/package-managers + +Example of a GitHub showcase + +## Ruby toolbox + +[Ruby toolbox][rb] sorts by a relative popularity score, which is calculated +from a combination of GitHub stars/watchers and number of downloads: + +[rb]: https://www.ruby-toolbox.com + +How Ruby Toolbox's popularity ranking is calculated + +Category pages have a bar graph showing the top gems in that category, which +looks like a really useful way to quickly see the differences in relative +popularity. For example, this shows nokogiri is far and away the most popular +HTML parser: + +Example of Ruby Toolbox ordering + +Also of note is the amount of information shown by default, but with a +magnifying glass icon that, on hover or tap, reveals more information without a +page load/reload: + +Expanded Ruby Toolbox info + +## npms + +While [npms][] doesn't have categories, its search appears to do some exact +matching of the query and then rank the rest of the results [weighted][] by +three different scores: + +* score-effect:14: Set the effect that package scores have for the final search + score, defaults to 15.3 +* quality-weight:1: Set the weight that quality has for the each package score, + defaults to 1.95 +* popularity-weight:1: Set the weight that popularity has for the each package + score, defaults to 3.3 +* maintenance-weight:1: Set the weight that the quality has for the each + package score, defaults to 2.05 + +[npms]: https://npms.io +[weighted]: https://api-docs.npms.io/ + +Example npms search results + +There are [many factors][] that go into the three scores, and more are planned +to be added in the future. Implementation details are available in the +[architecture documentation][]. 
+ +[many factors]: https://npms.io/about +[architecture documentation]: https://github.com/npms-io/npms-analyzer/blob/master/docs/architecture.md + +Explanation of the data analyzed by npms + +## Package Control (Sublime) + +[Package Control][] is for Sublime Text packages. It has Labels that are +roughly equivalent to categories: + +[Package Control]: https://packagecontrol.io/ + +Package Control homepage showing Labels like language syntax, snippets + +The only available ordering within a label is alphabetical, but each result has +the number of downloads plus badges for Sublime Text version compatibility, OS +compatibility, Top 25/100, and new/trending: + +Sample Package Control list of packages within a label, sorted alphabetically + +# Appendix: User Research +[user-research]: #appendix-user-research + +## Demographics + +We ran a survey for 1 week and got 134 responses. The responses we got seem to +be representative of the current Rust community: skewing heavily towards more +experienced programmers and just about evenly distributed between Rust +experience starting before 1.0, since 1.0, in the last year, and in the last 6 +months, with a slight bias towards longer amounts of experience. 0 Graydons +responded to the survey. + +Distribution of programming experience of survey respondents, over half have been programming for over 10 years + +Distribution of Rust experience of survey respondents, slightly biased towards those who have been using Rust before 1.0 and since 1.0 over those with less than a year and less than 6 months + +Since this matches about what we'd expect of the Rust community, we believe +this survey is representative. Given the bias towards more experience +programming, we think the answers are worthy of using to inform recommendations +crates.io will be making to programmers of all experience levels.
+ +## Crate ranking agreement + +The community ranking of the 5 crates presented in the survey for which order +people would try them out for parsing comes out to be: + +1. nom +2. combine +3. and 4. peg and lalrpop, in some order +5. peresil + +This chart shows how many people ranked the crates in each slot: + +Raw votes for each crate in each slot, showing that nom and combine are pretty clearly 1 and 2, peresil is clearly 5, and peg and lalrpop both got slotted in 4th most often + +This chart shows the cumulative number of votes: each slot contains the number +of votes each crate got for that ranking or above. + + + +Whatever default ranking formula we come up with in this RFC, when applied to +these 5 crates, it should generate an order for the crates that aligns with the +community ordering. Also, not everyone will agree with the crates.io ranking, +so we should display other information and provide alternate filtering and +sorting mechanisms so that people who prioritize different attributes than the +majority of the community will be able to find what they are looking for. + +## Factors considered when ranking crates + +The following table shows the top 25 mentioned factors for the two free answer +sections. We asked both "Please explain what information you used to evaluate +the crates and how that information influenced your ranking." and "Was there +any information you wish was available, or that would have taken more than 15 +minutes for you to get?", but some of the same factors were deemed to take too +long to find out or not be easily available, while others did consider those, +so we've ranked by the combination of mentions of these factors in both +questions. + +Far and away, good documentation was the most mentioned factor people used to +evaluate which crates to try. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

|   | Feature | Used in evaluation | Not available/too much time needed | Total | Notes |
|---|---|---|---|---|---|
| 1 | Good documentation | 94 | 10 | 104 |   |
| 2 | README | 42 | 19 | 61 |   |
| 3 | Number of downloads | 58 | 0 | 58 |   |
| 4 | Most recent version date | 54 | 0 | 54 |   |
| 5 | Obvious / easy to find usage examples | 37 | 14 | 51 |   |
| 6 | Examples in the repo | 38 | 6 | 44 |   |
| 7 | Reputation of the author | 36 | 3 | 39 |   |
| 8 | Description or README containing Introduction / goals / value prop / use cases | 29 | 5 | 34 |   |
| 9 | Number of reverse dependencies (Dependent Crates) | 23 | 7 | 30 |   |
| 10 | Version >= 1.0.0 | 30 | 0 | 30 |   |
| 11 | Commit activity | 23 | 6 | 29 | Depends on VCS |
| 12 | Fits use case | 26 | 3 | 29 | Situational |
| 13 | Number of dependencies (more = worse) | 28 | 0 | 28 |   |
| 14 | Number of open issues, activity on issues | 22 | 6 | 28 | Depends on GitHub |
| 15 | Easy to use or understand | 27 | 0 | 27 | Situational |
| 16 | Publicity (blog posts, reddit, urlo, "have I heard of it") | 25 | 0 | 25 |   |
| 17 | Most recent commit date | 17 | 5 | 22 | Dependent on VCS |
| 18 | Implementation details | 22 | 0 | 22 | Situational |
| 19 | Nice API | 22 | 0 | 22 | Situational |
| 20 | Mentioned using/wanting to use docs.rs | 8 | 13 | 21 |   |
| 21 | Tutorials | 18 | 3 | 21 |   |
| 22 | Number or frequency of released versions | 19 | 1 | 20 |   |
| 23 | Number of maintainers/contributors | 12 | 6 | 18 | Depends on VCS |
| 24 | CI results | 15 | 2 | 17 | Depends on CI service |
| 25 | Whether the crate works on nightly, stable, particular stable versions | 8 | 8 | 16 |   |

+ +## Relevant quotes motivating our choice of factors + +### Easy to use + +> 1) Documentation linked from crates.io 2) Documentation contains decent +> example on front page + +----- + +> 3. "Docs Coverage" info - I'm not sure if there's a way to get that right +> now, but this is almost more important that test coverage. + +----- + +> rust docs: Is there an intro and example on the top-level page? are the +> rustdoc examples detailed enough to cover a range of usecases? can i avoid +> reading through the files in the examples folder? + +----- + +> Documentation: +> - Is there a README? Does it give me example usage of the library? Point me +> to more details? +> - Are functions themselves documented? +> - Does the documentation appear to be up to date? + +----- + +> The GitHub repository pages, because there are no examples or detailed +> descriptions on crates.io. From the GitHub readme I first checked the readme +> itself for a code example, to get a feeling for the library. Then I looked +> for links to documentation or tutorials and examples. The crates that did not +> have this I discarded immediately. + +----- + +> When evaluating any library from crates.io, I first follow the repository +> link -- often the readme is enough to know whether or not I like the actual +> library structure. For me personally a library's usability is much more +> important than performance concerns, so I look for code samples that show me +> how the library is used. In the examples given, only peresil forces me to +> look at the actual documentation to find an example of use. I want something +> more than "check the docs" in a readme in regards to getting started. 
+ +----- + +> I would like the entire README.md of each package to be visible on crates.io +> I would like a culture where each README.md contains a runnable example + +----- + +Ok, this one isn't from the survey, it's from [a Sept 2015 internals thread][]: + +[a Sept 2015 internals thread]: https://users.rust-lang.org/t/lets-talk-about-ecosystem-documentation/2791/24?u=carols10cents + +>> there should be indicator in Crates.io that show how much code is +>> documented, this would help with choosing well done package. +> +> I really love this idea! Showing a percentage or a little progress bar next +> to each crate with the proportion of public items with at least some docs +> would be a great starting point. + +### Maintenance + +> On nom's crates.io page I checked the version (2.0.0) and when the latest +> version came out (less than a month ago). I know that versioning is +> inconsistent across crates, but I'm reassured when a crate has V >= 1.0 +> because it typically indicates that the authors are confident the crate is +> production-ready. I also like to see multiple, relatively-recent releases +> because it signals the authors are serious about maintenance. + +----- + +> Answering yes scores points: crates.io page: Does the crate have a major +> version >= 1? Has there been a release recently, and maybe even a steady +> stream of minor or patch-level releases? + +----- + +> From github: +> * Number of commits and of contributors (A small number of commits (< 100) +> and of contributors (< 3) is often the sign of a personal project, probably +> not very much used except by its author. All other things equal, I tend to +> prefer active projects.); + + +### Quality + +> Tests: +> - Is critical functionality well tested? +> - Is the entire package well tested? +> - Are the tests clear and descriptive? +> - Could I reimplement the library based on these tests? +> - Does the project have CI? +> - Is master green? 
+ +### Popularity/credibility + +> 2) I look at the number of download. If it is too small (~ <1000), I assume +> the crate has not yet reached a good quality. nom catches my attention +> because it has 200K download: I assume it is a high quality crate. + +----- + +> 1. Compare the number of downloads: More downloads = more popular = should be +> the best + +----- + +> Popularity: - Although not being a huge factor, it can help tip the scale +> when one is more popular or well supported than another when all other +> factors are close. + +### Overall + +> I can't pick a most important trait because certain ones outweigh others when +> combined, etc. I.e. number of downloads is OK, but may only suggest that it's +> been around the longest. Same with number of dependent crates (which probably +> spikes number of downloads). I like a crate that is well documented, has a +> large user base (# dependent crates + downloads + stars), is post 1.0, is +> active (i.e. a release within the past 6 months?), and it helps when it's a +> prominent author (but that I feel is an unfair metric). + +## Relevant bugs capturing other feedback + +There was a wealth of good ideas and feedback in the survey answers, but not +all of it pertained to crate ranking directly. 
Commonly mentioned improvements +that could greatly help the usability and usefulness of crates.io included: + +* [Rendering the README on crates.io](https://github.com/rust-lang/crates.io/issues/81) +* [Linking to docs.rs if the crate hasn't specified a Documentation link](https://github.com/rust-lang/crates.io/pull/459) +* [`cargo doc` should render crate examples and link to them on main documentation page](https://github.com/rust-lang/cargo/issues/2760) +* [`cargo doc` could support building/testing standalone markdown files](https://github.com/rust-lang/cargo/issues/739) +* [Allow documentation to be read from an external file](https://github.com/rust-lang/rust/issues/15470) +* [Have "favorite authors" and highlight crates by your favorite authors in crate lists](https://github.com/rust-lang/crates.io/issues/494) +* [Show the number of reverse dependencies next to the link](https://github.com/rust-lang/crates.io/issues/496) +* [Reverse dependencies should be ordered by number of downloads by default](https://github.com/rust-lang/crates.io/issues/495) From 66056b7727eaf35409bc431f76bb058e17c4d19d Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Mon, 19 Dec 2016 17:09:08 -0500 Subject: [PATCH 02/15] Update incorrect link --- text/0000-crates.io-default-ranking.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 13bb2b6566a..82c2e7d8e05 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -33,7 +33,7 @@ should be significant and should help make the decision between the crates in this category easier. [nom]: https://crates.io/crates/nom -[peresil]: https://github.com/docopt/docopt.rs +[peresil]: https://crates.io/crates/peresil This helps address the goal of "Rust should provide easy access to high quality crates" as stated in the [Rust 2017 Roadmap][roadmap]. 
From 1fab32acc26d987663ca2f58246c913c5e983ede Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Mon, 19 Dec 2016 17:10:41 -0500 Subject: [PATCH 03/15] Change list to get around markdown's auto numbering --- text/0000-crates.io-default-ranking.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 82c2e7d8e05..19bb8dd2d9a 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -743,10 +743,13 @@ crates.io will be making to programmers of all experience levels. The community ranking of the 5 crates presented in the survey for which order people would try them out for parsing comes out to be: -1. nom -2. combine -3. and 4. peg and lalrpop, in some order -5. peresil +1.) nom + +2.) combine + +3.) and 4.) peg and lalrpop, in some order + +5.) peresil This chart shows how many people ranked the crates in each slot: From 0ba29a5355e19edbed27d46e6881b28f83460494 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 14:04:25 -0500 Subject: [PATCH 04/15] Alternate proposal for Maintenance --- text/0000-crates.io-default-ranking.md | 130 ++++++++----------------- 1 file changed, 38 insertions(+), 92 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 19bb8dd2d9a..ac8562c1db6 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -214,98 +214,44 @@ documentation was the desire to quickly find an example of how to use the crate. ### Maintenance -- Last released version date: newer is better. This information is already - available in crates.io's database; could be stored in the database and - updated per-publish. Combined as follows, then reported as a percentage - relative to the most released crate. 
- - Number of releases in the last year - 10% - - Number of releases in the last 6 mo - 30% - - Number of releases in the last month - 60% - - Yanked versions are not counted. - -- Stable version number - - >= 1.0.0 ranks higher than < 1.0.0 - - >= 1.0.0 increases the maintenance score by 5%. - -- Number of owners: more is better. - - A GitHub group owner would count as 1. - - Future improvement: count # of people in the github group at version - publish time - - >= 3 owners increases the maintenance score by 5%. - - -We don't have the overall most actively released crate to compute a relative -release score, so for this analysis we're using the one out of these five -crates that has the most release activity, peg. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CrateReleases in last yearReleases in last 6 moReleases in last 1 moRelease scoreRelative Release scoreStable bonus# owners bonusOverall Maintenance score
peg13714100%0%0%100%, ☀️
nom8322.973%5%0%78%, ⛅️
combine7412.563%5%0%68%, 🌥
lalrpop6312.153%0%0%53%, ☁️
peresil100.13%0%0%3%, 🌧
+We can add an optional attribute to Cargo.toml that crate authors could use to +self-report their maintenance intentions. The valid values would be along the +lines of the following, and would influence the ranking in the order they're +presented: + +- **Actively developed**, meaning new features are being added and bugs are + being fixed +- **Passively maintained**, meaning there are no plans for new features, but + the maintainer intends to respond to issues that get filed +- **As-is**, meaning the crate is feature complete, the maintainer does not + intend to continue working on it or providing support, but it works for the + purposes it was designed for +- None, we don't display anything, since the maintainer has not chosen to + specify their intentions, potential crate users will need to investigate on + their own +- **Experimental**, meaning the author wants to share it with the community but + is not intending to meet anyone's particular use case +- **Looking for maintainer**, meaning the current maintainer would like to give + up the crate to someone else + +These would be displayed as badges on lists of crates. + +These levels would not have any time commitments attached to them-- maintainers +who would like to batch changes into releases every 6 months could report +"actively developed" just as much as mantainers who like to release every 6 +weeks. This would need to be clearly communicated to set crate user +expectations properly. + +This is also inherently a crate author's statement of current intentions, which +may get out of sync with the reality of the crate's maintenance over time. 
+ +If I had to guess for the maintainers of the parsing crates, I would assume: + +* nom: actively developed +* combine: actively developed +* lalrpop: actively developed +* peg: actively developed +* peresil: passively maintained ### Quality From 8d5208159b3ccdf842ba725a83d6a98bfbca24f2 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 14:22:11 -0500 Subject: [PATCH 05/15] Alternate proposal for Popularity --- text/0000-crates.io-default-ranking.md | 85 +++----------------------- 1 file changed, 8 insertions(+), 77 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index ac8562c1db6..66f69411e2a 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -271,86 +271,17 @@ information, we would investigate these further. ### Popularity -- Number of downloads weighted by time across all versions. Combined as - follows, then reported as a percentage relative to the most downloaded crate. - Can be calculated as part of the [update-downloads][] background job. - - Number of downloads in the last year - 10% - - Number of downloads in the last 6 mo - 30% - - Number of downloads in the last month - 60% +- Number of downloads in the last 90 days, and the top, say, 10% most + downloaded would get a bump in ranking and a badge that says "frequently + downloaded". Can be calculated as part of the [update-downloads][] background + job. [update-downloads]: https://github.com/rust-lang/crates.io/blob/master/src/bin/update-downloads.rs - -Due to the data that the crates.io API currently exposes, we're approximating -our proposed formula. We're using downloads over all time to approximate -downloads in the last year, and downloads over the last 90 days to approximate -downloads in the last 6 months. 
- -Since we don't have the overall most downloaded crate to compute a relative -release score, for this analysis we're using the one out of these five -crates that has the highest download score, nom. - -Given the exponential nature of popular crates' downloads, we think percentile -is a more appropriate measure here. We are presenting both relative percentage -and percentile here for your consideration. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CrateDownloads all time (~year)Downloads in last 90 days (~6 mo)Downloads in last 1 moDownloads scoreRelative Downloads score %Relative Downloads score percentile
nom274,71582,97521,33565165100%, ☀️100%, ☀️
peg12,7352,1906931130117%, 🌧80%, 🌤
combine10,8094,2521,11530265%, 🌧60%, 🌥
lalrpop7,1081,92879617673%, 🌧40%, 🌧
peresil8,9601,85942717103%, 🌧20%, 🌧
+With this proposal, out of the 5 parser crates assuming these are the only +crates on crates.io, nom would be marked as "frequently downloaded" and the +others would not. nom is currently ranked at #83 in the list of crates by +number of downloads, which easily puts it in the top 10% out of 7,239 crates. ### Credibility From 5ca22d1bf284b914aa92c2800f68b1451df0a381 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 14:23:24 -0500 Subject: [PATCH 06/15] Remove section on display; addressing within each measure instead --- text/0000-crates.io-default-ranking.md | 9 --------- 1 file changed, 9 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 66f69411e2a..9f647c4b34e 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -348,15 +348,6 @@ Since documentation/ease of use was such a highly mentioned factor in peoples' d -## Display - -On a list of crates, the letter representing the score in each category plus -the overall score would be displayed using text, color, symbols, and detail on -hover, with a link to a more thorough explanation. 
We like the information -density in the way [npms][] displays scores: - - - ## Out of scope This proposal is not advocating to change the order of **search results**; those From 6a0c4dc9fc3994ca020afbfff44ef04c7717bb4b Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 14:36:22 -0500 Subject: [PATCH 07/15] Reword a confusing statement --- text/0000-crates.io-default-ranking.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 9f647c4b34e..decf8935c48 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -80,7 +80,8 @@ evaluate crates, they are looking primarily for approximate signals of: - Maintenance - Quality -Feeding those signals are related measures of: +Cited as secondary signals that were used to infer that the primary signals are +good as well: - Popularity - Credibility From f7644b14583ecba77b9cda0397a718f049f8779f Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 15:34:49 -0500 Subject: [PATCH 08/15] Alternate proposal for Ease of Use/Documentation --- text/0000-crates.io-default-ranking.md | 125 +++++++++++-------------- 1 file changed, 53 insertions(+), 72 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index decf8935c48..8fa1f816cbd 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -140,78 +140,59 @@ By far, the most common attribute people said they considered in the survey was whether a crate had good documentation. Frequently mentioned when discussing documentation was the desire to quickly find an example of how to use the crate. 
-- Percentage of top-level items that have documentation - - We have created a proof-of-concept [cargo doc-coverage][] tool to count the - number of public items and the percentage of those that have/don't have - documentation. The overall documentation coverage didn't match our human - perceptions of well-documentedness from looking at the front page of - documentation, so we decided top-level items are more important than items - in submodules. For example, nom is 48% documented overall, but the - top-level items are extremely well documented, 170/195 or 87%. Our - definition of "top-level" counts the overall crate as an item. We think our - doc coverage POC can be modified to report this number. - - Would need to unpack and run this on each package version in a background - job started by a publish; then save the percentage in crates.io's database. - -- In the crate root documentation, presence of a section headed with the word - "Example" and containing a codeblock - - Existing issue, seen in the survey results is that people look in both the - README of the repo and the front page of the docs for examples. We have an - opportunity to encourage at least one to be present reliably. - - Increases the doc percentage score by 5% - -- Presence of files in `/examples` - - Future improvement: [render and link to examples in documentation][examples] - - Increases the doc percentage score by 5% - -[cargo doc-coverage]: https://crates.io/crates/cargo-doc-coverage -[examples]: https://github.com/rust-lang/cargo/issues/2760 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CrateDoc coverage of top-level itemsExample in the crate root docs bonus`/examples` bonusOverall Ease of Use score
peresil10/10, 100%5%0%105%, ☀️
combine43/44, 98%5%0%103%, ☀️
nom170/195, 87%5%0%92%, ☀️
lalrpop4/5, 80%0%0%80%, 🌤
peg2/3, 66%0%0%66%, 🌥
+- Number of lines of documentation in Rust files: + `grep -r \/\/[\!\/] --binary-files=without-match --include=*.rs . | wc -l` +- Number of lines in the README file, if specified in Cargo.toml +- Number of lines in Rust files: `find . -name '*.rs' | xargs wc -l` + +We would then add the lines in the README to the lines of documentation and +subtract the lines of documentation from the total lines of code in order to +get the ratio of documentation to code. Test code (and any documentation within +test code) *is* part of this calculation. + +Any crate getting in the top 20% of all crates would get a badge saying "well +documented". + +Additionally, lists of crates would have a badge showing the number of files in +the standard `/examples` directory, if any. A further enhancement would be to +make that badge link to the examples displayed somewhere (crates.io? in the +repository? in the documentation?). + +* combine: + * 1,195 lines of documentation + * 99 lines in README.md + * 5,815 lines of Rust + * (1195 + 99) / (5815 - 1195) = 1294/4620 = .28 + +* nom: + * 2,263 lines of documentation + * 372 lines in README.md + * 15,661 lines of Rust + * (2263 + 372) / (15661 - 2263) = 2635/13398 = .20 + +* peresil: + * 159 lines of documentation + * 20 lines in README.md + * 1,341 lines of Rust + * (159 + 20) / (1341 - 159) = 179/1182 = .15 + +* lalrpop: ([in the /lalrpop directory in the repo][lalrpop-repo]) + * 742 lines of documentation + * 110 lines in ../README.md + * 94,104 lines of Rust + * (742 + 110) / (94104 - 742) = 852/93362 = .01 + +* peg: + * 3 lines of documentation + * no readme specified in Cargo.toml + * 1,531 lines of Rust + * (3 + 0) / (1531 - 3) = 3/1528 = .00 + +[lalrpop-repo]: https://github.com/nikomatsakis/lalrpop/tree/master/lalrpop + +If we assume these are all the crates on crates.io for this example, then +combine is the top 20% and would get a badge. None of the crates have files in +`/examples`, so none would have the examples badge. 
### Maintenance From e8773a242eda94b1bf16e6079ca8d69afd5bc98b Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 15:35:21 -0500 Subject: [PATCH 09/15] Remove all emoji and letter grades, overall ranking is WIP --- text/0000-crates.io-default-ranking.md | 93 +------------------------- 1 file changed, 1 insertion(+), 92 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 8fa1f816cbd..c3599ec7c75 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -89,51 +89,6 @@ good as well: We detail how we propose to address each of these in turn, plus a rating of the five crates from the user research survey as examples. -We'd like to provide a coarse binning of the scores in each category, to avoid -over-analyzing the difference between, say, 72% and 78% and seeing significance -where there isn't really one. We've considered using letter grades, but those -often have emotional associations (F means you're a failure), when it should be -just an indicator of reality and not a value judgment. So we're also proposing -an option of an emoji scale and are open to other proposals: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PercentageLetter gradeEmoji
>= 90%A☀️
80-89%B🌤
70-79%C⛅️
60-69%D🌥
50-59%E☁️
<= 49%F🌧
- ### Ease of use By far, the most common attribute people said they considered in the survey was @@ -282,53 +237,7 @@ and less of a popularity contest. ### Overall -Since documentation/ease of use was such a highly mentioned factor in peoples' decisions, we propose that, instead of averagaing the three scores, we weight ease of use by 2x and divide by 4 instead of 3. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CrateEase of useMaintenancePopularityOverall
nom92%, ☀️78%, ⛅️100%, ☀️91%, ☀️
combine103%, ☀️68%, 🌥60%, 🌥84%, 🌤
peg66%, 🌥100%, ☀️80%, 🌤78%, ⛅️
lalrpop80%, 🌤53%, ☁️40%, 🌧63%, 🌥
peresil105%, ☀️3%, 🌧20%, 🌧58%, ☁️
- +(Combining the new proposals for an overall ranking is a work in progress) ## Out of scope From 75747a28db98c2c4b6e49725a385085443737569 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 16:41:40 -0500 Subject: [PATCH 10/15] Convert HTML table to markdown --- text/0000-crates.io-default-ranking.md | 549 ++----------------------- 1 file changed, 27 insertions(+), 522 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index c3599ec7c75..dfc9ce9d746 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -540,528 +540,33 @@ questions. Far and away, good documentation was the most mentioned factor people used to evaluate which crates to try. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-   - - Feature - - Used in evaluation - - Not available/too much time needed - - Total - - Notes -
- 1 - - Good documentation - - 94 - - 10 - - 104 - -   -
- 2 - - README - - 42 - - 19 - - 61 - -   -
- 3 - - Number of downloads - - 58 - - 0 - - 58 - -   -
- 4 - - Most recent version date - - 54 - - 0 - - 54 - -   -
- 5 - - Obvious / easy to find usage examples - - 37 - - 14 - - 51 - -   -
- 6 - - Examples in the repo - - 38 - - 6 - - 44 - -   -
- 7 - - Reputation of the author - - 36 - - 3 - - 39 - -   -
- 8 - - Description or README containing Introduction / goals / value prop / use cases - - 29 - - 5 - - 34 - -   -
- 9 - - Number of reverse dependencies (Dependent Crates) - - 23 - - 7 - - 30 - -   -
- 10 - - Version >= 1.0.0 - - 30 - - 0 - - 30 - -   -
- 11 - - Commit activity - - 23 - - 6 - - 29 - - Depends on VCS -
- 12 - - Fits use case - - 26 - - 3 - - 29 - - Situational -
- 13 - - Number of dependencies (more = worse) - - 28 - - 0 - - 28 - -   -
- 14 - - Number of open issues, activity on issues - - 22 - - 6 - - 28 - - Depends on GitHub -
- 15 - - Easy to use or understand - - 27 - - 0 - - 27 - - Situational -
- 16 - - Publicity (blog posts, reddit, urlo, "have I heard of it") - - 25 - - 0 - - 25 - -   -
- 17 - - Most recent commit date - - 17 - - 5 - - 22 - - Dependent on VCS -
- 18 - - Implementation details - - 22 - - 0 - - 22 - - Situational -
- 19 - - Nice API - - 22 - - 0 - - 22 - - Situational -
- 20 - - Mentioned using/wanting to use docs.rs - - 8 - - 13 - - 21 - -   -
- 21 - - Tutorials - - 18 - - 3 - - 21 - -   -
- 22 - - Number or frequency of released versions - - 19 - - 1 - - 20 - -   -
- 23 - - Number of maintainers/contributors - - 12 - - 6 - - 18 - - Depends on VCS -
- 24 - - CI results - - 15 - - 2 - - 17 - - Depends on CI service -
- 25 - - Whether the crate works on nightly, stable, particular stable versions - - 8 - - 8 - - 16 - -   -
+| | Feature | Used in evaluation | Not available/too much time needed | Total | Notes | | | | +|----|--------------------------------------------------------------------------------|----------------------|------------------------------------|---------------------------|-----------------------|-------------------|----|--| +| 1 | Good documentation | 94 | 10 | 104 | | | | | +| 2 | README | 42 | 19 | 61 | | | | | +| 3 | Number of downloads | 58 | 0 | 58 | | | | | +| 4 | Most recent version date | 54 | 0 | 54 | | | | | +| 5 | Obvious / easy to find usage examples | 37 | 14 | 51 | | | | | +| 6 | Examples in the repo | 38 | 6 | 44 | | | | | +| 7 | Reputation of the author | 36 | 3 | 39 | | | | | +| 8 | Description or README containing Introduction / goals / value prop / use cases | 29 | 5 | 34 | | | | | +| 9 | Number of reverse dependencies (Dependent Crates) | 23 | 7 | 30 | | | | | +| 10 | Version >= 1.0.0 | 30 | 0 | 30 | | | | | +| 11 | Commit activity | 23 | 6 | 29 | Depends on VCS | | | | +| 12 | Fits use case | 26 | 3 | 29 | Situational | | | | +| 13 | Number of dependencies (more = worse) | 28 | 0 | 28 | | | | | +| 14 | "Number of open issues | activity on issues" | 22 | 6 | 28 | Depends on GitHub | | | +| 15 | Easy to use or understand | 27 | 0 | 27 | Situational | | | | +| 16 | "Publicity (blog posts | reddit | urlo | ""have I heard of it"")" | 25 | 0 | 25 | | +| 17 | Most recent commit date | 17 | 5 | 22 | Dependent on VCS | | | | +| 18 | Implementation details | 22 | 0 | 22 | Situational | | | | +| 19 | Nice API | 22 | 0 | 22 | Situational | | | | +| 20 | Mentioned using/wanting to use docs.rs | 8 | 13 | 21 | | | | | +| 21 | Tutorials | 18 | 3 | 21 | | | | | +| 22 | Number or frequency of released versions | 19 | 1 | 20 | | | | | +| 23 | Number of maintainers/contributors | 12 | 6 | 18 | Depends on VCS | | | | +| 24 | CI results | 15 | 2 | 17 | Depends on CI service | | | | +| 25 | "Whether the crate works on nightly | stable | particular stable versions" | 8 
| 8 | 16 | | | ## Relevant quotes motivating our choice of factors From 7fee44dabcba00e6807300fa599191a8b905bfc7 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Tue, 20 Dec 2016 16:51:26 -0500 Subject: [PATCH 11/15] Fix the markdown table computers are terrible --- text/0000-crates.io-default-ranking.md | 54 +++++++++++++------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index dfc9ce9d746..ad764a42ff1 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -540,33 +540,33 @@ questions. Far and away, good documentation was the most mentioned factor people used to evaluate which crates to try. -| | Feature | Used in evaluation | Not available/too much time needed | Total | Notes | | | | -|----|--------------------------------------------------------------------------------|----------------------|------------------------------------|---------------------------|-----------------------|-------------------|----|--| -| 1 | Good documentation | 94 | 10 | 104 | | | | | -| 2 | README | 42 | 19 | 61 | | | | | -| 3 | Number of downloads | 58 | 0 | 58 | | | | | -| 4 | Most recent version date | 54 | 0 | 54 | | | | | -| 5 | Obvious / easy to find usage examples | 37 | 14 | 51 | | | | | -| 6 | Examples in the repo | 38 | 6 | 44 | | | | | -| 7 | Reputation of the author | 36 | 3 | 39 | | | | | -| 8 | Description or README containing Introduction / goals / value prop / use cases | 29 | 5 | 34 | | | | | -| 9 | Number of reverse dependencies (Dependent Crates) | 23 | 7 | 30 | | | | | -| 10 | Version >= 1.0.0 | 30 | 0 | 30 | | | | | -| 11 | Commit activity | 23 | 6 | 29 | Depends on VCS | | | | -| 12 | Fits use case | 26 | 3 | 29 | Situational | | | | -| 13 | Number of dependencies (more = worse) | 28 | 0 | 28 | | | | | -| 14 | "Number of open issues | activity on issues" | 22 | 6 | 28 | Depends on GitHub | | | -| 15 | 
Easy to use or understand | 27 | 0 | 27 | Situational | | | | -| 16 | "Publicity (blog posts | reddit | urlo | ""have I heard of it"")" | 25 | 0 | 25 | | -| 17 | Most recent commit date | 17 | 5 | 22 | Dependent on VCS | | | | -| 18 | Implementation details | 22 | 0 | 22 | Situational | | | | -| 19 | Nice API | 22 | 0 | 22 | Situational | | | | -| 20 | Mentioned using/wanting to use docs.rs | 8 | 13 | 21 | | | | | -| 21 | Tutorials | 18 | 3 | 21 | | | | | -| 22 | Number or frequency of released versions | 19 | 1 | 20 | | | | | -| 23 | Number of maintainers/contributors | 12 | 6 | 18 | Depends on VCS | | | | -| 24 | CI results | 15 | 2 | 17 | Depends on CI service | | | | -| 25 | "Whether the crate works on nightly | stable | particular stable versions" | 8 | 8 | 16 | | | +| | Feature | Used in evaluation | Not available/too much time needed | Total | Notes | +|----|--------------------------------------------------------------------------------|----------------------|------------------------------------|---------------------------|-----------------------| +| 1 | Good documentation | 94 | 10 | 104 | | +| 2 | README | 42 | 19 | 61 | | +| 3 | Number of downloads | 58 | 0 | 58 | | +| 4 | Most recent version date | 54 | 0 | 54 | | +| 5 | Obvious / easy to find usage examples | 37 | 14 | 51 | | +| 6 | Examples in the repo | 38 | 6 | 44 | | +| 7 | Reputation of the author | 36 | 3 | 39 | | +| 8 | Description or README containing Introduction / goals / value prop / use cases | 29 | 5 | 34 | | +| 9 | Number of reverse dependencies (Dependent Crates) | 23 | 7 | 30 | | +| 10 | Version >= 1.0.0 | 30 | 0 | 30 | | +| 11 | Commit activity | 23 | 6 | 29 | Depends on VCS | +| 12 | Fits use case | 26 | 3 | 29 | Situational | +| 13 | Number of dependencies (more = worse) | 28 | 0 | 28 | | +| 14 | Number of open issues, activity on issues | 22 | 6 | 28 | Depends on GitHub | +| 15 | Easy to use or understand | 27 | 0 | 27 | Situational | +| 16 | Publicity (blog posts, reddit, urlo,
"have I heard of it") | 25 | 0 | 25 | | +| 17 | Most recent commit date | 17 | 5 | 22 | Dependent on VCS | +| 18 | Implementation details | 22 | 0 | 22 | Situational | +| 19 | Nice API | 22 | 0 | 22 | Situational | +| 20 | Mentioned using/wanting to use docs.rs | 8 | 13 | 21 | | +| 21 | Tutorials | 18 | 3 | 21 | | +| 22 | Number or frequency of released versions | 19 | 1 | 20 | | +| 23 | Number of maintainers/contributors | 12 | 6 | 18 | Depends on VCS | +| 24 | CI results | 15 | 2 | 17 | Depends on CI service | +| 25 | Whether the crate works on nightly, stable, particular stable versions | 8 | 8 | 16 | | ## Relevant quotes motivating our choice of factors From 2108e8179c98ea64e935969170dd90a48f781f29 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Wed, 21 Dec 2016 11:28:18 -0500 Subject: [PATCH 12/15] Make the problem this RFC is solving clearer --- text/0000-crates.io-default-ranking.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index ad764a42ff1..9906f40a8e8 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -11,11 +11,11 @@ difficult to find which crates are meant for a particular purpose and then to decide among the available crates which one is most suitable in a particular context. [Categorization][cat-pr] and [badges][badge-pr] are coming to crates.io; categories help with finding a set of crates to consider and badges -help communicate attributes of crates. The question of how to order crates -within a category, or within the list of crates that have a particular keyword, -is still open. This RFC proposes a method of ranking crates combining number of -downloads, version, and other attributes in order to help people decide what -crate to use. +help communicate attributes of crates. 
+ +**This RFC aims to create a default ranking of crates within a list of crates +that have a category or keyword in order to make a recommendation to crate users +about which crates are likely to deserve further manual evaluation.** [cat-pr]: https://github.com/rust-lang/crates.io/pull/473 [badge-pr]: https://github.com/rust-lang/crates.io/pull/481 From 3a7ff91d10b13116d723a339cb014b74c9e0d08b Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Mon, 16 Jan 2017 21:24:56 -0500 Subject: [PATCH 13/15] Making a third large revision to the proposal --- text/0000-crates.io-default-ranking.md | 246 ++++++++++--------------- 1 file changed, 102 insertions(+), 144 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 9906f40a8e8..aaedb188930 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -11,7 +11,7 @@ difficult to find which crates are meant for a particular purpose and then to decide among the available crates which one is most suitable in a particular context. [Categorization][cat-pr] and [badges][badge-pr] are coming to crates.io; categories help with finding a set of crates to consider and badges -help communicate attributes of crates. +help communicate attributes of crates. **This RFC aims to create a default ranking of crates within a list of crates that have a category or keyword in order to make a recommendation to crate users @@ -95,149 +95,107 @@ By far, the most common attribute people said they considered in the survey was whether a crate had good documentation. Frequently mentioned when discussing documentation was the desire to quickly find an example of how to use the crate. -- Number of lines of documentation in Rust files: - `grep -r \/\/[\!\/] --binary-files=without-match --include=*.rs . | wc -l` -- Number of lines in the README file, if specified in Cargo.toml -- Number of lines in Rust files: `find . 
-name '*.rs' | xargs wc -l` - -We would then add the lines in the README to the lines of documentation and -subtract the lines of documentation from the total lines of code in order to -get the ratio of documentation to code. Test code (and any documentation within -test code) *is* part of this calculation. - -Any crate getting in the top 20% of all crates would get a badge saying "well -documented". - -Additionally, lists of crates would have a badge showing the number of files in -the standard `/examples` directory, if any. A further enhancement would be to -make that badge link to the examples displayed somewhere (crates.io? in the -repository? in the documentation?). - -* combine: - * 1,195 lines of documentation - * 99 lines in README.md - * 5,815 lines of Rust - * (1195 + 99) / (5815 - 1195) = 1294/4620 = .28 - -* nom: - * 2,263 lines of documentation - * 372 lines in README.md - * 15,661 lines of Rust - * (2263 + 372) / (15661 - 2263) = 2635/13398 = .20 - -* peresil: - * 159 lines of documentation - * 20 lines in README.md - * 1,341 lines of Rust - * (159 + 20) / (1341 - 159) = 179/1182 = .15 - -* lalrpop: ([in the /lalrpop directory in the repo][lalrpop-repo]) - * 742 lines of documentation - * 110 lines in ../README.md - * 94,104 lines of Rust - * (742 + 110) / (94104 - 742) = 852/93362 = .01 - -* peg: - * 3 lines of documentation - * no readme specified in Cargo.toml - * 1,531 lines of Rust - * (3 + 0) / (1531 - 3) = 3/1528 = .00 - -[lalrpop-repo]: https://github.com/nikomatsakis/lalrpop/tree/master/lalrpop - -If we assume these are all the crates on crates.io for this example, then -combine is the top 20% and would get a badge. None of the crates have files in -`/examples`, so none would have the examples badge. - -### Maintenance - -We can add an optional attribute to Cargo.toml that crate authors could use to -self-report their maintenance intentions. 
The valid values would be along the -lines of the following, and would influence the ranking in the order they're -presented: - -- **Actively developed**, meaning new features are being added and bugs are - being fixed -- **Passively maintained**, meaning there are no plans for new features, but - the maintainer intends to respond to issues that get filed -- **As-is**, meaning the crate is feature complete, the maintainer does not - intend to continue working on it or providing support, but it works for the - purposes it was designed for -- None, we don't display anything, since the maintainer has not chosen to - specify their intentions, potential crate users will need to investigate on - their own -- **Experimental**, meaning the author wants to share it with the community but - is not intending to meet anyone's particular use case -- **Looking for maintainer**, meaning the current maintainer would like to give - up the crate to someone else - -These would be displayed as badges on lists of crates. - -These levels would not have any time commitments attached to them-- maintainers -who would like to batch changes into releases every 6 months could report -"actively developed" just as much as mantainers who like to release every 6 -weeks. This would need to be clearly communicated to set crate user -expectations properly. - -This is also inherently a crate author's statement of current intentions, which -may get out of sync with the reality of the crate's maintenance over time. - -If I had to guess for the maintainers of the parsing crates, I would assume: - -* nom: actively developed -* combine: actively developed -* lalrpop: actively developed -* peg: actively developed -* peresil: passively maintained - -### Quality - -Given that so much of "quality" is subjective, we do not have a proposed -quality measure at this time. Involving CI might be useful, but that would -require taking a stand on supported 3rd party CI providers. 
The same problem -would exist with test coverage percentage. - -Measures we have considered but that we do not have tools to compute at this -time: - -- Number of unit and/or integration tests -- Ratio of test code to implementation code - -If the community feels the effort to create these tools would be worth the -information, we would investigate these further. - -### Popularity - -- Number of downloads in the last 90 days, and the top, say, 10% most - downloaded would get a bump in ranking and a badge that says "frequently - downloaded". Can be calculated as part of the [update-downloads][] background - job. - -[update-downloads]: https://github.com/rust-lang/crates.io/blob/master/src/bin/update-downloads.rs - -With this proposal, out of the 5 parser crates assuming these are the only -crates on crates.io, nom would be marked as "frequently downloaded" and the -others would not. nom is currently ranked at #83 in the list of crates by -number of downloads, which easily puts it in the top 10% out of 7,239 crates. - -### Credibility - -We think credibility is an even more subjective measure than quality. We -considered using number of other crates an author has, but that would skew -heavily towards [retep998][]. Highlighting Rust team members is also a -possibility since people tend to regard them more highly, but there are many -crate authors who are not on any Rust team who are releasing excellent crates. -We have [an idea for a more personal "favorite authors" list][favs] that we -think would help indicate credibility. With this proposed feature, each person -can define credibility for themselves, which makes this measure less gameable -and less of a popularity contest. 
- -[retep998]: https://crates.io/users/retep998 -[favs]: https://github.com/rust-lang/crates.io/issues/494 - -### Overall - -(Combining the new proposals for an overall ranking is a work in progress) +This would be addressed through human evaluation, rather than automatic +evaluation, in two ways: + +1. [Render README files on a crate's page on crates.io][render-readme] so that + people can quickly see for themselves the information that a crate author + chooses to make available in their README. We can nudge towards having an + example in the README by adding a template README that includes an Examples + section [in what `cargo new` generates][cargo-new]. +2. Add a mechanism for logged-in crates.io users to indicate that a crate has + particularly good documentation. + - This would be a very constrained form of voting/rating: one UI element + (ex: an up arrow, a thumbs up, a star, a checkbox, a link) that could be + toggled from "not indicated" to "this crate has good documentation" and + vice versa. + - The number of people who have indicated a crate has good documentation + would be displayed for each crate. + - That number would be limited to an amount of time (proposal: 6 mo). 6 mo + after you voted, your vote would disappear and you could choose to renew + your vote. This would prevent older crates from getting too much of an + advantage or a high rating being inaccurate if many new, undocumented + features get added to a crate. + - This would not influence ranking at all, and therefore is less likely to + be gamed. You'd need to make many github accounts to easily game this. + - Since there is no negative "this crate has bad documenation" indication, + nor is there free-form text, the moderation burden should be minimal. 
+ +[render-readme]: https://github.com/rust-lang/crates.io/issues/81 +[cargo-new]: https://github.com/rust-lang/cargo/issues/3506 + +### Maintenance (and Popularity) + +The number of releases in the last 6 months and the number of downloads in the +last 90 days can be combined into an automatic indicator of the status of a +crate. This would be more like a badge and would not influence ranking at all. + +- Many recent releases and few downloads indicates an *experimental* crate. +- At least occasional releases in the last 6 months and many recent downloads + indicates a *mainstream* crate. +- Few to no releases in the last 6 months and few recent downloads indicates an + *inactive* crate. + +In table form: + +| | Many releases | Few releases | +|----------------|---------------|--------------| +| Many downloads | Mainstream | Mainstream | +| Few downloads | Experimental | Inactive | + +TODO: Decide what the cutoff values these measures should have, which we will do +if people are generally in favor of this idea. + +By using the number of downloads, crates that are "finished" and stable should +still be regarded as mainstream while many people continue to use it. + +These labels will have an indicator of their meaning and how they are calculated +when you hover over them. + +A downside of this method is that it does not convey crate author *intent*, +only what one might assume based only on these two measures. A crate might get +popular while an author still considers it to be experimental, thus creating +expectations of stability and support. + +We might need to experiment with the thresholds for the number of releases and +the number of downloads considered to be "few" and "many". + +Alternatives: + +- Also factor in the version number: keep the "Experimental" label unless a + crate version has many downloads *and* its version number is >= 1.0.0. This + might be better once more crates release a 1.0.0 version. 
+- Don't show any label for the "Mainstream" category and only label + "Experimental" or "Inactive" crates. +- Use different words for these concepts +- Use more words for these concepts that more clearly states what is measured: + - "This crate has many recent releases and many downloads" + - "This crate has many recent downloads but has not been updated in the last 6 + months" + - "This crate has many recent releases but few downloads" + - "This crate has not been updated in the last 6 months and has few downloads" + +For the crates used in the survey, assuming that any release in the last 6 mo +makes a crate "Experimental" rather than "Inactive", and that whatever the +cutoff value for "many downloads" is exactly, the line lies somewhere between +nom and combine since nom has an order of magnitude more downloads than combine: + +| Crate | Releases in last 6 mo | Downloads in last 90 days | Label | +|---------|-----------------------|---------------------------|--------------| +| nom | 3 | 82,975 | Mainstream | +| combine | 4 | 4,252 | Experimental | +| lalrpop | 3 | 1,928 | Experimental | +| peg | 7 | 2,190 | Experimental | +| peresil | 0 | 1,859 | Inactive | + +### Overall ordering: Recent downloads + +To remove some of the bias towards older crates that may have been replaced with +newer alternatives, we propose that the default ranking of crates be changed +from the all-time number of downloads to the number of downloads in the last 90 +days. This is easy to understand and explain, and is being used as a rough +measure of evaluation today. This should be enough to get the most suitable +crates on the first page of results. 
## Out of scope From d6dc8e467f787c8827b72d072d74f5ebab7738b7 Mon Sep 17 00:00:00 2001 From: "Carol (Nichols || Goulding)" Date: Fri, 17 Feb 2017 11:30:47 -0500 Subject: [PATCH 14/15] Revising once again to advocate for recent downloads and more badges --- text/0000-crates.io-default-ranking.md | 397 ++++++++++++++++--------- 1 file changed, 249 insertions(+), 148 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index aaedb188930..56a8f74d0f5 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -71,152 +71,272 @@ A few assumptions we made: crates noticeably. If this does not turn out to be the case, we will have to adjust the formula. -## Factors +## Order by recent downloads + +Through the iterations of this RFC, there was no consensus around a way to order +crates that would be useful, understandable, resistant to being gamed, and not +require work of curators, reviewers, or moderators. Furthermore, different +people in different situations may value different aspects of crates. + +Instead of attempting to order crates as a majority of people would rank them, +we propose a coarser measure to expose the set of crates worthy of further +consideration on the first page of a category or keyword. At that point, the +person looking for a crate can use other indicators on the page to decide which +crates best meet their needs. + +**The default ordering of crates within a keyword or category will be changed to +be the number of downloads in the last 90 days.** + +While coarse, downloads show how many people or other crates have found this +crate to be worthy of using. By limiting to the last 90 days, crates that have +been around the longest won't have an advantage over new crates that might be +better.
Crates that are lower in the "stack", such as `libc`, will always have a +higher number of downloads than those higher in the stack due to the number of +crates using a lower-level crate as a dependency. Within a category or keyword, +however, crates are likely to be from the same level of the stack and thus their +download numbers will be comparable. + +Crates are currently ordered by all-time downloads and the sort option button +says "Downloads". We will: + +- change the ordering to be downloads in the last 90 days +- change the number of downloads displayed with each crate to be those made in + the last 90 days +- change the sort option button to say "Recent Downloads". + +"All-time Downloads" could become another sort option in the menu, alongside +"Alphabetical". + +## Add more badges, filters, and sorting options + +Crates.io now has badges for master branch CI status, and [will soon have a +badge indicating the version(s) of Rust a particular version builds +successfully on][build-info]. + +[build-info]: https://github.com/rust-lang/crates.io/pull/540 + +To enable a person to narrow down relevant crates to find the one that will best +meet their needs, we will add more badges and indicators. **Badges will not +influence crate ordering**. + +Some badges may require use of third-party services such as GitHub. We recognize +that not everyone uses these services, but note a specific badge is only one +factor that people can consider out of many. 
Through [the survey we conducted][user-research], we found that when people -evaluate crates, they are looking primarily for approximate signals of: +evaluate crates, they are primarily looking for signals of: - Ease of use - Maintenance - Quality -Cited as secondary signals that were used to infer that the primary signals are -good as well: +Secondary signals that were used to infer the primary signals: -- Popularity +- Popularity (covered by the default ordering by recent downloads) - Credibility -We detail how we propose to address each of these in turn, plus a rating of the -five crates from the user research survey as examples. - ### Ease of use By far, the most common attribute people said they considered in the survey was whether a crate had good documentation. Frequently mentioned when discussing documentation was the desire to quickly find an example of how to use the crate. -This would be addressed through human evaluation, rather than automatic -evaluation, in two ways: - -1. [Render README files on a crate's page on crates.io][render-readme] so that - people can quickly see for themselves the information that a crate author - chooses to make available in their README. We can nudge towards having an - example in the README by adding a template README that includes an Examples - section [in what `cargo new` generates][cargo-new]. -2. Add a mechanism for logged-in crates.io users to indicate that a crate has - particularly good documentation. - - This would be a very constrained form of voting/rating: one UI element - (ex: an up arrow, a thumbs up, a star, a checkbox, a link) that could be - toggled from "not indicated" to "this crate has good documentation" and - vice versa. - - The number of people who have indicated a crate has good documentation - would be displayed for each crate. - - That number would be limited to an amount of time (proposal: 6 mo). 6 mo - after you voted, your vote would disappear and you could choose to renew - your vote. 
This would prevent older crates from getting too much of an - advantage or a high rating being inaccurate if many new, undocumented - features get added to a crate. - - This would not influence ranking at all, and therefore is less likely to - be gamed. You'd need to make many github accounts to easily game this. - - Since there is no negative "this crate has bad documenation" indication, - nor is there free-form text, the moderation burden should be minimal. +This would be addressed in two ways. + +#### Render README on a crate's page + +[Render README files on a crate's page on crates.io][render-readme] so that +people can quickly see for themselves the information that a crate author +chooses to make available in their README. We can nudge towards having an +example in the README by adding a template README that includes an Examples +section [in what `cargo new` generates][cargo-new]. [render-readme]: https://github.com/rust-lang/crates.io/issues/81 [cargo-new]: https://github.com/rust-lang/cargo/issues/3506 -### Maintenance (and Popularity) - -The number of releases in the last 6 months and the number of downloads in the -last 90 days can be combined into an automatic indicator of the status of a -crate. This would be more like a badge and would not influence ranking at all. - -- Many recent releases and few downloads indicates an *experimental* crate. -- At least occasional releases in the last 6 months and many recent downloads - indicates a *mainstream* crate. -- Few to no releases in the last 6 months and few recent downloads indicates an - *inactive* crate. - -In table form: - -| | Many releases | Few releases | -|----------------|---------------|--------------| -| Many downloads | Mainstream | Mainstream | -| Few downloads | Experimental | Inactive | - -TODO: Decide what the cutoff values these measures should have, which we will do -if people are generally in favor of this idea. 
- -By using the number of downloads, crates that are "finished" and stable should -still be regarded as mainstream while many people continue to use it. - -These labels will have an indicator of their meaning and how they are calculated -when you hover over them. - -A downside of this method is that it does not convey crate author *intent*, -only what one might assume based only on these two measures. A crate might get -popular while an author still considers it to be experimental, thus creating -expectations of stability and support. - -We might need to experiment with the thresholds for the number of releases and -the number of downloads considered to be "few" and "many". - -Alternatives: - -- Also factor in the version number: keep the "Experimental" label unless a - crate version has many downloads *and* its version number is >= 1.0.0. This - might be better once more crates release a 1.0.0 version. -- Don't show any label for the "Mainstream" category and only label - "Experimental" or "Inactive" crates. 
-- Use different words for these concepts -- Use more words for these concepts that more clearly states what is measured: - - "This crate has many recent releases and many downloads" - - "This crate has many recent downloads but has not been updated in the last 6 - months" - - "This crate has many recent releases but few downloads" - - "This crate has not been updated in the last 6 months and has few downloads" - -For the crates used in the survey, assuming that any release in the last 6 mo -makes a crate "Experimental" rather than "Inactive", and that whatever the -cutoff value for "many downloads" is exactly, the line lies somewhere between -nom and combine since nom has an order of magnitude more downloads than combine: - -| Crate | Releases in last 6 mo | Downloads in last 90 days | Label | -|---------|-----------------------|---------------------------|--------------| -| nom | 3 | 82,975 | Mainstream | -| combine | 4 | 4,252 | Experimental | -| lalrpop | 3 | 1,928 | Experimental | -| peg | 7 | 2,190 | Experimental | -| peresil | 0 | 1,859 | Inactive | - -### Overall ordering: Recent downloads - -To remove some of the bias towards older crates that may have been replaced with -newer alternatives, we propose that the default ranking of crates be changed -from the all-time number of downloads to the number of downloads in the last 90 -days. This is easy to understand and explain, and is being used as a rough -measure of evaluation today. This should be enough to get the most suitable -crates on the first page of results. +#### "Well Documented" badge + +For each crate published, in a background job, unpack the crate files and +calculate the ratio of lines of documentation to lines of code as follows: + +- Find the number of lines of documentation in Rust files: + `grep -r "//[!/]" --binary-files=without-match --include=*.rs . | wc -l` +- Find the number of lines in the README file, if specified in Cargo.toml +- Find the number of lines in Rust files: `find . 
-name '*.rs' | xargs wc -l` + +We would then add the lines in the README to the lines of documentation, +subtract the lines of documentation from the total lines of code, and divide +the lines of documentation by the lines of non-documentation in order to get +the ratio of documentation to code. Test code (and any documentation within +test code) *is* part of this calculation. + +Any crate getting in the top 20% of all crates would get a badge saying "well +documented". + +This measure is gameable if a crate adds many lines that match the +documentation regex but don't provide meaningful content, such as `/// lol`. +While this may be easy to implement, a person looking at the documentation for +a crate using this technique would immediately be able to see that the author +is trying to game the system and reject it. If this becomes a common problem, +we can re-evaluate this situation, but we believe the community of crate +authors genuinely want to provide great documentation to crate users. We want +to encourage and reward well-documented crates, and this outweighs the risk of +potential gaming of the system. 
+ +* combine: + * 1,195 lines of documentation + * 99 lines in README.md + * 5,815 lines of Rust + * (1195 + 99) / (5815 - 1195) = 1294/4620 = .28 + +* nom: + * 2,263 lines of documentation + * 372 lines in README.md + * 15,661 lines of Rust + * (2263 + 372) / (15661 - 2263) = 2635/13398 = .20 + +* peresil: + * 159 lines of documentation + * 20 lines in README.md + * 1,341 lines of Rust + * (159 + 20) / (1341 - 159) = 179/1182 = .15 + +* lalrpop: ([in the /lalrpop directory in the repo][lalrpop-repo]) + * 742 lines of documentation + * 110 lines in ../README.md + * 94,104 lines of Rust + * (742 + 110) / (94104 - 742) = 852/93362 = .01 + +* peg: + * 3 lines of documentation + * no readme specified in Cargo.toml + * 1,531 lines of Rust + * (3 + 0) / (1531 - 3) = 3/1528 = .00 + +[lalrpop-repo]: https://github.com/nikomatsakis/lalrpop/tree/master/lalrpop + +If we assume these are all the crates on crates.io for this example, then +combine is the top 20% and would get a badge. + +### Maintenance + +We will add a way for maintainers to communicate their intended level of +maintenance and support. We will add indicators of issues resolved from the +various code hosting services. + +#### Self-reported maintenance intention + +We will add an optional attribute to Cargo.toml that crate authors could use to +self-report their maintenance intentions. 
The valid values would be along the +lines of the following, and would influence the ranking in the order they're +presented: + +- **Actively developed**, meaning new features are being added and bugs are + being fixed +- **Passively maintained**, meaning there are no plans for new features, but + the maintainer intends to respond to issues that get filed +- **As-is**, meaning the crate is feature complete, the maintainer does not + intend to continue working on it or providing support, but it works for the + purposes it was designed for +- None, we don't display anything, since the maintainer has not chosen to + specify their intentions, potential crate users will need to investigate on + their own +- **Experimental**, meaning the author wants to share it with the community but + is not intending to meet anyone's particular use case +- **Looking for maintainer**, meaning the current maintainer would like to give + up the crate to someone else + +These would be displayed as badges on lists of crates. + +These levels would not have any time commitments attached to them-- maintainers +who would like to batch changes into releases every 6 months could report +"actively developed" just as much as maintainers who like to release every 6 +weeks. This would need to be clearly communicated to set crate user +expectations properly. + +This is also inherently a crate author's statement of current intentions, which +may get out of sync with the reality of the crate's maintenance over time. + +If I had to guess for the maintainers of the parsing crates, I would assume: + +* nom: actively developed +* combine: actively developed +* lalrpop: actively developed +* peg: actively developed +* peresil: passively maintained + +#### GitHub issue badges + +[isitmaintained.com][] provides badges indicating the time to resolution of GitHub issues and percentage of GitHub issues that are open.
+ +[isitmaintained.com]: http://isitmaintained.com/ + +We will enable maintainers to add these badges to their crate. + +| Crate | Issue Resolution | Open Issues | +|-------|------------------|-------------| +| combine | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/Marwes/combine.svg)](http://isitmaintained.com/project/Marwes/combine "Average time to resolve an issue") | [![Percentage of issues still open](http://isitmaintained.com/badge/open/Marwes/combine.svg)](http://isitmaintained.com/project/Marwes/combine "Percentage of issues still open") | +| nom | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/Geal/nom.svg)](http://isitmaintained.com/project/Geal/nom "Average time to resolve an issue") | [![Percentage of issues still open](http://isitmaintained.com/badge/open/Geal/nom.svg)](http://isitmaintained.com/project/Geal/nom "Percentage of issues still open") | +| lalrpop | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/nikomatsakis/lalrpop.svg)](http://isitmaintained.com/project/nikomatsakis/lalrpop "Average time to resolve an issue") | [![Percentage of issues still open](http://isitmaintained.com/badge/open/nikomatsakis/lalrpop.svg)](http://isitmaintained.com/project/nikomatsakis/lalrpop "Percentage of issues still open") | +| peg | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/kevinmehall/rust-peg.svg)](http://isitmaintained.com/project/kevinmehall/rust-peg "Average time to resolve an issue") | [![Percentage of issues still open](http://isitmaintained.com/badge/open/kevinmehall/rust-peg.svg)](http://isitmaintained.com/project/kevinmehall/rust-peg "Percentage of issues still open") | +| peresil | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/shepmaster/peresil.svg)](http://isitmaintained.com/project/shepmaster/peresil "Average time to resolve an issue") | [![Percentage of issues still 
open](http://isitmaintained.com/badge/open/shepmaster/peresil.svg)](http://isitmaintained.com/project/shepmaster/peresil "Percentage of issues still open") | + +### Quality + +We will enable maintainers to add [Coveralls][] badges to indicate the +crate's test coverage. If there are other services offering test coverage +reporting and badges, we will add support for those as well, but this is the +only service we know of at this time that offers code coverage reporting that +works with Rust projects. + +[Coveralls]: https://coveralls.io + +This excludes projects that cannot use Coveralls, which only currently supports +repositories hosted on GitHub or BitBucket that use CI on Travis, CircleCI, +Jenkins, Semaphore, or Codeship. + +nom has coveralls.io configured: [![Coverage Status](https://coveralls.io/repos/Geal/nom/badge.svg?branch=master)](https://coveralls.io/r/Geal/nom?branch=master) + +### Credibility + +We have [an idea for a "favorite authors" list][favs] that we +think would help indicate credibility. With this proposed feature, each person +can define "credibility" for themselves, which makes this measure less gameable +and less of a popularity contest. + +[favs]: https://github.com/rust-lang/crates.io/issues/494 ## Out of scope -This proposal is not advocating to change the order of **search results**; those -should still be ordered by relevancy to the query based on the indexed content. -We may want to have an option to sort search results by "recommended" or -whatever we want to call this sorting, but probably not change the default. +This proposal is not advocating to change the default order of **search +results**; those should still be ordered by relevancy to the query based on the +indexed content. We will add the ability to sort search results by recent +downloads. 
+ +# Evaluation + +If ordering by number of recent downloads and providing more indicators is not +helpful, we expect to get bug reports from the community and feedback on the +users forum, reddit, IRC, etc. + +In the community survey scheduled to be taken around May 2017, we will ask +about people's satisfaction with the information that crates.io provides. + +If changes are needed that are significant, we will open a new RFC. If smaller +tweaks need to be made, the process will be managed through crates.io's issues. +We will consult with the tools team and core team to determine whether a change +is significant enough to warrant a new RFC. # How do we teach this? -A criticism we anticipate and that would be totally fair is that this formula -is too complex. If we go with this formula, we think it's important to make -available a clear explanation of why a crate has the score it does, for -transparency to both crate users and crate authors. [Ruby toolbox][ruby] has a -great example of what we'd like to provide. +We will change the label on the default ordering button to read "Recent +Downloads" rather than "Downloads". -[ruby]: #ruby-toolbox +Badges will have tooltips on hover that provide additional information. -A possible benefit of having multiple measures influence the ranking is making -it less likely that crate owners will go to the effort of gaming the formula in -order to have a higher ranking. +We will also add a page to doc.crates.io that details all possible indicators +and their values, and explains to crate authors how to configure or earn the +different badges. # Drawbacks [drawbacks]: #drawbacks @@ -227,10 +347,9 @@ percentage could be gamed by having one line of uninformative documentation for all public items, thus giving a score of 100% without the value that would come with a fully documented library. 
We hope the community at large will agree these attributes are valuable to approach in good faith, and that trying to -game the ranking will be easily discoverable. We could have a reporting -mechanism for crates that are attempting to inflate their ranking artificially, -and implement a way for administrators to impose a ranking penalty on these -crates instead. +game the badges will be easily discoverable. We could have a reporting +mechanism for crates that are attempting to gain badges artificially, and +implement a way for administrators to remove badges from those crates. # Alternatives [alternatives]: #alternatives @@ -255,7 +374,10 @@ ratings. This could have the usual problems that come with online rating systems, such as spam, paid reviews, ratings influenced by personal disagreements, etc. -## More options instead of a default +## More sorting and filtering options + +There are even more options for interacting with the metadata that crates.io +has than we are proposing in this RFC at this time. For example: 1. We could add filtering options for metadata, so that each user could choose, for example, "show me only crates that work on stable" or "show me only crates @@ -265,35 +387,14 @@ that have a version greater than 1.0". alphabetical and number of downloads, such as by number of owners or most recent version release date. -These sorting and filtering options would let each user choose exactly what's -important to them, which gives them more freedom, but this also pushes more -work onto the user. Crates.io would avoid taking a position on what "best" -means, which could prevent gaming of the system since crate authors wouldn't -know how users are ultimately sorting and filtering. We would probably want to -implement saved search configurations per user, so that people wouldn't have to -re-enter their criteria every time they wanted to do a similar search. 
+We would probably want to implement saved search configurations per user, so +that people wouldn't have to re-enter their criteria every time they wanted to +do a similar search. # Unresolved questions [unresolved]: #unresolved-questions -- There might be metadata about crates that we haven't thought of yet that would -be useful. -- How do we change the ranking if we try something for a while and decide it's -not what we want? Would we need another RFC? -- How will we know this algorithm is working? - - We could do another survey - - We could ask for reports on an issue on crates.io of crates not being - ordered as people would expect - - Crates.io does have Google Analytics. We could compare the "funnels" of - navigating to crate pages after searches that are similar to categories. - This could potentially tell us if people start using categories at all - instead of searching, if searches for terms that have categories go down - and use of the categories go up. It might also be possible to see what - crate pages people end up on from search and from categories, to see if - they end up on "better" crates as a result of the ordering in categories. - It might be difficult to get the right data in a significant quantity for - this to be useful, though. - - We could wait and see if there are complaints on the various Rust forums +All questions have now been resolved. 
# Appendix: Comparative Research [comparative-research]: #appendix-comparative-research From 38c59ee393171c42d4c87dab6285fd6f34cfbf0b Mon Sep 17 00:00:00 2001 From: Jake Goulding Date: Fri, 17 Feb 2017 12:51:53 -0500 Subject: [PATCH 15/15] Convert to a definition list --- text/0000-crates.io-default-ranking.md | 50 ++++++++++++++++++-------- 1 file changed, 36 insertions(+), 14 deletions(-) diff --git a/text/0000-crates.io-default-ranking.md b/text/0000-crates.io-default-ranking.md index 56a8f74d0f5..6710abfc815 100644 --- a/text/0000-crates.io-default-ranking.md +++ b/text/0000-crates.io-default-ranking.md @@ -231,20 +231,42 @@ self-report their maintenance intentions. The valid values would be along the lines of the following, and would influence the ranking in the order they're presented: -- **Actively developed**, meaning new features are being added and bugs are - being fixed -- **Passively maintained**, meaning there are no plans for new features, but - the maintainer intends to respond to issues that get filed -- **As-is**, meaning the crate is feature complete, the maintainer does not - intend to continue working on it or providing support, but it works for the - purposes it was designed for -- None, we don't display anything, since the maintainer has not chosen to - specify their intentions, potential crate users will need to investigate on - their own -- **Experimental**, meaning the author wants to share it with the community but - is not intending to meet anyone's particular use case -- **Looking for maintainer**, meaning the current maintainer would like to give - up the crate to someone else +
+<dl>
+  <dt>Actively developed</dt>
+  <dd>
+    New features are being added and bugs are being fixed.
+  </dd>
+
+  <dt>Passively maintained</dt>
+  <dd>
+    There are no plans for new features, but the maintainer intends to respond
+    to issues that get filed.
+  </dd>
+
+  <dt>As-is</dt>
+  <dd>
+    The crate is feature complete, the maintainer does not intend to continue
+    working on it or providing support, but it works for the purposes it was
+    designed for.
+  </dd>
+
+  <dt>none</dt>
+  <dd>
+    We display nothing. Since the maintainer has not chosen to specify their
+    intentions, potential crate users will need to investigate on their own.
+  </dd>
+
+  <dt>Experimental</dt>
+  <dd>
+    The author wants to share it with the community but is not intending to meet
+    anyone's particular use case.
+  </dd>
+
+  <dt>Looking for maintainer</dt>
+  <dd>
+    The current maintainer would like to transfer the crate to someone else.
+  </dd>
+</dl>
These would be displayed as badges on lists of crates.