Count committers and authors differently? // formerly: Phantom commits created when editing on GitHub.com #181
A known case in which we count too many commits is when people force-push. When crawling daily, if the last commit we crawled yesterday has since been replaced, it's hard to know where we left off. I need to check whether editing via GitHub's website also replaces or edits commits. I'll hopefully have a closer look this weekend. Thanks for reporting!
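The force-push problem can be sketched as follows. This assumes a linear history and a hypothetical `first_parent` map from each commit SHA to its parent, since the crawler's real data structures aren't shown in this thread. The crawler walks back from the branch head until it reaches yesterday's last crawled commit. If the walk hits the root without finding that commit, the commit was rewritten away and the crawler can't tell where it left off.

```python
from typing import Dict, List, Optional

def commits_since(
    head: str,
    first_parent: Dict[str, Optional[str]],
    last_crawled: str,
) -> Optional[List[str]]:
    """Return the commits newer than last_crawled, newest first.

    Returns None when last_crawled is no longer an ancestor of head,
    e.g. because a force-push replaced it. Assumes a linear history.
    """
    new: List[str] = []
    sha: Optional[str] = head
    while sha is not None:
        if sha == last_crawled:
            return new
        new.append(sha)
        sha = first_parent.get(sha)  # None once we walk past the root commit
    return None  # bookmark not found: history was rewritten


# Hypothetical example: yesterday's crawl ended at "c"; today "c" was
# amended into "c2" and force-pushed.
history = {"c2": "b", "b": "a", "a": None}
print(commits_since("c2", history, "b"))  # ['c2']
print(commits_since("c2", history, "c"))  # None
```

Returning `None` rather than the full history leaves the decision to the caller (for example, re-counting the branch) instead of silently counting every commit a second time.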
I have this problem too.
OK, I see. Sometimes, commits have a different author and committer.
So I like this "feature", but what I didn't think about is that when you do some work on GitHub's website, the committer is a special GitHub user. I'm now preparing special handling for that user so that we don't get these phantom commits. It should be quite easy.
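Below is a minimal sketch of such special handling. It assumes each crawled commit is reduced to the GitHub logins of its author and committer. The `Commit` class, `credited_logins` function and the logins in the example are hypothetical, not ghuser's actual code. A commit credits its author, and also its committer unless that committer is GitHub's `web-flow` account, which GitHub records as the committer for edits made on the website.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Committer accounts that should never earn a commit credit (assumption:
# only web-flow for now).
IGNORED_COMMITTERS: FrozenSet[str] = frozenset({"web-flow"})

@dataclass(frozen=True)
class Commit:
    sha: str
    author: str     # GitHub login of the author
    committer: str  # GitHub login of the committer

def credited_logins(commit: Commit) -> Set[str]:
    """Logins that get one commit credit for this commit."""
    logins = {commit.author}  # a set, so author == committer counts once
    if commit.committer not in IGNORED_COMMITTERS:
        logins.add(commit.committer)
    return logins


web_edit = Commit("1a2b3c", author="alice", committer="web-flow")
backport = Commit("4d5e6f", author="alice", committer="bob")
print(sorted(credited_logins(web_edit)))  # ['alice']
print(sorted(credited_logins(backport)))  # ['alice', 'bob']
```

Keeping the ignored accounts in a set rather than a hard-coded comparison would make it straightforward to add other bot committers later, should that turn out to be needed.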
IMHO, a commit should be attributed solely to its author. A lot of hijinks can happen in the process of committing and merging code, and GitHub's use of web-flow arguably indicates that it doesn't treat the committer identity as noteworthy from an authorship standpoint. I certainly don't think that someone who cherry-picks my work has done work of equal weight to mine in writing it.
Also, is the committer clearly shown anywhere in the GitHub UI? If the goal is a clearly understood data source and a fairly predictable metric, I'd consider a system that occasionally doubles the count for 5-10% of commits, and attributes those extra commits to someone who is not the author and who doesn't appear in the GitHub UI, to be inconsistent magic that is likely to confuse and confound.
TL;DR: true (and this is very useful feedback, see more below), but that's too painful to improve right now and it could cause other regressions, sorry 🙁 Details:
Yes, they appear with two avatars; you can see many of them here: https://github.com/brandon-rhodes/uncommitted/commits/master
FYI, the reasons why we crawl commits are:
For that, we have now crawled all the commits of 156000+ repositories and stored their number per user per day, but we haven't stored whether the user was the author or the committer. (Stupid, right? 😄 Not entirely, as we try to keep the DB small.)

What you're saying makes sense and it's really useful to get this opinion (thanks!), but if we now want to keep only the authors, we need to re-crawl everything, and with our current API rate limit that will take weeks. It will disturb the daily crawling (the API rate limit is our bottleneck), and we'll have to merge the output of this long crawl with that of the daily one. I'd rather go through this painful process only if there is at least one other issue we're trying to solve at the same time, or if this imprecision turns out to be very problematic.

Also, I'd need to think about this more, because I know other users who think that any contribution (documentation, user support, marketing, design, code review, etc.) should be visible on ghuser. Depending on how you merge a PR, you can end up being the committer, and since you reviewed that PR, this is a contribution. If you cherry-pick a commit to an older branch (i.e. you backport a feature) and even need to resolve a conflict, you will be the committer, and this is a contribution. Users with this in mind will consider the current mechanism "better".

So here is what I'll do: I implement that special
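To illustrate the storage trade-off: if the per-user, per-day counts were also keyed by role, switching between "authors only" and "authors and committers" would become a query-time choice instead of a re-crawl. This is a hypothetical schema sketched with an in-memory `Counter`, not the one ghuser uses. It splits each (user, day) count into at most two rows, one per role.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

Key = Tuple[str, date, str]  # (login, day, role)

def tally(commits: Iterable[Tuple[date, str, str]]) -> "Counter[Key]":
    """commits: (day, author_login, committer_login) triples."""
    counts: "Counter[Key]" = Counter()
    for day, author, committer in commits:
        counts[(author, day, "author")] += 1
        if committer != author:  # don't credit the same person twice
            counts[(committer, day, "committer")] += 1
    return counts

def daily_count(counts: "Counter[Key]", login: str, day: date,
                roles: Tuple[str, ...] = ("author", "committer")) -> int:
    """Sum a user's commits for one day over the chosen roles."""
    return sum(counts[(login, day, role)] for role in roles)


d = date(2018, 6, 1)  # illustrative date
counts = tally([(d, "alice", "alice"), (d, "bob", "alice"), (d, "alice", "web-flow")])
print(daily_count(counts, "alice", d))                     # 3 (authors and committers)
print(daily_count(counts, "alice", d, roles=("author",)))  # 2 (authors only)
```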
@AurelienLourot That all sounds quite reasonable, and fixing web-flow will definitely remove the most visible source of confusion. :) Do note that when I express my "IMHOs", I put a big capital letter on the H part. I'm not expecting you to reinvent the wheel based on one person's opinion.
The
(https://github.com/ocdtrekkie/xrf_books has 30 commits right now, but when it was crawled, the last commit hadn't been pushed yet.) Keeping this issue open for the more general issue of counting committers.
Last week I created the https://github.com/ocdtrekkie/xrf_books repo. It has 13 commits, 10 of which were submitted via the GitHub.com website.
It did add this repo to my ghuser.io profile, but, bizarrely, it states that I made only 57% of the commits in the repo, which makes very little sense. Looking closely, ghuser.io seems to believe the repo has 23 commits (which it doesn't), and that 13 of them (my actual 13 commits) are mine. So I posit that ghuser.io might be detecting an additional phantom commit for each edit made directly in the GitHub.com web UI.
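One hypothesis consistent with these numbers, not confirmed in this thread, is that every distinct login on a commit earns one credit and the repo's total is the sum of all credits. The sketch below uses login pairs that mirror the 3 locally pushed commits and 10 web edits described above, and it reproduces both the 23 and the 57%. Ignoring `web-flow` brings the total back to 13. The `share` function is hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

def share(commits: List[Tuple[str, str]], user: str,
          ignore: FrozenSet[str] = frozenset()) -> Tuple[int, int, int]:
    """commits: (author, committer) login pairs.
    Returns (user's credits, total credits, user's share in whole percent)."""
    credits: Counter = Counter()
    for author, committer in commits:
        for login in {author, committer} - ignore:  # each login once per commit
            credits[login] += 1
    total = sum(credits.values())
    return credits[user], total, round(100 * credits[user] / total)


me = "ocdtrekkie"
commits = [(me, me)] * 3 + [(me, "web-flow")] * 10  # 3 pushed locally, 10 web edits
print(share(commits, me))                                   # (13, 23, 57)
print(share(commits, me, ignore=frozenset({"web-flow"})))  # (13, 13, 100)
```

Under this hypothesis each web edit adds one credit to `web-flow` and none to anyone else, which matches the report of exactly one phantom commit per edit made in the web UI.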