RFC: Formula needs attention (churn * cost * penalty produces unexpected results) #41

Open
etagwerker opened this issue May 7, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@etagwerker
Member

Context

Initially, the SkunkScore was calculated as churn * cost * penalty. This made sense based on the churn vs. complexity idea: https://www.agileconnection.com/article/getting-empirical-about-refactoring

However, I quickly realized that this formula would not work when running skunk -b master -- more here: https://www.fastruby.io/blog/code-quality/escaping-the-tar-pit-at-rubyconf.html

So I decided to change the formula to be cost * penalty.

Alternatives

I think a potential solution is to apply a modified weight to churn, so that the formula could look like this:

skunk_score = (magical_weight * churn) * cost * penalty_factor

That way, the formula could work both as a snapshot and as a comparison between two branches.
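As a minimal sketch of the proposed formula, assuming plain numeric inputs: the method name `skunk_score` and its keyword arguments are hypothetical and are not taken from Skunk's codebase, and the sample values are made up.

```ruby
# Hypothetical helper illustrating the proposed formula; not Skunk's API.
# magical_weight defaults to 1.0, which reduces to churn * cost * penalty_factor.
def skunk_score(churn:, cost:, penalty_factor:, magical_weight: 1.0)
  (magical_weight * churn) * cost * penalty_factor
end

# Made-up numbers, only to show how the weight scales the result.
unweighted = skunk_score(churn: 4, cost: 10.0, penalty_factor: 1.0)
weighted   = skunk_score(churn: 4, cost: 10.0, penalty_factor: 1.0, magical_weight: 0.5)

puts "unweighted=#{unweighted} weighted=#{weighted}"
```

How `magical_weight` should be chosen is exactly the open question here; the sketch only fixes where it sits in the product.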

Test

A test for this should show that removing complexity from a module, committing the change, and then running skunk -b master produces a lower SkunkScore.
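The check described above could be sketched like this. Everything here is an assumption for illustration: the helper names are hypothetical, the numbers are made up, and the refactoring commit is assumed to raise the module's churn by one. One thing the sketch makes visible is that a constant positive weight scales both sides equally, so on its own it cannot change which side scores lower; the weight would presumably have to depend on something, such as the mode.

```ruby
# Hypothetical helper mirroring the proposed formula; not Skunk's API.
def weighted_score(churn:, cost:, penalty_factor:, weight:)
  (weight * churn) * cost * penalty_factor
end

# True when the branch scores lower than master for the given weight.
def branch_lower?(master, branch, weight)
  weighted_score(**branch, weight: weight) < weighted_score(**master, weight: weight)
end

# Made-up numbers: the refactoring lowers cost from 10 to 6,
# and the commit is assumed to bump churn from 1 to 2.
master = { churn: 1, cost: 10.0, penalty_factor: 1.0 }
branch = { churn: 2, cost: 6.0, penalty_factor: 1.0 }

[1.0, 0.5].each do |weight|
  puts "weight=#{weight}: branch lower? #{branch_lower?(master, branch, weight)}"
end
```

With these inputs the branch does not come out lower for either weight, even though its cost dropped, because the churn increase outweighs the cost reduction in the product.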

@bronzdoc bronzdoc added the bug Something isn't working label Aug 11, 2021
@mateusdeap
Member

Interesting. I have a question, though: any idea what this weight could be and how it would change? As I see it, it could be as simple as:

  • 0 if we're doing it on a snapshot
  • 1 otherwise

I think we're dealing with a lack of information. If we want a single number that captures all the complexities of a file in a snapshot, we naturally can't take churn into account, so our skunk score is less precise, or maybe less significant.

It's more like we have one possible formula if we take the code's history into account and another formula if we don't...
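The "two formulas" reading could be sketched as below. Note that plugging a weight of 0 literally into the product would zero out the whole score, so this sketch drops the churn term when no history is available instead of multiplying by 0. The method name `score_for` and the use of `nil` to mean "no history" are assumptions for illustration, not Skunk's API.

```ruby
# Hypothetical helper; not part of Skunk.
# churn: nil means "no history available", so churn is left out of the score.
def score_for(cost:, penalty_factor:, churn: nil)
  base = cost * penalty_factor
  churn.nil? ? base : churn * base
end

# Made-up numbers for illustration.
without_history = score_for(cost: 10.0, penalty_factor: 2.0)
with_history    = score_for(cost: 10.0, penalty_factor: 2.0, churn: 3)

puts "without history=#{without_history} with history=#{with_history}"
```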

Another way to think about this, I believe, is by analogy with integrating a function. In math, when you integrate over a variable, it's as if you were summing the function's values at many points, which gives you the area under the curve. We could consider that area our skunk score.

I could try to elaborate more, but this would imply a change in how we see the skunk score: it would be the sum of the cost*penalty function over the churn.
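One possible reading of "the sum of cost*penalty over the churn", sketched under assumptions: each change to a file contributes one cost/penalty sample, and the score adds them up. The hash layout, the method name `cumulative_skunk_score`, and the numbers are all hypothetical.

```ruby
# Hypothetical helper; not part of Skunk.
# Each element is one revision of a file with its cost and penalty factor.
def cumulative_skunk_score(revisions)
  revisions.sum { |rev| rev[:cost] * rev[:penalty_factor] }
end

# Made-up history of three revisions of the same file.
history = [
  { cost: 10.0, penalty_factor: 1.0 },
  { cost: 8.0,  penalty_factor: 1.0 },
  { cost: 6.0,  penalty_factor: 0.5 }
]

puts "full history: #{cumulative_skunk_score(history)}"
puts "latest revision only: #{cumulative_skunk_score(history.last(1))}"
```

Under this reading, a snapshot would be the single-revision case, where the sum reduces to cost * penalty.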

What we'd need to define is what churn means for a given snapshot, in order to know whether this makes any sense.


3 participants