-
-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code Complexity + Lines & DRYness #560
Comments
I wrote a quick script to strip desired syntax sugar with the following results: Old
New:
|
New with Blank lines stripped:
This totals a 24% swing from the original DRYness calculation. To me, this accurately reflects the amount of maintainable code in a repo (eg. Comments and Code). |
Finally, output example when using all available logical operators for Python and C++ (notice complexity total in this one vs. other metrics):
|
"|" and "&" are bitwise arithmetic operators. They are not logical operators. See here for more information. This two operator are usually used by branchless programming, but we are counting the branches, so I dont think they should be counted as complexity. Complexity is calculated based on possible code paths (branches, loops, ...), but comparison operators do not automatically generate branches, and it is more complicated to calculate “>=” or “<=” that reflects values within a certain range than “==” which only has a fixed value. So I tend not to count these operators.
I'm afraid I'm not agree with this. Blanks and Comments are important parts of the code base. Some languages also have rules on how to use blank lines, such as Python (PEP8). From practical experience, the proper use of blank lines and comments can indeed help developers understand the code, so the complexity is "reduced". If we only look at the number of lines of code, then the winner of the IOCCC (The International Obfuscated C Code Contest) is the code with the lowest complexity, but we all know this is ridiculous.
They are valid normal syntax not syntax sugars. In addition, in GNU C/C++ and rust, "{" and "}" are parts of statement expressions: int k = ({
int i = foo();
int j = bar(2, i);
i * j;
}); We also use "{}" in marcos to wrap a couple of expressions. They must be counted. Even syntax sugars should be counted if they are valid code. Finally, code complexity is not an accurate value that reflects code quality, it can only be used as a reference. |
As @apocelipes mentioned the complexity is intended to count branch conditions in code, hence not including ==. This is because its based on cyclomatic complexity calculations and how they are performed. That said, with the addition of #462 you will be able to do this yourself if that's how you want to count things, but it is very unlikely to ever make it into scc itself. As for the blanks and comments, you are correct in that your example
Will have a lower DRYness, however that's a fairly contrived example since most codebases are not going to look like this. I has a quick attempt at adding this, and while the filtering is fairly simple in ULOC itself, its much harder to do on the display. The reason for this is that the final calculation uses other results. You can see the result of this here, where the DRYness actually goes down when ignore those lines due to the missed lines not being taken into account for the final calculation.
You can see this on this branch https://github.com/boyter/scc/tree/UlocIgnore I will have a look a bit later to see if I can correct this because I don't have any issues with including it as an option. |
I have been playing around with the idea of "syntax sugar" a bit this weekend. As of this morning, I got to this regex pattern: However, I thought some more on it and realized even that is perhaps too complicated. The initial thrust of the idea is that basic formatting preferences be ignored. So that simplified the pattern simply is This position came from thinking about the edge case of multi-line comments and the infinite horrible ways to do that in a language like C++. For example:
The expected complexity and hassle of detecting this nonsense plus the performance hit made me come to the conclusion that code like this has larger concerns than complexity, code, blank, and comment counts. These metrics only need to be "close enough" to the real stats to make reasonable decisions in a larger context, so there is ultimately the trade-off of accuracy and performance. However, the typical syntax formatting highlighted in the initial post and second paragraph regex matching drastically affect outputs and are fairly simple to capture. I hope this makes sense. My main focus is the number of individual statements in the code with as little inflation as possible of syntax formatting preference. Another approach at this would be determining a unique line window. Rather than comparing 1 line at a time (trimmed of whitespace), the unique window compares N lines at a time for uniqueness trimmed of whitespace and excluding blank lines. Single liners such as a fairly common one in my code: Again, happy to help. The Go language is a bit of an enigma trying to read it though and confidently make contributions. |
One of the issues presented with capturing Consider the following:
The |
So regex is one of the things that is avoided inside the core counting. The reason for this is performance. You can compare cloc to scc to see what impact it has. This is why the core of scc is a state machine that moves from state to state based on the defined language specification. As such I will not willingly add such a check into the core of scc. That said, adding one as an option to pre parse files, IF you are prepared to weather the cost is a different story. It sounds like to me what you actually want to do is have a way to pre process the files, removing/replacing things that match some patterns in order to achieve your goal. IE a pre-processor option, where you could as in your example above provide some regex that is used, and if it matches a whole line removes it. Such a thing with #462 would allow you to achieve your goals. I guess it would also allow people to count without comments if they wanted, or to remove doc-comments and such too. |
Code Complexity:
Using C++ as an example:
"for ",
"for(",
"if ",
"if(",
"switch ",
"switch(",
"while ",
"while(",
"else ",
"|| ",
"&& ",
"!= ",
"== "
The addition of "< ", "> ", "<= ", ">= ", "| ", "& " are all valid logical operators in the C++ language. I can help add these additional operators for languages that I know (Python and C/C++). To me, these logical operators imply the same "complexity" we are trying to track.
The Complexity/Line calculation appears to include blanks and comments in the calculation. I would suspect that code complexity only compare Code vs. Blanks + Comments + Code. This artificially decreases that metric.
Lines and DRYness:
For languages like Java, C, and C++, the utilization of "{" and "}" is common as syntax "sugar" to define portions of code. Currently, a line containing only these constitutes as a Line in the total count. This artificially lowers the DRYness metric by inflating lines of code. I believe there is a strong argument to add an ignore list for lines containing only these symbols. These are counted as not unique when occurring multiple times.
Example:
This counts as 6 lines of code with 4 unique lines for DRYness of 66.66% whereas it could be represented equally as the following:
This counts as 2 lines of code with 2 unique lines for DRYness of 100.00%.
In addition, I have noticed the calculation uses all lines (eg. Blanks, Comments, and Code). My expectation would be ULOC to be comprised only of Code and DRYness to also only be Code or Code + Comments.
I know how to add the operators. The ignoring a list of symbols if a line comprises only of those is beyond me (not familiar with Go). Happy to help where I can.
The text was updated successfully, but these errors were encountered: