Count for python is wildly inaccurate #111

Open
olivren opened this issue Feb 22, 2019 · 3 comments


olivren commented Feb 22, 2019

I tried this tool for the very first time to count the lines of code in a Python project. The numbers it reports are shockingly inaccurate: the totals for all lines and for blank lines are correct, but it over-counts comment lines.

I investigated a bit and found a simple example for which loc reports 6 lines of comments and 0 lines of code:

'''
This is a module docstring
'''
a = 1
b = 2
c = 3

So loc correctly recognizes the start of a docstring delimited by three single quotes, but then ends up matching the whole file as part of it, instead of reporting 3 comment lines and 3 code lines.

Additional notes about Python comments

In Python, '''hello''' and """hello""" are ordinary string literals; they count as docstring comments only when they appear as the first statement of a module, class, or function body. A good heuristic for telling them apart is to count as comments only the triple-quoted string literals that start at the beginning of a line (ignoring leading whitespace).
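The heuristic above can be sketched as a small line classifier. This is a minimal, hypothetical sketch (count_lines is not loc's code or API), assuming that a triple-quoted string starting a line is a docstring. It also tracks strings opened mid-line (as in a = '''hello), which the bare heuristic needs so that the closing ''' of such a string is not mistaken for a docstring opener. It ignores string prefixes such as r or b, escaped quotes, and delimiters appearing inside other strings.

```python
TRIPLE_QUOTES = ('"""', "'''")


def count_lines(source: str) -> dict:
    """Classify each line as blank, comment, or code (heuristic sketch)."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    open_delim = None  # triple-quote delimiter of a string still open, or None
    open_kind = None   # "comment" (docstring) or "code" (string inside an expression)
    for line in source.splitlines():
        stripped = line.strip()
        if open_delim is not None:
            # Continuation of a multi-line string: it keeps the kind it started with.
            counts[open_kind] += 1
            if open_delim in stripped:
                open_delim = None
            continue
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        elif stripped.startswith(TRIPLE_QUOTES):
            # Heuristic: a triple-quoted string at the start of a line is a docstring.
            counts["comment"] += 1
            delim = stripped[:3]
            if delim not in stripped[3:]:
                open_delim, open_kind = delim, "comment"
        else:
            counts["code"] += 1
            # A triple-quoted string opened mid-line is code; an odd number of
            # delimiters on the line means it continues on the following lines.
            for delim in TRIPLE_QUOTES:
                if stripped.count(delim) % 2 == 1:
                    open_delim, open_kind = delim, "code"
                    break
    return counts
```

On the first example in this issue it returns 3 comment lines and 3 code lines; on the a = '''hello example it returns 3 code lines and no comments.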

Here is another example, for which loc counts 2 lines of comments and 1 line of code, although all three lines are code (a string assigned to a variable):

a = '''hello
world
'''

And another one, using double quotes, for which loc counts 6 lines of code (the docstring is not recognized at all):

"""
This is a module docstring
"""
a = 1
b = 2
c = 3
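Python's own rule (PEP 257) is stricter than the line-start heuristic: a string literal is a docstring only when it is the first statement of a module, class, or function body. As a minimal sketch, assuming Python 3.8+ (for end_lineno) and a file that parses, the standard ast module can report exactly which lines belong to docstrings. The function name docstring_lines is hypothetical and not part of loc or any other counter.

```python
from __future__ import annotations

import ast

# Node types whose first statement may be a docstring (per PEP 257).
DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_lines(source: str) -> set[int]:
    """Return the 1-based line numbers covered by real docstrings in `source`."""
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string-constant expression as the first statement.
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines


if __name__ == "__main__":
    example = "'''\nThis is a module docstring\n'''\na = 1\nb = 2\nc = 3\n"
    print(sorted(docstring_lines(example)))  # [1, 2, 3]
```

On the three examples above, this marks lines 1 to 3 as docstring in the first and third, and no lines in the second. The trade-off is that it misses bare string literals used as block comments later in a file, which the line-start heuristic would count.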

For what it's worth, tokei is no better: it ignores docstring comments entirely (a very poor choice, in my opinion).


boyter commented Feb 27, 2019

Not trying to hijack the conversation away from loc, but @olivren, did you try https://github.com/boyter/scc for comparison? I believe it handles all these cases as you would expect.

I ask because I keep an eye on all of the counters and try to add any such issues to scc's test suite to make it as accurate as possible.


olivren commented Mar 4, 2019

@boyter I just tried scc 2.2.0, and it does not handle docstrings at all. I opened an issue about that: boyter/scc#62

olivren closed this as completed Mar 4, 2019
olivren reopened this Mar 4, 2019

olivren commented Mar 4, 2019

Erratum: I previously said that Tokei ignores docstring comments (by which I meant that it counts them as code). That is indeed the default behavior, but Tokei has a configuration option that enables the correct behavior of counting all docstrings as comments (treat_doc_strings_as_comments = true in tokei.toml).
