Add option to filter out links #227

Open
Robbepop opened this issue May 14, 2018 · 5 comments
@Robbepop

Today I found out that tokei's reported lines of code for my repository seemed to explode.
The reason was that tokei (incorrectly) counted links (both symbolic links and hard links) as if they were regular files.

I think it would be best to make tokei ignore symbolic links completely and count a hard-linked file only once, no matter how many times it occurs.

While changing the defaults might be hard, it should also be okay to add options that filter links out of the accumulation process.
I think options like

  • --ignore-symbolic-links: Ignores any symbolic link.
  • --ignore-multiple-hard-links: Ignores all but one occurrence of a hard-linked file that has its source within the given search space (a repository, for example), and ignores all hard-linked files that have their source outside of the search space.
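
To illustrate what the first option would do, here is a minimal sketch using only the Rust standard library. The helper names `is_symlink` and `without_symlinks` are hypothetical and are not part of tokei; the sketch only shows how a symbolic link can be told apart from a regular file before it is counted.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if `path` itself is a symbolic link.
/// `symlink_metadata` inspects the link itself, whereas `fs::metadata`
/// would follow it and describe the target.
fn is_symlink(path: &Path) -> io::Result<bool> {
    Ok(fs::symlink_metadata(path)?.file_type().is_symlink())
}

/// Hypothetical helper: drops symbolic links from a list of candidate paths.
/// Paths whose metadata cannot be read are dropped as well.
fn without_symlinks(paths: Vec<PathBuf>) -> Vec<PathBuf> {
    paths
        .into_iter()
        .filter(|p| matches!(is_symlink(p), Ok(false)))
        .collect()
}
```

Hard links cannot be detected this way, because every hard link is a regular directory entry for the same file; that case needs a comparison of file identities instead.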
XAMPPRocky self-assigned this Jun 18, 2018
@jhpratt
Contributor

jhpratt commented Feb 13, 2020

@XAMPPRocky How difficult would this be to implement? Without having looked at tokei (aside from adding PostCSS), my initial thought would be to use a HashSet to store inodes as files are visited. I'd have to check on OS-interop for that, though.
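
A rough sketch of the HashSet idea, assuming a Unix target (it relies on `std::os::unix::fs::MetadataExt`, so the OS-interop question remains open for Windows). The names `file_id` and `dedup_hard_links` are made up for illustration and do not exist in tokei.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: provides dev() and ino()
use std::path::{Path, PathBuf};

/// Identity of the underlying file: (device id, inode number).
/// Inode numbers are only unique per device, so both are needed.
/// Note that `fs::metadata` follows symbolic links, so a symlink
/// resolves to the identity of its target.
fn file_id(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    Ok((meta.dev(), meta.ino()))
}

/// Hypothetical helper: keeps the first path seen for each underlying file.
fn dedup_hard_links(paths: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for path in paths {
        // `insert` returns false when the id was already present.
        if seen.insert(file_id(path)?) {
            unique.push(path.clone());
        }
    }
    Ok(unique)
}
```

Which of the hard-linked paths is kept depends purely on visiting order, so this handles "all but one occurrence" but not the "source outside the search space" part of the proposal.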

I'd be interested in doing this if it's not too hard to jump into.

@XAMPPRocky
Owner

XAMPPRocky commented Feb 13, 2020

@jhpratt Hey, I don't know about the second option, but the first option should be easy to add. Tokei's CLI is set up using clap in src/cli.rs and its configuration is set in src/config.rs. I would create this option much like the --no-ignore/no_ignore option that is already in the code.

To traverse the file system, tokei uses the ignore crate, whose WalkBuilder::follow_links option controls whether symbolic links are followed.

@jhpratt
Contributor

jhpratt commented Feb 14, 2020

Looks like there's the same-file crate which could be of use. A quick test shows it handles both hard and symbolic links correctly.

Looking through tokei's code, I presume src/utils/fs.rs is where a change would need to be made, given the presence of the get_all_files method. I'm not entirely sure what is happening there, though.

@XAMPPRocky
Owner

XAMPPRocky commented Feb 14, 2020

@jhpratt Yes, sorry, I should have pointed you to that. I believe that is what ignore uses for follow_links. I would add follow_links as part of the WalkBuilder configuration.

Everything before walker.build_parallel().run(…) is configuring the walker's behaviour (what paths to search, to ignore, etc). walker.build_parallel().run(…) runs in parallel over the paths and sends any file paths to rx (a crossbeam::Receiver), which are then validated as programming languages.

If you have any other questions please feel free to ask.

@jhpratt
Contributor

jhpratt commented Feb 14, 2020

Looking through the documentation, it looks like symlinks are currently ignored, and a quick check confirms this.

What would be the preferred way to handle hard links? If it were up to me, I'd lean towards automatically excluding anything past the first, as you'd essentially be counting the file twice.

Thanks for the explanation, by the way! Responsiveness is quite helpful 🙂


Edit: Turns out it's nearly trivial to exclude a file the second time around. Using DashMap instead of HashMap, because the walk runs in parallel, it amounts to just wrapping an if statement around the tx.send().
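
The shape of that change can be sketched with the standard library alone. This is not tokei's code: `Mutex<HashSet>` stands in for DashMap, `std::sync::mpsc` stands in for the crossbeam channel, and the function name `send_unique` and the plain `u64` file ids are assumptions made for the example.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

/// Each candidate is (file id, path). The id stands in for whatever
/// identifies the underlying file (for example a device/inode pair).
/// Every chunk is processed on its own thread, mimicking a parallel walk.
fn send_unique(chunks: Vec<Vec<(u64, String)>>) -> Vec<String> {
    let seen: Mutex<HashSet<u64>> = Mutex::new(HashSet::new());
    let (tx, rx) = mpsc::channel();

    thread::scope(|s| {
        for chunk in chunks {
            let tx = tx.clone();
            let seen = &seen;
            s.spawn(move || {
                for (id, path) in chunk {
                    // The guard around the send: only the first thread to
                    // insert a given id forwards the path.
                    if seen.lock().unwrap().insert(id) {
                        tx.send(path).unwrap();
                    }
                }
            });
        }
    });
    // Drop the original sender so the receiver's iterator terminates.
    drop(tx);

    let mut paths: Vec<String> = rx.into_iter().collect();
    paths.sort();
    paths
}
```

When two threads race on the same id, whichever inserts first wins, so the surviving path for a hard-linked file is not deterministic across runs, although the number of files counted is.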
