Add filesize metrics to github-linguist output (github-linguist#5464)
* Adding detailed-stats option to git-linguist results

When generating Linguist statistics via the command line, the only
information currently returned is a hash of each language and the
percentage of the repository it takes up. Sometimes it is also important
to know the file size of each language.

This adds a new option to the command-line interface called
`detailed-stats`, which returns a JSON representation of the
repository broken down by language, giving _both_ the percentage of the
repository that language comprises and the total file size in bytes
for that language. This gives consumers more data to build on in
their own tooling.
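As a rough illustration of how a consumer might use this data, the Ruby sketch below parses output shaped like the `--json` example this commit adds to the README: each language name maps to `size` in bytes and `percentage` as a string. The sample string is copied from that README example rather than produced by running Linguist, and `language_rows` is a hypothetical helper, not part of Linguist.

```ruby
require 'json'

# Sample output in the shape printed by `github-linguist --json`
# (values copied from the README example in this commit).
# In a real script the text might instead come from the CLI, e.g.
# IO.popen(["github-linguist", "--json"], &:read), assuming the
# executable is on PATH.
SAMPLE = '{"Dockerfile":{"size":1212,"percentage":"0.31"},' \
         '"Ruby":{"size":264519,"percentage":"66.84"},' \
         '"C":{"size":97685,"percentage":"24.68"}}'

# Parse the JSON and return [language, bytes, percent] rows, largest first.
# "percentage" is emitted as a string, so it is converted with to_f.
def language_rows(json_text)
  JSON.parse(json_text)
      .map { |lang, stats| [lang, stats["size"], stats["percentage"].to_f] }
      .sort_by { |_, size, _| -size }
end

language_rows(SAMPLE).each do |lang, size, pct|
  puts format("%-10s %8d bytes %6.2f%%", lang, size, pct)
end
```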

* Removing change to git-linguist based on feedback

* Moving the functionality into a method for processing

* Moving functionality to use OptionParser

Previously, you could only choose one of: a summary of the Linguist
language percentages, a breakdown including the summary and a list of
files per language, _or_ JSON output of the files per language. The
limitation is that you cannot get all of the information in both formats.

By moving to OptionParser, we can more easily separate the data the
user wants (summary or breakdown) from the format they want it returned
in (CLI-friendly output or JSON).
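A minimal sketch of that separation using Ruby's standard `OptionParser`: one flag selects the data (`--breakdown`), another selects the output format (`--json`), and any leftover argument is treated as the path. The flag names follow the ones this commit adds to `bin/github-linguist`; the `Options` struct and `parse_options` method are hypothetical names used only for illustration.

```ruby
require 'optparse'

# Two independent settings: which data to collect (breakdown or not)
# and which format to print it in (JSON or not), plus an optional path.
Options = Struct.new(:breakdown, :json, :path)

def parse_options(argv)
  opts = Options.new(false, false, Dir.pwd)
  args = argv.dup # parse! removes the flags it consumes, so work on a copy

  parser = OptionParser.new do |o|
    o.banner = "Usage: github-linguist [<path>] [--breakdown] [--json]"
    o.on("-b", "--breakdown", "Include the per-file breakdown") { opts.breakdown = true }
    o.on("-j", "--json", "Output results as JSON") { opts.json = true }
  end
  parser.parse!(args)

  # Whatever is left after flag parsing is the positional path argument.
  opts.path = args.first unless args.empty?
  opts
end

p parse_options(%w[--breakdown --json])
p parse_options(%w[-j some/repo])
```

Because `parse!` permutes arguments by default, the path may appear before or after the flags.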

* For full repository, separating data and format concerns

Implementing the separation of which data to collect from which format
to output it in when processing a full repository.
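Once the data is collected into a single hash, choosing the output format becomes a separate, final step. This sketch renders the same per-language results either as the aligned text columns used by the updated script (`"%-7s %-10s %s"`) or as JSON. `render` is a hypothetical helper, and the sample figures are taken from the README example in this commit.

```ruby
require 'json'

# Render one results hash (language => { size:, percentage: }) either as
# text columns, largest language first, or as a single JSON object.
def render(results, json:)
  return JSON.dump(results) if json

  rows = results.sort_by { |_, v| -v[:size] }.map do |language, v|
    format("%-7s %-10s %s", "#{v[:percentage]}%", v[:size], language)
  end
  rows.join("\n")
end

results = {
  "Ruby" => { size: 264519, percentage: "66.84" },
  "Lex"  => { size: 5098,   percentage: "1.29" }
}
puts render(results, json: false)
puts render(results, json: true)
```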

* Adding the size of files for each language to the output
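The per-language size and percentage can be derived from a hash of language name to byte count, which is the shape `repo.languages` has in the script. This sketch assumes the total is the sum of those byte counts (the script divides by `repo.size`) and formats the percentage to two decimals as a string, matching the `sprintf '%.2f'` call in the diff. `language_stats` is a hypothetical name.

```ruby
# Build language => { size:, percentage: } from language => byte count.
# Assumption: the total is the sum of all per-language byte counts.
def language_stats(languages)
  total = languages.values.sum.to_f
  languages.to_h do |language, size|
    [language, { size: size, percentage: format("%.2f", size / total * 100) }]
  end
end

p language_stats({ "Ruby" => 300, "C" => 100 })
```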

* Renaming the JSON variable to make its intent clearer

* Updating readme to match command line functionality

* Update bin/github-linguist

Fixing indentation as suggested in review

Co-authored-by: Colin Seymour <[email protected]>

* Update bin/github-linguist

Reusing HELP_TEXT when aborting due to an issue

Co-authored-by: Colin Seymour <[email protected]>

* Updating help text for new OptionParser options

* README updates based on recommendation

Co-authored-by: Colin Seymour <[email protected]>
walterg2 and lildude authored Jul 19, 2021
1 parent 709cf15 commit b980593
Showing 2 changed files with 165 additions and 102 deletions.
70 changes: 52 additions & 18 deletions README.md
@@ -73,23 +73,40 @@ project.languages #=> { "Ruby" => 119387 }
 #### Git Repository
 
 A repository's languages stats can also be assessed from the command line using the `github-linguist` executable.
-Without any options, `github-linguist` will output the breakdown that correlates to what is shown in the language stats bar.
-The `--breakdown` flag will additionally show the breakdown of files by language.
+Without any options, `github-linguist` will output the language breakdown by percentage and file size.
 
 ```bash
-cd /path-to-repository/
+cd /path-to-repository
 github-linguist
 ```
 
+You can try running `github-linguist` on the root directory in this repository itself:
+
+```console
+$ github-linguist
+66.84% 264519 Ruby
+24.68% 97685 C
+6.57% 25999 Go
+1.29% 5098 Lex
+0.32% 1257 Shell
+0.31% 1212 Dockerfile
+```
+
+#### Additional options
+
+##### `--breakdown`
+The `--breakdown` or `-b` flag will additionally show the breakdown of files by language.
+
 You can try running `github-linguist` on the root directory in this repository itself:
 
 ```console
 $ github-linguist --breakdown
-68.57% Ruby
-22.90% C
-6.93% Go
-1.21% Lex
-0.39% Shell
+66.84% 264519 Ruby
+24.68% 97685 C
+6.57% 25999 Go
+1.29% 5098 Lex
+0.32% 1257 Shell
+0.31% 1212 Dockerfile
 
 Ruby:
 Gemfile
@@ -102,6 +119,21 @@ lib/linguist.rb
 ```
 
+##### `--json`
+The `--json` or `-j` flag output the data into JSON format.
+
+```console
+$ github-linguist --json
+{"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}
+```
+
+This option can be used in conjunction with `--breakdown` to get a full list of files along with the size and percentage data.
+```console
+$ github-linguist --breakdown --json
+{"Dockerfile":{"size":1212,"percentage":"0.31","files":["Dockerfile","tools/grammars/Dockerfile"]},"Ruby":{"size":264519,"percentage":"66.84","files":["Gemfile","Rakefile","bin/git-linguist","bin/github-linguist","ext/linguist/extconf.rb","github-linguist.gemspec","lib/linguist.rb",...]}}
+
+```
+
 #### Single file
 
 Alternatively you can find stats for a single file using the `github-linguist` executable.
@@ -123,17 +155,19 @@ If you have Docker installed you can build an image and run Linguist within a co
 ```console
 $ docker build -t linguist .
 $ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist
-68.57% Ruby
-22.90% C
-6.93% Go
-1.21% Lex
-0.39% Shell
+66.84% 264519 Ruby
+24.68% 97685 C
+6.57% 25999 Go
+1.29% 5098 Lex
+0.32% 1257 Shell
+0.31% 1212 Dockerfile
 $ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist github-linguist --breakdown
-68.57% Ruby
-22.90% C
-6.93% Go
-1.21% Lex
-0.39% Shell
+66.84% 264519 Ruby
+24.68% 97685 C
+6.57% 25999 Go
+1.29% 5098 Lex
+0.32% 1257 Shell
+0.31% 1212 Dockerfile
 
 Ruby:
 Gemfile
197 changes: 113 additions & 84 deletions bin/github-linguist
@@ -8,105 +8,134 @@ require 'json'
 require 'optparse'
 require 'pathname'
 
-path = ARGV[0] || Dir.pwd
+HELP_TEXT = <<~HELP
+Linguist v#{Linguist::VERSION}
+Detect language type and determine language breakdown for a given Git repository.
-# special case if not given a directory
-# but still given the --breakdown or --json options/
-if path == "--breakdown"
-path = Dir.pwd
-breakdown = true
-elsif path == "--json"
-path = Dir.pwd
-json_breakdown = true
-end
+Usage: linguist <path>
+linguist <path> [--breakdown] [--json]
+linguist [--breakdown] [--json]
+HELP
 
-ARGV.shift
-breakdown = true if ARGV[0] == "--breakdown"
-json_breakdown = true if ARGV[0] == "--json"
+def github_linguist(args)
+breakdown = false
+json_output = false
+path = Dir.pwd
 
-if File.directory?(path)
-rugged = Rugged::Repository.new(path)
-repo = Linguist::Repository.new(rugged, rugged.head.target_id)
+parser = OptionParser.new do |opts|
+opts.banner = HELP_TEXT
 
-if !json_breakdown
-repo.languages.sort_by { |_, size| size }.reverse.each do |language, size|
-percentage = ((size / repo.size.to_f) * 100)
-percentage = sprintf '%.2f' % percentage
-puts "%-7s %s" % ["#{percentage}%", language]
+opts.on("-b", "--breakdown", "Analyze entire repository and display detailed usage statistics") { breakdown = true }
+opts.on("-j", "--json", "Output results as JSON") { json_output = true }
+opts.on("-h", "--help", "Display a short usage summary, then exit") do
+puts opts
+exit
 end
 end
-if breakdown
-puts
-file_breakdown = repo.breakdown_by_file
-file_breakdown.each do |lang, files|
-puts "#{lang}:"
-files.each do |file|
-puts file
-end
-puts
+
+parser.parse!(args)
+
+if !args.empty?
+if File.directory?(args[0]) || File.file?(args[0])
+path = args[0]
+else
+abort HELP_TEXT
 end
-elsif json_breakdown
-puts JSON.dump(repo.breakdown_by_file)
 end
-elsif File.file?(path)
 
-begin
-# Check if this file is inside a git repository so we have things like
-# `.gitattributes` applied.
-file_full_path = File.realpath(path)
-rugged = Rugged::Repository.discover(file_full_path)
-file_rel_path = file_full_path.sub(rugged.workdir, '')
-oid = -> { rugged.head.target.tree.walk_blobs { |r, b| return b[:oid] if r + b[:name] == file_rel_path } }
-blob = Linguist::LazyBlob.new(rugged, oid.call, file_rel_path)
-rescue Rugged::RepositoryError
-blob = Linguist::FileBlob.new(path, Dir.pwd)
-end
-
-type = if blob.text?
-'Text'
-elsif blob.image?
-'Image'
-else
-'Binary'
-end
+if File.directory?(path)
+rugged = Rugged::Repository.new(path)
+repo = Linguist::Repository.new(rugged, rugged.head.target_id)
 
-if json_breakdown
-puts JSON.generate( { path => {
-:lines => blob.loc,
-:sloc => blob.sloc,
-:type => type,
-:mime_type => blob.mime_type,
-:language => blob.language,
-:large => blob.large?,
-:generated => blob.generated?,
-:vendored => blob.vendored?,
-}
-} )
-else
-puts "#{path}: #{blob.loc} lines (#{blob.sloc} sloc)"
-puts " type: #{type}"
-puts " mime type: #{blob.mime_type}"
-puts " language: #{blob.language}"
+full_results = {}
+repo.languages.each do |language, size|
+percentage = ((size / repo.size.to_f) * 100)
+percentage = sprintf '%.2f' % percentage
+full_results.merge!({"#{language}": { size: size, percentage: percentage } })
+end
 
+if !json_output
+full_results.sort_by { |_, v| v[:size] }.reverse.each do |language, details|
+puts "%-7s %-10s %s" % ["#{details[:percentage]}%", details[:size], language]
+end
+if breakdown
+puts
+file_breakdown = repo.breakdown_by_file
+file_breakdown.each do |lang, files|
+puts "#{lang}:"
+files.each do |file|
+puts file
+end
+puts
+end
+end
+else
+if !breakdown
+puts JSON.dump(full_results)
+else
+combined_results = full_results.merge({})
+
-if blob.large?
-puts " blob is too large to be shown"
+repo.breakdown_by_file.each do |language, files|
+combined_results[language.to_sym].update({"files": files})
 end
+puts JSON.dump(combined_results)
+end
+end
+elsif File.file?(path)
 
-if blob.generated?
-puts " appears to be generated source code"
+begin
+# Check if this file is inside a git repository so we have things like
+# `.gitattributes` applied.
+file_full_path = File.realpath(path)
+rugged = Rugged::Repository.discover(file_full_path)
+file_rel_path = file_full_path.sub(rugged.workdir, '')
+oid = -> { rugged.head.target.tree.walk_blobs { |r, b| return b[:oid] if r + b[:name] == file_rel_path } }
+blob = Linguist::LazyBlob.new(rugged, oid.call, file_rel_path)
+rescue Rugged::RepositoryError
+blob = Linguist::FileBlob.new(path, Dir.pwd)
 end
 
-if blob.vendored?
-puts " appears to be a vendored file"
+type = if blob.text?
+'Text'
+elsif blob.image?
+'Image'
+else
+'Binary'
 end
+
+if json_output
+puts JSON.generate( { path => {
+:lines => blob.loc,
+:sloc => blob.sloc,
+:type => type,
+:mime_type => blob.mime_type,
+:language => blob.language,
+:large => blob.large?,
+:generated => blob.generated?,
+:vendored => blob.vendored?,
+}
+} )
+else
+puts "#{path}: #{blob.loc} lines (#{blob.sloc} sloc)"
+puts " type: #{type}"
+puts " mime type: #{blob.mime_type}"
+puts " language: #{blob.language}"
+
+if blob.large?
+puts " blob is too large to be shown"
+end
+
+if blob.generated?
+puts " appears to be generated source code"
+end
+
+if blob.vendored?
+puts " appears to be a vendored file"
+end
+end
+else
+abort HELP_TEXT
 end
-else
-abort <<-HELP
-Linguist v#{Linguist::VERSION}
-Detect language type for a file, or, given a repository, determine language breakdown.
-Usage: linguist <path>
-linguist <path> [--breakdown] [--json]
-linguist [--breakdown] [--json]
-HELP
 end
+
+github_linguist(ARGV)
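For `--breakdown --json`, the script attaches each language's file list to its summary entry. It does this via `full_results.merge({})`, which is a shallow copy, so the `update` calls also modify the nested hashes inside `full_results` (harmless there, since `full_results` is not used again). The sketch below shows one non-mutating way to build the same shape. `combine` is a hypothetical helper, and giving a language with no file list an empty array is an assumption, not the script's behavior.

```ruby
require 'json'

# Attach each language's file list to its summary entry, producing the
# shape printed by `--breakdown --json`. New nested hashes are built, so
# `summary` itself is left unchanged.
# Summary keys are symbols (as in the script); file-list keys are strings.
def combine(summary, files_by_language)
  summary.to_h do |language, stats|
    files = files_by_language.fetch(language.to_s, [])
    [language, stats.merge(files: files)]
  end
end

summary = { Ruby: { size: 264519, percentage: "66.84" } }
files   = { "Ruby" => ["Gemfile", "Rakefile"] }
puts JSON.dump(combine(summary, files))
```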
