If the language stats bar is reporting a language that you don't expect:
- Click on the name of the language in the stats bar to see a list of the files that are identified as that language. Keep in mind that this performs a search, so code search restrictions may prevent some files counted in the language statistics from appearing in the search results. Installing Linguist locally and running it from the command line will give you accurate results.
- If you see files that you didn't write in the search results, consider moving the files into one of the paths for vendored code, or use the manual overrides feature to ignore them.
- If the files are misclassified, search for open issues to see if anyone else has already reported the issue. Any information you can add, especially links to public repositories, is helpful. You can also use the manual overrides feature to correctly classify them in your repository.
- If there are no reported issues of this misclassification, open an issue and include a link to the repository or a sample of the code that is being misclassified.
Keep in mind that the repository language stats are only updated when you push changes, and the results are cached for the lifetime of your repository. If you have not made any changes to your repository in a while, you may find pushing another change will correct the stats.
When I click on a language present in the language stats bar, I get a message saying "Your search did not match any code"
There are a few reasons that could lead to this outcome:
- If the repository implements an override via the `linguist-language` attribute, it won't be taken into account in GitHub's search results, since GitHub Search relies on go-enry, which doesn't support overrides at the moment (More info).
- go-enry might not be using the latest version of Linguist, which means that files could be detected differently in search compared to Linguist (see also the note at the end of this section).
- It could be that files are associated with a language that is part of a group. This means that they are counted as the parent language in the language stats bar, but as the actual language in search. For instance, a file ending with `.f90` is considered to be "Fortran" in the stats bar, but "Fortran Free Form" in search.
- Finally, this can be caused by code search limitations that are unrelated to Linguist.
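The language-group case described above can be sketched as a simple lookup. This is a minimal illustration, not Linguist's or go-enry's implementation: the `GROUPS` table and both function names are hypothetical, and only the Fortran entry comes from the example in the text.

```python
# Hypothetical table: detected language -> parent group (illustrative only).
GROUPS = {"Fortran Free Form": "Fortran"}


def stats_bar_language(detected: str) -> str:
    """Language counted in the stats bar: the parent group, if there is one."""
    return GROUPS.get(detected, detected)


def search_language(detected: str) -> str:
    """Language used by search: the detected language itself."""
    return detected


detected = "Fortran Free Form"  # e.g. a file ending with .f90
print(stats_bar_language(detected))
print(search_language(detected))
```

Because the two lookups disagree for grouped languages, clicking "Fortran" in the stats bar can return no results even though the files were counted.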
Correctly detecting the language for the C-family `.h` header files is tough. Linguist detects the languages of files in isolation when analysing repositories, and these header files, especially the smaller ones, can be used across C, C++, and Objective-C without containing any language-specific content.
To try and reduce the number of false positives and keep some degree of predictability for users, Linguist will assume all `.h` header files are C by default and will only identify a file as C++ or Objective-C if the content matches the specific heuristic for that language.
This means that header files which do not contain language-specific content will be classified as C. If you want such files to be classified as C++ or Objective-C, you will need to implement an override for them.
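The default-to-C strategy described above can be sketched as follows. The regular expressions here are deliberately simplified, hypothetical stand-ins chosen for illustration; they are not Linguist's actual heuristics, and `classify_header` is not a Linguist function.

```python
import re

# Illustrative patterns only; the real heuristics differ from these.
OBJC_HINTS = re.compile(r"^\s*(?:@interface|@protocol|@end|#import\s)", re.MULTILINE)
CPP_HINTS = re.compile(
    r"^\s*(?:template\s*<|namespace\s+\w+|class\s+\w+)", re.MULTILINE
)


def classify_header(content: str) -> str:
    """Classify a .h file in isolation, falling back to C when nothing matches."""
    if OBJC_HINTS.search(content):
        return "Objective-C"
    if CPP_HINTS.search(content):
        return "C++"
    return "C"  # default when there is no language-specific content


plain = "#pragma once\nint add(int a, int b);\n"
cpp = "namespace math {\nint add(int a, int b);\n}\n"
objc = "@interface Adder : NSObject\n@end\n"

for sample in (plain, cpp, objc):
    print(classify_header(sample))
```

The `plain` sample is valid in all three languages, which is why a file like it falls through to the C default and needs an override to be counted as anything else.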
Linguist does not consider vendored code, generated code, documentation, or `data` (e.g. SQL) or `prose` (e.g. Markdown) languages (as defined by the `type` attribute in `languages.yml`) when calculating the repository language statistics.
If the language statistics bar is not showing your language at all, it could be for a few reasons:
- Linguist doesn't know about your language.
- The extension you have chosen is not associated with your language in `languages.yml`.
- All the files in your repository fall into one of the categories listed above that Linguist excludes by default.
If Linguist doesn't know about the language or the extension you're using, consider contributing to Linguist by opening a pull request to add support for your language or extension. For everything else, you can use the manual overrides feature to tell Linguist to include your files in the language statistics.
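To make the exclusion rules above concrete, here is a minimal sketch of how such filtering could affect the percentages. It assumes statistics are weighted by file size in bytes; the `LANGUAGE_TYPES` table is a tiny hypothetical subset standing in for `languages.yml`, and `language_stats` is an illustrative name, not part of Linguist.

```python
from collections import Counter

# Hypothetical subset of language types; the real values live in languages.yml.
LANGUAGE_TYPES = {
    "Ruby": "programming",
    "HTML": "markup",
    "SQL": "data",
    "Markdown": "prose",
}
EXCLUDED_TYPES = {"data", "prose"}


def language_stats(files):
    """Return {language: percentage} for (language, size_in_bytes, excluded_path) tuples.

    excluded_path is True for vendored, generated, or documentation files.
    A language of None models a file whose language is unknown.
    """
    totals = Counter()
    for language, size, excluded_path in files:
        if language is None or excluded_path:
            continue
        if LANGUAGE_TYPES.get(language) in EXCLUDED_TYPES:
            continue
        totals[language] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing left to show in the stats bar
    return {lang: round(100 * size / grand_total, 1) for lang, size in totals.items()}


files = [
    ("Ruby", 300, False),
    ("SQL", 500, False),       # data language: skipped
    ("Markdown", 200, False),  # prose language: skipped
    ("Ruby", 1000, True),      # vendored copy: skipped
    ("HTML", 100, False),
    (None, 400, False),        # unknown language: skipped
]
print(language_stats(files))
```

A repository containing only SQL and Markdown would produce an empty result in this sketch, which corresponds to the stats bar not showing the language at all.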
Linguist detects the language of a file but the actual syntax-highlighting is powered by a set of language grammars which are included in this project as a set of submodules as listed here.
If you experience an issue with the syntax-highlighting on GitHub, please report the issue to the upstream grammar repository, not here. Grammars are updated every time we build the Linguist gem so upstream bug fixes are automatically incorporated as they are fixed.
Linguist only works on Git repositories and individual files. Its primary use is on GitHub.com, which uses bare repositories in which individual files don't appear on the filesystem, so changes need to be committed before Linguist can see them.
As a workaround, you could initialise a temporary Git repository in your directory as demonstrated in this script.
Alternatively, you can run Linguist on individual files; see here.
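The temporary-repository workaround can be sketched roughly as below. This is a variation rather than the referenced script: it copies the directory to a temporary location instead of initialising the repository in place, so the original directory is left untouched. It assumes `git` and the `github-linguist` executable are on your `PATH`; the function names and the placeholder commit identity are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def temp_repo_commands():
    """Commands to run, in order, inside the temporary copy of the directory."""
    return [
        ["git", "init", "--quiet"],
        ["git", "add", "--all"],
        # Placeholder identity so the commit works without global Git config.
        ["git", "-c", "user.name=linguist-tmp",
         "-c", "user.email=linguist-tmp@example.invalid",
         "commit", "--quiet", "--message", "Temporary commit for Linguist"],
        ["github-linguist"],
    ]


def run_linguist_on_directory(directory):
    """Copy `directory` into a throwaway Git repository and return Linguist's output."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "repo"
        shutil.copytree(directory, work, ignore=shutil.ignore_patterns(".git"))
        output = ""
        for command in temp_repo_commands():
            result = subprocess.run(
                command, cwd=work, check=True, capture_output=True, text=True
            )
            output = result.stdout  # the last command is github-linguist
        return output
```

Calling `print(run_linguist_on_directory("path/to/project"))` would then show the language breakdown for that directory, with the temporary repository removed automatically afterwards.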
There are several known issues with the version of Ruby shipped with macOS that cause problems when installing the charlock-holmes gem, which is a Linguist dependency.
As this is a problem with the Ruby shipped by Apple and not Linguist or charlock-holmes, we recommend you install a version of Ruby using Homebrew, `rbenv`, `rvm`, `ruby-build`, `asdf` or other packaging system, before attempting to install Linguist.
Changes to Linguist will only appear on GitHub when a new Linguist release is made and deployed to GitHub.com. There is no set release timeframe, but the intention is to release at least once every three to four months so that the latest version ships with each new major version of GitHub Enterprise Server. A PR like this will be created for the release with checkmarks for each stage of the release process.
All syntax highlighting grammars will also be updated in all major and minor releases. Grammars will only be updated in patch releases if the patch release is specifically for that language and it requires a grammar update to address the issue.
Note: New languages will not appear in GitHub's search results for some time after the pull request has been merged and the new Linguist release deployed to GitHub.com. This is because GitHub's search uses go-enry for language detection, which tends to lag behind Linguist by a few weeks to months. This in turn requires an update to the underlying search code once go-enry is in line with Linguist.