Feature Request: Ignore unknown extensions when --language is specified #2159

Open
eriq-augustine opened this issue Jan 23, 2025 · 3 comments
Labels
enhancement: Issue/PR that involves features, improvements and other changes
language: PR / Issue deals (partly) with new and/or existing languages for JPlag
minor: Minor issue/feature/contribution/change

Comments

@eriq-augustine

Since mixing languages is not allowed, it would be nice if extensions were ignored (and language assumed) when the --language flag is present.

This would allow any lesser known extensions to pass through without issue.
For example, the C++ parser currently recognizes .cxx but not .hxx:
https://github.com/jplag/JPlag/blob/main/languages/cpp/src/main/java/de/jplag/cpp/CPPLanguage.java#L21

A more common use case that I run into is that I want to just run a text-based analysis (with the text language), but very few extensions are supported:
https://github.com/jplag/JPlag/blob/main/languages/text/src/main/java/de/jplag/text/NaturalLanguage.java#L29

@tsaglam tsaglam added enhancement Issue/PR that involves features, improvements and other changes minor Minor issue/feature/contribution/change language PR / Issue deals (partly) with new and/or existing languages for JPlag labels Jan 24, 2025
@tsaglam
Member

tsaglam commented Jan 24, 2025

We will take a look at these cases. As a workaround, you can try to use -p to specify the file extensions:

-p, --suffixes=<suffixes>[,<suffixes>...]
    comma-separated list of all filename suffixes that are included.

@eriq-augustine
Author

That should definitely work as a workaround.

I'm wondering if checking for a "valid suffix" is even necessary.
Since specifying a language is required (when unspecified it implicitly sets the language to java), checking extensions seems a bit redundant.
But, I think the extensions could be useful to guess the language.

Of course this is outside the scope of this issue, but does a system that guesses the language using a file's extension when --language is not supplied make sense for this project?
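
To make the idea of guessing the language from file extensions concrete, here is a minimal sketch. It is not JPlag code: the class name, the suffix table, and the majority-vote rule are all assumptions made for illustration, and a real implementation would presumably take the suffixes from each language module.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LanguageGuessSketch {
    // Hypothetical suffix table for illustration only; not JPlag's actual lists.
    static final Map<String, String> LANGUAGE_BY_SUFFIX = Map.of(
            ".java", "java",
            ".cpp", "cpp",
            ".cxx", "cpp",
            ".txt", "text");

    /** Guesses a language by counting how many files map to each language. */
    static Optional<String> guessLanguage(List<String> fileNames) {
        Map<String, Integer> votes = new HashMap<>();
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                continue; // no extension (e.g. "Makefile"), so no vote
            }
            String language = LANGUAGE_BY_SUFFIX.get(name.substring(dot));
            if (language != null) {
                votes.merge(language, 1, Integer::sum);
            }
        }
        // Empty when no file has a known suffix; ties are broken arbitrarily.
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(guessLanguage(List.of("A.java", "B.java", "notes.txt", "Makefile")));
        System.out.println(guessLanguage(List.of("Makefile")));
    }
}
```

A guess like this would only be a fallback; an explicit --language would still take precedence.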

@tsaglam
Member

tsaglam commented Jan 24, 2025

> I'm wondering if checking for a "valid suffix" is even necessary. Since specifying a language is required (when unspecified it implicitly sets the language to java), checking extensions seems a bit redundant. But, I think the extensions could be useful to guess the language.

The problem is that projects may contain many different file types, such as build scripts, resources, and configuration files. The valid suffixes decide which files are passed to the parser; without that check, the user would receive many warnings and errors from the parser for obviously incompatible files.
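
The role of the suffix check can be sketched as a simple filter over a submission's files. This is an illustrative sketch, not JPlag's implementation: the class and method names are hypothetical, and the matching is assumed to be a plain case-sensitive `endsWith`. The second call shows the effect of passing an extended suffix list, as the -p option allows.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SuffixFilterSketch {
    /** Keeps only the file names that end with one of the given suffixes. */
    static List<String> selectForParser(List<String> fileNames, List<String> suffixes) {
        return fileNames.stream()
                .filter(name -> suffixes.stream().anyMatch(name::endsWith))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A made-up submission mixing source files with build and config files.
        List<String> submission = List.of("main.cxx", "util.hxx", "Makefile", "config.json");

        // Only .cxx is accepted, so util.hxx is skipped along with the non-source files.
        System.out.println(selectForParser(submission, List.of(".cxx")));

        // With an extended suffix list, util.hxx is passed to the parser as well.
        System.out.println(selectForParser(submission, List.of(".cxx", ".hxx")));
    }
}
```

Without such a filter, Makefile and config.json would also reach the C++ parser, which is the source of the warnings described above.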

> Of course this is outside the scope of this issue, but does a system that guesses the language using a file's extension when --language is not supplied make sense for this project?

Currently, the default language is Java, which is used when no language is specified (solely due to historical reasons).
However, we have plans for multi-language support, which would entail that the default language parses all files that conform to the set of languages that JPlag supports.
