Feat request: multiple programming languages #1546

euberdeveloper · 2024-02-07T12:20:14Z

As of now it seems that JPlag supports multiple programming languages, but only in a homogeneous way.

This means that I can compare two submissions that are both in Java or both in Python, but not one in Java and one in Python.

This may not seem to make sense, but translating a program from one language to another could actually be a form of obfuscation.

Maybe Java and Python are not the perfect example, but if we consider languages such as Java, Kotlin, or Scala, which all run on the JVM, this issue becomes more relevant.

tsaglam · 2024-02-07T15:38:18Z

Good point, this relates to cross-language plagiarism detection. While there has been some research in that area, there are (to my knowledge) no usable tools for that. In the future, we may want to introduce it by creating a shared set of token types for concepts common to multiple languages. Language modules could then reuse these token types, allowing for cross-language support.
On a similar note, we may consider polyglot support, meaning parsing multi-language submissions by delegating the different files to different language modules.
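
A minimal sketch of the polyglot idea, assuming files are routed to language modules by file extension. `LanguageModule`, `SimpleModule`, and `PolyglotDispatcher` are hypothetical names for illustration, not JPlag's actual API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a language module: a name plus the file extensions it accepts.
interface LanguageModule {
    String name();
    List<String> extensions();
}

final class SimpleModule implements LanguageModule {
    private final String name;
    private final List<String> extensions;

    SimpleModule(String name, List<String> extensions) {
        this.name = name;
        this.extensions = extensions;
    }

    public String name() { return name; }
    public List<String> extensions() { return extensions; }
}

// Hypothetical dispatcher: delegates each file of a submission to one module.
final class PolyglotDispatcher {
    private final List<LanguageModule> modules;

    PolyglotDispatcher(List<LanguageModule> modules) {
        this.modules = modules;
    }

    /** Groups file names by the first module whose extension matches; unmatched files are skipped. */
    Map<String, List<String>> dispatch(List<String> fileNames) {
        Map<String, List<String>> byModule = new LinkedHashMap<>();
        for (String file : fileNames) {
            for (LanguageModule module : modules) {
                if (module.extensions().stream().anyMatch(file::endsWith)) {
                    byModule.computeIfAbsent(module.name(), k -> new ArrayList<>()).add(file);
                    break;
                }
            }
        }
        return byModule;
    }
}
```

Each group could then be tokenized by its own module before the token sequences are combined. How to combine them is left open here.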

euberdeveloper · 2024-07-13T11:14:26Z

Hello, this has been done in this fork: https://github.com/euberdeveloper/JPlag/tree/feature/multilanguage-plagiarism-detection

A pull request will follow in the future.

tsaglam · 2024-07-15T11:03:46Z

We have our own ideas for that, but we are happy to look at yours. Keep in mind that these might be major changes that need to take other upcoming changes and API considerations into account, and that must not break existing features (e.g. token sequence normalization or match merging).

euberdeveloper · 2024-07-16T14:38:15Z

I think what I've done is more of a proof of concept.
The pros so far are:

- The code shows examples of the changes needed to accept a set of languages as input instead of a single one
- Each language interface gains a method "supportCrossPlagiarism" to specify whether that language supports the feature
- Each language that supports the feature has an additional parser that produces general tokens
- The code shows that no major changes are needed on the report side
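
The capability flag from the list above could look roughly like this. Only the method name "supportCrossPlagiarism" comes from this comment; the `Language` interface shown here, the example languages, and the `LanguageSetValidator` helper are simplified assumptions and do not mirror the fork's code:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified language interface with an opt-in capability flag.
interface Language {
    String getName();

    // Languages opt in explicitly; the default keeps existing modules unchanged.
    default boolean supportCrossPlagiarism() {
        return false;
    }
}

final class JavaLanguage implements Language {
    public String getName() { return "java"; }

    @Override
    public boolean supportCrossPlagiarism() { return true; }
}

// Example of a module that has not been extended for cross-language runs.
final class LegacyLanguage implements Language {
    public String getName() { return "legacy"; }
}

final class LanguageSetValidator {
    /** Returns the names of selected languages that cannot take part in a cross-language run. */
    static List<String> unsupported(List<Language> selected) {
        if (selected.size() < 2) {
            return List.of(); // a single-language run needs no cross-language support
        }
        return selected.stream()
                .filter(language -> !language.supportCrossPlagiarism())
                .map(Language::getName)
                .collect(Collectors.toList());
    }
}
```

A check like this could reject a run early when more than one language is passed and one of them lacks a general-token parser.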

To speed up the process, I made the single-language front ends first produce their default language-specific tokens, and then I wrote a converter that maps those tokens to general ones. I don't recommend this approach: the results are not good, and many issues could be fixed by obtaining language-agnostic tokens directly by parsing the source code from scratch. I will implement this improvement soon.
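
To illustrate the converter shortcut and why it loses information, here is a rough sketch. The token names in `JavaSpecificToken` and `GeneralToken` are invented for this example and are not the token types used in JPlag or in the fork:

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical language-specific and general token types.
enum JavaSpecificToken { CLASS_BEGIN, METHOD_BEGIN, VAR_DEF, ASSIGN, IF_BEGIN, LAMBDA }

enum GeneralToken { TYPE_DEF, FUNCTION_DEF, VARIABLE_DEF, ASSIGNMENT, BRANCH }

final class JavaToGeneralConverter {
    private static final Map<JavaSpecificToken, GeneralToken> MAPPING =
            new EnumMap<>(JavaSpecificToken.class);

    static {
        MAPPING.put(JavaSpecificToken.CLASS_BEGIN, GeneralToken.TYPE_DEF);
        MAPPING.put(JavaSpecificToken.METHOD_BEGIN, GeneralToken.FUNCTION_DEF);
        MAPPING.put(JavaSpecificToken.VAR_DEF, GeneralToken.VARIABLE_DEF);
        MAPPING.put(JavaSpecificToken.ASSIGN, GeneralToken.ASSIGNMENT);
        MAPPING.put(JavaSpecificToken.IF_BEGIN, GeneralToken.BRANCH);
        // LAMBDA has no general counterpart in this sketch, so it is dropped below.
    }

    /** Maps each specific token to its general counterpart; tokens without a mapping are dropped. */
    static List<GeneralToken> convert(List<JavaSpecificToken> tokens) {
        List<GeneralToken> result = new ArrayList<>();
        for (JavaSpecificToken token : tokens) {
            GeneralToken general = MAPPING.get(token);
            if (general != null) {
                result.add(general);
            }
        }
        return result;
    }
}
```

The converter only sees what the language-specific tokenizer chose to emit, so anything that tokenizer skipped or merged cannot be recovered afterwards. Parsing the source directly into general tokens avoids that limitation.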

euberdeveloper · 2024-07-16T14:42:56Z

Another improvement I want to make is to make the language-agnostic tokens dynamic. Each language will override/implement some methods such as "supportsClasses" or "has variable declarations". For example, C would return false for the first method and true for the second one. Python would return true for the first one and false for the second one. Java would return true for both.

Then, the language-agnostic tokenizer for each language would receive the full set of languages for this run as an additional parameter and change its behaviour based on what those languages support. For example, if Java, Python, and C are provided, the Java tokenizer will discard class tokens. If only Java and Python are provided as possible languages for this run, the Java tokenizer will emit class tokens.
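
A small sketch of that behaviour, assuming a token is kept only if every language in the run supports the underlying concept. The capability values follow the C/Python/Java example above; the type names and the method name `hasVariableDeclarations` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical language-agnostic token types.
enum AgnosticToken { CLASS, VARIABLE_DECLARATION, FUNCTION, ASSIGNMENT }

// Capabilities as described in the example above.
enum RunLanguage {
    C(false, true),
    PYTHON(true, false),
    JAVA(true, true);

    private final boolean supportsClasses;
    private final boolean hasVariableDeclarations;

    RunLanguage(boolean supportsClasses, boolean hasVariableDeclarations) {
        this.supportsClasses = supportsClasses;
        this.hasVariableDeclarations = hasVariableDeclarations;
    }

    boolean supportsClasses() { return supportsClasses; }
    boolean hasVariableDeclarations() { return hasVariableDeclarations; }
}

final class DynamicTokenFilter {
    /** Keeps only the tokens whose concept is supported by every language of the run. */
    static List<AgnosticToken> filter(List<AgnosticToken> tokens, List<RunLanguage> runLanguages) {
        boolean classesEverywhere = runLanguages.stream().allMatch(RunLanguage::supportsClasses);
        boolean declarationsEverywhere = runLanguages.stream().allMatch(RunLanguage::hasVariableDeclarations);

        List<AgnosticToken> kept = new ArrayList<>();
        for (AgnosticToken token : tokens) {
            if (token == AgnosticToken.CLASS && !classesEverywhere) {
                continue; // some language in this run has no classes
            }
            if (token == AgnosticToken.VARIABLE_DECLARATION && !declarationsEverywhere) {
                continue; // some language in this run has no variable declarations
            }
            kept.add(token);
        }
        return kept;
    }
}
```

With Java, Python, and C in the run, both class and variable declaration tokens are discarded from a Java token stream. With only Java and Python, class tokens are kept.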

euberdeveloper · 2024-07-16T14:43:09Z

I have some work in progress on this.

tsaglam · 2024-09-06T06:34:40Z

Note that we have our own plans here that might conflict with yours. But we are always happy to look at your ideas for inspiration.

tsaglam added the enhancement, major, and language labels · Feb 7, 2024
