
Plagiarism checks: Plagiarism check shows 69% similarity for exactly same submissions #7174

Open
jakubriegel opened this issue Sep 11, 2023 · 12 comments
Assignees
Labels
bug, component:Plagiarism Detection, exercise, plagiarism, programming

Comments

@jakubriegel
Contributor

Describe the bug

Two identical submissions received only 69% similarity.

To Reproduce

  1. Create two identical programming exercise submissions
  2. Run plagiarism checks on them
  3. Check the result

Expected behavior

Two identical submissions should have 100% similarity.

Screenshots

(screenshot of the Artemis plagiarism check result)

Which version of Artemis are you seeing the problem on?

6.4.3

What browsers are you seeing the problem on?

Chrome, Safari

Additional context

No response

Relevant log output

No response

@krusche
Member

krusche commented Sep 11, 2023

JPlag calculates the similarities. While it is possible that Artemis passes the data to JPlag incorrectly, I don't think this is the case. It is more likely that you have found an edge case in which a very simple comparison does not work.

@dfuchss any ideas?

@dfuchss
Contributor

dfuchss commented Sep 11, 2023

mmh .. I cannot reproduce the behavior with JPlag.
I used the following file for both submissions:

BubbleSort.java

package edu;

import java.util.*;

public class BubbleSort {

    /**
     * Sorts dates with BubbleSort.
     *
     * @param input the List of Dates to be sorted
     */
    public void performSort(final List<Date> input) {
        var x = 123;
        //TODO: implement
    }

    private static int abc(int a) {
        return a * 5;
    }

    private int def(int d) {
        return d - 15;
    }
}

JPlag reports 100%:

(screenshot of the JPlag report viewer showing 100% similarity)

@dfuchss
Contributor

dfuchss commented Sep 11, 2023

Nevertheless, depending on the input size, this might in general be the important issue. But I don't understand why JPlag produces 100% and Artemis ~70% for the same file (as far as I can tell from the images).

@jakubriegel do you have more files? Very small files could lead to lower similarity values.

@krusche
Member

krusche commented Sep 11, 2023

I would assume the image only shows one file, so it could theoretically be the case that other files differ. Maybe this is also related to the fact that we exclude the initial template.

@jakubriegel please link an example on one of the test servers or provide a minimal example to reproduce the issue.

@dfuchss
Contributor

dfuchss commented Sep 20, 2023

/cc jfyi @tsaglam

@tsaglam

tsaglam commented Sep 21, 2023

Another factor could be the basecode functionality: if you provide a class with an empty main method as a basecode template (shared code that is, for example, given to all students as part of the exercise), these parts of the source code will not be matched. I can probably give more input if I get more details on how JPlag was configured for that run.
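For context, the basecode in this scenario is the exercise template version of BubbleSort.java. Reconstructed from the modified file above (an assumption for illustration, not the actual Artemis template), it would look roughly like this:

package edu;

import java.util.*;

public class BubbleSort {

    /**
     * Sorts dates with BubbleSort.
     *
     * @param input the List of Dates to be sorted
     */
    public void performSort(final List<Date> input) {
        //TODO: implement
    }
}

Only what a student adds or changes on top of this file is matched; the shared template parts are not.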

@jakubriegel
Contributor Author

This is most likely an effect of using the base code feature.

To verify it, I carefully ran this comparison again on Artemis using a debugger. I used the same exercise template and 3 submissions: 2 with BubbleSort.java as in the screenshot and 1 identical to the template. The findings are:

  • JPlag, as invoked by Artemis, correctly identifies the matches: the only match found is the whole BubbleSort.java for the two modified submissions,
  • the 69.57% similarity is produced by JPlag itself,
  • for the template submission JPlag correctly reports 0% similarity.

The number 69.57% comes from the de.jplag.JPlagComparison.similarity() method. Generally, it calculates the similarity by dividing the number of matched tokens by the total number of tokens. The two modified submissions have 99 tokens in total, of which 16 are matched. If no base code were used, all 99 tokens would be matched and the similarity would be 100%. But since there is base code, JPlag matches only 16 tokens, and the denominator is reduced by the number of tokens matched between the base code and the submission. This yields a similarity of 69.57%. I guess the motivation was to acknowledge the unchanged base code lines in the results (as not all lines of the modified BubbleSort.java differ from the base code).
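For illustration, here is a minimal sketch of that arithmetic (not JPlag's actual code; the 76 base-code tokens below are inferred so that the reported numbers add up, they are not shown anywhere by Artemis):

public class SimilaritySketch {

    public static void main(String[] args) {
        int totalTokens = 99;     // tokens of the modified BubbleSort.java
        int baseCodeTokens = 76;  // assumed number of tokens matched against the base code
        int matchedTokens = 16;   // tokens matched between the two submissions
        double similarity = (double) matchedTokens / (totalTokens - baseCodeTokens);
        System.out.printf("%.2f%%%n", 100 * similarity); // prints 69.57%
    }
}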

In short, this looks like a feature, not a bug 🙃

@tsaglam @dfuchss Can you confirm if JPlag works as intended in the described scenario?

@krusche @MarkusPaulsen Should we keep it like that in Artemis? One idea to adjust the behaviour would be to not use the base code feature at all. Since the minimum size parameter is implemented as the minimum size of the diff between the submission and the template, instructors should have enough control over the process. What do you think?
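A rough sketch of that idea (purely hypothetical, not existing Artemis code): instead of handing the template to JPlag as base code, submissions whose diff against the template is below the configured minimum size would be filtered out before the comparison runs.

import java.util.List;

class TemplateDiffFilter {

    // Hypothetical helper: number of tokens in which a submission differs from the template.
    interface DiffSizeProvider {
        int diffSize(String submissionId);
    }

    // Keep only submissions that changed at least `minimumSize` tokens of the template;
    // the remaining submissions are then compared without any base code.
    static List<String> selectForComparison(List<String> submissionIds,
                                            DiffSizeProvider diffs,
                                            int minimumSize) {
        return submissionIds.stream()
                .filter(id -> diffs.diffSize(id) >= minimumSize)
                .toList();
    }
}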

@tsaglam

tsaglam commented Oct 24, 2023

Can you confirm if JPlag works as intended in the described scenario?

Yes, if you use basecode, then you basically tell JPlag: "Do not count that code, this is template code that we gave every student". These code segments are not counted in the similarity calculation, in order to reduce false positives caused by the template code. Using basecode makes sense iff you gave students some template code that they did not alter. Thus, using this feature depends on the specific use case and assignment.

I think what causes the confusion here is the Artemis UI not showing which parts of the code are matched and which parts are not. In the JPlag report viewer, matches between two submissions are not highlighted when they are part of the basecode.

@dfuchss
Contributor

dfuchss commented Feb 22, 2024

Just as an idea: Maybe it would be an option to integrate the JPlag UI into Artemis.

@MarkusPaulsen MarkusPaulsen changed the title Plagiarism detection: Plagiarism check shows 69% similarity for exactly same submissions Plagiarism checks: Plagiarism check shows 69% similarity for exactly same submissions Nov 5, 2024
@github-actions github-actions bot added the assessment, exercise, plagiarism, programming, and text labels Nov 5, 2024
@maximiliansoelch maximiliansoelch removed the assessment and text labels Dec 6, 2024
@MarkusPaulsen MarkusPaulsen assigned AjayvirS and unassigned xHadie Dec 17, 2024
@krusche
Member

krusche commented Jan 14, 2025

@AjayvirS could you please try out whether it's possible to better exclude template code in the comparison and in the display in the UI, as mentioned by @tsaglam.

@MarkusPaulsen MarkusPaulsen assigned xHadie and unassigned AjayvirS Jan 27, 2025
@xHadie

xHadie commented Feb 6, 2025

We are prioritizing this issue now and would like to reach a consensus on what action to take for Artemis.
We reproduced the issue, and to summarize, here is the concrete situation we have:


Using the available Sorting Algorithm template, JPlag produces the following results:

File              Submission A   Submission B   Notes
Client.java       0%             0%             Both did not attempt this file.
MergeSort.java    0%             0%             Both did not attempt this file.
BubbleSort.java   100%           100%           Identical attempt.
Total similarity  69.57%         69.57%

As @jakubriegel pointed out and @tsaglam confirmed, this occurs because template code tokens are included in the overall score computation.
While this inclusion reduces false positives, it would be helpful to configure JPlag to remove files from the similarity computation when at least one of the two students being compared did not attempt any tasks in the corresponding file.
I explored the available similarity metrics but couldn't find a configuration that achieves this. If I overlooked something, I would appreciate some help, @tsaglam.

Given this, I see the following possible outcomes for Artemis:

  • Enhance the comparison view as @krusche proposed. This alone might already be sufficient.
  • Decide to implement one of the following strategies:
    1. Leave the similarity score at 69.57%, since it is unlikely that students will not attempt a file and since the issue at hand is based on a simple toy example.
    2. Find an alternative JPlag configuration, if one exists.
    3. Introduce an additional similarity metric based on file-level similarities from JPlag (100% in this case). This metric could be displayed alongside or instead of the 69.57%.
      • The 69.57% score could still serve as a "severity score" for the detected plagiarism.
      • In addition, the new metric would function as an "overlap score" that excludes files from its computation when at least one of the two compared students did not attempt to solve tasks in the file, or alternatively when the file-based similarity score is below a certain threshold (see the sketch after this list).
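A minimal sketch of such an "overlap score" (hypothetical, not existing Artemis or JPlag code; the class names and the attempted-by-both flag are assumptions):

import java.util.List;

public class OverlapScore {

    // Per-file comparison result; `attemptedByBoth` is an assumed flag that Artemis would
    // have to derive, e.g. from whether the file differs from the exercise template.
    record FileComparison(String fileName, double similarity, boolean attemptedByBoth) {}

    // Average the per-file similarities, skipping files that at least one of the two
    // students did not attempt; `minSimilarity` implements the alternative threshold variant.
    static double compute(List<FileComparison> files, double minSimilarity) {
        return files.stream()
                .filter(FileComparison::attemptedByBoth)
                .mapToDouble(FileComparison::similarity)
                .filter(s -> s >= minSimilarity)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        var files = List.of(
                new FileComparison("Client.java", 0.0, false),
                new FileComparison("MergeSort.java", 0.0, false),
                new FileComparison("BubbleSort.java", 1.0, true));
        System.out.printf("overlap score: %.0f%%%n", 100 * compute(files, 0.0)); // prints 100%
    }
}

For the example above, the overlap score would be 100%, while the existing 69.57% could remain visible as the severity score.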

@MarkusPaulsen, do you have any further ideas?
I'd appreciate some final opinions on this issue before we proceed.

Development

No branches or pull requests

7 participants