Added automatic mode selection #2168

Open: wants to merge 5 commits into base: develop

93 changes: 63 additions & 30 deletions README.md
Review comment (Member):

There are many changes here (and in the docs) because the formatting of the help text changed. Is that due to a change in picocli's help text generation?

@@ -85,52 +85,84 @@ Parameter descriptions:
[root-dirs[,root-dirs...]...]
Root-directory with submissions to check for plagiarism.
-bc, --bc, --base-code=<baseCode>
Path to the base code directory (common framework used in all submissions).
-l, --language=<language>
Select the language of the submissions (default: java). See subcommands below.
-M, --mode=<{RUN, VIEW, RUN_AND_VIEW}>
The mode of JPlag: either only run analysis, only open the viewer, or do both (default: null)
-n, --shown-comparisons=<shownComparisons>
The maximum number of comparisons that will be shown in the generated report, if set to -1 all comparisons will be shown (default: 500)
Path to the base code directory (common framework used
in all submissions).
-l, --language=<language>
Select the language of the submissions (default: java).
See subcommands below.
-M, --mode=<{RUN, VIEW, RUN_AND_VIEW, AUTO}>
The mode of JPlag. By default JPlag will automatically
select the mode based on the given input files. If
none are given the report viewer will open on the
file upload page. If a single result zip is given it
will be opened in the report viewer directly.
Otherwise, JPlag will check the given submissions and
show the result in the report viewer. One of: RUN,
VIEW, RUN_AND_VIEW, AUTO (default: null)
-n, --shown-comparisons=<shownComparisons>
The maximum number of comparisons that will be shown in
the generated report, if set to -1 all comparisons
will be shown (default: 2500)
-new, --new=<newDirectories>[,<newDirectories>...]
Root-directories with submissions to check for plagiarism (same as root).
--normalize Activate the normalization of tokens. Supported for languages: Java, C++.
Root-directories with submissions to check for
plagiarism (same as root).
--normalize Activate the normalization of tokens. Supported for
languages: Java, C++.
-old, --old=<oldDirectories>[,<oldDirectories>...]
Root-directories with prior submissions to compare against.
-r, --result-file=<resultFile>
Name of the file in which the comparison results will be stored (default: results). Missing .zip endings will be automatically added.
-t, --min-tokens=<minTokenMatch>
Tunes the comparison sensitivity by adjusting the minimum token required to be counted as a matching section. A smaller value increases the sensitivity but might lead to more
false-positives.
Root-directories with prior submissions to compare
against.
-r, --result-file=<resultFile>
Name of the file in which the comparison results will
be stored (default: results). Missing .zip endings
will be automatically added.
-t, --min-tokens=<minTokenMatch>
Tunes the comparison sensitivity by adjusting the
minimum token required to be counted as a matching
section. A smaller value increases the sensitivity
but might lead to more false-positives.

Advanced
--csv-export Export pairwise similarity values as a CSV file.
-d, --debug Store on-parsable files in error folder.
-m, --similarity-threshold=<similarityThreshold>
Comparison similarity threshold [0.0-1.0]: All comparisons above this threshold will be saved (default: 0.0).
-d, --debug Store on-parsable files in error folder.
--log-level=<{ERROR, WARN, INFO, DEBUG, TRACE}>
Set the log level for the cli.
-m, --similarity-threshold=<similarityThreshold>
Comparison similarity threshold [0.0-1.0]: All
comparisons above this threshold will be saved
(default: 0.0).
--overwrite Existing result files will be overwritten.
-p, --suffixes=<suffixes>[,<suffixes>...]
comma-separated list of all filename suffixes that are included.
-P, --port=<port> The port used for the internal report viewer (default: 1996).
-s, --subdirectory=<subdirectory>
-p, --suffixes=<suffixes>[,<suffixes>...]
comma-separated list of all filename suffixes that are
included.
-P, --port=<port> The port used for the internal report viewer (default:
1996).
-s, --subdirectory=<subdirectory>
Look in directories <root-dir>/*/<dir> for programs.
-x, --exclusion-file=<exclusionFileName>
All files named in this file will be ignored in the comparison (line-separated list).
-x, --exclusion-file=<exclusionFileName>
All files named in this file will be ignored in the
comparison (line-separated list).

Clustering
--cluster-alg, --cluster-algorithm=<{AGGLOMERATIVE, SPECTRAL}>
Specifies the clustering algorithm (default: spectral).
Specifies the clustering algorithm. Available
algorithms: agglomerative, spectral (default:
spectral).
--cluster-metric=<{AVG, MIN, MAX, INTERSECTION}>
The similarity metric used for clustering (default: average similarity).
The similarity metric used for clustering. Available
metrics: average similarity, minimum similarity,
maximal similarity, matched tokens (default: average
similarity).
--cluster-skip Skips the cluster calculation.

Subsequence Match Merging
--gap-size=<maximumGapSize>
Maximal gap between neighboring matches to be merged (between 1 and minTokenMatch, default: 6).
--match-merging Enables merging of neighboring matches to counteract obfuscation attempts.
Maximal gap between neighboring matches to be merged
(between 1 and minTokenMatch, default: 6).
--match-merging Enables merging of neighboring matches to counteract
obfuscation attempts.
--neighbor-length=<minimumNeighborLength>
Minimal length of neighboring matches to be merged (between 1 and minTokenMatch, default: 2).

Minimal length of neighboring matches to be merged
(between 1 and minTokenMatch, default: 2).
Languages:
c
cpp
@@ -142,6 +174,7 @@ Languages:
javascript
kotlin
llvmir
multi
python3
rlang
rust
40 changes: 39 additions & 1 deletion cli/src/main/java/de/jplag/cli/CLI.java
@@ -3,6 +3,8 @@
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.slf4j.ILoggerFactory;
import org.slf4j.Logger;
@@ -32,6 +34,8 @@ public final class CLI {
private static final String OUTPUT_FILE_EXISTS = "The output file (also with suffixes e.g. results(1).zip) already exists. You can use --overwrite to overwrite the file.";
private static final String OUTPUT_FILE_NOT_WRITABLE = "The output file (%s) cannot be written to.";

private static final String ZIP_FILE_ENDING = ".zip";

private final CliInputHandler inputHandler;

/**
@@ -58,7 +62,8 @@ public void executeCli() throws ExitException, IOException {
switch (this.inputHandler.getCliOptions().mode) {
case RUN -> runJPlag();
case VIEW -> runViewer(null);
- case RUN_AND_VIEW -> runViewer(runJPlag());
+ case RUN_AND_VIEW -> runAndView();
+ case AUTO -> selectModeAutomatically();
}
}
}
@@ -105,6 +110,15 @@ public File runJPlag() throws ExitException, FileNotFoundException {
return target;
}

/**
* Runs JPlag and shows the result in the report viewer
* @throws IOException If something went wrong with the internal server
* @throws ExitException If JPlag threw an exception
*/
public void runAndView() throws IOException, ExitException {
runViewer(runJPlag());
}

/**
* Runs the report viewer using the given file as the default result.zip.
* @param zipFile The zip file to pass to the viewer. Can be null, if no result should be opened by default
@@ -115,6 +129,30 @@ public void runViewer(File zipFile) throws IOException {
JPlagRunner.runInternalServer(zipFile, this.inputHandler.getCliOptions().advanced.port);
}

private void selectModeAutomatically() throws IOException, ExitException {
TwoOfTwelve marked this conversation as resolved.
List<File> inputs = this.getAllInputs();

if (inputs.isEmpty()) {
this.runViewer(null);
return;
}

if (inputs.size() == 1 && inputs.getFirst().getName().endsWith(ZIP_FILE_ENDING)) {
Review comment (Member):

Add a small comment explaining what happens here: why we pass an input file, which is normally a submission, to the viewer.

this.runViewer(inputs.getFirst());
return;
}

this.runAndView();
}

private List<File> getAllInputs() {
List<File> inputs = new ArrayList<>();
inputs.addAll(List.of(this.inputHandler.getCliOptions().rootDirectory));
inputs.addAll(List.of(this.inputHandler.getCliOptions().newDirectories));
inputs.addAll(List.of(this.inputHandler.getCliOptions().oldDirectories));
return inputs;
}

private void finalizeLogger() {
ILoggerFactory factory = LoggerFactory.getILoggerFactory();
if (!(factory instanceof CollectedLoggerFactory collectedLoggerFactory)) {
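
For readers following the diff, the decision made by `selectModeAutomatically()` can be sketched as a stand-alone program. This is a minimal sketch under stated assumptions, not the PR's code: the class `AutoModeSketch`, the `Action` enum, and `collectInputs` are hypothetical names, the three option fields are assumed to be `File[]` arrays, and `get(0)` is used in place of `getFirst()` so that the sketch also compiles on JDKs older than 21.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch of the AUTO decision; not the PR's CLI class. */
final class AutoModeSketch {
    /** What the sketch decides to do; illustrative names, not JPlag's JPlagMode enum. */
    enum Action { OPEN_EMPTY_VIEWER, OPEN_RESULT_IN_VIEWER, RUN_AND_VIEW }

    private static final String ZIP_FILE_ENDING = ".zip";

    /** Mirrors getAllInputs(): root, new, and old directories are pooled into one list. */
    static List<File> collectInputs(File[] root, File[] newDirs, File[] oldDirs) {
        List<File> inputs = new ArrayList<>();
        inputs.addAll(List.of(root));
        inputs.addAll(List.of(newDirs));
        inputs.addAll(List.of(oldDirs));
        return inputs;
    }

    /** Applies the same three-way rule as the diff, returning a value instead of acting. */
    static Action select(List<File> inputs) {
        if (inputs.isEmpty()) {
            // No inputs at all: open the viewer on its file upload page
            return Action.OPEN_EMPTY_VIEWER;
        }
        // A single input whose name ends in .zip is assumed to be the result
        // file of an earlier run rather than a submission root
        if (inputs.size() == 1 && inputs.get(0).getName().endsWith(ZIP_FILE_ENDING)) {
            return Action.OPEN_RESULT_IN_VIEWER;
        }
        // Anything else is treated as submissions to check and then display
        return Action.RUN_AND_VIEW;
    }

    public static void main(String[] args) {
        File[] none = new File[0];
        System.out.println(select(collectInputs(none, none, none)));
        System.out.println(select(collectInputs(new File[] {new File("results.zip")}, none, none)));
        System.out.println(select(collectInputs(new File[] {new File("submissions")}, none, none)));
    }
}
```

In this sketch the check relies on `String.endsWith`, so it is case-sensitive and looks only at the name: a single input named `RESULTS.ZIP` falls through to `RUN_AND_VIEW`, and a single directory whose name ends in `.zip` would be handed to the viewer.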
5 changes: 3 additions & 2 deletions cli/src/main/java/de/jplag/cli/options/CliOptions.java
@@ -53,8 +53,9 @@ public class CliOptions implements Runnable {
"--result-file"}, description = "Name of the file in which the comparison results will be stored (default: ${DEFAULT-VALUE}). Missing .zip endings will be automatically added.")
public String resultFile = "results";

- @Option(names = {"-M", "--mode"}, description = "The mode of JPlag. One of: ${COMPLETION-CANDIDATES} (default: ${DEFAULT_VALUE})")
- public JPlagMode mode = JPlagMode.RUN_AND_VIEW;
+ @Option(names = {"-M",
+ "--mode"}, description = "The mode of JPlag. By default JPlag will automatically select the mode based on the given input files. If none are given the report viewer will open on the file upload page. If a single result zip is given it will be opened in the report viewer directly. Otherwise, JPlag will check the given submissions and show the result in the report viewer. One of: ${COMPLETION-CANDIDATES} (default: ${DEFAULT_VALUE})")
+ public JPlagMode mode = JPlagMode.AUTO;

@Option(names = {"--normalize"}, description = "Activate the normalization of tokens. Supported for languages: Java, C++.")
public boolean normalize = false;
6 changes: 5 additions & 1 deletion cli/src/main/java/de/jplag/cli/options/JPlagMode.java
@@ -15,5 +15,9 @@ public enum JPlagMode {
/**
* Run JPlag and open the result in report viewer
*/
- RUN_AND_VIEW
+ RUN_AND_VIEW,
+ /**
+ * Choose the mode automatically from the given input files
+ */
+ AUTO,
}
99 changes: 68 additions & 31 deletions docs/1.-How-to-Use-JPlag.md
@@ -9,59 +9,95 @@ The language can either be set with the -l parameter or as a subcommand. If both
When using the subcommand, language-specific arguments can be set.
A list of language-specific options can be obtained by requesting the help page of a subcommand (e.g., "jplag java -h").

To open an existing report run: `java -jar jplag.jar </path/to/report.zip>`
Review comment (Member):

I would also repeat the command you have above here, so that the commands are grouped together:
java -jar jplag.jar path/to/the/submissions


To open the report viewer without any file selected run: `java -jar jplag.jar`

The following arguments can be used to control JPlag:
```
Parameter descriptions:
[root-dirs[,root-dirs...]...]
Root-directory with submissions to check for plagiarism.
-bc, --bc, --base-code=<baseCode>
Path to the base code directory (common framework used in all submissions).
-l, --language=<language>
Select the language of the submissions (default: java). See subcommands below.
-M, --mode=<{RUN, VIEW, RUN_AND_VIEW}>
The mode of JPlag: either only run analysis, only open the viewer, or do both (default: null)
-n, --shown-comparisons=<shownComparisons>
The maximum number of comparisons that will be shown in the generated report, if set to -1 all comparisons will be shown (default: 500)
Path to the base code directory (common framework used
in all submissions).
-l, --language=<language>
Select the language of the submissions (default: java).
See subcommands below.
-M, --mode=<{RUN, VIEW, RUN_AND_VIEW, AUTO}>
The mode of JPlag. By default JPlag will automatically
select the mode based on the given input files. If
none are given the report viewer will open on the
file upload page. If a single result zip is given it
will be opened in the report viewer directly.
Otherwise, JPlag will check the given submissions and
show the result in the report viewer. One of: RUN,
VIEW, RUN_AND_VIEW, AUTO (default: null)
-n, --shown-comparisons=<shownComparisons>
The maximum number of comparisons that will be shown in
the generated report, if set to -1 all comparisons
will be shown (default: 2500)
-new, --new=<newDirectories>[,<newDirectories>...]
Root-directories with submissions to check for plagiarism (same as root).
--normalize Activate the normalization of tokens. Supported for languages: Java, C++.
Root-directories with submissions to check for
plagiarism (same as root).
--normalize Activate the normalization of tokens. Supported for
languages: Java, C++.
-old, --old=<oldDirectories>[,<oldDirectories>...]
Root-directories with prior submissions to compare against.
-r, --result-file=<resultFile>
Name of the file in which the comparison results will be stored (default: results). Missing .zip endings will be automatically added.
-t, --min-tokens=<minTokenMatch>
Tunes the comparison sensitivity by adjusting the minimum token required to be counted as a matching section. A smaller value increases the sensitivity but might lead to more
false-positives.
Root-directories with prior submissions to compare
against.
-r, --result-file=<resultFile>
Name of the file in which the comparison results will
be stored (default: results). Missing .zip endings
will be automatically added.
-t, --min-tokens=<minTokenMatch>
Tunes the comparison sensitivity by adjusting the
minimum token required to be counted as a matching
section. A smaller value increases the sensitivity
but might lead to more false-positives.

Advanced
--csv-export Export pairwise similarity values as a CSV file.
-d, --debug Store on-parsable files in error folder.
-m, --similarity-threshold=<similarityThreshold>
Comparison similarity threshold [0.0-1.0]: All comparisons above this threshold will be saved (default: 0.0).
-d, --debug Store on-parsable files in error folder.
--log-level=<{ERROR, WARN, INFO, DEBUG, TRACE}>
Set the log level for the cli.
-m, --similarity-threshold=<similarityThreshold>
Comparison similarity threshold [0.0-1.0]: All
comparisons above this threshold will be saved
(default: 0.0).
--overwrite Existing result files will be overwritten.
-p, --suffixes=<suffixes>[,<suffixes>...]
comma-separated list of all filename suffixes that are included.
-P, --port=<port> The port used for the internal report viewer (default: 1996).
-s, --subdirectory=<subdirectory>
-p, --suffixes=<suffixes>[,<suffixes>...]
comma-separated list of all filename suffixes that are
included.
-P, --port=<port> The port used for the internal report viewer (default:
1996).
-s, --subdirectory=<subdirectory>
Look in directories <root-dir>/*/<dir> for programs.
-x, --exclusion-file=<exclusionFileName>
All files named in this file will be ignored in the comparison (line-separated list).
-x, --exclusion-file=<exclusionFileName>
All files named in this file will be ignored in the
comparison (line-separated list).

Clustering
--cluster-alg, --cluster-algorithm=<{AGGLOMERATIVE, SPECTRAL}>
Specifies the clustering algorithm (default: spectral).
Specifies the clustering algorithm. Available
algorithms: agglomerative, spectral (default:
spectral).
--cluster-metric=<{AVG, MIN, MAX, INTERSECTION}>
The similarity metric used for clustering (default: average similarity).
The similarity metric used for clustering. Available
metrics: average similarity, minimum similarity,
maximal similarity, matched tokens (default: average
similarity).
--cluster-skip Skips the cluster calculation.

Subsequence Match Merging
--gap-size=<maximumGapSize>
Maximal gap between neighboring matches to be merged (between 1 and minTokenMatch, default: 6).
--match-merging Enables merging of neighboring matches to counteract obfuscation attempts.
Maximal gap between neighboring matches to be merged
(between 1 and minTokenMatch, default: 6).
--match-merging Enables merging of neighboring matches to counteract
obfuscation attempts.
--neighbor-length=<minimumNeighborLength>
Minimal length of neighboring matches to be merged (between 1 and minTokenMatch, default: 2).

Subcommands (supported languages):
Minimal length of neighboring matches to be merged
(between 1 and minTokenMatch, default: 2).
Languages:
c
cpp
csharp
@@ -72,6 +108,7 @@ Subcommands (supported languages):
javascript
kotlin
llvmir
multi
python3
rlang
rust