Java program that removes duplicate json objects from json input file

ldel10/JSONDeduplicator

JSONDeduplicator

A Java program that removes duplicate JSON objects from a JSON input file.

USAGE

Clone the repo: git clone https://github.com/LiamDelumpa/JSONDeduplicator.git

From the root directory of the cloned repo, cd to the src folder: cd JSONDeduplicator/JsonDedupe/src/
This src folder contains the input file to pass in as an argument, as well as the external json-simple jar, which must be on the classpath when you compile and run the program.

Compile the program: javac -cp .:json-simple-1.1.1.jar marketo/*.java
Run the program: java -cp .:json-simple-1.1.1.jar marketo/MainChallenge leads.json

The program logs its work with print statements: the console shows every deletion from and insertion into the original list of JSON objects.
The output is written to a file called deduplicatedLeads.json

OBJECTIVE
Take a variable number of identically structured json records and de-duplicate the set.

An example file of records is given in the accompanying 'leads.json'. Output should be in the same format, with dups reconciled according to the following rules:

1. The data from the newest date should be preferred.
2. Duplicate IDs count as dups. Duplicate emails count as dups. Both must be unique in our dataset. Duplicate values elsewhere do not count as dups.
3. If the dates are identical, the data from the record provided last in the list should be preferred.

Simplifying assumption: the program can do everything in memory (don't worry about large files).
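The rules above can be sketched in a few lines of plain Java. This is only an illustration, not the code in this repo: the `LeadDedupSketch` class, the `Lead` type and its `id`, `email` and `date` fields are hypothetical names, and the policy shown (visit records from most preferred to least preferred, keeping a record only if neither its ID nor its email has already been kept) is one reasonable reading of the rules.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LeadDedupSketch {
    /** Hypothetical minimal record holding only the fields the rules mention. */
    static final class Lead {
        final String id;
        final String email;
        final Instant date;

        Lead(String id, String email, Instant date) {
            this.id = id;
            this.email = email;
            this.date = date;
        }
    }

    /** Returns the surviving records in their original input order. */
    static List<Lead> deduplicate(List<Lead> input) {
        // Positions in the input list, sorted from most to least preferred.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            order.add(i);
        }
        order.sort((a, b) -> {
            // Newest date first.
            int byDate = input.get(b).date.compareTo(input.get(a).date);
            // On identical dates, the record provided later in the list first.
            return byDate != 0 ? byDate : Integer.compare(b, a);
        });

        Set<String> keptIds = new HashSet<>();
        Set<String> keptEmails = new HashSet<>();
        boolean[] keep = new boolean[input.size()];
        for (int i : order) {
            Lead lead = input.get(i);
            // A record survives only if both its ID and its email are still unused.
            if (!keptIds.contains(lead.id) && !keptEmails.contains(lead.email)) {
                keptIds.add(lead.id);
                keptEmails.add(lead.email);
                keep[i] = true;
            }
        }

        List<Lead> result = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            if (keep[i]) {
                result.add(input.get(i));
            }
        }
        return result;
    }
}
```

A record that loses to a preferred duplicate does not block anything else, because only kept records reserve their ID and email. Whether that is the desired behaviour for chains of duplicates is a judgment call the rules leave open.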

The application should also provide a log of changes including some representation of the source record, the output record and the individual field changes (value from and value to) for each field.
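As a rough sketch of the per-field part of such a log, the helper below compares a source record with the output record that replaced it and lists each field whose value changed. It assumes, purely for illustration, that a record can be viewed as a flat `Map<String, String>`. `FieldChangeLogSketch` and `describeChanges` are hypothetical names, not part of this repo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

final class FieldChangeLogSketch {
    /** Lists every field whose value differs, as: field: "from" -> "to". */
    static List<String> describeChanges(Map<String, String> from, Map<String, String> to) {
        // Union of field names, sorted so the log order is deterministic.
        TreeSet<String> fields = new TreeSet<>(from.keySet());
        fields.addAll(to.keySet());

        List<String> changes = new ArrayList<>();
        for (String field : fields) {
            String before = from.get(field); // null if the field is absent
            String after = to.get(field);
            if (!Objects.equals(before, after)) {
                changes.add(field + ": \"" + before + "\" -> \"" + after + "\"");
            }
        }
        return changes;
    }
}
```

Printing the source record, the output record and the returned lines together would cover the three pieces the requirement asks for.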

Please implement as a command line Java program.
