Library method to "diff" releases #160

jpmckinney · 2020-10-21T02:23:41Z

https://pypi.org/project/deepdiff/ is a great library for comparing generic dictionaries. However, if we want to diff two "full" releases in order to calculate a minimal release to publish as part of a release history, we need specialized code that is aware of the merging routine. For example:

If the ids of two array entries are swapped, the naive diff will report only the id fields, rather than the full objects.
If a non-id field is changed in an array of objects, the naive diff will report it without the id field to identify the object.
Array order is not significant in OCDS, but it is significant to a naive diff.
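
The three problems above could be addressed by matching array entries on their id instead of their position. A minimal sketch, assuming every array entry is a dict with a unique `id`; `index_by_id` and `diff_object_array` are hypothetical names, not part of deepdiff or of any existing library, and removed or nested objects are not handled here:

```python
def index_by_id(items):
    """Map each object's id to the object, so array order stops mattering."""
    return {item["id"]: item for item in items}


def diff_object_array(old_items, new_items):
    """Return the minimal entries needed to describe what changed in new_items.

    Hypothetical helper: assumes every entry is a dict with a unique "id".
    """
    old_by_id = index_by_id(old_items)
    changes = []
    for new_item in new_items:
        old_item = old_by_id.get(new_item["id"])
        if old_item is None:
            # Object not seen before: report it in full.
            changes.append(dict(new_item))
            continue
        changed = {
            key: value
            for key, value in new_item.items()
            if key != "id" and (key not in old_item or old_item[key] != value)
        }
        if changed:
            # Always keep the id so a merge can locate the object.
            changes.append({"id": new_item["id"], **changed})
    return changes


old = [
    {"id": "1", "description": "Pens", "quantity": 10},
    {"id": "2", "description": "Paper", "quantity": 5},
]
# Same objects in a different order, with one quantity changed.
new = [
    {"id": "2", "description": "Paper", "quantity": 5},
    {"id": "1", "description": "Pens", "quantity": 12},
]
print(diff_object_array(old, new))
```

With these inputs the reordering alone produces no entry, and the changed quantity is reported together with the id of the object it belongs to.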

This feature would be relevant to:

Users who want to see what changed between two releases
Publishers who can generate a "full" release, but would rather only publish a "diff" release (more useful to users)

Requested by @dwasyl


dwasyl · 2020-10-21T04:02:05Z

Just for some added context, this would be especially helpful because the system at OpenNWT can only generate a 'full' release. The full releases we generate don't account for a few things - mostly deletions from lists and field deletions.

A tool to help create minimal diff releases would save some space, but also create better releases that would merge properly.

Specifically, there is some difference between making a diff release from two releases and making one from two compiledReleases (compiledReleases being more absolute, so more assumptions could be made). Either or both would be helpful in my use case.

jpmckinney · 2020-10-21T12:52:24Z

@dwasyl If a field is missing in the second release, should the diff set that field to null, or just not mention it? Similarly, if an object in an array is missing, should it set all its fields except id to null?

dwasyl · 2020-10-21T15:35:34Z

@jpmckinney I was thinking about this after we spoke. For my purposes, if it's missing then it should be null since otherwise the system would have included it in the release. This assumes the two releases being diffed are explicit (so are essentially equivalent to compiledReleases).

For a more generic tool, it might be a desirable configurable option? Missing fields are either ignored or set to null.
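
A rough sketch of what such an option could look like, for flat dicts only. The function name `diff_fields` and the `missing` parameter are made up for illustration, and the sketch assumes that a null value in a release is what signals a deletion when releases are merged:

```python
def diff_fields(old, new, missing="ignore"):
    """Diff two flat dicts of fields.

    `missing` is a hypothetical option:
      "ignore" - fields absent from `new` are left out of the diff
      "null"   - fields absent from `new` are set to None (a deletion)
    """
    if missing not in ("ignore", "null"):
        raise ValueError('missing must be "ignore" or "null"')

    # Fields that are new or whose value changed.
    changes = {
        key: value
        for key, value in new.items()
        if key not in old or old[key] != value
    }

    if missing == "null":
        for key, value in old.items():
            # Only null out fields that actually held a value before.
            if key not in new and value is not None:
                changes[key] = None
    return changes


old = {"title": "Road works", "description": "Resurfacing", "status": "active"}
new = {"title": "Road works", "status": "complete"}

print(diff_fields(old, new))                  # description is not mentioned
print(diff_fields(old, new, missing="null"))  # description is set to None
```

The "null" mode only makes sense when both inputs are explicit (equivalent to compiledReleases), since otherwise an absent field may simply mean "unchanged".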

jpmckinney · 2020-10-21T15:50:00Z

> For a more generic tool, it might be a desirable configurable option? Missing fields are either ignored or set to null.

Yes, I was thinking the same :)

dwasyl · 2020-10-21T16:06:39Z

That way it'd work even for publishers who are only able to put out a single release at any given time. If someone had saved those along the way, they could develop a mini release history if they needed to for some reason (which is essentially what OpenNWT does: it scrapes point-in-time measures).

jpmckinney added the feature label Oct 21, 2020

jpmckinney added the CLI command label and removed the feature label May 19, 2021
