Library method to "diff" releases #160

Open
jpmckinney opened this issue Oct 21, 2020 · 5 comments
@jpmckinney
Member

jpmckinney commented Oct 21, 2020

https://pypi.org/project/deepdiff/ is a great library for comparing generic dictionaries. However, if we want to diff two "full" releases in order to calculate a minimal release to publish as part of a release history, we need specialized code that is aware of the merging routine. For example:

  • If the ids of two array entries are swapped, the naive diff will just report the id fields, rather than the full objects.
  • If a non-id field is changed in an array of objects, the naive diff will report it without the id field to identify the object.
  • Array order is not significant in OCDS, but it is significant to a naive diff.
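To illustrate the points above, here is a minimal sketch of an id-aware array comparison. The function name `diff_array_by_id` and the sample data are hypothetical, not part of any existing library; it assumes every entry has an `id`, compares fields only at the top level, and does not yet deal with entries that are missing from the newer array.

```python
def diff_array_by_id(old, new):
    """Return the entries of `new` that differ from `old`, matched on `id`.

    Hypothetical helper: array order is ignored, and each changed entry
    keeps its `id` plus only the top-level fields whose values changed.
    Entries that are absent from `new` are not handled here.
    """
    old_by_id = {entry["id"]: entry for entry in old}
    changed = []
    for entry in new:
        previous = old_by_id.get(entry["id"])
        if previous is None:
            # An object that did not exist before is reported in full.
            changed.append(entry)
            continue
        fields = {
            key: value
            for key, value in entry.items()
            if key != "id" and previous.get(key) != value
        }
        if fields:
            # Keep the id so the merge routine can identify the object.
            changed.append({"id": entry["id"], **fields})
    return changed


old_items = [
    {"id": "1", "description": "Pens", "quantity": 10},
    {"id": "2", "description": "Paper", "quantity": 5},
]
# Same objects in a different order, with one quantity changed.
new_items = [
    {"id": "2", "description": "Paper", "quantity": 5},
    {"id": "1", "description": "Pens", "quantity": 12},
]

print(diff_array_by_id(old_items, new_items))
# [{'id': '1', 'quantity': 12}]
```

Reordering alone produces an empty diff here, and a changed field is always reported together with the id of the object it belongs to.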

This feature would be relevant to:

  • Users who want to see what changed between two releases
  • Publishers who can generate a "full" release, but would rather only publish a "diff" release (more useful to users)

Requested by @dwasyl

@dwasyl

dwasyl commented Oct 21, 2020

Just for some added context, this would be especially helpful because the system at OpenNWT can only generate a 'full' release. The full releases we generate don't account for a few things - mostly deletions from lists and field deletions.

A tool to help create minimal diff releases would save some space, but also create better releases that would merge properly.

Specifically, there is some difference between making a diff release between two releases or between two compiledReleases (compiledReleases being more absolute so more assumptions could be made). Either or both would be helpful in my use case.

@jpmckinney
Member Author

@dwasyl If a field is missing in the second release, should the diff set that field to null, or just not mention it? Similarly, if an object in an array is missing, should it set all its fields except id to null?

@dwasyl

dwasyl commented Oct 21, 2020

@jpmckinney I was thinking about this after we spoke. For my purposes, if it's missing then it should be null since otherwise the system would have included it in the release. This assumes the two releases being diffed are explicit (so are essentially equivalent to compiledReleases).

For a more generic tool, it might be a desirable configurable option? Missing fields are either ignored or set to null.
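A rough sketch of what that option could look like for the fields of a single object. The function name `diff_fields` and the `missing` parameter are hypothetical names used only for illustration, and the comparison is shallow (nested objects and arrays would need the same treatment recursively).

```python
def diff_fields(old, new, missing="null"):
    """Shallow diff of two dicts, with a configurable policy for missing fields.

    Hypothetical helper:
    - missing="null": fields present in `old` but absent from `new` are set
      to None (serialized as JSON null).
    - missing="ignore": such fields are left out of the diff.
    """
    if missing not in ("null", "ignore"):
        raise ValueError("missing must be 'null' or 'ignore'")

    # Fields that are new or whose values changed.
    diff = {
        key: value
        for key, value in new.items()
        if key not in old or old[key] != value
    }

    if missing == "null":
        for key in old:
            if key not in new:
                diff[key] = None
    return diff


old_tender = {"title": "Road repair", "description": "Phase 1", "status": "active"}
new_tender = {"title": "Road repair", "status": "complete"}

print(diff_fields(old_tender, new_tender, missing="null"))
# {'status': 'complete', 'description': None}
print(diff_fields(old_tender, new_tender, missing="ignore"))
# {'status': 'complete'}
```

The "null" mode fits the case where both inputs are explicit, so an absent field really means it was removed. The "ignore" mode is the safer default when the inputs may be partial releases.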

@jpmckinney
Member Author

> For a more generic tool, it might be a desirable configurable option? Missing fields are either ignored or set to null.

Yes, I was thinking the same :)

@dwasyl

dwasyl commented Oct 21, 2020

That way it'd work even for publishers who are only able to put out a single release at any given time. If someone had saved those along the way, they could develop a mini release history if they needed to for some reason (which is essentially what OpenNWT does, scrapes point in time measures).

@jpmckinney added the "CLI command" label and removed the "feature" and "CLI command" labels on May 19, 2021