Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diff of two equal objects is not empty #32

Open
EyalAr opened this issue Jan 7, 2016 · 9 comments
Open

Diff of two equal objects is not empty #32

EyalAr opened this issue Jan 7, 2016 · 9 comments

Comments

@EyalAr
Copy link
Contributor

EyalAr commented Jan 7, 2016

jiff.diff({
  "a": {
    "b": {
      "c": [ ["a"], ["b"], ["c"] ]
    }
  }
}, {
  "a": {
    "b": {
      "c": [ ["a"], ["b"], ["c"] ]
    }
  }
});

Produces the following patch:

[ { op: 'add',
    path: '/a/b/c/0',
    value: [ 'a' ],
    context: undefined },
  { op: 'add',
    path: '/a/b/c/1',
    value: [ 'b' ],
    context: undefined },
  { op: 'add',
    path: '/a/b/c/2',
    value: [ 'c' ],
    context: undefined },
  { op: 'test',
    path: '/a/b/c/3',
    value: [ 'a' ],
    context: undefined },
  { op: 'remove', path: '/a/b/c/3', context: undefined },
  { op: 'test',
    path: '/a/b/c/3',
    value: [ 'b' ],
    context: undefined },
  { op: 'remove', path: '/a/b/c/3', context: undefined },
  { op: 'test',
    path: '/a/b/c/3',
    value: [ 'c' ],
    context: undefined },
  { op: 'remove', path: '/a/b/c/3', context: undefined } ]

Which is correct, in the sense that applying the patch will yield the correct result; but not efficient, as the patch could simply be empty.

@briancavalier
Copy link
Member

Hey @EyalAr, by default jiff uses a hash function to determine if two array items (which, in this case are also arrays) are the "same", so it can deal with array items that have moved to a different index. By default, it uses === to determine if the nested array items are equivalent. Since ["a"] !== ["a"], you'll need to provide a hash function (documented here) that computes hashes for arrays. For example, you could pass JSON.stringify and it'll result in an empty patch as you expected. Or you could use something more efficient if you have knowledge of what the arrays will contain.

Unfortunately, the docs for the hash function aren't quite accurate. It uses JSON.stringify for objects, but not for arrays (perhaps it should ...). I'll update the docs.

In the meantime, here's a modified version of your example that prints an empty patch:

var jiff = require('../jiff');

var p = jiff.diff({
    "a": {
        "b": {
            "c": [ ["a"], ["b"], ["c"] ]
        }
    }
}, {
    "a": {
        "b": {
            "c": [ ["a"], ["b"], ["c"] ]
        }
    }
}, JSON.stringify);

console.log(p);

@EyalAr
Copy link
Contributor Author

EyalAr commented Jan 11, 2016

Thanks for clarifying @briancavalier.
I suppose I assumed all non-primitive values will be deeply compared.

From what I observed, JSON.stringify will only be used to compare objects inside arrays; which I guess makes sense, in order to detect movement of elements inside the array.

But this creates somewhat confusing situations:

jiff.diff({
  "a": { a: "a", b: "b" }
}, {
  "a": { b: "b", a: "a" }
});

Will yield an empty diff, as expected. But:

jiff.diff({
  "a": [{ a: "a", b: "b" }]
}, {
  "a": [{ b: "b", a: "a" }]
});

Will yield a non-empty diff, because:

JSON.stringify({ a: "a", b: "b" }) !== JSON.stringify({ b: "b", a: "a" })

@briancavalier
Copy link
Member

JSON.stringify will only be used to compare objects inside arrays

Yes. More specifically, the hash function (which defaults to JSON.stringify for objects) will only be used to detect "sameness" (i.e. identity) inside arrays.

But this creates somewhat confusing situations

Diffing arrays is tricky, and significantly more complex than diffing objects. The two are quite different because of the inherent difference that arrays are ordered and objects aren't. While there is no "JSON Diff" spec, the JSON Patch spec implies (by the presence of the move operation) that a diff algorithm must detect changes in order within arrays.

So, diffing an object and diffing an array containing an object are inherently different, which is why they behave differently.

Will yield a non-empty diff, because:

Right. This is exactly why the hash function is configurable. I don't think jiff could ever guess the most efficient and effective way to detect identity for every possible style of object someone might ever put in an array.

I guess the takeaway here is that, if you need to diff arrays of non-primitive values, explicitly providing a hash function is the way to go.

Perhaps jiff should enforce that by throwing if a hash function isn't specified and it encounters a non-primitive (object or array) inside an array?

@EyalAr
Copy link
Contributor Author

EyalAr commented Jan 12, 2016

What do you think about:

  1. Using JSON.stringify by default also for arrays inside arrays? Seems like it fits arrays even better than plain objects, because arrays are ordered.
  2. Always using deep-equality to detect movement of items inside an array? I realise this will slow things down significantly (since the LCS code will have to compare array elements not by hashes, but by deep equality), but it will allow to create more concise patches.

Perhaps jiff should enforce that by throwing if a hash function isn't specified and it encounters a non-primitive (object or array) inside an array?

It's definitely important to let users know the significance of the hash function. But I think most users would prefer a default behaviour which "just works", even if less performant; than trying to figure out why the library throws.

EyalAr added a commit to EyalAr/jiff that referenced this issue Jan 13, 2016
following discussion at cujojs#32 - comparison of array elements is different
for objects and arrays.
EyalAr added a commit to EyalAr/jiff that referenced this issue Jan 13, 2016
EyalAr added a commit to EyalAr/jiff that referenced this issue Jan 13, 2016
@balihoo-dengstrom
Copy link

In case this is helpful for anyone else, we used object-hash in place of the default hashing function to solve the issue we had when diff'ing arrays that contain non-scalar objects.

@briancavalier
Copy link
Member

I think most users would prefer a default behaviour which "just works"

@balihoo-dengstrom @EyalAr Maybe we should use something like object-hash as the default hash? What do you think?

@balihoo-dengstrom
Copy link

@briancavalier I think that would be a good idea.

@briancavalier
Copy link
Member

I see that object-hash uses node's crypto module. That seems like it wouldn't bode well for browser use, as a general solution. Jiff really only needs to hash JSON, not JavaScript, so maybe there is another package that hashes JSON and will work in node and browsers. Do you know of any good ones?

@jasonk
Copy link

jasonk commented Jul 26, 2019

Since this is a somewhat old issue I'm not sure if this changed after the earlier comments or not, but object-hash includes a browser version built with browserify, so it does work in browsers..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants