Please support deterministic output (PROV-N) #130

Open
olebole opened this issue Nov 22, 2018 · 2 comments

olebole commented Nov 22, 2018

With Python 3 (3.6, 3.7), the PROV-N output (at least) is not deterministic:

import prov.model as prov

d = prov.ProvDocument()
d.set_default_namespace('https://example.com')
d.entity('id', [(prov.PROV_TYPE, 'foo'), (prov.PROV_TYPE, 'bar')])
print(d.get_provn())

This sometimes gives

document
  default <https://example.com>
  
  entity(id, [prov:type="foo", prov:type="bar"])
endDocument

and sometimes

document
  default <https://example.com>
  
  entity(id, [prov:type="bar", prov:type="foo"])
endDocument

This makes it difficult to create doctests here.

@trungdong
Owner

Hi @olebole,

Attributes are stored internally in a set of values. I think this is why the values are not listed in a deterministic order. I can see why this is an issue for your testing, but I wonder if you could do the same test in a different way. The two documents that you provided above are equivalent.
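To illustrate the point about sets: since Python 3.3, string hashes are randomized per process by default, so the iteration order of a set of strings can differ between runs. The following is a minimal sketch in plain Python, not the prov internals; `attributes` and `format_attributes` are hypothetical names used only for illustration. It shows that sorting before rendering gives the same text on every run.

```python
# Illustration only: this does not use or reproduce the prov package internals.
# A set of (name, value) pairs; its iteration order may differ between runs
# because str hashes are randomized per process.
attributes = {("prov:type", "foo"), ("prov:type", "bar")}


def format_attributes(attrs):
    """Render attribute pairs in sorted order so the text is reproducible."""
    return ", ".join(
        '{}="{}"'.format(name, value) for name, value in sorted(attrs)
    )


print(format_attributes(attributes))
# prints: prov:type="bar", prov:type="foo"
```

Sorting the pairs fixes the order at the cost of no longer reflecting the order in which attributes were added.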

Author

olebole commented Nov 22, 2018

Could that be wrapped by sorted()? Having a deterministic output would greatly improve doctests. Currently, I write

'''Example:

>>> import prov.model as prov
>>> d = prov.ProvDocument()
>>> d.set_default_namespace('https://example.com')
>>> d.entity('id', [(prov.PROV_TYPE, 'foo'), (prov.PROV_TYPE, 'bar')])
>>> print(d.get_provn())
document
  default <https://example.com>
  
  entity(id, [prov:type="foo", prov:type="bar"])
endDocument
'''

which is nicely readable as a tutorial for the user, and would make a good doctest. Putting the expected output into a string, parsing it, and comparing the result with the original document would probably work (right?), but the example would then no longer be usable as documentation.
If two documents are equivalent (and created in the same manner), shouldn't they have the same serialization?
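As a possible stopgap on the test side, the printed text could be normalized before it is compared. The sketch below is a rough assumption-laden helper, not part of prov: `sort_attribute_lists` is a hypothetical name, it works on the PROV-N string only, and its naive comma splitting will break on attribute values that themselves contain commas or square brackets.

```python
import re


def sort_attribute_lists(provn_text):
    """Sort the comma-separated items inside each [...] attribute list.

    Hypothetical helper for tests only; it does not parse PROV-N properly.
    """
    def _sort(match):
        items = [item.strip() for item in match.group(1).split(",")]
        return "[" + ", ".join(sorted(items)) + "]"

    return re.sub(r"\[([^\]]*)\]", _sort, provn_text)


first = 'entity(id, [prov:type="foo", prov:type="bar"])'
second = 'entity(id, [prov:type="bar", prov:type="foo"])'

# Both orderings normalize to the same text.
print(sort_attribute_lists(first) == sort_attribute_lists(second))
# prints: True
```

A doctest could then print `sort_attribute_lists(d.get_provn())` instead of the raw output, although the extra call makes the example slightly less clean as documentation.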
