-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve retrieval of original name #59
Conversation
712ce9e
to
8e6ec45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ross-spencer, good investigative work here! I think you've hit on a good solution and the code looks good to me. I've left one suggestion for a clarifying comment and a question about one of the tests.
I see that you've opened a related mets-reader-writer issue. Forgive me if this is a naive question, but would it be better to make these changes to mets-reader-writer
(i.e. add a helper to retrieve the transfer's original name, and more gracefully handle multiple dmdSec
s with identical IDs) and then call the new helper here in AIPscan? You've done really good work of teasing this deceptively complicated question apart and it seems like there would be benefits to other projects of having this code in a shared helper library instead of only in AIPscan.
@@ -49,31 +49,32 @@ def get_aip_original_name(mets): | |||
# Negated as we're going to want to remove this length of values. | |||
NAMESUFFIX = -len("-00000000-0000-0000-0000-000000000000") | |||
|
|||
# Other intellectual entities exist in the METS, i.e. for empty | |||
# directories. We can identify those with an invalid name prefix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth adding a comment that this name prefix is an indicator of an invalid <premis:originalName>
value specifically within the context of a dmdSec. I was confused for a minute because the <premis:originalName>
value in the amdSec
of an original file also begins with "%transferDirectory%".
(os.path.join("iso_mets", "iso_mets.xml"), "iso"), | ||
(os.path.join("features_mets", "features-mets.xml"), "myTransfer", False), | ||
(os.path.join("iso_mets", "iso_mets.xml"), "iso", False), | ||
(os.path.join("original_name_mets", "dataverse_example.xml"), "", True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I understanding correctly that this test raises a METSError
because Archivematica has written two dmdSec
s with an ID of "dmdSec_1"
and mets-reader-writer isn't able to read two dmdSec
s with the same ID? Forgive me if this naive, but with the naked eye it looks like like the <premis:originalName>
value in the second <mets:dmdSec ID="dmdSec_1">
(starting on line 39) should be what we're looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct. I will add notes to the test. Related issue: archivematica/Issues#1310
8e6ec45
to
ae08385
Compare
Thanks for the review @tw4l. You are somewhat right about the upstream issues. We had already determined an approach to handling those which was to not look at I have started an EPIC for us to build on in this repository: #53 and creating related issues in archivematica/issues. I feel we're approaching critical mass with regards to outstanding I have added comments to the above and hopefully made the intentions in the code and test clearer,,. but I couldn't see what explicit changes you wanted? Just a note on marking "request for changes" in a PR. I recognize not everyone approaches our work like this - but we do have branch protection on this project so it's already explicit the PR can't be merged without approval - it try not to tell folks that's doubly the case. My personal approach is to leave a PR conversational by submitting comments unless there is something critically broken or out of keeping with our contributing gude that we should avoid at all costs. I tend to comment without explicit approval to create better understanding, ask questions, and that in most cases does result in some change to code. If in the rare case someone following their process approves a PR in the middle of a conversation and I am the first reviewer then I am comfortable with that as long as I know I've caught the things I had the capacity to catch that we can't really live with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ross-spencer. I'm on board delivering value in the application itself and keeping track of potential upstream work for now (thanks for starting that Epic - that'll prove very helpful). I think that I had not yet been in a sprint meeting when I did the last review, so I was missing a bit of the context there.
You're right that I wasn't explict about any changes. I was trying to suggest (perhaps too implicitly) that upstream changes to mets-reader-writer
to more gracefully handle the case of duplicative dmdSec IDs might result in downstream changes here - for instance, in the dataverse_example
test returning the original value from the second "dmdSec=1"
rather than throwing a METSError
.
But given the sprint priorities and current focus, I'm quite happy to remove myself as an obstacle for merging - this is good work and definitely adding value. Thanks for adding the comments - I left one additional comment on a docstring typo but otherwise looks great.
Thanks for the PR process note as well - we do seem to be a bit inconsistent as a company in how/whether we use "request for changes" but I see your point and agree that a comment without explicit approval one way or the other would have been appropriate here.
"""Retrieve PREMIS original name from a METSDocument object. | ||
|
||
If the original name cannot be reliably retrieved from the METS file | ||
a METSError is exception is returned to be handled by the called as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a typo in "METSError is exception is returned"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good spot. Thanks Tessa.
Having used AIPscan a little more, leaving the name blank is strange to look at, but we also cannot always retrieve the original name. We do our best to a) retrieve the name in this commit, as well as b) populate the name with the transfer UUID as an alternative.
ae08385
to
9a1e708
Compare
Having used AIPscan a little more, leaving the name blank is strange
to look at, but we also cannot always retrieve the original name. We
do our best to a) retrieve the name in this commit, as well as b)
populate the name with the transfer UUID as an alternative.
Connected to #57