Improve BagIt implementation #99
Comments
Link to BagIt v1.0, i.e. RFC 8493: https://www.rfc-editor.org/rfc/rfc8493.html
A possible folder structure, adapted from the RO-Crate link (see above; RO-Crate also had difficulty reading the specification, and there are errors in their figure, which I tried to fix):
Best,
Dear @tilfischer, Thank you for bringing up this topic. Your comprehensive information is greatly appreciated. I'd also like to provide some information:
Regarding the mention of "dataset" in point 4, could you please provide further clarification?
Thank you! Best regards,
Dear Claire, Thank you for your prompt reply! In the specification v1.0 on page 7 it says "A bag can have more than one payload manifest with each using a different checksum algorithm." I thought that sha256 and sha512 both use the same SHA-2 algorithm in the background, just with different bit lengths. Maybe I am wrong, and I must admit that I only just noticed this, but it is actually of minor importance. On bullet point 4: I simply want to say that the whole package downloaded with the blue "download data + metadata" button should itself be a BagIt Bag, rather than containing a BagIt Bag (as a ZIP) within the downloaded ZIP. Best,
Dear @tilfischer, Thank you for clarifying point 4. I now understand the distinction.
The "download metadata" and "download metadata + data" functions serve as user-friendly tools to retrieve data efficiently. The former provides the metadata in XLSX format, while the latter includes not only the metadata Excel file but also a "data list" file and the data itself. The system packages the information into a ZIP file for convenience, but it's important to note that this ZIP is not in BagIt format. You can find more information at link. Thank you. Best regards,
Dear Claire, Unfortunately, I must disagree. The ZIP users get when selecting "download data + metadata" does not include the metadata in XLSX format. It does include a converter.json within the BagIt Bag inside the downloaded dataset. One of my suggestions is to put everything into a single BagIt Bag, replacing the current implementation, which has a BagIt Bag within the downloaded dataset. Best,
Dear @tilfischer, Thank you for pointing this out. Upon testing the function, I confirmed that the metadata Excel file is missing when accessed without logging in; the feature functions correctly only when the user is logged in, as described in the documentation. This discrepancy is certainly an issue, and we will address it promptly to ensure consistent functionality regardless of user login status. Thank you once again for bringing it to our attention. Best regards,
Dear @tilfischer, We're pleased to inform you that the missing metadata Excel file issue has been resolved. The "download data + metadata" function now works properly whether the user is logged in or not, as described in the documentation. Thank you once again for your help in improving the system. Best regards,
Dear Claire, Thank you! What is the status of the other points mentioned above? The first two are of minor importance (at least to me), but points 3 and 4 should be taken into account. In short: the folder structure needs to be fixed, and all data needs to move into the Bag, so that users download not a dataset that contains a Bag but a single Bag that contains all data (with the metadata in a separate folder). Best,
Dear all,
This is also connected to #10, #32 and #98.
Users on, e.g., https://dx.doi.org/10.14272/UVLHGRADYISRGZ-UHFFFAOYSA-N/CHMO0000025.18 may select the blue button called "download data + metadata". The data is downloaded as a ZIP. This ZIP contains the data shown in the pop-up modal of the corresponding dataset of a CV analysis, as well as a dataset_description.txt and another ZIP.
The additional ZIP seems to be a BagIt Bag: it has two folders, "data" and "metadata", as well as bagit.txt, manifest-sha256.txt and manifest-sha512.txt .
The BagIt specification v0.97 is available here: https://www.digitalpreservation.gov/documents/bagitspec.pdf .
On PDF page 7 it is stated, "A bag MUST NOT contain more than one payload manifest for a particular bag checksum algorithm". To my knowledge, SHA256 and SHA512 are two variants of the SHA-2 algorithm, i.e. one of the manifest-shaXXX.txt files would need to be removed to be compliant with the specification. Edit: The subsequent BagIt specification v1.0 also states, "A bag can have more than one payload manifest, with each using a different checksum algorithm."
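To make the manifest question concrete, here is a minimal sketch of how the payload manifests of a Bag could be listed and checked. It assumes that each manifest line has the form `<checksum> <path relative to the bag root>` and that the algorithm name in the file name (e.g. `sha256`, `sha512`) is accepted by Python's `hashlib`. The function names are hypothetical and not part of Chemotion, and the sketch ignores percent-encoded path names.

```python
import hashlib
from pathlib import Path


def payload_algorithms(bag_root: Path) -> list[str]:
    """List the algorithms for which the bag carries a payload manifest."""
    # "manifest-*.txt" does not match "tagmanifest-*.txt", so only payload
    # manifests are found here.
    return sorted(
        p.stem.removeprefix("manifest-") for p in bag_root.glob("manifest-*.txt")
    )


def verify_payload_manifest(bag_root: Path, algorithm: str) -> list[str]:
    """Return the payload paths whose checksum differs from the manifest entry."""
    manifest = bag_root / f"manifest-{algorithm}.txt"
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Assumed line format: "<checksum> <relative path>".
        expected, rel_path = line.split(maxsplit=1)
        data = (bag_root / rel_path).read_bytes()
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

With this reading, manifest-sha256.txt and manifest-sha512.txt are two manifests for two differently named algorithms, so `payload_algorithms` would report both and each can be verified on its own. A file listed in a manifest but missing from the Bag raises `FileNotFoundError` in this sketch rather than being reported as a mismatch.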
Unlike other BagIt implementations I know, the Bag does not contain a bag-info.txt or a tagmanifest-md5.txt . Both are optional according to the BagIt specification, but they could be added so that checksums also exist for bagit.txt and the manifest-shaXXX.txt files.
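As a rough illustration of what adding these two optional files could involve, here is a sketch that writes a bag-info.txt from caller-supplied fields and a tag manifest covering the top-level files of the Bag. The function names are hypothetical, the default algorithm `sha256` is my choice for the example (the file mentioned above uses md5), and the sketch only covers files directly in the Bag root.

```python
import hashlib
from pathlib import Path


def write_bag_info(bag_root: Path, fields: dict[str, str]) -> Path:
    """Write bag-info.txt as simple "Label: value" lines."""
    target = bag_root / "bag-info.txt"
    text = "".join(f"{label}: {value}\n" for label, value in fields.items())
    target.write_text(text, encoding="utf-8")
    return target


def write_tag_manifest(bag_root: Path, algorithm: str = "sha256") -> Path:
    """Write tagmanifest-<algorithm>.txt for the top-level files of the bag."""
    lines = []
    for path in sorted(bag_root.iterdir()):
        # Skip directories (e.g. the payload) and tag manifests themselves.
        if path.is_file() and not path.name.startswith("tagmanifest-"):
            digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.name}")
    target = bag_root / f"tagmanifest-{algorithm}.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Calling `write_bag_info(bag_root, {"Bagging-Date": "2024-01-01"})` followed by `write_tag_manifest(bag_root)` would then give checksums for bagit.txt, bag-info.txt and the payload manifests. The tag manifest has to be written last, since it depends on the final content of the other files.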
The Bag contains the folders "data" and "metadata". The "data" folder contains the payload files. According to the BagIt specification, other optional folders in the Bag root may contain optional tag files, which must adhere to the text tag file format described in the specification and must have the extension ".txt" (PDF page 11). I think this is not what Chemotion wants to do, i.e. the "metadata" folder should be moved into the "data" (payload) folder. The "data" folder could then have the subfolders "dataset" and "metadata". Another option for naming the latter is to use the same wording as RADAR, which would be "descriptive-md" (they also have technical MD).
The data in the ZIP downloaded from https://dx.doi.org/10.14272/UVLHGRADYISRGZ-UHFFFAOYSA-N/CHMO0000025.18 via the blue "download data + metadata" button (but not the BagIt ZIP inside that ZIP) should be included in the "dataset". Edit: It should be included in the /data/dataset/ folder.
The DataCite Metadata e.g. of https://dx.doi.org/10.14272/UVLHGRADYISRGZ-UHFFFAOYSA-N/CHMO0000025.18 should also be added to the /data/metadata/ folder in the Bag.
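The layout proposed in the last three points could be sketched as follows. This is only an illustration of the suggested folder structure, not the current Chemotion implementation: the function name is hypothetical, and the folder names `dataset` and `metadata` are the ones suggested above.

```python
from pathlib import Path


def create_bag_skeleton(bag_root: Path) -> None:
    """Create an empty bag with the proposed payload subfolders."""
    # Proposed payload layout: both the dataset files and the metadata
    # (e.g. the DataCite metadata) live below data/.
    (bag_root / "data" / "dataset").mkdir(parents=True, exist_ok=True)
    (bag_root / "data" / "metadata").mkdir(parents=True, exist_ok=True)
    # bagit.txt declaration for BagIt v1.0 (RFC 8493).
    (bag_root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
```

The dataset files and the metadata would then be copied into `data/dataset/` and `data/metadata/`, and the payload manifests generated afterwards, so that everything the user downloads is covered by a checksum.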
At some later point in time, BagIt might also be combined with RO-Crate by adding an RO-Crate (based on Schema.org metadata) to the Bag, see: https://www.researchobject.org/ro-crate/1.1/appendix/implementation-notes.html#adding-ro-crate-to-bagit . Currently, not everything can be described with this type of metadata. Until we are there, Chemotion could provide an update to the current BagIt implementation.
Best,
Tillmann
Edit: Found BagIt specification v1.0 i.e. RFC-8493 and linked this below.