Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "AIPs by file format" and "AIPs by PUID" reports #76

Merged
merged 1 commit into from
Nov 30, 2020

Conversation

tw4l
Copy link
Contributor

@tw4l tw4l commented Oct 15, 2020

Connected to #75

This PR adds reports that take a file format or PUID as input and provide a sorted list of AIPs containing that format/PUID.

This was intended to show/explore what's possible in terms of format reporting with the data we have from the METS files, so higher-level groupings (e.g. format groups from the FPR) are not included.

Also included is a second commit which simply removes an outdated TODO comment on the largest files report, which didn't seem worth the overhead of a separate issue and PR.

Screenshots

Reports page

image

The dropdown lists are populated from new unique_file_formats and unique_puids properties on the StorageService model, and so only contain formats/PUIDs found in the AIPs in the currently selected Storage Service. It's possible these dropdowns might get a bit unwieldy when AIPscan is pointed to a Storage Service with lots of AIPs containing a heterogeneous mix of formats, with an upper bound of being equivalent to the very large dropdown menus in the FPR.

I'm very open to any suggestions for how to make the UI more appealing or user-friendly!

AIPs by File Format report

image

AIPs by PUID report

image

@tw4l tw4l force-pushed the dev/issue-75-file-format-report branch 12 times, most recently from 0c0f257 to 1ea7cad Compare October 20, 2020 22:19
@tw4l tw4l changed the title WIP: Add "AIPs by file format" and "AIPs by PUID" reports Add "AIPs by file format" and "AIPs by PUID" reports Oct 20, 2020
@tw4l tw4l marked this pull request as ready for review October 20, 2020 22:49
@tw4l
Copy link
Contributor Author

tw4l commented Oct 20, 2020

Converting back to draft for now as there's two areas I'd like to give a little attention before a proper CR:

  1. Adding start date/end date parameters for these reports
  2. Adding pie chart/scatterplot representations

That said, feel free to take a look and leave feedback if you'd like!

@tw4l tw4l marked this pull request as draft October 20, 2020 23:22
@tw4l tw4l force-pushed the dev/issue-75-file-format-report branch 7 times, most recently from 132aee8 to 1ea7cad Compare October 22, 2020 00:49
@tw4l tw4l marked this pull request as ready for review October 22, 2020 00:49
@tw4l
Copy link
Contributor Author

tw4l commented Oct 22, 2020

Hey @ross-spencer and @peterVG. I spent most of today starting to implement the start/end date parameters and a pie chart representation of these reports and came across at least one blocker (integration tests) and many questions that I think might take some time to resolve. So in the end I moved that WIP work to a new branch so that I can continue to work on that without it blocking the tabular reports here, which I think are pretty solid and well-tested.

If you agree that's sensible, this is now ready for re-review.

@ross-spencer ross-spencer requested review from ross-spencer and removed request for ross-spencer October 23, 2020 14:05
@ross-spencer ross-spencer self-assigned this Oct 23, 2020
@tw4l
Copy link
Contributor Author

tw4l commented Nov 17, 2020

Thanks @ross-spencer, looking forward to talking more about this all! Just to answer your last question - I modified the code locally and took screenshots in Firefox 😆 Simple but effective!

@tw4l tw4l force-pushed the dev/issue-75-file-format-report branch 2 times, most recently from bf29624 to 778e924 Compare November 17, 2020 23:34
<link rel="shortcut icon" href="{{ url_for('static', filename='favicon.ico') }}">
<link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='css/jquery-ui.css') }}">
<link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='css/bootstrap.min.css') }}">
<link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='css/custom.css') }}">
Copy link
Contributor

@ross-spencer ross-spencer Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tw4l I was wondering about this as I work on the agents table. I used a custom CSS, say style_agents_report.css. This grants some flexibility around the amount of customization I can do in that particular grouping of reports. Minor stuff, but it also leaves out the ambiguity of having to check whether a style is being used in more than one report.

Does that make sense to you? And if so, can we split base.html into two? (although I am not sure how that would with the need for a </head> end tag for example.

Or should I perhaps use custom.css and just use comments to separate sections, but accept that stylesheet might grow over time?

It looks like I am coming to a pause on this one tonight, so will get back to it tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ross-spencer - this is a good question!

We could use a single custom.css as you say, but another option might be to use an additional block in order to add custom CSS per report.

Looking at Archivematica as an example, we could do something like this in the report_base.html template: https://github.com/artefactual/archivematica/blob/stable/1.12.x/src/dashboard/src/templates/layout.html#L14

And then in our reports templates, add the additional CSS like this: https://github.com/artefactual/archivematica/blob/stable/1.12.x/src/dashboard/src/templates/archival_storage/archival_storage.html#L19-L24

Those are Django templates and not jinja2, of course, but I think it should work the same way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing the same for the jQuery/JS sprinkled into the templates might be a nice way to cleanly separate that out as well.

@tw4l tw4l force-pushed the dev/issue-75-file-format-report branch 3 times, most recently from 271c2cb to 7923637 Compare November 18, 2020 22:08
@tw4l tw4l force-pushed the dev/issue-75-file-format-report branch 6 times, most recently from e2f6d8c to 2bdd3a9 Compare November 24, 2020 19:38
@tw4l
Copy link
Contributor Author

tw4l commented Nov 24, 2020

Hey @ross-spencer - long time in the coming but this is now ready for re-review. I think I've addressed all of your comments and questions but please let me know if there's something I missed.

On the reports UI, I did end up implementing something close to my second option above for now:

aipscan_reports_with_puids

In many cases we don't have PUIDs for preservation derivatives (they seem to be present in the METS only when people input them during manual normalization workflows?) so for many users that screen will likely look more like this:

aipscan_reports_no_puids

I also added some helpful messages to display in place of the reports when there are no Storage Services configured, or a Fetch Job hasn't been run yet for the selected Storage Service.

Other changes include:

  • Reduced duplication between aips_by_file_format and aips_by_puid Data endpoints
  • Removing error handling for missing parameters in the API endpoints (they now return an empty list instead)
  • Adding an original_files parameter (with default True) for the reports
  • Improved test coverage, using our new ability to set test state in a fixture (and adding one new such fixture to the Data blueprint's tests module)
  • Better _get_storage_service logic in the Data and Reporter blueprints (I noticed that the previous logic for the former made it possible for the endpoints to return false/inconsistent data, so patched that up while I was there)
  • Better use of constants, and in the case of request parameters, a new request_params shared dict
  • Better exception handling for AttributeErrrors when attempting to access the new StorageService model properties in the reports view.
  • Wrapping natsort in a more descriptive helper function for naturally sorting PUIDs.
  • Making the namespace_data API routes consistently semantic - e.g. /storage_service/<storage_service_id>/file-format rather than file-format/<storage_service_id>

Not included in this piece of work is non-tabular reports and start/end data parameters, which as noted before I've intentionally left out for now and will address in a separate PR so they don't hold up this work.

Thanks!

@ross-spencer
Copy link
Contributor

Awesome @tw4l I anticipate reviewing this today - in-fact - I plan to not leave my desk until review is done!

Copy link
Contributor

@ross-spencer ross-spencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tw4l this is looking really great. Perhaps it will be prudent to check what has happened to the alert around the yet to be implemented Format version count report. Perhaps that alert code can be removed entirely now the page isn't so bear?

Otherwise I have left some comments that might also be worth getting your opinion on. Possibly the only one that might result in a change is around the way file formats are grouped for searching on. This is very much related to PRONOM's imperfections and I personally would love to figure out with y'all how to make the UI as clear as possible about what these names mean, i.e. they're not quite distinct file formats, and they're not quite format families either. It's just a grouping by name, and might end up being a mix of those two things.

Related, I think there might be one issue worth logging as a follow-up about an endpoint for those format names. So a workflow for a user of the API might be:

  1. Query format name groups.
  2. Query for the report-like statistics based on that filter.

Around the storage service ID/endpoints I have suggested an issue be logged about discovering storage service resource information. Specifically the ID. The intention of the previous code was to always return something, and I agree the result might be more unintuitive than perhaps unpredictable, but not necessarily desirable. We should be able to ask for correct information and get correct information and likewise ask for bad information and get nothing back as it's a bit "undefined".

Just some miscellaneous observations:

  • It's an important PR with some good examples demonstrating how to stabilize this work.
  • Docstrings are looking good. I don't know if you've seen my read the docs PR but they'll lend themselves well to that.
  • Some really good examples of "clean code" here.

AIPscan/Aggregator/views.py Show resolved Hide resolved
AIPscan/Data/data.py Show resolved Hide resolved
AIPscan/Data/data.py Show resolved Hide resolved
AIPscan/API/namespace_data.py Outdated Show resolved Hide resolved
AIPscan/API/namespace_data.py Show resolved Hide resolved
AIPscan/Data/data.py Show resolved Hide resolved
AIPscan/Data/data.py Outdated Show resolved Hide resolved
AIPscan/Reporter/templates/reports.html Show resolved Hide resolved
AIPscan/Reporter/templates/reports.html Show resolved Hide resolved
AIPscan/Data/tests/test_aips_by_format.py Show resolved Hide resolved
@tw4l
Copy link
Contributor Author

tw4l commented Nov 27, 2020

@ross-spencer Thank you for the review and the encouragement! This has been a long and involved PR for both of us, but in the end what's getting merged in is much improved by your careful attention, questions, and insights 😄

I believe that the remaining issues we've agreed to change in this PR have now been fixed in the previous four commits.

In addition, I've created several new issues based on your comments and our discussions in this PR:

@tw4l tw4l force-pushed the dev/issue-75-file-format-report branch from 24b3dc3 to 01988e3 Compare November 27, 2020 00:17
tw4l added a commit that referenced this pull request Nov 27, 2020
This commit temporarily adds some elements from PR #76 - namely, the
request_params dict and report_base.html template. This commit can be
deleted once that PR is merged and this branch can be rebased.
Copy link
Contributor

@ross-spencer ross-spencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superb, thanks Tessa, and likewise, I appreciate your patience, communication and perseverance! This is looking great!

One last thing to say...

LGTM!

LGTM

This commit adds reports that take a file format name or PUID as input
(to specify a file format or file format version, respectively) and
return information about which AIPs contain files in that format or
format version, as well as their count and size per AIP.

Additionally, this commit reworks the _get_storage_service logic in the
Data and Reporter blueprints to ensure that there are not discrepencies
between the Storage Service name and data returned by Data endpoints.
In Reporter, this commit adds fallback logic similar to what was
previously found in Data. If an invalid Storage Service ID is passed to
the reports view via request parameters, the new helper function will
return one of (in order of preference):
- The requested Storage Service
- The first default Storage Service
- The first Storage Service
- None

This ensures that a valid Storage Service instance is passed along to
the template if one is available. Since it's clearly indicated in the
template UI which Storage Service is currently selected, this component
does not share the issues identified with the previous approach within
the Data blueprint.

Finally, this commit standardizes request parameters in the Reporter
blueprint by adding a shared request_params dictionary and revising
some of the parameters used in the report template's JavaScript to
remove discrepencies.
@tw4l tw4l force-pushed the dev/issue-75-file-format-report branch from 01988e3 to 91a8237 Compare November 30, 2020 15:55
@tw4l tw4l merged commit 91a8237 into main Nov 30, 2020
@tw4l tw4l deleted the dev/issue-75-file-format-report branch November 30, 2020 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants