Make superset dashboard improvements for user testing #3867

Closed
9 of 13 tasks
Tracked by #3855
bendnorman opened this issue Sep 23, 2024 · 8 comments · Fixed by #3888

@bendnorman
Member

bendnorman commented Sep 23, 2024

Before we do user testing we need to make a few improvements. We'll check in on this work after 5 hours.

High Priority

Bonus

@bendnorman
Member Author

For the welcome dashboard, we want to:

  • Generally have less text
  • Get people looking at and interacting with the data ASAP, something like a quickstart guide
  • Explain the three different ways to interact with the data in Superset: table dashboards, creating charts, and SQL Lab

The flow for welcome dashboard improvements:

  • Have a couple of sentences about what PUDL and Superset are
  • Link to one of the Table Dashboards. This dashboard will explain how to filter and download data
  • Maybe a section that goes into more detail about the table dashboards
  • Section about how to find the data you need in the data dictionary
  • Section on how to create charts
  • Section on how to use SQL Lab
  • FAQ

We can divide these sections up by using tabs.

@bendnorman
Member Author

bendnorman commented Sep 26, 2024

Ugh, some very frustrating findings while trying to iron out the permissions for user testing. When you give a user permission to a datasource, it gives them read access to all the datasets, charts and dashboards that are derived from the datasource, including ones made by other users :/ This poses some UX and privacy concerns for us. I wrote up a discussion in the Superset repo about this issue.

UX: If users can see everything built off PUDL, it will get difficult to navigate the UI. You can filter dashboards and charts by owner. We could tell users to filter for charts owned by themselves and by an admin account that is responsible for creating all the default PUDL charts and dashboards. Superset does expose a way to restrict the owners that appear in the filter dropdowns; however, I don't think it actually restricts the dashboards and charts you see when the lists are unfiltered.

Privacy: all users' first and last names will be available to anyone who registers. Users won't be able to publish dashboards and keep them private.
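To make the permission problem concrete, here is a toy model of the behavior as described in this comment. It is not Superset's actual implementation: the `Asset` class, the helper functions, and user names like `pudl_admin` are all hypothetical. A datasource-level grant exposes everything derived from that datasource, and filtering by owner only tidies up the list view.

```python
from dataclasses import dataclass

# Hypothetical toy model of the behavior described above. The class and
# function names are illustrative only; this is NOT Superset's code.


@dataclass(frozen=True)
class Asset:
    kind: str        # "dataset", "chart", or "dashboard"
    name: str
    owner: str       # owner names are what become visible to other users
    datasource: str  # database/datasource the asset is derived from


def visible_assets(assets, granted_datasources):
    """Datasource-level grant: every asset derived from a granted
    datasource is readable, regardless of who owns it."""
    return [a for a in assets if a.datasource in granted_datasources]


def filter_by_owner(assets, owners):
    """UX workaround: narrow a list view to the given owners. This only
    changes what is displayed; it is not a permission boundary."""
    return [a for a in assets if a.owner in owners]


assets = [
    Asset("dashboard", "Welcome", "pudl_admin", "pudl"),
    Asset("chart", "Alice's chart", "alice", "pudl"),
    Asset("chart", "Bob's draft", "bob", "pudl"),
]

# Alice was granted access to the "pudl" datasource, so Bob's draft
# (and Bob's name as its owner) shows up for her too.
seen_by_alice = visible_assets(assets, {"pudl"})
print([a.name for a in seen_by_alice])

# Filtering to Alice plus the admin account hides Bob's draft from the list,
# but Alice could still remove the filter and see it again.
print([a.name for a in filter_by_owner(seen_by_alice, {"alice", "pudl_admin"})])
```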

Solutions?

  • Just live with it? We could encourage people to filter for dashboards created by us and by themselves, but this isn't ideal from a UX perspective. Maybe people would be comfortable with their name being out there? Maybe having charts and dashboards public amongst registered users could actually serve as helpful examples and knowledge sharing? Also, the views that list dashboards, charts and datasets might not be the most intuitive method for navigating PUDL data. I think using the PUDL Tables tab of the welcome dashboard as the starting point feels more intuitive anyway.
  • Another option is to not give users access to the PUDL database or any of the datasets, but give them access to specific dashboards. This gives them read-only access to the charts associated with those dashboards. Users won't be able to use SQL Lab or create charts. This would be a significantly stripped-down version of Superset where it's just the table dashboards for downloading data. I think we can easily hide the UI elements associated with these extra features so it's clearer.
  • Dig into the Superset code and figure out how to hide other dashboards, charts and datasets, either in the UI or in the actual permissions system.
  • It looks like Superset is considering redesigning its permissions system to allow for more granular control, but there isn't much movement on it.
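The trade-off in the second option can be sketched as another toy model that contrasts a datasource grant with dashboard-only grants. Again, this is a hypothetical illustration of the behavior described in the list above, not Superset's permission API; `Grant`, `readable`, `capabilities`, and the dashboard names are made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical toy model contrasting a datasource grant with the
# "dashboard-only" option above. Not Superset's permission API.


@dataclass
class Grant:
    datasources: set = field(default_factory=set)
    # dashboard name -> names of the charts on that dashboard
    dashboards: dict = field(default_factory=dict)


def readable(grant):
    """Dashboard grants give read-only access to each granted dashboard
    and the charts on it, and nothing else."""
    names = set()
    for dashboard, charts in grant.dashboards.items():
        names.add(dashboard)
        names.update(charts)
    return names


def capabilities(grant):
    """Without datasource access there is no SQL Lab and no chart building."""
    has_datasource = bool(grant.datasources)
    has_dashboards = bool(grant.dashboards)
    return {
        # A datasource grant also exposes dashboards derived from it.
        "view_and_download_from_dashboards": has_dashboards or has_datasource,
        "sql_lab": has_datasource,
        "create_charts": has_datasource,
    }


dashboard_only = Grant(dashboards={"plants_table": ["plants_table_chart"]})
print(sorted(readable(dashboard_only)))
print(capabilities(dashboard_only))
```

The point of the model is that the privacy problem goes away only because the user never touches the datasource, which is exactly what also removes SQL Lab and chart creation.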

@zaneselvans
Member

zaneselvans commented Sep 26, 2024

Hmm, this is unfortunate. How do larger organizations deal with this? Letting everybody see everything all the time seems like chaos?

Only providing access to dashboards seems like we'd be hiding a lot of functionality -- not the functionality we want the CSV folks to see maybe, but still a lot of power and flexibility I think we'd been hoping to expose for folks that want it. It would also mean that we'd only be able to roll out access piecemeal, as we build out appropriate dashboards that eventually come to cover all of PUDL.

My intuition is that it's probably best to make it clear that this is all publicly visible, and do our best to use the UI, tagging, or whatever other organizational tools are available to us to provide direction to the best resources.

Getting into the guts of the Superset code to make it do something we want that's different feels like a recipe for madness.

Separately, I've wondered whether we might be able to offer public access -- with everything you create being visible to everyone else -- as the free option, and then offer the ability to create private dashboards, saved queries, maybe access to other non-PUDL data, etc. as a paid option.

@zaneselvans
Member

Even more separately, I've also been wondering whether there's an extremely simple way that we could provide an in-browser interface that can be pointed at the URL of a Parquet file in S3, display its metadata, and provide a barebones GUI in the vein of Datasette for folks to build a SELECT stuff FROM url WHERE whatever and click a "Download CSV" button -- but have the work of translating the stream of Arrow data from the S3 Parquet file into that CSV be done on the client side, in the browser, so we don't actually have to run any service and can just use the free S3 bucket data.

As an added benefit... this would make every publicly visible Parquet file on the internet accessible to CSV users... which would also include all the datasets at Hugging Face, and a bunch of other stuff in the AWS Open Data Registry.
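The client-side idea boils down to two pieces: turning a few form inputs into a query over a remote Parquet file, and serializing the resulting column-oriented batches into CSV chunk by chunk. The sketch below shows that logic in Python for brevity. It assumes a DuckDB-style engine (using DuckDB's `read_parquet()` table function); in the browser this would presumably be TypeScript driving an in-browser engine such as DuckDB-WASM, which is an assumption since the comment doesn't name one. The function names are hypothetical, plain dicts of lists stand in for Arrow record batches, and the URL is a placeholder.

```python
import csv
import io
from typing import Iterable, Iterator, Mapping, Optional, Sequence

# Hypothetical sketch of the two pieces of the client-side idea:
# (1) turn simple form inputs into a SQL query over a remote Parquet file,
# (2) serialize column-oriented batches (a stand-in for Arrow record
#     batches) into CSV text one chunk at a time.


def build_query(url: str, columns: Sequence[str], where: Optional[str] = None,
                limit: Optional[int] = None) -> str:
    """Build a DuckDB-style query using the read_parquet() table function."""
    cols = ", ".join('"' + c.replace('"', '""') + '"' for c in columns) or "*"
    src = url.replace("'", "''")  # escape quotes inside the SQL string literal
    sql = f"SELECT {cols} FROM read_parquet('{src}')"
    if where:
        sql += f" WHERE {where}"  # passed through verbatim; runs only locally
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def _drain(buf: io.StringIO) -> str:
    """Return everything written so far and reset the buffer."""
    chunk = buf.getvalue()
    buf.seek(0)
    buf.truncate(0)
    return chunk


def batches_to_csv(batches: Iterable[Mapping[str, Sequence]],
                   columns: Sequence[str]) -> Iterator[str]:
    """Yield CSV text one batch at a time so large results can be streamed
    into a download instead of being held in memory all at once."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for batch in batches:
        # Batches are column-oriented; zip() transposes them into rows.
        for row in zip(*(batch[c] for c in columns)):
            writer.writerow(row)
        yield _drain(buf)
    leftover = _drain(buf)  # header only, if there were no batches
    if leftover:
        yield leftover


query = build_query("https://example.com/some_table.parquet",
                    ["plant_id", "state"], "state = 'CO'")
print(query)

# Stand-in for what the query engine would hand back.
batches = [{"plant_id": [1, 2], "state": ["CO", "CO"]}]
print("".join(batches_to_csv(batches, ["plant_id", "state"])), end="")
```

Yielding one chunk per batch is the design choice that matters here: it lets a browser build the download incrementally rather than materializing the whole table as one string.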

Some suggestions from the Hive Mind included:

@jdangerx
Member

jdangerx commented Sep 27, 2024

tl;dr: I think we can make our users live with this permissions thing; we can also try to build our own thing for "download a CSV" use case, and see if that's better than trying to hack it into Superset.

I think "just live with it" is probably OK... we can tell people not to publish things unless they want to have them be public.

> Even more separately, I've also been wondering whether there's an extremely simple way that we could provide an in-browser interface that can be pointed at the URL of a Parquet file in S3, display its metadata, and provide a barebones GUI in the vein of Datasette for folks to build a SELECT stuff FROM url WHERE whatever and click a "Download CSV" button -- but have the work of translating the stream of Arrow data from the S3 Parquet file into that CSV be done on the client side, in the browser, so we don't actually have to run any service and can just use the free S3 bucket data.

I think that makes sense: it really seems like Superset isn't designed for this important use case of "download the data so you can use it with Excel." I'm imagining some sort of frontend-only interface that lives alongside our data dictionary, which lets people preview/filter/download, or click on a link to make custom dashboards/visualizations in Superset.

This would let us stop fighting with Superset to get the "download data" use case, and just let it be what it was meant to be: a data viz and exploration tool.

We also wouldn't have to programmatically create dashboards for every table we want people to be able to filter, and so maybe people won't have as much of a reliance on Superset dashboards, making this permissions thing less of a problem.

@e-belfer
Member

Given that we're on the cusp of actually testing out this tool, I'd like to just ask our users how they feel about this and whether it's a dealbreaker for them when we do the tests, rather than assuming it will be. I don't think mocking up another option is a bad idea at all, but I would like to wait until we see how people respond to our current set-up.

@jdangerx
Member

Yeah, agree that we should do any additional frontend stuff after the first user calls!

@e-belfer
Member

I'm moving the remaining bonus items to #3908 and closing this. Everything absolutely crucial to address prior to user testing has been addressed.

github-project-automation bot moved this from In progress to Done in Catalyst Megaproject on Oct 15, 2024