Improve wording describing what happens when you switch from Row to Record view in OpenRefine #329
Comments
Sure, I'm a bit confused, too. The opening paragraph reads: The second sentence tells me a row is a record. (!) Then, a record is a record to which many rows may be assigned. Perhaps pull from the documentation: "A row is a series of cells, related horizontally." Sharing the example of Show: Actor: Roles might be wise, unless a FRBR version: Work: Version: Manifestation, like Othello: Film/play/TV series: directors or date. The key seems to be that Records view allows some additional subgrouping (filtering?) but only a little, so splitting the cells to get Tidy Data would be better practice. Also, there's a risk in Records row of deleting data when removing an empty cell's whole row (that is, row in both visual and OR terms?). Does part of this go under Transformation (later)?
I definitely see what you mean! But ... essentially this is true. To be clear: if we have a data set formatted like:

| Article | Author 1 | Author 2 |
|---|---|---|
| The Fisher Thermodynamics of Quasi-Probabilities | Flavia Pennini | Angelo Plastino |
| Aflatoxin Contamination of the Milk Supply | Naveed Aslam | Peter C. Wynn |

Then each row represents a single article metadata record - these are two separate articles being described, one row each. The downside is that the author information is split across multiple columns, and if we encounter an article with 3 (or 4 or 5 etc.) authors, we'll need to add a new column for every additional author. However, if we lay out the same data like this:

| Article | Authors |
|---|---|
| The Fisher Thermodynamics of Quasi-Probabilities | Flavia Pennini |
| | Angelo Plastino |
| Aflatoxin Contamination of the Milk Supply | Naveed Aslam |
| | Peter C. Wynn |

Now each article metadata record takes up multiple rows. It's 2 rows for each here, but if we had an article with more authors we'd just add the extra rows for that particular article's metadata - and it keeps all the author data in a single column.

However, in a spreadsheet (and in OpenRefine Rows mode), while the second format makes sense to our eyes (maybe), the software has no idea that the two (or more) rows are connected - so an operation like a sort on the author column would reorder the rows with no care that each group of rows representing a single article metadata record should be kept together.

This is where OpenRefine Records mode comes in. When you switch to Records mode, OpenRefine will interpret these multiple rows as being part of the same single record - and so will keep them together at all times. This way you get the advantages of the simpler layout, with all the author data in a single column, without losing the ability to keep all the data for a single article together.

NB it's not just sort that's affected - we can manipulate the Record in a variety of ways, but sort is a simple example of why it's important that OpenRefine treats the group of rows as a single record.

Does that make sense?
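To make the Rows vs Records difference concrete, here is a rough Python sketch. It is not OpenRefine code. It assumes a record starts at any row whose first cell is non-blank, and that rows with a blank first cell belong to the record above. The function name `group_into_records` and the sort keys are made up for illustration, and OpenRefine's own sorting rules in Records mode are not modelled here.

```python
from typing import List, Optional

Row = List[Optional[str]]


def group_into_records(rows: List[Row]) -> List[List[Row]]:
    """Group rows into records (hypothetical helper, not an OpenRefine API).

    Assumption: a non-blank first cell starts a new record; a blank first
    cell attaches the row to the record above it.
    """
    records: List[List[Row]] = []
    for row in rows:
        if row[0] or not records:
            records.append([row])      # first cell filled: new record
        else:
            records[-1].append(row)    # first cell blank: same record
    return records


rows: List[Row] = [
    ["The Fisher Thermodynamics of Quasi-Probabilities", "Flavia Pennini"],
    [None, "Angelo Plastino"],
    ["Aflatoxin Contamination of the Milk Supply", "Naveed Aslam"],
    [None, "Peter C. Wynn"],
]

# "Rows mode" style: every row is sorted on its own by author, so
# Angelo Plastino is separated from the article he belongs to.
rows_sorted = sorted(rows, key=lambda r: r[1])

# "Records mode" style: whole records are reordered (here in descending
# order of the first author, purely for illustration) and the rows
# inside each record stay together.
records = group_into_records(rows)
records_sorted = sorted(records, key=lambda rec: rec[0][1], reverse=True)

for record in records_sorted:
    for title, author in record:
        print(f"{title or '':<50} {author}")
```

In the row-by-row sort the author-only rows drift away from their titles, whereas in the record-wise sort each article keeps both of its author rows.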
Your example helps me put on computer lenses instead of human goggles,
which is useful to help me remember I don't need to understand it, but the
computer does (harking back to tidy data). Should we format that into the
paragraph?
…On Thu, Nov 23, 2023, 5:43 AM Owen Stephens ***@***.***> wrote:
The second sentence tells me a row is a record. (!)
I definitely see what you mean! But ... essentially this is true. To be
clear. If we have a row like:
| Article | Author 1 | Author 2 |
|---|---|---|
| The Fisher Thermodynamics of Quasi-Probabilities | Flavia Pennini | Angelo Plastino |
| Aflatoxin Contamination of the Milk Supply | Naveed Aslam | Peter C. Wynn |
Then each row represents a single article metadata record - these are two
separate articles being described, one row each.
However if we layout the same data like this:
| Article | Authors |
|---|---|
| The Fisher Thermodynamics of Quasi-Probabilities | Flavia Pennini |
| | Angelo Plastino |
| Aflatoxin Contamination of the Milk Supply | Naveed Aslam |
| | Peter C. Wynn |
Now each article metadata record takes up multiple rows (it's 2 for each
here, but if we had more authors we'd need more rows per record).
In a spreadsheet (and in OpenRefine Rows mode) using the second format
makes sense to our eyes (maybe) but the software has no idea that the two
(or more) rows are connected - so an operation like a sort on the author
column would reorder the rows with no care that each group of rows
representing a single article metadata record should be kept together.
However, in OpenRefine Records mode, OpenRefine will understand that these
grouped rows are part of the same record - and so will keep them together
at all times.
Does that make sense?
@jas58 I'll make a PR based on what I've written above - I think I can probably still make some improvements! Once I've got a PR ready I'll ask you to review it, and you can check that it both makes sense and is helpful!
With your closed PR, does that mean this issue is also closed? @ostephens If not, which element should I edit into the final checkbox? This seems related to issue 264 about rows expanding?
Discussed in call on 9th Feb. Potentially an exercise using blank down/fill down could be added, but there is concern that this will overload the learners early in the session.
Starting notes to open later:
Whereas in Records mode, sometimes it does not look different.
Need to create library specific tidy data demo graphic (well sorted and jumbled tidy data of MARC record (multi author or subj)).
How to say record, row in data vs OpenRefine: "the word line?" "horizontal group?"
And the nice part is, you haven't ruined the original.
Instructor note: if you catch yourself saying row when you mean record, please stop and restart the whole because a quick switch is gobsmackingly confusing to the new learner.
How to format a table in markdown: https://carpentries.github.io/sandpaper-docs/episodes.html#tables
How could the content be improved?
The wording under the images illustrating the difference between row and record layout doesn't currently make complete sense (as I read it). It says:
I think this needs re-writing as I can't currently understand what it means. I think it needs to be more clearly linked to the description of what a Row is vs what a Record is in OpenRefine so that this is much clearer overall
Which part of the content does your suggestion apply to?
https://librarycarpentry.org/lc-open-refine/03-working-with-data.html#rows-and-records