Make validation more explicit #125

loleg · 2023-03-20T19:59:13Z

None of the documentation mentions when validation happens, though it's a key value add of this package. I would also suggest adding a native call to show the issues from the validator. At least a code snippet like this could be a useful example to tell people how to use readr to check their Data Package:

library(frictionless) # https://docs.ropensci.org/frictionless/ 
library(readr) # https://readr.tidyverse.org/reference/problems.html

# Read a Data Package
my_package <- read_package("datapackage.json")

# Read and validate the resource
my_resource <- read_resource(my_package, "my-package-csv")

# Use tidyverse/readr to explain the problem with the data
problems(my_resource)

peterdesmet · 2023-03-20T21:27:21Z

Thanks for the suggestion. Can you clarify what you mean by validation? This R package does not offer full validation like frictionless-py.

That said, it is possible to get warnings when reading a resource that doesn’t match the provided table schema. Note that:

Column names are taken from the provided Table Schema (schema), not from the header in the CSV file(s).

For example, a schema might have 3 fields defined, while the data has only 2 columns. This mismatch can cause readr (and therefore this package) to return a warning. Is this the validation you refer to?

loleg · 2023-03-20T22:25:08Z

Yes, exactly. Thanks. It would be good to also point out the difference, and any reasons (if it's by design) why a validate function is not available.

peterdesmet · 2023-03-21T08:12:26Z

I assume that with “point out the difference”, you mean the difference between this R package and frictionless-py.

1. You are right that it (“this package does not offer validation”) could be clarified in the package description.
2. The reason I didn’t implement a validate function is that it is daunting. 😊 It covers a lot of aspects and requires skills I do not have to make it performant. But if someone is willing, it could be implemented with frictionless-py under the hood.
3. Regarding clarifying the cause of sometimes obscure warnings returned by readr:
   - We could show users an additional helpful message, e.g. tell them it might be caused by a mismatch between schema and data, and that one can inspect it with problems().
   - Actually inspecting whether schema and data align before reading would require more work, because then we have to read the data twice: once to check alignment between header and schema and once to actually read the data (ignoring the header).

I’m in favour of implementing 1 and 3, would that be sufficient?
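
To make the "read twice" idea concrete, here is a minimal base R sketch. It is not how frictionless does it: check_header() is a hypothetical helper, and the file contents and field names are invented for illustration. The first pass reads only the header line, the second reads the data with names taken from the schema.

```r
# Hypothetical helper (not part of frictionless): compare the CSV header
# with the field names declared in the Table Schema.
check_header <- function(path, schema_fields) {
  # First pass: read only the first line of the file and split it on commas.
  header <- scan(path, what = character(), sep = ",", nlines = 1, quiet = TRUE)
  if (!identical(header, schema_fields)) {
    stop(
      "Header and schema do not match.\n",
      "  header: ", paste(header, collapse = ", "), "\n",
      "  schema: ", paste(schema_fields, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Invented example data: two columns in the file.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,a", "2,b"), csv)

# A schema with the same two fields passes silently.
check_header(csv, c("id", "name"))

# A schema with three fields is reported as an error before any data is parsed.
mismatch <- tryCatch(
  check_header(csv, c("id", "name", "value")),
  error = function(e) conditionMessage(e)
)

# Second pass: read the data itself, skipping the header and taking
# column names from the schema.
data <- utils::read.csv(csv, header = FALSE, skip = 1, col.names = c("id", "name"))
```

The cost is one extra file open per resource, but only a single line is read in the first pass.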

Ping @PietrH

peterdesmet · 2023-03-21T14:32:44Z

@loleg I have discussed this in person with @PietrH. We decided the following:

Clarify in package description that validation is not included (Indicate in package description that it does not support validation #128)
We won't offer a validate function
We will leave the default message that readr returns on parsing issues, without clarifying it further, i.e.

One or more parsing issues, call problems() on your data frame for details

We will include problems() in the namespace, so that the user can use it without loading readr (Include problems() as part of namespace #129)
We will explicitly compare schema and header and return an error on mismatches (Compare header and schema #127). That way parsing issues will be restricted to data type incompatibilities, not schema/header mismatches.
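
For illustration, the kind of parsing issue that would remain is a data type incompatibility. The sketch below uses readr directly rather than read_resource(); the file contents, column names and column types are invented, and it assumes readr is installed.

```r
library(readr)

# Invented example data: the second data row has text in a column
# that the (assumed) schema declares as integer.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,abc"), csv)

# Skip the header, take column names from the schema and parse both
# columns as integers. readr warns about parsing issues here; the
# warning is suppressed so the details can be inspected below.
df <- suppressWarnings(
  read_csv(csv, col_names = c("id", "value"), col_types = "ii", skip = 1)
)

# problems() lists each value that could not be parsed as declared.
issues <- problems(df)
print(issues)
```

The value that failed to parse becomes NA in the data frame, and problems() reports where it occurred and what was expected.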

loleg · 2023-05-03T23:42:36Z

Sorry for the slow reply. That looks to me like a more than reasonable proposal! Anything I could help with?

peterdesmet · 2023-05-04T07:51:03Z

Thanks, I will close this overarching issue. The remaining subtasks are part of https://github.com/frictionlessdata/frictionless-r/milestone/4 and will (with the already implemented changes) be included in version 1.1.0.

@PietrH and I will probably get to that during the summer, but you are welcome to tackle #127 if you want.

peterdesmet added this to the 1.1.0 milestone Mar 21, 2023

peterdesmet closed this as completed May 4, 2023
