New feature set #276

dizyart · 2023-04-11T10:42:28Z


Hello Hakan,

Over the last three years I've been developing a very rich Excel upload feature for a project I was working on.
It is very similar to POIJI, and I am currently looking into migrating our solution to POIJI, because it does all the right things perfectly! (Congrats, btw! 😁)
I would like to propose, and potentially contribute, a few extensions to POIJI that would enable us to use the lib:

Handle consumer return types

The consumer interface indeed makes handling large files possible. The problem is that the code handling each DTO usually produces an object of some other type, and a consumer has no way to return it. I would propose a way to handle that result, either via callbacks or by collecting the results and returning them.
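
To make the "collecting and returning" variant concrete, here is a minimal sketch in plain Java. `CollectingConsumer` is a hypothetical name and is not part of POIJI; it only assumes that the library keeps accepting a `java.util.function.Consumer` for each row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical adapter (not a POIJI class): lets a row handler return a value. */
final class CollectingConsumer<T, R> implements Consumer<T> {
    private final Function<? super T, ? extends R> handler;
    private final List<R> results = new ArrayList<>();

    CollectingConsumer(Function<? super T, ? extends R> handler) {
        this.handler = handler;
    }

    /** Called once per parsed row; stores whatever the handler returns. */
    @Override
    public void accept(T row) {
        results.add(handler.apply(row));
    }

    /** Returns a copy of the results collected so far, in row order. */
    List<R> results() {
        return new ArrayList<>(results);
    }
}
```

The adapter would be passed wherever a plain consumer is expected, and `results()` read once parsing is done. For very large files a callback per result is probably the better fit, since collecting keeps every result in memory.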

Provide custom parsers per type

I would propose that the PoijiOptions builder provide a method for registering a parser per field type, which would convert a String into the specific type required by the POJO field.
Here's a target usage example:

class ArticleDto {
    @ExcelCellName("created by")
    private Person creator; // Person is the custom field type
    @ExcelCellName("reviewed by")
    private Person reviewer;
}

// Proposed API: addParser is the method being suggested here.
// Add a custom-type parser to POIJI:
PoijiOptionsBuilder.settings().addParser(Person.class, (String value) -> {
    if (value.isBlank()) {
        return Person.UNKNOWN;
    }
    return new Person(value);
});
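
For illustration, the lookup behind such an `addParser` method could be as small as the sketch below. `ParserRegistry` is a hypothetical, standalone class and says nothing about how POIJI does its casting internally; it only shows the type-to-function mapping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical registry (not a POIJI class) mapping a field type to a String parser. */
final class ParserRegistry {
    private final Map<Class<?>, Function<String, ?>> parsers = new HashMap<>();

    /** Registers the parser used for fields of the given type. */
    <T> ParserRegistry addParser(Class<T> type, Function<String, ? extends T> parser) {
        parsers.put(type, parser);
        return this;
    }

    /** Parses the raw cell text, or returns empty when no parser is registered. */
    <T> Optional<T> parse(Class<T> type, String raw) {
        Function<String, ?> parser = parsers.get(type);
        if (parser == null) {
            return Optional.empty();
        }
        // The cast is safe because addParser ties the parser's result type to the key.
        return Optional.ofNullable(type.cast(parser.apply(raw)));
    }
}
```

An empty result would tell the library to fall back to its built-in conversion for that type.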

Strict validation mode

If I understand the source code correctly, when a value cannot be parsed or cast, POIJI fails silently and sets the field to its default value. In our use case, when a user provides an invalid or unexpected value in the Excel file, we are supposed to abort the import and report the error(s) to the user. If the system does not reject incorrect values in a very large upload, the user assumes that the provided values were persisted, when in reality they were converted to defaults. This can have critical side effects, as the user might be unaware that some values are incorrect.
I would propose a new "strict validation" option in the settings, perhaps letting the developer choose between "fail on field" and "fail on record": either trigger an error at the first failed field, or attempt to parse all fields, collect all errors, and trigger a single "record" failure.
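
A rough sketch of the two modes, reduced to integer cells so that it stays self-contained. All names here (`StrictMode`, `RecordParseException`, `StrictIntegerRowParser`) are made up for the example and do not exist in POIJI.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical option values for the proposed strict validation mode. */
enum StrictMode { FAIL_ON_FIELD, FAIL_ON_RECORD }

/** Hypothetical exception carrying every error found in one record. */
final class RecordParseException extends RuntimeException {
    private final List<String> errors;

    RecordParseException(List<String> errors) {
        super(String.join("; ", errors));
        this.errors = new ArrayList<>(errors);
    }

    List<String> errors() {
        return errors;
    }
}

/** Parses every cell of a row as an integer and never falls back to a default. */
final class StrictIntegerRowParser {
    private final StrictMode mode;

    StrictIntegerRowParser(StrictMode mode) {
        this.mode = mode;
    }

    Map<String, Integer> parse(Map<String, String> cells) {
        Map<String, Integer> parsed = new LinkedHashMap<>();
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            try {
                parsed.put(cell.getKey(), Integer.valueOf(cell.getValue().trim()));
            } catch (NumberFormatException e) {
                errors.add("column '" + cell.getKey() + "': invalid value '" + cell.getValue() + "'");
                if (mode == StrictMode.FAIL_ON_FIELD) {
                    throw new RecordParseException(errors); // stop at the first bad field
                }
            }
        }
        if (!errors.isEmpty()) {
            throw new RecordParseException(errors); // one failure listing all bad fields
        }
        return parsed;
    }
}
```

With "fail on record", the user gets the complete list of problems for a row in one pass instead of fixing and re-uploading one cell at a time.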

Provide observer interface

Provide an option to attach one or more observers to the parser. The observers would be called at certain points in the parsing process, allowing the developer to plug in custom functionality separately from the "main" consumer.
The interface could have callbacks such as:

onHeaders - called when headers are resolved
onBeforeRecord - called before the record is passed to the consumer
onAfterRecord - called after the consumer has been invoked
onException - called when the parser (or consumer) has thrown
onComplete - called after all records are parsed

This would allow the developer to write custom functionality for handling imports, such as validating headers, tracking progress, cancelling long-running imports, context-based validation, etc.
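
As a starting point for discussion, the callbacks listed above could look like the interface below. The method names come from the list; the signatures, the `ObservedImport` driver, and the choice to continue with the next record after an exception are assumptions made for this sketch, not existing POIJI behaviour.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical observer; every callback is optional thanks to the default methods. */
interface ImportObserver<T> {
    default void onHeaders(List<String> headers) {}
    default void onBeforeRecord(T record) {}
    default void onAfterRecord(T record) {}
    default void onException(Exception error) {}
    default void onComplete(int recordCount) {}
}

/** Toy driver that shows the order in which the callbacks would fire. */
final class ObservedImport {
    static <T> void run(List<String> headers, List<T> records,
                        Consumer<? super T> consumer, List<ImportObserver<T>> observers) {
        for (ImportObserver<T> o : observers) o.onHeaders(headers);
        int processed = 0;
        for (T record : records) {
            try {
                for (ImportObserver<T> o : observers) o.onBeforeRecord(record);
                consumer.accept(record);
                for (ImportObserver<T> o : observers) o.onAfterRecord(record);
                processed++;
            } catch (RuntimeException e) {
                // Assumption: report the failure and move on to the next record.
                for (ImportObserver<T> o : observers) o.onException(e);
            }
        }
        for (ImportObserver<T> o : observers) o.onComplete(processed);
    }
}
```

Progress tracking, for example, would only need to override `onAfterRecord` and increment a counter.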

Skipping empty rows

As already mentioned in another thread, if a row in Excel is completely empty, it makes no sense to parse it.
I would propose adding a "skip empty rows" option. This should also cover empty rows around the header: e.g. if the header appears in the 3rd row and the data begins in the 10th row, the developer cannot know this in advance and skip those rows specifically.
With the "skip empty rows" option enabled, the consumer would only receive a POJO when the Excel row actually contains data.
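
The check itself is simple. The sketch below models a row as a list of cell strings to stay independent of POI; `EmptyRows` is a hypothetical helper, and treating whitespace-only cells as empty is an assumption that would need to be agreed on.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not a POIJI class) for the "skip empty rows" option. */
final class EmptyRows {
    /** A row counts as empty when it has no cells or every cell is null or blank. */
    static boolean isEmpty(List<String> cells) {
        return cells == null
                || cells.stream().allMatch(c -> c == null || c.trim().isEmpty());
    }

    /** Keeps only the rows that contain at least one non-blank cell. */
    static List<List<String>> skipEmpty(List<List<String>> rows) {
        return rows.stream().filter(r -> !isEmpty(r)).collect(Collectors.toList());
    }
}
```

Applied before header detection, the first remaining row would be the header no matter how many blank rows precede it.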

Background

We have a solution that converts Excel files to DTOs and handles long-running imports, e.g. files with 20.000 records x 180 columns, with many custom mappings, validations and business rules applied to each record (an import takes around 2-3 minutes).
Our solution also has some very specific features, like error aggregation, unique-values validation, required column order, transaction management, pausing imports in conflicting transactions, handling return types, idempotent updates (skipping a record if it results in a no-change operation), import progress tracking (e.g. 758 / 20.000), performance observability (logging and metrics), cancelling an import, etc., but all those features could be satisfied by the Observer interface.
We have spent years developing and optimizing the solution (using POI and stream reading), and I think we could provide some valuable insights and learnings and share them with the world via POIJI.

If you would agree, what I would do next is create some proof-of-concept PRs here on GitHub, and continue discussions there per each topic separately. Would that be OK for you?

Best regards, Vanja

P.S.: Thank you for the lib, it is a real pleasure reading the source!

ozlerhakan (Maintainer) · 2023-04-11T20:24:24Z

Hi @dizyart ,

First off, thank you for your kind words :)

I strongly encourage you to join the Poiji community to make the goals above possible, in the short or long term.

Regarding your ideas:

I would need more details about "Handle consumer return types".
I love the "Provide custom parsers per type" approach.
About "Strict validation mode": have you considered our optional/mandatory headers and cells feature (https://github.com/ozlerhakan/poiji#optional-mandatory-headers-and-cells)? We use it as a way of validating cells. Does it make sense to you?
"Provide observer interface": at first sight, this might require an intricate implementation, so we should align on a fine-grained design for it.
"Skipping empty rows": that makes sense!

> If you would agree, what I would do next is create some proof-of-concept PRs here on GitHub, and continue discussions there per each topic separately. Would that be OK for you?

Please go ahead! Not only will we share insights and knowledge while working on those ideas, but we can also focus on how others can use them more widely.
