Replies: 1 comment
-
Hi @dizyart, first off, thank you for your kind words :) I strongly encourage you to join the Poiji community to help make these goals possible, whether in the short or the long term. Regarding your ideas:
Please go ahead! Not only will we share insights and knowledge while working through those ideas, but we can also focus on how others can use them more widely.
-
Hello Hakan,
Over the last 3 years I've been developing a very rich Excel upload feature for a project I was working on.
It is very similar to POIJI and I am currently looking into migrating our solution to POIJI, because it does all the right things perfectly! (Congrats, btw! 😁)
I would propose and potentially contribute a few extensions to POIJI, which would enable us to use the lib:
Handle consumer return types
The consumer interface does make handling large files possible. The problem is that processing each DTO usually produces an object of some other type, and a consumer has no way to hand that result back to the caller. I would propose a way to handle the result of the consumer, either via callbacks or by collecting and returning the results.
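A minimal sketch of the "collecting and returning" variant, using only the JDK. `CollectingConsumer` is a hypothetical helper, not part of Poiji; it assumes the library keeps accepting a plain `java.util.function.Consumer` for streamed rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper (not a Poiji class): adapts a row-to-result function into a Consumer. */
final class CollectingConsumer<T, R> implements Consumer<T> {
    private final Function<T, R> handler;
    private final List<R> results = new ArrayList<>();

    CollectingConsumer(Function<T, R> handler) {
        this.handler = handler;
    }

    @Override
    public void accept(T row) {
        // Apply the handler to each streamed row and keep what it returns.
        results.add(handler.apply(row));
    }

    /** Read-only view of the results collected so far, in row order. */
    List<R> results() {
        return Collections.unmodifiableList(results);
    }
}
```

Collecting every result keeps them all in memory, which works against the streaming benefit for very large files; a callback-style variant that forwards each result to a second consumer would avoid that.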
Provide custom parsers per type
I would propose that the PoijiOptions builder provide a method for adding new parsers per field type, each of which would convert (and cast) a String into the specific type required by the POJO field.
Here's a target usage example:
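As an illustrative sketch only (not the author's original snippet, and not an existing Poiji API), a per-type parser registry could look like this. `ParserRegistry`, `register` and `parse` are hypothetical names, and the sketch assumes boxed types are used as keys rather than primitives.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry (not Poiji's API): maps a field type to a String parser. */
final class ParserRegistry {
    private final Map<Class<?>, Function<String, ?>> parsers = new HashMap<>();

    /** Registers a parser for one field type; returns this so calls can be chained. */
    <T> ParserRegistry register(Class<T> type, Function<String, ? extends T> parser) {
        parsers.put(type, parser);
        return this;
    }

    /** Converts the raw cell text into the requested type using the registered parser. */
    <T> T parse(Class<T> type, String raw) {
        Function<String, ?> parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + type.getName());
        }
        // register() only accepts parsers that produce T for the key Class<T>,
        // so this cast succeeds for non-primitive types.
        return type.cast(parser.apply(raw));
    }
}
```

A caller might then write `new ParserRegistry().register(BigDecimal.class, s -> new BigDecimal(s.trim()))` and the mapping code would call `parse(field.getType(), cellText)` for each field.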
Strict validation mode
If I understand the source code correctly, when a value cannot be parsed / cast correctly, POIJI fails silently and sets the default value for the field. In our use case, when a user provides an invalid / unexpected value in the Excel file, we are supposed to abort the import and report the error(s) to the user. If the system does not reject such values, a user uploading a very large file will assume that the provided values were persisted, when in reality they were converted to defaults. This can have critical side effects, as the user might be unaware that some values are incorrect.
I would propose a new "strict validation" option in the settings, perhaps allowing the developer to choose between "fail on field" and "fail on record": either the parser triggers an error on the first failed field, or it attempts to parse all fields, collects all errors and triggers a single "record" failure.
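The two modes can be sketched as follows. Everything here is hypothetical (`StrictRowParser`, `FailureMode`, `RecordParseException` are invented for illustration and do not exist in Poiji); a row is modelled as a map from column name to raw text, with one parser function per column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical strict parser (not Poiji's API) showing "fail on field" vs "fail on record". */
final class StrictRowParser {
    enum FailureMode { FAIL_ON_FIELD, FAIL_ON_RECORD }

    /** Carries every field error found in one record. */
    static final class RecordParseException extends RuntimeException {
        final List<String> errors;

        RecordParseException(List<String> errors) {
            super(String.join("; ", errors));
            this.errors = errors;
        }
    }

    private final FailureMode mode;

    StrictRowParser(FailureMode mode) {
        this.mode = mode;
    }

    /** Parses each raw cell with the parser registered under the same column name. */
    Map<String, Object> parse(Map<String, String> rawCells,
                              Map<String, Function<String, ?>> parsers) {
        Map<String, Object> parsed = new LinkedHashMap<>();
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> cell : rawCells.entrySet()) {
            try {
                // A missing parser surfaces as a NullPointerException and is reported too.
                parsed.put(cell.getKey(), parsers.get(cell.getKey()).apply(cell.getValue()));
            } catch (RuntimeException e) {
                errors.add(cell.getKey() + ": cannot parse '" + cell.getValue() + "'");
                if (mode == FailureMode.FAIL_ON_FIELD) {
                    break; // stop at the first bad field
                }
            }
        }
        if (!errors.isEmpty()) {
            throw new RecordParseException(errors); // never fall back to default values
        }
        return parsed;
    }
}
```

In `FAIL_ON_RECORD` mode the exception lists every bad field in the row, which is what an error report shown to the uploading user would need.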
Provide observer interface
Provide an option to attach one or multiple observers for the parser. The observers would be called at certain points in the parser's process, allowing the developer to plug in custom functionality separately from the "main" consumer.
The interface could have callbacks such as:
This would allow the developer to write custom functionality for handling imports, such as validating headers, tracking progress, cancelling long-running imports, context-based validation, etc.
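A sketch of what such an observer contract might look like. The callback names (`onHeader`, `onRecord`, `onComplete`, `isCancelled`) are assumptions made for illustration, not the author's original list and not an existing Poiji interface; `ObservedImport` is a stand-in for the parser loop, only there to show where the callbacks would fire.

```java
import java.util.List;

/** Hypothetical observer contract; none of these callbacks exist in Poiji. */
interface ImportObserver<T> {
    default void onHeader(List<String> headers) {}
    default void onRecord(int rowIndex, T item) {}
    default void onComplete(int recordsRead) {}

    /** Polled before each row; returning true asks the parser to stop early. */
    default boolean isCancelled() { return false; }
}

/** Example observer that only tracks progress. */
final class ProgressObserver<T> implements ImportObserver<T> {
    private int processed;

    @Override
    public void onRecord(int rowIndex, T item) {
        processed++;
    }

    int processed() {
        return processed;
    }
}

/** Stand-in for the parser loop, showing where each callback is invoked. */
final class ObservedImport {
    static <T> int run(List<String> headers, List<T> records, List<ImportObserver<T>> observers) {
        observers.forEach(o -> o.onHeader(headers));
        int read = 0;
        for (T item : records) {
            if (observers.stream().anyMatch(ImportObserver::isCancelled)) {
                break; // any observer may cancel a long-running import
            }
            final int rowIndex = read;
            observers.forEach(o -> o.onRecord(rowIndex, item));
            read++;
        }
        final int total = read;
        observers.forEach(o -> o.onComplete(total));
        return read;
    }
}
```

Default methods let each observer override only the callbacks it cares about, so progress tracking, header validation and cancellation can live in separate classes.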
Skipping empty rows
As already mentioned in another thread, if a row in Excel is completely empty, it makes no sense to parse it.
I would propose adding an option to "skip empty rows". This includes rows around the header, e.g. if the header appears in the 3rd row and the data begins in the 10th row, the developer cannot know this in advance and skip those specific rows.
With the "skip empty rows" option enabled, the consumer would only receive a POJO when the Excel row actually has data.
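One possible definition of "empty", as a sketch: a row is skipped when every cell is null or whitespace-only. `EmptyRowFilter` is a hypothetical helper that models a row as a list of cell strings; it is not tied to POI's or Poiji's row types.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Poiji): filters out rows that carry no data. */
final class EmptyRowFilter {
    /** A row counts as empty when it has no cells, or every cell is null or blank. */
    static boolean isEmpty(List<String> cells) {
        return cells.stream().allMatch(c -> c == null || c.trim().isEmpty());
    }

    /** Keeps only rows with at least one non-blank cell, preserving their order. */
    static List<List<String>> skipEmpty(List<List<String>> rows) {
        return rows.stream()
                .filter(r -> !isEmpty(r))
                .collect(Collectors.toList());
    }
}
```

Applied before header detection, the same check would let the first non-empty row be treated as the header regardless of how many blank rows precede it.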
Background
We have a solution that converts Excel files to DTOs and handles long-running imports of files with e.g. 20,000 records x 180 columns, with many custom mappings, validations and business rules applied to each record (around 2-3 minutes per import).
Our solution also has some very specific features, like error aggregation, unique-value validation, required column order, transaction management, pausing imports in conflicting transactions, handling return types, idempotent updates (skipping a record if it results in a no-change operation), import progress tracking (e.g. 758 / 20,000), performance observability (logging and metrics), cancelling an import, etc., but all of those features would be satisfied by the Observer interface.
We have spent years developing and optimizing the solution (using POI and stream-reading), and I think we could provide some valuable insights and learnings and share them with the world via POIJI.
If you agree, my next step would be to create some proof-of-concept PRs here on GitHub and continue the discussion there, one topic per PR. Would that be OK for you?
Best regards, Vanja
P.S.: Thank you for the lib, it is a real pleasure reading the source!