Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handling huge data #1

Open
abenhamdine opened this issue Jul 2, 2021 · 4 comments
Open

handling huge data #1

abenhamdine opened this issue Jul 2, 2021 · 4 comments

Comments

@abenhamdine
Copy link

abenhamdine commented Jul 2, 2021

Hi and thx for the module jsreport, it's very useful

We rely on jsreport in production heavily and we often stumble on this error :

2021-07-02T12:11:20+02:00 [] ERROR (4057 on a6982745-09a5-490c-99b7-29ef5a93a707): Fri, 02 Jul 2021 10:11:16 GMT: exception lors de la génération du rapport: Invalid string length, stack: RangeError: Invalid string length
2021-07-02T12:11:20+02:00 at JSON.stringify (<anonymous>)
2021-07-02T12:11:20+02:00 at Object.module.exports.serialize (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/serializator/index.js:23:14)
2021-07-02T12:11:20+02:00 at module.exports.ScriptsManager.execute (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/script-manager/lib/manager-processes.js:85:30)
2021-07-02T12:11:20+02:00 at module.exports.tryCatcher (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/jsreport-core/node_modules/bluebird/js/release/util.js:16:23)
2021-07-02T12:11:20+02:00 at module.exports.ret [as executeAsync] (eval at makeNodePromisifiedEval (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/jsreport-core/node_modules/bluebird/js/release/promisify.js:1:1), <anonymous>:13:39)
2021-07-02T12:11:20+02:00 at executeScript (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/jsreport-core/lib/util/executeScript.js:41:49)
2021-07-02T12:11:20+02:00 at Reporter.executeScript (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/jsreport-core/lib/reporter.js:343:12)
2021-07-02T12:11:20+02:00 at invokeRender (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/jsreport-core/lib/render/render.js:60:36)
2021-07-02T12:11:20+02:00 at module.exports (/home/bas/app_4f808f19-b484-4ae7-9953-f6fe11b6bc05/node_modules/jsreport-core/lib/render/render.js:150:11)

which originates from

module.exports.serialize = (obj, { prettify = false, prettifySpace = 2 } = {}) => {

which is caused by a huge amont of data to render, which probably exceed the maximum length allowed in nodejs (which itself is limited by V8, presumably here https://github.com/nodejs/node/blob/eb24573987b27ad516b5bc86bfb320660540e2bf/deps/v8/include/v8.h#L3004-L3007)

Consequences :

  • jsreport crashes obviously
  • prototype methods are not restored, means than monkey paths remains forever 😞

How do you think things could be improved ?
Of course we have tried to change business logic, to reduce the size of the data being rendered, but we have reached a limit now.
Do you think it could be possible to stringify by chunks ?
eg semthing like https://dev.to/madhunimmo/json-stringify-rangeerror-invalid-string-length-3977 ?

Also try/catch line 29 to be sure to restore prototype method in all cases ?

Thx by advance for your ideas

@pofider
Copy link

pofider commented Jul 2, 2021

Thank you for the full problem description.

prototype methods are not restored, means than monkey paths remains forever 😞

That is ugly, we will add proper restore on error.

How do you think things could be improved ?

You could try to use templatingEngines .strategy = 'in-process' config that doesn't do the serialization.

Do you think it could be possible to stringify by chunks ?

In jsreport v3, we solve the sandboxing and cross process communication differently.
We send the whole big data string to the worker thread, deserialize there but never serialize or send it back.
The request data are in the worker thread in the form of an object the whole time of the request lifecycle.
So I believe there won't be a need for chunks serialization.
We will release the v3 beta this month.

@abenhamdine
Copy link
Author

abenhamdine commented Jul 2, 2021

Thank you for you response

That is ugly, we will add proper restore on error.

👍

You could try to use templatingEngines .strategy = 'in-process' config that doesn't do the serialization.

Thx for you response ! I forgot that server strategy implies serialization.

I have read again https://jsreport.net/blog/running-scripts-and-helpers-in-main-process

The gain of performance with in-process strategy is appealing, but we are reluctant to do rendering in the main process : we have some reports wich are kind of user-controlled and even with developers reports, we are never sure of the size of the data processed.
Is there a possibility to limit memory and time used by a report rendering with in-process strategy ?
In order prevent crashing the main process with OOM or infinite loop.

In jsreport v3, we solve the sandboxing and cross process communication differently.
We send the whole big data string to the worker thread, deserialize there but never serialize or send it back.
The request data are in the worker thread in the form of an object the whole time of the request lifecycle.
So I believe there won't be a need for chunks serialization.
We will release the v3 beta this month.

Great news !
But I see >10 issues are to be solved before release v3, I don't think we can wait for v3 unfortunately to solve this problem (not to rush you thx again for you work and explanations)

@pofider
Copy link

pofider commented Jul 2, 2021

What nodejs version you have? I see the string size limit was increased in v9 to 1GB.
The 1GB isn't enough for you? I know you mentioned it, but maybe try to analyze again what data you really need for the report and do projections.

we have some reports wich are kind of user-controlled

I see... The in-process isn't a good choice for badly coded reports. You could try to initialize a pool of jsreport instances but that has also many downsides. You could check docker workers extension, which was designed to run badly coded reports but it will likely fail the same on the string size.

You can fork or monkey patch this and experiment with chunks serialization. We need to concentrate on v3 now so we aren't going to invest into this direction.

@abenhamdine
Copy link
Author

abenhamdine commented Jul 3, 2021

What nodejs version you have? I see the string size limit was increased in v9 to 1GB.
The 1GB isn't enough for you? I know you mentioned it, but maybe try to analyze again what data you really need for the report and do projections.

You're right, it's definitely not reasonable to have >1Gb data in a string...
But some developers or users could make mistakes that fetch too more data, and we don't necesserally want the server to crash 🙃

we have some reports wich are kind of user-controlled

I see... The in-process isn't a good choice for badly coded reports. You could try to initialize a pool of jsreport instances but that has also many downsides. You could check docker workers extension, which was designed to run badly coded reports but it will likely fail the same on the string size.

You could try to initialize a pool of jsreport instances

Yes I thought about it.
Thx for the hint, I will check docker-workers.

Another idea would be to deploy 2 different servers :

  • one server with strategy 'server' for not-enough-safe reports
  • one server with strategy 'in-process' for builtin safe reports

You can fork or monkey patch this and experiment with chunks serialization. We need to concentrate on v3 now so we aren't going to invest into this direction.

ok, I understand
We will send a PR once our patch will be production-proof

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants