[libcurl] support uploading large files #19565

myml · 2024-09-11T02:22:25Z

curl_mime_data requires storing the file in memory use curl_mime_filename instead of curl_mime_data to support uploading large files.

PR checklist

Read the contribution guidelines.
Pull Request title clearly describes the work in the pull request and Pull Request description provides details about how to validate the work. Missing information here may result in delayed response from the community.
Run the following to build the project and update samples:
```
./mvnw clean package 
./bin/generate-samples.sh ./bin/configs/*.yaml
./bin/utils/export_docs_generators.sh
```
(For Windows users, please run the script in Git BASH)
Commit all changed files.
This is important, as CI jobs will verify all generator outputs of your HEAD commit as it would merge with master.
These must match the expectations made by your contribution.
You may regenerate an individual generator by passing the relevant config(s) as an argument to the script, for example ./bin/generate-samples.sh bin/configs/java*.
IMPORTANT: Do NOT purge/delete any folders/files (e.g. tests) when regenerating the samples as manually written tests may be removed.
File the PR against the correct branch: master (upcoming 7.6.0 minor release - breaking changes with fallbacks), 8.0.x (breaking changes without fallbacks)
If your PR is targeting a particular programming language, @mention the technical committee members, so they are more likely to review the pull request.

myml · 2024-09-11T02:22:41Z

@michelealbano Please help review this PR related to libcurl.

wing328 · 2024-09-11T08:55:04Z

https://github.com/OpenAPITools/openapi-generator/actions/runs/10803849417/job/29968627388?pr=19565

please update the samples so that the CI can test the change

myml · 2024-09-13T02:34:19Z

@wing328 Samples updated for CI testing. Let me know if anything else is needed.

curl_mime_data requires storing the file in memory use curl_mime_filename instead of curl_mime_data to support uploading large files.

wing328 · 2024-09-16T09:35:24Z

cc
@zhemant (2018/11) @ityuhui (2019/12) @michelealbano (2020/03)

zhemant · 2024-09-16T16:48:42Z

Hi, Thank you for PR.

We should not modify the binary_t data type for this purpose since the multipart data can have more information in itself. For this, we can define a separate struct for handling multipart data with parameters as given here: https://swagger.io/docs/specification/describing-request-body/multipart-requests/.

Could you provide a spec example to understand how a type file is defined where a file path will be given instead of a binary blob? If there is some specific type/format which highlights if it is a binary format or file path, we can may be leverage that.

myml · 2024-09-23T07:47:29Z

Hi, @zhemant

Thank you for the input. I believe we should deprecate the current binary_t type, as it copies the file content into memory, which can lead to memory issues when handling large files. Instead, I propose using filename and filepath fields directly to reference the file on disk, avoiding memory overload during file uploads.

The data field in the current binary_t structure can be maintained temporarily for compatibility, but we should plan to remove it in a future version.

Let me know your thoughts.

zhemant · 2024-09-23T09:21:02Z

Hi @myml ,

I would not modify/deprecate binary_t, since it is not used only for files but for other binary types that are not files.

    Binary:
      format: binary
      type: string
      description: string with format 'binary' as defined in OpenAPI.

In the above specification, binary_t is with binary string. This includes binary blob or other binary sending, in addition to sending files(what we do now).

For simplicity, we could check the first char if it is "@" then we treat it as a path(how curl does it) and call the file handling functions from curl, otherwise we treat it as normal binary. Curl takes the file name from the path to set the respective file name. But that's the best we could do without adding a new data type for the file.

Do you think this would work in your scenario?

myml · 2024-09-24T01:45:12Z

Hi @zhemant,

Using the first character to determine the type is a bad idea because the first character of binary data could very well be "@" itself, which would lead to unexpected behavior.

I currently haven't found a way in the specification to differentiate between binary data and binary files. If we were to add a new data type, I'm not sure how to automatically distinguish between the two when generating code.

What are your thoughts on this?

myml · 2024-09-24T01:48:23Z

I think it’s necessary to add a filename field to binary_t, since right now we are hardcoding it as "image.png," which feels more like a temporary solution.

zhemant · 2024-10-08T13:24:03Z

I think it’s necessary to add a filename field to binary_t, since right now we are hardcoding it as "image.png," which feels more like a temporary solution.

Yes, I agree, this should not be hardcoded.

Hi @zhemant,

Using the first character to determine the type is a bad idea because the first character of binary data could very well be "@" itself, which would lead to unexpected behaviour.

I currently haven't found a way in the specification to differentiate between binary data and binary files. If we were to add a new data type, I'm not sure how to automatically distinguish between the two when generating code.

What are your thoughts on this?

There is is possibility of having "@" in binary at 1st place.

I have the following thoughts:

We can make these changes only in form parameters and not in body parameters. So whatever is added in this PR is only valid for form params.
Since form params are special(i.e. only part of API and callbacks), We could do a form param related data type. The PR as of now only sets file name and data, but what about its content type. If someone wants to upload csv or image, then there is no way to specify. With addition of new form param type, we could cover these and have better support. And since this will be known in apiclient.c during mime parsing, we can make sure we are doing it properly.

I would propose a object like:

// name of the data this should be already defined as part of API in the form of a key in KVP
char *name;     

// For value, following stuct could be used(this I am referencing from libcurl form data):

typedef struct mime_form_param_t {
// payload data binary or string
char *data;                    
// file name to be used
char *filename;
// content type to set for this data         
char *content_type;           
// encoding to use to send data
char *encoder;   
// list of additional headers         
list_t *headers;  
// custom to know if the data is file path and not binary
bool is_filepath;
} mime_form_param_t;

I think with above struct, we get the option to know if the data is file path or binary, and additionally we could provide the options which are part of form params.

myml force-pushed the libcurl branch from a2790bf to 21137aa Compare September 13, 2024 02:30

[libcurl] support uploading large files

47932bb

curl_mime_data requires storing the file in memory use curl_mime_filename instead of curl_mime_data to support uploading large files.

myml force-pushed the libcurl branch from 21137aa to 47932bb Compare September 13, 2024 08:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[libcurl] support uploading large files #19565

[libcurl] support uploading large files #19565

myml commented Sep 11, 2024

myml commented Sep 11, 2024 •

edited

Loading

wing328 commented Sep 11, 2024

myml commented Sep 13, 2024

wing328 commented Sep 16, 2024

zhemant commented Sep 16, 2024

myml commented Sep 23, 2024

zhemant commented Sep 23, 2024

myml commented Sep 24, 2024

myml commented Sep 24, 2024 •

edited

Loading

zhemant commented Oct 8, 2024 •

edited

Loading

[libcurl] support uploading large files #19565

Are you sure you want to change the base?

[libcurl] support uploading large files #19565

Conversation

myml commented Sep 11, 2024

PR checklist

myml commented Sep 11, 2024 • edited Loading

wing328 commented Sep 11, 2024

myml commented Sep 13, 2024

wing328 commented Sep 16, 2024

zhemant commented Sep 16, 2024

myml commented Sep 23, 2024

zhemant commented Sep 23, 2024

myml commented Sep 24, 2024

myml commented Sep 24, 2024 • edited Loading

zhemant commented Oct 8, 2024 • edited Loading

myml commented Sep 11, 2024 •

edited

Loading

myml commented Sep 24, 2024 •

edited

Loading

zhemant commented Oct 8, 2024 •

edited

Loading