Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libcurl] support uploading large files #19565

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

myml
Copy link
Contributor

@myml myml commented Sep 11, 2024

curl_mime_data requires storing the file in memory use curl_mime_filename instead of curl_mime_data to support uploading large files.

PR checklist

  • Read the contribution guidelines.
  • Pull Request title clearly describes the work in the pull request and Pull Request description provides details about how to validate the work. Missing information here may result in delayed response from the community.
  • Run the following to build the project and update samples:
    ./mvnw clean package 
    ./bin/generate-samples.sh ./bin/configs/*.yaml
    ./bin/utils/export_docs_generators.sh
    
    (For Windows users, please run the script in Git BASH)
    Commit all changed files.
    This is important, as CI jobs will verify all generator outputs of your HEAD commit as it would merge with master.
    These must match the expectations made by your contribution.
    You may regenerate an individual generator by passing the relevant config(s) as an argument to the script, for example ./bin/generate-samples.sh bin/configs/java*.
    IMPORTANT: Do NOT purge/delete any folders/files (e.g. tests) when regenerating the samples as manually written tests may be removed.
  • File the PR against the correct branch: master (upcoming 7.6.0 minor release - breaking changes with fallbacks), 8.0.x (breaking changes without fallbacks)
  • If your PR is targeting a particular programming language, @mention the technical committee members, so they are more likely to review the pull request.

@myml
Copy link
Contributor Author

myml commented Sep 11, 2024

@michelealbano Please help review this PR related to libcurl.

@wing328
Copy link
Member

wing328 commented Sep 11, 2024

https://github.com/OpenAPITools/openapi-generator/actions/runs/10803849417/job/29968627388?pr=19565

please update the samples so that the CI can test the change

@myml
Copy link
Contributor Author

myml commented Sep 13, 2024

@wing328 Samples updated for CI testing. Let me know if anything else is needed.

curl_mime_data requires storing the file in memory
use curl_mime_filename instead of curl_mime_data to support uploading large files.
@wing328
Copy link
Member

wing328 commented Sep 16, 2024

cc
@zhemant (2018/11) @ityuhui (2019/12) @michelealbano (2020/03)

@zhemant
Copy link
Contributor

zhemant commented Sep 16, 2024

Hi, Thank you for PR.

We should not modify the binary_t data type for this purpose since the multipart data can have more information in itself. For this, we can define a separate struct for handling multipart data with parameters as given here: https://swagger.io/docs/specification/describing-request-body/multipart-requests/.

Could you provide a spec example to understand how a type file is defined where a file path will be given instead of a binary blob? If there is some specific type/format which highlights if it is a binary format or file path, we can may be leverage that.

@myml
Copy link
Contributor Author

myml commented Sep 23, 2024

Hi, @zhemant

Thank you for the input. I believe we should deprecate the current binary_t type, as it copies the file content into memory, which can lead to memory issues when handling large files. Instead, I propose using filename and filepath fields directly to reference the file on disk, avoiding memory overload during file uploads.

The data field in the current binary_t structure can be maintained temporarily for compatibility, but we should plan to remove it in a future version.

Let me know your thoughts.

@zhemant
Copy link
Contributor

zhemant commented Sep 23, 2024

Hi @myml ,

I would not modify/deprecate binary_t, since it is not used only for files but for other binary types that are not files.

    Binary:
      format: binary
      type: string
      description: string with format 'binary' as defined in OpenAPI.

In the above specification, binary_t is with binary string. This includes binary blob or other binary sending, in addition to sending files(what we do now).

For simplicity, we could check the first char if it is "@" then we treat it as a path(how curl does it) and call the file handling functions from curl, otherwise we treat it as normal binary. Curl takes the file name from the path to set the respective file name. But that's the best we could do without adding a new data type for the file.

Do you think this would work in your scenario?

@myml
Copy link
Contributor Author

myml commented Sep 24, 2024

Hi @zhemant,

Using the first character to determine the type is a bad idea because the first character of binary data could very well be "@" itself, which would lead to unexpected behavior.

I currently haven't found a way in the specification to differentiate between binary data and binary files. If we were to add a new data type, I'm not sure how to automatically distinguish between the two when generating code.

What are your thoughts on this?

@myml
Copy link
Contributor Author

myml commented Sep 24, 2024

I think it’s necessary to add a filename field to binary_t, since right now we are hardcoding it as "image.png," which feels more like a temporary solution.

@zhemant
Copy link
Contributor

zhemant commented Oct 8, 2024

I think it’s necessary to add a filename field to binary_t, since right now we are hardcoding it as "image.png," which feels more like a temporary solution.

Yes, I agree, this should not be hardcoded.

Hi @zhemant,

Using the first character to determine the type is a bad idea because the first character of binary data could very well be "@" itself, which would lead to unexpected behaviour.

I currently haven't found a way in the specification to differentiate between binary data and binary files. If we were to add a new data type, I'm not sure how to automatically distinguish between the two when generating code.

What are your thoughts on this?

There is is possibility of having "@" in binary at 1st place.

I have the following thoughts:

  • We can make these changes only in form parameters and not in body parameters. So whatever is added in this PR is only valid for form params.
  • Since form params are special(i.e. only part of API and callbacks), We could do a form param related data type. The PR as of now only sets file name and data, but what about its content type. If someone wants to upload csv or image, then there is no way to specify. With addition of new form param type, we could cover these and have better support. And since this will be known in apiclient.c during mime parsing, we can make sure we are doing it properly.

I would propose a object like:

// name of the data this should be already defined as part of API in the form of a key in KVP
char *name;     

// For value, following stuct could be used(this I am referencing from libcurl form data):

typedef struct mime_form_param_t {
// payload data binary or string
char *data;                    
// file name to be used
char *filename;
// content type to set for this data         
char *content_type;           
// encoding to use to send data
char *encoder;   
// list of additional headers         
list_t *headers;  
// custom to know if the data is file path and not binary
bool is_filepath;
} mime_form_param_t;  

I think with above struct, we get the option to know if the data is file path or binary, and additionally we could provide the options which are part of form params.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants