Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPCLIENT-2159: Invalid handling of charset content type parameter #483

Merged
merged 1 commit into from
Sep 29, 2024

Conversation

arturobernalg
Copy link
Member

This PR is an attempt to mitigate the issues highlighted in HTTPCLIENT-2159,. The root problem stems from how we are incorrectly handling certain content types by attaching charset parameters when they shouldn't have one. This update adjusts the way the Content-Type header handles specific cases, ensuring compliance with relevant specifications.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a builder for ContenType instead of piling on constructors or factory methods.

Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not put the description on the MIME types into consideration pointed out in HTTPCLIENT-2159.

@arturobernalg
Copy link
Member Author

This PR does not put the description on the MIME types into consideration pointed out in HTTPCLIENT-2159.

HI @michael-o
Please do another pass. I think we’ve covered all the cases now.

@arturobernalg
Copy link
Member Author

I think we should use a builder for ContenType instead of piling on constructors or factory methods.

HI @garydgregory
That seems like a massive change. Maybe we can consider a full refactor of the class in another ticket? I'm not sure if it's necessary right now compared to the charset ContentType parameter handling.

@michael-o
Copy link
Member

This PR does not put the description on the MIME types into consideration pointed out in HTTPCLIENT-2159.

HI @michael-o Please do another pass. I think we’ve covered all the cases now.

Will do tomorrow with a sober mind.

@michael-o michael-o self-requested a review September 23, 2024 20:11
@garydgregory
Copy link
Member

I think we should use a builder for ContenType instead of piling on constructors or factory methods.

HI @garydgregory That seems like a massive change. Maybe we can consider a full refactor of the class in another ticket? I'm not sure if it's necessary right now compared to the charset ContentType parameter handling.

Hi @arturobernalg
Whatever the scale of the change it kind of feels like the right time to do it IMO because we are taking of new factory methods/construction.

I do take your point that it could be done in a separate PR if that PR happens first. If we agree on implementing a builder, I can volunteer to do it unless you'd rather do it.

Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments.

@arturobernalg
Copy link
Member Author

I think we should use a builder for ContenType instead of piling on constructors or factory methods.

HI @garydgregory That seems like a massive change. Maybe we can consider a full refactor of the class in another ticket? I'm not sure if it's necessary right now compared to the charset ContentType parameter handling.

Hi @arturobernalg Whatever the scale of the change it kind of feels like the right time to do it IMO because we are taking of new factory methods/construction.

I do take your point that it could be done in a separate PR if that PR happens first. If we agree on implementing a builder, I can volunteer to do it unless you'd rather do it.

HI @garydgregory
If we agree on implementing a builder, I will do it.
Thank you.

Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is fine for me now, but the title is wrong. The actual report does not relate to any redirects at all. Please change and we are good to merge.

@arturobernalg arturobernalg changed the title HTTPCLIENT-2159: Mitigating Issues with Charset Handling During Redirects HTTPCLIENT-2159: Invalid handling of charset content type parameter Sep 28, 2024
@arturobernalg
Copy link
Member Author

Code is fine for me now, but the title is wrong. The actual report does not relate to any redirects at all. Please change and we are good to merge.

done

@michael-o michael-o self-requested a review September 28, 2024 15:19
MessageSupport.formatParameters(buf, this.params);
} else if (this.charset != null) {
boolean first = true;
for (final NameValuePair param : params) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arturobernalg One could use normal index loop here and avoid this super ugly first variable (and likely be more memory efficient).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @ok2c
The ´first´ variable is needed to handle the condition correctly, ensuring that parameters are properly separated while avoiding an extra "; " in cases where the charset is implicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @ok2c The ´first´ variable is needed to handle the condition correctly, ensuring that parameters are properly separated while avoiding an extra "; " in cases where the charset is implicit.

@arturobernalg You are right. Indeed, an index variable would not work here. Still, I would avoid using for each loops in HC.

Copy link
Member

@garydgregory garydgregory Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ok2c fair enough. changed

…media types

 * Updated ContentType to ensure that no charset is included for media types like application/octet-stream, multipart/form-data, and image/*, which do not require a charset as per the RFC.
 * Refactored the toString() method to properly handle the omission of charset for these media types.
 * Adjusted the creation methods to better handle implicit charsets and added validation for reserved characters in MIME types.
@arturobernalg arturobernalg merged commit ba6e8a9 into apache:master Sep 29, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants