Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core/services-azblob): support user defined metadata #5274

Merged
merged 13 commits into from
Nov 3, 2024

Conversation

jorgehermo9
Copy link
Contributor

@jorgehermo9 jorgehermo9 commented Nov 2, 2024

Relates to #4842

Inspired from #5030

Left some doubts in comments

@jorgehermo9 jorgehermo9 changed the title feat: add user metadata for azblob feat(core/services-azblob): support user defined metadata Nov 2, 2024
let mut meta = parse_into_metadata(path, headers)?;

let user_meta = parse_prefixed_headers(&headers, X_MS_META_NAME_PREFIX);
if !user_meta.is_empty() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is also done in the s3 backend.

Should we always call meta.with_user_metadata(user_meta); instead of checking if the user_meta is empty?

Or can we for example, add that checking inside meta.with_user_metadata(user_meta); and do an early return if the user_meta hashmap is empty?

pub fn with_user_metadata(&mut self, data: HashMap<String, String>) -> &mut Self {

I find that checking in each backend if the hashmap is empty is error-prone boilerplate...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to have such a check. OpenDAL tends to have more readable code because we have many services, and that code is rarely modified.

@@ -243,12 +245,19 @@ impl AzblobCore {

let mut req = Request::put(&url);

if let Some(user_metadata) = args.user_metadata() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm setting the metadata headers only for the azblob_put_blob_request.

Should this also be done for block requests, for example, in azblob_put_block_request? Or the metadata will be only supported for blobs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that in OSS the metadata headers is also set for the append_object_request. Should the azure append blob requests also send these headers? azblob_append_blob_request, azblob_init_appendable_blob_request, etc..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be done for block requests, for example, in azblob_put_block_request? Or the metadata will be only supported for blobs?

Not sure for now; I will need some time to delve into the documentation.

I see that in OSS the metadata headers is also set for the append_object_request.

I remember that only the first append request can include user metadata, but I'm not sure about that. Feel free to do more research on this.

@@ -462,7 +463,7 @@ impl S3Core {
// Set user metadata headers.
if let Some(user_metadata) = args.user_metadata() {
for (key, value) in user_metadata {
req = req.header(format!("{}{}", constants::X_AMZ_META_PREFIX, key), value)
req = req.header(format!("{X_AMZ_META_PREFIX}{key}"), value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this in order to be consistent with what I did in the azure part. It is more idiomatic to use string interpolation

let headers = resp.headers();
let mut meta = parse_into_metadata(path, headers)?;

let user_meta = parse_prefixed_headers(&headers, X_MS_META_NAME_PREFIX);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the metadata parsing be done in two steps (first call parse_into_metadata and then parse_prefixed_headers) or should we add a new prefix: Optional<&str> parameter to the parse_into_metadata function and the prefixed headers parsing will be done there (and also the meta.with_user_metadata call, as the parse_into_metadata returns a Metadata struct)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I tend to keep parse_into_metadata simple because not all services handle metadata and user metadata in the same way.

As I mentioned before, OpenDAL tends to have more readable code. It's acceptable for us to have some duplicated code here and there. We can revisit this part when we have more similar code, rather than just two or three instances.

user_metadata_prefix: &str,
headers: &HeaderMap,
) -> Result<Metadata> {
pub fn parse_metadata(&self, path: &str, headers: &HeaderMap) -> Result<Metadata> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the azblob and s3 services, this code is done in the backend module

let mut meta = parse_into_metadata(path, headers)?;

But in the oss it is done in this parse_metadata function in the core module and called from the backend module

let mut meta = self.core.parse_metadata(path, resp.headers())?;

I think we should unify this in order to be consistent.

In my opinion, maybe we can create a new pub fn parse_prefixed_metadata(path:&str, headers: &HeaderMap, prefix: &str) in the crate::raw::http_util::header module that contains the code from this parse_metadata funtion, and then call that parse_prefixed_metadata in the backend module of those three services

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should unify this in order to be consistent.

Hi, it's better not to do this since every service has its own functions. You'll see if you delve into the implementation of different services.

The parse_metadata provided by raw::http will only parse the HTTP standards. Services may implement their own logic to meet their specific needs.

.collect();
if !data.is_empty() {
m.with_user_metadata(data);
let user_meta = parse_prefixed_headers(&headers, X_OSS_META_PREFIX);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored this in the same way of azblob and s3 services, as it was a lot of duplicated code

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jorgehermo9 a lot for this great PR!

@Xuanwo Xuanwo merged commit fab80c9 into apache:main Nov 3, 2024
235 checks passed
@jorgehermo9 jorgehermo9 deleted the feature/az-user-metadata branch November 3, 2024 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants