Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: initial Operator::from_uri implementation #5482

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

jorgehermo9
Copy link
Contributor

@jorgehermo9 jorgehermo9 commented Dec 30, 2024

Relates to #5445

Left some doubts as // TODO in the source. I have little experience contributing to this repo so I'm sorry if there are a lot of doubts about this. Just want to be sure all the changes of this PR aligns with the current codebase. Please take a look to all the TODOs I left when reviewing.

I would like to add more tests, but I don't know in which place those should be placed. The core/tests folder seems like a good place, but I don't find any place suitable, as placing those in core/tests/behaviour seems weird to me. But as this implies various components, maybe we can have a core/tests/integration? Although I would like to write some unit tests at core/src/types/builder.rs, core/src/types/operator/builder.rs and core/src/types/operator/registry.rs, but didn't any existing unit tests there.

In this PR I implemented a single Configurator::from_uri method, which will serve as default and takes only the query parameters as options. Services which need a more specific configuration such as s3 or azblob can be implemented in follow-up PRs.

I also have pending documentating all the newly added public API, but will do that after an initial review round.

Thank you very much.

@@ -0,0 +1,226 @@
use std::cell::LazyCell;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this file inside the crate::types::operator::registry module. Is it ok? I thought about adding a crate::types::operator_registry, but it seemed better this way.


// TODO: thread local or use LazyLock instead? this way the access is lock-free
// TODO: should we expose the `GLOBAL_OPERATOR_REGISTRY` as public API at `crate::types::operator::GLOBAL_OPERATOR_REGISTRY`?
thread_local! {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the preferred way of having a global static variable such as this?

I prefer to have it thread_local so there is not need for a LazyLock, we can use LazyCell instead (LazyCell is lock-free but LazyLock is not, it synchronizes access through threads)

// TODO: thread local or use LazyLock instead? this way the access is lock-free
// TODO: should we expose the `GLOBAL_OPERATOR_REGISTRY` as public API at `crate::types::operator::GLOBAL_OPERATOR_REGISTRY`?
thread_local! {
pub static GLOBAL_OPERATOR_REGISTRY: LazyCell<OperatorRegistry> = LazyCell::new(|| OperatorRegistry::with_enabled_services());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSRV check fails due to the usage of LazyCell. Should we update the MSRV or use another thing instead?

I see this TODO about the once_cell crate usage:

# TODO: remove once_cell when lazy_lock is stable: https://doc.rust-lang.org/std/cell/struct.LazyCell.html

If the replacement is planned, I think it would be better to use LazyCell than once_cell::Lazy in new code like this one, to not increase technical debt.

image

core/Cargo.toml Outdated Show resolved Hide resolved
core/src/types/builder.rs Outdated Show resolved Hide resolved
core/src/types/operator/registry.rs Outdated Show resolved Hide resolved
}

impl OperatorRegistry {
pub fn new() -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's initialize the registry here directly.

Copy link
Contributor Author

@jorgehermo9 jorgehermo9 Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done in e887d24

Derived a Default implementation for OperatorRegistry that rely in HashMap::default so users can get an empty OperatorRegistry if they need it.

Or should the Default implementation include all the registered services like the OperatorRegistry::new? If so, should we include a OpereatorRegistry::empty so users can get an empty OperatorRegistry?

core/src/types/operator/registry.rs Outdated Show resolved Hide resolved
// ```
// and the apply_for_all_services macro would gate every statement behind the corresponding feature gate
// This seems to not be the place where we should have a "list of enabled services".
#[cfg(feature = "services-aliyun-drive")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe every service should have its own registration functions. There are two reasons:

  • The service may have different schemes for registration. For example, S3 registers as s3, s3a, minio, r2, and so on, while GCS registers as gcs, gs, and so forth.
  • Provide a register that simplifies integration with OpenDAL for external users.

Copy link
Contributor Author

@jorgehermo9 jorgehermo9 Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, going to move the registration to each service Builder.

something like

impl Builder for S3Builder{

  pub fn register_in_registry(registry: &mut OperatorRegistry){
   registry.register("s3",s3_builder)
   registry.register("r2", r2_builder)
  }
}

I wonder if s3 and r2 (or s3a, minio, etc) have different ways of building, if so, we can use different OperatorFactory such as s3_builder, r2_builder, etc

Copy link
Contributor Author

@jorgehermo9 jorgehermo9 Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented that in 3b80f70. Do you prefer it that way?

Please note that I used a default Builder::register_in_registry for all services.

For example, S3 registers as s3, s3a, minio, r2, and so on, while GCS registers as gcs, gs, and so forth.

This can be addressed in follow-up PR's (as discussed here) and then S3Builder should override the default Builder::register_in_registry implementation to register the custom schemes/factories if needed.


/// Register this builder in the given registry.
fn register_in_registry(registry: &mut OperatorRegistry) {
let operator_factory: OperatorFactory = |uri, options| {
Copy link
Contributor Author

@jorgehermo9 jorgehermo9 Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I Had to add the type annotation for operator_factory because the compiler could not infer it correctly

error[E0308]: mismatched types
  --> src/types/builder.rs:68:55
   |
68 |         registry.register(Self::SCHEME.into_static(), operator_factory)
   |                                                       ^^^^^^^^^^^^^^^^ one type is more general than the other
   |
   = note: expected fn pointer `for<'a> fn(&'a _, HashMap<_, _>) -> std::result::Result<_, _>`
              found fn pointer `fn(&_, HashMap<_, _>) -> std::result::Result<_, _>`

For more information about this error, try `rustc --explain E0308`.
warning: `opendal` (lib) generated 2 warnings
error: could not compile `opendal` (lib) due to 1 previous error; 2 warnings emitted

let builder = Self::Config::from_uri(uri, options)?.into_builder();
Ok(Operator::new(builder)?.finish())
};
registry.register(Self::SCHEME.into_static(), operator_factory)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a default implementation, we simply register the associated SCHEME with this builder. Is this ok?

@@ -56,6 +58,15 @@ pub trait Builder: Default + 'static {

/// Consume the accessor builder to build a service.
fn build(self) -> Result<impl Access>;

/// Register this builder in the given registry.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this doc-comment adequate?

@jorgehermo9
Copy link
Contributor Author

hi @Xuanwo, thank you very much for your review. I answered all the comments and I'm ready for another round.

@Xuanwo
Copy link
Member

Xuanwo commented Jan 7, 2025

Hi, @jorgehermo9, I'm currently addressing some urgent existing issues in the current OpenDAL release. We will be publishing one or more patch versions for OpenDAL 0.51.x, and this PR (which introduces breaking changes) will be merged afterward. I will return to review this PR once I have resolved everything.

@jorgehermo9
Copy link
Contributor Author

Don't worry, take your time! Just pinging to notify, but no hurries! Thank you very much for your hard work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants