Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add AlluxioClientConfig #72

Open
wants to merge 5 commits into
base: dev
Choose a base branch
from
Open

Conversation

LuQQiu
Copy link
Contributor

@LuQQiu LuQQiu commented Mar 6, 2024

Add AlluxioClientConfig to define the configuration values and types.
This help avoid repeating the same configuration across different classes.

@LuQQiu LuQQiu requested review from jja725 and ChunxuTang March 6, 2024 20:17
Copy link
Contributor

@jja725 jja725 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM

self,
etcd_hosts: Optional[str] = None,
worker_hosts: Optional[str] = None,
options: Optional[Dict[str, Any]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we need additional options here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for putting Alluxio KEY=VALUE pairs
I will rename it to alluxio_properties

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My abstraction for this config class is like our AlluxioProperty in java. I feel like we can just put Key value directly in the init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, could do that, hopefully the properties needed won't increase too much in the future making this config super long

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think just keep one layer of Key value so it's easier to turn the class to dict or read config from file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense!
I move the supported alluxio_properties to this config class
If it becomes too long, we could go for the file approach

etcd_hosts: Optional[str] = None,
worker_hosts: Optional[str] = None,
options: Optional[Dict[str, Any]] = None,
logger: Optional[logging.Logger] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really need logger in config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will move it out of config

etcd_port=2379,
worker_http_port=ALLUXIO_WORKER_HTTP_SERVER_PORT_DEFAULT_VALUE,
etcd_refresh_workers_interval=120,
config: AlluxioClientConfig,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure whether it is a good practice to use a config class here. We would probably be careful as this is an interface change. I assume that the reason for this change is to avoid the long argument list for the AlluxioClient (although the users would still have to provide a long argument list for the AlluxioClientConfig).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked a few other Python libraries:

  1. scikit-learn. Take the random forest module (https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/ensemble/_forest.py#L1171) as an example, they just keep a long argument list.
  2. presto-python-client. PrestoRequest (https://github.com/prestodb/presto-python-client/blob/master/prestodb/client.py#L131) also maintains a long argument list.
  3. pymongo. MongoClient (https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/mongo_client.py#L141) keeps the argument list and uses **kwargs for other arguments.
  4. Ray. ray.tune module (https://github.com/ray-project/ray/blob/master/python/ray/tune/tune.py) maintains a long list of arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To conclude,

  1. It may not be an issue to maintain a long argument list. But for a long argument list, it might be a good practice to use * in the arguments (both ray and scikit-learn use that). https://stackoverflow.com/questions/57667742/what-does-represent-in-function-argument-list-in-python
  2. After reviewing the configs, I feel some of them are client-related configs (e.g. etcd hosts, etcd ports, HTTP ports), and some are alluxio-specific configs (e.g. page size, hash nodes, etc.). Maybe it's better to differentiate those configs and construct these configs passed in as **kwargs. pymongo uses this approach (https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/mongo_client.py#L747, https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/client_options.py).

Copy link
Contributor

@jja725 jja725 Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From mongo client_options example, they are also putting all options into one options dict and then retrieve key values from it which is the main goal of this PR: putting all options in one place. I don't have strong preference about using dict or class, just feel there's might be two strength if we use a class instead of dict: it's more intuitive what's in the dict and what's not in the dict. And it's easy to convert to dict using: __dict__.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And agree, we could do better by splitting different kind of options into their own scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original reason is to have long list of argument in both alluxio-py and alluxiofs
We could discuss about whether to merge the two repos into one, otherwise too easy to break one of them

@LuQQiu LuQQiu changed the title Add AlluxioClientConfig [WIP] Add AlluxioClientConfig Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants