-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add AlluxioClientConfig #72
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM
alluxio/config.py
Outdated
self, | ||
etcd_hosts: Optional[str] = None, | ||
worker_hosts: Optional[str] = None, | ||
options: Optional[Dict[str, Any]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason we need additional options here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for putting Alluxio KEY=VALUE pairs
I will rename it to alluxio_properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My abstraction for this config class is like our AlluxioProperty in java. I feel like we can just put Key value directly in the init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, could do that, hopefully the properties needed won't increase too much in the future making this config super long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think just keep one layer of Key value so it's easier to turn the class to dict or read config from file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense!
I move the supported alluxio_properties to this config class
If it becomes too long, we could go for the file approach
alluxio/config.py
Outdated
etcd_hosts: Optional[str] = None, | ||
worker_hosts: Optional[str] = None, | ||
options: Optional[Dict[str, Any]] = None, | ||
logger: Optional[logging.Logger] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't really need logger in config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will move it out of config
etcd_port=2379, | ||
worker_http_port=ALLUXIO_WORKER_HTTP_SERVER_PORT_DEFAULT_VALUE, | ||
etcd_refresh_workers_interval=120, | ||
config: AlluxioClientConfig, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure whether it is a good practice to use a config
class here. We would probably be careful as this is an interface change. I assume that the reason for this change is to avoid the long argument list for the AlluxioClient
(although the users would still have to provide a long argument list for the AlluxioClientConfig
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked a few other Python libraries:
- scikit-learn. Take the random forest module (https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/ensemble/_forest.py#L1171) as an example, they just keep a long argument list.
- presto-python-client.
PrestoRequest
(https://github.com/prestodb/presto-python-client/blob/master/prestodb/client.py#L131) also maintains a long argument list. - pymongo.
MongoClient
(https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/mongo_client.py#L141) keeps the argument list and uses**kwargs
for other arguments. - Ray.
ray.tune
module (https://github.com/ray-project/ray/blob/master/python/ray/tune/tune.py) maintains a long list of arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To conclude,
- It may not be an issue to maintain a long argument list. But for a long argument list, it might be a good practice to use
*
in the arguments (both ray and scikit-learn use that). https://stackoverflow.com/questions/57667742/what-does-represent-in-function-argument-list-in-python - After reviewing the configs, I feel some of them are client-related configs (e.g. etcd hosts, etcd ports, HTTP ports), and some are alluxio-specific configs (e.g. page size, hash nodes, etc.). Maybe it's better to differentiate those configs and construct these configs passed in as
**kwargs
.pymongo
uses this approach (https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/mongo_client.py#L747, https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/client_options.py).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From mongo client_options example, they are also putting all options into one options dict and then retrieve key values from it which is the main goal of this PR: putting all options in one place. I don't have strong preference about using dict or class, just feel there's might be two strength if we use a class instead of dict: it's more intuitive what's in the dict and what's not in the dict. And it's easy to convert to dict using: __dict__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And agree, we could do better by splitting different kind of options into their own scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original reason is to have long list of argument in both alluxio-py and alluxiofs
We could discuss about whether to merge the two repos into one, otherwise too easy to break one of them
Add AlluxioClientConfig to define the configuration values and types.
This help avoid repeating the same configuration across different classes.