Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Milavision package as a drop-in replacement to torchvision #15

Closed
wants to merge 11 commits into from

Conversation

lebrice
Copy link
Collaborator

@lebrice lebrice commented Jan 14, 2022

Please ignore the other closed PR.

Here's my current take on #13.

Let me know what you think :)

NOTE: I will frequently push to this branch, since I alternate between my local setup and the mila cluster to test stuff out. Sorry in advance for all the notifications. Feel free to just ignore them if you want.

Signed-off-by: Fabrice Normandin [email protected]

Copy link
Member

@breuleux breuleux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a first pass at a code review. I haven't read the core logic attentively, so I might have more comments next week.

Comment on lines +22 to +26
LOCAL = enum.auto()
MILA = enum.auto()
CEDAR = enum.auto()
BELUGA = enum.auto()
GRAHAM = enum.auto()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why you're using an enum instead of something like

if socket.getfqdn().endswith(".server.mila.quebec"):
    from . import mila as current
elif socket.getfqdn().endswith(".computecanada.ca"):  # ?
    from . import cc as current
...

I am going to note one thing here, which is that if this package is successful, a natural evolution of the idea is that other institutions than Mila may be interested in adding support for their own clusters (the enum might become quite large). That would be good, because if external code starts using this library for their own reasons, that is more code that plays nice on our cluster. This is one reason why I think that organizing code to group all code for a particular cluster together is a smart play.


@property
def fast_data_dir(self) -> Path:
""" Returns the 'fast' directory where datasets are stored for quick read/writtes. """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
""" Returns the 'fast' directory where datasets are stored for quick read/writtes. """
""" Returns the 'fast' directory where datasets are stored for quick read/writes. """

Comment on lines +46 to +47
if self is ClusterType.LOCAL:
return Path("data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should have a default for LOCAL at all.

Comment on lines +40 to +45
if typing.TYPE_CHECKING:
# Import these here, so that when writing code, you get hints when doing something like:
#
# from milavision.datasets import caltech
#
# without the type-checker complaining.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, if you're going to paste the whole list of imports here regardless, why not just do it unconditionally and get rid of the logic above it? If we are worried about missing things in newer versions of torchvision, this can be tested for and checked by the CI.

Now this is a bit of a philosophical point, but I think it would be better to explicitly not allow importing datasets from milavision that we do not support. It's expectation management: if I can import something from milavision, I expect it to do the optimal thing on the Mila cluster. If it turns out we don't support it, I think I would prefer to get an exception. Then I know that okay I have to use the default version and I have to be careful.

Do tell if you see use cases where it is important to have full coverage within the same import.

Comment on lines +34 to +36
def on_login_node() -> bool:
# IDEA: Detect if we're on a login node somehow.
return socket.getfqdn().endswith(".server.mila.quebec") and "SLURM_TMPDIR" not in os.environ
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I like that, it could prevent some accidents!

Alternatively we can check if the fqdn starts with login, but checking for SLURM_TMPDIR is also a good idea. It might be better to check if it is empty rather than just whether it is in the environment, because if it is the empty string this means we will use the cwd as a tmpdir which is bad.

Comment on lines +53 to +54
if on_login_node():
raise RuntimeError(f"Don't run this on a login node, you fool!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂

_IGNORED = ""


def make_dataset(dataset_type: Type[VD], root: str, *agrs, **kwargs) -> VD:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def make_dataset(dataset_type: Type[VD], root: str, *agrs, **kwargs) -> VD:
def make_dataset(dataset_type: Type[VD], root: str, *args, **kwargs) -> VD:

(ditto for uses of the variable below)

Signed-off-by: Fabrice Normandin <[email protected]>
@lebrice
Copy link
Collaborator Author

lebrice commented Mar 6, 2023

This whole idea was moved to a separate project: https://www.github.com/mila-iqia/mila_datamodules

@lebrice lebrice closed this Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants