Make images of reusable components smaller #573

Open
RobbeSneyders opened this issue Oct 30, 2023 · 5 comments
Labels
CI/CD Related to CI/CD of the repository Components Implementation of components

Comments

@RobbeSneyders
Member

Some of our reusable component images are quite large, especially anything CUDA-related, which leads to long download and startup times. We should try to minimize the image sizes of our reusable components.

@PhilippeMoussalli
Contributor

PhilippeMoussalli commented Oct 30, 2023

Is this more of an issue for the remote runners (vertex, kfp) rather than the local one?

I think for the local runner, pulling and running images takes a minimal amount of time once the large image has been downloaded, because of caching.

For the remote ones:

  • KFP: depends on the image pull policy; if it's set to always, pulling will take a long time.
  • Vertex: I don't think Vertex image pulls can be cached, so this might be where the bulk of the time is spent.

@RobbeSneyders
Member Author

We also saw that pulling it once locally can take a long time, especially if you need to pull each image for a pipeline. Some images are >3GB.

@mrchtr
Contributor

mrchtr commented Oct 30, 2023

Can we build two versions of the CUDA images, one with CUDA support and one without? We could use the non-CUDA images for local testing and pull the image with CUDA support when a GPU is available.

@PhilippeMoussalli
Contributor

@RobbeSneyders is that still the case if all the images share the same base image (cuda/pytorch)? In that case, the large base image would only have to be pulled once, and all the layers on top of it (additional dependencies) should be lightweight.

@mrchtr that could work, but it has to be determined at compile time; we could do it based on the GPU config in the component Op. However, I'm not sure that's the problem we're trying to solve (lightweight testing). I think we also want fast pulling when running with a GPU.
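
A minimal sketch of what compile-time selection between a CUDA and a non-CUDA image could look like. Everything here is hypothetical and for illustration only: the `Resources` class, its `accelerator_number` field, the `select_image` helper, and the `-cuda` / `-cpu` tag suffixes are assumptions, not Fondant's actual API or tagging scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resources:
    """Hypothetical stand-in for the GPU config of a component Op."""

    accelerator_number: Optional[int] = None


def select_image(repository: str, version: str, resources: Resources) -> str:
    """Pick an image tag at compile time based on whether a GPU is requested.

    The "-cuda" and "-cpu" suffixes are an assumed naming convention.
    """
    # None and 0 both mean that no accelerator was requested.
    needs_gpu = bool(resources.accelerator_number)
    suffix = "-cuda" if needs_gpu else "-cpu"
    return f"{repository}:{version}{suffix}"


# Usage: a component without GPU config resolves to the smaller image.
cpu_image = select_image("example/component", "1.0", Resources())
gpu_image = select_image("example/component", "1.0", Resources(accelerator_number=1))
```

This only addresses the lightweight-testing case; the CUDA image itself stays just as large when a GPU is requested.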

@RobbeSneyders
Member Author

Building base Fondant images might make sense as well, since installing Fondant adds 377MB to the Docker image due to our dependencies.

We should investigate whether the Fondant install layer differs between images when they're built separately, even if their Dockerfiles are the same (the current case). If that layer is pulled separately for each image, a Fondant base image would help by having this layer (and other shared layers) pulled only once.
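
One possible way to run this investigation, as a rough sketch: list the layer digests of two locally pulled images with `docker image inspect` and count how many leading layers they have in common. The helper names `inspect_layers` and `shared_leading_layers` and the image names in the usage comment are made up for illustration. Note that the digests under `RootFS.Layers` identify uncompressed layer contents, so this is an approximation of what a registry pull would share.

```python
import json
import subprocess
from typing import List


def inspect_layers(image: str) -> List[str]:
    """Return the layer digests Docker reports for a locally available image."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .RootFS.Layers}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def shared_leading_layers(layers_a: List[str], layers_b: List[str]) -> List[str]:
    """Return the layers both images have in common, starting from the base.

    Comparison stops at the first layer that differs, since every layer
    above that point sits on a different parent.
    """
    shared = []
    for layer_a, layer_b in zip(layers_a, layers_b):
        if layer_a != layer_b:
            break
        shared.append(layer_a)
    return shared


# Usage (requires Docker and both images pulled; image names are placeholders):
#   a = inspect_layers("example/component_a:latest")
#   b = inspect_layers("example/component_b:latest")
#   print(len(shared_leading_layers(a, b)), "of", len(a), "layers shared")
```

If the shared list ends before the Fondant install layer, the two builds produced different layers for it and each image pulls its own copy.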

@RobbeSneyders RobbeSneyders added Components Implementation of components CI/CD Related to CI/CD of the repository labels Dec 18, 2023
Projects
Status: Breakdown
Development

No branches or pull requests

3 participants