Add FSx for Lustre CSI driver #40

Open. jpolchlo wants to merge 2 commits into master from feature/fsx-for-luster-csi.

jpolchlo
Collaborator

After posting and experimenting with #36, I now think EFS is a little clunky for the goals I have in mind. Specifically, we want to use these volumes to share data among pods. However, the data are almost always sourced from S3, so we have to copy them into EFS, and those copies incur traffic charges. Then, when we're not using the shared data, they just sit in an idle EFS volume. EFS is also not optimized for workloads with many small files.

FSx for Lustre is a more modern, higher-performing file storage service, and it can be set up to automatically mirror content from S3. It is not as flexible, though, and it may cost more, since we pay for the entire file system allocation for as long as the file system is provisioned. (This adds up quickly; for the smallest FSx file system, we can expect at least $90/mo, and possibly twice that amount if we're not careful!)

The ideal solution would be to provision an FSx file system on demand with S3 data repository associations. This may not be entirely possible, and this PR represents a testing ground for such exploration.

@jpolchlo
Collaborator Author

After thinking about this, a question arose: can we use dynamic provisioning (i.e., create volumes on demand, and only when needed) with this CSI driver? And if so, can we dynamically provision an FSx volume with data repository associations, so that the volume contains S3-backed mount points that can be read from and written to, with the content syncing automatically?

The answer is qualified. Taken to the letter of the request above, the answer is no. See this issue. The FSx CSI driver can't yet do the full S3 integration. If that functionality is required, then we'll have to provision statically, in the manner that we have tried. This means that FSx is a non-starter for Jupyter notebook storage until a shared, persistent volume of the sizes FSx requires is justifiable given the costs. And by the time we need that scale of operation, there might be a dynamic solution available.

On the plus side, it is possible to do dynamic provisioning in a more primitive way, especially if bidirectional, automatically-synced mount points are not required. Specifically, if we only want to import S3 data into the file system, that should be possible by creating a storage class along the lines described here.
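As a rough illustration of the import-only approach, the sketch below builds a StorageClass manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. It assumes the driver's provisioner name is `fsx.csi.aws.com` and that it reads the `subnetId`, `securityGroupIds`, `s3ImportPath`, and `deploymentType` parameters, as in the driver's published examples. The function name, subnet, security group, and bucket are hypothetical placeholders, not values from this repository.

```python
import json


def fsx_storage_class(name, subnet_id, security_group_ids, s3_import_path,
                      deployment_type="SCRATCH_2"):
    """Build a StorageClass manifest for import-only dynamic provisioning.

    Parameter keys follow the aws-fsx-csi-driver examples (an assumption;
    check them against the driver version that is actually installed).
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "fsx.csi.aws.com",
        "parameters": {
            "subnetId": subnet_id,
            # The driver expects a single comma-separated string here.
            "securityGroupIds": ",".join(security_group_ids),
            # Import only: no s3ExportPath, so nothing is written back to S3.
            "s3ImportPath": s3_import_path,
            "deploymentType": deployment_type,
        },
        "mountOptions": ["flock"],
    }


# Placeholder identifiers, for illustration only.
example = fsx_storage_class(
    name="fsx-s3-import",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
    s3_import_path="s3://example-bucket",
)
print(json.dumps(example, indent=2))
```

A PersistentVolumeClaim that names this storage class would then trigger creation of a new file system, so the allocation cost starts when the claim is created and stops when the volume is deleted.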

Will experiment and report back.

@jpolchlo jpolchlo force-pushed the feature/fsx-for-luster-csi branch from cdb8f0f to cf9bf23 Compare January 30, 2023 19:56
@jpolchlo
Collaborator Author

This PR is a bit confounding. The FSx CSI driver appears to be installed, but I can't yet get it working in either the static or the dynamic provisioning mode. The static volume won't mount because of this error:

MountVolume.SetUp failed for volume "noaa-hydro-data-fsx" : rpc error: code = InvalidArgument desc = dnsname is not provided

No idea where this error is coming from, but it shows up in more than one context.
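One possible cause, offered as a guess and not a diagnosis: in the driver's static provisioning example, the PersistentVolume carries the file system's DNS name under `spec.csi.volumeAttributes.dnsname`, and the message above would be consistent with that attribute being absent or empty. The sketch below checks a PV manifest, already parsed into a dict, for that attribute. The helper name and every identifier in the sample manifest are hypothetical placeholders.

```python
def has_dnsname(pv):
    """Return True if a static FSx PV manifest sets a non-empty dnsname."""
    csi = pv.get("spec", {}).get("csi") or {}
    attrs = csi.get("volumeAttributes") or {}
    return bool(attrs.get("dnsname"))


# Placeholder manifest; the file system ID, region, and mount name are
# made up for illustration.
sample_pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "example-fsx-pv"},
    "spec": {
        "capacity": {"storage": "1200Gi"},
        "accessModes": ["ReadWriteMany"],
        "csi": {
            "driver": "fsx.csi.aws.com",
            "volumeHandle": "fs-0123456789abcdef0",
            "volumeAttributes": {
                "dnsname": "fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com",
                "mountname": "fsx",
            },
        },
    },
}

print(has_dnsname(sample_pv))
```

Running the same check against the output of `kubectl get pv noaa-hydro-data-fsx -o json` (loaded with `json.loads`) would show whether the attribute reached the cluster.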

Dynamic allocation is failing because the IRSA role isn't getting picked up, and the node role is being used, which doesn't (and shouldn't?) have the FSx policy associated with it.
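A quick way to narrow this down is to check the two places where IRSA has to show up: the `eks.amazonaws.com/role-arn` annotation on the controller's service account, and the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables that the EKS webhook injects into pods using that service account. The sketch below does both checks on manifests parsed into dicts. The function names and the sample ARN are hypothetical; it is a debugging aid under those assumptions, not a fix.

```python
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"
IRSA_ENV_VARS = {"AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"}


def irsa_role_arn(service_account):
    """Return the IAM role ARN annotated on a ServiceAccount, or None."""
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(IRSA_ANNOTATION)


def pod_has_irsa_env(pod):
    """Return True if any container carries both webhook-injected env vars."""
    for container in pod.get("spec", {}).get("containers") or []:
        names = {env.get("name") for env in container.get("env") or []}
        if IRSA_ENV_VARS <= names:
            return True
    return False


# Placeholder objects, for illustration only.
sample_sa = {
    "metadata": {
        "annotations": {
            IRSA_ANNOTATION: "arn:aws:iam::111122223333:role/example-fsx-role"
        }
    }
}
sample_pod = {"spec": {"containers": [{"name": "fsx-plugin", "env": []}]}}

print(irsa_role_arn(sample_sa))
print(pod_has_irsa_env(sample_pod))
```

If the annotation is present but the pod lacks the environment variables, the pod may simply predate the annotation; the webhook only mutates pods at creation time, so restarting the controller pods would be the next thing to try.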

I asked for help on the Kubernetes Slack but didn't get an immediate response. I'll try to follow up.

In the meantime, I've updated this PR to allow for FSx to be selectively installed for deployments that want to use it.
