
Gracefully handle differences in s3 API support #133

Open
kallisti5 opened this issue Jul 12, 2019 · 13 comments

kallisti5 commented Jul 12, 2019

We're using MinIO for our S3 bucket storage, and this resource fails to connect because it assumes the bucket-versioning API is supported.

```
checking failed
resource script '/opt/resource/check []' failed: exit status 1

stderr:
error finding versions: NotImplemented: A header you provided implies functionality that is not implemented
	status code: 501, request id: , host id: 
```
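For illustration, a source configured in `versioned_file` mode (the endpoint, bucket name, and credential vars below are hypothetical) is what drives the resource to call the bucket-versioning API during `check`, which MinIO answers with the 501 above:

```yaml
resources:
- name: artifact
  type: s3
  source:
    endpoint: https://minio.example.com      # hypothetical MinIO endpoint
    bucket: my-artifacts                     # hypothetical bucket
    access_key_id: ((s3_access_key))
    secret_access_key: ((s3_secret_key))
    # versioned_file relies on S3 object versioning, which MinIO
    # (at the time of this issue) did not implement
    versioned_file: build/artifact.tar.gz
```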
kallisti5 (Author) commented:

The same applies if the server supports versioning, but it is disabled:

```
stderr:
error finding versions: bucket is not versioned
```

vito (Member) commented Aug 1, 2019

Does this still happen when using regexp:?
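For reference, a `regexp:`-mode source derives versions from a capture group in the object name rather than from the bucket-versioning API, so it can sidestep the 501 (all names below are hypothetical):

```yaml
resources:
- name: release
  type: s3
  source:
    endpoint: https://minio.example.com      # hypothetical
    bucket: releases                         # hypothetical
    access_key_id: ((s3_access_key))
    secret_access_key: ((s3_secret_key))
    # the capture group supplies the version,
    # e.g. my-app-1.2.3.tar.gz -> 1.2.3
    regexp: my-app-(.*).tar.gz
```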

What can be done here? IIRC we just use the AWS S3 SDK and hope that S3-compatible blobstores are, well, S3-compatible. But if we're making some extra call that breaks things unnecessarily, I'd be happy to review a PR to fix it. (We don't have a whole lot of bandwidth to proactively fix these things ourselves.)

Thanks!

kallisti5 (Author) commented:

regexp works with unversioned buckets... however, if anything in the bucket (even an unrelated object) doesn't match the regex, the check errors out:

(screenshot: regex_versioning_error)

Reeeeeeeeally wish this resource just allowed "uploading things, no versioning". Versioning is not really useful in most cases :-)

vito (Member) commented Dec 2, 2019

@kallisti5 Without versioning there's no way to really do anything continuously, as there's no guarantee that artifacts propagating from one build to the next won't just be clobbered by a broken artifact produced by a new upstream build.

versioned_file is there for when you don't care about semver, which I agree is not always what you want. The resource is probably just using an extra header to perform uploads in parallel or something, and needs to be adjusted to fall back gracefully.

kallisti5 (Author) commented Dec 2, 2019

> @kallisti5 Without versioning there's no way to really do anything continuously as there's no guarantee that artifacts propagating from one build to the next don't just become clobbered by a broken artifact produced by a new build upstream.

I want artifacts clobbered. Our build system versions artifacts with a unique version number that changes every commit. That's why this issue is so frustrating to so many people. "Just do what I say Concourse, don't get in my way"

kallisti5 (Author) commented:

Another example: we generate a package repository with 40+ packages. We just want to upload the contents of a folder (key, prefix, etc.) to a folder (key, prefix, etc.) in a bucket.

The versioning prevents this. The only use case I see for S3 versioning is "you want to upload the same file over and over", which almost nobody really does.

vito (Member) commented Dec 9, 2019

If you don't want versioning, you probably shouldn't be using an abstraction (resources) whose central purpose is versioning of artifacts. 😅 Maybe you should use a task for these "just put something in s3" use cases instead?

> I want artifacts clobbered. Our build system versions artifacts with a unique version number that changes every commit. That's why this issue is so frustrating to so many people. "Just do what I say Concourse, don't get in my way"

I'm confused by this; it sounds like you don't want clobbering in this case, you just want to be able to push an artifact under some unique name and pass that along through your pipeline, relying on passed constraints to know which artifact to propagate instead of versioning of the artifacts in isolation. (edit: A totally valid use case, and one I'd like better support for, I'm just clarifying here to make sure it's more evidence of that kind of thing. 🙂 )
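The passed-constraint pattern mentioned above can be sketched like this (job and resource names are hypothetical): a version of the artifact only propagates to `publish` once it has gone through `build`.

```yaml
jobs:
- name: build
  plan:
  - get: source
    trigger: true
  - put: artifact          # pushes a uniquely named blob

- name: publish
  plan:
  - get: artifact
    passed: [build]        # only versions that made it through 'build' propagate
```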

kallisti5 (Author) commented Dec 9, 2019

> If you don't want versioning, you probably shouldn't be using an abstraction (resources) whose central purpose is versioning of artifacts.

I'm not sure where this comes from. The core functionality of S3 isn't versioning resources. Versioning is an optional add-on to the S3 specification. It's common for non-AWS object storage with S3 compatibility (Wasabi, MinIO, GCP Storage with interoperability enabled) to not offer versioning, or to only allow it to be selectively enabled (there have been a few tickets with folks confused by this limitation: #111, #98, #97, #61, #55, #140, #23, #133).

Folks use S3 in a lot of ways; most expect "basic uploading of objects" to work without worrying about versioning. At minimum, if there is zero interest in supporting "basic uploading of objects without factoring in S3 versioning", you might want to document that.

vito (Member) commented Dec 10, 2019

> I'm not sure where this comes from. The core functionality of s3 isn't versioning resources.

Sorry, I'm not talking about S3 in general, I'm talking about the concept of a Concourse resource. Concourse resources represent versioned objects, fundamentally. It's not at all wrong to use S3 in a way that doesn't involve versioning, but it's odd to use the S3 Concourse resource and not want versioning, because that's what it's for, and that's what all Concourse resources are for.

The language in the documentation isn't very precise around this, which is totally our bad. For a while we built on resources as a "general thing to be used for all kinds of integrations" which turned out to be a bad idea. Now we define them more precisely around how they were originally designed to be used - versioning of artifacts - and I'm focusing on classifying these other use cases (e.g. notifications, triggers, and cases like this one which are more like arbitrary blobs) and finding better ways to represent them in Concourse pipelines.

If you're interested and have the time, I wrote a rather lengthy blog post about all this recently. 🙂

We actually have pretty much the same use case as yours, so I'm feeling this annoyance too. If you look at our pipeline we actually have a linux-rc resource whose filenames don't have any independently orderable versioning. It ends up working because we rely on passed constraints, but it's super confusing because what looks like the "latest" version on that page isn't actually the latest, it's literally the filename with the lowest alphabetical sort order. Super janky!

So at the end of the day I'm suggesting that this resource shouldn't be viewed as "the way to do everything with S3". It's specifically built around versioning individual files in a bucket, either by filename or with versioned_file. That's a very common use case that a lot of people use. I don't disagree that there are other use cases, like uploading a bunch of files to a bucket or uploading blobs with unique filenames - I'm just suggesting that those should not be done with this resource, as they don't fit the intent of Concourse resources at all.

If you just need to upload a bunch of things to S3 and don't need to be able to propagate it through your pipeline, I would just do that with a task for now. Maybe in the future when Prototypes land people can start to develop re-usable S3 toolkits so you don't have to write the task yourself. Maybe this resource even gets converted into a prototype which supports all those extra actions in addition to supporting the Concourse resource interface that it does today. We're just not there yet.
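As a sketch of that task-based approach (the image, paths, endpoint, and variable names are all assumptions, not something this resource provides), a "just upload a directory" task could look like:

```yaml
# hypothetical upload.yml task config
platform: linux

image_resource:
  type: registry-image
  source: {repository: amazon/aws-cli}

inputs:
- name: build-output                      # artifacts from an earlier step

params:
  AWS_ACCESS_KEY_ID: ((s3_access_key))
  AWS_SECRET_ACCESS_KEY: ((s3_secret_key))
  ENDPOINT: https://minio.example.com     # hypothetical endpoint

run:
  path: sh
  args:
  - -ec
  - aws s3 cp --recursive --endpoint-url "$ENDPOINT" build-output/ s3://my-artifacts/
```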

kallisti5 (Author) commented:

Ok. Thanks for explaining it; this definitely helps. The biggest (and, to be honest, most frustrating) confusion on my end was expecting an "s3 resource" to just push objects to S3 like any normal S3 client/API.

We already worked around this by adding an S3 client into a container and setting it up on every job execution; we then template the S3 client configuration out via environment variables per pipeline. It's all getting pretty janky, but it lets us upload artifacts within a task. I am a bit worried about exposing S3 credentials this way... one stray log line from Concourse and our artifact repository could be compromised (which is a big deal).

vito (Member) commented Dec 11, 2019

Glad all that makes sense. 🙂

Yeah, writing tasks which handle credentials is pretty risky. Lots of people are in the habit of slapping set -x up top and calling it a day. You could enable secret redaction as a sort of safety net at least: https://concourse-ci.org/creds-redacting.html - this was added recently, and will automatically redact any credentials that come from your credential manager.

If they don't come from a credential manager right now, you could consider using var_sources and the "dummy" credential manager, both of which are coming in v5.8. Here's the RFC for var_sources: concourse/rfcs#39 - we'll have docs out for it soon too.
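For reference, a "dummy" var source (syntax per the var_sources RFC; the names and values here are placeholders) lets statically defined values flow through the credential-manager path so that redaction applies to them:

```yaml
var_sources:
- name: stub
  type: dummy
  config:
    vars:
      s3_access_key: AKIA-not-real              # placeholder value
      s3_secret_key: definitely-not-a-secret    # placeholder value
```

These would then be referenced as `((stub:s3_access_key))` in resources and tasks.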

kallisti5 (Author) commented Dec 1, 2022

For any future lurkers: the concourse rclone resource works well for putting artifacts to S3 and a variety of other remote cloud destinations without worrying about versioning.

I kinda wish this resource was renamed s3-versioned-resource or something... it's a critical distinction.

vito (Member) commented Dec 6, 2022

> I kinda wish this resource was renamed s3-versioned-resource or something... it's a critical distinction.

All resources are versioned by design (doc), so I'd wager it would be clearer if the rclone resource called out that it's "unversioned" instead. Otherwise nearly every resource that fits the intended mold would be called "versioned-foo." 😄
