silently failing to download and stage source release file #75
Just happened again. In the darwin worker logs, you can see where it seems to be mapping the input:
However, the volume path is empty. Concourse reports the check as successful, and still doesn't provide the file with a
Edit: Even disabling the available versions and then re-enabling a version doesn't help. Concourse thinks it properly staged the file, I guess? I'm not sure whether this is a bug in the resource or not, but it's really bad, because I don't have a workaround short of destroying the pipeline!
We've got the same thing - our Control Tower pipeline was watching for new Concourse releases, reckons it's got 5.0.1, but the input directory has no files in it:
The pipeline isn't publicly visible, sadly. Are GitHub releases atomic? Do the files get added to a release after it exists as an entity?
Recreating the (one) worker in our deployment fixed it, presumably by forcibly dropping the cache. Would be nice if there were a way of doing this without recreating the VM?
I also have this issue, where release binaries are not downloaded. Any clue on how I can debug this?
@kayrus Restart your workers, and if you're seeing the same issue as us, it should fix it.
@DanielJonesEB Unfortunately I don't have the ability to restart nodes. Is there a simple way to clear the cache by entering the container?
@kayrus I don't think so. You'd need to delete the volume that represents that version of the resource, and to do that you'd need to be outside of a check container, and 'at the same level' as the worker process.
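As a starting point, `fly volumes` lists the volumes the web node knows about. A small filter like the sketch below narrows the listing to resource volumes on a given worker; note that the column layout (handle, worker, type, identifier) is an assumption and differs between Concourse versions, so adjust the awk fields to match your output.

```shell
#!/bin/sh
# Sketch: filter a `fly volumes` listing down to resource volumes on one
# worker. ASSUMED column order: handle, worker, type, identifier -- check
# your Concourse version's actual output before relying on this.
resource_volumes_on_worker() {
    worker="$1"
    awk -v w="$worker" '$2 == w && $3 == "resource"'
}

# Usage against a live deployment (target/worker names are placeholders):
#   fly -t my-target volumes | resource_volumes_on_worker worker-0
```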
@DanielJonesEB I can get access to the worker FS, but I cannot reboot it; it runs too many prod jobs. Where should I look for the volume?
@kayrus I've asked one of my colleagues to comment on where to find the volumes. Can you not add another worker? How many workers do you have? If you can't restart one without causing production problems, that's probably a sign that your system is a little fragile and should have more capacity.
@DanielJonesEB I don't control it, I just use it, though I have admin privileges. I really don't want to break it. It is running in a k8s cluster and I'd like to carefully clean up the releases cache and figure out why it doesn't download binaries.
I looped over each of them to find the volume, which I see via:

```
bash-5.0# pwd
/tmp/build/get
bash-5.0# find
.
./tag
./version
./commit_sha
./body
```

I see my commit, but no binary files from the release:
Here is the command I'm using to identify the source worker:
However, it returns no results. I'm not aware of the low-level Concourse architecture, therefore I might be looking in the wrong place. I appreciate your help though.
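For what it's worth, a loop along these lines can locate volume directories that contain the resource metadata files (`tag`, `version`) -- candidates for a release that was cached without its binaries. The volumes root path is an assumption and varies per deployment; point it at your worker's actual work directory.

```shell
#!/bin/sh
# Sketch: list volume dirs under a worker's volumes root that contain the
# github-release metadata files. The root path (e.g.
# /worker-state/volumes/live) is an ASSUMPTION -- adjust per deployment.
find_metadata_only_volumes() {
    root="$1"
    for vol in "$root"/*/volume; do
        [ -d "$vol" ] || continue
        # A cached github-release volume carries tag + version metadata:
        if [ -f "$vol/tag" ] && [ -f "$vol/version" ]; then
            echo "$vol"
        fi
    done
}
```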
What about safely draining that worker with `fly land-worker` and then restarting it once you've confirmed that it's running no containers?
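The drain-and-recycle sequence would look roughly like this (a sketch; `my-target` and `worker-0` are placeholders, and this assumes the worker can be recreated afterwards):

```shell
# Sketch: drain a worker, confirm it is empty, then retire it.
fly -t my-target land-worker --worker worker-0

# Wait until the worker shows state "landed" and runs no containers:
fly -t my-target workers
fly -t my-target containers | grep worker-0   # expect no output

# If the worker will be recreated from scratch rather than restarted,
# remove the stalled record:
fly -t my-target prune-worker --worker worker-0
```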
@will-gant How can I identify which worker is used?
If you run
Ah, I'm not an author of the release - just a colleague of @DanielJonesEB :-)
@will-gant found it:
I entered the worker, but couldn't find anything related to
However, I found the
but there is nothing about the problem release inside.
Ok, inside the failed release's container (via hijack) I was able to determine the volume which contains the failed release:
Then I looked for the target worker:
Entering the worker and listing the files:
I then deleted this subvolume from the target node:
Restarted the task, and it still has the same issue. No releases, and a new failed-release subvolume is now provisioned on another node.
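For reference, deleting a cached volume stored as a btrfs subvolume looks roughly like the sketch below. The path and handle are hypothetical (substitute the volume handle found via hijack and your worker's actual work directory), and this needs root on the worker host:

```shell
# Sketch only: paths are HYPOTHETICAL -- substitute the volume handle you
# identified, and the worker's real volumes directory.
HANDLE="${HANDLE:?set the volume handle first}"
VOL="/worker-state/volumes/live/${HANDLE}/volume"

btrfs subvolume show "$VOL"     # confirm it really is a btrfs subvolume
btrfs subvolume delete "$VOL"   # drop the cached (empty) release data
```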
Removing all related buggy subvolumes from every worker now ends with
There isn't currently any way to clear the cache and force a resource refresh. This is a feature that has been talked about for years and the issue is still open (concourse/concourse#1038), but it hasn't been implemented. I suspect you'd also need to remove references to the deleted volume from Concourse's postgres DB, but I'm not familiar enough with the schema to point you to where.
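For anyone attempting the DB side, a read-only starting point might look like this. The table and column names are assumptions for Concourse ~5.x; verify them with `\dt` first, and back up the database before deleting anything:

```shell
# ASSUMPTIONS: a database and user both named "concourse", and a "volumes"
# table with "handle", "worker_name" and "state" columns. Verify the real
# schema with \dt / \d volumes before acting on anything.
psql -U concourse -d concourse -c \
  "SELECT id, handle, worker_name, state FROM volumes WHERE worker_name = 'worker-0';"
```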
Yeah, we need to get Concourse to realise that it doesn't have the volume, and therefore to download it again (now that the binaries are uploaded to GitHub). Recreating the worker does this (it has to download the files from somewhere to be able to use them), and I was hoping that deleting the volume itself would do the trick. It could be that there's a reference to it somewhere that makes Concourse think it has already downloaded it, so it won't re-download it, but will error when trying to access the volume that's now been deleted? @crsimmons Would we expect volumes to be streamed from other workers? If the same cached volume with no binaries in it exists on other workers, would we need to worry about bouncing those too?
@crsimmons I found leftover release entries in postgres and removed them:
I restarted the pipeline, and I clearly saw that the problem release was getting updated compared to the others, but it still doesn't contain the binaries. The repo is public and I can clearly see the binaries on the release page. The config for the pipeline is:
Other releases have the same format and no problem. It is probably somehow related to the fact that we renamed the repo from
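For context, a `github-release` resource with glob filtering typically looks like the following. This is a hypothetical example (owner, repository, and glob names are illustrative, not the poster's actual config):

```yaml
# Hypothetical github-release resource with glob filtering.
resources:
- name: my-release
  type: github-release
  source:
    owner: example-org
    repository: example-repo
    access_token: ((github_token))

jobs:
- name: use-release
  plan:
  - get: my-release
    params:
      globs: ["*.tar.gz"]   # only fetch matching release assets
```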
I found the issue. The pipeline sources were updated in GitHub, but not applied to Concourse. Therefore Concourse happily used the old glob patterns. These globs were not visible in the Concourse pipeline UI, but I clearly saw the new repo URL. I applied the pipeline to Concourse manually, and now I can see the release being downloaded. At least now I know the Concourse architecture better :) Thanks everyone for the help.
@kayrus Glad you got it sorted! I would really recommend making all your pipelines 'self-set' as their first step. Here's a running example and the corresponding config. Stick something like this at the beginning of your pipelines, or encourage your developers to do so:
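A minimal self-setting pipeline might look like the sketch below. The resource, branch, and file names are placeholders, and the `set_pipeline: self` step requires a reasonably recent Concourse (on older versions, run `fly set-pipeline` inside a task instead):

```yaml
# Sketch of a self-setting pipeline; all names are placeholders.
resources:
- name: pipeline-repo
  type: git
  source:
    uri: https://github.com/example-org/pipelines.git
    branch: master

jobs:
- name: set-self
  plan:
  - get: pipeline-repo
    trigger: true          # re-set the pipeline whenever its config changes
  - set_pipeline: self
    file: pipeline-repo/ci/pipeline.yml
```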
@DanielJonesEB will keep this in mind. Thanks. |
With Concourse 4.0.0:
I have a pipeline which grabs a GitHub release archive:
and
The pipeline was working fine, then I put my computer to sleep with the stack 'stopped', then resumed my PC and woke the stack back up. The pipeline ran again as expected, and showed a valid version for the `homebrew` resource. However, when it gets to the step in the pipeline where it uses the file, there's no `.tar.gz` file to be found! So for whatever reason the check container is reporting success, and Concourse is acting like it pulled and staged the resource correctly, but the binary file is missing!
Also, a `fly check-resource` succeeds, but the file is still missing!