-
-
Notifications
You must be signed in to change notification settings - Fork 69
codedeploy-agent.service is made executable and causes sytemd to error #185
Comments
Looks like You might fix it by entering the shell and removing the |
okay! This bug is quite serious, as I had my Airflow scheduler fail during the day and the new one was not able to be provisioned due to this error. I had to manually redeploy the entire cluster to fix this (the new ec2 schedulers were starting up and shutting down immediately) |
@villasv this issue persists, just had a scheduler fail with the same error. I found the source of this issue: do you think a simple chmod in setup scripts would fix this issue? what permissions would you suggest for this service? |
Oh geez, thanks for finding that. I assumed it was a bug in one of this repo's scripts and could never find it.
Maybe just unset the execution part? |
On the other hand, looking through the issue it seems to be a benign warning that doesn't cause error. It's probably something else. Now I'm looking at the The command above is EDIT.: Ah, I see the error now. Once again a missing So if your scheduler is failing, it must be due to something else :-/ |
okay @villasv thanks a lot for checking this, I will investigate again once this failure happens. A good way to reproduce this issue is simple: just stop the scheduler in EC2 console, this will force it to shutdown and the new one to be provisioned. The new one will then fail to start, causing an infinite loop of provisioning and deprovisioning. I was unable to resolve this issue, so for now I have resorted to deleting and redeploying the entire cluster once such failure happens. |
Hi a colleague and I had the same issue where both the schedular and webserver would keep failing and restarting if they were terminated for any reason. This led to code deploy failing shutting down the instance and starting a new one. If so the fix we implemented was to perform a rm /airflow/* -rf as part of a before install in the code deploy |
Ohhh, thank you for the investigation, this is quite promising. Indeed. Curiously this is already solved for the workers, they will wait for CodeDeploy (either to deploy or to say there's nothing to deploy)! The original motivation was that I didn't want them to start consuming messages from the Celery broker before they had the DAG files in place. Now it seems this was relevant for the scheduler and webserver after all. Since there's some code reuse, it shouldn't be hard. Glad to see that you were able to workaround using the existing scripts! Nicely done. |
@Tayyab-Dativa could you please share your scripts? I was not able to figure out how to fix this issue, still relying on manual redeploys :( I would really really appreciate your help! |
So at your project level, in the scripts directory, the one that contains cdapp_start.sh and cdapp_stop.sh if you add a new bash file, we called it before_install.sh and add the following 'rm /airflow/* -rf', then as part of the appspec.yml you need to add |
@Tayyab-Dativa excellent, thanks a lot! This is huge, now I can have confidence in my cluster :D |
Hello, the scheduler has failed and then it keeps trying to get recreated without success:
This repeats over and over. What do you think could be the source of this issue?
The text was updated successfully, but these errors were encountered: