This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Submit Exec Failure when docker params are changed and k8s api server restarts #428

Open
duyanghao opened this issue Aug 12, 2017 · 6 comments

Comments

@duyanghao

duyanghao commented Aug 12, 2017

The submit exits with the following errors when I change docker params (without restarting docker) or restart the k8s API server:

2017-07-05T11:32:42.822851404Z 2017-07-05 11:32:42 WARN WatchConnectionManager:182 - Exec Failure
2017-07-05T11:32:42.822866179Z java.io.EOFException
2017-07-05T11:32:42.822869460Z at okio.RealBufferedSource.require(RealBufferedSource.java:59)
2017-07-05T11:32:42.822872518Z at okio.RealBufferedSource.readByte(RealBufferedSource.java:72)
2017-07-05T11:32:42.822875435Z at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:113)
2017-07-05T11:32:42.822878101Z at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:97)
2017-07-05T11:32:42.822880754Z at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
2017-07-05T11:32:42.822883385Z at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
2017-07-05T11:32:42.822895231Z at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
2017-07-05T11:32:42.822897880Z at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
2017-07-05T11:32:42.822900257Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2017-07-05T11:32:42.822902649Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2017-07-05T11:32:42.822905015Z at java.lang.Thread.run(Thread.java:745)
2017-07-05T11:32:42.825252889Z 2017-07-05 11:32:42 INFO WatchConnectionManager:352 - Current reconnect backoff is 1000 milliseconds (T0)
2017-07-05T11:32:43.856091750Z 2017-07-05 11:32:43 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
2017-07-05T11:32:43.856106549Z 
2017-07-05T11:32:43.856109837Z 
2017-07-05T11:32:43.856112236Z Container name: spark-kubernetes-driver
2017-07-05T11:32:43.856115104Z Container image: xxx
2017-07-05T11:32:43.856117597Z Container state: Running
2017-07-05T11:32:43.856119991Z Container started at: 2017-07-05T01:55:25Z
2017-07-05T11:32:43.856586296Z 2017-07-05 11:32:43 INFO Client:54 - Application xxx finished.

Addition:
The result shows that the above operations (changing docker params or restarting the k8s API server) do not have any influence on the driver or the executors.

@duyanghao
Author

@erikerlandson, do you have any suggestions?

@erikerlandson
Member

It makes sense to me that restarting the kube api server could cause the watcher to fail, since the watcher would lose connection to the cluster. Can you explain what you mean about changing docker params?

@duyanghao
Author

duyanghao commented Aug 14, 2017

@erikerlandson Changing docker params means changing some parameters in the /etc/sysconfig/docker file (without restarting docker).
I do think it would be more robust if the watcher attempted to reconnect.

@duyanghao
Author

duyanghao commented Aug 14, 2017

@erikerlandson Maybe this is not related to the docker params change but to the kubelet aborting. Still, I recommend adding watcher reconnection.

@erikerlandson
Member

@duyanghao I think if it's possible to make the watcher connections robust across restarts it would be desirable. @foxish, do you have any insights on this one?
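
For illustration only, the reconnect behaviour discussed in this thread could be sketched roughly as follows. This is not the project's code and does not use the Kubernetes client API. It is a language-neutral sketch in Python, and `open_watch`, `is_terminal`, `watch_until_terminal`, and the delay values are all hypothetical names and assumptions. The idea is that a dropped watch connection (such as the EOFException in the log above) leads to re-opening the watch with a growing delay, rather than to the client reporting the application as finished while the driver container is still running.

```python
import time


def watch_until_terminal(open_watch, is_terminal, max_attempts=5,
                         base_delay=1.0, max_delay=32.0, sleep=time.sleep):
    """Consume watch events until a terminal one arrives.

    open_watch  : hypothetical callable returning an iterable of events;
                  it may raise EOFError/ConnectionError when the API
                  server goes away.
    is_terminal : hypothetical predicate marking a final pod state.
    Returns the terminal event, or raises ConnectionError after
    max_attempts consecutive failures.
    """
    failures = 0
    while True:
        try:
            for event in open_watch():
                failures = 0  # the connection is healthy again
                if is_terminal(event):
                    return event
            # The stream ended without a terminal event: treat this as a
            # dropped connection, not as "application finished".
            raise EOFError("watch stream closed before a terminal event")
        except (EOFError, ConnectionError) as exc:
            failures += 1
            if failures >= max_attempts:
                raise ConnectionError("watch could not be re-established") from exc
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(base_delay * 2 ** (failures - 1), max_delay))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_watch():
        # Simulates an API server restart on the first connection attempt.
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise EOFError("api server restarted")
        return iter(["Pending", "Running", "Succeeded"])

    final = watch_until_terminal(flaky_watch,
                                 lambda e: e in ("Succeeded", "Failed"),
                                 sleep=lambda _seconds: None)
    print(final)
```

In this sketch a clean end of the event stream is deliberately treated the same as a connection error, so the client only stops when it has actually observed a terminal state or has exhausted its retries.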

@duyanghao
Author

duyanghao commented Aug 28, 2017

@erikerlandson @foxish Looking at issue 465, maybe we can have these problems solved together.
