Tutorials Tests/Checks FAIL (dapr app not ready / timing) #1048
Might need a maintainer on this.

@yaron2, you mentioned you may be able to help with this dapr CLI multi-app run health check timing? TY

@paulyuk The failure seems to be transient and more related to why dapr-dev-redis was going into crashloop backoff: https://github.com/dapr/quickstarts/actions/runs/9702023201/job/26776912327#step:18:82...
Hey @mukundansundar and @yaron2, the issue I reported above repros locally. The Redis crash loop does not repro for me, using KinD:

```
> dapr init -k --dev
⌛ Making the jump to hyperspace...
ℹ️ Note: To install Dapr using Helm, see here: https://docs.dapr.io/getting-started/install-dapr-kubernetes/#install-with-helm-advanced
ℹ️ Container images will be pulled from Docker Hub
✅ Deploying the Dapr control plane with latest version to your cluster...
✅ Deploying the Dapr dashboard with latest version to your cluster...
✅ Deploying the Dapr Redis with latest version to your cluster...
✅ Deploying the Dapr Zipkin with latest version to your cluster...
ℹ️ Applying "statestore" component to Kubernetes "default" namespace.
ℹ️ Applying "pubsub" component to Kubernetes "default" namespace.
ℹ️ Applying "appconfig" zipkin configuration to Kubernetes "default" namespace.
✅ Success! Dapr has been installed to namespace dapr-system. To verify, run `dapr status -k' in your terminal. To get started, go here: https://aka.ms/dapr-getting-started

> kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
dapr-dev-redis-master-0            1/1     Running   0          24m
dapr-dev-redis-replicas-0          1/1     Running   0          24m
dapr-dev-redis-replicas-1          1/1     Running   0          23m
dapr-dev-redis-replicas-2          1/1     Running   0          23m
dapr-dev-zipkin-7d5f8fc8b5-wds69   1/1     Running   0          24m
```
But the original crash above, where the Node app does not load, does repro after the same `dapr init -k --dev` when doing the multi-app run:

```
pyadmin hello-kubernetes master ≢ ?2 dapr run -k -f dapr.yaml
ℹ️ This is a preview feature and subject to change in future releases.
ℹ️ Validating config and starting app "nodeapp"
ℹ️ Deploying app "nodeapp" to Kubernetes
ℹ️ Deploying service YAML "/home/pyadmin/src/paulyuk/quickstarts/tutorials/hello-kubernetes/node/.dapr/deploy/service.yaml" to Kubernetes
ℹ️ Deploying deployment YAML "/home/pyadmin/src/paulyuk/quickstarts/tutorials/hello-kubernetes/node/.dapr/deploy/deployment.yaml" to Kubernetes
⚠ Error deploying pod to Kubernetes. See logs directly from Kubernetes command line.
ℹ️ Writing log files to directory : /home/pyadmin/src/paulyuk/quickstarts/tutorials/hello-kubernetes/node/.dapr/logs
ℹ️ Validating config and starting app "pythonapp"
ℹ️ Deploying app "pythonapp" to Kubernetes
ℹ️ Deploying deployment YAML "/home/pyadmin/src/paulyuk/quickstarts/tutorials/hello-kubernetes/python/.dapr/deploy/deployment.yaml" to Kubernetes
ℹ️ Streaming logs for containers in pod "pythonapp-5cd765b8f4-zgqlm"
ℹ️ Writing log files to directory : /home/pyadmin/src/paulyuk/quickstarts/tutorials/hello-kubernetes/python/.dapr/logs
ℹ️ Starting to monitor Kubernetes pods for deletion.
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c485f50>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c422150>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c4282d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c4283d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c4220d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c415cd0>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c432450>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTPConnectionPool(host='localhost', port=3500): Max retries exceeded with url: /neworder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e5c4385d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
== APP - pythonapp == HTTP 500 => {"errorCode":"ERR_DIRECT_INVOKE","message":"failed to invoke, id: nodeapp, err: failed to invoke target nodeapp after 3 retries. Error: rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.244.0.15:50002
```

The key error is the one showing nodeapp isn't available, and hence calls to it from pythonapp fail:

```
== APP - pythonapp == HTTP 500 => {"errorCode":"ERR_DIRECT_INVOKE","message":"failed to invoke, id: nodeapp, err: failed to resolve address for 'nodeapp-dapr.default.svc.cluster.local': lookup nodeapp-dapr.default.svc.cluster.local on 10.96.0.10:53: no such host"}
```

After about 32 seconds (retries), the e2e works locally on KinD:

```
== APP - nodeapp == Got a new order! Order ID: 32
== APP - nodeapp == Successfully persisted state for Order ID: 32
== APP - nodeapp == Got a new order! Order ID: 33
== APP - nodeapp == Successfully persisted state for Order ID: 33
== APP - nodeapp == Got a new order! Order ID: 34
```
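The retry pattern in these logs is consistent with a client-side loop along these lines. This is a sketch only, not the tutorial's actual source: the header-based invocation and payload shape are assumptions inferred from the `/neworder` URL and the `ERR_DIRECT_INVOKE` error above.

```python
import time

import requests

# The dapr-app-id header asks the local Dapr sidecar on port 3500 to route
# the request to the app with ID "nodeapp" (Dapr's header-based invocation).
dapr_url = "http://localhost:3500/neworder"
headers = {"dapr-app-id": "nodeapp"}

order_id = 1
while True:
    try:
        resp = requests.post(
            dapr_url,
            json={"data": {"orderId": order_id}},
            headers=headers,
            timeout=5,
        )
        resp.raise_for_status()
        order_id += 1
    except requests.exceptions.RequestException as err:
        # Before the sidecar listens on 3500 this prints the
        # "Connection refused" errors above; once the sidecar is up but
        # nodeapp is not, the 500 ERR_DIRECT_INVOKE response lands here.
        print(err)
    time.sleep(1)
```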
Note @greenie-msft and I also tried to put Dapr Resiliency in place, and orders still got skipped until the NodeApp was ready! We added:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: myresiliency
# similar to subscription and configuration specs, scopes lists the Dapr app IDs
# that this resiliency spec can be used by
spec:
  # policies is where timeouts, retries and circuit breaker policies are defined
  policies:
    # retries are named templates for retry configurations and are instantiated
    # for the life of the operation
    retries:
      retryInvokeForever:
        policy: constant
        maxInterval: 5s
        maxRetries: -1 # retry indefinitely
  # targets are what named policies are applied to.
  # Dapr supports 3 target types - apps, components and actors
  targets:
    apps:
      nodeapp:
        retry: retryInvokeForever
```

and modified the multi-app run file (dapr.yaml) to:

```yaml
version: 1
apps:
  - appDirPath: ./node
    appID: nodeapp
    appPort: 3000
    containerImage: ghcr.io/dapr/samples/hello-k8s-node:latest
    createService: true
  - appDirPath: ./python
    appID: pythonapp
    containerImage: ghcr.io/dapr/samples/hello-k8s-python:latest
common: # optional section for variables shared across apps
  resourcesPath: ./resources # any dapr resources to be shared across apps
```
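To illustrate the gap the resiliency policy leaves, here is a hedged sketch of a readiness gate the client could run before submitting real orders. `wait_for_nodeapp` is a hypothetical helper, and the GET `/order` route is assumed from the hello-kubernetes nodeapp; adjust if your app differs.

```python
import time

import requests


def wait_for_nodeapp(timeout_s: float = 120.0) -> None:
    """Poll nodeapp through the local Dapr sidecar until an invoke succeeds."""
    url = "http://localhost:3500/order"    # GET /order served by nodeapp
    headers = {"dapr-app-id": "nodeapp"}   # header-based service invocation
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Any non-5xx answer means the request made it through both
            # sidecars to nodeapp, so the invoke path is ready.
            if requests.get(url, headers=headers, timeout=2).status_code < 500:
                return
        except requests.exceptions.RequestException:
            pass  # local sidecar not listening yet ("connection refused" phase)
        time.sleep(2)
    raise TimeoutError("nodeapp never became reachable through Dapr")
```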
This is almost as if this type of exception (…)
The issue here is that the app can't reach its Dapr instance because the latter isn't up yet. This means the resiliency policy is ineffective, as the requests never hit Dapr to begin with. This is in reference to all errors that contain `connection refused`.
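One mitigation for that connection-refused phase is to block app startup on the sidecar's own health API, `GET /v1.0/healthz`, which returns 204 once the sidecar is ready. The sketch below is illustrative, not the tutorial's actual code; only once this returns can a resiliency policy take effect.

```python
import os
import time

import requests


def wait_for_sidecar(timeout_s: float = 60.0) -> None:
    """Block until this app's own Dapr sidecar reports healthy."""
    port = os.getenv("DAPR_HTTP_PORT", "3500")  # set by the sidecar injector
    url = f"http://localhost:{port}/v1.0/healthz"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=2).status_code == 204:
                return  # sidecar is up; requests can now reach Dapr
        except requests.exceptions.ConnectionError:
            pass  # the "connection refused" phase described above
        time.sleep(1)
    raise TimeoutError("Dapr sidecar did not become healthy in time")
```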
I'm investigating
Thank you. It put my brain in a loop thinking about how your app's Dapr sidecar can check on the other app's sidecar if it's not up, or when you don't know the healthz endpoint you're trying to hit for the remote app.
Thanks for the explanation, Yaron. So the retries we're seeing in the logs are coming from the Python HTTP client?
Yes
@yaron2 @msfussell - per our chat, the tactical solution for this in 1.14 is to revert back to single-app run. I will take ownership of the 1.14-scoped fix so we get tests passing again. PR on the way.
Hey - I am hitting this issue now, and it is blocking tests with --dev init, but it does not occur locally. It's only in the GH Action runner.
I'm filing a specific bug on the remaining failure, the Redis crashloop:
Fixed by #1057 and dapr/cli#1437.
Expected Behavior
Tutorials pass
Tutorials are technically unsupported parts of the Quickstarts branch, but I don't like seeing regressions like this creep in, so I treat it as a P1 (not a ship blocker or PR blocker) that we should investigate. It likely points to a real product issue.
Actual Behavior
Tutorials Fail
Here is a good example where the dapr app client tries to call the target dapr app's /neworder API and fails because the target is not yet available, likely because it hasn't started yet. I can reproduce timing issues like this on my local KinD deployment too.
Failure example
Steps to Reproduce the Problem
Use the link to the Action run above.
But more importantly, it repros when you do a multi-app run of the app on your local machine, e.g. using:

```
dapr run -k -f .
```
@msfussell
@yaron2