Receiving sporadic InvalidProviderToken errors from Apple #109

Open
thisisjeffwong opened this issue Nov 3, 2021 · 7 comments

Comments

@thisisjeffwong

We have multiple APN servers sending APNs to Apple. We occasionally get a sporadic outbreak of InvalidProviderToken errors from Apple that lasts for less than an hour and recovers itself. We aren't doing anything at the application level in response to the 403 errors other than reporting the errors.

Has anyone else experienced this?

I thought that maybe one of the servers on Apple's APN cluster might have a clock skew but that would theoretically affect all of our servers equally.

@benubois
Collaborator

benubois commented Nov 3, 2021

Hi @thisisjeffwong,

This does sound like a clock issue. I've made some changes to insulate apnotic from system time. Would you be up for trying out the monotonic branch to see if that helps?

Out of curiosity, what do you see in production when running Time.now?

Thanks!

@thisisjeffwong
Author

I could try it on my dev APN server and see if it still works. Have you had a chance to test this code against any sends to APNS?

For testing in production, would you be looking for evidence that this fixes the problem, or evidence that this fix does no harm?

We only get this bug every few weeks but we've gone as long as 4 months without a problem. If it's a relatively safe fix and others approve, we could incorporate it and just reopen this issue if we see it again.

@thisisjeffwong
Author

@benubois For Time.now on Rails Console, I see 2021-11-03 10:55:28.748582635 -0700, which looks normal.

Time drift is a pretty good explanation for why this is only happening to one server. However, the failures are interspersed with successes. Could the time used for the token be off only with respect to certain servers in Apple's cluster? Or is apnotic regenerating the token once a failure is detected?

@benubois
Collaborator

benubois commented Nov 3, 2021

I've done some more testing and now I'm not so sure this is time related.

An expired token actually results in a 403 ExpiredProviderToken error.

However, an invalid team_id or key_id results in 403 InvalidProviderToken.

Try logging the token when you get a 403 error and make sure it includes all the required parts.
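A minimal sketch of that check, assuming the provider token is a standard three-segment JWT and using only the Ruby standard library. The helper names `provider_token_problems` and `decode_segment` are hypothetical and not part of apnotic. It decodes the header and claims without verifying the signature and reports any of the fields APNs expects (`alg` and `kid` in the header, `iss` and `iat` in the claims) that are missing or empty:

```ruby
require "base64"
require "json"

# Fields APNs expects in a provider token (JWT).
REQUIRED_HEADER_KEYS = %w[alg kid].freeze  # kid = key_id
REQUIRED_CLAIM_KEYS  = %w[iss iat].freeze  # iss = team_id

# Hypothetical helper: decodes one base64url JWT segment into a Hash.
# Returns nil if the segment is not valid base64url-encoded JSON.
def decode_segment(segment)
  padded = segment + "=" * ((4 - segment.length % 4) % 4)
  parsed = JSON.parse(Base64.urlsafe_decode64(padded))
  parsed.is_a?(Hash) ? parsed : nil
rescue ArgumentError, JSON::ParserError
  nil
end

# Hypothetical helper: returns a list of structural problems found in the
# token (empty when nothing is missing). Does NOT verify the signature.
def provider_token_problems(token)
  parts = token.to_s.split(".", -1)
  return ["expected 3 dot-separated segments, got #{parts.length}"] unless parts.length == 3

  header = decode_segment(parts[0])
  claims = decode_segment(parts[1])
  return ["header or claims are not valid base64url JSON"] if header.nil? || claims.nil?

  problems = []
  REQUIRED_HEADER_KEYS.each { |k| problems << "header missing #{k}" if header[k].to_s.empty? }
  REQUIRED_CLAIM_KEYS.each  { |k| problems << "claims missing #{k}" if claims[k].to_s.empty? }
  problems << "signature segment is empty" if parts[2].empty?
  problems
end
```

Calling something like `provider_token_problems(token)` from the 403 handler and logging the result alongside the decoded `kid`, `iss`, and `iat` values should show whether a failing token was malformed or merely rejected.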

> Have you had a chance to test this code against any sends to APNS?

Yes, it works. I'm running the branch in production.

> For testing in production, would you be looking for evidence that this fixes the problem, or evidence that this fix does no harm?

Evidence that it fixes the problem.

@thisisjeffwong
Author

A teammate was surprised that monotonic time would fix an issue of time discrepancy between servers since servers don't share a monotonic clock.

I referred to this explanation: https://blog.dnsimple.com/2018/03/elapsed-time-with-ruby-the-right-way/

Is this close to the reasoning that motivated your change?

@benubois
Collaborator

benubois commented Nov 5, 2021

Yes, the monotonic change is just about preventing time related issues only on your server. I don't think this is about a time discrepancy between servers.
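To illustrate the general idea (this is a sketch, not the code in the monotonic branch), a token cache can decide when to regenerate using `Process.clock_gettime(Process::CLOCK_MONOTONIC)` rather than `Time.now`, so an NTP adjustment or manual clock change on the sending server cannot make a token look younger or older than it really is. The class name `CachedProviderToken`, the 30-minute interval, and the injectable `clock:` argument are all assumptions made for the example:

```ruby
# Hypothetical token cache; names and interval are illustrative only.
class CachedProviderToken
  # Assumed refresh interval. Apple asks that provider tokens be refreshed
  # no more than once every 20 minutes and no less than once every 60 minutes.
  REFRESH_AFTER_SECONDS = 30 * 60

  # clock: any callable returning elapsed seconds; defaults to the
  # process-local monotonic clock, which never jumps backwards.
  # The block is called whenever a fresh token needs to be generated.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &generator)
    @clock = clock
    @generator = generator
    @mutex = Mutex.new
    @token = nil
    @generated_at = nil
  end

  # Returns the cached token, regenerating it once the interval has elapsed.
  def value
    @mutex.synchronize do
      now = @clock.call
      if @token.nil? || now - @generated_at >= REFRESH_AFTER_SECONDS
        @token = @generator.call
        @generated_at = now
      end
      @token
    end
  end
end
```

The monotonic clock only governs the local refresh decision. The `iat` claim inside the token still has to be wall-clock epoch seconds, so a server whose wall clock is badly wrong would still produce a token Apple could reject.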

What's the exact length of time that the error persists?

@thisisjeffwong
Author

It lasted only 25 minutes before disappearing. Only a tiny fraction of that server's sends to APNS errored out during that time.
