Show poll-failure details #868

Open
wants to merge 4 commits into base: main
17 changes: 17 additions & 0 deletions assets/l10n/app_en.arb
@@ -160,6 +160,23 @@
"message": {"type": "String", "example": "Invalid format"}
}
},
"errorConnectingToServerShort": "Error connecting to Zulip. Retrying…",
"@errorConnectingToServerShort": {
"description": "Short error message for a generic unknown error connecting to the server."
},
"errorConnectingToServerDetails": "Error connecting to Zulip at {serverUrl}. Will retry:\n\n{error}",
"@errorConnectingToServerDetails": {
"description": "Dialog error message for a generic unknown error connecting to the server with details.",
"placeholders": {
"serverUrl": {"type": "String", "example": "http://example.com/"},
"error": {"type": "String", "example": "Invalid format"}
}
},
"errorReconnectingToServer": "Reconnecting to {serverUrl}…",
"@errorReconnectingToServer": {
"serverUrl": {"type": "String", "example": "http://example.com/"},
"description": "Message when reconnecting to the server."
},
"errorSharingFailed": "Sharing failed",
"@errorSharingFailed": {
"description": "Error message when sharing a message failed."
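For reference, flutter's gen_l10n compiles these entries into accessors on the generated localizations class (ZulipLocalizations in this project). A hypothetical sketch of the expected shapes, with signatures inferred from the call sites in lib/model/store.dart below:

// Hypothetical sketch, not part of this PR's diff; signatures inferred
// from how lib/model/store.dart calls the generated localizations.
abstract class ZulipLocalizationsSketch {
  String get errorConnectingToServerShort;
  String errorConnectingToServerDetails(String serverUrl, String error);
  String errorReconnectingToServer(String serverUrl);
}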
58 changes: 55 additions & 3 deletions lib/model/store.dart
@@ -21,6 +21,7 @@ import '../notifications/receive.dart';
import 'autocomplete.dart';
import 'database.dart';
import 'emoji.dart';
import 'localizations.dart';
import 'message.dart';
import 'message_list.dart';
import 'recent_dm_conversations.dart';
@@ -793,8 +794,25 @@ class UpdateMachine {
}());
}

/// This controls when we start reporting transient polling errors to the user.
///
/// At the 6th failure, the expected time elapsed since the first failure
/// will be about 1.55 seconds.
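///
/// (That figure assumes [BackoffMachine]'s defaults: a 100ms first bound
/// that doubles after each failure, with full jitter, so the expected
/// waits are roughly 50ms + 100ms + 200ms + 400ms + 800ms.)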
static const transientFailureCountNotifyThreshold = 5;

void poll() async {
BackoffMachine? backoffMachine;
int accumulatedTransientFailureCount = 0;

/// Reports a transient error to the user, but only once the number of
/// consecutive failures exceeds [transientFailureCountNotifyThreshold].
void maybeReportTransientError(String? message, {String? details}) {
accumulatedTransientFailureCount++;
if (accumulatedTransientFailureCount > transientFailureCountNotifyThreshold) {
reportErrorToUserBriefly(message, details: details);
}
}

while (true) {
if (_debugLoopSignal != null) {
@@ -811,26 +829,57 @@
queueId: queueId, lastEventId: lastEventId);
} catch (e) {
store.isLoading = true;
final localizations = GlobalLocalizations.zulipLocalizations;
final serverUrl = store.connection.realmUrl.origin;
switch (e) {
case ZulipApiException(code: 'BAD_EVENT_QUEUE_ID'):
assert(debugLog('Lost event queue for $store. Replacing…'));
reportErrorToUserBriefly(localizations.errorReconnectingToServer(serverUrl));
await store._globalStore._reloadPerAccount(store.accountId);
dispose();
debugLog('… Event queue replaced.');
return;

case Server5xxException() || NetworkException():
case Server5xxException():
assert(debugLog('Transient error polling event queue for $store: $e\n'
'Backing off, then will retry…'));
maybeReportTransientError(
localizations.errorConnectingToServerShort,
details: localizations.errorConnectingToServerDetails(
serverUrl, e.toString()));
await (backoffMachine ??= BackoffMachine()).wait();
assert(debugLog('… Backoff wait complete, retrying poll.'));
continue;

case NetworkException():
assert(debugLog('Transient error polling event queue for $store: $e\n'
'Backing off, then will retry…'));
// TODO tell user if transient polling errors persist
if (e.cause is! SocketException) {
// Heuristic check to only report interesting errors to the user.
// This currently ignores [SocketException], which typically
// occurs when the client goes back from sleep.
// TODO: Investigate if there are other cases of [SocketException]
// that might turn out to be interesting.
//
// See also:
// * [NetworkException.cause], which is the underlying exception.
Comment on lines +857 to +865
Member
Suggested change
if (e.cause is! SocketException) {
// Heuristic check to only report interesting errors to the user.
// This currently ignores [SocketException], which typically
// occurs when the client goes back from sleep.
// TODO: Investigate if there are other cases of [SocketException]
// that might turn out to be interesting.
//
// See also:
// * [NetworkException.cause], which is the underlying exception.
if (e.cause is! SocketException) {
// Heuristic check to only report interesting errors to the user.
// A [SocketException] is common when the app returns from sleep.

I think the TODO isn't actionable.

The "See also" isn't needed — .cause is just above, and the reader can follow that to its definition using an IDE.

Then while at it I tightened the sentence about SocketException. No need to say what the code does, as the code here speaks for itself; only to explain why we chose to make the code do that.

maybeReportTransientError(
localizations.errorConnectingToServerShort,
details: localizations.errorConnectingToServerDetails(
serverUrl, e.toString()));
}
await (backoffMachine ??= BackoffMachine()).wait();
assert(debugLog('… Backoff wait complete, retrying poll.'));
continue;

default:
assert(debugLog('Error polling event queue for $store: $e\n'
'Backing off and retrying even though may be hopeless…'));
// TODO tell user on non-transient error in polling
// TODO(#186): Handle unrecoverable failures
reportErrorToUserBriefly(
localizations.errorConnectingToServerShort,
details: localizations.errorConnectingToServerDetails(
serverUrl, e.toString()));
await (backoffMachine ??= BackoffMachine()).wait();
assert(debugLog('… Backoff wait complete, retrying poll.'));
continue;
Expand All @@ -854,6 +903,9 @@ class UpdateMachine {
// and failures, the successes themselves should space out the requests.
backoffMachine = null;
store.isLoading = false;
// Dismiss existing errors, if any.
reportErrorToUserBriefly(null);
accumulatedTransientFailureCount = 0;

final events = result.events;
for (final event in events) {
165 changes: 123 additions & 42 deletions test/model/store_test.dart
@@ -1,4 +1,5 @@
import 'dart:async';
import 'dart:io';

import 'package:checks/checks.dart';
import 'package:flutter/foundation.dart';
@@ -10,6 +11,7 @@ import 'package:zulip/api/route/events.dart';
import 'package:zulip/api/route/messages.dart';
import 'package:zulip/model/message_list.dart';
import 'package:zulip/model/narrow.dart';
import 'package:zulip/log.dart';
import 'package:zulip/model/store.dart';
import 'package:zulip/notifications/receive.dart';

@@ -393,7 +395,26 @@ void main() {
check(store.userSettings!.twentyFourHourTime).isTrue();
}));

String? lastReportedError;
String? takeLastReportedError() {
final result = lastReportedError;
lastReportedError = null;
return result;
}

/// This is an alternative to [ZulipApp]'s implementation of
/// [reportErrorToUserBriefly] for testing.
Future<void> logAndReportErrorToUserBriefly(String? message, {
String? details,
}) async {
if (message == null) return;
lastReportedError = '$message\n$details';
}

test('handles expired queue', () => awaitFakeAsync((async) async {
reportErrorToUserBriefly = logAndReportErrorToUserBriefly;
addTearDown(() => reportErrorToUserBriefly = defaultReportErrorToUserBriefly);

await prepareStore();
updateMachine.debugPauseLoop();
updateMachine.poll();
@@ -407,7 +428,10 @@
});
updateMachine.debugAdvanceLoop();
async.flushMicrotasks();
check(lastReportedError).isNull();
await Future<void>.delayed(Duration.zero);
check(takeLastReportedError()).isNotNull()
.contains('Reconnecting to ${eg.realmUrl.origin}…');
check(store).isLoading.isTrue();

// The global store has a new store.
@@ -455,53 +479,110 @@
check(store.debugMessageListViews).isEmpty();
}));

void checkRetry(void Function() prepareError) {
awaitFakeAsync((async) async {
await prepareStore(lastEventId: 1);
updateMachine.debugPauseLoop();
updateMachine.poll();
check(async.pendingTimers).length.equals(0);

// Make the request, inducing an error in it.
prepareError();
updateMachine.debugAdvanceLoop();
async.elapse(Duration.zero);
checkLastRequest(lastEventId: 1);
check(store).isLoading.isTrue();

// Polling doesn't resume immediately; there's a timer.
check(async.pendingTimers).length.equals(1);
updateMachine.debugAdvanceLoop();
async.flushMicrotasks();
check(connection.lastRequest).isNull();
check(async.pendingTimers).length.equals(1);

// Polling continues after a timer.
connection.prepare(json: GetEventsResult(events: [
HeartbeatEvent(id: 2),
], queueId: null).toJson());
async.flushTimers();
checkLastRequest(lastEventId: 1);
check(updateMachine.lastEventId).equals(2);
check(store).isLoading.isFalse();
group('retries on errors', () {
/// Check if [UpdateMachine.poll] retries as expected when there are
/// errors.
///
/// By default, also verify that the first user-facing error message
/// appears after the `numFailedRequests`'th failed request.
///
/// If `expectNotifyError` is false, instead verify that there is no
/// user-facing error message regardless of the retries.
void checkRetry(void Function() prepareError, {
required int numFailedRequests,
bool expectNotifyError = true,
}) {
assert(numFailedRequests > 0);
reportErrorToUserBriefly = logAndReportErrorToUserBriefly;
addTearDown(() => reportErrorToUserBriefly = defaultReportErrorToUserBriefly);

final expectedErrorMessage =
'Error connecting to Zulip. Retrying…\n'
'Error connecting to Zulip at ${eg.realmUrl.origin}. Will retry';

awaitFakeAsync((async) async {
await prepareStore(lastEventId: 1);
updateMachine.debugPauseLoop();
updateMachine.poll();
check(async.pendingTimers).length.equals(0);

for (int i = 0; i < numFailedRequests; i++) {
// Make the request, inducing an error in it.
prepareError();
updateMachine.debugAdvanceLoop();
async.elapse(Duration.zero);
checkLastRequest(lastEventId: 1);
check(store).isLoading.isTrue();

if (i != numFailedRequests - 1) {
// Skip polling backoff unless this is the final iteration.
// This allows the next `updateMachine.debugAdvanceLoop` call to
// trigger the next request without waiting.
async.flushTimers();
}

if (!expectNotifyError) {
// No matter how many retries there have been, no request should
// result in an error message.
check(takeLastReportedError()).isNull();
continue;
}
if (i < numFailedRequests - 1) {
// The error message should not appear until the `updateMachine`
// has retried the given number of times.
check(takeLastReportedError()).isNull();
continue;
}
Comment on lines +517 to +535
Member
The logic in this test (or set of tests) gets fairly tangly to follow.

I think the main thing is that it'd be helpful to test two aspects separately: there's the existing test cases that just check that we retry, and let's have separate new test cases that just check the error-reporting.

Then I think both sets of tests can be a lot simpler than this — most of the logic is only relevant for one aspect or the other. It's OK that some other fragments of code will be duplicated between them.

Member
Separately and more tactically, I think it'd help to pull the last iteration here outside of the loop — have the loop only concern itself with the first N-1 requests. That would avoid these multiple conditionals about whether it's the last iteration.
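
A minimal sketch of that shape, reusing this test's existing helpers (prepareError, takeLastReportedError, expectNotifyError, expectedErrorMessage):

for (int i = 0; i < numFailedRequests - 1; i++) {
  // None of the first N-1 failures should surface an error to the user.
  prepareError();
  updateMachine.debugAdvanceLoop();
  async.elapse(Duration.zero);
  check(takeLastReportedError()).isNull();
  async.flushTimers(); // skip the backoff so the next request can fire
}
// Only the final failure may surface an error, depending on the case.
prepareError();
updateMachine.debugAdvanceLoop();
async.elapse(Duration.zero);
if (expectNotifyError) {
  check(takeLastReportedError()).isNotNull().contains(expectedErrorMessage);
} else {
  check(takeLastReportedError()).isNull();
}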

assert(expectNotifyError);
assert(i == numFailedRequests - 1);
check(takeLastReportedError()).isNotNull().contains(expectedErrorMessage);
}

// Polling doesn't resume immediately; there's a timer.
check(async.pendingTimers).length.equals(1);
updateMachine.debugAdvanceLoop();
async.flushMicrotasks();
check(connection.lastRequest).isNull();
check(async.pendingTimers).length.equals(1);

// Polling continues after a timer.
connection.prepare(json: GetEventsResult(events: [
HeartbeatEvent(id: 2),
], queueId: null).toJson());
async.flushTimers();
checkLastRequest(lastEventId: 1);
check(updateMachine.lastEventId).equals(2);
check(store).isLoading.isFalse();
});
}

test('Server5xxException', () {
checkRetry(() => connection.prepare(httpStatus: 500, body: 'splat'),
numFailedRequests: UpdateMachine.transientFailureCountNotifyThreshold + 1);
});
}

test('retries on Server5xxException', () {
checkRetry(() => connection.prepare(httpStatus: 500, body: 'splat'));
});
test('NetworkException', () {
checkRetry(() => connection.prepare(exception: Exception("failed")),
numFailedRequests: UpdateMachine.transientFailureCountNotifyThreshold + 1);
});

test('retries on NetworkException', () {
checkRetry(() => connection.prepare(exception: Exception("failed")));
});
test('NetworkException to be ignored', () {
checkRetry(() => connection.prepare(
exception: const SocketException("failed")),
numFailedRequests: UpdateMachine.transientFailureCountNotifyThreshold + 1,
expectNotifyError: false);
});

test('retries on ZulipApiException', () {
checkRetry(() => connection.prepare(httpStatus: 400, json: {
'result': 'error', 'code': 'BAD_REQUEST', 'msg': 'Bad request'}));
});
test('ZulipApiException', () {
checkRetry(() => connection.prepare(httpStatus: 400, json: {
'result': 'error', 'code': 'BAD_REQUEST', 'msg': 'Bad request'}),
numFailedRequests: 1);
});

test('retries on MalformedServerResponseException', () {
checkRetry(() => connection.prepare(httpStatus: 200, body: 'nonsense'));
test('MalformedServerResponseException', () {
checkRetry(() => connection.prepare(httpStatus: 200, body: 'nonsense'),
numFailedRequests: 1);
});
});
});
