From 229fcbed21d0372150beff12461259d4532a4662 Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Fri, 31 May 2024 12:53:33 -0500 Subject: [PATCH 01/13] [FEAT] document -ERR protocol messages --- README.md | 2 ++ adr/ADR-42.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+) create mode 100644 adr/ADR-42.md diff --git a/README.md b/README.md index d8c5b80..e5fce9d 100644 --- a/README.md +++ b/README.md @@ -35,6 +35,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc |[ADR-36](adr/ADR-36.md)|jetstream, client, server|Subject Mapping Transforms in Streams| |[ADR-37](adr/ADR-37.md)|jetstream, client, spec|JetStream Simplification| |[ADR-40](adr/ADR-40.md)|client, server, spec|NATS Connection| +|[ADR-42](adr/ADR-42.md)|client, server|NATS Client LifeCycle (-ERR Protocol Error Handling)| ## Jetstream @@ -113,6 +114,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc |[ADR-39](adr/ADR-39.md)|server, security|Certificate Store| |[ADR-40](adr/ADR-40.md)|client, server, spec|NATS Connection| |[ADR-41](adr/ADR-41.md)|observability, server|NATS Message Path Tracing| +|[ADR-42](adr/ADR-42.md)|client, server|NATS Client LifeCycle (-ERR Protocol Error Handling)| ## Spec diff --git a/adr/ADR-42.md b/adr/ADR-42.md new file mode 100644 index 0000000..3abbac8 --- /dev/null +++ b/adr/ADR-42.md @@ -0,0 +1,85 @@ +# NATS Client LifeCycle (-ERR Protocol Error Handling) + +| Metadata | Value | +| -------- | -------------- | +| Date | 2024-05-31 | +| Author | @aricart | +| Status | Implemented | +| Tags | client, server | + +## Context and Problem Statement + +Client lifecycle such as connect/reconnect/liveliness/LDM behaviours are fairly +complex in a NATS client. This ADR simply documents `-ERR` protocol messages +that are sent to a client. 
+ +The `-ERR` protocol message is an important signal for clients about things that +are incorrect from the perspective of Permissions or Authorization. + +## Errors + +### Permission Violation + +`Permission Violation` means that the client tried to publish or subscribe on a +subject that it has no permissions. This type of error can happen or surface at +any time, as changes to permissions intentionally or not can happen. + +The message will include `/(Publish|Subscription) to (\S+)/` this will indicate +whether the error is related to a publish or subscirption. A second level parse +for `/using queue "(\S+)"/` will yield the queue if any. + +The server unfortunately doesn't make it easy for the client to know the actual +subscription (SID) hosting the error but the logic is simple: notify the first +one that matches the subject and queue (this assumes you track the subject and +queue name in your internal subscription representation) name - the server will +send multiple protocol errors (one per offense) so if multiple subscriptions, +you will be able to notify all of them. + +For subscriptions, errors are _terminal_, as the server has cancelled the +subscription for the client. It is very convenient for client code to receive an +error using some mechanism associated with their subscription as this will +simplify the handling by not needing to hardcode subjects/etc in an async error +handler. + +It is also useful to have some sort of Promise/Future etc that will get notified +when a subscription closes (will not yield any more messages) - The +Promise/Future can resolve to an error or void (not thrown) which the client can +inspect for the reason if any why the subscription closed. Client can then use +this information to perform their own error handling which may require taking +the service offline. + +For publish permission errors, it's hard to notify the client at the point of +failure unless the client is synchronous. 
But the standard async error +notification should be sufficient. In the case of request reply, since there's a +subscription handling the response, this means that you can search subscriptions +related to request and reply subjects, and notify them via the response +mechanism for the request. + +Note that regardless of a localized error handling, you should also notify the +async error handler (you don't know exactly how they are looking for errors). + +## Authorization Violation + +`Authorization Violation` is sent whenever the credentials for a client are not +accepted. This is followed by a server initiated disconnect. + +## User Authentication Expired + +`User Authentication Expired` protocol error happens whenever credentials for +the client expire while the client is connected to the server. It is followed by +a server disconnect. This error should be notified in the async handler. On +reconnect the client is going to be rejected with `Authorization Violation`. + +## Account Expiration + +`Account Authentication Expired` is sent whenever the account JWT expires and a +client for the account is connected. This will result in a disconnect initiated +by the server. On reconnect the client will be rejected with +`Authorization Violation` until the account configuration is refreshed on the +server. + +## Secure Connection - TLS Required + +`Secure Connection - TLS Required` is sent if the client is trying to connect on +a server that requires TLS. The client should have done extensive ServerInfo +investigation and determined that this would have been a failure? 
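Reviewer sketch for the `Permission Violation` handling described in this patch: a minimal, hedged Python example of the two-level parse and the "notify the first matching subscription" rule. The sample message strings, the `PermissionViolation` and `Subscription` types, and the `notify_first_match` helper are all hypothetical names introduced for illustration; they are not taken from any NATS client. The matcher is deliberately loose about the error name, since the text warns that the message may change slightly.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns taken from the matchers quoted in the ADR text.
_OPERATION = re.compile(r"(Publish|Subscription) to (\S+)")
_QUEUE = re.compile(r'using queue "(\S+)"')


@dataclass
class PermissionViolation:
    operation: str            # "publish" or "subscription"
    subject: str
    queue: Optional[str]      # only present for queue subscriptions


@dataclass
class Subscription:
    """Hypothetical internal record; assumes subject and queue are tracked."""
    sid: int
    subject: str
    queue: Optional[str] = None
    error: Optional[str] = None   # set once the server rejected it


def parse_permission_violation(message: str) -> Optional[PermissionViolation]:
    lowered = message.lower()
    # Loose check: tolerates "Permission" and "Permissions".
    if "permission" not in lowered or "violation" not in lowered:
        return None
    op = _OPERATION.search(message)
    if op is None:
        return None
    queue = _QUEUE.search(message)
    return PermissionViolation(
        operation=op.group(1).lower(),
        # Assumption: the subject may arrive wrapped in quotes.
        subject=op.group(2).strip("\"'"),
        queue=queue.group(1) if queue else None,
    )


def notify_first_match(subs: List[Subscription],
                       violation: PermissionViolation,
                       message: str) -> Optional[Subscription]:
    # One -ERR per offense: mark the first not-yet-notified match, so a
    # later -ERR for the same subject/queue reaches the next subscription.
    for sub in subs:
        if (sub.error is None
                and sub.subject == violation.subject
                and sub.queue == violation.queue):
            sub.error = message
            return sub
    return None
```

Skipping subscriptions that already carry an error is a design choice of this sketch, not something the text prescribes; it is one way to make repeated errors fan out across several matching subscriptions.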
From 4e7e14b978ae099a83b3a429c444c5593b77405e Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Fri, 31 May 2024 13:29:36 -0500 Subject: [PATCH 02/13] clarifications --- adr/ADR-42.md | 75 ++++++++++++++++++++++++++++----------------------- 1 file changed, 42 insertions(+), 33 deletions(-) diff --git a/adr/ADR-42.md b/adr/ADR-42.md index 3abbac8..b7b6268 100644 --- a/adr/ADR-42.md +++ b/adr/ADR-42.md @@ -9,9 +9,9 @@ ## Context and Problem Statement -Client lifecycle such as connect/reconnect/liveliness/LDM behaviours are fairly -complex in a NATS client. This ADR simply documents `-ERR` protocol messages -that are sent to a client. +Client lifecycle such as connect/reconnect/liveliness (ping/pong)/LDM behaviours +are fairly complex in a NATS client. This ADR simply documents `-ERR` protocol +messages that are sent to a client. The `-ERR` protocol message is an important signal for clients about things that are incorrect from the perspective of Permissions or Authorization. @@ -21,54 +21,63 @@ are incorrect from the perspective of Permissions or Authorization. ### Permission Violation `Permission Violation` means that the client tried to publish or subscribe on a -subject that it has no permissions. This type of error can happen or surface at -any time, as changes to permissions intentionally or not can happen. +subject for which it has no permissions. This type of error can happen or +surface at any time, as changes to permissions intentionally or not can happen. The message will include `/(Publish|Subscription) to (\S+)/` this will indicate -whether the error is related to a publish or subscirption. A second level parse -for `/using queue "(\S+)"/` will yield the queue if any. 
- -The server unfortunately doesn't make it easy for the client to know the actual -subscription (SID) hosting the error but the logic is simple: notify the first -one that matches the subject and queue (this assumes you track the subject and -queue name in your internal subscription representation) name - the server will -send multiple protocol errors (one per offense) so if multiple subscriptions, -you will be able to notify all of them. - -For subscriptions, errors are _terminal_, as the server has cancelled the -subscription for the client. It is very convenient for client code to receive an -error using some mechanism associated with their subscription as this will -simplify the handling by not needing to hardcode subjects/etc in an async error -handler. - -It is also useful to have some sort of Promise/Future etc that will get notified -when a subscription closes (will not yield any more messages) - The -Promise/Future can resolve to an error or void (not thrown) which the client can -inspect for the reason if any why the subscription closed. Client can then use -this information to perform their own error handling which may require taking -the service offline. +whether the error is related to a publish or subscription operation. For publish permission errors, it's hard to notify the client at the point of failure unless the client is synchronous. But the standard async error notification should be sufficient. In the case of request reply, since there's a subscription handling the response, this means that you can search subscriptions related to request and reply subjects, and notify them via the response -mechanism for the request. +mechanism for the request depending on the type of operation that was rejected. + +For subscription errors, a second level parse for `/using queue "(\S+)"/` will +yield the queue if any that was used during the subscribe. 
This means that a +client may have permissions on a subscription, but not in a specific queue or +some other permutation. + +The server unfortunately doesn't make it easy for the client to know the actual +subscription (SID) hosting the error but the logic for processing is simple: +notify the first subscription that matches the subject and queue name (this +assumes you track the subject and queue name in your internal subscription +representation) - the server will send multiple error protocol messages (one per +offense) so if multiple subscriptions, you will be able to notify all of them. + +For subscriptions, errors are _terminal_, as the server cancels the clients +interest. It is very convenient for client user code to receive an error using +some mechanism associated with the subscription in question as this will +simplify the handling of the client code. + +It is also useful to have some sort of Promise/Future/etc that will get resolved +when a subscription closes (will not yield any more messages) - The +Promise/Future can resolve to an error or void (not thrown) which the client can +inspect for the reason if any why the subscription closed. Throwing an error is +discouraged, as this would create a possibility of crashing the client. Clients +can then use this information to perform their own error handling which may +require taking the service offline if the subscription is vital for its +operation. -Note that regardless of a localized error handling, you should also notify the -async error handler (you don't know exactly how they are looking for errors). +Note that regardless of a localized error handling mechanism, you should also +notify the async error handler as you don't know exactly where they are looking +for errors. ## Authorization Violation `Authorization Violation` is sent whenever the credentials for a client are not -accepted. This is followed by a server initiated disconnect. +accepted. This is followed by a server initiated disconnect. 
Clients will +normally reconnect (depending on their connection options). If the client +closes, this should be reported as the last error. ## User Authentication Expired `User Authentication Expired` protocol error happens whenever credentials for the client expire while the client is connected to the server. It is followed by a server disconnect. This error should be notified in the async handler. On -reconnect the client is going to be rejected with `Authorization Violation`. +reconnect the client is going to be rejected with `Authorization Violation` and +follow its reconnect logic. ## Account Expiration @@ -76,7 +85,7 @@ reconnect the client is going to be rejected with `Authorization Violation`. client for the account is connected. This will result in a disconnect initiated by the server. On reconnect the client will be rejected with `Authorization Violation` until the account configuration is refreshed on the -server. +server. The client will follow its reconnect logic. ## Secure Connection - TLS Required From cba4206dd6c81c014c27306efc17bba7d2a635d5 Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Fri, 31 May 2024 13:46:08 -0500 Subject: [PATCH 03/13] clarifications --- adr/ADR-42.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/adr/ADR-42.md b/adr/ADR-42.md index b7b6268..cf73a02 100644 --- a/adr/ADR-42.md +++ b/adr/ADR-42.md @@ -61,8 +61,8 @@ require taking the service offline if the subscription is vital for its operation. Note that regardless of a localized error handling mechanism, you should also -notify the async error handler as you don't know exactly where they are looking -for errors. +notify the async error handler as you don't know exactly where the client code +is looking for errors. 
## Authorization Violation From 6a0790137b18d67dbdbb9ea5661be20ba3935957 Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Tue, 4 Jun 2024 08:34:11 -0500 Subject: [PATCH 04/13] added other errors --- adr/ADR-42.md | 97 +++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 87 insertions(+), 10 deletions(-) diff --git a/adr/ADR-42.md b/adr/ADR-42.md index cf73a02..6216da9 100644 --- a/adr/ADR-42.md +++ b/adr/ADR-42.md @@ -23,9 +23,13 @@ are incorrect from the perspective of Permissions or Authorization. `Permission Violation` means that the client tried to publish or subscribe on a subject for which it has no permissions. This type of error can happen or surface at any time, as changes to permissions intentionally or not can happen. +This means that even if the subscription has been working, it is possible that +it will not in the future if the permissions are altered. The message will include `/(Publish|Subscription) to (\S+)/` this will indicate -whether the error is related to a publish or subscription operation. +whether the error is related to a publish or subscription operation. Note that +you should be careful in how you write your matchers as the message could change +slightly or sport additional information (as you'll see below). For publish permission errors, it's hard to notify the client at the point of failure unless the client is synchronous. But the standard async error @@ -35,9 +39,9 @@ related to request and reply subjects, and notify them via the response mechanism for the request depending on the type of operation that was rejected. For subscription errors, a second level parse for `/using queue "(\S+)"/` will -yield the queue if any that was used during the subscribe. This means that a -client may have permissions on a subscription, but not in a specific queue or -some other permutation. +yield the `queue` if any that was used during the subscribe operation. 
This +means that a client may have permissions on a subscription, but not in a +specific queue or some other permutation of the subject/queue. The server unfortunately doesn't make it easy for the client to know the actual subscription (SID) hosting the error but the logic for processing is simple: @@ -46,10 +50,11 @@ assumes you track the subject and queue name in your internal subscription representation) - the server will send multiple error protocol messages (one per offense) so if multiple subscriptions, you will be able to notify all of them. -For subscriptions, errors are _terminal_, as the server cancels the clients -interest. It is very convenient for client user code to receive an error using -some mechanism associated with the subscription in question as this will -simplify the handling of the client code. +For subscriptions, errors are _terminal_ for the subscription, as the server +cancels the clients interest. so the client will never get any messages on it. +It is very convenient for client user code to receive an error using some +mechanism associated with the subscription in question as this will simplify the +handling of the client code. It is also useful to have some sort of Promise/Future/etc that will get resolved when a subscription closes (will not yield any more messages) - The @@ -90,5 +95,77 @@ server. The client will follow its reconnect logic. ## Secure Connection - TLS Required `Secure Connection - TLS Required` is sent if the client is trying to connect on -a server that requires TLS. The client should have done extensive ServerInfo -investigation and determined that this would have been a failure? +a server that requires TLS. + +_????????_ The client should have done extensive ServerInfo investigation and +determined that this would have been a failure + +## Maximum Number of Connections + +`maximum connections exceeded` server limit on number of connections reached. 
+Server will send to the client the `-ERR maximum connections exceeded`, client +possibly go in reconnect loop. + +_????????_ The server can also send +`Connection throttling is active. Please try again later.` when too many TLS +connections are in progress. This should be treated as +`maximum connections exceeded` or reworked on the server to send this error +instead. + +## Max Payload Violation + +`Maximum Payload Violation` is sent to the client if it attempts to publish more +data than it is allowed by `max_payload`. The server will disconnect the client +after sending the protocol error. Note that clients should test payload sizes +and fail publishes that exceed the server configuration, as this allow the error +to be localized when possible to the user code that caused the error. + +## User Authentication Revoked + +`User Authentication Revoked` this is reported when an account is updated and +the user is revoked in the account. On connects where the user is already +revoked, it is just an `Authorization Error`. On actual experimentation, the +client never saw `User Authentication Revoked`, and instead was just +disconnected. Reconnect was greeted with a `Authorization Error`. + +## Invalid Client Protocol + +`invalid client protocol` sent to the client if the protocol version from the +client doesn't match. Client is disconnected when this error is sent. + +_????????_ Currently, this is not a concern since presumably, a server will be +able to deal with protocol version 1 when protocol upgrades. + +## No Responders Requires Headers + +`no responders requires headers support` sent if the client requests no +responder, but rejects headers. Client is disconnected when this error is sent. +Current clients hardcode `headers: true`, so this error shouldn't be seen by +clients. + +_????????_ `headers` connect option shouldn't be exposed by the clients - this +is a holdover from when clients opted in to `headers`. 
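Reviewer sketch for the `Max Payload Violation` advice in this hunk (fail the publish locally instead of letting the server disconnect the client). This is a minimal illustration under assumptions: `PayloadTooLargeError` and `check_payload` are hypothetical names, `max_payload` is assumed to be the limit the client learned from the server, and whether headers count toward the limit is not stated in the text, so only the payload bytes are measured here.

```python
class PayloadTooLargeError(ValueError):
    """Raised locally, before anything is written to the connection."""


def check_payload(payload: bytes, max_payload: int) -> None:
    # Failing here keeps the error next to the user code that caused it,
    # rather than surfacing later as a protocol error plus a disconnect.
    if len(payload) > max_payload:
        raise PayloadTooLargeError(
            f"payload is {len(payload)} bytes, server allows {max_payload}"
        )
```

A publish path would call `check_payload(data, max_payload)` first and only write to the connection if it returns.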
+ +## Failed Account Registration + +`Failed Account Registration` an internal error while registering an account. +(Looking for reproducible test). + +## Invalid Publish Subject + +`Invalid Publish Subject` (this requires the server in pedantic mode). Client is +not disconnected when this error is sent. Note that for subscribe operations, +depending on the separator (space) you may inadvertently specify a queue. In +such cases there will be no error, your subscription will simply be part of a +queue. If multiple spaces or some other variant, the server will treat it as a +protocol error. + +## Unknown Protocol Operation + +`Unknown Protocol Operation` this error is sent if the server doesn't understand +a command. This is followed by a disconnect. + +## Other Errors (not necessarily seen by the client) + +- `maximum account active connections exceeded` not notified to the client, the + client connecting will be disconnected (seen as a connection refused.) From 4705b1821ee91e28512a7b9c7274cfa96b8f1ebb Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Tue, 4 Jun 2024 08:36:16 -0500 Subject: [PATCH 05/13] fmt --- adr/ADR-42.md | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/adr/ADR-42.md b/adr/ADR-42.md index 6216da9..93ab8a2 100644 --- a/adr/ADR-42.md +++ b/adr/ADR-42.md @@ -18,6 +18,7 @@ are incorrect from the perspective of Permissions or Authorization. ## Errors + ### Permission Violation `Permission Violation` means that the client tried to publish or subscribe on a @@ -69,14 +70,14 @@ Note that regardless of a localized error handling mechanism, you should also notify the async error handler as you don't know exactly where the client code is looking for errors. -## Authorization Violation +### Authorization Violation `Authorization Violation` is sent whenever the credentials for a client are not accepted. This is followed by a server initiated disconnect. 
Clients will normally reconnect (depending on their connection options). If the client closes, this should be reported as the last error. -## User Authentication Expired +### User Authentication Expired `User Authentication Expired` protocol error happens whenever credentials for the client expire while the client is connected to the server. It is followed by @@ -84,7 +85,7 @@ a server disconnect. This error should be notified in the async handler. On reconnect the client is going to be rejected with `Authorization Violation` and follow its reconnect logic. -## Account Expiration +### Account Expiration `Account Authentication Expired` is sent whenever the account JWT expires and a client for the account is connected. This will result in a disconnect initiated @@ -92,7 +93,7 @@ by the server. On reconnect the client will be rejected with `Authorization Violation` until the account configuration is refreshed on the server. The client will follow its reconnect logic. -## Secure Connection - TLS Required +### Secure Connection - TLS Required `Secure Connection - TLS Required` is sent if the client is trying to connect on a server that requires TLS. @@ -100,7 +101,7 @@ a server that requires TLS. _????????_ The client should have done extensive ServerInfo investigation and determined that this would have been a failure -## Maximum Number of Connections +### Maximum Number of Connections `maximum connections exceeded` server limit on number of connections reached. Server will send to the client the `-ERR maximum connections exceeded`, client @@ -112,7 +113,7 @@ connections are in progress. This should be treated as `maximum connections exceeded` or reworked on the server to send this error instead. -## Max Payload Violation +### Max Payload Violation `Maximum Payload Violation` is sent to the client if it attempts to publish more data than it is allowed by `max_payload`. The server will disconnect the client @@ -120,7 +121,7 @@ after sending the protocol error. 
Note that clients should test payload sizes and fail publishes that exceed the server configuration, as this allow the error to be localized when possible to the user code that caused the error. -## User Authentication Revoked +### User Authentication Revoked `User Authentication Revoked` this is reported when an account is updated and the user is revoked in the account. On connects where the user is already @@ -128,7 +129,7 @@ revoked, it is just an `Authorization Error`. On actual experimentation, the client never saw `User Authentication Revoked`, and instead was just disconnected. Reconnect was greeted with a `Authorization Error`. -## Invalid Client Protocol +### Invalid Client Protocol `invalid client protocol` sent to the client if the protocol version from the client doesn't match. Client is disconnected when this error is sent. @@ -136,7 +137,7 @@ client doesn't match. Client is disconnected when this error is sent. _????????_ Currently, this is not a concern since presumably, a server will be able to deal with protocol version 1 when protocol upgrades. -## No Responders Requires Headers +### No Responders Requires Headers `no responders requires headers support` sent if the client requests no responder, but rejects headers. Client is disconnected when this error is sent. @@ -146,12 +147,12 @@ clients. _????????_ `headers` connect option shouldn't be exposed by the clients - this is a holdover from when clients opted in to `headers`. -## Failed Account Registration +### Failed Account Registration `Failed Account Registration` an internal error while registering an account. (Looking for reproducible test). -## Invalid Publish Subject +### Invalid Publish Subject `Invalid Publish Subject` (this requires the server in pedantic mode). Client is not disconnected when this error is sent. Note that for subscribe operations, @@ -160,12 +161,12 @@ such cases there will be no error, your subscription will simply be part of a queue. 
If multiple spaces or some other variant, the server will treat it as a protocol error. -## Unknown Protocol Operation +### Unknown Protocol Operation `Unknown Protocol Operation` this error is sent if the server doesn't understand a command. This is followed by a disconnect. -## Other Errors (not necessarily seen by the client) +### Other Errors (not necessarily seen by the client) - `maximum account active connections exceeded` not notified to the client, the client connecting will be disconnected (seen as a connection refused.) From cf82450c6715d4a079e6bd3634ebac0bdb0faac7 Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Tue, 11 Jun 2024 09:49:41 -0500 Subject: [PATCH 06/13] clarifications --- adr/ADR-42.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/adr/ADR-42.md b/adr/ADR-42.md index 93ab8a2..e301cfb 100644 --- a/adr/ADR-42.md +++ b/adr/ADR-42.md @@ -13,11 +13,22 @@ Client lifecycle such as connect/reconnect/liveliness (ping/pong)/LDM behaviours are fairly complex in a NATS client. This ADR simply documents `-ERR` protocol messages that are sent to a client. +## Errors + The `-ERR` protocol message is an important signal for clients about things that are incorrect from the perspective of Permissions or Authorization. -## Errors - +A note about implementation - the current format of the errors is simple, but +messages are not typed in a way that is simple for clients to understand what +should happen - in many cases the server will disconnect th client. In others it +is just a runtime error that an update in configuration at runtime may re-enable +the client to do what was rejected previously. However the client has no way to +know whether the server will disconnect it or not. + +In cases where the error is surfaced during connection it creates the nuance +that it is difficult for the client to know if the error is recoverable (simply +attempt to reconnect later) or not. 
Depending on the client implementation this +makes it difficult - in ### Permission Violation From 0c534e633fe83c22f640f0eef0effc9d0619998c Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Mon, 24 Jun 2024 08:53:33 -0500 Subject: [PATCH 07/13] fixed name of error to `Permissions Violation` --- adr/ADR-42.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/adr/ADR-42.md b/adr/ADR-42.md index e301cfb..56427d8 100644 --- a/adr/ADR-42.md +++ b/adr/ADR-42.md @@ -30,9 +30,9 @@ that it is difficult for the client to know if the error is recoverable (simply attempt to reconnect later) or not. Depending on the client implementation this makes it difficult - in -### Permission Violation +### Permissions Violation -`Permission Violation` means that the client tried to publish or subscribe on a +`Permissions Violation` means that the client tried to publish or subscribe on a subject for which it has no permissions. This type of error can happen or surface at any time, as changes to permissions intentionally or not can happen. 
This means that even if the subscription has been working, it is possible that From 19b2c5f17c4d49d79f5fc0dab3712d42ce0cbb7b Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Mon, 24 Jun 2024 09:03:56 -0500 Subject: [PATCH 08/13] merged protocol error handling with adr 40 (connection spec) --- README.md | 2 - adr/ADR-40.md | 175 +++++++++++++++++++++++++++++++++++++++++++++-- adr/ADR-42.md | 183 -------------------------------------------------- 3 files changed, 171 insertions(+), 189 deletions(-) delete mode 100644 adr/ADR-42.md diff --git a/README.md b/README.md index e5fce9d..d8c5b80 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,6 @@ This repository captures Architecture, Design Specifications and Feature Guidanc |[ADR-36](adr/ADR-36.md)|jetstream, client, server|Subject Mapping Transforms in Streams| |[ADR-37](adr/ADR-37.md)|jetstream, client, spec|JetStream Simplification| |[ADR-40](adr/ADR-40.md)|client, server, spec|NATS Connection| -|[ADR-42](adr/ADR-42.md)|client, server|NATS Client LifeCycle (-ERR Protocol Error Handling)| ## Jetstream @@ -114,7 +113,6 @@ This repository captures Architecture, Design Specifications and Feature Guidanc |[ADR-39](adr/ADR-39.md)|server, security|Certificate Store| |[ADR-40](adr/ADR-40.md)|client, server, spec|NATS Connection| |[ADR-41](adr/ADR-41.md)|observability, server|NATS Message Path Tracing| -|[ADR-42](adr/ADR-42.md)|client, server|NATS Client LifeCycle (-ERR Protocol Error Handling)| ## Spec diff --git a/adr/ADR-40.md b/adr/ADR-40.md index 6015fe9..2175907 100644 --- a/adr/ADR-40.md +++ b/adr/ADR-40.md @@ -7,9 +7,10 @@ | Status | Implemented | | Tags | client, server, spec | -|Revision|Date|Author|Info| - |--------|------------|---------|---------------| - |1 | 2023-10-12 | @Jarema | Initial draft | +| Revision | Date | Author | Info | +|----------|------------|----------|------------------------------| +| 1 | 2023-10-12 | @Jarema | Initial draft | +| 2 | 2024-6-24 | @aricart | Added protocol error section | 
## Summary @@ -244,7 +245,173 @@ The default interval for PING is 2 minutes. ### Error Handling (TODO) -Server can respond with `Authorization Error`. +The `-ERR` protocol message is an important signal for clients about things that +are incorrect from the perspective of Permissions or Authorization. + +A note about implementation - the current format of the errors is simple, but +messages are not typed in a way that is simple for clients to understand what +should happen - in many cases the server will disconnect the client. In others it +is just a runtime error that an update in configuration at runtime may re-enable +the client to do what was rejected previously. However the client has no way to +know whether the server will disconnect it or not. + +In cases where the error is surfaced during connection it creates the nuance +that it is difficult for the client to know if the error is recoverable (simply +attempt to reconnect later) or not. Depending on the client implementation this +makes it difficult - in + +#### Permissions Violation + +`Permissions Violation` means that the client tried to publish or subscribe on a +subject for which it has no permissions. This type of error can happen or +surface at any time, as changes to permissions intentionally or not can happen. +This means that even if the subscription has been working, it is possible that +it will not in the future if the permissions are altered. + +The message will include `/(Publish|Subscription) to (\S+)/`; this will indicate +whether the error is related to a publish or subscription operation. Note that +you should be careful in how you write your matchers as the message could change +slightly or sport additional information (as you'll see below). + +For publish permission errors, it's hard to notify the client at the point of +failure unless the client is synchronous. But the standard async error +notification should be sufficient.
In the case of request reply, since there's a +subscription handling the response, this means that you can search subscriptions +related to request and reply subjects, and notify them via the response +mechanism for the request depending on the type of operation that was rejected. + +For subscription errors, a second level parse for `/using queue "(\S+)"/` will +yield the `queue`, if any, that was used during the subscribe operation. This +means that a client may have permissions on a subscription, but not in a +specific queue or some other permutation of the subject/queue. + +The server unfortunately doesn't make it easy for the client to know the actual +subscription (SID) hosting the error but the logic for processing is simple: +notify the first subscription that matches the subject and queue name (this +assumes you track the subject and queue name in your internal subscription +representation) - the server will send multiple error protocol messages (one per +offense) so if multiple subscriptions are affected, you will be able to notify all of them. + +For subscriptions, errors are _terminal_ for the subscription, as the server +cancels the client's interest, so the client will never get any messages on it. +It is very convenient for client user code to receive an error using some +mechanism associated with the subscription in question as this will simplify the +handling of the client code. + +It is also useful to have some sort of Promise/Future/etc that will get resolved +when a subscription closes (will not yield any more messages) - The +Promise/Future can resolve to an error or void (not thrown) which the client can +inspect for the reason, if any, why the subscription closed. Throwing an error is +discouraged, as this would create a possibility of crashing the client. Clients +can then use this information to perform their own error handling which may +require taking the service offline if the subscription is vital for its +operation.
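Reviewer sketch for the Promise/Future idea in the paragraph above: a minimal Python `asyncio` example in which a subscription exposes a `closed` future that resolves to the reason (an error or `None`) instead of raising. The `Subscription` class, its `_close` method, `SubscriptionClosedError`, and the sample message are hypothetical and only illustrate the "resolve, don't throw" design; no existing client API is implied.

```python
import asyncio
from typing import Optional


class SubscriptionClosedError(Exception):
    """Hypothetical error type carrying the server's -ERR text."""


class Subscription:
    def __init__(self, subject: str) -> None:
        self.subject = subject
        # Resolves once the subscription will yield no more messages.
        self.closed: asyncio.Future = asyncio.get_running_loop().create_future()

    def _close(self, reason: Optional[Exception] = None) -> None:
        # The reason is delivered as a result, never raised, so awaiting
        # `closed` cannot crash user code with an unexpected exception.
        if not self.closed.done():
            self.closed.set_result(reason)


async def demo(reason: Optional[Exception]) -> Optional[Exception]:
    sub = Subscription("orders.new")
    # Stand-in for the protocol reader handling a -ERR (or a normal close).
    sub._close(reason)
    return await sub.closed
```

User code can then `await sub.closed`, inspect the returned value, and decide for itself whether losing this subscription means the service should go offline.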
+ +Note that regardless of a localized error handling mechanism, you should also +notify the async error handler as you don't know exactly where the client code +is looking for errors. + +#### Authorization Violation + +`Authorization Violation` is sent whenever the credentials for a client are not +accepted. This is followed by a server initiated disconnect. Clients will +normally reconnect (depending on their connection options). If the client +closes, this should be reported as the last error. + +#### User Authentication Expired + +`User Authentication Expired` protocol error happens whenever credentials for +the client expire while the client is connected to the server. It is followed by +a server disconnect. This error should be notified in the async handler. On +reconnect the client is going to be rejected with `Authorization Violation` and +follow its reconnect logic. + +#### Account Expiration + +`Account Authentication Expired` is sent whenever the account JWT expires and a +client for the account is connected. This will result in a disconnect initiated +by the server. On reconnect the client will be rejected with +`Authorization Violation` until the account configuration is refreshed on the +server. The client will follow its reconnect logic. + +#### Secure Connection - TLS Required + +`Secure Connection - TLS Required` is sent if the client is trying to connect on +a server that requires TLS. + +_????????_ The client should have done extensive ServerInfo investigation and +determined that this would have been a failure + +#### Maximum Number of Connections + +`maximum connections exceeded` server limit on number of connections reached. +Server will send to the client the `-ERR maximum connections exceeded`, client +possibly go in reconnect loop. + +_????????_ The server can also send +`Connection throttling is active. Please try again later.` when too many TLS +connections are in progress. 
This should be treated as +`maximum connections exceeded` or reworked on the server to send this error +instead. + +#### Max Payload Violation + +`Maximum Payload Violation` is sent to the client if it attempts to publish more +data than it is allowed by `max_payload`. The server will disconnect the client +after sending the protocol error. Note that clients should test payload sizes +and fail publishes that exceed the server configuration, as this allow the error +to be localized when possible to the user code that caused the error. + +#### User Authentication Revoked + +`User Authentication Revoked` this is reported when an account is updated and +the user is revoked in the account. On connects where the user is already +revoked, it is just an `Authorization Error`. On actual experimentation, the +client never saw `User Authentication Revoked`, and instead was just +disconnected. Reconnect was greeted with a `Authorization Error`. + +#### Invalid Client Protocol + +`invalid client protocol` sent to the client if the protocol version from the +client doesn't match. Client is disconnected when this error is sent. + +_????????_ Currently, this is not a concern since presumably, a server will be +able to deal with protocol version 1 when protocol upgrades. + +#### No Responders Requires Headers + +`no responders requires headers support` sent if the client requests no +responder, but rejects headers. Client is disconnected when this error is sent. +Current clients hardcode `headers: true`, so this error shouldn't be seen by +clients. + +_????????_ `headers` connect option shouldn't be exposed by the clients - this +is a holdover from when clients opted in to `headers`. + +#### Failed Account Registration + +`Failed Account Registration` an internal error while registering an account. +(Looking for reproducible test). + +#### Invalid Publish Subject + +`Invalid Publish Subject` (this requires the server in pedantic mode). 
Client is +not disconnected when this error is sent. Note that for subscribe operations, +depending on the separator (space) you may inadvertently specify a queue. In +such cases there will be no error, your subscription will simply be part of a +queue. If multiple spaces or some other variant, the server will treat it as a +protocol error. + +#### Unknown Protocol Operation + +`Unknown Protocol Operation` this error is sent if the server doesn't understand +a command. This is followed by a disconnect. + +#### Other Errors (not necessarily seen by the client) + +- `maximum account active connections exceeded` not notified to the client, the + client connecting will be disconnected (seen as a connection refused.) + ### Security Considerations diff --git a/adr/ADR-42.md b/adr/ADR-42.md deleted file mode 100644 index 56427d8..0000000 --- a/adr/ADR-42.md +++ /dev/null @@ -1,183 +0,0 @@ -# NATS Client LifeCycle (-ERR Protocol Error Handling) - -| Metadata | Value | -| -------- | -------------- | -| Date | 2024-05-31 | -| Author | @aricart | -| Status | Implemented | -| Tags | client, server | - -## Context and Problem Statement - -Client lifecycle such as connect/reconnect/liveliness (ping/pong)/LDM behaviours -are fairly complex in a NATS client. This ADR simply documents `-ERR` protocol -messages that are sent to a client. - -## Errors - -The `-ERR` protocol message is an important signal for clients about things that -are incorrect from the perspective of Permissions or Authorization. - -A note about implementation - the current format of the errors is simple, but -messages are not typed in a way that is simple for clients to understand what -should happen - in many cases the server will disconnect th client. In others it -is just a runtime error that an update in configuration at runtime may re-enable -the client to do what was rejected previously. However the client has no way to -know whether the server will disconnect it or not. 
- -In cases where the error is surfaced during connection it creates the nuance -that it is difficult for the client to know if the error is recoverable (simply -attempt to reconnect later) or not. Depending on the client implementation this -makes it difficult - in - -### Permissions Violation - -`Permissions Violation` means that the client tried to publish or subscribe on a -subject for which it has no permissions. This type of error can happen or -surface at any time, as changes to permissions intentionally or not can happen. -This means that even if the subscription has been working, it is possible that -it will not in the future if the permissions are altered. - -The message will include `/(Publish|Subscription) to (\S+)/` this will indicate -whether the error is related to a publish or subscription operation. Note that -you should be careful in how you write your matchers as the message could change -slightly or sport additional information (as you'll see below). - -For publish permission errors, it's hard to notify the client at the point of -failure unless the client is synchronous. But the standard async error -notification should be sufficient. In the case of request reply, since there's a -subscription handling the response, this means that you can search subscriptions -related to request and reply subjects, and notify them via the response -mechanism for the request depending on the type of operation that was rejected. - -For subscription errors, a second level parse for `/using queue "(\S+)"/` will -yield the `queue` if any that was used during the subscribe operation. This -means that a client may have permissions on a subscription, but not in a -specific queue or some other permutation of the subject/queue. 
- -The server unfortunately doesn't make it easy for the client to know the actual -subscription (SID) hosting the error but the logic for processing is simple: -notify the first subscription that matches the subject and queue name (this -assumes you track the subject and queue name in your internal subscription -representation) - the server will send multiple error protocol messages (one per -offense) so if multiple subscriptions, you will be able to notify all of them. - -For subscriptions, errors are _terminal_ for the subscription, as the server -cancels the clients interest. so the client will never get any messages on it. -It is very convenient for client user code to receive an error using some -mechanism associated with the subscription in question as this will simplify the -handling of the client code. - -It is also useful to have some sort of Promise/Future/etc that will get resolved -when a subscription closes (will not yield any more messages) - The -Promise/Future can resolve to an error or void (not thrown) which the client can -inspect for the reason if any why the subscription closed. Throwing an error is -discouraged, as this would create a possibility of crashing the client. Clients -can then use this information to perform their own error handling which may -require taking the service offline if the subscription is vital for its -operation. - -Note that regardless of a localized error handling mechanism, you should also -notify the async error handler as you don't know exactly where the client code -is looking for errors. - -### Authorization Violation - -`Authorization Violation` is sent whenever the credentials for a client are not -accepted. This is followed by a server initiated disconnect. Clients will -normally reconnect (depending on their connection options). If the client -closes, this should be reported as the last error. 
- -### User Authentication Expired - -`User Authentication Expired` protocol error happens whenever credentials for -the client expire while the client is connected to the server. It is followed by -a server disconnect. This error should be notified in the async handler. On -reconnect the client is going to be rejected with `Authorization Violation` and -follow its reconnect logic. - -### Account Expiration - -`Account Authentication Expired` is sent whenever the account JWT expires and a -client for the account is connected. This will result in a disconnect initiated -by the server. On reconnect the client will be rejected with -`Authorization Violation` until the account configuration is refreshed on the -server. The client will follow its reconnect logic. - -### Secure Connection - TLS Required - -`Secure Connection - TLS Required` is sent if the client is trying to connect on -a server that requires TLS. - -_????????_ The client should have done extensive ServerInfo investigation and -determined that this would have been a failure - -### Maximum Number of Connections - -`maximum connections exceeded` server limit on number of connections reached. -Server will send to the client the `-ERR maximum connections exceeded`, client -possibly go in reconnect loop. - -_????????_ The server can also send -`Connection throttling is active. Please try again later.` when too many TLS -connections are in progress. This should be treated as -`maximum connections exceeded` or reworked on the server to send this error -instead. - -### Max Payload Violation - -`Maximum Payload Violation` is sent to the client if it attempts to publish more -data than it is allowed by `max_payload`. The server will disconnect the client -after sending the protocol error. Note that clients should test payload sizes -and fail publishes that exceed the server configuration, as this allow the error -to be localized when possible to the user code that caused the error. 
- -### User Authentication Revoked - -`User Authentication Revoked` this is reported when an account is updated and -the user is revoked in the account. On connects where the user is already -revoked, it is just an `Authorization Error`. On actual experimentation, the -client never saw `User Authentication Revoked`, and instead was just -disconnected. Reconnect was greeted with a `Authorization Error`. - -### Invalid Client Protocol - -`invalid client protocol` sent to the client if the protocol version from the -client doesn't match. Client is disconnected when this error is sent. - -_????????_ Currently, this is not a concern since presumably, a server will be -able to deal with protocol version 1 when protocol upgrades. - -### No Responders Requires Headers - -`no responders requires headers support` sent if the client requests no -responder, but rejects headers. Client is disconnected when this error is sent. -Current clients hardcode `headers: true`, so this error shouldn't be seen by -clients. - -_????????_ `headers` connect option shouldn't be exposed by the clients - this -is a holdover from when clients opted in to `headers`. - -### Failed Account Registration - -`Failed Account Registration` an internal error while registering an account. -(Looking for reproducible test). - -### Invalid Publish Subject - -`Invalid Publish Subject` (this requires the server in pedantic mode). Client is -not disconnected when this error is sent. Note that for subscribe operations, -depending on the separator (space) you may inadvertently specify a queue. In -such cases there will be no error, your subscription will simply be part of a -queue. If multiple spaces or some other variant, the server will treat it as a -protocol error. - -### Unknown Protocol Operation - -`Unknown Protocol Operation` this error is sent if the server doesn't understand -a command. This is followed by a disconnect. 
- -### Other Errors (not necessarily seen by the client) - -- `maximum account active connections exceeded` not notified to the client, the - client connecting will be disconnected (seen as a connection refused.) From b8e72e486c94713e8d99b93a6a86dbbba5babd03 Mon Sep 17 00:00:00 2001 From: Alberto Ricart Date: Fri, 12 Jul 2024 10:43:51 -0500 Subject: [PATCH 09/13] review comment --- adr/ADR-40.md | 231 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 140 insertions(+), 91 deletions(-) diff --git a/adr/ADR-40.md b/adr/ADR-40.md index 2175907..8ee43b4 100644 --- a/adr/ADR-40.md +++ b/adr/ADR-40.md @@ -1,14 +1,14 @@ # NATS Connection -|Metadata|Value| -|--------|-----| -| Date | 2023-10-12 | -| Author | @Jarema | -| Status | Implemented | -| Tags | client, server, spec | +| Metadata | Value | +| -------- | -------------------- | +| Date | 2023-10-12 | +| Author | @Jarema | +| Status | Implemented | +| Tags | client, server, spec | | Revision | Date | Author | Info | -|----------|------------|----------|------------------------------| +| -------- | ---------- | -------- | ---------------------------- | | 1 | 2023-10-12 | @Jarema | Initial draft | | 2 | 2024-6-24 | @aricart | Added protocol error section | @@ -16,6 +16,7 @@ This document describes how clients connect to the NATS server or NATS cluster. That includes topics like: + - connection process - reconnect - tls @@ -23,7 +24,9 @@ That includes topics like: ## Motivation -Ensuring a consistent way how Clients establish and maintain connection with the NATS server and provide consistent and predictable behaviour across the ecosystem. +Ensuring a consistent way how Clients establish and maintain connection with the +NATS server and provide consistent and predictable behaviour across the +ecosystem. ## Guide-level Explanation @@ -36,15 +39,20 @@ Ensuring a consistent way how Clients establish and maintain connection with the 1. Clients initiate a network connection to the Server. 2. 
Server responds with [INFO][INFO] json. 3. Client sends [CONNECT][CONNECT] json. -4. Clients and Server start to exchange PING/PONG messages to detect if the connection is alive. +4. Clients and Server start to exchange PING/PONG messages to detect if the + connection is alive. -**Note** If clients sets `protocol` field in [Connect][Connect] to equal or greater than 1, Server can send subsequent [INFO][INFO] on a ongoing connection. -Client needs to handle them appropriately and update server lists and server info. +**Note** If clients sets `protocol` field in [Connect][Connect] to equal or +greater than 1, Server can send subsequent [INFO][INFO] on a ongoing connection. +Client needs to handle them appropriately and update server lists and server +info. #### Auth flow + TODO #### TLS + There are two flows available in the Server that enable TLS. ##### Standard NATS TLS (Explicit TLS) @@ -53,41 +61,51 @@ This method is available in all NATS Server versions. 1. Clients initiate a network connection to the Server. 2. Server responds with [INFO][INFO] json. -3. If Server [INFO][INFO] contains `tls_required` set to `true`, or the client has a tls requirement set to `true`, the client performs a TLS upgrade. +3. If Server [INFO][INFO] contains `tls_required` set to `true`, or the client + has a tls requirement set to `true`, the client performs a TLS upgrade. 4. Client sends [CONNECT][CONNECT] json. -5. Clients and Server start to exchange PING/PONG messages to detect if the connection is alive. +5. Clients and Server start to exchange PING/PONG messages to detect if the + connection is alive. ##### TLS First (Implicit TLS) This method has been available since NATS Server 2.11. There are two prerequisites to use this method: + 1. Server config has enabled `handshake_first` field in the `tls` block. 2. The client has set the `tls_first` option set to true. **handshake_first** has those possible values: + - **`false`**: handshake first is disabled. 
Default value -- `true`: handshake first is enabled and enforced. Clients that do not use this flow will fail to connect. -- `duration` (i.e. 2s): a hybrid mode that will wait a given time, allowing the client to follow the `tls_first` flow. After the duration has expired, `INFO` is sent, enabling standard client TLS flow. -- `auto`: same as above, with some default value. By default it waits 50ms for TLS upgrade before sending the [INFO][INFO]. +- `true`: handshake first is enabled and enforced. Clients that do not use this + flow will fail to connect. +- `duration` (i.e. 2s): a hybrid mode that will wait a given time, allowing the + client to follow the `tls_first` flow. After the duration has expired, `INFO` + is sent, enabling standard client TLS flow. +- `auto`: same as above, with some default value. By default it waits 50ms for + TLS upgrade before sending the [INFO][INFO]. The flow itself is flipped. TLS is established before the Server sends INFO: 1. Client initiate a network connection to the Server. 2. Client upgrades the connection to TLS. -2. Server sends [INFO][INFO] json. +3. Server sends [INFO][INFO] json. 4. Client sends [CONNECT][CONNECT] json. -5. Client and Server start to exchange PING/PONG messages to detect if the connection is alive. - +5. Client and Server start to exchange PING/PONG messages to detect if the + connection is alive. ### Servers discovery + **Note**: Server will send back the info only -When Server sends back [INFO][INFO]. It may contain additional URLs to which the client can make connection attempts. -The client should store those URLs and use them in the Reconnection Strategy. +When Server sends back [INFO][INFO]. It may contain additional URLs to which the +client can make connection attempts. The client should store those URLs and use +them in the Reconnection Strategy. -A client should have an option to turn off using advertised URLs. -By default, those URLs are used. 
+A client should have an option to turn off using advertised URLs. By default, +those URLs are used. **TODO**: Add more in-depth explanation how topology discovery works. @@ -95,153 +113,184 @@ By default, those URLs are used. #### On-Demand reconnect -Client should have a way that allows users to force reconnection process. -This can be useful for refreshing auth or rebalancing clients. +Client should have a way that allows users to force reconnection process. This +can be useful for refreshing auth or rebalancing clients. -When triggered, client will drop connection to the current server and perform standard reconnection process. -That means that all subscriptions and consumers should be resubscribed and their work resumed after successful reconnect where all reconnect options are respected. +When triggered, client will drop connection to the current server and perform +standard reconnection process. That means that all subscriptions and consumers +should be resubscribed and their work resumed after successful reconnect where +all reconnect options are respected. -For most clients, that means having a `reconnect` method on the Client/Connection handle. +For most clients, that means having a `reconnect` method on the +Client/Connection handle. #### Detecting disconnection There are two methods that clients should use to detect disconnections: -1. Missing two consecutive PONGs from the Server (number of missing PONGs can be configured). + +1. Missing two consecutive PONGs from the Server (number of missing PONGs can be + configured). 2. Handling errors from network connection. #### Reconnect process -When the client detects disconnection, it starts to reconnect attempts with the following rules: +When the client detects disconnection, it starts to reconnect attempts with the +following rules: + 1. Immediate reconnect attempt - - The client attempts to reconnect immediately after finding out it has been disconnected. 
+ - The client attempts to reconnect immediately after finding out it has been + disconnected. 2. Exponential backoff with jitter - - When the first reconnect fails, the backoff process should kick in. Default Jitter should also be included to avoid thundering herd problems. -3. If the Server returned additional URLs, the client should try reconnecting in random order to each Server on the list, unless randomization option is disabled in the client [options](#Retain-servers-order). + - When the first reconnect fails, the backoff process should kick in. Default + Jitter should also be included to avoid thundering herd problems. +3. If the Server returned additional URLs, the client should try reconnecting in + random order to each Server on the list, unless randomization option is + disabled in the client [options](#Retain-servers-order). 4. Successful reconnect resets the timers 5. Upon reconnection, clients should resubscribe to all created subscriptions. -If there is any change in the connection state - connected/disconnected, the client should have some way of notifying the user about it. -This can be a callback function or any other idiomatic mechanism in a given language for reporting asynchronous events. +If there is any change in the connection state - connected/disconnected, the +client should have some way of notifying the user about it. This can be a +callback function or any other idiomatic mechanism in a given language for +reporting asynchronous events. -**Disconnect buffer** -Most clients have a buffer that will aggregate messages on the client side in case of disconnection. -It will fill up the buffer and send pending messages as soon as connection is restored. -If buffer will be filled before the connection is restored - publish attempts should return error noting that fact. +**Disconnect buffer** Most clients have a buffer that will aggregate messages on +the client side in case of disconnection. 
It will fill up the buffer and send +pending messages as soon as connection is restored. If buffer will be filled +before the connection is restored - publish attempts should return error noting +that fact. ## Reference-level Explanation + ### Client options Although clients should provide sensible defaults for handling the connection, -in many cases, it requires some tweaking. -The below list defines what can be changed, what it means, and what the defaults are. +in many cases, it requires some tweaking. The below list defines what can be +changed, what it means, and what the defaults are. #### Ping interval **default**: 2 minutes -As the client or server might not know that the connection is severed, NATS has Ping/Pong protocol. -Client can set at what intervals it will send a PING to the server, expecting PONG. -If two consecutive PONGs are missed, connection is marked as lost triggering reconnect attempt. +As the client or server might not know that the connection is severed, NATS has +Ping/Pong protocol. Client can set at what intervals it will send a PING to the +server, expecting PONG. If two consecutive PONGs are missed, connection is +marked as lost triggering reconnect attempt. -It's worth noting that shorter PING intervals can improve responsiveness of the client to network issues, -but it also increases the load on the whole NATS system and the network itself with each added client. +It's worth noting that shorter PING intervals can improve responsiveness of the +client to network issues, but it also increases the load on the whole NATS +system and the network itself with each added client. #### Max Pings Out **default**: 2 -Sets number of allowed outstanding PONG responses for the client PINGs before marking client as disconnected and triggering reconnect. +Sets number of allowed outstanding PONG responses for the client PINGs before +marking client as disconnected and triggering reconnect. 
#### Retry on failed initial connect **default: false** -By default, if a client makes a connection attempt, if it fails, `connect` returns an error. -In many scenarios, users might want to allow the first attempt to fail as long as clients continue the efforts -and notify the progress. +By default, if a client makes a connection attempt, if it fails, `connect` +returns an error. In many scenarios, users might want to allow the first attempt +to fail as long as clients continue the efforts and notify the progress. -When this option is enabled, the client should start the initial connection process and return the standard NATS connection/client handle while in background connection attempts are continued. +When this option is enabled, the client should start the initial connection +process and return the standard NATS connection/client handle while in +background connection attempts are continued. -The client should not wait for the first connection to succeed or fail, as in some network scenarios, this can -take much time. -If the first attempt fails, a standard [Reconnect process] should be performed. +The client should not wait for the first connection to succeed or fail, as in +some network scenarios, this can take much time. If the first attempt fails, a +standard [Reconnect process] should be performed. #### Max reconnects **default: 3 / none -Specifies the number of consecutive reconnect attempts the client will make before giving up. -This is useful for preventing `zombie services` from endlessly reaching the servers, but it can also -be a footgun and surprise for users who do not expect that the client can give up entirely. +Specifies the number of consecutive reconnect attempts the client will make +before giving up. This is useful for preventing `zombie services` from endlessly +reaching the servers, but it can also be a footgun and surprise for users who do +not expect that the client can give up entirely. 
#### Connection timeout **default 5s** -Specifies how long the client will wait for the network connection to be established. -In some languages, this can hang eternally, and timeout mechanics might be necessary. -In others, the network connection method might have a way to configure its timeout. +Specifies how long the client will wait for the network connection to be +established. In some languages, this can hang eternally, and timeout mechanics +might be necessary. In others, the network connection method might have a way to +configure its timeout. #### Custom reconnect delay **Default: none** -If fine-grained control over reconnect attempts intervals is needed, this option allows users to specify one. -Implementation should make sense in a given language. For example, it can be a callback `fn reconnect(attempt: int) -> Duration`. +If fine-grained control over reconnect attempts intervals is needed, this option +allows users to specify one. Implementation should make sense in a given +language. For example, it can be a callback +`fn reconnect(attempt: int) -> Duration`. #### Disconnect buffer -If given client supports storing messages during disconnect periods, this option allows to tweak the number of stored messages. -It should also allow disable buffering entirely. +If given client supports storing messages during disconnect periods, this option +allows to tweak the number of stored messages. It should also allow disable +buffering entirely. #### Tls required -**default: false** -If set, the client enforces the TLS, whether the Server also requires it or not. +**default: false** If set, the client enforces the TLS, whether the Server also +requires it or not. If `tls://` scheme is used in the connection string, this also enforces tls. #### Ignore advertised servers -**default: false** -When connecting to the Server, it may send back a list of other servers in the cluster of which it is aware. 
-This can be very helpful for discoverability and removes the need for the client to pass all servers in `connect`, -but it also may be unwanted if, for example, some servers URLs are unreachable for a given client. +**default: false** When connecting to the Server, it may send back a list of +other servers in the cluster of which it is aware. This can be very helpful for +discoverability and removes the need for the client to pass all servers in +`connect`, but it also may be unwanted if, for example, some servers URLs are +unreachable for a given client. #### Retain servers order -**default: false** -By default, if many server addresses are passed in the connect string or array, the client will try to connect to them in random order. -This helps healthy connection distribution, but if in a specific case list should be treated as a preference list, -randomization may be turned off. +**default: false** By default, if many server addresses are passed in the +connect string or array, the client will try to connect to them in random order. +This helps healthy connection distribution, but if in a specific case list +should be treated as a preference list, randomization may be turned off. -This function can be expressed "enable retaining order" or "disable randomization" depending on what is more idiomatic in given language. +This function can be expressed "enable retaining order" or "disable +randomization" depending on what is more idiomatic in given language. ### Protocol Commands and Grammar #### INFO + [LINK][LINK] -Send by the Server before or after establishing TLS, depending of flow used. -It contains information about the Server, the nonce, and other server URLs to which the client can connect. +Send by the Server before or after establishing TLS, depending of flow used. It +contains information about the Server, the nonce, and other server URLs to which +the client can connect. #### CONNECT -[CONNECT][CONNECT] -Send by the client in response to INFO. 
-Contains information about client, including optional signature, client version and connection options. +[CONNECT][CONNECT] +Sent by the client in response to INFO. Contains information about client, +including optional signature, client version and connection options. #### Ping Pong -This is a mechanism to detect broken connections that may not be reported by the network connection in a given language. -If the Server sends `PING`, the client should answer with `PONG`. -If the Client sends `PING`, the Server should answer with `PONG`. +This is a mechanism to detect broken connections that may not be reported by the +network connection in a given language. -If two (configurable) consecutive `PONGs are missed, the client should treat the connection as broken, and it should start reconnect attempts. +If the Server sends `PING`, the client should answer with `PONG`. If the Client +sends `PING`, the Server should answer with `PONG`. -The default interval for PING is 2 minutes. +If two (configurable) consecutive `PONG`s are missed, the client should treat the +connection as broken, and it should start reconnect attempts. +The default interval for PING is 2 minutes. ### Error Handling (TODO) @@ -348,11 +397,13 @@ determined that this would have been a failure Server will send to the client the `-ERR maximum connections exceeded`, client possibly go in reconnect loop. -_????????_ The server can also send +The server can also send `Connection throttling is active. Please try again later.` when too many TLS connections are in progress. This should be treated as `maximum connections exceeded` or reworked on the server to send this error -instead. +instead. Note that this can happen when the TLS server option +[`connection_rate_limit`](https://github.com/nats-io/nats-server/blob/main/server/opts.go#L4557) +is set. #### Max Payload Violation @@ -412,16 +463,14 @@ a command. This is followed by a disconnect.
 - `maximum account active connections exceeded` not notified to the client, the
   client connecting will be disconnected (seen as a connection refused.)
-
 ### Security Considerations
-Discuss any additional security considerations pertaining to the TLS implementation and connection handling.
+Discuss any additional security considerations pertaining to the TLS
+implementation and connection handling.
 ## Future Possibilities
 Smart Reconnection could be a potential big improvement.
-
 [INFO]: https://beta-docs.nats.io/ref/protocols/client#info
 [CONNECT]: https://beta-docs.nats.io/ref/protocols/client#connect
-

From c73f4035a223822b5ec55d7412295d42012e0808 Mon Sep 17 00:00:00 2001
From: Alberto Ricart
Date: Fri, 12 Jul 2024 10:49:33 -0500
Subject: [PATCH 10/13] fixed incomplete thought
---
 adr/ADR-40.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/adr/ADR-40.md b/adr/ADR-40.md
index 8ee43b4..f518807 100644
--- a/adr/ADR-40.md
+++ b/adr/ADR-40.md
@@ -299,15 +299,15 @@ are incorrect from the perspective of Permissions or Authorization.
 A note about implementation - the current format of the errors is simple, but
 messages are not typed in a way that is simple for clients to understand what
-should happen - in many cases the server will disconnect th client. In others it
-is just a runtime error that an update in configuration at runtime may re-enable
-the client to do what was rejected previously. However the client has no way to
-know whether the server will disconnect it or not.
+should happen - in many cases the server will disconnect th client. In other
+cases it is just a runtime error that an update in configuration at runtime may
+re-enable the client to do what was rejected previously. However, the client has
+no way to know whether the server will disconnect it or not.
 In cases where the error is surfaced during connection it creates the nuance
 that it is difficult for the client to know if the error is recoverable (simply
-attempt to reconnect later) or not. Depending on the client implementation this
-makes it difficult - in
+attempt to reconnect later) or not. In some cases a client connection will never
+resolve unless the number of maximum reconnect attempts is specified.
 #### Permissions Violation

From bc4336a64a5b7f23ba0e0690920668dc87a734ee Mon Sep 17 00:00:00 2001
From: Alberto Ricart
Date: Tue, 23 Jul 2024 10:10:19 -0500
Subject: [PATCH 11/13] added blocks for important etc. documented maximum subscriptions exceeded
---
 adr/ADR-40.md | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/adr/ADR-40.md b/adr/ADR-40.md
index f518807..3bf39cd 100644
--- a/adr/ADR-40.md
+++ b/adr/ADR-40.md
@@ -388,8 +388,9 @@ server. The client will follow its reconnect logic.
 `Secure Connection - TLS Required` is sent if the client is trying to connect on
 a server that requires TLS.
-_????????_ The client should have done extensive ServerInfo investigation and
-determined that this would have been a failure
+> [!IMPORTANT]
+> The client should have done extensive ServerInfo investigation and
+> determined that this would have been a failure when initiating the connection.
 #### Maximum Number of Connections
@@ -413,6 +414,12 @@ after sending the protocol error.
 Note that clients should test payload sizes and fail publishes that exceed the
 server configuration, as this allow the error to be localized when possible to
 the user code that caused the error.
+
+#### Maximum Subscriptions Exceeded
+
+`maximum subscriptions exceeded` is sent to the client if attempts to create more
+subscriptions than it the account is allowed. The error is not terminal to the connection.
+
 #### User Authentication Revoked
 `User Authentication Revoked` this is reported when an account is updated and
@@ -426,8 +433,9 @@ disconnected. Reconnect was greeted with a `Authorization Error`.
 `invalid client protocol` sent to the client if the protocol version from the
 client doesn't match. Client is disconnected when this error is sent.
-_????????_ Currently, this is not a concern since presumably, a server will be
-able to deal with protocol version 1 when protocol upgrades.
+> [!NOTE]
+> Currently, this is not a concern since presumably, a server will be able to deal
+> with protocol version 1 when protocol upgrades.
 #### No Responders Requires Headers
@@ -436,8 +444,9 @@ responder, but rejects headers. Client is disconnected when this error is sent.
 Current clients hardcode `headers: true`, so this error shouldn't be seen by
 clients.
-_????????_ `headers` connect option shouldn't be exposed by the clients - this
-is a holdover from when clients opted in to `headers`.
+> [!IMPORTANT]
+> `headers` connect option shouldn't be exposed by the clients - this
+> is a holdover from when clients opted in to `headers`.
 #### Failed Account Registration

From 2baae82072ac691fa5a92cda723a330ff5690b0f Mon Sep 17 00:00:00 2001
From: Alberto Ricart
Date: Tue, 23 Jul 2024 10:12:03 -0500
Subject: [PATCH 12/13] fmt
---
 adr/ADR-40.md | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/adr/ADR-40.md b/adr/ADR-40.md
index 3bf39cd..5fe32ea 100644
--- a/adr/ADR-40.md
+++ b/adr/ADR-40.md
@@ -388,9 +388,9 @@ server. The client will follow its reconnect logic.
 `Secure Connection - TLS Required` is sent if the client is trying to connect on
 a server that requires TLS.
-> [!IMPORTANT]
-> The client should have done extensive ServerInfo investigation and
-> determined that this would have been a failure when initiating the connection.
+> [!IMPORTANT] The client should have done extensive ServerInfo investigation
+> and determined that this would have been a failure when initiating the
+> connection.
 #### Maximum Number of Connections
@@ -414,11 +414,11 @@ after sending the protocol error.
 Note that clients should test payload sizes and fail publishes that exceed the
 server configuration, as this allow the error to be localized when possible to
 the user code that caused the error.
-
 #### Maximum Subscriptions Exceeded
-`maximum subscriptions exceeded` is sent to the client if attempts to create more
-subscriptions than it the account is allowed. The error is not terminal to the connection.
+`maximum subscriptions exceeded` is sent to the client if attempts to create
+more subscriptions than it the account is allowed. The error is not terminal to
+the connection.
 #### User Authentication Revoked
@@ -433,9 +433,8 @@ disconnected. Reconnect was greeted with a `Authorization Error`.
 `invalid client protocol` sent to the client if the protocol version from the
 client doesn't match. Client is disconnected when this error is sent.
-> [!NOTE]
-> Currently, this is not a concern since presumably, a server will be able to deal
-> with protocol version 1 when protocol upgrades.
+> [!NOTE] Currently, this is not a concern since presumably, a server will be
+> able to deal with protocol version 1 when protocol upgrades.
 #### No Responders Requires Headers
@@ -444,9 +443,8 @@ responder, but rejects headers. Client is disconnected when this error is sent.
 Current clients hardcode `headers: true`, so this error shouldn't be seen by
 clients.
-> [!IMPORTANT]
-> `headers` connect option shouldn't be exposed by the clients - this
-> is a holdover from when clients opted in to `headers`.
+> [!IMPORTANT] `headers` connect option shouldn't be exposed by the clients -
+> this is a holdover from when clients opted in to `headers`.
 #### Failed Account Registration

From dee277154c8302b7850f65c6f13c9faa62189589 Mon Sep 17 00:00:00 2001
From: Alberto Ricart
Date: Tue, 23 Jul 2024 10:13:37 -0500
Subject: [PATCH 13/13] fixed markdown github extensions for highligths
---
 adr/ADR-40.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/adr/ADR-40.md b/adr/ADR-40.md
index 5fe32ea..cea653b 100644
--- a/adr/ADR-40.md
+++ b/adr/ADR-40.md
@@ -388,7 +388,8 @@ server. The client will follow its reconnect logic.
 `Secure Connection - TLS Required` is sent if the client is trying to connect on
 a server that requires TLS.
-> [!IMPORTANT] The client should have done extensive ServerInfo investigation
+> [!IMPORTANT]
+> The client should have done extensive ServerInfo investigation
 > and determined that this would have been a failure when initiating the
 > connection.
@@ -433,7 +434,8 @@ disconnected. Reconnect was greeted with a `Authorization Error`.
 `invalid client protocol` sent to the client if the protocol version from the
 client doesn't match. Client is disconnected when this error is sent.
-> [!NOTE] Currently, this is not a concern since presumably, a server will be
+> [!NOTE]
+> Currently, this is not a concern since presumably, a server will be
 > able to deal with protocol version 1 when protocol upgrades.
 #### No Responders Requires Headers
@@ -443,7 +445,8 @@ responder, but rejects headers. Client is disconnected when this error is sent.
 Current clients hardcode `headers: true`, so this error shouldn't be seen by
 clients.
-> [!IMPORTANT] `headers` connect option shouldn't be exposed by the clients -
+> [!IMPORTANT]
+> `headers` connect option shouldn't be exposed by the clients -
 > this is a holdover from when clients opted in to `headers`.
 #### Failed Account Registration
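
The patches above state, for a few `-ERR` messages, whether the connection survives: `maximum subscriptions exceeded` is not terminal to the connection, `invalid client protocol` is followed by a disconnect, and the throttling message should be treated as `maximum connections exceeded`. As a rough, non-authoritative sketch of how a client might encode only those stated cases, the Python below normalizes a raw `-ERR` line and looks up its documented outcome. The names `Outcome`, `ALIASES`, `OUTCOMES`, `canonical_error` and `outcome` are hypothetical and not taken from any NATS client; the single-quoted `-ERR '...'` framing and the case-insensitive matching are assumptions; anything the text does not classify is reported as unknown.

```python
from enum import Enum


class Outcome(Enum):
    """What the ADR text says happens to the connection after an error."""
    CONNECTION_KEPT = "connection kept"
    DISCONNECTED = "disconnected"
    UNKNOWN = "unknown"  # the text does not say, so the client cannot tell


# Messages the text says should be handled like another message.
# Keys and values are lower-cased (matching case-insensitively is an assumption).
ALIASES = {
    "connection throttling is active. please try again later.":
        "maximum connections exceeded",
}

# Only the outcomes stated explicitly in the patches above.
OUTCOMES = {
    "maximum subscriptions exceeded": Outcome.CONNECTION_KEPT,
    "invalid client protocol": Outcome.DISCONNECTED,
}


def canonical_error(line: str) -> str:
    """Strip the -ERR verb and quotes, lower-case, and apply aliases."""
    text = line.strip()
    if text.upper().startswith("-ERR"):
        text = text[4:]
    text = text.strip().strip("'").strip().lower()
    return ALIASES.get(text, text)


def outcome(line: str) -> Outcome:
    """Return the documented outcome, or UNKNOWN when none is documented."""
    return OUTCOMES.get(canonical_error(line), Outcome.UNKNOWN)
```

A caller could, for example, pass each received `-ERR` line to `outcome` and start its reconnect logic only on `Outcome.DISCONNECTED`. Falling back to `Outcome.UNKNOWN` rather than guessing mirrors the observation in the ADR that the client has no way to know whether the server will disconnect it.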