Consistency

After our discussion today I took a look at the different articles discussing Saga:

- [Saga Pattern | Application transactions using Microservices – Part I](https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/)
- [Saga Pattern | How to implement business transactions using Microservices – Part II](https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part-2/)
- [Managing data consistency in a microservice architecture using Sagas part 2 - coordinating sagas](http://chrisrichardson.net/post/sagas/2019/08/04/developing-sagas-part-2.html)

Below I've taken some notes on how this would work for the `payment` service (which should be extendable to the `order` service).

## Saga notes
- The choreography approach sends events directly to the other services that are subscribed.
	- Since each service is a set of replicas we need a place to store all the services that are subscribed to us.
		- This could be Cassandra or Postgres or some other service (Redis?)
	- Upon an event (say creating the order) we emit this event (through HTTP) to all subscribers, including some details (like the transaction id)
	- The choreography approach (pushing messages to the services) is preferable, as in this case we are sure only one replica answers to an event. The K8s load balancer will make sure that this replica is healthy.
	- These events trigger actions on the subscribed services.
		- For example triggering a reservation
		- They could trigger another event
	- Upon success the next event will trigger the next step in the process
	- Upon failure all services that performed some action can run the appropriate code to roll back the actions made.
		- What happens when the next event doesn't arrive, for example because a service actually failed?
			- Make sure that sending an event waits for a 2xx code or reports failure on timeout.
	- This means we should clearly document the events and when they happen in the process, as the number of events can become quite large.
	- The main benefit of this is that we don't need to implement all rollback logic on a single service and that we don't need to store any state in the services themselves.
	- It also provides a clear distinction between the public API, for the actual behaviour that we want our clients to use, and an internal API for communication between services to handle reservations or failures.
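
As a rough illustration of the "wait for a 2xx code or report failure on timeout" note above, here is a minimal sketch using only the Python standard library. The function names (`http_post`, `emit_event`), the payload shape and the 2 second timeout are assumptions made for the example, not something we have agreed on.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_post(url: str, payload: dict, timeout: float) -> int:
    """POST the payload as JSON and return the HTTP status code.

    Network errors and timeouts propagate as OSError subclasses.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code we can report.
        return err.code


def emit_event(
    event_type: str,
    transaction_id: str,
    subscribers: Iterable[str],
    post: Callable[[str, dict, float], int] = http_post,
    timeout: float = 2.0,  # assumed value, to be tuned
) -> Dict[str, bool]:
    """Send one event to every subscriber URL.

    Returns a map of subscriber URL to True (acknowledged with 2xx)
    or False (non-2xx status, network error or timeout).
    """
    payload = {"type": event_type, "transaction_id": transaction_id}
    delivered: Dict[str, bool] = {}
    for url in subscribers:
        try:
            status = post(url, payload, timeout)
            delivered[url] = 200 <= status < 300
        except OSError:
            delivered[url] = False
    return delivered
```

Passing the `post` function in as a parameter is only there so the delivery logic can be exercised without a network; any entry that comes back `False` is where the rollback path would have to start.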

## Example
The payment service needs to reserve stock and credits before subtracting them and completing the order.

1. `Payment` receives the payment request
2. `Payment` creates a payment entry with status `INITIATED` and creates an event `PAYMENT_INITIATED` with the order id and a transaction id (uuid4?) (not sure if we need the transaction id)
3. `User` receives event `PAYMENT_INITIATED` and reserves credits for the transaction, emits `CREDITS_RESERVED` event for the same transaction id.
4. `Stock` receives event `CREDITS_RESERVED` and reserves the stock for the transaction, emits the `STOCK_RESERVED` event for the same transaction id.
5. `Payment` receives event `STOCK_RESERVED` and changes the payment status to `RESERVED`. Emits `PAYMENT_RESERVED` event.
6. `User` receives `PAYMENT_RESERVED` and applies the reservation for the transaction, emits `CREDITS_SUBTRACTED` for the transaction.
7. `Stock` receives event `CREDITS_SUBTRACTED` and applies the stock reservation. Emits `STOCK_SUBTRACTED` for the transaction.
8. At this point the payment is complete.
9. `Payment` receives the event `STOCK_SUBTRACTED` and updates the status of the payment to `PAID`.
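
The reserve-then-apply steps on the `User` side could look roughly like the sketch below. This is an in-memory illustration only: the `CreditAccount` class and its method names are hypothetical, and a real service would keep this in its database.

```python
from typing import Dict


class CreditAccount:
    """Hypothetical in-memory model of a user's credits with reservations."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        # transaction id -> amount reserved for that transaction
        self.reserved: Dict[str, int] = {}

    def available(self) -> int:
        """Credits not yet claimed by any open reservation."""
        return self.balance - sum(self.reserved.values())

    def reserve(self, transaction_id: str, amount: int) -> bool:
        """Step 3: reserve credits; False maps to INSUFFICIENT_CREDITS."""
        if transaction_id in self.reserved:
            return True  # duplicate event, reservation already exists
        if amount > self.available():
            return False
        self.reserved[transaction_id] = amount
        return True

    def apply(self, transaction_id: str) -> None:
        """Step 6: turn the reservation into an actual subtraction."""
        self.balance -= self.reserved.pop(transaction_id, 0)

    def cancel(self, transaction_id: str) -> None:
        """Rollback: drop the reservation without touching the balance."""
        self.reserved.pop(transaction_id, None)
```

Keying everything on the transaction id means a repeated `PAYMENT_INITIATED` or `FAILURE` event for the same transaction has no additional effect, which matters if events can be delivered more than once.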

Any of these steps can also fail. For example:

- Step 3, `User` failure: sends `INSUFFICIENT_CREDITS`, which is received by the payment service and stops the transaction.
- Step 4, `Stock` failure: sends `INSUFFICIENT_STOCK`, which is received by the user service (which cancels its reservation) and the payment service (which returns failure on the transaction).
- Step 6, `User` failure: not sure how this could happen, but in that case both reservations should be removed using a `FAILURE` event for the transaction.
- Step 7, `Stock` failure: not sure how this could happen, but in that case the user service receives a `FAILURE` event for the transaction and credits the payment back.
- Step 9, `Payment` failure: again not sure how this could happen, but in that case we return the stock and credits using a `FAILURE` event for the transaction.
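
Seen from the `Payment` service, the success and failure events above boil down to a small state machine. The sketch below is one possible way to write it down; note that the `FAILED` status is an assumption (the notes only name `INITIATED`, `RESERVED` and `PAID`), as is the choice to ignore events that do not fit the current status.

```python
from typing import Dict, Tuple

# (current status, received event) -> next status, for the happy path.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("INITIATED", "STOCK_RESERVED"): "RESERVED",
    ("RESERVED", "STOCK_SUBTRACTED"): "PAID",
}

# Any of these events aborts a payment that is still in progress.
FAILURE_EVENTS = {"INSUFFICIENT_CREDITS", "INSUFFICIENT_STOCK", "FAILURE"}

# Assumed terminal statuses; "FAILED" is not named in the notes.
TERMINAL_STATUSES = {"PAID", "FAILED"}


def next_status(status: str, event: str) -> str:
    """Return the payment status after handling one event."""
    if status in TERMINAL_STATUSES:
        return status  # late or duplicate events change nothing
    if event in FAILURE_EVENTS:
        return "FAILED"
    # Unknown or out-of-order events leave the status untouched.
    return TRANSITIONS.get((status, event), status)
```

Writing the transitions as a table also doubles as the documentation of events asked for in the notes: every event the payment service reacts to shows up in one place.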


The only problem I see here is when an event is emitted but never acknowledged with a 2xx status code: the transaction will halt forever. This should be solvable using some sort of at-least-once delivery logic that waits for a 2xx status code. That logic could itself break if the node running it fails or is replaced, which could be avoided by using a message broker that is highly available.

The only requirements for this message broker are that it is highly available, sends messages through their channels, and waits for a response. If no response is given, or an error is returned, we send a general `FAILURE` event for that transaction, which should roll back the actions on the other systems. This way the system should stay consistent unless a machine actually shuts down unexpectedly.
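
To make the broker behaviour described above a bit more concrete, here is a minimal sketch of the "deliver, otherwise broadcast `FAILURE`" rule. Everything in it is hypothetical: `send` stands in for whatever transport we pick and is assumed to return `True` only on a 2xx acknowledgement, and the retry count of 3 is an arbitrary placeholder.

```python
from typing import Callable, Dict, Iterable

Event = Dict[str, str]


def deliver_or_fail(
    event: Event,
    target: str,
    participants: Iterable[str],
    send: Callable[[str, Event], bool],
    attempts: int = 3,  # assumed retry budget
) -> bool:
    """Try to deliver an event; on repeated failure, broadcast FAILURE.

    Returns True if the target acknowledged the event, False if the
    transaction was aborted instead.
    """
    for _ in range(attempts):
        if send(target, event):
            return True
    # No acknowledgement: ask every participant to roll back.
    failure = {"type": "FAILURE", "transaction_id": event["transaction_id"]}
    for participant in participants:
        send(participant, failure)
    return False
```

Because an event may be sent more than once before it is acknowledged, every receiver would need to handle duplicates for the same transaction id safely. The sketch also does not retry the `FAILURE` broadcast itself, which a real broker would have to do.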

### The original request

An additional problem I see here is that, because of this chain of messages, we will need to keep the original request from the user to the payment service open until we receive either a failure or the `STOCK_SUBTRACTED` event.

If the payment service itself fails within this time, we will not be able to let the user know whether the payment failed or succeeded.


