Transaction Management in Microservices (Implementation Guide), Part 1
Microservices architecture is very popular, but one of its main problems is how to manage transactions that span multiple microservices.
In a monolithic application, transactions are handled by a single database. A transaction is started at the database level and is committed or rolled back depending on the outcome of the flow.
But in a microservices architecture, the monolith is decomposed into isolated services. This means that a local transaction in the monolithic system is now distributed across multiple services that are called in sequence. [Reference]
What is the solution in a microservices system?
For example, imagine we have microservices A, B, and C, and an external service D.
Method doSomething() in Service A calls the following APIs:
- API 1 of Service B
- API 2 of Service B
- API 1 of Service C
- API 1 of Service D, an external service outside the microservices system
Now consider a failure scenario: the calls to Services B, C, and D all succeed, but while Service A is updating its local database, the database fails and Service A's local transaction is rolled back.
What happens to the transactions that have already been committed in Services B, C, and D?
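The scenario above can be sketched as a minimal, single-process Python simulation. It rests on stated assumptions: the hypothetical `call_service` helper stands in for remote API calls that commit immediately inside their own services, and Service A's local database is an in-memory SQLite database. None of these names come from a real system.

```python
import sqlite3

# Hypothetical stand-in for remote services: each call is committed
# immediately in that service's own database, outside Service A's control.
remote_commits: list[str] = []

def call_service(api: str) -> None:
    remote_commits.append(api)

def do_something(conn: sqlite3.Connection, fail_locally: bool) -> None:
    call_service("B.api1")
    call_service("B.api2")
    call_service("C.api1")
    call_service("D.api1")  # external service
    # Service A's local transaction: `with conn` commits on success
    # and rolls back if the block raises.
    with conn:
        conn.execute("INSERT INTO orders (status) VALUES ('done')")
        if fail_locally:
            raise RuntimeError("local database failure")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
try:
    do_something(conn, fail_locally=True)
except RuntimeError:
    pass

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0: rolled back
print(remote_commits)  # all four remote calls remain committed
```

The local rollback leaves Service A consistent, but nothing in the local transaction can reach the four remote commits.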
Microservice-based systems must address these problems; otherwise, there is no way to tell whether a transaction has completed successfully. The following two patterns can resolve the problem: [Reference]
- 2PC (two-phase commit)
- Saga (long-running transactions)
Two-phase commit is not recommended for many microservice-based systems because 2PC is synchronous (blocking): the protocol must lock the objects that will be changed until the transaction completes. [Reference]
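To illustrate the blocking behavior, here is a minimal single-process Python sketch, assuming each participant protects its data with a `threading.Lock`. The `Participant` class and `two_phase_commit` function are hypothetical; a real coordinator talks to participants over the network and must also handle crashes and timeouts, which this sketch omits.

```python
import threading

class Participant:
    """Hypothetical 2PC participant that guards one value with a lock."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit  # simulated vote
        self.lock = threading.Lock()
        self.value = 0
        self.pending = None

    def prepare(self, new_value: int) -> bool:
        # Phase 1: lock the resource, stage the change, and vote.
        # Any other transaction touching this resource now blocks.
        self.lock.acquire()
        self.pending = new_value
        return self.can_commit

    def commit(self) -> None:
        # Phase 2 (commit): apply the staged change and release the lock.
        self.value = self.pending
        self.pending = None
        self.lock.release()

    def abort(self) -> None:
        # Phase 2 (abort): discard the staged change and release the lock.
        self.pending = None
        self.lock.release()


def two_phase_commit(participants, new_value, between_phases=lambda: None) -> bool:
    votes = [p.prepare(new_value) for p in participants]
    between_phases()  # every participant's resource is still locked here
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


a, b = Participant("A"), Participant("B")
locked_during = []
ok = two_phase_commit([a, b], 42,
                      lambda: locked_during.extend(p.lock.locked() for p in (a, b)))
print(ok, a.value, b.value, locked_during)  # True 42 42 [True, True]

c = Participant("C", can_commit=False)
print(two_phase_commit([a, c], 7), a.value, a.lock.locked())  # False 42 False
```

Between the two phases every resource stays locked, so a slow or unreachable participant holds up all the others.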
So what are long-running transactions?
The solution is to implement a compensating transaction. The steps in a compensating transaction must undo the effects of the steps in the original operation. A compensating transaction might not be able to simply replace the current state with the state the system was in at the start of the operation because this approach could overwrite changes made by other concurrent instances of an application. Instead, it must be an intelligent process that takes into account any work done by concurrent instances. This process will usually be application-specific, driven by the nature of the work performed by the original operation. [Reference]
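A tiny Python illustration of why simply restoring the earlier state is unsafe, assuming a single account balance shared by concurrent operations (a hypothetical example, not taken from the original system):

```python
# Hypothetical account balance shared by concurrent saga instances.
balance = 500
snapshot = balance      # state at the start of the saga step

balance -= 100          # saga step: debit 100
balance += 50           # meanwhile, another instance deposits 50

# Naive "undo": restore the snapshot. This silently discards the deposit.
restored = snapshot

# Compensating transaction: apply the semantic inverse of the original step.
compensated = balance + 100  # credit the 100 back

print(restored, compensated)  # 500 550
```

Only the compensating credit keeps the concurrent deposit, which is why compensation has to be defined in terms of the business operation rather than as a state snapshot.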
OK, now we know that, in case of failure, the effects of all operations already completed in other services must be undone by compensating transactions. Saga is a pattern that helps us handle this failure scenario. There are two ways to structure a saga: choreography and orchestration. Briefly, in orchestration a centralized saga orchestrator sends commands to the saga participants (the microservices), telling them to do or compensate for something; in choreography, each microservice is responsible for its own decisions, reacting to events published by the other participants.
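A minimal sketch of the choreography style in Python, assuming a hypothetical in-memory `EventBus` in place of a real message broker. Services B and C are the hypothetical participants; no central component decides anything, and Service B compensates itself when it sees C's failure event.

```python
from collections import defaultdict

class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []

    def subscribe(self, event: str, handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.log.append(event)
        for handler in self.handlers[event]:
            handler(payload)

bus = EventBus()
state = {"b_reserved": False}

# Service B: does its work when the saga starts and compensates on C's failure.
def b_on_started(payload):
    state["b_reserved"] = True
    bus.publish("BReserved", payload)

def b_on_c_failed(payload):
    state["b_reserved"] = False  # compensating action decided by B itself
    bus.publish("BCompensated", payload)

# Service C: reacts to B's event; in this run it fails.
def c_on_b_reserved(payload):
    bus.publish("CFailed", payload)

bus.subscribe("SagaStarted", b_on_started)
bus.subscribe("BReserved", c_on_b_reserved)
bus.subscribe("CFailed", b_on_c_failed)

bus.publish("SagaStarted", {"id": 1})
print(bus.log)  # ['SagaStarted', 'BReserved', 'CFailed', 'BCompensated']
print(state)    # {'b_reserved': False}
```

In orchestration, the subscriptions above would be replaced by one component that sends each command and each compensation explicitly.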
But a saga is only an abstract pattern, and implementing it is a big challenge in itself.
Where I used to work, one of my responsibilities was to implement the infrastructure and architecture design of a microservices system for the finance and banking business. One of the main challenges in migrating from a monolith to microservices was handling distributed transactions between microservices, some of which involved more than five financial sub-operations; in the event of an error, all of those operations had to be returned to their previous state.
To address this, I created a framework with the following goals:
1- Execute transactions and, in case of failure, compensation transactions safely, ensuring that all compensation transactions run asynchronously and in reverse order.
2- Developers should not have to deal with this problem separately in each service.
The solution is mostly based on an orchestration saga, using synchronous or asynchronous API calls for the main commands and asynchronous API calls for the compensation commands.
In the next parts, I will describe my implementation.
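The goals above can be sketched as a minimal orchestration saga in Python with `asyncio`. This is not the author's framework: `Step`, `SagaOrchestrator`, and `make_step` are hypothetical names, every command is modeled as a coroutine for simplicity, and the persistence and retries a production orchestrator needs are left out. The steps mirror the doSomething() scenario, with Service A's local update failing last.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class Step:
    name: str
    action: Callable[[], Awaitable[None]]
    compensation: Callable[[], Awaitable[None]]

class SagaOrchestrator:
    """Hypothetical orchestrator: runs steps in order and, on failure,
    awaits the compensations of completed steps in reverse order."""

    def __init__(self, steps: list[Step]):
        self.steps = steps

    async def run(self) -> bool:
        completed: list[Step] = []
        for step in self.steps:
            try:
                await step.action()
            except Exception:
                # Undo only the steps that succeeded, newest first.
                for done in reversed(completed):
                    await done.compensation()
                return False
            completed.append(step)
        return True

log: list[str] = []

def make_step(name: str, fail: bool = False) -> Step:
    async def action():
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(f"do {name}")

    async def compensation():
        log.append(f"undo {name}")

    return Step(name, action, compensation)

saga = SagaOrchestrator([make_step("B.api1"), make_step("B.api2"),
                         make_step("C.api1"), make_step("D.api1"),
                         make_step("A.localUpdate", fail=True)])
ok = asyncio.run(saga.run())
print(ok)   # False
print(log)
# ['do B.api1', 'do B.api2', 'do C.api1', 'do D.api1',
#  'undo D.api1', 'undo C.api1', 'undo B.api2', 'undo B.api1']
```

Awaiting each compensation in turn keeps the reverse order while not blocking the event loop; a real implementation would typically also record progress durably so compensation can resume after a crash.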