Very unusually, I'm here to ask for advice. I'd like to share a design dilemma with you and get your feedback. The bottom line is that I'm interested to hear how you manage your strategies for data replication: when and why do you replicate (or perhaps you never replicate)? I'm positing a very simple example, not necessarily reflecting a real business case.
Thanks in advance for your thoughts :)

Let's suppose that two software systems are involved in achieving some business capability, say assessing credit risk:

* A credit risk calculation module
* A customer information module

The business requirement is that each customer's credit situation needs to be re-assessed every calendar month. So there is a job that runs each day and carries out the check on a subset of all the customers, such that the entire customer population is covered over the course of each month. Notifications of some sort will be raised for customers whose credit situation is not satisfactory.

The risk module needs customer information to perform its task, and there are multiple strategies for retrieving this information, such as:

* Invoke an API operation (retrieve customer by ID) against the customer module, on demand, each month
* Invoke a bulk API operation (retrieve customers by list of IDs) against the customer module, on demand, each month
* Maintain an exact copy of the customer information, updating the copy whenever a change is made in the master, by subscribing to Customer Create and Customer Change events
* Maintain a subset of the customer information, optimized and transformed to its needs, updating the copy whenever a change is made in the master, by subscribing to Customer Create and Customer Change events
* Other strategies?

There's a lot going on here:

* What is the cadence of data changes in the customer module?
* How much does it cost to add storage?
* What are the implications of loss of synchronization between source and consumer?
* How fault-tolerant can the risk assessment process be?
* More?

And:

* Does it make a difference if the two modules are supplied by one vendor or by two different vendors?
* Does it make a difference if there is an industry standard (e.g. TMF Open API) for the API operations and/or events?
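For concreteness, the "daily job covering the whole population once a month" part of the scenario can be sketched by hashing each customer onto a day-of-month bucket. This is just one illustrative scheduling scheme, not anything prescribed by the scenario:

```python
import calendar
import hashlib
from datetime import date

def due_today(customer_id: str, today: date) -> bool:
    """Return True if this customer's monthly credit check falls on 'today'.

    Each customer is deterministically hashed onto one day of the current
    month, so the full population is covered exactly once per calendar month.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    bucket = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16) % days_in_month
    return bucket == today.day - 1
```

Whatever retrieval strategy is chosen then only needs to serve the customers due on a given day, which is what makes both the on-demand and the replicated variants viable.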
I tend to favour the option of maintaining a subset of the customer information by subscribing to change events. This is the optimal solution for minimizing load while maximizing performance. If the API producer supports it, you can include a query portion in your subscription, which minimizes the data transferred to just what you need.
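A minimal sketch of such a filtered subscription, in the style of a TMF-like hub registration. The endpoint path, event type names, and query syntax here are assumptions for illustration; check your producer's API for the exact form it supports:

```python
import json

def build_subscription(callback_url: str, event_types: list[str], fields: str) -> str:
    """Build a listener registration body whose query portion filters events
    server-side, so only the needed event types and attributes are delivered."""
    query = "&".join(f"eventType={t}" for t in event_types)
    body = {
        "callback": callback_url,
        # The query limits both which events fire and which fields arrive.
        "query": f"{query}&fields={fields}",
    }
    return json.dumps(body)

# Hypothetical listener URL and field list, named for illustration only
payload = build_subscription(
    "https://risk.example.com/listener",
    ["CustomerCreateEvent", "CustomerAttributeValueChangeEvent"],
    "id,creditProfile,contactMedium",
)
```

The point is that the subscriber, not the producer, decides how narrow the replicated subset is.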
The downside of this approach is the potential for loss of synchronization. My preference for solving that is to recognize a potential loss and resynchronize through a query on the collection for all items which have been updated since the time synchronization was lost (i.e. ?lastUpdate.gt=2023-08-07T04:20:00Z). Unfortunately not all entities have such an attribute, but we polymorphically add one where required in our own implementations. IMHO all API collections should include this (see AP-2223).
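Building that catch-up query could look like the sketch below. The base URL and field names are placeholders; only the `lastUpdate.gt` filter pattern comes from the comment above:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

def resync_url(base: str, last_good_sync: datetime) -> str:
    """Build the catch-up query: fetch every customer updated since the
    moment we believe synchronisation was lost."""
    params = urlencode({
        "lastUpdate.gt": last_good_sync.astimezone(timezone.utc)
                                       .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "fields": "id,creditProfile",  # only the replicated subset
    })
    return f"{base}/customer?{params}"

url = resync_url(
    "https://api.example.com/customerManagement/v4",
    datetime(2023, 8, 7, 4, 20, tzinfo=timezone.utc),
)
```

Replaying this one query is much cheaper than a full re-copy, which is exactly why a `lastUpdate` attribute on every collection is worth having.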
When I took the ODA course and masterclass from TM Forum, there were some examples showing the power of ODA. Huawei was one of the examples given. They abandoned data replication and "implemented" ODA, implementing the TMF APIs so that data would exist only in a single point of truth. This approach saved them millions of dollars in data storage alone.
For our pre-sales processes, we do a dunning check for existing customers, and we also chose the API (on-demand) approach because it didn't make sense to duplicate the data in each system that needs dunning status.
My 2 cents.

PS: using data replication also causes more concerns for data privacy and GDPR requirements.
I tend to agree with @Vance Shipley that subscribing to the change events is the better approach. It not only offers the best performance, it also makes it irrelevant whether both modules are supplied by one vendor or by multiple vendors.
Let's assume 10M customers with 1M changes each month.
Tight coupling means that the customer information module must be available and responsive during the credit risk assessment.
Using notifications without an event bus (Kafka, MQ, ...) reverses that dependency: the risk assessment module must be available to allow changes to the customer information, or you risk loss of synchronisation. When an event bus is used, loss of synchronisation is only temporary. An event-driven architecture is eventually consistent.
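The consuming side of that event-driven setup can be sketched in a few lines: the risk module applies Create/Change events from the bus to a local, transformed subset of the customer data. Event and field names below are illustrative, not taken from any specific TMF payload:

```python
# Local subset copy kept by the risk module: customerId -> needed fields only
local_copy: dict[str, dict] = {}

SUBSET_FIELDS = {"id", "creditScore", "segment"}  # assumed subset of interest

def apply_event(event: dict) -> None:
    """Apply one Customer Create/Change event to the local subset copy,
    discarding every attribute the risk module does not need."""
    customer = event["event"]["customer"]
    subset = {k: v for k, v in customer.items() if k in SUBSET_FIELDS}
    if event["eventType"] == "CustomerCreateEvent":
        local_copy[customer["id"]] = subset
    elif event["eventType"] == "CustomerAttributeValueChangeEvent":
        # Partial update: merge only the changed attributes we care about
        local_copy.setdefault(customer["id"], {}).update(subset)

apply_event({"eventType": "CustomerCreateEvent",
             "event": {"customer": {"id": "42", "creditScore": 710,
                                    "segment": "retail", "name": "ignored"}}})
apply_event({"eventType": "CustomerAttributeValueChangeEvent",
             "event": {"customer": {"id": "42", "creditScore": 655}}})
```

As long as the bus retains events, a consumer that was down simply replays its backlog, which is what makes the inconsistency temporary rather than permanent.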
When synchronising with legacy applications that don't offer support for notifications, it is sometimes even better to use CDC (Change Data Capture) techniques to achieve loose coupling. Instead of business events as in the TMF Open APIs, CDC uses low-level DB information (the transaction log or triggers) to generate events. In this case the volume of events will be higher, but it still achieves the high-performance and loose-coupling aspects of the EDA.
I'll join Matthieu's team: keep it simple. If you don't need the information in real time and there is no impediment to implementing a periodic batch process, why go to the trouble of building complex synchronization logic?
In the given example, I would take the path of using the API to get customer data, since the dunning process is not the master of customer data, and creating another snapshot of the customer would increase the complexity of the system. Even though event-driven architecture has advanced to the point of handling CDC well, event sourcing doesn't fit the given example: it would further increase operating cost and, most of the time, break the rule of simplicity.
But in other scenarios (network profile, billing account, etc.), where the semantics of the customer data are specific to the domain, the better approach is to "maintain a subset of the customer information, optimized and transformed to its needs, and update the copy whenever a change is made in the master, by subscribing to Customer Create and Customer Change events". All BPM processes and use cases impacting customer data need to cover all domains via API calls, and here the TMF specifications help remove vendor-specific coupling between the two systems. Since the customer data semantics differ for each system, the overhead of data storage should be minimized, along with compliance obligations such as PII and GDPR.