I'll join Matthieu's team. Keep it simple. If you don't need the information in real time and there is no impediment to implementing a periodic batch process, why go to the trouble of implementing complex synchronization logic?
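For illustration, here is a minimal sketch of that periodic batch approach; the endpoint, field names, and threshold are assumptions for the sake of the example, not a real customer-module API:

```python
import requests  # the bulk endpoint below is purely illustrative

CUSTOMER_API = "https://customer-module.example.com/customers"  # assumed URL

def run_daily_batch(customer_ids: list[str]) -> None:
    # Pull today's slice of customers in one bulk call, then run the
    # monthly credit check on each; no change events, no local copy.
    resp = requests.get(CUSTOMER_API,
                        params={"id": ",".join(customer_ids)},
                        timeout=30)
    resp.raise_for_status()
    for customer in resp.json():
        if customer.get("creditScore", 0) < 600:  # placeholder rule
            print(f"ALERT: review credit situation for {customer['id']}")
```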
Original Message:
Sent: Aug 07, 2023 11:33
From: Koen Peeters
Subject: Strategies for Data Replication
Jonathan,
I tend to agree with @Vance Shipley that subscribing to the change events is the better approach. Not only does it offer the best performance, it also makes it irrelevant whether the modules are supplied by one vendor or by multiple vendors.
Let's assume 10M customers with 1M changes each month.
- On-demand API calls result in 10M API calls per month and tight coupling to the customer information module. The query volume could adversely impact the performance of the customer information module.
- Bulk API calls compress those 10M retrievals into a short period of time. They still entail tight coupling to the customer information module and might make it unresponsive during the bulk operation.
- Using Notifications (Create, Change, Delete events) to maintain a copy results in 1M events per month and achieves loose coupling with the customer information module. Maintaining an optimised subset will potentially reduce storage requirements and provide higher performance for the credit risk calculation module. Generating Notifications is low cost compared with query operations, so the impact on the performance of the customer information module is also negligible.
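For concreteness, a minimal sketch of the notification strategy; the event shape and field names are assumptions loosely modelled on TMF-style notifications, not an actual schema:

```python
# Apply Create/Change/Delete events to a local, optimised subset of
# customer data, keyed by customer id.
local_copy: dict[str, dict] = {}

RISK_FIELDS = ("id", "creditScore", "outstandingBalance")  # subset we keep

def on_customer_event(event: dict) -> None:
    event_type = event["eventType"]        # e.g. "CustomerCreateEvent"
    customer = event["event"]["customer"]  # payload, per the assumed schema
    cid = customer["id"]
    if event_type == "CustomerDeleteEvent":
        local_copy.pop(cid, None)
    else:  # create or change: upsert only the fields risk calculation needs
        local_copy[cid] = {k: customer.get(k) for k in RISK_FIELDS}

# 1M such events per month replace 10M monthly queries.
on_customer_event({
    "eventType": "CustomerCreateEvent",
    "event": {"customer": {"id": "C001", "creditScore": 720,
                           "outstandingBalance": 1500.0}},
})
print(local_copy)
```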
Tight coupling means that the customer information module must be available and responsive during the credit risk assessment.
Using Notifications without an event bus (Kafka, MQ, ...) reverses that dependency: the risk assessment module must be available to allow changes to the customer information, or you risk loss of synchronisation. When an event bus is used, loss of synchronisation is only temporary. An event-driven architecture is eventually consistent.
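A rough sketch of both sides with Kafka (using kafka-python; the broker address, topic name, and payload are assumptions):

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # kafka-python

# Producer side (customer information module): publishing a change event
# is cheap and does not require the risk module to be up at that moment.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("customer-events", {  # assumed topic name
    "eventType": "CustomerChangeEvent",
    "event": {"customer": {"id": "C001", "creditScore": 735}},
})
producer.flush()

# Consumer side (risk module): committed offsets let it resume where it
# left off after downtime, so loss of synchronisation is only temporary.
consumer = KafkaConsumer(
    "customer-events",
    bootstrap_servers="localhost:9092",
    group_id="risk-module",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print("applying", message.value["eventType"])
```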
When synchronising legacy applications that don't offer support for Notifications, it is sometimes even better to use CDC (Change Data Capture) techniques to achieve loose coupling. Instead of business events as in the TMF Open API, CDC uses low-level DB information (the transaction log or triggers) to generate events. In this case the volume of events will be higher, but it still achieves the high-performance and loose-coupling properties of an EDA.
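To make the CDC idea concrete, a toy trigger-based sketch using SQLite; the table names and polling loop are illustrative only (real deployments would typically read the transaction log with a dedicated CDC tool such as Debezium):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY, credit_score INTEGER);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, customer_id TEXT, credit_score INTEGER
);
-- Trigger writes each row change into change_log; INSERT and DELETE
-- triggers would be analogous.
CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO change_log (op, customer_id, credit_score)
    VALUES ('UPDATE', NEW.id, NEW.credit_score);
END;
""")

conn.execute("INSERT INTO customer VALUES ('C001', 700)")
conn.execute("UPDATE customer SET credit_score = 680 WHERE id = 'C001'")

# Poller: read changes past the last processed sequence number and emit
# events; note these are row-level events, not business-level ones.
last_seq = 0
for seq, op, cid, score in conn.execute(
        "SELECT seq, op, customer_id, credit_score FROM change_log "
        "WHERE seq > ?", (last_seq,)):
    print(f"CDC event {seq}: {op} customer={cid} credit_score={score}")
    last_seq = seq
```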
Regards
------------------------------
Koen Peeters
OryxGateway FZ LLC
Original Message:
Sent: Aug 06, 2023 10:07
From: Jonathan Goldberg
Subject: Strategies for Data Replication
Very unusually, I'm here to ask for advice. I'd like to share a design dilemma with you and get your feedback. The bottom line is that I'm interested to hear how you manage your data replication strategies: when and why do you replicate (or perhaps you never replicate)? I'm positing a very simple example, not necessarily reflecting a real business case.
Thanks in advance for your thoughts :)
Let's suppose that two software systems are involved in achieving some business capability, say assessing credit risk:
* A credit risk calculation module
* A customer information module
The business requirement is that each customer's credit situation needs to be re-assessed every calendar month. So there is a job that runs each day and carries out the check on a subset of all the customers, such that the entire customer population is covered over the course of each month. Notifications of some sort will be raised for customers whose credit situation is not satisfactory.
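One simple way to slice the population so each customer is checked once a month (the hashing scheme here is just an illustration):

```python
from datetime import date
from zlib import crc32  # stable across runs, unlike Python's str hash

def due_today(customer_id: str, run_date: date) -> bool:
    # Assign each customer to one of 28 daily slices via a stable hash,
    # so a daily run on days 1-28 covers the whole population each month.
    if run_date.day > 28:
        return False  # days 29-31: idle, or use for retries
    return crc32(customer_id.encode()) % 28 == run_date.day - 1

sample_ids = [f"C{i:05d}" for i in range(10)]
print([cid for cid in sample_ids if due_today(cid, date(2023, 8, 7))])
```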
The risk module needs customer information to perform its task, and there are multiple strategies for retrieving this information, such as:
* Invoke a "retrieve customer by ID" API operation against the customer module, on demand, each month (see the sketch after this list)
* Invoke a bulk "retrieve customers by list of IDs" API operation against the customer module, on demand, each month
* Maintain an exact copy of the customer information, and update the copy whenever a change is made in the master, by subscribing to Customer Create and Customer Change events.
* Maintain a subset of the customer information, optimized and transformed to its needs, and update the copy whenever a change is made in the master, by subscribing to Customer Create and Customer Change events.
* Other strategies?
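For reference, the first option could look roughly like this; the base URL and path are assumptions modelled on TMF-style REST APIs, not a specific spec:

```python
import requests

BASE_URL = "https://customer-module.example.com/customerManagement/v4"  # assumed

def retrieve_customer(customer_id: str) -> dict:
    # One synchronous call per customer per month: simple to build, but the
    # risk run depends on the customer module being up and responsive.
    resp = requests.get(f"{BASE_URL}/customer/{customer_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()
```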
There's a lot going on here:
* What is the cadence of data changes in the customer module?
* How much does it cost to add storage?
* What are the implications of loss of synchronization between source and user?
* How fault-tolerant can the risk assessment process be?
* More?
And:
* Does it make a difference if the two modules are supplied by one vendor or by two different vendors?
* Does it make a difference if there is an industry standard (e.g. TMF Open API) for the API operations and/or events?
------------------------------
Jonathan Goldberg
Amdocs Management Limited
Any opinions and statements made by me on this forum are purely personal, and do not necessarily reflect the position of the TM Forum or my employer.
------------------------------