Thanks all for feedback and input so far.
What I am hearing is that, at the end of the day, it needs to be a business decision. As I pointed out in my question, and as @Peter Bruun's helpful replies confirmed, there is a cost involved in knowing whether the master data is accurate at a given moment in time: more moving parts, latency, compute power, etc. I think a business function in the telco (revenue assurance?) would have to take a look, with the aid of architects (telco and vendor), to see if there are any critical paths where accuracy has to be 100%.
I am actually trying to find out what is going on in the banking industry (non-scientifically, by asking a former colleague who is now there), which is presumably even more demanding than telco when it comes to financial accuracy. Watch this space.
Any opinions and statements made by me on this forum are purely personal, and do not necessarily reflect the position of the TM Forum or my employer.
Original Message:
Sent: Jul 26, 2024 11:12
From: Peter Bruun
Subject: Data Consistency
You are right, that you cannot prevent a race condition.
What I think is being overlooked in the responses are these parts of Jonathan's example:
... don't challenge realism of the specific example - it's the principle that's important.
... manual fallout awaiting correction and resubmission
This means that there may be a period of hours/days where the response is "dubious".
This cannot be prevented, and until the fallout is resolved the Party Management, as the master of that information, could return the previous information about the Party.
Operators do, however, have varying policies and, as you say, the policies would depend on the cost to the business. So, setting aside the specifics of Party Management (which may not be where the most costly inconsistencies arise), in the general case there can be policies that:
- Proceed with the previous information - that is probably the default behavior of many implementations
- Issue a warning to the caller that something is about to change - personnel or north-bound systems may then react to such warnings or relay them to customers
- Fail the transaction with an error message indicating the problem, instructing the caller to await resolution
In practice, I have seen all three policies applied for various concrete use-cases.
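The three policies above could be sketched on the consumer side roughly as follows. This is only an illustration in Python: the `Policy` enum, the `PartyView` type, the `resolve_segment` function, and the `has_pending_change` flag are all hypothetical names, not part of any TMF Open API.

```python
from dataclasses import dataclass
from enum import Enum
import warnings


class Policy(Enum):
    PROCEED = "proceed"  # use the previous information
    WARN = "warn"        # use it, but signal that a change is pending
    FAIL = "fail"        # reject until the fallout is resolved


class PendingChangeError(Exception):
    """Raised when the FAIL policy meets an entity with an unresolved change."""


@dataclass
class PartyView:
    # Hypothetical, simplified view of what a GET on the party might return.
    market_segment: str
    has_pending_change: bool  # assumed flag; not a standard attribute


def resolve_segment(party: PartyView, policy: Policy) -> str:
    """Return the market segment to use, applying the operator's policy."""
    if not party.has_pending_change or policy is Policy.PROCEED:
        return party.market_segment
    if policy is Policy.WARN:
        warnings.warn("market segment has a pending change; using previous value")
        return party.market_segment
    raise PendingChangeError("pending change unresolved; retry after resolution")
```

Which policy applies to which use case would be configuration owned by the business, in line with the cost argument made earlier in the thread.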
The problem is such policies can only be implemented if the response from GET Something contains sufficient information about the transient state of the entity being returned.
So for the API, this means that there needs to be some attribute(s) in the response that indicate the situation, and since this is not specific to Party, it should be a standard attribute, present in most GET responses.
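As a rough illustration of what such an attribute might look like on the provider side, here is a minimal Python sketch. The attribute name `transientState`, its fields, and the shape of the pending-event records are assumptions made up for this example; no TMF Open API defines them.

```python
import json


def build_party_response(party: dict, pending_events: list[dict]) -> str:
    """Serialize a party and attach a hypothetical transient-state block."""
    body = dict(party)  # shallow copy so the caller's dict is not modified
    # "transientState" is an assumed attribute name, not defined by any TMF API.
    body["transientState"] = {
        "pendingChange": bool(pending_events),
        "affectedAttributes": sorted({e["attribute"] for e in pending_events}),
    }
    return json.dumps(body)


# Example: one event for marketSegment is stuck in manual fallout.
response = build_party_response(
    {"id": "42", "marketSegment": "residential"},
    [{"attribute": "marketSegment", "status": "manualFallout"}],
)
```

The provider would of course need to know about the pending events in the first place, which is exactly the cost discussed elsewhere in this thread.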
------------------------------
Peter Bruun
Hewlett Packard Enterprise
Original Message:
Sent: Jul 26, 2024 09:15
From: Martin Vossler
Subject: Data Consistency
I think the answer is "a bit of both". While there are a number of strategies to prevent the most egregious occurrences of this issue, the fact that even these activities (locking, etc.) take some amount of time means that you can never truly prevent a race condition. This is where I believe the paranoia comes into play, chasing the ever more extreme use case.
Even in your specific example, if the market segment was available directly in the bill calculation data model, an update that came in a millisecond late would get shut out by a lock, or not mark the record dirty fast enough to prevent the GET from accessing technically incorrect data. As most other replies noted, mitigation for the more obvious deficiencies could be warranted, but at some point, some inconsistency is inevitable.
A perhaps "less paranoid" way to look at your example might be to assess the cost to the business of an incorrect tax calculation multiplied by the expected occurrence rate, and consider whether that amount is negligible or worth pursuing.
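The cost assessment above is simple arithmetic, sketched here in Python. The function names are hypothetical, the figures are invented purely for illustration, and comparing the result against a mitigation cost is my own assumption about how "worth pursuing" might be decided.

```python
def expected_loss(cost_per_error: float, expected_occurrences: float) -> float:
    """Cost of one incorrect calculation times how often it is expected to occur."""
    return cost_per_error * expected_occurrences


def worth_pursuing(loss: float, mitigation_cost: float) -> bool:
    """Assumed decision rule: mitigate only if the expected loss exceeds its cost."""
    return loss > mitigation_cost


# Illustrative numbers only: 5.0 per error, 200 expected errors per year.
yearly_loss = expected_loss(5.0, 200)
decision = worth_pursuing(yearly_loss, mitigation_cost=50_000.0)
```

With these made-up figures the expected yearly loss is 1000.0, far below the assumed mitigation cost, so the inconsistency would be treated as negligible.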
------------------------------
Martin Vossler
Wideopenwest, Inc.
Original Message:
Sent: Jul 25, 2024 08:58
From: Jonathan Goldberg
Subject: Data Consistency
Once again, I've crossed the benches to ask a question, rather than to answer them.
Note: Pls bear with me and don't challenge realism of the specific example - it's the principle that's important.
Consider a situation where a software component (e.g. ODA component TMFC031 - Bill Calculation) needs information from another component (e.g. ODA component TMFC028 - Party Management). For example, the customer's market segment can affect a calculated charge. When the charge (applied customer billing rate) needs to be calculated, TMFC031 does a GET PartyRole or GET Customer to retrieve the value of the market segment property (e.g. residential), and applies the business rule to calculate the charge amount based on that value.
BUT
It turns out that there was a pending event coming from somewhere else that should have caused the market segment value of this customer to change from residential to SOHO, but the event had an error condition and was in manual fallout awaiting correction and resubmission. So the value at the time of charge calculation was actually incorrect.
SO
Does it make sense that a consumer of information (by GET) should be able to receive an indication from the provider (somehow, perhaps in the response header) that there is a known or suspected issue with the current data? This of course would require the provider to be aware of errors, DLQs, and manual fallouts that relate to the requested information, with all the concomitant impact (complexity, latency, BoM). The consumer would then be able to decide whether to continue with processing or perhaps wait until the problem was fixed (polling, or listening to update events on the data item).
OR
This is paranoia at its worst - the assigned master of information should be assumed to be correct with respect to data retrievals.
Looking forward to your insights
------------------------------
Jonathan Goldberg
Amdocs Management Limited
------------------------------