Open APIs


Data Consistency

  • 1.  Data Consistency

    TM Forum Member
    Posted Jul 25, 2024 08:58

    Once again, I've crossed the benches to ask a question rather than to answer one.

    Note: Please bear with me and don't challenge the realism of the specific example - it's the principle that's important.

    Consider a situation where a software component (e.g. ODA component TMFC031 - Bill Calculation) needs information from another component (e.g. ODA component TMFC028 - Party Management). For example, the customer's market segment can affect a calculated charge. When the charge (applied customer billing rate) needs to be calculated, TMFC031 does a GET PartyRole or GET Customer to retrieve the value of the market segment property (e.g. residential) and applies the business rule to calculate the charge amount based on that value.

    BUT

    It turns out that there was a pending event coming from somewhere else that should have caused the market segment value of this customer to change from residential to SOHO, but the event had an error condition and was in manual fallout awaiting correction and resubmission. So the value at the time of charge calculation was actually incorrect.

    SO

    Does it make sense that a consumer of information (by GET) should be able to receive from the provider (somehow, perhaps in the response header) an indication that there is a known or suspected issue with the current data? This of course would require the provider to be aware of errors, DLQs, and manual fallouts that relate to the requested information, with all the concomitant impact (complexity, latency, BoM). The consumer would then be able to decide whether to continue with processing or perhaps wait until the problem was fixed (polling, or listening to update events on the data item).
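
    To make this concrete, here is a minimal consumer-side sketch of what I have in mind. The header name (X-Data-Quality-Warning), the endpoint and the marketSegment field are purely illustrative assumptions on my part, not anything defined in the TMF specifications:

    import requests

    # Hypothetical sketch: Bill Calculation retrieves the customer and checks an
    # assumed data-quality header before trusting the market segment value.
    # Header name, endpoint and field names are illustrative, not TMF-defined.
    BASE = "https://party-mgmt.example.com/tmf-api/partyManagement/v4"

    resp = requests.get(f"{BASE}/individual/42", headers={"Accept": "application/json"})
    resp.raise_for_status()

    warning = resp.headers.get("X-Data-Quality-Warning")  # hypothetical header
    if warning:
        # The provider signals a known or suspected issue (e.g. pending fallout).
        # The consumer decides: proceed anyway, wait and poll, or listen for
        # update events on this party and recalculate later.
        print("Data quality warning from provider:", warning)
    else:
        party = resp.json()
        market_segment = party.get("marketSegment", "residential")  # illustrative field
        # ...apply the business rule to calculate the charge from market_segment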

    OR

    This is paranoia at its worst - the assigned master of information should be assumed to be correct with respect to data retrievals.

    Looking forward to your insights



    ------------------------------
    Jonathan Goldberg
    Amdocs Management Limited
    Any opinions and statements made by me on this forum are purely personal, and do not necessarily reflect the position of the TM Forum or my employer.
    ------------------------------


  • 2.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 02:13
    Edited by Peter Bruun Jul 26, 2024 02:23

    Hi Jonathan,

    You are definitely not paranoid :)

    Your example is just one of many cases where one system is requesting information from another. Also, you mention a fallout handling case, but that is just a special case of information impacted by an on-going transaction:

    1. Transaction is actively on-going. Some transactions may take minutes or hours to complete.
    2. Error/fallout situation being handled (manually/automated)
    3. Transaction is awaiting some external event before completing:
      • Manual confirmation, signature, approval or processing (manual business process)
      • External systems event, such as a partner systems response

    We have seen operators with different policies around this:

    1. Just use the current information in effect
    2. For live, ongoing transactions, wait for the transaction to complete
    3. Return a warning that there is an on-going transaction for the information, letting the API user (system/person) decide
    4. Fail with an error attribute set or an exception result (HTTP 5xx)
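
    For illustration only, here is a rough provider-side sketch of how such a policy choice might be applied. The policy knob, the pendingTransaction attribute and the data-access stub are assumptions for the sketch, not existing TMF API definitions; policy 2 (waiting for the transaction to complete) is omitted, since blocking the request is rarely practical for long-running transactions:

    from flask import Flask, jsonify

    app = Flask(__name__)
    POLICY = "warn"  # operator configuration: "use-current", "warn" or "fail"

    def load_resource(rid):
        # Stand-in for real data access: returns the resource plus any ongoing
        # transaction touching it (None if there is none).
        return {"id": rid, "status": "active"}, {"type": "fallout", "state": "awaiting-resubmission"}

    @app.route("/tmf-api/someResource/<rid>", methods=["GET"])
    def get_resource(rid):
        resource, ongoing_tx = load_resource(rid)
        if ongoing_tx is None or POLICY == "use-current":
            return jsonify(resource)                                  # policy 1: current info
        if POLICY == "warn":
            resource["pendingTransaction"] = ongoing_tx               # policy 3: warn the caller
            return jsonify(resource)
        return jsonify({"error": "resource is in transition"}), 503   # policy 4: fail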

    Furthermore, you mention the GET case, but there is also the LIST case, where each result in the response should have an attribute indicating whether it pertains to an on-going transaction and, if so, which type of transaction it is. Filtering may be applied to such an attribute.

    This LIST case is interesting because the information about on-going transactions may not simply come from a database/inventory but may need to be joined across systems. There may be thousands or millions of potential rows and so returning such an attribute, or worse filtering by it, could incur a significant performance penalty.
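
    As an illustration of what I mean, a sketch of the LIST case from the caller's side, assuming a hypothetical pendingTransactionType attribute that can also be used as a filter (the attribute name, filter syntax and endpoint are not current TMF API definitions):

    import requests

    # Hypothetical LIST call: return only rows that are not caught up in an
    # ongoing transaction. Computing this attribute may require joining
    # information across systems, which is where the performance cost arises.
    resp = requests.get(
        "https://party-mgmt.example.com/tmf-api/partyManagement/v4/individual",
        params={
            "fields": "id,name,pendingTransactionType",
            "pendingTransactionType": "none",  # filter out rows in transition
            "limit": 100,
        },
    )
    resp.raise_for_status()
    for row in resp.json():
        print(row["id"], row.get("pendingTransactionType"))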

    So your question is both relevant and non-trivial, and the above is even a simplification - there are several additional complexities.



    ------------------------------
    Peter Bruun
    Hewlett Packard Enterprise
    ------------------------------



  • 3.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 03:23

    Hi Jonathan,

    Honestly, only one word comes to my mind: KISS.

    In any case, just because a GET operation succeeds and carries no whiff of change about the information you seek doesn't mean that information won't change milliseconds after you retrieve it.



    ------------------------------
    Frederic Thise
    Proximus SA
    ------------------------------



  • 4.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 06:53
    Edited by Peter Bruun Jul 26, 2024 06:54

    While I agree that keeping things simple is preferable, experience shows that many large Telcos are not satisfied with that and will not accept solutions that do not consider the questions Jonathan is asking.

    The problems are real. You won't see them while the volume of transactions is low, but if the solution is used at scale - sometimes with millions of transactions daily - then not addressing those cases becomes costly for the operators. This is not theory, this is experience.

    Within a single, monolithic application developed and delivered by one vendor, this is all down to the vendor testing and guaranteeing consistent operation of the application.

    The point of the standardized TMF APIs is, however, to ensure interoperation between applications (as in TAM) delivered by different vendors.

    That architecturally leads to federation of information, and taking the "ostrich approach" to all the potential inconsistencies and race conditions that such federation entails is not satisfactory.

    My experience is that attempts to solve concurrency and consistency issues "later" will inevitably fail. If these aspects are not built into the fundamental design, they cannot be "glued on" later, when the problems begin to appear.

    Since the consistency issues are a direct consequence of the intent of the TMF standards, I believe that protocols for solving those issues have to be within scope of the TMF APIs.

    Sometimes complex problems just do not have simple solutions, I'm afraid.



    ------------------------------
    Peter Bruun
    Hewlett Packard Enterprise
    ------------------------------



  • 5.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 06:22
    Edited by Matthieu Hattab Jul 26, 2024 10:34

    A doctor would say: focus time and energy on the root cause, not on the symptoms. It seems better value for money to spend the time and money on improving the architecture, APIs, etc. so that these errors happen less often, rather than on improving the processes that handle the errors.



    ------------------------------
    Kind regards,

    Matthieu Hattab
    Lyse Platform
    ------------------------------



  • 6.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 07:17

    In this case the "root cause" is the separation of Party information from Bill Calculation into two separate applications with a standard TMF API between them.

    So the "root cause" is the fundamental non-monolithic application design, which is the whole point of the TMF API standardization. So nobody here is going to fix that root cause.

    The argument is a statistical one of accepting that these errors should just happen "less often". Assume a large Telco has 1 million such requests for Party information daily, and that 0.01% of those fail - that leads to 100 fallouts each day. Manual handling of this fallout can be costly - remediation labor, delays, lost orders and churn from dissatisfied customers.

    Of course, in some cases the effect of getting inconsistent information is negligible - it could be a matter of retrying a few minutes later. In other cases, the impact can be considerable. At our level - the TMF standards - there is no way of predicting which API interaction failures could be costly and which ones it would be overkill to handle. So we need to standardize the mechanisms (protocol, attributes, etc) that would enable the operator to ensure consistency at those points where failing to do so is costly.

    Enabling such mechanisms might be relatively simple if they are already available in the standards and the implementing products.

    But inventing and standardizing the mechanisms later, once a real problem is discovered, would take years. As I mentioned above - gluing solutions onto existing protocols and products almost always fails.

    So here, I think Jonathan is quite justified in bringing up the question.



    ------------------------------
    Peter Bruun
    Hewlett Packard Enterprise
    ------------------------------



  • 7.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 10:09

    How much of this scenario just boils down to state management? If I apply the HATEOAS (Hypermedia As The Engine of Application State) concept, I would say the response to the GET on the resource should convey the state via hypermedia controls (a link object). So if there is a pending or outstanding operation to be performed, the link object would reflect that.

    If all is well, the response to a GET on the resource:

    {
        <Resource properties>,
        _links: {
            self: {
                href: <link to same resource>
            },
            some-completed-rel-name: {
                href: <some link>
            }
        }
    }

    If there is a pending action, the response to a GET on the resource:

    {
        <Resource properties>,
        _links: {
            self: {
                href: <link to same resource>
            },
            some-pending-rel-name: {
                href: <link to same resource>
            }
        }
    }






  • 8.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 10:14

    I think the answer is "a bit of both". While there are a number of strategies to prevent the most egregious occurrences of this issue, the fact that even these activities (locking, etc.) take some amount of time means that you can never truly prevent a race condition. This is where I believe the paranoia comes into play: chasing the ever more extreme use case.

    Even in your specific example, if the market segment were available directly in the bill calculation data model, an update that arrived a millisecond late would either be shut out by a lock or not mark the record dirty fast enough to prevent the GET from accessing technically incorrect data. As noted in most other replies, mitigation for the more obvious deficiencies could be warranted, but at some point a race is inevitable.

    A perhaps "less paranoid" way to look at your example might be assessing what the cost to the business is of a incorrect tax calculation x the expected occurrence rate, and consider if that amount is negligible or worth pursuing.



    ------------------------------
    Martin Vossler
    Wideopenwest, Inc.
    ------------------------------



  • 9.  RE: Data Consistency

    TM Forum Member
    Posted Jul 26, 2024 11:12

    You are right that you cannot prevent a race condition.

    What I think is being overlooked in the responses are these parts of Jonathan's example:

    ... don't challenge realism of the specific example - it's the principle that's important.

    ... manual fallout awaiting correction and resubmission

    This means that there may be a period of hours/days where the response is "dubious".

    This cannot be prevented, and until the fallout is resolved the Party Management component, as the master of that information, could return the previous information about the Party.

    Operators do, however, have varying policies and, as you say, the policies would depend on the cost to the business. So if we set aside the specifics of Party Management (which may not be where the most costly potential inconsistencies lie), the general case allows for policies that:

    • Proceed with the previous information - that is probably the default behavior of many implementations
    • Issue a warning to the caller that something is about to change - personnel or north-bound systems may then react to such warnings or relay them to customers
    • Fail the transaction with an error message indicating the problem, instructing the caller to await resolution

    In practice, I have seen all three policies applied for various concrete use-cases.

    The problem is that such policies can only be implemented if the response from GET Something contains sufficient information about the transient state of the entity being returned.

    So for the API, this means that there needs to be some attribute(s) in the response that indicate the situation, and since this is not specific to Party, it should be a standard attribute, present in most GET responses.
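
    For illustration, a consumer-side sketch of how the three policies could hang off such an attribute. The attribute name (transientState) and the policy selection are assumptions for the sketch, not existing standard fields:

    import requests

    POLICY = "warn"  # chosen per use-case by the operator: "proceed", "warn" or "fail"

    resp = requests.get(
        "https://party-mgmt.example.com/tmf-api/partyManagement/v4/individual/42"
    )
    resp.raise_for_status()
    party = resp.json()

    transient = party.get("transientState")  # hypothetical standard attribute
    if transient is None or POLICY == "proceed":
        pass  # policy 1: proceed with the current (previous) information
    elif POLICY == "warn":
        print("Warning: entity is in transition:", transient)  # policy 2: warn, then continue
    else:
        # policy 3: fail, instructing the caller to await resolution
        raise RuntimeError(f"Entity in transition ({transient}); retry after resolution")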



    ------------------------------
    Peter Bruun
    Hewlett Packard Enterprise
    ------------------------------



  • 10.  RE: Data Consistency

    TM Forum Member
    Posted Jul 28, 2024 10:23

    Thanks all for feedback and input so far.

    What I am hearing is that at the end of the day it needs to be a business decision. As I pointed out in my question, and as confirmed by @Peter Bruun's helpful replies, there is a cost involved in understanding whether the master data is accurate at a given moment in time: more moving parts, latency, compute power, etc. I think a business function in the telco (revenue assurance?) would have to take a look, with the aid of architects (telco and vendor), to see if there are any critical paths where accuracy has to be 100%.

    I am actually trying to find out what is going on in the banking industry (non-scientifically, by asking a former colleague who is now there), which presumably has even more demanding requirements for financial accuracy than telco. Watch this space.



    ------------------------------
    Jonathan Goldberg
    Amdocs Management Limited
    Any opinions and statements made by me on this forum are purely personal, and do not necessarily reflect the position of the TM Forum or my employer.
    ------------------------------



  • 11.  RE: Data Consistency