Open APIs

Service Problem Management

  • 1.  Service Problem Management

    TM Forum Member
    Posted May 21, 2021 13:23

    Hello All,

    I was looking at the Service Problem Management API while automating service problem management in the fault management domain of NaaS.

    As per this process, the operational domain performs root cause analysis and correlation to pre-empt service problems and to identify the actual fault, its location, the related alarms, the affected resources, etc. After this, should the related ticket raised to the TT system be a problem ticket or an incident ticket?

    Also, in cases where a service degradation is predicted, will that qualify for a problem ticket or an incident ticket (TMF621)?

    More light on the use cases and corresponding APIs in these scenarios will be very helpful.

    @Johanne Mayer

    Warm Regards,
    Jigna



    ------------------------------
    Jigna Srivastava
    LIGHTSTORM TELECOM CONNECTIVITY PRIVATE LIMITED
    ------------------------------


  • 2.  RE: Service Problem Management

    TM Forum Member
    Posted 30 days ago
    Edited by Johanne Mayer 30 days ago
    Hi Jigna,

        I think that your use of "incident" may be different from what is used within ITIL and TMF.  Maybe you can explain further if it doesn't match with the definition below.

         For example, I reviewed the information posted by @Jacob Avraham as the TMF621 owner on the difference between an incident and a problem on the TMF API project:
      1. BMC reference: 
        1. incident management: http://www.bmcsoftware.in/guides/itil-incident-management.html
        2. problem management: http://www.bmcsoftware.in/guides/itil-problem-management.html
      2. Incident Vs. Problem: 
        1. http://www.conceptsolutionsbc.com/it-service-management-mainmenu-60/30-it-service-management/182-incident-and-problems-what-is-the-difference
    I personally liked the example provided by Concept Solution: an incident is a disruption of service that needs attention now, while a problem may exist without any service being affected, and you may decide that the solution is too costly to implement and leave it as is.

    Incident vs. Problem: What is the Difference?

    To illustrate this further, let's take a practical example.

    You are driving your car, and you got a flat tire. This is an incident because it disrupted the service: transportation to a destination. You fix this by either changing the tire yourself or calling roadside assistance. Once the tire has been changed, the incident is closed. But now, you have a problem; you are running on your spare tire.

    To fix the problem, you need to repair the flat tire and put it back.

    Another example would be that you are driving on an almost bald tire. This is a problem. If you continue to drive your car with that bald tire, you are bound to have an incident.

    Normally, an incident needs to be fixed within a specific timeline. Problems can be left indefinitely until an incident happens.
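
The distinction in the example above can be written down as a small rule. This is only a sketch of the quoted definition: the `Finding` class, its fields and the `ticket_kind` function are hypothetical names, not part of ITIL or any TMF API. Under this reading, a predicted degradation that has not yet affected a service lands on the problem side, like the bald tire; that is an interpretation of the example, not a TM Forum rule.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical summary of what analysis has found."""
    description: str
    service_disrupted: bool  # is a service affected right now?


def ticket_kind(finding: Finding) -> str:
    """Return 'incident' when a service is disrupted now, else 'problem'."""
    return "incident" if finding.service_disrupted else "problem"


# The two cases from the car example, plus a predicted degradation.
flat_tire = Finding("flat tire while driving", service_disrupted=True)
bald_tire = Finding("tire tread almost gone", service_disrupted=False)
predicted = Finding("predicted service degradation", service_disrupted=False)
```

A real system would also need the timeline aspect (incidents are fixed within a specific time, problems may stay open), which this sketch leaves out.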

    From a NaaS domain perspective, you can report both incidents and problems (as defined above) via the Service Problem Management API.  The TT system listens for the create, update, status change and delete events.  This typically lets other systems know of service-affecting issues so that CSRs are aware (or the information is accessible from a customer portal via APIs) and can pass this on to the customer.

    On the prediction of service degradation, there is a new white paper on performance management/assurance covering the most-used APIs (SLA, PM, Threshold, alarms, SQM, etc.) and their roles; it is waiting for approval before being published on Confluence.  Once it is released, we plan on adding best practices in the IG1224 NaaS Service Fulfillment project.
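
A minimal sketch of the listening side described above, assuming the TT system receives events as plain dictionaries. The event type strings, the `eventType` and `serviceProblem` keys and the `TroubleTicketListener` class are all hypothetical; the real notification names and payload shapes come from the API specification, not from this sketch.

```python
from typing import Callable, Dict

# Hypothetical event type names standing in for create / update /
# status change / delete. These are illustrative only.
CREATE, UPDATE, STATUS_CHANGE, DELETE = "create", "update", "statusChange", "delete"


class TroubleTicketListener:
    """Keeps a local view of service problems for CSRs or a portal."""

    def __init__(self) -> None:
        self.problems: Dict[str, dict] = {}
        self._handlers: Dict[str, Callable[[dict], None]] = {
            CREATE: self._upsert,
            UPDATE: self._upsert,
            STATUS_CHANGE: self._upsert,
            DELETE: self._delete,
        }

    def receive(self, event: dict) -> bool:
        """Dispatch one event; return False if its type is unknown."""
        handler = self._handlers.get(event.get("eventType", ""))
        if handler is None:
            return False
        handler(event["serviceProblem"])
        return True

    def _upsert(self, problem: dict) -> None:
        # Merge the incoming fields into whatever is already known.
        current = self.problems.setdefault(problem["id"], {})
        current.update(problem)

    def _delete(self, problem: dict) -> None:
        # Forget the problem; ignore deletes for unknown ids.
        self.problems.pop(problem["id"], None)
```

The local `problems` view is what a CSR screen or customer portal would read from in this sketch.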

    Hopefully, this helps with what you are trying to do.

    ------------------------------
    Johanne Mayer
    MayerConsult Inc
    ------------------------------



  • 3.  RE: Service Problem Management

    TM Forum Member
    Posted 30 days ago
    Edited by Jigna Srivastava 30 days ago
    Thank you so much Johanne for the references that you provided. 
    Based on the ITIL references (as most TT systems use ITIL) and your inputs, I conclude that the operational domain (the one doing the alarm correlation and root cause analysis) can create either incident or problem tickets in the TT system using 621 (TT API).
    We have a few RCA scenarios, such as fibre cut and optical line failure (for correlation), so I thought these could trigger a problem ticket that is linked to the incident tickets raised by the CRM systems to the TT system. Is this correct?
    Apart from this, I understand from the documentation that the operational domain (the one doing the alarm correlation and root cause analysis) should expose the Service Problem APIs as a listener (supporting the service problem lifecycle, TMF656 APIs).  The service problem handler is the OSS, and these processes are similar to the fault management processes. Is this understanding correct?
    Also, is there any service assurance guide that explains more flows in this domain in detail?

    @Johanne Mayer
    Warm Regards,
    Jigna





  • 4.  RE: Service Problem Management

    TM Forum Member
    Posted 28 days ago
    Hi Jigna,

        Here is the distinction that I see as per the TM Forum eTOM processes (Business Process Framework (eTOM) Poster R18.5):

    Service problem management deals with problems or incidents affecting a service, while fault management (which TMF refers to as Trouble Management) is a resource-level function.

    With NaaS, the domain has to manage both its resources and services and expose information at the service level. Hence, we keep an abstraction independent of supplier and, where we can, technology.  Is the CRM sending a request to the network domain to fix something? Otherwise, how would the domain know which CRM TT to associate with? Unless you have an inventory system with an impact analysis that could create a service problem containing the customers/services affected?
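
The impact-analysis idea mentioned above could look roughly like this. It is a sketch under stated assumptions: the two inventory dictionaries, the identifiers in them, the `build_service_problem` function and the field names of the returned dictionary are all hypothetical and are not taken from any TMF specification.

```python
from typing import Dict, List

# Hypothetical inventory: which services ride on which resource,
# and which customers use which service.
SERVICES_BY_RESOURCE: Dict[str, List[str]] = {
    "fibre-span-12": ["eline-001", "eline-002"],
    "olt-7": ["bb-300"],
}
CUSTOMERS_BY_SERVICE: Dict[str, List[str]] = {
    "eline-001": ["cust-A"],
    "eline-002": ["cust-B", "cust-C"],
    "bb-300": ["cust-A"],
}


def build_service_problem(resource_id: str, reason: str) -> dict:
    """Turn a resource-level root cause into a service-level problem."""
    services = SERVICES_BY_RESOURCE.get(resource_id, [])
    # Collect each affected customer once, in a stable order.
    customers = sorted(
        {c for s in services for c in CUSTOMERS_BY_SERVICE.get(s, [])}
    )
    return {
        "reason": reason,
        "rootCauseResource": resource_id,
        "affectedServices": list(services),
        "affectedCustomers": customers,
    }


problem = build_service_problem("fibre-span-12", "fibre cut")
```

With the affected customers and services carried in the service problem, a TT system could match it against tickets raised from the CRM side without the domain having to know those tickets in advance.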

    SID may have more information on assurance, but I checked with Pierre Gauthier today, and the Performance management white paper has been approved, so it should be available on the TMF website very soon. I will send you a pdf version from the TMF RAND GitHub, where it is posted (under the Exemplar folder).

    Best regards... Johanne

    ------------------------------
    Johanne Mayer
    MayerConsult Inc
    ------------------------------



  • 5.  RE: Service Problem Management

    TM Forum Member
    Posted 27 days ago
    Thank you so much Johanne, this is very helpful.

    Regards,
    Jigna








  • 6.  RE: Service Problem Management

    TM Forum Member
    Posted 25 days ago
    Edited by Johanne Mayer 12 days ago
    Hi @Jigna Srivastava, in case you didn't find the information in the SID: it is well hidden under the Common section (GB922 Common v20.0.1).

    Trouble or Problem ABE

    • A description of a problem that can be shared between the service provider and the customer. Trouble or Problem is an indication that an entity (such as Resource, Service or Product) is no longer functioning according to the expected SLA.
    • At the resource level, GB922 Resource v19.5.1 has the following Resource Trouble ABE definition:

      3.8. Resource Trouble ABE

      The Resource Trouble ABE manages problems found in allocated resource instances, regardless of whether the problem is physical or logical. Entities in this ABE detect these problems, act to determine their root cause, resolve these problems and maintain a history of the activities involved in diagnosing and solving the problem. Detecting problems can be done via software (e.g. responding to an alarm) and/or by hardware (e.g. a measurement or probe) and/or manually (e.g. visual inspection). This includes tracking, reporting, assigning people to fix the problem, testing and verification, and overall administration of repair activities.

      Regarding fault management, the best definition is from ITU-T M.3703 (Common management services - Alarm management – Protocol neutral requirements and analysis):

      The occurrence of failures in a NE may cause a deterioration of this NE's function and/or service quality and will, in severe cases, lead to the complete unavailability of the respective NE. In order to minimize the effects of such failures on the quality of service (QoS) as perceived by the network users, it is necessary to:
      • detect failures in the network as soon as they occur and alert the operating personnel as fast as possible;
      • isolate the failures (autonomously or through operator intervention), i.e., switch off faulty units and, if applicable, limit the effect of the failure as much as possible by reconfiguration of the faulty NE/adjacent NEs;
      • if necessary, determine the cause of the failure using diagnosis and test routines; and
      • repair/eliminate failures in due time through the application of maintenance procedures.

      This aspect of the management environment is termed "fault management" (FM). The purpose of FM is to detect failures as soon as they occur and to limit their effects on the network quality of service (QoS) as far as possible.

      This is supported by the TMF642 alarm management API, which is more of a NaaS network domain function or E2E network domain function (i.e. it stays within the production functional block).

      SID has a special addendum ID TIP Resource Alarm Management Information agreement in case you are interested.

      I hope this helps you further. We discussed bringing some of that information into our NaaS IG1224 work so that others can benefit!
      Special thanks to @pierregauthier for some of the pointers!

      Best regards.... Johanne






    ------------------------------
    Johanne Mayer
    MayerConsult Inc
    ------------------------------