The Problem with SLAs...
Most customers and IT suppliers are familiar with Service Level Agreements (SLAs). SLAs are meant to define services, measure them and set expectations. In reality, they are often minefields: they sow discord and disdain, cause lost production hours, cost money and drive the two parties further apart. A well-defined, smartly formulated, mutually agreed SLA, by contrast, saves time, money and relationships. This article describes how you can get your IT services supplier to provide the value for money you need and that they claim to deliver. Ask yourself: are you satisfied with the service your IT services supplier delivers? If you are, count your blessings. If not, research suggests you are not alone. So how do you encourage your IT services supplier(s) to take your interests into account when contracting and delivering services? Let’s start by identifying the causes of dissatisfaction:
- Poor resolution of incidents and reactive (instead of proactive), even passive behavior
- Constant disruption to business processes as a result of incidents
- Late delivery of changes and projects
- Over-budget projects
- Over-priced service (particularly in relation to poor quality service)
These are the most common IT services supplier behaviors that every customer despises. So what do most IT customers want? The answer can be summarized in three brief and basic requests:
- “Keep my existing IT stuff working”
- “Give me new stuff as soon as possible”
- “Advise me on how I can best use what I’ve got and how new stuff can be of use for my business”
This article looks at defining and creating agreements between IT customers and IT services suppliers for request number 1: the maintenance of existing IT “stuff”. What do we mean by “maintenance”? The IT services supplier ensures that currently available functionality works and continues to work, and that the availability of IT services is maximized while both the number of IT incidents and their resolution times are minimized.
So What Is an SLA?
A Service Level Agreement is often defined as an agreement between two parties that defines services, expectations, responsibilities and priorities. It also records what level of performance is acceptable and determines how adequate service is assessed. An SLA can be seen as a communication tool that is adjusted on a regular basis, with customer and service provider jointly assessing and fine-tuning the services. An SLA ensures that both parties use the same criteria when evaluating service quality, and it prevents conflicts and disputes by providing a shared understanding of needs and priorities. The soul of an SLA is that both parties, customer and service provider, have a say in it. Investing more effort in creating the SLA results in a better shared understanding of quality criteria, needs and priorities, and of how conflicts and disputes can be avoided. In that case, the SLA document itself can probably be set aside, because both parties have already learned how to work together (which is the desired situation).
However, an agreement is much more than simply building the SLA document. It is typically a many-month process of information gathering, analyzing, negotiating and consensus building and this process must involve the customer. If customers are not part of the process, it is not an agreement and it should not be called an agreement! Let’s look at how current Service Level Agreements are contracted.
In practice, many SLAs are defined and created by the service provider and presented to the customer as a “fait accompli”. This typically happens when the customer is already dissatisfied with the quality of delivered services. Service providers rush to implement an SLA, but this often backfires: the customer uses the hastily compiled, poorly thought-out SLA as yet another source of problems, and more complaints follow. Furthermore, performance-related Service Levels are poorly defined in most SLAs and leave IT services suppliers too much room to score “in the green” on paper (i.e. agreed performance is met), while in reality the customer perceives performance to be poor. As a consequence, customer satisfaction is low and the relationship between the customer and the IT services supplier suffers. In general, if the monthly service level report is “green” but satisfaction is low, it is very likely that the supplier’s actual performance is not as good as the report states. The service is a “watermelon”: green on the outside, red on the inside. Here is an example of a single sentence from a typical SLA that is full of problematic definitions:
“It is agreed that 95% of high priority incidents per year need to be resolved within 8 hours.”
Why is the last 5% left undefined? In practice, this often means that 5% of the incidents are NEVER resolved: they are forgotten and discarded. What’s more, actually closing such an incident would record yet another late resolution and drag down the resolution statistics. So what happens? It remains open…forever. Here’s another common issue.
Your SLA states that incidents must be resolved within 8 hours. Your supplier thinks, “Hey! I’ve got a whole 8 hours. I’ll get to this when I’m a little less busy.” Some SLAs also agree a reaction time that guarantees someone will “look” at the call within an hour. “Looking” is not “resolving”, which means you, the customer, are still waiting. Six hours later, the engineer finally picks up the incident and resolves it within an hour. In the end, you’ve waited 7 hours. What’s more, this was a high priority incident, which means many people have been affected by it. Now imagine something goes wrong: the repair takes longer than expected, or the engineer is pulled away to another critical, even higher priority incident. The call then exceeds the 8 hours in the SLA, the incentive to resolve the issue is gone, and this incident becomes one of the unresolved 5% that simply disappear…while you, the customer, wait powerlessly.
In the meantime, users either try to solve the problem themselves, create workarounds, or do nothing and become “unavailable”. None of these results in real productivity for your company: time and money are lost. This poor quality of service makes you question whether the service provider can do what it has been paid to do. It takes surprisingly few of these cases to seriously damage the reputation of IT. You, the IT customer, need to take action. You might ask why the customer needs to take action and not the IT services supplier. After all, the responsibility for creating an SLA and providing service has always been with the supplier. That is part of the problem: suppliers deliberately write agreements that leave ample “room for maneuver” with regard to cost, effort and time, leading to more “watermelons”.
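To make the problem with threshold-style service levels concrete, here is a minimal Python sketch of how a clause like “95% of high priority incidents resolved within 8 hours” is typically scored. The `Incident` record, the function names and the sample incidents are hypothetical illustrations, not taken from any real SLA or reporting tool. The score only counts hits and misses, so it cannot tell a month with one forgotten incident apart from a month in which every fix arrived just before the deadline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical record: hours from logging to resolution,
    # or None if the incident is still open.
    resolution_hours: Optional[float]


def sla_compliance(incidents: List[Incident], target_hours: float = 8.0) -> float:
    """Fraction of incidents resolved within target_hours.

    Open incidents count as misses, but a miss is a miss: the metric
    does not record how late (or whether) the incident was ever fixed.
    """
    met = sum(
        1
        for inc in incidents
        if inc.resolution_hours is not None and inc.resolution_hours <= target_hours
    )
    return met / len(incidents)


def is_green(incidents: List[Incident], threshold: float = 0.95) -> bool:
    """True if the monthly report would show 'green' under a 95% clause."""
    return sla_compliance(incidents) >= threshold


# Month A: 19 incidents fixed in 1 hour, 1 incident never fixed.
fast_but_forgotten = [Incident(1.0)] * 19 + [Incident(None)]
# Month B: 19 incidents fixed just inside the deadline, 1 fixed after 100 hours.
slow_but_closed = [Incident(7.9)] * 19 + [Incident(100.0)]

print(sla_compliance(fast_but_forgotten), is_green(fast_but_forgotten))  # 0.95 True
print(sla_compliance(slow_but_closed), is_green(slow_but_closed))        # 0.95 True
```

Both months report “green”, even though one leaves an incident open forever and the other keeps users waiting almost the full 8 hours every time. A measure that weighs how long people actually wait does not have this blind spot.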
Lost Production Hours
What is availability? Availability is most easily understood through its opposite: unavailability. Unavailability is an employee staring at a blank screen, waiting for a screen to refresh or a printout to arrive, failing to send an email, or any other activity that causes a loss of productivity. Unavailability is Lost Production Hours. In this respect, Lost Production Hours can be compared directly to unavailability due to illness. Most businesses have realized they can counter productivity loss through illness with better policies, such as more ergonomic workplaces, “stop smoking” courses and stress reduction courses.
However, research has shown that IT-related loss of productivity attracts little attention in the boardroom unless it becomes excessive and threatens the survival of the company. This must change. Businesses need to proactively steer their IT services suppliers. This will reduce Lost Production Hours and increase perceived satisfaction with the current IT services supplier. Lost Production Hours is a Key Performance Indicator that measures the effect of IT services suppliers’ underperformance on business productivity. So how do we determine Lost Production Hours? The Lost Production Hours of a single incident are based on the time to repair that incident. The priority of the incident determines the Business Effect factor, based on the number of people affected. In addition, the average employee in a given organization (or even industry) depends on ICT to a certain degree. In an administrative organization, an incident with a critical priority may cost an employee a large part of their effectiveness (e.g. 75%), whereas in a production environment, production may continue but orders are fulfilled later, causing a loss of effectiveness of 50%. This percentage is known as the Dependency factor.
Multiplying the time to repair of the incident by the Business Effect factor and the Dependency factor gives the total Lost Production Hours for that incident: Lost Production Hours = time to repair × Business Effect factor × Dependency factor. As shown in the figure below, there is a clear cause-effect relationship between Lost Production Hours and excess cost and lost revenue.
- No. of incidents and resolution
- Time that incidents are open
- Time that IT customers cannot work
- Excess cost and lost revenue
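The per-incident calculation described above is a simple product: time to repair × Business Effect factor × Dependency factor. The Python sketch below assumes the Business Effect factor is looked up from the incident priority as a typical number of affected people; the priority names and the numbers in `BUSINESS_EFFECT` are hypothetical placeholders that each organization would need to agree on. The Dependency factor is passed in as a fraction (0.75 for the administrative example, 0.5 for the production example).

```python
# Hypothetical Business Effect factors: the typical number of people affected
# by an incident of each priority. Placeholder values, to be agreed per organization.
BUSINESS_EFFECT = {"critical": 200, "high": 50, "medium": 5, "low": 1}


def lost_production_hours(time_to_repair_h: float, priority: str, dependency: float) -> float:
    """LPH for one incident = time to repair x Business Effect factor x Dependency factor."""
    if time_to_repair_h < 0:
        raise ValueError("time to repair cannot be negative")
    if not 0.0 <= dependency <= 1.0:
        raise ValueError("dependency must be a fraction between 0 and 1")
    return time_to_repair_h * BUSINESS_EFFECT[priority] * dependency


# The same 2-hour critical incident in two kinds of organization:
print(lost_production_hours(2.0, "critical", 0.75))  # administrative: 300.0
print(lost_production_hours(2.0, "critical", 0.50))  # production: 200.0
```

Summing these per-incident values over a month gives a monthly total. Converting that total into money would require an hourly cost rate, which is another organization-specific assumption.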
The goal of this measure is not to achieve an exact number. The aim is to create a benchmark measurement from which a trend (downwards!) can be managed. By including Lost Production Hours in the SLA, we can introduce an incentive for the whole chain of IT supply to resolve incidents as quickly as possible.
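Because the aim is a downward trend rather than an exact number, the series of monthly totals matters more than any single value. A minimal sketch of tracking that trend, assuming monthly Lost Production Hours totals are already available: it estimates the slope with an ordinary least-squares fit and compares the latest month with the first (baseline) month. The function names and the sample series are hypothetical, and the numbers are made up for illustration rather than measured.

```python
from typing import Sequence


def trend_per_month(monthly_lph: Sequence[float]) -> float:
    """Least-squares slope of monthly LPH totals, in LPH per month.

    A negative slope means Lost Production Hours are trending down.
    """
    n = len(monthly_lph)
    if n < 2:
        raise ValueError("need at least two months to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_lph) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_lph))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def change_vs_baseline(monthly_lph: Sequence[float]) -> float:
    """Relative change of the latest month versus the first (baseline) month."""
    baseline = monthly_lph[0]
    return (monthly_lph[-1] - baseline) / baseline


# Illustrative series only (not real measurements): six months of LPH totals.
series = [1000, 900, 950, 700, 650, 600]
print(round(trend_per_month(series), 1))  # -85.7
print(change_vs_baseline(series))         # -0.4
```

The least-squares slope is less sensitive to one unusually good or bad month than a simple first-versus-last comparison; either could serve as the benchmark, as long as both parties agree on it in the SLA.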
And it works in practice
A multinational feedstock company wanted to improve the availability of its services to customers. Using the Lost Production Hours (LPH) measurement methodology, we measured an average monthly cost of Lost Production Hours in the business of 4 million euros in 2008. By focusing the organization and introducing process improvements, the result shown in the graph below was realized through 2009. This graph represents a structural reduction of more than 90% in the impact of incidents on customers. Incidentally, this was achieved with a 15% reduction in workforce. A second organization (an energy trading company) renegotiated the SLAs with its two key service providers based on Lost Production Hours. This reduced the fixed price of the SLAs from 3.1 million euros per year to 1.68 million euros per year, allowing the customer to reserve a substantial amount (600k euros) for a bonus system that rewards excellent performance. After a transition period, both suppliers were able to earn large parts of their bonuses, giving the energy trading company much better service than in previous years. Both suppliers had clear targets and an understanding of how to achieve them. As a result, both the suppliers and the customer benefit from the improved performance.
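The figures from both cases can be put into perspective with simple arithmetic. For the energy trading company, this assumes the 600k euro bonus pot is paid on top of the new fixed price and, as a worst case for the customer, paid out in full:

```latex
\begin{aligned}
\text{Feedstock company, monthly LPH cost after a 90\% reduction:}\quad & 4 \times (1 - 0.90) = 0.4 \text{ million euros (or less)} \\
\text{Energy trading, fixed-price saving:}\quad & 3.10 - 1.68 = 1.42 \text{ million euros/year} \\
\text{Fixed price plus full bonus:}\quad & 1.68 + 0.60 = 2.28 \text{ million euros/year} \\
\text{Net saving with full bonus paid:}\quad & 3.10 - 2.28 = 0.82 \text{ million euros/year} \approx 26\%
\end{aligned}
```

Even if the suppliers earn their entire bonus, the customer still pays roughly a quarter less than before, while the bonus gives the suppliers a reason to aim for excellent rather than merely “green” performance.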
Authors:
Harriette Blauwboer, Quint Wellington Redwood
Claudine Koers, Quint Wellington Redwood
Frank Willems, Quint Wellington Redwood
Publication date: April 2015