Overcoming Alert Fatigue in a Modern Ops Environment.
Alert Fatigue: What It Is, and What Causes It
The basic definition of alert fatigue is simple: When the frequency of alerts exceeds the ability of the operators to
effectively triage those alerts, IT Operations’ workflows break down, and alerts are missed. It becomes harder and
harder to find the real signals in the noise, and consequently, responders can become desensitized to them.
Alert fatigue comes in different forms and can result from several types of problems —
or a combination of them.
The most common culprits include:
- Having multiple monitoring systems within an organization
- Having a monitoring system that simply generates too many unnecessary alarms
- Alerts being sent to more people than necessary
- Failing to differentiate between critical and non-critical alerts
- Alerts being sent to the wrong people
Understanding that alert fatigue can have multiple causes is important, as it’s not simply an inevitable result of having too
much to monitor. Even with a large and active infrastructure to keep tabs on, alert workflows can be tailored in such a way that
your team can handle a high volume of alerts effectively.
Improper management and triaging of alerts — not the sheer quantity of notifications — is the root cause of alert fatigue.
Preventing alert fatigue is as simple as having the right management strategy in place.
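As an illustration of what such a strategy might look like in practice, the sketch below addresses several of the culprits listed above: it collapses duplicate alerts from multiple monitoring systems, sends each alert only to the team that owns the affected service, and pages people only for critical alerts. The severity levels, team names, service names, and the `triage` function are all hypothetical and chosen for illustration; they are not taken from any particular monitoring product.

```python
from dataclasses import dataclass

# Hypothetical severity ranking, for illustration only.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

# Hypothetical ownership map: each service has exactly one responsible team.
OWNERS = {"storage": "storage-oncall", "checkout": "payments-oncall"}


@dataclass(frozen=True)
class Alert:
    source: str    # monitoring system that emitted the alert
    service: str   # affected service
    check: str     # what was checked, e.g. "disk_space"
    severity: str  # one of the keys in SEVERITY_RANK


def triage(alerts):
    """Deduplicate alerts, then split them into pages and a low-priority queue."""
    # The same problem reported by several monitoring systems counts once;
    # when reports disagree, keep the most severe one.
    unique = {}
    for alert in alerts:
        key = (alert.service, alert.check)
        kept = unique.get(key)
        if kept is None or SEVERITY_RANK[alert.severity] > SEVERITY_RANK[kept.severity]:
            unique[key] = alert

    pages, queue = [], []
    for alert in unique.values():
        # Route to the owning team only; unknown services go to a fallback queue.
        owner = OWNERS.get(alert.service, "ops-triage")
        if alert.severity == "critical":
            pages.append((owner, alert))   # interrupt a person now
        else:
            queue.append((owner, alert))   # review during working hours
    return pages, queue


alerts = [
    Alert("monitor-a", "storage", "disk_space", "critical"),
    Alert("monitor-b", "storage", "disk_space", "critical"),  # duplicate report
    Alert("monitor-a", "checkout", "latency", "warning"),
]
pages, queue = triage(alerts)
```

In this example, three incoming alerts result in a single page to the storage team and one queued item for the payments team. A real deployment would likely also need time windows for deduplication and escalation rules, which this sketch leaves out.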
The Cost of Alert Fatigue
If your IT team fails to respond to alerts, the consequences can add up quickly. For example, a storage bucket in the cloud
that is starting to run out of space over a holiday weekend can be brought under control easily, before customers notice a
disruption, if admins receive an alert about the problem in time and act upon it by adding more space. However, if the issue
goes unnoticed because they missed the alert, the bucket fills up, customers experience an outage, and the business’s
reputation suffers, ultimately leading to lost revenue.
That’s only the tip of the iceberg when it comes to the fallout of alert fatigue. When you throw factors such as contractual
obligations and your organization’s reputation into the mix, many additional problems can result.
Contractual liability
If a specified level of performance is written into a contract, excessive downtime or failure of services
may trigger automatic financial penalties. If your client is in a highly time-sensitive business where
prompt performance is crucial, or a field in which public safety is at stake, legal and financial penalties
may be even more severe. That’s bad news for both you and your customers.
Loss of users or customers
For your software to be successful, it not only has to work, it must be available when your clients
need it, whenever that may be. If you have too much downtime or loss of functionality, your
customers will eventually start to look for a replacement solution. This is as true of software designed
for use by the general public as it is for services provided under contract. For instance, an online
store that isn’t available when people want to make purchases is going to lose customers to a
competitor that is live and available.
Loss of sales
When potential clients are considering buying your products or services, they generally want to
know your track record, and with a bit of searching online, it won’t take long to uncover any legal
problems, reliability issues, lost contracts, or what people are saying about you.