WTF is Alert Fatigue?: The Financial Times Edition

January 14th 2021

There’s a reason that prank calls to emergency services are illegal and that we can turn certain notifications off on our phones — because alerts don’t always alert you for the right reasons. Join Sarah Wells, technical director for operations and reliability at the Financial Times, for the first WhatTheFuckinar of 2021.


Sarah’s tackling ‘Alert Fatigue’, something that can affect even the most experienced teams. Together with Container Solutions' CEO Jamie Dobson, she explores how to avoid alert overload, which is often the downside of ‘you build it, you run it’.

When your system was a monolith, you probably didn’t have huge numbers of things to monitor. When you move to Cloud Native, things change: firstly, you probably have microservices, and that considerably increases the number of alerts that can fire. 

Secondly, if you’re doing it properly, you should have a resilient distributed architecture. That means that some parts of your system can be broken, but the overall system is still fine. 

Cool story! And of course, with great distribution comes great… amounts of alerts.

So now how do you distinguish between alerts that mean ‘this instance is being restarted’ vs. ‘the whole thing is on fire’? 

And especially, how do you make sure only the second one wakes you up at 3 in the morning?

At the Financial Times, Sarah’s team spends a lot of time trying to make sure it can distinguish between OK-broken and wake-people-up-and-fix-it broken. Sarah’s got the lowdown on how to keep 3 a.m. wake-ups to a minimum. (The SRE wake-ups, we mean. If you live with an on-call doctor or you have a newborn... sorry.)
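To make the OK-broken versus wake-people-up distinction concrete, here is a minimal sketch in Python. It is not the Financial Times' approach: the `ServiceHealth` type, the `should_page` function and the 50% threshold are all hypothetical, chosen only to illustrate the idea that one restarting instance should not page anyone while a service that has lost most of its capacity should.

```python
from dataclasses import dataclass


@dataclass
class ServiceHealth:
    """Hypothetical snapshot of one service's instances (illustrative only)."""
    name: str
    healthy_instances: int
    total_instances: int


def should_page(service: ServiceHealth, min_healthy_fraction: float = 0.5) -> bool:
    """Return True only when the service as a whole looks broken.

    The 0.5 threshold is an assumption for this sketch, not a recommendation.
    """
    if service.total_instances == 0:
        # Nothing is running at all: treat this as "on fire".
        return True
    healthy_fraction = service.healthy_instances / service.total_instances
    return healthy_fraction < min_healthy_fraction


# One of four instances restarting: OK-broken, let people sleep.
restarting = ServiceHealth("content-api", healthy_instances=3, total_instances=4)
# Every instance down: wake-people-up-and-fix-it broken.
on_fire = ServiceHealth("content-api", healthy_instances=0, total_instances=4)

print(should_page(restarting))
print(should_page(on_fire))
```

A real setup would more likely alert on what users experience (error rates, failed publishes) than on instance counts, but the shape of the decision is the same: page on the overall system, not on each part.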

Who Should Attend?

SREs, DevOps engineers, system administrators, SOC analysts, product owners, Cloud Native fans

Takeaways:

  • Understanding why alert fatigue exists 
  • The tech side of managing alert fatigue
  • The mindset/culture side of it
  • Practical approaches to efficient alerts
  • Understanding another part of #WTFisCloudNative

 

Who will I be listening to?

Sarah Wells is a Technical Director at the Financial Times. She has been leading delivery teams across consultancy, financial services and media for 20 years. Over the last few years she has developed a deep interest in operability, observability and DevOps, particularly in a microservices architecture, and at the beginning of 2018 this led to her taking over responsibility for Operations and Reliability at the Financial Times.

 

Jamie Dobson is co-founder and CEO of Container Solutions, a professional services company that specialises in Cloud Native transformation. With clients like Shell, Adidas, and other large enterprises, CS helps organisations not only navigate technology solutions but also adapt their internal culture and set business strategy. Jamie is the co-author of the new book Cloud Native Transformation: Practical Patterns for Innovation (O'Reilly Media, 2020). A veteran software engineer, he specialises in leadership and organisational strategy, and is a frequent presenter at conferences.

 
