Saturday, June 19, 2010

Fail: Tales of a Recent Disaster

A client had a recent system failure that points out a bundle of things that all organizations (and us regular people too) need to remember:
  • You don't always see a disaster coming. The organization was well prepared for a weather emergency, an earthquake, or a staffing shortage but when a contractor pulled the wrong set of wires and set off a small fire, all of their communications systems went dead. Dead dead. It wasn't a pretty scene from what I understand.
  • Their emergency manager was nowhere to be found when the disaster struck. I realize that people take time off or leave for an appointment, and in most positions this isn't such a big deal, but I prefer my emergency managers to be the type A, always-in-control kind of person who stays connected to their cell phone 24/7, even in the middle of a root canal.
  • The emergency manager problem would have been less of an issue if an entire team had been properly trained to handle any emergency that may happen. Some companies think that if they have an emergency manager, they are ready for a disaster. People being, well, people, there is a chance that no matter how dedicated your emergency manager is, he may not be available when a disaster happens. That is why emergency preparedness needs to be in the job description of EVERY employee and why an entire team needs to be ready to respond to a disaster.
  • Communications is almost always a huge problem no matter what kind of disaster occurs. In this case, it was the problem. Imagine an organization where timely communication is critical. Now imagine that all phones, computers, and servers are down and the link to the backup generators has also blown. People are spread out in buildings all over a campus and they have no idea what is happening, when it will be fixed, or what to do. Big problem.
  • Lesson learned: emergency radios need to be in the places where they will be used during a disaster. In this case, all of the radios were in a storage locker and there was quite a delay in getting them to the various buildings/stations where they could be used. The thinking was that radios left around would walk away, be forgotten, or otherwise not be used. Their new thinking is that radios will be pre-positioned in a similar fashion to fire suppression gear which is secured behind a "break glass bar to access cabinet" system.
  • Lesson learned: Everyone had cell phones, which could theoretically be used in the event of a disaster; however, the cell phone numbers for other departments and staff were on the computers, which were now out of commission because the server and electrical systems were dead.
  • All records in this organization were on computer. Basically all data, files, records, and other critical information were stored digitally in a warehouse of servers which were rendered useless. They went back to writing information by hand, somewhat, but there was no way to check critical information, a gap which for this organization could have resulted in very bad outcomes. A back-up system for your data is important. A way to access this information when your back-up system is toast is more important.
  • They thought they had doubly redundant and triply redundant communication systems that went something like land lines, e-mail notification, and overhead paging. None of these systems worked due to the nature of the disaster. In this case, a quadruply redundant communication system would have been useful. You can't have enough back-up plans.
  • When planning for a disaster, you need to have plans for outages of short duration, medium duration, and long-term duration. In this case, landlines were up in an hour or so, power was restored in a few hours, but the servers were down for nearly a full day.

Overall, this organization learned a lot from this incident, which I am sure will be neatly typed up in an after action report. What will stay with them long after the report is gathering dust on a shelf, however, is the fact that written plans are nice, but once they get written they are put aside and are pretty much "out of sight, out of mind". The annual disaster exercise is only slightly better in that once a year the staff gets to "pretend" there is a disaster; useful as drills and exercises are, they leave out a whole bunch of stuff that happens in real-life disaster situations. What I hope they will take away from this situation is that preparedness is a thing that needs to be done on a daily basis by ALL employees.

How can this be done? If you only use radios once a year during an exercise, there is a pretty good possibility that people won't know how to use them when a real disaster happens. So why not use radios, if only for a check-in, once a week and include all departments? If your staff will be expected to be on duty 24/7 if needed right after a disaster, do you do random inspections on a regular basis to ensure that they have their emergency bag (change of clothes, toiletries, food, and water) immediately available to them? If there is special equipment that may need to be used during a disaster, does your staff practice with the equipment regularly (i.e., once a month, not once a year)? Preparedness is not a one-time thing. It is an ongoing process subject to refinement after each actual disaster.
