If your control system failure experiences are anything like mine, most of them haven't occurred on weekday mornings, when it's convenient for you and your team to evaluate how good your recovery procedures are.
Instead, they're more likely to occur at inconvenient times, whether it be in the middle of the night or over the weekend.
We often go into such events hoping for a smooth recovery that won't take any longer than it should.
And we often leave these events with a clear understanding of how a little proactivity and preparation can be the difference between a short outage and losing an entire shift to downtime.
That's why it's important to evaluate our system backups and procedures: it's a key element in having a strong and efficient control systems group that can maintain facility operations with minimal downtime.
The simple fact is our operational teams depend on us to be able to resolve these issues as quickly and efficiently as possible.
What kind of system equipment should I maintain backups for?
In the control system world we have a little bit of everything.
This can include Programmable Controllers, Distributed Control Systems, HMI Terminals, PC Workstations and Servers (which can be hardware or virtual systems), as well as Network Equipment including Switches, Routers and Firewalls.
But no matter what the mix of equipment is, the important question is: “Do you have a recoverable backup for every intelligent device, and is every device inside your OT network accounted for?”
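One way to answer that question is to keep a device inventory and check it against backup dates. The following is a minimal Python sketch under assumed conventions: the `Device` record, the `find_backup_gaps` function, and the 30-day threshold are all hypothetical and would need to match your own asset list and backup policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Device:
    """One intelligent device on the OT network (hypothetical record layout)."""
    name: str
    kind: str                     # e.g. "PLC", "HMI", "Switch"
    last_backup: Optional[date]   # None means no backup is on record


def find_backup_gaps(
    devices: List[Device],
    today: date,
    max_age_days: int = 30,       # assumed policy, not a recommendation
) -> List[Tuple[str, str]]:
    """Return (device name, reason) for every device lacking a recent backup."""
    cutoff = today - timedelta(days=max_age_days)
    gaps = []
    for device in devices:
        if device.last_backup is None:
            gaps.append((device.name, "no backup on record"))
        elif device.last_backup < cutoff:
            gaps.append(
                (device.name, f"last backup {device.last_backup.isoformat()} "
                              f"is older than {max_age_days} days")
            )
    return gaps
```

A report like this only helps if the inventory itself is complete, so any device found on the network but missing from the list deserves the same attention as one with a stale backup.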
Manual or automatic backups
While implementing a schedule to manually back up all your individual programmable devices can be the easiest solution to set up initially, there are also automatic backup infrastructures to consider.
Many automation companies have products that automatically back up their DCS, PLC, HMI, and SCADA systems. Depending on the size of your facility and system networks, these automated backup systems can be a worthwhile investment.
And when it comes to individual servers, workstations, and virtual machines, there are some great systems with management servers that do automatic scheduling of backups.
These automated backup systems can save your team lots of time over the long run, time which can be spent on tasks that improve operational performance.
Where will I store my backups?
Network storage is a great place to start; however, I don't recommend placing all your eggs in one basket.
Data storage centers on opposite ends of a facility are a best practice in case of fire or other disasters.
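As an illustration of keeping copies in more than one place, here is a short Python sketch that copies a backup file to several storage locations and confirms each copy matches the original by checksum. The function names and any paths you pass in are hypothetical; a real deployment would point at your own network shares and would likely add logging and error handling.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(source: Path, destinations: List[Path]) -> Dict[Path, bool]:
    """Copy one backup file into each destination folder.

    Returns a mapping of copied file path -> True if its checksum
    matches the source, False otherwise.
    """
    expected = sha256_of(source)
    results = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy_path = dest_dir / source.name
        shutil.copy2(source, copy_path)   # copies contents and file metadata
        results[copy_path] = sha256_of(copy_path) == expected
    return results
```

Verifying the checksum after each copy is a deliberate choice here: a backup that was silently corrupted in transit is only discovered during a recovery, which is the worst possible time.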
Backups are great, but recovery time is what matters
All the backups in the world are great, but when the time comes the only aspect that really matters is the effectiveness of your recovery plan.
To that end, you should ask yourself whether every member of your control system team has the following:
- Access to documented procedures for recovering all system backups
- An understanding of where to find backups and how to recover them
- Shared knowledge from previous recovery events, so that if a team member leaves, the others aren't left in the dark
These are questions and evaluations that we must all ask ourselves from time to time to ensure that our facility has the insurance it needs when a failure occurs.
By being proactive, we can eliminate the production time that would otherwise be lost to an emergency hunt for procedures, backups, configuration data, passwords, and other critical system information.
When the time comes, these failure “opportunities” will either expose our weaknesses or substantiate our preparedness.
Either way, our response in these situations will have a huge impact on how much faith the operational team has in their control systems team.
Written by Brandon Cooper
Senior Controls Engineer and Freelance Writer