Downtime is the big bad nemesis of every business. Outages can cost over $100 million, jeopardise security, cut revenue and harm reputation. With outage costs on the rise, greater demand for data centres and the ever-increasing value of the data they house, downtime can be disastrous.
Within data centres, failure causes look like this: 27% service issues, 26% installation issues, 22% staffing issues and 20% due to insufficient preventative maintenance.
Processes and procedures help ditch downtime. In the built environment, design contrast and colour-coded equipment can increase clarity and reduce human error. Conversely, uniform design can make repetitive tasks easier to carry out. And checklists? They’re a simple but effective tool – tick!
As the standard bearer for digital infrastructure performance, the Uptime Institute authors a data centre classification system outlining four tiers, each matching a specific business purpose. Tier I = basic capacity, Tier II = redundant capacity, Tier III = concurrently maintainable and Tier IV = fault-tolerant.
The tiers describe the availability of infrastructure resources in a facility and set benchmarks for maintenance, power, cooling and fault capabilities. The system has been used to plan, build and manage thousands of sites across more than 110 countries. The Institute checks a facility's design documents against its criteria before awarding a tier classification.
Because data centres are dynamic, facility managers often have to make quick decisions. The total cost of ownership (TCO) approach looks across obtaining, designing, building and maintaining assets to lower overall risk and expense. It’s big-picture thinking for ultimate uptime, competitive edge and profitability.
Here are four simple practices that can reduce your TCO:
1. Don’t react – prevent
Your equipment won’t deteriorate on cue. Use data as an early warning to remedy issues, and avoid both downtime and the cost of purchasing new equipment.
2. FM tech talk
Human error goes with the territory, but automating operations with centralised and regulated information can curb this. And with robust data and analytics, you can confidently act on energy waste rather than react with band-aid solutions.
3. Asset management plan
An inventory will deliver data on the condition and functionality of your equipment. Using this, you can decide whether to repair or replace, and optimise your centre for the long term.
4. The same page
Robust standards protect against changes in your environment. For optimum FM, it’s crucial to establish compliance methods and standard operating procedures (SOPs) and then train your people to meet benchmarks.
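Practices 1 and 3 above lend themselves to a small illustration. The Python sketch below is only a rough example under stated assumptions: the function names, the sample temperature readings, the 10% drift tolerance and the cost figures are all hypothetical, and the cost comparison ignores discounting and downtime risk.

```python
from statistics import mean


def drifting(readings, baseline_count=5, tolerance=0.10):
    """Flag a sensor whose latest reading has drifted from its own baseline.

    The baseline is the mean of the first `baseline_count` readings. The
    10% tolerance is an illustrative assumption, not an industry figure.
    """
    if len(readings) <= baseline_count:
        return False  # not enough history to judge
    baseline = mean(readings[:baseline_count])
    return abs(readings[-1] - baseline) > tolerance * abs(baseline)


def repair_or_replace(repair_cost_per_year, replacement_cost,
                      new_running_cost_per_year, years):
    """Compare keeping an asset with replacing it over the same horizon.

    A simplified TCO view: totals are summed with no discounting.
    """
    keep_total = repair_cost_per_year * years
    replace_total = replacement_cost + new_running_cost_per_year * years
    return "replace" if replace_total < keep_total else "repair"


# Hypothetical bearing temperatures (degrees C) from a cooling unit fan.
fan_temps = [41.0, 40.5, 41.2, 40.8, 41.0, 42.5, 44.1, 46.3]
print(drifting(fan_temps))                          # True
print(repair_or_replace(12_000, 40_000, 3_000, 5))  # replace
```

In practice the readings would come from your monitoring system and the costs from your asset inventory. The point is that both decisions rest on recorded data rather than on a reaction to a failure.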
In the Uptime Institute’s 2020 survey, 75% of respondents said downtime was preventable with better management or processes. Enter the device-driven checklist: keep tabs on daily maintenance tasks and reinforce power-down and restart sequences for ever-evolving equipment.
Because data is more valuable than ever, centres are the focus of escalating threats. Apply checklists around security protocols to solidify physical security. Track comings and goings, confirm all door keypads and cameras are working, and ensure outer fencing and vehicle entry gates are sound.
Exposure to floods, storms and numerous other environmental conditions is an increasing reality. Crisis readiness means putting protocols in place and embedding these in your checklists.
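As a minimal sketch of what a device-driven checklist might look like in software, the Python below models a checklist as a set of tasks that can be ticked off and queried for outstanding items. The `Checklist` class and the task wording are hypothetical, chosen to mirror the physical security checks described above, and are not drawn from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    """A named list of tasks, each marked done or outstanding."""
    name: str
    tasks: dict = field(default_factory=dict)  # task text -> completed?

    def add(self, task):
        """Register a task as outstanding."""
        self.tasks[task] = False

    def tick(self, task):
        """Mark a known task as done; reject tasks not on the list."""
        if task not in self.tasks:
            raise KeyError(f"Unknown task: {task}")
        self.tasks[task] = True

    def outstanding(self):
        """Return tasks not yet ticked, in the order they were added."""
        return [task for task, done in self.tasks.items() if not done]


security = Checklist("Physical security")
for task in ("Review entry and exit log",
             "Test door keypads",
             "Check cameras are recording",
             "Inspect outer fencing and vehicle gates"):
    security.add(task)

security.tick("Test door keypads")
security.tick("Check cameras are recording")
print(security.outstanding())
# ['Review entry and exit log', 'Inspect outer fencing and vehicle gates']
```

The same structure could hold crisis-readiness or shutdown and restart steps. Rejecting unknown tasks keeps staff working from the agreed list rather than an improvised one.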
Older data facilities can be a liability in terms of energy and equipment, but upgrading a legacy centre can also bring issues. To maximise assets, educate employees about the three critical aspects of your system.
1. Electrical conduct
Always opt for a qualified and trusted expert for your electrical maintenance. Generators are high-maintenance, so run them often and exercise automatic transfer switches with your generator runs. Ensure optimal battery performance with quarterly inspections of the uninterruptible power supply (UPS). Include a power distribution unit (PDU) inspection that covers bulbs, displays and missing hardware, records corrective suggestions and checks alarm schedules.
2. Keep cool
No matter its configuration, your cooling system significantly impacts uptime. Finding the balance between avoiding unnecessary overcooling costs and keeping temperature and humidity in check is hard. Cooling systems have many parts, so their maintenance is nuanced. Make sure you clearly outline the frequency of inspections and maintenance tasks.
3. Fire fortification
Numerous factors are in play regarding fire detection and suppression: the size of your space, whether an existing sprinkler or gaseous suppression system is in place, and the nature of your sensor set-up. Critical spaces require careful management, so custom processes and maintenance must include daily and weekly inspections and yearly tests.
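The inspection frequencies mentioned across these three areas can be tracked with a simple schedule. The Python sketch below is illustrative only: the task names, the `overdue` function and all dates are hypothetical, intervals are approximated in days, and the 30-day generator interval is an assumption because the text only says to run generators often.

```python
from datetime import date, timedelta

# Interval in days for each task. Quarterly, daily, weekly and yearly come
# from the text; the 30-day generator interval is an assumption.
INTERVALS = {
    "Generator run with transfer switch exercise": 30,  # assumed
    "UPS battery inspection": 91,                       # quarterly
    "Fire system daily inspection": 1,
    "Fire system weekly inspection": 7,
    "Fire system test": 365,                            # yearly
}


def overdue(last_done, today):
    """Return tasks whose due date has passed, most overdue first."""
    late = []
    for task, interval in INTERVALS.items():
        due = last_done[task] + timedelta(days=interval)
        if due < today:
            late.append(((today - due).days, task))
    return [task for _, task in sorted(late, reverse=True)]


# Hypothetical record of when each task was last completed.
last_done = {
    "Generator run with transfer switch exercise": date(2024, 5, 20),
    "UPS battery inspection": date(2024, 2, 1),
    "Fire system daily inspection": date(2024, 5, 31),
    "Fire system weekly inspection": date(2024, 5, 20),
    "Fire system test": date(2023, 7, 1),
}

print(overdue(last_done, today=date(2024, 6, 1)))
# ['UPS battery inspection', 'Fire system weekly inspection']
```

Sorting by days overdue puts the longest-neglected task at the top, which is one reasonable way to prioritise. A real schedule might instead weight tasks by how critical the equipment is.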
Data centres demand preventative maintenance to stay ahead of equipment failure and efficiency loss. Keep equipment clean, check bearings and cycle dual-path. Monitor mechanical, electrical and subsystems and balance cooling and humidity to create stability and manage electrostatic discharge risk.
Conduct periodic performance testing, including partial failovers, bypass operations, and loss of redundant components. And always check under perforated floor tiles and server racks (and fix holes). Document everything and file testing reports so it’s all up to date.
Cooling and energy costs are connected, so you might want to raise the return temperature by keeping supply and return air streams contained and separate from each other. Prevent hot spots with temporary solutions like fans while you weigh up the efficiency of permanent solutions.
Get Grosvenor Engineering Group to help protect your critical infrastructure, mitigate potential downtime and minimise the risk of unforeseen outages at your Data Centre facility.