MIT
Network Storage Series
SAN Solution - 3
Fibre Channel Fabric Switch - High Availability
High availability is the ability of a system to perform its function
continuously (without interruption) for a significantly longer period of
time than the reliabilities of its individual components would suggest. It
is most often achieved through failure tolerance. High
availability is not an easily quantifiable term. Both the bounds of a
system that is called highly available and the degree to which its
availability is extraordinary must be clearly understood on a case-by-case
basis.
As always, in today’s business environment, time is money. However, business
is no longer an eight-hour-a-day, five-day-a-week affair. Today’s businesses
run twenty-four hours a day, seven days a week, fifty-two weeks a year.
Whenever data is not available, it costs businesses money in terms of lost
transactions, lost management opportunities and lost customer relationships.
While this has long been the case for large global enterprises, it is now
increasingly true for even small and medium-sized organizations: lack of
access to data at any time is painful, often disruptive and occasionally
fatal to such organizations.
Causes of Data Unavailability
Several factors contribute to downtime, or unavailability of data, as
shown in Figure 1. The results are not surprising:

Figure 1: Causes of Data Unavailability
While most data unavailability is due to unexpected systems outages,
such as hardware and software problems, a significant amount of data
unavailability is due to planned downtime. Altogether, systems outages of
all kinds account for about 80% of the total. Environmental issues such as
power outages, fires and floods, together with people errors such as a
backhoe operator digging in the wrong place, cause the remaining 20% of
the downtime.
There are many ways to manage these problems. For example, data
unavailability due to people errors and software problems can be managed
with improved procedures and processes, but nevertheless will probably not
be completely eliminated. Hardware problems can be managed with redundancy
and other techniques, but likewise cannot be completely eliminated. Things
will break, regardless.
Certainly, you can increase protection by increasing the safeguards,
but such protection can be expensive. The question is, how much guaranteed
data availability can you afford? The answer is "It depends."
Let’s look at some of the cost trade-offs:
Given twenty-four-hour, seven-day-a-week utilization, 99% availability
translates to data being unavailable for the equivalent of about 3.65 days
(87.6 hours) per year. Critical applications would generally not accept
this amount of downtime, whereas some less critical applications could
tolerate it. Think of your manufacturing management application not being
available for more than three and a half days a year. Could your business
afford it? Probably not.
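The arithmetic behind these figures can be checked with a short sketch. It assumes a 365-day year of continuous operation; the function name `annual_downtime_hours` is illustrative and not part of any product or library.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year of 24x7 operation


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of unavailability per year for an availability fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for level in (0.99, 0.999, 0.9999):
        hours = annual_downtime_hours(level)
        print(f"availability {level}: {hours:.2f} hours/year")
```

Each additional "nine" cuts the expected downtime by a factor of ten, which is why the cost question that follows matters.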
Moving to 99.9% availability translates to a more acceptable 8.8 hours
of downtime in a year. But such a system might also cost ten times as
much. Is it worth it? What makes an application important to a business is
the economic value per hour it has to that business. For example, at a
major financial center downtime might cost $2 million per hour, but
downtime for non-critical applications may cost only $10,000 per hour or
less. Thus, there is a gradient of trade-offs that must be assessed. This,
in turn, has led to a gradient of High Availability options, each with its
own cost/benefit ratio.
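One rough way to weigh this trade-off is to put a yearly price on the downtime itself. The sketch below uses the per-hour figures quoted above ($2 million and $10,000) purely as illustrative inputs. It assumes downtime cost scales linearly with hours lost, which real outages may not do, and the names `annual_downtime_cost` and `SCENARIOS` are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumes continuous 24x7 operation over a 365-day year


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime, assuming cost is linear in hours lost."""
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


# Illustrative per-hour costs taken from the examples in the text.
SCENARIOS = {
    "major financial center": 2_000_000,
    "non-critical application": 10_000,
}

if __name__ == "__main__":
    for name, cost_per_hour in SCENARIOS.items():
        at_two_nines = annual_downtime_cost(0.99, cost_per_hour)
        at_three_nines = annual_downtime_cost(0.999, cost_per_hour)
        saved = at_two_nines - at_three_nines
        print(f"{name}: moving from 99% to 99.9% avoids about ${saved:,.0f} per year")
```

If the avoided cost is well above the extra price of the more available system, the upgrade is easy to justify; if it is well below, a cheaper option on the gradient is probably the better fit.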
High Availability Objectives

Figure 2: SAN High Availability
The traditional solution to data unavailability has been High
Availability (HA) systems. This is a catchall phrase that covers many
different approaches to increasing data availability. But generically, an
HA system is designed to do several things:
· First, it must reduce downtime due to planned maintenance.
· Second, it must be resilient to unexpected hardware and software failure.
· Third, it must also deal with problems such as disaster recovery in case
of fire, earthquake, flood, or even the errant backhoe operator.
Traditionally, the solution to an HA requirement has been the fault
tolerant system, a term that is often applied to a hardware configuration
that allows redundant or otherwise protected components to fail over (or
switch) to a new component or set of components so that downtime is
minimized. Failure tolerance in disk subsystems is often achieved
by including redundant instances of components whose failure would make
the system inoperable, coupled with facilities that allow the redundant
components to assume the function of failed ones.
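A simplified model shows why redundancy lets a system outlast its individual components. The sketch below assumes that component failures are independent and that failover is instantaneous and always succeeds. Real systems rarely meet either assumption, so treat the result as an upper bound. The function name `redundant_availability` is illustrative.

```python
def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one can carry the load.

    Assumes independent failures and perfect, instantaneous failover.
    """
    if copies < 1:
        raise ValueError("at least one component is required")
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    for copies in (1, 2, 3):
        combined = redundant_availability(0.99, copies)
        print(f"{copies} component(s) at 99% each: {combined:.6f} combined")
```

Under these assumptions, pairing two 99% components yields roughly 99.99% for the pair, which illustrates how a redundant configuration can exceed the reliability of its parts.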