SmartSwitch Virtual Router Clusters
Abstract
This document describes the Virtual Router Clusters feature of
the SmartSwitch Router family of high-performance Gigabit
switch/routers.
A Virtual Router Cluster allows a group of SmartSwitch
Routers on a LAN to be defined as members of a cluster acting
as backup routers to each other. SmartSwitch Routers in the
cluster use the IETF Virtual Router Redundancy Protocol (VRRP)
to monitor each other and each other's LAN connections. If a
SmartSwitch Router or its LAN connection fails, another
SmartSwitch Router takes over and re-routes traffic around the
point of failure. Router recovery happens very quickly,
typically in a few seconds rather than minutes, and in a way
that is totally transparent to applications running on IP
hosts.
The Virtual Router Cluster is a unique mechanism that has
been designed into the SmartSwitch Router software to ensure
high availability of network applications and improved
reliability and resilience in mission-critical backbone
network environments.
Introduction
In most customer network environments, the network is critical
to the operation of the business and any downtime or outage
can severely impact the customer's business. For example:
- Internet service providers that offer web hosting
facilities typically have to guarantee 99.9999% uptime to
their clients so that the clients' web servers are always
available to the public.
- Telecommunications companies need to respond to user
requests within a prescribed time, otherwise the service
requested would be assumed to be broken and another
vendor's service used.
- Process control applications must always be able to
access the systems they are controlling, otherwise the
process could get out of control with possibly disastrous
consequences.
IP has become the protocol of choice in the majority of
production backbone networks. More and more IP hosts are using
protocols such as the Dynamic Host Configuration Protocol
(DHCP) [1] to obtain their IP address. However, many IP hosts
still rely on manual configuration as the only way of finding
out their gateway router address. Some host implementations
snoop on gateway messages, but this approach is "not
recommended" (RFC 1122). Active pinging of gateways is
also prohibited. ICMP Router Discovery (DISC) [2] allows
routers to be discovered by IP hosts but is not yet widely
implemented.
This means that there is no way for most IP hosts to know
quickly whether a router or its LAN connection has failed. It
can take a long time for an IP host to detect the failure and
switch to an alternative router.
In the meantime, network applications running on the IP
host may have timed out. For customers whose applications are
critical to the operation of their business, the loss of the
network can have a serious impact.
Any mechanism that improves the availability of the network is
therefore of major benefit. Earlier attempts to solve this
problem have been vendor-specific. For instance, Cisco's Hot
Standby Router Protocol (HSRP) [3] and Digital Equipment
Corporation's IP Standby Protocol (IPSTB) [4] address the
problem, but each is proprietary to its vendor's own
products. An IETF group has recently proposed a draft Internet
standard called the Virtual Router Redundancy Protocol (VRRP)
[5] to address this problem. The SmartSwitch Router implements
VRRP and can be used to provide the required level of
resilience and redundancy in mission-critical networks.
Sample Configuration
The following diagram helps to explain how SmartSwitch Virtual
Router Clusters work. In this diagram, the IP hosts (H1, H2,
etc.) are connected to a LAN on which there is more than one
router (R1, R2, etc.). R1 and R2 provide connectivity to
destination D1, though not necessarily at the same cost.
Normally the hosts H1, H2, etc. are configured with the IP
address of a single router, in this case either R1 or R2. The
problem is that if one of the routers fails, one or more of
the hosts H1, H2, etc. could lose connectivity to the
destination D1. With standard IP routing protocols such as the
Routing Information Protocol (RIP) [6] and the Open Shortest
Path First protocol (OSPF) [7], the hosts H1, H2, etc. may
take a long time to discover the failure. This can typically
be of the order of 45 to 90 seconds and can cause network
applications relying on TCP connections to time out.
What are Virtual Router Clusters?
Virtual Router Clusters use VRRP to allow a group of
SmartSwitch Routers on a LAN to be defined as a routing
cluster in which members of the cluster act as backup routers
to each other. If a SmartSwitch Router in the cluster or its
LAN connection fails, another SmartSwitch Router takes over
very quickly and re-routes traffic around the point of
failure.
Failover between SmartSwitch Routers happens so quickly
that it is totally transparent to IP hosts and the
applications running on them. Typically, failover occurs in
less than five seconds, whereas an IP host will typically take
minutes to notice a failure (and some implementations never
do).
The SmartSwitch Routers in the cluster monitor each other
using VRRP advertisements.
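As a rough illustration of why advertisement-based monitoring
can detect a failure within a few seconds, the sketch below
reproduces the timer arithmetic defined in the VRRP
specification: a Backup Router declares the master down after
three missed advertisement intervals plus a priority-dependent
skew. The one-second advertisement interval and the priority
values are illustrative assumptions, and nothing here is taken
from the SmartSwitch Router software itself.

```python
# Sketch of the VRRP timer arithmetic (per the VRRP specification).
# The priorities and the 1-second interval are illustrative assumptions.


def skew_time(priority: int) -> float:
    """Skew in seconds; higher-priority backups time out slightly sooner."""
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be in the range 1..254")
    return (256 - priority) / 256


def master_down_interval(priority: int,
                         advertisement_interval: float = 1.0) -> float:
    """Seconds of silence after which a backup declares the master down."""
    return 3 * advertisement_interval + skew_time(priority)


if __name__ == "__main__":
    for prio in (100, 200, 254):
        print(f"priority {prio}: master declared down after "
              f"{master_down_interval(prio):.3f} s")
```

With these assumed values, a backup of priority 100 would wait
about 3.6 seconds, which is consistent with the failover time
of less than five seconds quoted above.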
Virtual Router Clusters greatly improve network resilience
and offer improved routing redundancy for mission-critical
applications in backbone network environments. The
implementation used by the SmartSwitch Router is unique, and
it is designed so that no changes or software upgrades need to
be made to IP host systems.
How Virtual Router Clusters Work
Members of Virtual Router Clusters use "virtual" MAC
addresses. Each SmartSwitch Router in the cluster associates
an assigned virtual MAC (LAN) address with each IP address on
its LAN circuits. Each SmartSwitch Router is also configured
to know the virtual MAC and IP addresses of all the other
SmartSwitch Routers in the cluster. IP hosts can be configured
with any or all of the SmartSwitch Routers as their gateway
routers.
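To make the idea of a virtual MAC address concrete, here is a
minimal sketch based on the convention in the VRRP
specification, where the virtual MAC address is
00-00-5E-00-01-{VRID} and VRID is the virtual router
identifier. Whether the SmartSwitch Router assigns its virtual
MAC addresses in exactly this way is an assumption, and the
function name is hypothetical.

```python
# Sketch: derive a VRRP-style virtual MAC address from a virtual
# router identifier (VRID). The 00-00-5E-00-01 prefix comes from the
# VRRP specification; its use by the SmartSwitch Router is assumed.

VRRP_MAC_PREFIX = (0x00, 0x00, 0x5E, 0x00, 0x01)


def virtual_mac(vrid: int) -> str:
    """Return the virtual MAC address for a VRID in the range 1..255."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1..255")
    return "-".join(f"{octet:02X}" for octet in (*VRRP_MAC_PREFIX, vrid))


if __name__ == "__main__":
    print(virtual_mac(1))   # 00-00-5E-00-01-01
    print(virtual_mac(2))   # 00-00-5E-00-01-02
```

Because the address depends only on the identifier and not on
any physical interface, any router in the cluster can answer
for it, which is what makes the takeover invisible to hosts.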
The SmartSwitch Routers in the cluster use VRRP messages to
elect a Master Router. Every other SmartSwitch Router becomes
an "active" Backup Router. Each SmartSwitch
Router in the cluster supplies its virtual MAC address (rather
than its real MAC address) as the source address in ARP
responses to IP hosts. Each SmartSwitch Router also routes IP
host traffic, and the routers optimize the paths that hosts
use by sending ICMP redirects. If any SmartSwitch Router in
the cluster, or its LAN connection, fails, the Master Router
"impersonates" it for forwarding traffic and ARP
responses by taking over its virtual MAC address and IP
address. If the Master Router itself fails, the remaining
Backup Routers in the cluster elect a new Master Router, which
takes over the virtual MAC address and IP address of the
failed SmartSwitch Router.
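The election step can be sketched as follows. This is a
simplified model, not the SmartSwitch Router implementation: it
assumes the VRRP rule that the operational router with the
highest priority wins, with ties broken in favour of the higher
IP address. The Router class, the priorities and the addresses
are all hypothetical.

```python
# Sketch of a VRRP-style master election among the operational routers.
# Router names, priorities and addresses are illustrative assumptions.
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Iterable, Optional


@dataclass(frozen=True)
class Router:
    name: str
    priority: int
    address: IPv4Address
    alive: bool = True


def elect_master(routers: Iterable[Router]) -> Optional[Router]:
    """Pick the live router with the highest (priority, address) pair."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, r.address))


if __name__ == "__main__":
    ssr_a = Router("SSR A", 200, IPv4Address("192.0.2.1"))
    ssr_b = Router("SSR B", 100, IPv4Address("192.0.2.2"))
    print(elect_master([ssr_a, ssr_b]).name)
    # If the master fails, the remaining routers elect a new one.
    failed_a = Router("SSR A", 200, IPv4Address("192.0.2.1"), alive=False)
    print(elect_master([failed_a, ssr_b]).name)
```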
ICMP Router Discovery and ICMP Redirect
ICMP Router Discovery allows IP hosts to discover routers
using ICMP messages and procedures. Using ICMP, the
SmartSwitch Router periodically broadcasts ICMP Router
Advertisement messages and responds to ICMP Router
Solicitation messages from IP hosts.
ICMP Redirect is a mechanism that allows routers to inform
IP hosts of better routes to specific IP destination
addresses. Support for ICMP Redirect is mandatory under RFC
1122. Virtual Router Clusters use the standard ICMP Redirect
mechanism to control the routers to which IP hosts send
packets and to route around wide area circuit failures.
Packets are still forwarded by the SmartSwitch Router until
the redirect takes effect, ensuring that no packets are lost
during the changeover.
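For readers unfamiliar with the message itself, the sketch
below builds an ICMP Redirect in the format defined by RFC 792:
type 5, a code, a checksum, the address of the better gateway,
and then the IP header plus the first eight bytes of the
datagram that triggered the redirect. The function names and
the sample addresses are hypothetical, and the code only
constructs the bytes; it does not send anything on a network.

```python
# Sketch: build an ICMP Redirect message (RFC 792 layout).
# Function names and sample data are illustrative assumptions.
import struct
from ipaddress import IPv4Address

ICMP_REDIRECT = 5        # ICMP type for Redirect
REDIRECT_FOR_HOST = 1    # code 1: redirect datagrams for the host


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_redirect(better_gateway: str, original_datagram: bytes) -> bytes:
    """Return a Redirect quoting the IP header plus 8 payload bytes."""
    header_len = (original_datagram[0] & 0x0F) * 4
    quoted = original_datagram[:header_len + 8]
    gateway = IPv4Address(better_gateway).packed
    # First pack with a zero checksum, then repack with the real one.
    draft = struct.pack("!BBH4s", ICMP_REDIRECT, REDIRECT_FOR_HOST, 0, gateway)
    checksum = internet_checksum(draft + quoted)
    return struct.pack("!BBH4s", ICMP_REDIRECT, REDIRECT_FOR_HOST,
                       checksum, gateway) + quoted


if __name__ == "__main__":
    # A dummy 20-byte IPv4 header (version 4, IHL 5) plus 16 payload bytes.
    datagram = bytes([0x45]) + bytes(19) + bytes(range(16))
    message = build_redirect("192.0.2.2", datagram)
    print(len(message), message.hex())
```

A receiver can validate such a message by recomputing the
checksum over the whole message, which yields zero when the
message is intact.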
Sample Configuration
The following diagram helps to explain how SmartSwitch Virtual
Router Clusters work. In this diagram, both SSR A and SSR B are
members of the same Virtual Router Cluster. Each is configured
as Master Router for its own addresses and as Backup Router
for the other's. SSR A is configured with its own IP
address and virtual MAC address (shown in normal type) and is
also configured to know the IP address and virtual MAC address
of SSR B (shown in italic type).
Likewise, SSR B is configured with its own IP address and
virtual MAC address (shown in normal type) and is also
configured to know the IP address and virtual MAC address of
SSR A (shown in italic type).
In this example, SSR A is elected the Master Router and SSR
B therefore becomes the Backup Router.
Both routers respond to ARP requests from IP Hosts 1, 2 and
3 using their virtual MAC addresses (in normal type), and both
are able to route traffic from IP Hosts 1, 2 and 3. The
"traffic lights" indicate that both routers are
operational.
What Happens if a SmartSwitch Router Fails?
Now let's consider what happens if there is a failure. Suppose
that SSR B, its LAN connection or one of its WAN circuits
fails, as shown in the following diagram. SSR A notices the
failure very quickly and adopts the IP address and virtual MAC
address of SSR B. SSR A now responds to ARP requests intended
for the failed router's IP address, giving the failed router's
virtual MAC address as the source address. SSR A also receives
IP data intended for the virtual MAC address of the failed
router and forwards it to the correct destination. This occurs
transparently to the IP host systems.
What Happens When the SmartSwitch Router Recovers?
If SSR B recovers, SSR B announces itself and SSR A stops
impersonating it. SSR B now responds to ARP requests using its
virtual MAC address and packets are routed by both routers
again.
There is a very brief interval between a router recovering
and the Master Router ceasing to use the recovered router's
virtual MAC address. In this short time, it is possible that
packets may be lost or duplicated. In any event, the TCP
transport protocol is able to ensure end-to-end data recovery.
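The failure and recovery behaviour described in the two
sections above can be modelled in a few lines. This is a toy
model under stated assumptions: the addresses are hypothetical
(the document gives none), the function name is invented for
illustration, and real routers exchange VRRP messages rather
than consult a shared table.

```python
# Toy model of which router answers ARP for each router IP address.
# Addresses and names are illustrative assumptions.
from __future__ import annotations

OWNERS = {"192.0.2.1": "SSR A", "192.0.2.2": "SSR B"}


def arp_responders(owners: dict[str, str], alive: set[str],
                   master: str) -> dict[str, str]:
    """Map each router IP address to the router currently answering for it.

    A live router answers for its own address; the Master Router
    impersonates any router that is down.
    """
    if master not in alive:
        raise ValueError("the Master Router must itself be operational")
    return {ip: owner if owner in alive else master
            for ip, owner in owners.items()}


if __name__ == "__main__":
    # Normal operation: each router answers for its own address.
    print(arp_responders(OWNERS, {"SSR A", "SSR B"}, master="SSR A"))
    # SSR B fails: SSR A answers for both addresses.
    print(arp_responders(OWNERS, {"SSR A"}, master="SSR A"))
    # SSR B recovers: SSR A stops impersonating it.
    print(arp_responders(OWNERS, {"SSR A", "SSR B"}, master="SSR A"))
```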
What are the Benefits of Using Virtual Router Clusters?
The main benefit of using Virtual Router Clusters is that they
improve IP failover times by a factor of ten. As failover
occurs transparently to IP hosts, there is little chance of
network applications failing between the time a router fails
and the time the network recovers. Network applications remain
up and consequently connectivity is maintained. Typically,
host systems using ARP and routers using RIP take minutes to
detect (or may never detect) a failure. In a typical
configuration, a SmartSwitch Router using OSPF and configured
as a member of a Virtual Router Cluster can detect and confirm
a failure in less than five seconds. This ability to recognize
and re-route around failures offers greatly improved
resilience for backbone network applications.
References
1. [DHCP] Droms, R., "Dynamic Host Configuration Protocol,"
RFC 2131, March 1997.
2. [DISC] Deering, S., "ICMP Router Discovery Messages,"
RFC 1256, September 1991.
3. [HSRP] Li, T., Cole, B., Morton, P., and D. Li, "Cisco Hot
Standby Router Protocol (HSRP)," RFC 2281, March 1998.
4. [IPSTB] Higginson, P., and M. Shand, "Development of Router
Clusters to Provide Fast Failover in IP Networks," Digital
Technical Journal, Volume 9, Number 3, Winter 1997.
5. [VRRP] Knight, S., Weaver, D., Whipple, D., Hinden, R.,
Mitzel, D., Hunt, P., Higginson, P., Shand, M., and A. Lindem,
"Virtual Router Redundancy Protocol," April 1998.
6. [RIP] Hedrick, C., "Routing Information Protocol,"
RFC 1058, June 1988.
7. [OSPF] Moy, J., "OSPF Version 2," STD 54, RFC 2328,
April 1998.