25-08-2017, 09:32 PM
Multiple Routing Configurations for Fast IP Network Recovery
Multiple Routing Configurations.docx (Size: 80.55 KB / Downloads: 13)
Abstract—
As the Internet takes an increasingly central role
in our communications infrastructure, the slow convergence of
routing protocols after a network failure becomes a growing
problem. To assure fast recovery from link and node failures in
IP networks, we present a new recovery scheme called Multiple
Routing Configurations (MRC). Our proposed scheme guarantees
recovery in all single failure scenarios, using a single mechanism
to handle both link and node failures, and without knowing the
root cause of the failure. MRC is strictly connectionless, and
assumes only destination based hop-by-hop forwarding. MRC is
based on keeping additional routing information in the routers,
and allows packet forwarding to continue on an alternative
output link immediately after the detection of a failure. It can be
implemented with only minor changes to existing solutions. In
this paper we present MRC, and analyze its performance with
respect to scalability, backup path lengths, and load distribution
after a failure. We also show how an estimate of the traffic
demands in the network can be used to improve the distribution
of the recovered traffic, and thus reduce the chances of congestion
when MRC is used.
I. INTRODUCTION
In recent years the Internet has been transformed from a
special purpose network to an ubiquitous platform for a wide
range of everyday communication services. The demands on
Internet reliability and availability have increased accordingly.
A disruption of a link in central parts of a network has the potential
to affect hundreds of thousands of phone conversations
or TCP connections, with obvious adverse effects.
The ability to recover from failures has always been a central
design goal in the Internet [1]. IP networks are intrinsically
robust, since IGP routing protocols like OSPF are designed
to update the forwarding information based on the changed
topology after a failure. This re-convergence assumes full
distribution of the new link state to all routers in the network
domain. When the new state information is distributed, each
router individually calculates new valid routing tables.
This network-wide IP re-convergence is a time consuming
process, and a link or node failure is typically followed by a
period of routing instability. During this period, packets may
be dropped due to invalid routes. This phenomenon has been
studied in both IGP [2] and BGP context [3], and has an
adverse effect on real-time applications [4]. Events leading
to a re-convergence have been shown to occur frequently [5].
Much effort has been devoted to optimizing the different
steps of the convergence of IP routing, i.e., detection, dissemination
of information and shortest path calculation, but the
Manuscript received December 21, 2006, revised July 21 2007
All authors are with Simula Research Laboratory, Oslo, Norway
convergence time is still too large for applications with real
time demands [6]. A key problem is that since most network
failures are short lived [7], too rapid triggering of the reconvergence
process can cause route flapping and increased
network instability [2].
The IGP convergence process is slow because it is reactive
and global. It reacts to a failure after it has happened, and
it involves all the routers in the domain. In this paper we
present a new scheme for handling link and node failures
in IP networks. Multiple Routing Configurations (MRC) is a
proactive and local protection mechanism that allows recovery
in the range of milliseconds. MRC allows packet forwarding
to continue over pre-configured alternative next-hops immediately
after the detection of the failure. Using MRC as a
first line of defense against network failures, the normal IP
convergence process can be put on hold. This process is
then initiated only as a consequence of non-transient failures.
Since no global re-routing is performed, fast failure detection
mechanisms like fast hellos or hardware alerts can be used
to trigger MRC without compromising network stability [8].
MRC guarantees recovery from any single link or node failure,
which constitutes a large majority of the failures experienced
in a network [7]. MRC makes no assumptions with respect to
the root cause of failure, e.g., whether the packet forwarding
is disrupted due to a failed link or a failed router.
The main idea of MRC is to use the network graph and
the associated link weights to produce a small set of backup
network configurations. The link weights in these backup
configurations are manipulated so that for each link and
node failure, and regardless of whether it is a link or node
failure, the node that detects the failure can safely forward the
incoming packets towards the destination on an alternate link.
MRC assumes that the network uses shortest path routing and
destination based hop-by-hop forwarding.
The shifting of traffic to links bypassing the failure can
lead to congestion and packet loss in parts of the network [9].
This limits the time that the proactive recovery scheme can
be used to forward traffic before the global routing protocol is
informed about the failure, and hence reduces the chance that
a transient failure can be handled without a full global routing
re-convergence. Ideally, a proactive recovery scheme should
not only guarantee connectivity after a failure, but also do so in
a manner that does not cause an unacceptable load distribution.
This requirement has been noted as being one of the principal
challenges for precalculated IP recovery schemes [10]. With
MRC, the link weights are set individually in each backup
configuration. This gives great flexibility with respect to how
the recovered traffic is routed. The backup configuration used
after a failure is selected based on the failure instance, and
thus we can choose link weights in the backup configurations
that are well suited for only a subset of failure instances.
The rest of this paper is organized as follows. In Sec. II
we describe the basic concepts and functionality of MRC. We
then define MRC formally and present an algorithm used to
create the needed backup configurations in Sec. III. In Sec. IV,
we explain how the generated configurations can be used to
forward the traffic safely to its destination in case of a failure.
We present performance evaluations of the proposed method
in Sec. V. In Sec. VI, we discuss how we can improve the
recovery traffic distribution if we have an estimate of the
demands in the network. In Sec. VII, we discuss related work,
and finally we conclude in Sec. VIII.
II. MRC OVERVIEW
MRC is based on building a small set of backup routing
configurations, that are used to route recovered traffic on
alternate paths after a failure. The backup configurations differ
from the normal routing configuration in that link weights
are set so as to avoid routing traffic in certain parts of the
network. We observe that if all links attached to a node are
given sufficiently high link weights, traffic will never be routed
through that node. The failure of that node will then only
affect traffic that is sourced at or destined for the node itself.
Similarly, to exclude a link (or a group of links) from taking
part in the routing, we give it infinite weight. The link can
then fail without any consequences for the traffic.
Our MRC approach is threefold. First, we create a set of
backup configurations, so that every network component is
excluded from packet forwarding in one configuration. Second,
for each configuration, a standard routing algorithm like OSPF
is used to calculate configuration specific shortest paths and
create forwarding tables in each router, based on the configurations.
The use of a standard routing algorithm guarantees
loop-free forwarding within one configuration. Finally, we
design a forwarding process that takes advantage of the backup
configurations to provide fast recovery from a component
failure.
In our approach, we construct the backup configurations
so that for all links and nodes in the network, there is a
configuration where that link or node is not used to forward
traffic. Thus, for any single link or node failure, there will
exist a configuration that will route the traffic to its destination
on a path that avoids the failed element. Also, the backup
configurations must be constructed so that all nodes are
reachable in all configurations, i.e., there is a valid path with
a finite cost between each node pair. Shared Risk Groups
can also be protected, by regarding such a group as a single
component that must be avoided in a particular configuration.
In Sec. III, we formally describe MRC and how to generate
configurations that protect every link and node in a network.
Using a standard shortest path calculation, each router
creates a set of configuration-specific forwarding tables. For
simplicity, we say that a packet is forwarded according to a
configuration, meaning that it is forwarded using the forwarding
table calculated based on that configuration. In this paper
we talk about building a separate forwarding table for each
configuration, but we believe that more efficient solutions can
be found in a practical implementation.