27-09-2012, 01:54 PM
Fast IP Network Recovery using Multiple Routing Configurations
Fast IP Network Recovery.pdf (Size: 153.97 KB / Downloads: 39)
Abstract
As the Internet takes an increasingly central role
in our communications infrastructure, the slow convergence of
routing protocols after a network failure becomes a growing
problem. To assure fast recovery from link and node failures in
IP networks, we present a new recovery scheme called Multiple
Routing Configurations (MRC). MRC is based on keeping
additional routing information in the routers, and allows packet
forwarding to continue on an alternative output link immediately
after the detection of a failure. Our proposed scheme guarantees
recovery in all single failure scenarios, using a single mechanism
to handle both link and node failures, and without knowing the
root cause of the failure. MRC is strictly connectionless, and
assumes only destination based hop-by-hop forwarding. It can
be implemented with only minor changes to existing solutions.
In this paper we present MRC, and analyze its performance with
respect to scalability, backup path lengths, and load distribution
after a failure.
INTRODUCTION
In recent years the Internet has been transformed from a
special purpose network to an ubiquitous platform for a wide
range of everyday communication services. The demands on
Internet reliability and availability have increased accordingly.
A disruption of a link in central parts of a network has the potential
to affect hundreds of thousands of phone conversations
or TCP connections, with obvious adverse effects.
The ability to recover from failures has always been a central
design goal in the Internet [1]. IP networks are intrinsically
robust, since IGP routing protocols like OSPF are designed
to update the forwarding information based on the changed
topology after a failure. This re-convergence assumes full
distribution of the new link state to all routers in the network
domain. When the new state information is distributed, each
router individually calculates new valid routing tables.
MRC OVERVIEW
MRC is based on using a small set of backup routing
configurations, where each of them is resistant to failures of
certain nodes and links. Given the original network topology, a
configuration is defined as a set of associated link weights. In a
configuration that is resistant to the failure of a particular node
n, link weights are assigned so that traffic routed according to
this configuration is never routed through node n. The failure
of node n then only affects traffic that is sent from or destined
to n. Similarly, in a configuration that is resistant to failure
of a link l, traffic routed in this configuration is never routed
over this link, hence no traffic routed in this configuration is
lost if l fails. In MRC, node n and link l are called isolated
in a configuration, when, as described above, no traffic routed
according to this configuration is routed through n or l.
Termination
The algorithm runs through all nodes trying to make them
isolated in one of the backup configurations. If a node cannot
be isolated in any of the configurations, the algorithm terminates
without success. However, the algorithm is designed so
that any biconnected topology will result in a successful termination,
if the number of configurations allowed is sufficiently
high.
For an intuitive proof of this, look at a situation where
the number of configurations created is |V|. In this case,
the algorithm will only isolate one node in each backup
configuration. In biconnected topologies any node can be
removed, i.e. isolated, without disconnecting the network, and
hence invariant 1 above is not violated. Along with a node vi,
all attached links except one (ei,j ) can be isolated. By forcing
node vj to be the next node processed (line 39), and the link
ej,i to be first among Ej (line 41), we guarantee that ej,i and vj
can be isolated in the next configuration. This can be repeated
until we have configurations so that every node and link is
isolated. This holds also for the last node processed, since its
last link will always lead to a node that is already isolated in
another configuration.
LOCAL FORWARDING PROCESS
The algorithm presented in Sec. III creates a set of backup
configurations. Based on these, a standard shortest path algorithm
is used in each configuration, to calculate configuration
specific forwarding tables. In this section, we describe how
these forwarding tables are used to avoid a failed component.
When a packet reaches a point of failure, the node adjacent
to the failure, called the detecting node, is responsible for finding
the configuration where the failed component is isolated,
and to forward the packet according to this configuration.
With our proposal, the detecting node must find the correct
configuration without knowing the root cause of failure.
PERFORMANCE EVALUATION
MRC is a local, proactive recovery scheme that resumes
packet forwarding immediately after the failure is detected,
and hence provides fast recovery. State requirements and the
influence on network traffic are other important metrics, which
will be evaluated in this section.
MRC requires the routers to store additional routing configurations.
The amount of state required in the routers, is related
to the number of such backup configurations. Since routing in
a backup configuration is restricted, MRC will potentially give
backup paths that are longer than the optimal paths. Longer
backup paths will affect the total network load and also the
end-to-end delay. We use a routing simulator to evaluate these
metrics on a wide range of synthetic topologies. We also use a
packet simulator to study the effect of failures on the network
traffic in one selected topology.
RELATED WORK
Much work has lately been done to improve robustness
against component failures in IP networks [22]. In this section,
we focus on some important contributions aimed at restoring
connectivity without a global re-convergence. Tab. III
summarizes important features of the different approaches.
We indicate whether each mechanism guarantees one-fault
tolerance in an arbitrary biconnected network, for both link
and node failures, independent of the root cause of failure
(failure agnostic). We also indicate whether the approaches
guarantees shortest path routing in the failure-free case, and
whether they solve the ”last hop problem”.
CONCLUSION AND FUTURE WORK
We have presented Multiple Routing Configurations as an
approach to achieve fast recovery in IP networks. MRC is
based on providing the routers with additional routing configurations,
allowing them to forward packets along routes that
avoid a failed component. MRC guarantees recovery from
any single node or link failure in an arbitrary biconnected
network. By calculating backup configurations in advance, and
operating based on locally available information only, MRC
can act promptly after failure discovery.
MRC operates without knowing the root cause of failure,
i.e., whether the forwarding disruption is caused by a node
or link failure. This is achieved by using careful link weight
assignment according to the rules we have described. The link
weight assignment rules also provide basis for specification of
a forwarding procedure that successfully solves the last hop
problem.
Fast IP Network Recovery.pdf (Size: 153.97 KB / Downloads: 39)
Abstract
As the Internet takes an increasingly central role
in our communications infrastructure, the slow convergence of
routing protocols after a network failure becomes a growing
problem. To assure fast recovery from link and node failures in
IP networks, we present a new recovery scheme called Multiple
Routing Configurations (MRC). MRC is based on keeping
additional routing information in the routers, and allows packet
forwarding to continue on an alternative output link immediately
after the detection of a failure. Our proposed scheme guarantees
recovery in all single failure scenarios, using a single mechanism
to handle both link and node failures, and without knowing the
root cause of the failure. MRC is strictly connectionless, and
assumes only destination based hop-by-hop forwarding. It can
be implemented with only minor changes to existing solutions.
In this paper we present MRC, and analyze its performance with
respect to scalability, backup path lengths, and load distribution
after a failure.
INTRODUCTION
In recent years the Internet has been transformed from a
special purpose network to an ubiquitous platform for a wide
range of everyday communication services. The demands on
Internet reliability and availability have increased accordingly.
A disruption of a link in central parts of a network has the potential
to affect hundreds of thousands of phone conversations
or TCP connections, with obvious adverse effects.
The ability to recover from failures has always been a central
design goal in the Internet [1]. IP networks are intrinsically
robust, since IGP routing protocols like OSPF are designed
to update the forwarding information based on the changed
topology after a failure. This re-convergence assumes full
distribution of the new link state to all routers in the network
domain. When the new state information is distributed, each
router individually calculates new valid routing tables.
MRC OVERVIEW
MRC is based on using a small set of backup routing
configurations, where each of them is resistant to failures of
certain nodes and links. Given the original network topology, a
configuration is defined as a set of associated link weights. In a
configuration that is resistant to the failure of a particular node
n, link weights are assigned so that traffic routed according to
this configuration is never routed through node n. The failure
of node n then only affects traffic that is sent from or destined
to n. Similarly, in a configuration that is resistant to failure
of a link l, traffic routed in this configuration is never routed
over this link, hence no traffic routed in this configuration is
lost if l fails. In MRC, node n and link l are called isolated
in a configuration, when, as described above, no traffic routed
according to this configuration is routed through n or l.
Termination
The algorithm runs through all nodes trying to make them
isolated in one of the backup configurations. If a node cannot
be isolated in any of the configurations, the algorithm terminates
without success. However, the algorithm is designed so
that any biconnected topology will result in a successful termination,
if the number of configurations allowed is sufficiently
high.
For an intuitive proof of this, look at a situation where
the number of configurations created is |V|. In this case,
the algorithm will only isolate one node in each backup
configuration. In biconnected topologies any node can be
removed, i.e. isolated, without disconnecting the network, and
hence invariant 1 above is not violated. Along with a node vi,
all attached links except one (ei,j ) can be isolated. By forcing
node vj to be the next node processed (line 39), and the link
ej,i to be first among Ej (line 41), we guarantee that ej,i and vj
can be isolated in the next configuration. This can be repeated
until we have configurations so that every node and link is
isolated. This holds also for the last node processed, since its
last link will always lead to a node that is already isolated in
another configuration.
LOCAL FORWARDING PROCESS
The algorithm presented in Sec. III creates a set of backup
configurations. Based on these, a standard shortest path algorithm
is used in each configuration, to calculate configuration
specific forwarding tables. In this section, we describe how
these forwarding tables are used to avoid a failed component.
When a packet reaches a point of failure, the node adjacent
to the failure, called the detecting node, is responsible for finding
the configuration where the failed component is isolated,
and to forward the packet according to this configuration.
With our proposal, the detecting node must find the correct
configuration without knowing the root cause of failure.
PERFORMANCE EVALUATION
MRC is a local, proactive recovery scheme that resumes
packet forwarding immediately after the failure is detected,
and hence provides fast recovery. State requirements and the
influence on network traffic are other important metrics, which
will be evaluated in this section.
MRC requires the routers to store additional routing configurations.
The amount of state required in the routers, is related
to the number of such backup configurations. Since routing in
a backup configuration is restricted, MRC will potentially give
backup paths that are longer than the optimal paths. Longer
backup paths will affect the total network load and also the
end-to-end delay. We use a routing simulator to evaluate these
metrics on a wide range of synthetic topologies. We also use a
packet simulator to study the effect of failures on the network
traffic in one selected topology.
RELATED WORK
Much work has lately been done to improve robustness
against component failures in IP networks [22]. In this section,
we focus on some important contributions aimed at restoring
connectivity without a global re-convergence. Tab. III
summarizes important features of the different approaches.
We indicate whether each mechanism guarantees one-fault
tolerance in an arbitrary biconnected network, for both link
and node failures, independent of the root cause of failure
(failure agnostic). We also indicate whether the approaches
guarantees shortest path routing in the failure-free case, and
whether they solve the ”last hop problem”.
CONCLUSION AND FUTURE WORK
We have presented Multiple Routing Configurations as an
approach to achieve fast recovery in IP networks. MRC is
based on providing the routers with additional routing configurations,
allowing them to forward packets along routes that
avoid a failed component. MRC guarantees recovery from
any single node or link failure in an arbitrary biconnected
network. By calculating backup configurations in advance, and
operating based on locally available information only, MRC
can act promptly after failure discovery.
MRC operates without knowing the root cause of failure,
i.e., whether the forwarding disruption is caused by a node
or link failure. This is achieved by using careful link weight
assignment according to the rules we have described. The link
weight assignment rules also provide basis for specification of
a forwarding procedure that successfully solves the last hop
problem.