ABSTRACT
Just a few years back, no one would have thought that internet traffic would increase at such a rapid rate that even gigabit-capacity routers in the backbone would be insufficient to handle it. Today, routers with terabit switching capacities have become an essential requirement of the core backbone networks, and gigabit routers find their place at the mid-core or even the edge. This survey paper explains the issues in designing terabit routers and the solutions to them. It also discusses some of the key products in this area.
1. INTRODUCTION
In the present network infrastructure, the world's communication service providers are laying fiber at very rapid rates, and most fiber connections are now being terminated using DWDM. The combination of fiber and DWDM has made raw bandwidth available in abundance. 64-channel OC-192 capacity fibers are not uncommon these days, and OC-768 speeds will be available soon. Terabit routing technologies are required to convert these massive amounts of raw bandwidth into usable bandwidth. The present-day network infrastructure is shown in Fig 1. Currently, Add/Drop Multiplexers are used for spreading a high-speed optical interface across multiple lower-capacity interfaces of traditional routers. But carriers require high-speed router interfaces that can connect directly to the high-speed DWDM equipment to ensure optical interoperability. This will also remove the overhead associated with the extra technologies and enable more economical and efficient wide-area communications. As the number of channels transmitted on a single fiber increases with DWDM, routers must also scale port densities to handle all those channels. Along with interface speed and port density, the next thing routers need to improve is internal switching capacity: 64-channel OC-192 requires over a terabit of switching capacity. As an example, a current state-of-the-art gigabit router with 40 Gbps switch capacity can support only a 4-channel OC-48 DWDM connection. Four of these are required to support a 16-channel OC-48 DWDM connection, and 16 of these are required to support a 16-channel OC-192 DWDM connection, with a layer of sixteen 4:1 SONET Add/Drop Multiplexers in between. In comparison, a single router with terabit switching capacity can support a 16-channel OC-192 DWDM connection on its own. With this introduction, we now proceed to understand what is required to build full routers with terabit capacities.
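The capacity arithmetic above can be checked with a quick back-of-the-envelope sketch. The SONET line rates are approximate values in Gbps, and the helper name is ours, not from the paper:

```python
# Approximate SONET line rates in Gbps.
OC48 = 2.488
OC192 = 9.953

def switch_capacity_needed(channels, rate_gbps):
    """Aggregate capacity (Gbps, one direction) a router must switch
    to terminate a DWDM connection with the given channel count."""
    return channels * rate_gbps

# 64-channel OC-192: ~637 Gbps per direction, i.e. well over a terabit
# of total switching capacity once both directions are counted.
print(switch_capacity_needed(64, OC192))
```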
2.1 Router Functions
Functions of a router can be broadly classified into two main categories [Nick97]:
1. Datapath Functions: These functions are applied to every datagram that reaches the router and is successfully routed without being dropped at any stage. The main functions in this category are the forwarding decision, forwarding through the backplane, and output link scheduling.
2. Control Functions: These functions mainly include system configuration, management, and the update of routing table information. They do not apply to every datagram and are therefore performed relatively infrequently.
The goal in designing high-speed routers is to increase the rate at which datagrams are routed, and therefore the datapath functions are the ones to improve to enhance performance. We briefly discuss the major datapath functions here:
The Forwarding Decision: A routing table search is done for each arriving datagram, and the output port is determined based on the destination address. Also, the next-hop MAC address is appended to the front of the datagram, the time-to-live (TTL) field of the IP datagram header is decremented, and a new header checksum is calculated.
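The TTL decrement and checksum steps can be sketched as follows. This is a minimal illustration assuming a 20-byte IPv4 header without options, and it recomputes the checksum from scratch; real routers typically use the incremental update of RFC 1624 instead. Function names are ours:

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total > 0xFFFF:                    # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_rewrite(header: bytearray) -> bytearray:
    """Per-datagram rewrite: decrement TTL, recompute the checksum."""
    header[8] -= 1                           # TTL is byte 8 of the header
    header[10:12] = b"\x00\x00"              # zero checksum field first
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return header
```

A header whose checksum field is correct sums to zero under `ipv4_checksum`, which is how the rewrite can be verified.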
Forwarding through the Backplane: The backplane is the physical path between the input port and the output port. Once the forwarding decision is made, the datagram is queued until it can be transferred to the output port across the backplane. If there is not enough space in the queues, it may even be dropped.
Output Link Scheduling: Once a datagram reaches the output port, it is again queued before it can be transmitted on the output link. Most traditional routers maintain a single FIFO queue, but more advanced routers maintain separate queues for different flows or priority classes and carefully schedule the departure time of each datagram in order to meet various delay and throughput guarantees.
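Per-class output queueing can be sketched as a strict-priority scheduler. This is only one possible discipline (real routers often use weighted fair queueing or deficit round-robin for fairness), and the class layout is a made-up example:

```python
from collections import deque

class PriorityScheduler:
    """Strict-priority output link scheduling: one FIFO per class,
    and the highest-priority non-empty queue is always served first."""

    def __init__(self, num_classes=4):
        self.queues = [deque() for _ in range(num_classes)]

    def enqueue(self, datagram, priority):
        self.queues[priority].append(datagram)   # class 0 = highest

    def dequeue(self):
        for q in self.queues:                    # scan in priority order
            if q:
                return q.popleft()
        return None                              # link idle: nothing queued
```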
2.2 Evolution of Present Day Routers
The architecture of the earliest routers was based on that of a computer, as shown in Fig 2: a shared central bus, a central CPU, memory, and line cards for the input and output ports. The line cards provide MAC-layer functionality and connect to the external links. Each incoming packet is transferred to the CPU across the shared bus, the forwarding decision is made there, and the packet then traverses the shared bus again to reach the output port. The performance of these routers is limited mainly by two factors: first, the processing power of the central CPU, since route table search is a highly time-consuming task; and second, the fact that every packet has to traverse the shared bus twice.
3. SWITCHING Vs ROUTING
The basic difference between switching and routing is that switching uses 'indexing' into the address table to determine the next hop for a packet, whereas routing uses 'searching'. Since indexing is an O(1) operation, it is much faster than any search technique. Because of this, many people started thinking about replacing routers with switches wherever possible, and vendors flooded the market with products under the name of switches. To differentiate their products, vendors gave them different names, such as Layer 3 Switch, IP Switch, Layer 4 Switch, and Tag Switch, and regardless of what a product does, it is called a switch [Decis97] [Decis96] [Torrent]. It is therefore important to understand the difference between all these forms of switches.
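The index-versus-search distinction can be made concrete with a toy sketch: an exact-match switch table is one hash index, while a routing lookup must search mask lengths for the longest matching prefix. The table contents below are invented examples:

```python
# Exact-match "switching": MAC address -> port, a single O(1) index.
switch_table = {0x001A2B3C4D5E: 1}

# Routing table: (32-bit prefix, prefix length) -> next hop.
route_table = {(0x0A000000, 8): 2, (0x0A010000, 16): 3}

def switch(mac):
    return switch_table.get(mac)            # one table index

def route(addr):
    """Longest-prefix match by trying every mask, longest first."""
    for plen in range(32, -1, -1):
        prefix = addr & (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        hop = route_table.get((prefix, plen))
        if hop is not None:
            return hop                      # first hit is the longest match
    return None
```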
3.1 Switching Hubs
Switching hubs operate at Layer 1 of the OSI networking model. Individual ports are assigned to different LAN segments, as in a bridge. But while they are useful for managing configuration changes, it must be noted that they still propagate contention among their ports and are therefore different from Layer 2 bridges.
3.2 Layer 2 Switching
Layer 2 Switch is just another name for a multiport bridge. As we know, bridges are used to extend LANs without extending the contention domain, so Layer 2 switches have been used in some places to replace routers, connecting various LANs into one big flat network. The problem with this approach was broadcast traffic, which is propagated across all ports of a Layer 2 switch. To solve this problem, people soon came up with the concept of the "Virtual LAN" or VLAN. The basic idea of a VLAN is to divide one large LAN connected by Layer 2 switches into many independent and possibly overlapping LANs. This is done by limiting the forwarding of packets in these switches, and there are several ways of doing it:
Port based grouping: A packet arriving on a certain port may be forwarded to only a subset of all the ports.
Layer 2 address based grouping: The set of output ports is decided by looking at the Layer 2 address of the packet.
Layer 3 protocol based grouping: Bridges can also segregate traffic based on the Protocol Type field of the packet (2 bytes, between the Layer 2 and Layer 3 address fields).
Layer 3 subnet based grouping: For some Layer 3 protocols, such as IP, bridges may forward traffic only to other ports belonging to the same IP subnet. For this they have to look at the Layer 3 address of the packet.
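The first of these, port-based grouping, can be sketched in a few lines: the bridge forwards a frame only to ports that share the ingress port's VLAN. The port-to-VLAN assignments here are hypothetical:

```python
# Hypothetical port-based VLAN assignment: port number -> VLAN name.
vlan_of_port = {1: "eng", 2: "eng", 3: "sales", 4: "sales"}

def allowed_output_ports(ingress_port):
    """Subset of ports a frame from ingress_port may be forwarded to."""
    vlan = vlan_of_port[ingress_port]
    return {p for p, v in vlan_of_port.items()
            if v == vlan and p != ingress_port}
```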
In brief, VLAN switches modify the forwarding of bridged traffic. Devices referred to as Layer 3 VLAN Switches still operate at Layer 2, but they use some Layer 3 information.
ATM cell switching is an entirely new form of switching. Even though it is fundamentally different from a traditional LAN bridge, it is important to note that ATM switches fall into the category of Layer 2 products.
3.3 Layer 3 Switching
There is no consistent definition of "Layer 3 Switch", and the term refers to a wide variety of products. The only thing common to all of these devices is that they use Layer 3 information to forward packets. Therefore, as discussed in the previous section, even Layer 2 VLAN switches with protocol/subnet awareness are sometimes referred to as Layer 3 VLAN switches. Other products in this category are:
3.3.1 Layer 3 routing functionality in VLAN switches
There are several reasons for doing this. The pattern of network traffic is changing, and the 80-20 rule, which says that 80% of all network traffic is intra-LAN, is no longer valid. More traffic is crossing LAN boundaries these days, and to forward this kind of traffic, VLAN switches have to use Layer 3 routing functionality. Traditionally, these VLAN switches forwarded such traffic to route servers, but as this type of traffic increases, it makes more sense to build the functionality into the switches themselves. Many proprietary solutions are available for doing this.
3.3.2 Layer 2 ATM Switching with IP routing
The reason for doing this is that most service providers have invested heavily in ATM technology for their backbones and now need to map IP traffic onto it. There are two very different approaches to mapping Layer 3 traffic onto ATM circuits. The first approach aims at improving routing performance by separating the transmission of network control information from normal data traffic. Control traffic passes through the routers and route servers to set up a call, whereas normal data traffic is switched through the already established path. There are proprietary solutions for this, such as IP Switching, and there are standard techniques like MPOA (Multiprotocol over ATM) as well. The other approach addresses WAN route scalability issues: routing decisions are performed once at the entry point to the WAN, and the remaining forwarding decisions within the WAN are based on switching techniques. Tag Switching is one proprietary solution based on this approach, and the IETF is working to develop a standard, MPLS.
3.3.3 Route Caching
Since the number of internet hosts is increasing at an exponential rate, it is not possible to have an entry for each of them in every routing table. Routers therefore combine many entries that share the same next hop, but this worsens the already complex task of route search. To improve route lookup time, many products keep a route cache of frequently seen addresses. When an address is not found in the cache, the search goes through the traditional software-based slow path. Many products in this category combine Layer 2 switching features with route-cache-based routing, and vendors have named them Layer 3 Switches, Multilayer Switches, Routing Switches, and Switching Routers. Cache sizes range from 2,000 to 64,000 entries. Most of these products have a processor-based slow path for looking up routes on cache misses, but a few of them rely on an external router to perform these functions and are sometimes referred to as "Layer 3 Learning Bridges". The route cache technique scales poorly with routing table size and cannot be used for backbone routers that support large routing tables. Frequent topology changes and random traffic patterns also eliminate any benefit from the route cache, and performance is then bounded by the speed of the slow path.
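The cache-in-front-of-slow-path structure can be sketched as follows. This is a minimal model with LRU eviction (the class name and eviction policy are our choices, not taken from any particular product):

```python
from collections import OrderedDict

class RouteCache:
    """Route cache in front of a slow path: hits cost one dict lookup,
    misses fall back to the expensive full routing table search."""

    def __init__(self, slow_path_lookup, capacity=2000):
        self.cache = OrderedDict()
        self.slow_path = slow_path_lookup
        self.capacity = capacity

    def lookup(self, dest):
        if dest in self.cache:
            self.cache.move_to_end(dest)       # mark most recently used
            return self.cache[dest]
        hop = self.slow_path(dest)             # full longest-prefix search
        self.cache[dest] = hop
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return hop
```

The failure modes described above are visible in the sketch: random traffic keeps evicting entries, so every lookup degenerates to a `slow_path` call.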
3.3.4 Full Routing
Some of the latest products in the market perform full routing at very high speeds. Instead of using a route cache, these products perform a complete routing table search for every packet. They are often called Real Gigabit Routers, Gigabit Switching Routers, etc. By eliminating the route cache, these products offer predictable performance for all traffic at all times, even in the most complex internetworks. Unlike other forms of Layer 3 switches, they improve all aspects of routing to gigabit speeds, not just a subset. These products are suited for deployment in large-scale carrier backbones. Some of the techniques they use to improve route lookup are discussed later in the paper.
3.4 Switching Above Layer 3
Layer 4 Switching and Layerless Switching are the new buzzwords in the industry. Again, there is no consistent definition of what these products do. Vendors are adding the ability to look at Layer 4 header information, and sometimes more, into Layer 3 products and marketing them as Layer 4 or Layerless switches. Products operating at Layers 2 and 3 handle each packet the same way whether it is part of a long flow between two hosts or traveling alone. But at Layer 4 and higher there is awareness of the flows and the higher-level applications to which a packet belongs. This information can be used to classify packets into different categories and, depending on how the switch is architected, to provide differentiated services and implement service level agreements in the network.
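Flow awareness at Layer 4 amounts to classifying packets by their 5-tuple and mapping flows to service classes. The classifier rules below are hypothetical examples, not from the paper:

```python
def five_tuple(pkt):
    """Flow identity: the classic Layer 3/4 5-tuple."""
    return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])

def service_class(pkt):
    """Map a packet's flow to a service class (example rules only)."""
    if pkt["proto"] == "udp" and pkt["dport"] == 53:
        return "low-latency"        # e.g. DNS queries
    if pkt["proto"] == "tcp" and pkt["dport"] == 80:
        return "best-effort-web"
    return "default"
```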
4. EFFICIENT ROUTING TABLE SEARCH
One of the major bottlenecks in backbone routers is the need to compute the longest prefix match for each incoming packet. Data links now operate at gigabits per second or more and generate nearly 150,000 packets per second at each interface. New protocols such as RSVP require route selection based on Protocol Number, Source Address, Destination Port, and Source Port, making lookup even more time consuming. The speed of a route lookup algorithm is determined by the number of memory accesses and the speed of the memory; this should be kept in mind while evaluating the route lookup techniques described below.
4.1 Tree-based Algorithms
Each path in the tree from root to leaf corresponds to an entry in the forwarding table, and the longest prefix match is the longest path in the tree that matches the destination address of an incoming packet. In the worst case, finding the longest prefix match takes time proportional to the length of the destination address. The main idea in tree-based algorithms is that most nodes require storage for only a few children instead of all possible ones; they therefore make frugal use of memory at the cost of more memory lookups. But as memory costs drop, these algorithms are no longer the best ones to use. In this category, the Patricia-tree algorithm is one of the most common. [Craig99] [Kesh98]
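The tree idea can be sketched as a one-bit-at-a-time binary trie (the principle behind Patricia trees, without the path compression): each node stores only the children it actually needs, and lookup cost grows with address length. Prefixes are given as bit-strings for clarity; the class names are ours:

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = {}   # bit ('0'/'1') -> child; only needed ones stored
        self.next_hop = None

class BinaryTrie:
    """Binary trie for longest-prefix match on bit-string addresses."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits, next_hop):
        node = self.root
        for b in prefix_bits:
            node = node.children.setdefault(b, TrieNode())
        node.next_hop = next_hop

    def longest_prefix_match(self, addr_bits):
        node, best = self.root, None
        for b in addr_bits:
            if node.next_hop is not None:
                best = node.next_hop         # remember match seen so far
            node = node.children.get(b)
            if node is None:
                break                        # no deeper match possible
        else:
            if node.next_hop is not None:
                best = node.next_hop
        return best
```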
4.2 Techniques to Improve Route Lookup
Various techniques have been proposed to improve route lookup time [Kesh98]. They can be broadly classified into :
Hardware-oriented techniques: Some of these techniques are as simple as using more memory, as costs are dropping, and keeping a separate entry for each internet address. A longest prefix match is not required in this case, and the complexity of the search is reduced. Other techniques try to reduce memory access time by combining logic and memory together in a single device.
Table compaction techniques: These techniques exploit the fact that forwarding entries are sparsely distributed in the space of all internet addresses. They use a complicated compact data structure to store the forwarding table in the primary cache of a processor, which allows route lookup at gigabit speeds.
Hash based techniques: Hashing operates strictly on an exact-match basis, and the need for a longest prefix match therefore limits its use for route lookup. The solution is to try different masks and choose the one with the longest mask length. The choice of masks can be iterative or hierarchical, but neither of these solutions scales well with the size of the destination address.
4.3 Route Search at Gigabit Speeds
The solutions described above solve the route lookup problem in most cases. But with media speeds going up, a very careful implementation of one or more of these techniques, combined to gain the advantages of each, is required. With 1.5 million packets arriving per second, a router has only 672 nanoseconds to validate a packet, find an outgoing route, and send the packet on. Many vendors and university research groups have come up with innovative solutions for this. Details of most of the proprietary vendor solutions have not been disclosed, because of pending patents or similar reasons. Some of the well-known solutions are described below.
4.3.1 WASHU Algorithm [Craig99]: Developed at Washington University in St. Louis, this is a scalable algorithm that uses binary hashing. The algorithm computes a separate hash table for each possible prefix length and therefore maintains 33 hash tables in total. Instead of starting from the longest possible prefix, a binary search on the prefix lengths is performed: the search starts at table 16, and on a hit it looks for a longer match, otherwise for a shorter one. By itself this scheme has a bug: if the longest match is, say, a 17-bit prefix with no corresponding entry in table 16, nothing leads the search to the longer tables. Therefore markers are added, which also record the best matching shorter prefix. The algorithm then works as follows: hash the first 16 bits of the address and look in table 16. On finding a marker, save the best match recorded in the marker and look for a longer prefix at table 24. On finding a prefix, save it as the best match and likewise look for a longer prefix at table 24. On a miss, look for a shorter prefix. Continue until the tables are exhausted.
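A simplified sketch of binary search on prefix lengths follows. It uses bit-string prefixes and Python dicts in place of real hash tables, and each marker carries a precomputed best-matching prefix (`bmp`) so that a failed excursion into longer tables does not lose the correct shorter match, mirroring the role markers play in the full scheme. All names are ours:

```python
def best_match(prefixes, s):
    """Longest entry in `prefixes` that is a prefix of bit-string s."""
    best, best_len = None, -1
    for q, hop in prefixes.items():
        if len(q) <= len(s) and s.startswith(q) and len(q) > best_len:
            best, best_len = hop, len(q)
    return best

def build_tables(prefixes, max_len=32):
    """prefixes: {bit-string prefix: next hop}. One table per length."""
    tables = [dict() for _ in range(max_len + 1)]
    for p, hop in prefixes.items():
        tables[len(p)][p] = {"hop": hop, "bmp": None}
    for p in prefixes:        # plant markers along each prefix's probe path
        lo, hi = 1, max_len
        while lo <= hi:
            mid = (lo + hi) // 2
            if mid < len(p):
                slot = tables[mid].setdefault(p[:mid], {"hop": None, "bmp": None})
                if slot["bmp"] is None:
                    slot["bmp"] = best_match(prefixes, p[:mid])
                lo = mid + 1
            elif mid > len(p):
                hi = mid - 1
            else:
                break
    return tables

def lookup(tables, addr, max_len=32):
    """Binary search over prefix lengths; first probe is length 16."""
    best, lo, hi = None, 1, max_len
    while lo <= hi:
        mid = (lo + hi) // 2
        slot = tables[mid].get(addr[:mid])
        if slot is None:
            hi = mid - 1                     # miss: try shorter lengths
        else:
            if slot["hop"] is not None:
                best = slot["hop"]           # genuine prefix of length mid
            elif slot["bmp"] is not None:
                best = slot["bmp"]           # marker: remembered best match
            lo = mid + 1                     # entry present: try longer
    return best
```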
4.3.2 Stanford University's algorithm [Nick98] also performs very well, and all the details are available in a paper on their site. A brief description of how it works is given here. The algorithm exploits the fact that most of the prefixes in the route tables of backbone routers are shorter than 24 bits. The basic scheme uses two tables, both stored in DRAM. The first table (TBL24) stores all possible route prefixes that are up to, and including, 24 bits long; prefixes shorter than 24 bits are expanded, and multiple 24-bit entries are kept for them. The second table (TBLLong) stores all route prefixes in the routing table that are longer than 24 bits. Each entry in TBLLong corresponds to one of the 256 possible longer prefixes that share a single 24-bit prefix in TBL24. The first 24 bits of the address are used as an index into TBL24, and a single memory read is performed, yielding 2 bytes. If the first bit equals zero, the remaining 15 bits describe the next hop. Otherwise, the remaining 15 bits are multiplied by 256, the product is added to the last 8 bits of the original destination address, and this value is used as a direct index into TBLLong, which contains the next hop. The two memory accesses in different tables can be pipelined, and the algorithm allows 20 million packets per second to be processed. Fig 5 shows how the two tables are accessed to find the next hop.
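The two-table scheme can be sketched as follows, with Python dicts standing in for the DRAM arrays and 16-bit entries encoded as the paper describes (high bit 0: low 15 bits are the next hop; high bit 1: low 15 bits select a 256-entry TBLLong segment). This sketch assumes routes are added shortest-prefix first and that next hops fit in 15 bits; the class and method names are ours:

```python
class Dir24_8:
    """Sketch of the Stanford two-table route lookup."""

    def __init__(self):
        self.tbl24 = {}       # top-24-bit index -> 16-bit entry
        self.tbllong = {}     # segment * 256 + low 8 bits -> next hop
        self.nsegments = 0

    def add_route(self, prefix, plen, hop):
        """prefix: 32-bit int with host bits zero; hop fits in 15 bits."""
        base = prefix >> 8
        if plen <= 24:
            for i in range(1 << (24 - plen)):   # expand to 24-bit entries
                self.tbl24[base + i] = hop
        else:
            entry = self.tbl24.get(base, 0)
            if entry & 0x8000:
                seg = entry & 0x7FFF            # segment already allocated
            else:
                seg = self.nsegments            # allocate a segment, seeded
                self.nsegments += 1             # with the old shorter-route hop
                for j in range(256):
                    self.tbllong[seg * 256 + j] = entry
                self.tbl24[base] = 0x8000 | seg
            lo = prefix & 0xFF
            for i in range(1 << (32 - plen)):
                self.tbllong[seg * 256 + lo + i] = hop

    def lookup(self, addr):
        entry = self.tbl24.get(addr >> 8)       # first memory read: 2 bytes
        if entry is None:
            return None
        if entry & 0x8000 == 0:
            return entry                        # done after one access
        return self.tbllong[(entry & 0x7FFF) * 256 + (addr & 0xFF)]
```

In hardware the two reads hit different DRAM banks and are pipelined, which is what yields the quoted 20 million lookups per second.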