19-12-2012, 05:44 PM
Lord of the Links: A Framework for Discovering Missing Links in the Internet Topology
Abstract
The topology of the Internet at the Autonomous
System (AS) level is not yet fully discovered despite significant
research activity. The community still does not know how many
links are missing, where these links are and finally, whether the
missing links will change our conceptual model of the Internet
topology. An accurate and complete model of the topology would
be important for protocol design, performance evaluation and
analyses. The goal of our work is to develop methodologies and
tools to identify and validate such missing links between ASes. In
this work, we develop several methods and identify a significant
number of missing links, particularly of the peer-to-peer type.
Interestingly, most of the missing AS links that we find exist as
peer-to-peer links at the Internet Exchange Points (IXPs). First,
in more detail, we provide a large-scale comprehensive synthesis
of the available sources of information. We cross-validate and
compare BGP routing tables, Internet Routing Registries, and
traceroute data, while we extract significant new information from
the less-studied Internet Exchange Points (IXPs). We identify
40% more edges and approximately 300% more peer-to-peer
edges compared to commonly used data sets. All of these edges
have been verified by either BGP tables or traceroute. Second,
we identify properties of the new edges and quantify their effects
on important topological properties. Given the new peer-to-peer
edges, we find that for some ASes more than 50% of their paths
stop going through their ISPs assuming policy-aware routing. A
surprising observation is that the degree of an AS may be a poor
indicator of which ASes it will peer with.
INTRODUCTION
AN ACCURATE and complete model of the Internet
topology is critical for future protocol design, performance
evaluation, simulation and analysis [1]. The current
initiatives of rethinking and redesigning the Internet and its
operation from scratch would also benefit from such a model.
However, it remains a challenge to develop an accurate
representation of the Internet topology at the AS level, despite
the recent flurry of studies [2]–[9]. Currently, there is a list of
sources that contain such topological information. The list includes
archives of BGP routing tables.
Related Work and Comparison
There has been a large number of measurement studies related
to topology discovery, with different goals, at different
times, and using different sources of information.
Our work has the following characteristics that distinguish it
from most other previous efforts, such as [9], [2]: 1) We make
extensive use of topological information from the Internet
Exchange Points to identify more edges. It turns out that IXPs
“conceal” many links which did not appear in most previous
topology studies. 2) We use a more sophisticated, comprehensive
and thorough tool [12] to filter the less accurate IRR
data, which was not used by previous studies. 3) We employ a
“guess-and-verify” approach for finding more edges by identifying
potential edges and validating them through targeted
traceroutes. This greatly reduced the number of traceroutes that
were needed. 4) We accept new edges conservatively and only
when they are confirmed by a BGP table or a traceroute. In
contrast, some of the previous studies included edges from IRR
without confirming them with traceroute.
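The "guess-and-verify" idea in points 3) and 4) can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the function name `guess_and_verify`, the `probe` callback (standing in for a targeted traceroute), and the AS numbers in the usage example are all assumptions made for the sketch.

```python
from typing import Callable, Iterable, Set, Tuple

Edge = Tuple[int, int]  # an undirected AS-level edge, stored as (low ASN, high ASN)


def normalize(a: int, b: int) -> Edge:
    """Store every edge in one canonical order so (a, b) and (b, a) match."""
    return (a, b) if a <= b else (b, a)


def guess_and_verify(
    candidates: Iterable[Edge],
    bgp_edges: Iterable[Edge],
    probe: Callable[[Edge], bool],
) -> Tuple[Set[Edge], int]:
    """Accept a candidate edge only if a BGP table or a probe confirms it.

    `probe` is a hypothetical stand-in for a targeted traceroute: it returns
    True when the edge is observed. Returns the accepted edges and the number
    of probes that were actually needed.
    """
    bgp = {normalize(a, b) for a, b in bgp_edges}
    accepted: Set[Edge] = set()
    seen: Set[Edge] = set()
    probes_sent = 0
    for a, b in candidates:
        edge = normalize(a, b)
        if edge in seen:  # skip duplicates, including reversed pairs
            continue
        seen.add(edge)
        if edge in bgp:  # already confirmed by BGP, no traceroute needed
            accepted.add(edge)
            continue
        probes_sent += 1  # probe only the edges that are still unconfirmed
        if probe(edge):
            accepted.add(edge)
    return accepted, probes_sent


# Usage with made-up AS numbers from the documentation range.
candidates = [(64500, 64501), (64501, 64500), (64502, 64503), (64504, 64505)]
bgp_edges = [(64501, 64500)]
accepted, probes_sent = guess_and_verify(
    candidates, bgp_edges, probe=lambda e: e == (64502, 64503)
)
```

Probing only the candidates that no BGP table already confirms is what keeps the number of traceroutes small; unconfirmed candidates are simply dropped, which matches the conservative acceptance rule described above.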
The most relevant previous work is done by Chang et al.
[2] with data collected in 2001. They identify new edges by
looking at several sources of topological information including
BGP tables and IRR. They estimate that 25%-50% of AS links
were missing from the Oregon Routeviews BGP table, the most commonly
used data set for AS topology studies. Their work was an
excellent first step towards a more complete topology.
FRAMEWORK FOR FINDING MISSING LINKS
In this section, we present a systematic framework for extracting
and synthesizing the AS level topology information
from different sources. The different sources have complementary
information of variable accuracy. Thus, we cannot
simply take the union of all the edges. A careful synthesis and
cross-validation is required. At the same time, we are interested
in identifying the properties of the missing AS links.
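To make the difference between a plain union and a cross-validated synthesis concrete, here is a small sketch. It assumes, as a simplification, that BGP tables and traceroute count as "trusted" sources while IRR and IXP data only suggest candidates; the source labels, function names, and AS numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def canonical(a: int, b: int) -> Edge:
    """Order the endpoints so each undirected edge has a single key."""
    return (a, b) if a <= b else (b, a)


def support_by_edge(sources: Dict[str, Iterable[Edge]]) -> Dict[Edge, Set[str]]:
    """Map each edge to the set of sources that report it."""
    support: Dict[Edge, Set[str]] = defaultdict(set)
    for name, edges in sources.items():
        for a, b in edges:
            support[canonical(a, b)].add(name)
    return dict(support)


def synthesize(
    sources: Dict[str, Iterable[Edge]],
    trusted: Iterable[str] = ("bgp", "traceroute"),
) -> Set[Edge]:
    """Keep an edge only if at least one trusted source reports it."""
    trusted_set = set(trusted)
    return {
        edge
        for edge, names in support_by_edge(sources).items()
        if names & trusted_set
    }


# Toy input: the IRR-only edge (64503, 64504) is in the union but not accepted.
sources = {
    "bgp": [(64500, 64501)],
    "irr": [(64501, 64500), (64502, 64503), (64503, 64504)],
    "traceroute": [(64503, 64502)],
}
support = support_by_edge(sources)
accepted = synthesize(sources)
```

The per-edge support sets also show which sources overlap, which is the kind of bookkeeping needed to ask where the missing edges tend to appear.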
In a nutshell, our study arrives at three major observations
regarding the properties of the missing AS links: 1) most of the
missing AS edges are of the peer-to-peer type; 2) most of the
missing AS edges from BGP tables appear in IRR; and 3) most
newfound AS edges are incident at IXPs. At different stages of
the research, these three observations direct us to discover even
more edges, some of which do not appear in any other source of
information currently.
Are We Missing a Lot More Peer Edges?
Currently, the ALL graph has approximately 20.9K peer-to-peer
edges. However, we were very conservative in adding edges
from IRRnc: we required that the edges be verified by RETRO.
So, a natural question is, how many more edges could we verify
from IRRnc if we had more RETRO servers? In other words, how
many edges could we be missing? We attempt to provide an estimate
by extrapolating the success of our method in finding new
edges. Given the results above, we expect that the new edges
would be of the peer-to-peer type.
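One simple way to read "extrapolating the success of our method" is a proportional estimate: apply the confirmation rate seen on the candidates that could be tested to the candidates that could not. The sketch below assumes exactly that, which may differ from the estimator actually used; the function name and the numbers in the usage example are illustrative only and are not results from the study.

```python
def extrapolate_missing(tested: int, confirmed: int, untested: int) -> float:
    """Estimate how many untested candidate edges would be confirmed.

    Assumption (not from the source): untested candidates would be confirmed
    at the same rate as the tested ones.
    """
    if tested <= 0:
        raise ValueError("need at least one tested candidate")
    if not 0 <= confirmed <= tested:
        raise ValueError("confirmed must be between 0 and tested")
    if untested < 0:
        raise ValueError("untested must be non-negative")
    success_rate = confirmed / tested
    return success_rate * untested


# Illustrative numbers only: 50 of 200 tested candidates were confirmed,
# and 1000 candidates could not be tested from any vantage point.
estimate = extrapolate_missing(tested=200, confirmed=50, untested=1000)
```

The estimate is only as good as the assumption behind it: if the candidates reachable from existing vantage points differ systematically from the unreachable ones, the true number could be higher or lower.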
CONCLUSION
In a nutshell, our work develops a systematic framework for
the cross-validation and the synthesis of most available sources
of topological information. We are able to find and confirm approximately
300% additional peer-to-peer edges. Furthermore, we recognize
that Internet Exchange Points (IXPs) hide significant topology
information and most of the newly discovered peer-to-peer AS
links are incident at IXPs. This is probably because
most missing peer-to-peer links are likely
to be at the middle or lower level of the Internet hierarchy, and
peering at some IXP is a cost-efficient way for the ASes to set up
peering relationships with other ASes. We show that by adding
these new AS links, some research results based on previous incomplete
topologies, such as routing decisions and ISP profit/cost,
change dramatically. Our study suggests that business-oriented
studies of the Internet should make a point of taking into consideration
as many peer-to-peer edges as possible.