16-05-2012, 04:00 PM
The Geometric Efficient Matching Algorithm for Firewalls
The Geometric Efficient Matching Algorithm for Firewalls.pdf (Size: 389.41 KB / Downloads: 60)
Abstract
Since firewalls need to filter all the traffic crossing the network perimeter, they should be able to sustain a very high
throughput, or risk becoming a bottleneck. Firewall packet matching can be viewed as a point location problem: Each packet (point) has
5 fields (dimensions), which need to be checked against every firewall rule in order to find the first matching rule. Thus, algorithms from
computational geometry can be applied. In this paper we consider a classical algorithm that we adapted to the firewall domain. We call
the resulting algorithm “Geometric Efficient Matching” (GEM). The GEM algorithm enjoys a logarithmic matching time performance.
However, the algorithm’s theoretical worst-case space complexity is O(n4) for a rule-base with n rules. Because of this perceived high
space complexity, GEM-like algorithms were rejected as impractical by earlier works. Contrary to this conclusion, this paper shows that
GEM is actually an excellent choice.
Based on statistics from real firewall rule-bases, we created a Perimeter rules model that generates random, but non-uniform, rulebases.
We evaluated GEM via extensive simulation using the Perimeter rules model. Our simulations show that on such rule-bases,
GEM uses near linear space, and only needs approximately 13MB of space for rule-bases of 5,000 rules. Moreover, with use of
additional space improving heuristics, we have been able to reduce the space requirement to 2-3MB for 5,000 rules.
But most importantly, we integrated GEM into the code of the Linux iptables open-source firewall, and tested it on real traffic loads.
Our GEM-iptables implementation managed to filter over 30,000 packets-per-second on a standard PC, even with 10,000 rules.
Therefore, we believe that GEM is an efficient, and practical, algorithm for firewall packet matching.
Index Terms—Network Communication, Network-level security and protection
F
1 INTRODUCTION
1.1 Motivation
The firewall is one of the central technologies allowing highlevel
access control to organization networks. Packet matching
in firewalls involves matching on many fields from the TCP
and IP packet header. At least five fields (protocol number,
source and destination IP addresses, and ports) are involved
in the decision which rule applies to a given packet. With
available bandwidth increasing rapidly, very efficient matching
algorithms need to be deployed in modern firewalls to ensure
that the firewall does not become a bottleneck.
Modern firewalls all use “first match” semantics [5], [40],
[43]: The firewall rules are numbered from 1 to n, and the
firewall applies the policy (e.g., pass or drop) associated with
the first rule that matches a given packet. See Fig 1 for an
illustrated example.
Firewall packet matching is reminiscent of the well studied
router packet matching problem. However, there are several
crucial differences which make the problems quite different.
First, unlike firewalls, routers use “longest prefix match”
semantics. Next, the firewall matching problem is 4- or 5-
dimensional, whereas router matching is usually 1- or 2-
dimensional: A router typically matches only on IP addresses,
and does not look deeper, into the TCP or UDP packet headers.
Finally, major firewall vendors support rules that utilize IP
Part of this work has appeared, in extended abstract form, in the 23rd
convention of IEEE Israel.
D. Rovniagin and A. Wool are with the School of Electrical Engineering,
Tel Aviv University, Ramat Aviv 69978, Israel. E-mail:
dmitry.rovniagin[at]gmail.com, yash[at]acm.org
address ranges, in addition to subnets or CIDR blocks:1 this
is the case for Check Point and Juniper—the main exception
is Cisco, that only supports individual IP addresses or subnets.
Therefore, firewalls require their own special algorithms.
1.2 Statefull Firewall Matching
Most modern firewalls are stateful. This means that after the
first packet in a network flow is allowed to cross the firewall,
all subsequent packets belonging to that flow, and especially
the return traffic, is also allowed through the firewall. This
statefulness has two advantages. First, the administrator does
not need to write explicit rules for return traffic—and such
return-traffic rules are inherently insecure since they rely on
source-port filtering (see discussion in [43] and Check Point’s
patent [29]). So stateful firewalls are fundamentally more
secure than simpler, stateless, packet filters. Second, state
lookup algorithms are typically simpler and faster than rulematch
algorithms, so statefulness potentially offers important
performance advantages.
Firewall statefulness is commonly implemented by two
separate search mechanisms: (i) a slow algorithm that implements
the “first match” semantics and compares a packet
to all the rules, and (ii) a fast state lookup mechanism that
checks whether a packet belongs to an existing open flow. In
many firewalls, the slow algorithm is a naive linear search
of the rule-base, while the state lookup mechanism uses a
1. It is possible to convert an arbitrary range of IP addresses into a collection
of subnets—however, as many as 62 subnets may be necessary to cover
a single IP address range, thus there is a great loss of efficiency in the
conversion.
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING ,VOL. 8, NO. 1, JAN-FEB 2011
2
access-list 101 permit tcp 12.20.51.0 255.255.255.0 host 1.2.3.4 gt 0
access-list 101 deny tcp 12.20.51.0 255.255.255.0 1.2.0.0 255.255.0.0 eq 135
Fig. 1. Excerpts from a Cisco PIX firewall configuration file, showing 2 rules. Both rules refer to the TCP protocol.
The source in both rules is the same subnet. The first rule has a single IP address as a destination but a range of
destination ports (1–65535), while the second rule has a range of destination IP addresses but a single destination
port. Note that a TCP packet with source IP 12.20.51.1, destination IP 1.2.3.4, and destination port 135 matches both
rules, but because of the first-match semantics, the first rule’s decision (“permit”) is triggered.
hash-table or a search-tree: This is the case for the opensource
firewalls pf [24] and iptables [23]. There are strong
indications that commercial firewalls use linear search for the
slow rule-match as well: E.g., Check Point rules are translated
into an assembly-like macro language called INSPECT [40]
with linear semantics, and the INSPECT language is simply
translated into bytecode. Moreover, the standard advise for
improving firewall performance, for all vendors, is to place the
most popular rules near the top of the rule-base (cf. [14], [7]).
This advise doesn’t make much sense if the firewall rearranges
the rules into a complex search data structure.
Note that a stateful firewall’s two-part design provides its
highest performance on long TCP connections, for which the
fast state lookup mechanism handles most of the packets.
However, connectionless UDP2 and ICMP traffic, and short
TCP flows, like those produced in extremely high volume by
Distributed Denial of Service attacks (cf. [19]), only activate
the “slow” algorithm, making it a significant bottleneck. Our
main result is that the “slow” algorithm does not need to be
slow, even in a software-only implementation running on a
general-purpose operating system. We show that the GEM algorithm
has a matching speed that is comparable to that of the
state lookups: In isolation the algorithm required under 1μsec
per packet, and our Linux GEM-iptables implementation
sustained a matching rate of over 30,000 packets-per-second
(pps), with 10,000 rules, without losing packets, on a standard
PC workstation.
1.3 Contributions
In this paper we revisit a classical algorithm from computational
geometry (cf. [10], [22]), and apply it to the firewall
packet matching. In the firewall context we call this algorithm
the Geometric Efficient Matching (GEM) algorithm. This
algorithm performs matching in O(d log n) time, where n is
the number of rules in the firewall rule-base and d is the
number of fields to match. The worst-case space complexity
of GEM is O(nd). For instance, for TCP and UDP we have
d = 4, giving a search time of O(log n) and worst case space
complexity of O(n4).
The GEM data structure allows easy control over the order
of fields to be matched. The data structure can be used for any
number of dimensions d, but typical values for firewall packet
matching are either d = 2 for opaque protocols like IPsec
(protocol 50 or 51) or d = 4 for TCP, UDP, and ICMP. We
2. Some firewalls treat UDP traffic as connection-oriented and perform state
lookups on UDP packets as well.
focus on the more difficult case for the algorithm, with d = 4,
in which the match fields are: source IP address, destination
IP address, and source and destination port numbers. This
fits TCP and UDP filtering, and also ICMP (using the 8-bit
message type and code instead of 16-bit port numbers).
Note that the worst-case space complexity can only be
caused by an unlucky rule-base structure, and not by the
packets that the firewall encounters. Furthermore, knowledge
of the rule-base does not help an attacker force the firewall into
poor performance since the search time is deterministically
logarithmic in the worst case—so GEM is not subject to
algorithmic complexity attacks [8], [3].
To address the worst-case space complexity, we propose two
approaches. One approach involves optimization heuristics.
The other is a time-space trade-off, which at the cost of a factor
ℓ slowdown in the search time, provides an ℓd−1 decrease in
the space complexity.
The next step in our evaluation of the GEM algorithm was
an extensive simulation study. Our simulations showed that, in
isolation, the algorithm required under 1μsec per packet, on a
standard PC, even for rule-bases of 10,000 rules. Furthermore,
we found that the worst case space complexity manifests itself
when the rule-base consists of uniformly-random rules.
However, real firewall rule-bases are far from random.
Rule-bases collected by the Lumeta (now AlgoSec) Firewall
Analyzer [42], [44] show that, e.g., the source port field is
rarely specified, and the destination port field is usually a
single port number (not a range) taken from a set of some
200 common values.
Based on statistics we gathered from real rule-bases, we
created a non-uniform model for random rule-base generation,
which we call the Perimeter rule model. On rule-bases generated
by this model, we found that the order of field evaluation
has a strong impact on the data structure size (several orders
of magnitude difference between best and worst). We found
that the evaluation order which results in the minimal space
complexity is: destination port, source port, destination IP
address, source IP address. With this evaluation order, the
growth rate of the data structure is nearly linear with the
number of rules. The data structure size for rule bases of 5,000
rules is ≈ 13MB, which is entirely practical. Using more
aggressive space optimizations allows us to greatly reduce the
data structure at a cost of a factor of 2 or 3 slowdown. For
instance, using 3-part heuristic division, we get a data structure
size of 2MB for 10,000 rules.
Beyond simulations, we created a fully functional GEM
implementation within the Linux iptables open-source
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING ,VOL. 8, NO. 1, JAN-FEB 2011
3
TABLE 1
Header field numbering.
number description space
0 source IP address 32bit
1 destination IP address 32bit
2 source port number 16bit
3 destination port number 16bit
4 protocol 8bit
firewall [23], and tested its performance in a laboratory
testbed. Our GEM-iptables Linux implementation sustained
a matching rate of over 30,000 pps, with 10,000 rules,
without losing packets. In comparison, the non-optimized
iptables could only sustain a rate of ≈ 2500 pps with
the same rule-base.
Thus, we conclude that the GEM algorithm is an excellent,
practical, algorithm for firewall packet matching: Its matching
speed is far better than the naive linear search, and its space
complexity is well within the capabilities of modern hardware
even for very large rule-bases.
Parts of this work have appeared, in extended abstract form,
in [28].
Organization: Section 2 formally defines the matching
problem and describes the GEM algorithm along with its
data structure. Section 3 describes the statistics we gathered
from firewall rule-bases. Section 4 introduces the non-uniform
Perimeter rule model, and describes the simulation results in
this model. Section 5 describes our iptables implementation
and the performance it achieved. Section 6 describes
the time-space trade-off and space optimizations. Section 7
describes related work, and we conclude with Section 8.
2 THE ALGORITHM
2.1 Definitions
The firewall packet matching problem finds the first rule that
matches a given packet on one or more fields from its header.
Every rule consists of set of ranges [li, ri] for i = 1, . . . , d,
where each range corresponds to the i-th field in a packet
header. The field values are in 0 ≤ li, ri ≤ Ui, where Ui =
232 − 1 for IP addresses, Ui = 65535 for port numbers, and
Ui = 255 for ICMP message type or code. Table 1 lists the
header fields we use (the port fields can double as the message
type and code for ICMP packets). For notation convenience
later on, we assign each of these fields a number, which is
also listed in the table.
Remarks:
• Most firewalls allow matching on additional header
fields, such as IP options, TCP flags, or even the packet
payload (so called “deep packet inspection”). However,
real rule-bases [44] very rarely use such futures. Nearly
all the firewall rules that we have seen only refer to the
five fields listed in Table 1.
• The description above, and the GEM algorithm, is
mostly suitable to firewalls whose rules use contiguous
ranges of IP addresses. This is not a limitation for enterprise
firewalls—we have never encountered an enterprise
firewall that uses non-contiguous masks.
• We use ‘∗’ to denote wildcard: An ‘∗’ in field i means
any value in [0,Ui].
• We are ignoring the action part of the rule (e.g., pass
or drop), since we are only interested in the matching
algorithm.
2.2 The Sub-Division of Space
In one dimension, each rule defines one range, which divides
space into at most 3 parts. It is easy to see that n possibly
overlapping rules define a subdivision of one-dimensional
space into at most (2n − 1) simple ranges. To each simple
range we can assign the number of the winner rule. This is
the first rule which covers the simple range.
In d-dimensions, we pick one of the axes and project all the
rules onto that axis, which gives us a reduction to the previous
one-dimension case, with a subdivision of the one dimension
into at most (2n − 1) simple ranges. The difference is that
each simple range corresponds to a set of rules in (d − 1)
dimensions, called active rules. We continue to subdivide the
(d−1) dimensional space recursively. We call each projection
onto a new axis a level of the algorithm, thus for a 4-
dimensional space algorithm we have 4 levels of subdivisions.
The last level is exactly a one-dimensional case—among all
the active rules, only the winner rule matters.
At this point we have a subdivision of d-dimensional space
into simple hyper-rectangles, each corresponding to single
winning rule. In Section 2.4 we shall see how to efficiently
create this subdivision of d-dimensional space, and how it
translates into an efficient search structure.
2.3 Dealing with the Protocol Field
Before delving into the details of the search data structure,
we first consider the protocol header field. The protocol
field is different from the other four fields: very few of the
256 possible values are in use, and it makes little sense to
define a numerical “range” of protocol values. This intuition
is validated by the data gathered from real firewalls (see
Section 3): The only values we saw in the protocol field in
actual firewall rules were those of specific protocols, plus the
wildcard ‘∗’, but never a non-trivial range.
Thus, the GEM algorithm only deals with single values in
the protocol field, with special treatment for rules with ‘∗’ as
a protocol. We preprocess the firewall rules into categories, by
protocol, and build a separate search data structure for each
protocol (including a data structure for the ‘∗’ protocol). The
actual geometric search algorithm only deals with 4 fields.
Now, a packet can only belong to one protocol—but it is
also affected by protocol = ‘∗’ rules. Thus every packet needs
to be searched twice: once in its own protocol’s data structure,
and once in the ‘∗’ structure. Each search yields a candidate
winner rule.3 We take the action determined by the candidate
with the lower number.
In the remainder of this paper, we focus on the TCP
protocol, which has d = 4 dimensions, although the same
3. If no rule matches, we assume that the packet matches an implicit default
catch-all rule with a maximal rule-number.
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING ,VOL. 8, NO. 1, JAN-FEB 2011
4
0 1 2 6 17 254 255
Protocol array
#Num of points
ORDER
Level 1
#Num of
points
Lower bound
Basic element
#Num of points
ORDER
Level 1
Level 2 Level 2 Level 2
Level 3 Level 3 Level 3 Level 3
Level 4 Level 4 Level 4 Level 4
Rule #
*
Fig. 2. Overall GEM data structure overview.
discussion applies for UDP and ICMP. In Section 3 we shall
see that TCP alone accounts for 75% of rules on real firewalls,
and collectively, TCP, UDP, and ICMP account for 93% of the
rules.
2.4 The Data Structure
The GEM search data structure consists of three parts. The first
part is an array of pointers, one for each protocol number,
along with a cell for the ‘∗’ protocol (as mentioned in
Section 2.3). We build the second and third parts of the search
data structure for each protocol separately.
The second part is a protocol database header, which
contains information about the order of data structure levels.
The order in which the fields of packet header are checked is
encoded as a 4-tuple of field numbers, using the numbering
of Table 1. The protocol database header also contains the
pointer to the first level and the number of simple ranges in
that level.
The third part represents the levels of data structure themselves.
Every level is a set of nodes, where each node is an
array. Each array cell specifies a simple range, and contains a
pointer to the next level node. In the last level the simple range
information contains the number of the winner rule instead of
the pointer to the next level. See Fig 2 for an illustration.
The basic cell in our data structure (i.e., an entry in the
sorted array which is a node in the structure) has a size of 12
bytes: 4 for the value of the left boundary of the range, 4 for
the pointer to the next level, and 4 for the number of cells in
the next-level node. The nodes at the deepest level are slightly
different, consisting of only 8 bytes: 4 for the left boundary
of the range and 4 for the number of winner rule.
Note that the order of levels is encoded in the protocol
database header, which gives us convenient control over the
field evaluation order.