31-10-2012, 05:25 PM
Whitepaper: Understanding Web Filtering Technologies
Attachment: bloxx_whitepaper_webfilter_us.pdf (Size: 1.09 MB / Downloads: 73)
ABSTRACT
The Internet is now a huge resource of information and plays an increasingly important role in business and education. However, without adequate controls in place, organisations are likely to be faced with a broad range of issues. These range from excessive personal use of the Internet during business hours, which impacts staff productivity, to legal risks if users access inappropriate content.
This has led to the emergence of web filtering products, which organisations can deploy to enable proactive management of Internet access for users.
This whitepaper discusses some of the technology options available for web filtering and reviews the benefits and limitations of each.
WHY IS WEB FILTERING REQUIRED?
The term web filtering is broadly used to describe the process and tools that companies can use to restrict or monitor their users’ Internet use.
The Internet provides an extremely effective way for organisations to increase productivity, lower costs and increase sales. It provides a quick and efficient way to undertake research, improve learning, interact with customers and transact business.
However, it is also a source of content that is very likely to be inappropriate for users to access, such as pornography, violence and racism, to name just a few. In addition to inappropriate content, there is a range of other content types, such as shopping, social networking and news, that can have a huge impact on productivity and, consequently, on a company's finances.
In addition, there are a number of legal risks, such as Employment and Sex Discrimination legislation, which companies may encounter if they do not proactively manage Internet access. In some circumstances this can lead to financial and reputational losses [1].
THE EMERGENCE OF WEB FILTERING
When the Internet consisted of only a few hundred thousand web sites, web filtering was relatively simple: IT management simply had to maintain lists of “good” and “bad” sites that users were or were not allowed to visit. This was usually done within a network element such as a Firewall.
However, the Internet quickly became so large that this approach could no longer be sustained. It is worth noting that during 2007 the Internet grew by 48% [2], and that by the end of that year the major search engines had indexed 46 billion web pages [3].
This growth also highlighted another issue: how to manage access to sites that may be either good or bad depending on the context. For example, a travel site could be “good” in a business context for some users but “bad” for others.
Web filtering quickly became established due to these two key drivers.
The remainder of this white paper discusses some of the alternative technologies used in web filtering.
BASIC WEB FILTERING - FIREWALLS
The most basic level of web filtering can be performed by a network Firewall. Whilst this allows a degree of filtering, there are significant performance implications, as the Firewall needs to inspect the traffic to identify the requested site before it can decide whether to allow or block it.
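To make that inspection step concrete, the following is a minimal sketch in Python, using a made-up request and not drawn from any particular firewall product, of how a device might parse an unencrypted HTTP request far enough to find the Host header before applying an allow/block rule (for HTTPS, the TLS SNI field would typically be inspected instead):

# Hypothetical request for illustration only.
raw_request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"User-Agent: demo\r\n\r\n"
)

def requested_host(request):
    # Scan the request headers for the Host field and return its value.
    for line in request.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            return line.split(b":", 1)[1].strip().decode()
    return None

print(requested_host(raw_request))  # www.example.com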
In addition, the reporting capability of a Firewall is very basic. It allows limited management review of web access, albeit after access to the website has occurred. It is also typically labour-intensive and inadequate for providing substantive proof that a particular user was responsible for any inappropriate access.
Black lists can be employed to list undesirable web addresses and prevent access to those sites. White lists can be used to list acceptable web addresses and are often used to restrict access to only those sites contained on the white list. The scale of the Internet is now such that maintaining these lists is a formidable challenge, and it is very frustrating for users with a genuine need to access a site to have to seek approval and wait for it to be added to the allowed URLs.
In more sophisticated solutions, black and white lists are used to define per-user exceptions to the general rules. For example, a user may not be allowed access to travel sites in general but is given access to low-cost airline websites to book flights.
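As an illustration, here is a small sketch in Python of how a black list, white-list exceptions and a category rule might be combined in a filtering decision. The domains, categories and list contents are invented for the example and are not taken from any real product:

# Illustrative only: example domains, categories and lists are hypothetical.
BLACK_LIST = {"blocked-site.example"}            # always blocked
WHITE_LIST_EXCEPTIONS = {"lowcostair.example"}   # allowed even if the category is blocked
BLOCKED_CATEGORIES = {"travel", "gambling"}      # categories this user may not visit

SITE_CATEGORIES = {                              # stand-in for a URL database lookup
    "lowcostair.example": "travel",
    "holidays.example": "travel",
    "news.example": "news",
}

def is_allowed(domain):
    if domain in BLACK_LIST:
        return False
    if domain in WHITE_LIST_EXCEPTIONS:          # exception to the category rule
        return True
    return SITE_CATEGORIES.get(domain) not in BLOCKED_CATEGORIES

print(is_allowed("holidays.example"))    # False - travel is blocked for this user
print(is_allowed("lowcostair.example"))  # True  - white-listed to allow booking flights
print(is_allowed("news.example"))        # True  - category not blocked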
URL DATABASE WEB FILTERING
One of the most prevalent methods of web filtering is to create a database of many millions of web addresses categorised according to their content, e.g. shopping, gambling, violence, etc. Typically known as URL databases, these have increased in size over time in an attempt to match the growth of the Internet.
The URL database can be used to effectively create a range of user Internet access profiles so that different groups of users get controlled access to the Internet.
In practice, this means that one user may get no access to the Internet, another user may only have access to a single web site, a secretary may have access to travel sites but not shopping, and a manager may have wider Internet access with the exception of inappropriate sites such as violence, racism and pornography.
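To illustrate how such profiles might be applied, here is a hypothetical sketch in Python of different user groups receiving different category-based access; the group names, categories and URLs are invented for the example and do not reflect any specific product:

# Hypothetical sketch of per-group access profiles driven by a categorised URL database.
URL_DATABASE = {
    "travel.example": "travel",
    "shop.example": "shopping",
    "adult.example": "pornography",
}

ACCESS_PROFILES = {
    "no_access": {"allow": set()},                        # no Internet access at all
    "secretary": {"deny": {"shopping", "pornography"}},   # travel allowed, shopping blocked
    "manager": {"deny": {"pornography", "violence", "racism"}},
}

def allowed(group, url):
    profile = ACCESS_PROFILES[group]
    category = URL_DATABASE.get(url, "uncategorised")
    if "allow" in profile:                    # white-list style profile
        return category in profile["allow"]
    return category not in profile["deny"]    # deny-by-category profile

print(allowed("secretary", "travel.example"))  # True
print(allowed("secretary", "shop.example"))    # False
print(allowed("manager", "adult.example"))     # False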
The demand for this type of approach has created a growing number of URL database builders who typically invest heavily to maintain and grow their URL lists.
CREATING AND MAINTAINING URL DATABASES
Many thousands of URLs are initially “harvested” each day using a variety of methods, but manual categorisation by human URL reviewers is still predominantly used to ensure accuracy.
Manual categorisation requires each URL reviewer to read the content and look at images on the website, decide the kind of web site it is and categorise it in the database accordingly. The accuracy of this approach is variable - it is relatively easy to spot a pornography web site for example, but less easy to identify anonymous proxy sites.
CHALLENGES WITH URL DATABASES
URL databases present a number of challenges.
Misclassification
With limited time available to categorise each site, any web site that deliberately seeks to mislead a reviewer (e.g. a cookery site which when examined in more depth turns out to be pornography) can easily be successful in having the inappropriate URL categorised as legitimate.
Misclassifications of web sites are extremely frustrating for users and are often a source of conflict between suppliers and customers.
Keeping pace with the growth and dynamic nature of the Internet
The Internet is said to be currently growing by approximately 7.5 million new or renamed web addresses each day. However, a URL classifier will typically review and classify only around 500 web addresses daily. To keep pace with this growth and to keep a URL database current and relevant would require around 15,000 classifiers. From a cost perspective this is clearly unrealistic.
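A quick back-of-envelope check of that staffing figure, using only the numbers quoted above:

# Figures taken from the text above; the result is the number of full-time
# classifiers needed just to keep pace with new or renamed addresses.
new_urls_per_day = 7_500_000
urls_reviewed_per_classifier_per_day = 500
print(new_urls_per_day // urls_reviewed_per_classifier_per_day)  # 15000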