04-01-2016, 04:53 PM
Abstract:
Telecommunication companies generate a tremendous amount of data. These data include call detail data, which describes the calls that traverse the telecommunication networks; network data, which describes the state of the hardware and software components in the network; and customer data, which describes the telecommunication customers. This chapter describes how data mining can be used to uncover useful information buried within these data sets. Several data mining applications are described, and together they demonstrate that data mining can be used to identify telecommunication fraud, improve marketing effectiveness, and identify network faults.
INTRODUCTION
The telecommunications industry generates and stores a tremendous amount of data. These data include call detail data, which describes the calls that traverse the telecommunication networks; network data, which describes the state of the hardware and software components in the network; and customer data, which describes the telecommunication customers. The amount of data is so great that manual analysis is difficult, if not impossible. The need to handle such large volumes of data led to the development of knowledge-based expert systems. These automated systems performed important functions such as identifying fraudulent phone calls and identifying network faults. The problem with this approach is that it is time-consuming to obtain the knowledge from human experts (the “knowledge acquisition bottleneck”) and, in many cases, the experts do not have the requisite knowledge. The advent of data mining technology promised solutions to these problems, and for this reason the telecommunications industry was an early adopter of data mining technology.
Telecommunication data pose several interesting issues for data mining. The first concerns scale, since telecommunication databases may contain billions of records and are amongst the largest in the world. A second issue is that the raw data are often not suitable for data mining. For example, both call detail and network data are time-series data that represent individual events. Before these data can be mined effectively, useful “summary” features must be identified and the data must then be summarized using these features. A third issue is rarity: many data mining applications in the telecommunications industry involve predicting very rare events, such as the failure of a network element or an instance of telephone fraud. The fourth and final issue concerns real-time performance: many applications, such as fraud detection, require that any learned models or rules be applied in real time. Each of these four issues is discussed throughout this chapter within the context of real data mining applications.
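To make the summarization step concrete, here is a minimal sketch that turns individual call detail records (one event per call) into per-caller summary features that a data mining algorithm could consume. It is an illustration under stated assumptions, not the chapter's method: the `CallRecord` fields, the `summarize_calls` function, the night-time window (22:00 to 05:59), and the particular features chosen are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One call detail record (hypothetical fields, for illustration only)."""
    caller: str          # originating phone number
    start_hour: int      # hour of day the call started, 0-23
    duration_sec: int    # call length in seconds
    international: bool  # True if the call left the home country


def summarize_calls(records):
    """Aggregate individual call events into per-caller summary features."""
    # Group the time-series of call events by the originating number.
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.caller].append(rec)

    features = {}
    for caller, calls in grouped.items():
        n = len(calls)
        total_sec = sum(c.duration_sec for c in calls)
        features[caller] = {
            "num_calls": n,
            "total_minutes": total_sec / 60,
            "avg_duration_sec": total_sec / n,
            # Share of calls started between 22:00 and 05:59 (assumed window).
            "night_fraction": sum(
                1 for c in calls if c.start_hour >= 22 or c.start_hour < 6
            ) / n,
            # Share of calls that were international.
            "intl_fraction": sum(1 for c in calls if c.international) / n,
        }
    return features


if __name__ == "__main__":
    cdrs = [
        CallRecord("555-0100", 9, 120, False),
        CallRecord("555-0100", 23, 600, True),
        CallRecord("555-0199", 14, 30, False),
    ]
    for caller, feats in summarize_calls(cdrs).items():
        print(caller, feats)
```

Each caller becomes one row of fixed-length features instead of a variable-length sequence of events, which is the form most classification algorithms expect. In practice such summaries would typically be computed over a bounded time window so that the features track recent behavior.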