08-11-2016, 03:47 PM
Prepaid subscriber churn prediction in the telecommunications industry using machine learning algorithms
1. Abstract
With the rapid development of the telecommunication industry, service providers are increasingly focused on expanding their subscriber base. To survive in this competitive environment, the retention of existing customers has become a major challenge. Surveys of the telecom industry indicate that the cost of acquiring a new customer is far higher than that of retaining an existing one. Knowledge collected by telecom operators can therefore help predict whether or not customers will leave the company, so that the operator can take timely action to retain customers who are about to churn. Subscriber churn in telecom is defined as a subscriber closing the relationship with the telecom operator: a customer with a mobile connection visits the telecom service outlet (or calls the call center) and requests to close the connection, and this is called customer/subscriber churn. Telecom providers look for events, trends and patterns which can help identify subscribers who have a high chance of churning. This paper proposes a framework for the churn prediction model and implements it using machine learning algorithms in data mining. The efficiency and performance of decision tree and logistic regression techniques are compared.
2. Introduction
Over the last few decades, mobile telecommunication has emerged as the dominant medium of communication across the world. In several countries, market saturation has reached a level where every potential customer has to be won over from competitors. At the same time, standardization of mobile infrastructure and public regulation allow customers to port easily from one network to another, resulting in a fluid market. As a result, churn prediction and prevention have become one of the most crucial business analytics applications. Churn can be broadly classified into two types: voluntary and involuntary. Voluntary churn occurs when a customer initiates termination of a service contract, while involuntary churn occurs when customers are disconnected by the company for fraud, non-payment, or under-utilization of subscribed services.
In this paper, we present our experience with churn prediction and customer insights for an African telecom operator. We build a data mining model to predict churners using key performance indicators (KPIs) derived from customer Call Detail Records (CDRs) and additional customer data available with the operator.
3. Methodology
In order to successfully create and implement a churn prediction model, there needs to be a process and framework in place. The process begins with a clear definition of the business objectives, for example achieving an accuracy rate of 70-80% in predicting subscribers likely to churn.
Once objectives have been decided, consideration is then given to factors such as data availability, cleansing and final transformation. Data preparation is by far the most time-consuming element in the entire process. The required data is often located in disparate locations which need to be integrated into a central source. If the organization has a relational database management system, this will require a specialist with a strong SQL (Structured Query Language) background to extract the necessary information. Research suggests that as much as 70% of the entire prediction model development effort is spent on data preparation.
The model construction element is an iterative process, and various models based on different algorithms need to be tested and compared. Not all factors used in the model will be beneficial, hence the need to run numerous iterations. There are several types of models used in churn prediction; among the most commonly used are decision trees and logistic regression.
Before the modelling begins, the data is typically divided into two distinct groups: the training set and the test set. The split is approximately 70% training and 30% test. The model is constructed using the training data and then verified on the unseen test data to evaluate model performance.
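As a minimal sketch, the 70/30 split can be done in a few lines of Python; the rows below are placeholder (feature-vector, label) pairs, not the operator's actual data:

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows and split them into a training and a test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Placeholder data: (feature-vector, churn-label) pairs.
data = [([i, i % 7], i % 10 == 0) for i in range(1000)]
train, test = train_test_split(data)
print(len(train), len(test))  # 700 300
```

Shuffling before the split matters: data extracted from operational systems is often ordered (by subscriber ID or date), and an unshuffled cut would bias both sets.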
Once the model has been tested, it will need to undergo a final verification called cross-validation. This technique assists in the development and fine-tuning of the model by partitioning the training data set into cross-sections, using one of the partitions as a new test set and the remaining partitions as the training set. This process is repeated several times, confirming model robustness.
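The cross-validation partitioning described above can be sketched as a plain k-fold partitioner; this is for illustration, not the exact procedure used in the study:

```python
def k_fold_partitions(rows, k=5):
    """Yield (training, validation) pairs so that every partition
    serves as the validation set exactly once."""
    folds = [rows[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation

data = list(range(100))
for training, validation in k_fold_partitions(data):
    print(len(training), len(validation))  # 80 20 in each of the 5 rounds
```

Averaging model performance across the k rounds gives a more stable estimate than a single train/test split, at the cost of fitting the model k times.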
3.1 Data sampling
The dataset that we used was obtained from an African telecommunications company. The training data covers a period of 6 months (Nov 2015 - Apr 2016) and consists of both working and churned prepaid numbers whose age in the network is >= 180 days; the testing data covers a period of nearly 2 months (May 2016 - 20 June 2016).
Oversampling is a technique that alters the proportion of the outcomes in the training set. More specifically, it increases the proportion of the less frequent outcome, which makes a model more sensitive to the outcome that is least represented. Imagine that there are 100,000 observations, of which 99,000 are labeled non-churn and only 1,000 are labeled churn. Almost any model will classify a new observation as non-churn, as it will be right 99% of the time. This is exactly the case in our churn application: churn outcomes are under-represented compared to non-churn outcomes. Oversampling is used to increase the frequency of the churn outcomes. The proportions of churn and non-churn in the training set are 1/3 and 2/3 respectively, a typical split that works well [2]. The actual training set is built with all of the churn observations and is filled up with twice as many randomly selected non-churn observations.
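The sampling rule described above (all churn observations plus twice as many randomly selected non-churn observations) can be sketched in Python; the data here is synthetic:

```python
import random

def build_training_set(observations, seed=42):
    """Keep every churn observation and add twice as many randomly
    selected non-churn observations (a 1/3 vs 2/3 split)."""
    churn = [o for o in observations if o["churned"]]
    non_churn = [o for o in observations if not o["churned"]]
    rng = random.Random(seed)
    training = churn + rng.sample(non_churn, 2 * len(churn))
    rng.shuffle(training)
    return training

# Synthetic data: 1,000 churners among 100,000 subscribers.
observations = [{"id": i, "churned": i < 1000} for i in range(100000)]
training = build_training_set(observations)
print(len(training))  # 3000 rows, one third of them churners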
3.2 Data availability/Feature extraction
Feature extraction plays an important role in determining the performance of predictive models in terms of prediction rates for churn recognition. If a robust set of features can be extracted in this phase, the prediction rates can be significantly improved.
The following features are extracted:
Days since last recharge
No of active days in last 30 days
No of inactive days in the last 30 days
Days since last billable event made
No of customer care calls in last 30 days
Days since last voice call
Days since last incoming call
No of days spent with <= 0 balance in last 30 days
Available balance
Age in network
Recharge count in last 30 days
Outgoing voice call count in last 30 days
Incoming voice call count in last 30 days
Total revenue
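As an illustration, two of the listed features can be derived from a subscriber's recharge history; the dates and field names below are hypothetical:

```python
from datetime import date

def recharge_features(recharge_dates, today):
    """Derive two of the listed KPIs from a subscriber's recharge history."""
    days_since_last = (today - max(recharge_dates)).days
    count_last_30 = sum(1 for d in recharge_dates if (today - d).days <= 30)
    return {"days_since_last_recharge": days_since_last,
            "recharge_count_last_30_days": count_last_30}

history = [date(2016, 4, 1), date(2016, 4, 20), date(2016, 4, 28)]
print(recharge_features(history, today=date(2016, 4, 30)))
# {'days_since_last_recharge': 2, 'recharge_count_last_30_days': 3}
```

The remaining features follow the same pattern: each is an aggregation of raw CDR or recharge events over a fixed window relative to a reference date.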
3.3 Data cleaning
Noise is irrelevant information that causes problems for the subsequent processing steps; therefore, noisy data should be removed. This irrelevant information includes missing values and duplicated information (e.g. the same attributes with the same values appearing in different tables of a database). Such noise can be removed by finding its location and replacing it with correct values, or sometimes by deleting records if too many values are missing. However, obtaining such a good set of features is not an easy task.
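A minimal sketch of this cleaning step in Python, assuming records are simple dictionaries with None marking a missing value (the column names are hypothetical):

```python
def clean(rows, fill_values):
    """Drop exact duplicates and replace missing (None) values with a
    supplied correct value per column."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicated information
        seen.add(key)
        cleaned.append({k: fill_values[k] if v is None else v
                        for k, v in row.items()})
    return cleaned

rows = [{"balance": 5.0, "age_in_network": 200},
        {"balance": 5.0, "age_in_network": 200},   # exact duplicate
        {"balance": None, "age_in_network": 350}]  # missing value
print(clean(rows, fill_values={"balance": 0.0, "age_in_network": 0}))
```

In practice the replacement values would come from domain knowledge or column statistics (e.g. a median) rather than fixed constants.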
3.4 Model construction
There are several types of models used in churn prediction; the two employed here, logistic regression and decision trees, are described below.
Logistic regression:
Logistic regression, sometimes called the logistic model or logit model, analyzes the relationship between multiple independent variables and a categorical dependent variable, and estimates the probability of occurrence of an event by fitting data to a logistic curve. There are two forms of logistic regression: binary and multinomial. Binary logistic regression is typically used when the dependent variable is dichotomous and the independent variables are either continuous or categorical. When the dependent variable is not dichotomous and comprises more than two categories, multinomial logistic regression can be employed. Since our objective is to determine whether a subscriber will churn or not, we use the binary logistic model.
The odds of an event are the ratio of the probability that the event will occur to the probability that it will not. If the probability of the event occurring is p, the probability of it not occurring is (1 - p), and the corresponding odds are given by

odds of {Event} = p / (1 - p)
Since logistic regression calculates the probability of an event occurring over the probability of an event not occurring, the impact of the independent variables is usually explained in terms of odds. A simple linear model would relate the mean of the response variable p to an explanatory variable x through the equation

p = α + βx
With logistic regression we instead model the natural log odds as a linear function of the explanatory variable:

logit(p) = ln(odds) = ln(p / (1 - p)) = α + βx

Solving for p gives the logistic curve

p = e^(α + βx) / (1 + e^(α + βx))

where p is the response variable (the probability of churn) and x is the input variable.
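The model above can be evaluated directly in Python; the coefficients a (α) and b (β) below are placeholders, since in practice they would be fitted to the training data by maximum likelihood:

```python
import math

def odds(p):
    """Odds of an event: p / (1 - p)."""
    return p / (1 - p)

def churn_probability(x, a, b):
    """Binary logistic model: p = e^(a + b*x) / (1 + e^(a + b*x))."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

# With a = 0 and b = 0 the model is indifferent: p = 0.5, odds = 1.
p = churn_probability(x=10, a=0.0, b=0.0)
print(p, odds(p))  # 0.5 1.0
```

A positive β means the odds of churn are multiplied by e^β for each unit increase in x, which is how the fitted coefficients are usually interpreted.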
Decision Trees:
A decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables.
Decision trees can be split into classification and regression trees: classification trees are used to predict a categorical outcome, whereas regression trees are used for a continuous outcome. Since we are dealing with a binary outcome, i.e. churn, a classification tree is used. In a decision tree, each interior node corresponds to a variable, an arc to a child represents a possible value of that variable, and a leaf represents the outcome given the values of the variables along the path from the root. One of the advantages of decision trees is that they can be very easily interpreted, since they produce a set of understandable rules.
The decision tree identifies the most significant variable, and the value of that variable, which gives the most homogeneous sets of data. The question which arises is: how does it identify the variable and the split? To do this, decision trees use measures such as the Gini index and entropy.
Entropy: A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.
To build a decision tree, we need to calculate two types of entropy using frequency tables:
a) Entropy using the frequency table of one attribute (the target itself): E(S) = -Σ p_i log2(p_i), where p_i is the proportion of examples in class i.
b) Entropy using the frequency table of two attributes (the target against a candidate split X): E(T, X) = Σ_c P(c) E(c), i.e. the weighted average of the entropies of the subsets c produced by the split.
Information Gain:
Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches). The steps are as follows:
Calculate the entropy of the target.
Split the dataset on the different attributes, calculate the entropy for each branch, and subtract the resulting entropy from the entropy before the split. The result is the information gain.
Choose the attribute with the largest information gain as the decision node.
A branch with an entropy of 0 is a leaf node; a branch with entropy greater than 0 needs further splitting.
The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
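The entropy and information-gain calculations above can be sketched in Python; the tiny dataset and the "inactive" attribute are made up for illustration:

```python
import math

def entropy(labels):
    """Entropy of a list of class labels: 0 when homogeneous,
    1 when a binary set is split 50/50."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after
    splitting on one attribute (the ID3 criterion)."""
    total = entropy([r[target] for r in rows])
    n = len(rows)
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        total -= (len(subset) / n) * entropy(subset)
    return total

rows = [{"inactive": "yes", "churn": "yes"},
        {"inactive": "yes", "churn": "yes"},
        {"inactive": "no",  "churn": "no"},
        {"inactive": "no",  "churn": "yes"}]
print(entropy([r["churn"] for r in rows]))          # ~0.811
print(information_gain(rows, "inactive", "churn"))  # ~0.311
```

ID3 would pick the attribute with the largest such gain as the decision node, then recurse on each branch whose subset entropy is still above zero.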