Telecom Churn Analysis
Churn is one of the biggest threats to the telecommunications industry. Every telecommunications company deploys the models that best suit its needs to prevent the voluntary or involuntary churn of customers. This is called churn modelling. Below I will take you through the terms frequently used in building such a model.
- Churn represents the loss of an existing customer to a competitor
- A prevalent problem in retail:
  - Mobile phone services
  - Home mortgage refinance
  - Credit card
- Churn is a problem for any provider of a subscription service or recurring purchase
- Costs of customer acquisition and win-back can be high
- Much cheaper to invest in customer retention
- Difficult to recoup costs of customer acquisition unless the customer is retained for a minimum length of time
- Churn is especially important to mobile phone service providers
  - Easy for a subscriber to switch services
  - Phone number portability will remove the last important obstacle
Predicting Churn: Key to a Protective Strategy
- Predictive modelling can assist churn management
  - By tagging customers most likely to churn
- High risk customers should first be sorted by profitability
  - Campaign targeted to the most profitable at-risk customers
- Typical retention campaigns include
  - Incentives such as price breaks
  - Special services available only to select customers
- To be cost-effective retention campaigns must be targeted to the right customers
  - Customers who would probably leave without the incentive
  - Costly to offer incentives to those who would stay regardless
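The targeting steps above can be sketched in a few lines of base R. This is only an illustration: the `customers` data frame, its column names, the 0.5 risk threshold and the campaign size are all made-up assumptions, not part of the telecom data set used below.

```r
# Hypothetical scored customers: churn_prob would come from a fitted model,
# monthly_profit from billing data. All values here are invented.
customers <- data.frame(
  id             = 1:6,
  churn_prob     = c(0.80, 0.10, 0.65, 0.90, 0.30, 0.70),
  monthly_profit = c(20,   90,   60,   5,    40,   55)
)

# Step 1: tag the customers most likely to churn (threshold is an assumption).
at_risk <- customers[customers$churn_prob >= 0.5, ]

# Step 2: sort the at-risk group by profitability, most profitable first.
at_risk <- at_risk[order(-at_risk$monthly_profit), ]

# Step 3: target the top of the list, limited by a campaign budget.
campaign_size <- 2
targets <- head(at_risk, campaign_size)
print(targets)
```

Customer 4 has the highest churn probability but is left out because it is the least profitable of the at-risk group, which is the point of sorting by profitability before spending on incentives.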
Here we have a sample telecom data set on which we will run churn modelling using R code.
|
library(rattle) # The weather data set and normVarNames().
library(randomForest) # Impute missing values using na.roughfix().
library(rpart) # decision tree
library(tidyr) # Tidy the data set.
library(ggplot2) # Visualize data.
library(dplyr) # Data preparation and pipes %>%.
library(lubridate) # Handle dates.
library(corrgram)
|
Loading data directly from the web
|
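nm <- read.csv("http://www.sgi.com/tech/mlc/db/churn.names", skip=4, colClasses=c("character", "NULL"),
header=FALSE, sep=":")[[1]]
# stringsAsFactors=TRUE keeps Churn and the plan columns as factors (R >= 4.0 reads them as character by default)
dat <- read.csv("http://www.sgi.com/tech/mlc/db/churn.data", header=FALSE, col.names=c(nm, "Churn"), stringsAsFactors=TRUE)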
nobs <- nrow(dat)
colnames(dat)
dsname <- "dat"
ds <- get(dsname)
dim(ds)
(vars <- names(ds))
target<- 'Churn';
ds$phone.number<-NULL;
ds$churn<-(as.numeric(ds$Churn) - 1)
ds$Churn<-NULL
ds$state<-NULL
## Split ds into train and test
## 75% of the sample size
smp_size <- floor(0.75 * nrow(ds))
## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ds)), size = smp_size)
train <- ds[train_ind, ]
test <- ds[-train_ind, ]
dim(train)
dim(test)
corrgram(train, lower.panel=panel.ellipse, upper.panel=panel.pie);
|
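Because churners are usually a minority, it can be worth checking that a random split leaves a similar share of them in both partitions. The sketch below repeats the same 75/25 split on simulated labels; the `toy` data frame and its 15% churn rate are assumptions for illustration, not figures from the real data.

```r
set.seed(123)
# Simulated stand-in for ds$churn: roughly 15% churners (made-up rate).
toy <- data.frame(churn = rbinom(1000, 1, 0.15))

# Same 75/25 split as above.
smp_size  <- floor(0.75 * nrow(toy))
train_ind <- sample(seq_len(nrow(toy)), size = smp_size)
toy_train <- toy[train_ind, , drop = FALSE]
toy_test  <- toy[-train_ind, , drop = FALSE]

# Share of churners in each partition; a large gap suggests re-splitting
# or switching to a stratified sample.
balance <- c(train = mean(toy_train$churn), test = mean(toy_test$churn))
print(round(balance, 3))
```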
Fitting a Model
|
lm.fit <- lm(churn~., data=train);
# Multiple R-squared: 0.1784, Adjusted R-squared: 0.1724
|
How does the linear model perform?
|
pred.lm.fit<-predict(lm.fit, test);
# RMSE compares the predictions with the observed 0/1 churn labels
RMSE.lm.fit<-sqrt(mean((pred.lm.fit-test$churn)^2))
RMSE.lm.fit; #0.3232695
# building a simpler model, similar R2
lm.fit.step <- lm(churn ~ international.plan + voice.mail.plan + total.day.charge +
total.eve.minutes + total.night.charge + total.intl.calls +
total.intl.charge + number.customer.service.calls, data=train);
# Multiple R-squared: 0.1767, Adjusted R-squared: 0.174
pred.lm.fit.step <-predict(lm.fit.step, test);
RMSE.lm.fit.step <-sqrt(mean((pred.lm.fit.step-test$churn)^2))
RMSE.lm.fit.step; #0.3227848 <- simpler, and better RMSE
|
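An RMSE figure is easier to judge next to a naive baseline. A minimal sketch, using a hypothetical `rmse()` helper and a made-up label vector: predicting the average churn rate for every customer gives an RMSE of sqrt(p * (1 - p)), where p is the churn rate, and a useful model should beat that.

```r
# Hypothetical helper; the same formula is written out inline in the code above.
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))

# Toy 0/1 labels with a 20% churn rate (invented for illustration).
actual <- c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)

# Baseline: predict the overall churn rate for everyone.
# In practice the rate would be estimated on the training set.
p <- mean(actual)
baseline_pred <- rep(p, length(actual))

baseline_rmse <- rmse(baseline_pred, actual)
print(baseline_rmse)          # equals sqrt(p * (1 - p))
print(sqrt(p * (1 - p)))
```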
How does the Logistic Regression perform?
|
# logistic regression using a generalized linear model
glm.step <- glm(churn ~ international.plan + voice.mail.plan + total.day.charge +
total.eve.minutes + total.night.charge + total.intl.calls +
total.intl.charge + number.customer.service.calls, family = binomial, data = train)
pred.glm.step <- predict.glm(glm.step, newdata = test, type = "response")
RMSE.glm.step <- sqrt(mean((pred.glm.step-test$churn)^2))
RMSE.glm.step; #0.3179586 <- better than the linear model
|
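The logistic model returns probabilities, so turning them into churn / no-churn decisions needs a cut-off. The following sketch runs on simulated data; the `calls` predictor, the coefficients used to simulate it and the 0.5 threshold are all assumptions chosen for illustration.

```r
set.seed(1)
n <- 500
# Hypothetical predictor: number of customer service calls.
calls <- rpois(n, 2)
# Simulated churn labels whose log-odds rise with the number of calls.
churn <- rbinom(n, 1, plogis(-3 + 0.8 * calls))
toy   <- data.frame(calls, churn)

fit  <- glm(churn ~ calls, family = binomial, data = toy)
prob <- predict(fit, newdata = toy, type = "response")

# Convert probabilities to class labels with a 0.5 cut-off (an assumption;
# a retention campaign may prefer a lower threshold to catch more churners).
pred_class <- as.integer(prob >= 0.5)

# Confusion matrix and accuracy; fixed factor levels keep the table 2 x 2
# even if one class is never predicted.
conf_mat <- table(predicted = factor(pred_class, levels = 0:1),
                  actual    = factor(toy$churn,  levels = 0:1))
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

The confusion matrix shows how many churners are missed, which a single RMSE or accuracy number hides.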
How does the Decision Tree perform?
|
# build a decision tree based on the selected variables
rpart.fit.step <- rpart(churn ~ international.plan + voice.mail.plan + total.day.charge +
total.eve.minutes + total.night.charge + total.intl.calls +
total.intl.charge + number.customer.service.calls, data=train, method="class");
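pred.rpart.step <- predict(rpart.fit.step, test); # Mistake: without type="class" this returns a matrix of class probabilities
RMSE.rpart.step <- sqrt(mean((pred.rpart.step-test$churn)^2))
RMSE.rpart.step; #0.6742183 <- not a valid RMSE: the labels are recycled across both probability columns
# Correction: request class labels with type="class"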
pred.rpart.step <- as.numeric(predict(rpart.fit.step, test, type="class")) - 1;
RMSE.rpart.step <- sqrt(mean((pred.rpart.step-test$churn)^2))
RMSE.rpart.step; #0.2423902 <- better than the linear model
sum(pred.rpart.step==test$churn)/nrow(test)
# 0.941247, so 94% of the test cases are classified correctly
|
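The difference between the two `predict()` calls is easier to see on a small example. This sketch uses the `kyphosis` data set that ships with `rpart` rather than the churn data, so the variable names are unrelated to the model above.

```r
library(rpart)

# Classification tree on the kyphosis data bundled with rpart.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Default prediction for a classification tree: a matrix of class
# probabilities with one column per class.
p_default <- predict(fit, kyphosis)

# With type = "class": a factor holding one predicted label per row.
p_class <- predict(fit, kyphosis, type = "class")

print(dim(p_default))
print(class(p_class))
```

Subtracting a label vector from the two-column matrix silently recycles the labels over both columns, which is why the first RMSE in the code above is not comparable with the other models.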
How does the Random Forest perform?
|
# Build Random Forest Ensemble
set.seed(415)
rf.fit.step <- randomForest(as.factor(churn) ~ international.plan + voice.mail.plan + total.day.charge +
total.eve.minutes + total.night.charge + total.intl.calls +
total.intl.charge + number.customer.service.calls,
data=train, importance=TRUE, ntree=2000)
varImpPlot(rf.fit.step);
pred.rf.fit.step <- as.numeric(predict(rf.fit.step, test))-1;
RMSE.rf.fit.step <- sqrt(mean((pred.rf.fit.step-test$churn)^2))
RMSE.rf.fit.step; #0.2217221 improvement from the linear model, so a non-linear, decision tree approach is better
sum(pred.rf.fit.step==test$churn)/nrow(test)
# 0.9508393, so 95% of the test cases are classified correctly
|
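When both predictions and labels are 0 or 1, RMSE and accuracy carry the same information: each squared error is 1 for a misclassified case and 0 otherwise, so RMSE squared equals the error rate. A small check on made-up vectors, followed by the same arithmetic applied to the tree RMSE reported above:

```r
# Toy 0/1 labels and predictions (invented); positions 5 and 7 are wrong.
actual <- c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0)
pred   <- c(0, 0, 1, 0, 0, 0, 1, 1, 0, 0)

rmse     <- sqrt(mean((pred - actual)^2))
accuracy <- mean(pred == actual)

# For hard 0/1 predictions: RMSE^2 = 1 - accuracy.
print(all.equal(rmse^2, 1 - accuracy))

# Applying the identity to the decision tree RMSE of 0.2423902
# recovers its reported accuracy of about 0.941.
tree_accuracy <- 1 - 0.2423902^2
print(round(tree_accuracy, 4))
```

This also means the tree and forest RMSE values (computed from hard labels) are not measured on quite the same footing as the linear and logistic ones (computed from continuous predictions).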
Conclusion:
Algorithm | RMSE | Comment |
Linear Model | 0.3232695 | |
Simpler Linear Model | 0.3227848 | simpler, and slightly better RMSE than the full linear model |
Logistic Regression | 0.3179586 | better than the linear model |
Decision Tree (without type = "class") | 0.6742183 | not a valid comparison: predict() returned class probabilities instead of labels |
Decision Tree (with type = "class") | 0.2423902 | better than the linear model; 94% of test cases classified correctly |
Random Forest | 0.2217221 | best result; 95% of test cases classified correctly, so a non-linear, tree-based approach is better |