Customer Churn Forecasting Using Gradient Boosting Machines


The ability to anticipate which customers might churn is a priority for any business. Accurate churn prediction models can guide effective retention programs, reducing customer defections and the significant revenue loss they cause. In the Telco industry, customer attrition is a major concern for service providers worldwide because the barriers to switching to a competing provider are low. To protect revenue margins, however, Telco marketers want to extend discounts or incentives only to those customers who are at risk of churning. By estimating the churn propensity of each service user, a Telco can segment its customer base and focus its marketing communications on the segments most at risk of switching service providers.


AlgoTactica has undertaken research and development to identify some of the factors driving churn behavior in the Telco industry. Based on these findings, we have developed design strategies that use gradient boosting machines to train regression tree ensembles which identify the customers most likely to churn.


This project analyzed 7043 customer profiles for a Telco that offered telephone and internet service. Each profile held 20 predictor variables, along with a classification (Y/N) variable indicating whether the customer had churned. A preliminary analysis determined that the most influential variables in predicting churn were the type of contract held by the customer and the length of time the customer had held service (tenure), along with the type of service and its cost.

Subsequently, a data drill-down procedure was designed to extract curves denoting the probability of churn associated with contract type and tenure. The Probability of Churn by Contract Type graph shows that the customers most likely to churn were those who held a month-to-month service contract. Focusing on this group, the Monthly Contract: Probability of Churn graph shows that those who purchased the high-speed fiber optic service were nearly twice as likely to churn as those who used the slower DSL service.
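
As a rough illustration of this kind of drill-down, the sketch below estimates churn probability by contract type, and then by contract type and tenure band, using pandas. The column names, the tenure bands, and the eight sample rows are hypothetical stand-ins invented for the example; they are not the project's data or schema.

```python
import pandas as pd

# Tiny made-up sample; column names and values are assumptions for illustration only.
profiles = pd.DataFrame({
    "contract": ["Month-to-month", "Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "One year", "Two year", "Two year"],
    "tenure_months": [2, 5, 14, 30, 8, 40, 20, 60],
    "churn": ["Y", "Y", "N", "Y", "N", "Y", "N", "N"],
})

# Convert the Y/N label to 0/1 so that a group mean is an empirical churn probability.
profiles["churned"] = (profiles["churn"] == "Y").astype(int)

# Churn probability by contract type.
by_contract = profiles.groupby("contract")["churned"].mean()

# Drill down further: bucket tenure into bands (arbitrary cut points),
# then cross contract type with tenure band.
profiles["tenure_band"] = pd.cut(
    profiles["tenure_months"], bins=[0, 12, 24, 72], labels=["0-12", "13-24", "25-72"]
)
by_contract_tenure = (
    profiles.groupby(["contract", "tenure_band"], observed=True)["churned"]
    .mean()
    .unstack()
)

print(by_contract)
print(by_contract_tenure)
```

Plotting each row of the second table against the tenure band would give one curve per contract type, which is the general shape of the graphs described above.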

After this exploratory data analysis, a classification tree model was designed to predict which customers would churn, based on the 20 variables in each customer profile. A randomized grid search tested various configurations in order to tune the model parameters for the best classification accuracy. During model testing on data not used for training, an AUC value of 0.85 was achieved, indicating that the model discriminates well between customers who churn and those who do not, and is therefore useful for identifying customers who are at high risk of churning.
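
A minimal sketch of this tune-then-test workflow is shown below, assuming scikit-learn's GradientBoostingClassifier and RandomizedSearchCV as stand-ins; the article does not name the library, the parameter ranges, or the validation scheme that were actually used. The data is synthetic (20 predictors, with an arbitrary class balance), so the resulting AUC says nothing about the project's reported 0.85.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the customer profiles: 20 predictors, binary churn label.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    weights=[0.73, 0.27], random_state=0,
)

# Hold out data that the search never sees, for the final AUC estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space; these ranges are assumptions, not the project's settings.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [2, 3, 4],
    "subsample": [0.6, 0.8, 1.0],
}

# Randomized search samples n_iter configurations instead of trying every combination.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# Score the held-out customers and measure AUC on the predicted churn probabilities.
churn_probability = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, churn_probability)
print(search.best_params_, round(test_auc, 3))
```

Scoring the search with "roc_auc" rather than plain accuracy is a design choice for this sketch: with imbalanced churn labels, accuracy can look high for a model that rarely flags any churner.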

Another important metric of predictive efficacy is model lift, which compares how often the model correctly identifies members of a target class against how often they would be found by random selection from the overall data set. On this measure our model also performed well: for over half of the data, the model was 1.8-3.4 times more likely to identify a churn candidate than random selection from the data set would be. The Model Lift by Sample Fraction graph displays these results.
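
One common way to compute lift by sample fraction is sketched below: rank customers by predicted churn score, take the top fraction, and divide the churn rate in that slice by the churn rate of the whole data set. The function name, labels, and scores are hypothetical, and this cumulative form is only one of several lift definitions; the article does not say which variant it used.

```python
def lift_by_fraction(labels, scores, fractions):
    """Return {fraction: lift}, where lift is the churn rate among the
    top-scored fraction of customers divided by the overall churn rate."""
    # Order the true labels from highest predicted score to lowest.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    overall_rate = sum(ranked) / len(ranked)
    lifts = {}
    for fraction in fractions:
        # Number of customers in the top slice (at least one).
        k = max(1, round(fraction * len(ranked)))
        top_rate = sum(ranked[:k]) / k
        lifts[fraction] = top_rate / overall_rate
    return lifts


# Made-up example: 1 = churned, 0 = stayed; scores are predicted churn probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

lifts = lift_by_fraction(labels, scores, [0.2, 0.5, 1.0])
print(lifts)
```

In this toy data the overall churn rate is 0.4 and both customers in the top 20% churned, so the lift there is 2.5; at a fraction of 1.0 the slice is the whole data set and the lift is 1.0 by construction.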
