1. Gradient boosting (#XGBoost or LGBM) is state of the art in the real world. I doubt they put much effort into tuning their benchmark models, so don't believe the claims of higher accuracy.
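For reference, this is roughly the level of tuning I'd expect from a fair boosting baseline. A rough sketch only: the search space and the synthetic dataset are my own placeholders, not anything from the paper.

```python
from lightgbm import LGBMClassifier
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a churn dataset (~15% churn rate)
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.85], random_state=0)

# Search space is my own rough choice, not the paper's
param_dist = {
    "num_leaves": randint(16, 256),
    "learning_rate": uniform(0.01, 0.2),
    "n_estimators": randint(100, 1000),
    "min_child_samples": randint(5, 100),
    "subsample": uniform(0.6, 0.4),
    "colsample_bytree": uniform(0.6, 0.4),
}

# Even a modest random search over these knobs is more "tuning"
# than most deep-learning papers give their boosting baselines
search = RandomizedSearchCV(LGBMClassifier(), param_dist, n_iter=50,
                            scoring="roc_auc", cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```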
2. #Churn models should *not* be evaluated with precision/recall but rather with AUC: true/false churn predictions are NEVER used in practice, only risk rankings. (Always use predict_proba for churn, never predict.)
Importantly, the precision/recall metrics they show in their results are sensitive to the classification thresholds, which are not detailed, and that's a tricky issue for imbalanced data. This is another reason not to believe the supposed accuracy improvement.
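A minimal sketch of what I mean, with synthetic data standing in for a churn set (all names here are my own placeholders, not the paper's):

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a churn dataset (~15% churn rate)
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

model = LGBMClassifier().fit(X_train, y_train)

# Risk ranking: what actually gets used downstream
churn_risk = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, churn_risk))

# Precision/recall swing with the (usually unreported) threshold
for threshold in (0.3, 0.5, 0.7):
    hard_preds = (churn_risk >= threshold).astype(int)
    print(f"t={threshold}: precision={precision_score(y_test, hard_preds):.2f} "
          f"recall={recall_score(y_test, hard_preds):.2f}")
```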
3. #Gradientboosting is VERY interpretable with the #SHAPley method. It is totally misleading to say their deep neural network is more interpretable and boosting is not. They are apparently ignorant of these important advances in interpretability, which are more than 5 years old now.
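A minimal SHAP example on a boosted model, assuming the shap package; the dataset and model are placeholders of my own:

```python
import shap
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic stand-in for a churn dataset (~15% churn rate)
X, y = make_classification(n_samples=5_000, n_features=10,
                           weights=[0.85], random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles, fast
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global feature importance and per-customer attributions in one plot
shap.summary_plot(shap_values, X)
```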
4. Despite a lot of talk about class imbalance, the churn datasets are not very imbalanced: 10-20% churn rates. Really imbalanced data means low single-digit churn rates.
Incentives are to present something new and flashy even if the real-world use case is small or nonexistent. Time series forecasting with deep learning has been a classic example of this problem: lots of "SOTA" models benchmarked against clearly untuned and inappropriate baselines.