In this project, we use Snowflake, Snowpark Python, and machine learning to predict customer churn and forecast user engagement patterns. We analyze customer demographics, data on shows, and subscription information to uncover insights that guide strategic decisions.
We kick off by establishing a connection to the Snowflake database using Snowpark Python.
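A minimal sketch of how such a connection could be set up. The `SNOWFLAKE_*` environment variable names and both helper functions are assumptions made for illustration, not taken from the project notebooks; `Session.builder.configs(...).create()` is the standard Snowpark entry point.

```python
import os


def connection_parameters_from_env(env=os.environ):
    """Collect Snowflake connection settings from SNOWFLAKE_* variables (hypothetical names)."""
    keys = ["account", "user", "password", "role", "warehouse", "database", "schema"]
    # Only keep settings that are actually present, so account defaults apply otherwise.
    return {
        key: env[f"SNOWFLAKE_{key.upper()}"]
        for key in keys
        if f"SNOWFLAKE_{key.upper()}" in env
    }


def create_session(params):
    """Open a Snowpark session from a dict of connection parameters."""
    # Imported lazily so the helper above works without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()
```

Calling `create_session(connection_parameters_from_env())` would then return a session object that can read tables, for example with `session.table(...)`.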
We curate customer demographics, shows, and subscription data from Snowflake for a comprehensive analysis.
Before diving into machine learning, we perform the necessary data preprocessing steps.
Before training, we select features with the chi-square test, keeping only the columns it identifies as significant. We then train a Random Forest model and evaluate it with metrics such as MSE and R².
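The selection and training steps can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic, the column names (`plan`, `region`, `churned`) and the 0.05 significance level are hypothetical, and a `RandomForestClassifier` is assumed because churn is treated here as a binary label. MSE and R² are computed on the predicted 0/1 labels.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def significant_columns(df, target, alpha=0.05):
    """Return the columns whose chi-square test against `target` gives p < alpha."""
    selected = []
    for col in df.columns.drop(target):
        contingency = pd.crosstab(df[col], df[target])
        p_value = chi2_contingency(contingency)[1]
        if p_value < alpha:
            selected.append(col)
    return selected


# Synthetic stand-in data: churn depends on the plan, not on the region.
rng = np.random.default_rng(0)
n = 1000
plan = rng.choice(["basic", "standard", "premium"], size=n)
region = rng.choice(["north", "south"], size=n)
churned = (rng.random(n) < np.where(plan == "basic", 0.7, 0.2)).astype(int)
df = pd.DataFrame({"plan": plan, "region": region, "churned": churned})

features = significant_columns(df, "churned")
X = pd.get_dummies(df[features])  # one-hot encode the selected categorical columns
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
mse = mean_squared_error(y_test, pred)  # equals the error rate for 0/1 labels
r2 = r2_score(y_test, pred)
print(features, round(mse, 3), round(r2, 3))
```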
See how much each feature contributes to our churn predictions:
Heatmap of the confusion matrix: a view of where our churn predictions are correct and where they go wrong.
The Churn Risk Spectrum: beyond predicting churn, we classify each customer's risk level as low, medium, or high.
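One way to derive such risk levels is to bucket the predicted churn probability. The sketch below is illustrative only: the cutoffs 0.33 and 0.66 and the function name `risk_level` are assumptions, and the probabilities would typically come from something like `model.predict_proba(X)[:, 1]`.

```python
def risk_level(churn_probability, low_cutoff=0.33, high_cutoff=0.66):
    """Map a churn probability in [0, 1] to 'low', 'medium', or 'high' (cutoffs are assumed)."""
    if churn_probability < low_cutoff:
        return "low"
    if churn_probability < high_cutoff:
        return "medium"
    return "high"


# Hypothetical predicted probabilities for three customers.
probabilities = [0.05, 0.40, 0.90]
levels = [risk_level(p) for p in probabilities]
print(levels)
```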
For detailed instructions, please refer to CF_Churn_prediction.ipynb on GitHub.
As with churn prediction, we start by connecting to the Snowflake database using Snowpark Python.
For time series forecasting, we focus on historical customer and subscription data, preparing it for in-depth analysis.
We convert the Snowflake data into a pandas DataFrame, aggregate it by 'view date', and derive additional features such as the number of days a customer spent on the platform before leaving.
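A small sketch of this aggregation, using made-up data in place of the Snowflake tables. All column names (`customer_id`, `view_date`, `minutes_watched`, `start_date`, `end_date`) are hypothetical; with Snowpark, the DataFrames would typically come from calling `to_pandas()` on a Snowpark DataFrame.

```python
import pandas as pd

# Stand-in for viewing data pulled from Snowflake.
views = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "view_date": pd.to_datetime(
        ["2023-01-01", "2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"]
    ),
    "minutes_watched": [30, 45, 20, 60, 15],
})

# Stand-in for subscription data; a missing end_date means the customer is still active.
subscriptions = pd.DataFrame({
    "customer_id": [1, 2],
    "start_date": pd.to_datetime(["2022-12-01", "2022-11-15"]),
    "end_date": pd.to_datetime(["2023-01-10", None]),
})

# Aggregate viewing activity per view date.
daily = views.groupby("view_date", as_index=False).agg(
    total_minutes=("minutes_watched", "sum"),
    active_customers=("customer_id", "nunique"),
)

# Days spent on the platform before leaving (NaN for customers who have not left).
subscriptions["days_before_leaving"] = (
    subscriptions["end_date"] - subscriptions["start_date"]
).dt.days

print(daily)
print(subscriptions)
```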
We choose the XGBoost Regressor for its ability to capture complex relationships in the data, and we tune parameters such as the learning rate, maximum depth, and number of estimators.
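A possible way to tune these three parameters is a grid search with time-ordered cross-validation. This is a sketch, not the notebook's procedure: the grid values, the function name `tune_xgb_regressor`, and the use of `GridSearchCV` with `TimeSeriesSplit` are all assumptions.

```python
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Hypothetical search grid over the learning rate, maximum depth, and number of estimators.
PARAM_GRID = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
    "n_estimators": [100, 300],
}


def tune_xgb_regressor(X, y, param_grid=PARAM_GRID, n_splits=3):
    """Grid-search an XGBRegressor, validating on later data than it was trained on."""
    # Imported here so the grid above can be inspected without XGBoost installed.
    from xgboost import XGBRegressor

    search = GridSearchCV(
        estimator=XGBRegressor(objective="reg:squarederror", random_state=0),
        param_grid=param_grid,
        cv=TimeSeriesSplit(n_splits=n_splits),
        scoring="neg_mean_squared_error",
    )
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```

`TimeSeriesSplit` is used instead of shuffled folds so that each validation fold lies after its training fold in time, which avoids leaking future observations into training.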
To streamline the entire process, we implement functions for creating datasets at the customer level, training models, and forecasting for various time periods.
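Such helper functions might look like the sketch below, which builds lag features for a single customer's series and forecasts recursively over a chosen horizon. The function names, the lag count, and the sample data are hypothetical, and `LinearRegression` stands in for the regressor so the example stays lightweight; any model with `fit` and `predict` could be passed instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_lag_dataset(series, n_lags=3):
    """Turn one customer's series into (X, y), where each row of X holds the previous n_lags values."""
    values = np.asarray(series, dtype=float)
    X = np.array([values[i - n_lags:i] for i in range(n_lags, len(values))])
    y = values[n_lags:]
    return X, y


def forecast(model, history, horizon, n_lags=3):
    """Recursive multi-step forecast: each prediction is fed back in as the newest lag."""
    window = list(np.asarray(history, dtype=float)[-n_lags:])
    predictions = []
    for _ in range(horizon):
        next_value = float(model.predict(np.array([window]))[0])
        predictions.append(next_value)
        window = window[1:] + [next_value]
    return predictions


# Hypothetical monthly viewing hours for one customer, following a simple linear trend.
history = [10, 12, 14, 16, 18, 20, 22, 24]
X, y = make_lag_dataset(history)
model = LinearRegression().fit(X, y)  # stand-in regressor
six_months = forecast(model, history, horizon=6)
print([round(v, 2) for v in six_months])
```

To work at the customer level, the same two functions could be applied to each group of a `groupby` on a customer identifier, with `horizon` set to 6 or 12 for monthly data.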
A sneak peek into the future of Spotflix:
Our XGBoost model is now ready to forecast customer viewing patterns for the next 6 months or 1 year, giving us insights to support better decisions. See how we predict engagement levels and pinpoint high-risk viewers:
For detailed instructions, please refer to CF_time_series_forecasting.ipynb on GitHub.
For step-by-step instructions on how to import .tml files into your ThoughtSpot cluster, please refer to How to use TML files.