Credit scoring is becoming increasingly vital in financial decisions. Forbes reported an average credit card debt of $5,474 per borrower in Q3 2022, totaling $38 billion. The intersection of technology and finance, notably in credit evaluation, is rapidly evolving. This project uses machine learning to classify applicants as 'good' or 'bad' credit risks, working from a dataset found on Kaggle, and offers insights into improving traditional financial models.
- The project dataset consists of application_record.csv and credit_record.csv, mergeable via the client number (ID).
  - application_record.csv includes personal/financial info (gender, car ownership, income, etc.): 17 columns and ~440,000 rows
  - credit_record.csv tracks monthly credit history, overdue days, and payments: 3 columns and ~1,000,000 rows
- The dataset was found on Kaggle while researching credit scoring and machine learning in finance.
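
The merge on the client number described above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the tiny in-memory frames stand in for the two CSV files, every column name other than `ID` is an assumption, and summarising the monthly history into one row per client before merging is one possible choice among several.

```python
import pandas as pd

# Tiny stand-ins for application_record.csv and credit_record.csv.
# Column names other than "ID" are assumptions for illustration.
applications = pd.DataFrame({
    "ID": [1, 2, 3],
    "CODE_GENDER": ["F", "M", "F"],
    "AMT_INCOME_TOTAL": [120000.0, 85000.0, 60000.0],
})
credit = pd.DataFrame({
    "ID": [1, 1, 2, 4],
    "MONTHS_BALANCE": [0, -1, 0, 0],
    "STATUS": ["0", "1", "C", "X"],
})

# The credit file holds one row per client per month, so collapse it
# to one row per client before joining to avoid duplicating applicants.
history = (
    credit.groupby("ID")
    .agg(months_on_record=("MONTHS_BALANCE", "count"))
    .reset_index()
)

# Inner join keeps only clients that appear in both files.
merged = applications.merge(history, on="ID", how="inner")
print(merged)
```

With the real files, the two frames would instead come from `pd.read_csv`. An inner join drops applicants with no credit history and history rows with no application; whether that is the desired behaviour depends on how the target label is defined.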
- De-duplication
- Dealing with Sparse Columns
- Handling Outliers
- Imputing Missing Values using MICE (Multiple Imputation by Chained Equations)
- Balancing the dataset using SMOTE (Synthetic Minority Over-sampling Technique)
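
To make the last step concrete, here is a simplified sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote`, the toy data, and the choice of `k=2` are all assumptions made for illustration; in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (e.g. 'bad' credit risks).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10  # assumed size of the majority class

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Oversampling of this kind is usually applied to the training split only, so that synthetic points never leak into the data used for evaluation.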
Random Forest performed the best after tuning hyperparameters. The results are shown below:
Based on the feature importances of the variables, we recommend:
- Treating age and employment as critical factors when approving credit card applications
- Tailoring strategies to different age groups and employment categories
- Considering personalized credit offerings based on family dynamics
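
For readers who want to reproduce this kind of analysis, the following is a minimal sketch of tuning a Random Forest and ranking feature importances with scikit-learn. It assumes a grid search over two hyperparameters and runs on synthetic data; the feature names are hypothetical labels attached to synthetic columns, and neither the grid nor the scoring metric is taken from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the names below are hypothetical labels only.
feature_names = ["age", "years_employed", "income", "family_size"]
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_informative=3,
    n_redundant=1,
    random_state=0,
)

# Assumed, deliberately small hyperparameter grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X, y)

# Impurity-based importances of the best model; they sum to 1.
importances = search.best_estimator_.feature_importances_
ranked = sorted(
    zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances show which features the model relied on, not the direction or causal effect of a feature, so recommendations drawn from them are best checked against domain knowledge.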