Credit Risk Case Study
NextBrain is a user-friendly machine learning tool for building accurate classification models and examining them in detail. Its no-code interface lets you train models and make predictions without any prior programming experience. In this tutorial, you will learn how to build a model that predicts whether a customer can pay back their loan.
To see the financial services dataset, click on your profile at the top right and add Financial services to your industries.
The Problem description page gives brief context about your data, with a short description of each feature and of the overall content of the dataset.
In the next section, you can explore various properties of your data. At the top, you can see the data type and the number of null entries for each column. Moreover, the distribution of values is shown as a bar plot above each column. At the bottom, you can create a scatter plot for any pair of features in the data. This helps you spot correlations between features and examine how two features behave relative to each other over a narrower range.
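For readers curious what these exploration checks amount to, here is a minimal sketch in plain Python. It is not how NextBrain computes them; the rows and the column names `income` and `loan_amount` are hypothetical, chosen only for illustration. It counts null entries per column and computes the Pearson correlation for one pair of features.

```python
import math

# Hypothetical sample rows; these column names are illustrative assumptions,
# not the actual columns of the NextBrain financial services dataset.
rows = [
    {"income": 52000.0, "loan_amount": 12000.0, "default": 1},
    {"income": 31000.0, "loan_amount": 9000.0, "default": 0},
    {"income": None, "loan_amount": 15000.0, "default": 1},
    {"income": 78000.0, "loan_amount": 20000.0, "default": 1},
    {"income": 24000.0, "loan_amount": None, "default": 0},
]


def null_counts(rows):
    """Count missing (None) entries for every column."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Keep only rows where both features are present before correlating them.
complete = [
    (r["income"], r["loan_amount"])
    for r in rows
    if r["income"] is not None and r["loan_amount"] is not None
]
incomes, loans = zip(*complete)

print(null_counts(rows))
print(round(pearson(incomes, loans), 3))
```

A coefficient near 1 or -1 corresponds to points that fall close to a straight line in the scatter plot, while a value near 0 suggests no linear relationship.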
After exploring your data, you are ready to build a model in the train model section. Click on the columns to predict bar to choose which column to set as the target. In this task, we will predict the default column, which denotes whether a customer will pay off their loan (1) or not (0). Once you select the default column, a Classification widget will appear.
You can also drop columns in the training columns bar, but be careful, since removing informative columns can lead to worse metrics. You can set the training quality by training your models for one, five, or ten minutes. We will set the training quality to five minutes for this task.
Furthermore, on the advanced bar you can feed synthetic data to your model, decide which metrics to prioritize, and change the train/test split ratio. Let's add synthetic data, leave the metrics on AUTO, and keep the train/test split ratio at 80/20.
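To make the 80/20 split concrete, here is a minimal sketch in plain Python. It only illustrates the general idea of a shuffled hold-out split; the function name `train_test_split`, the seed, and the stand-in records are assumptions for this example, and NextBrain's own splitting procedure may differ.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and hold out `test_ratio` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# 100 stand-in records; in practice these would be the rows of the dataset.
records = list(range(100))
train, test = train_test_split(records)

print(len(train), len(test))  # prints: 80 20
```

The model is fitted on the training portion only, so its score on the held-out 20% estimates how it behaves on customers it has not seen.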
The model performance results will appear after a few minutes. On the left side, you can see the accuracy and the error rate of your model compared to a baseline method. You will also be given some suggestions for improving your model's performance further. In our case, for instance, we are advised to add more data to get better metrics. Moreover, the column importance bar shows how strongly each feature influences the prediction.
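The comparison against a baseline can be sketched in plain Python as follows. The tutorial does not say which baseline NextBrain uses, so the majority-class baseline below is an assumption, and the labels are made-up values for illustration.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def majority_baseline(actual):
    """Assumed baseline: always predict the most frequent label."""
    most_common_label, _ = Counter(actual).most_common(1)[0]
    return [most_common_label] * len(actual)


# Made-up labels for ten customers: 1 = pays off the loan, 0 = does not.
actual = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
predicted = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

model_acc = accuracy(actual, predicted)
baseline_acc = accuracy(actual, majority_baseline(actual))

print(f"model accuracy:    {model_acc:.2f}")      # 0.80
print(f"model error rate:  {1 - model_acc:.2f}")  # 0.20
print(f"baseline accuracy: {baseline_acc:.2f}")   # 0.70
```

A model is only useful if it beats such a baseline: here, always predicting the majority label is already right 70% of the time.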
At the bottom of the model training section, you can see the model training history. It lists steps such as scaling, the train/test ratio, and removed columns.
For a more detailed report, click on the dashboard widget at the top right. Don't forget to click on the help icon for more comprehensive explanations of the results. Here is a brief summary of some of the features on the dashboard:
Combined feature influence: This is a Sankey diagram demonstrating the flow of predictions.
Model accuracy: The accuracy results for training and test sets.
Data distribution: This shows the distribution of the target variable. It is useful for spotting imbalanced datasets, in which case you may need to add more data for the underrepresented target labels.
Confusion matrix: This shows predicted versus actual labels in a 2 x 2 matrix.
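As a rough illustration of the last two dashboard items, the sketch below builds a target distribution and a 2 x 2 confusion matrix in plain Python. The labels are made-up values and the row/column layout is an assumption for this example; the dashboard may arrange the matrix differently.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=(0, 1)):
    """Rows are actual labels, columns are predicted labels, in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Made-up labels for ten customers: 1 = pays off the loan, 0 = does not.
actual = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
predicted = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

distribution = Counter(actual)
matrix = confusion_matrix(actual, predicted)

print(dict(distribution))  # how many customers carry each target label
for label, row in zip((0, 1), matrix):
    print(f"actual {label}: {row}")
```

In this sample the target is imbalanced (seven 1s against three 0s), and the off-diagonal cells of the matrix count the two kinds of mistakes separately, which plain accuracy hides.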
Finally, you can make new predictions in the Prediction bar after entering a value for each feature!
Updated on: 10/02/2023