After completing the earlier stages of predictive modeling (problem definition, hypothesis generation, data extraction, and data exploration), we are now ready to build our first predictive model. We will start with a basic model, also called a base or benchmark model. It gives a first estimate of what to expect, for example what accuracy score should be considered a good score. A basic model also reflects the simple business logic that could be used to solve the problem. Any predictive model you build afterwards should perform better than this benchmark.
Steps to make the base model
Create the dataset for the predictive model:- The aim of a predictive model is to learn, from past data, a mapping that takes the independent variables as input and gives the dependent variable as output. The first step is to identify the dependent variable in the available data. To find the mapping, we divide the dataset into a training part and a testing part: the model is fit on the training part, and the testing part is held back to check how it performs on data it has not seen. We split the dataset this way to make sure that our model is robust.
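The split described above can be sketched in plain Python. This is only an illustration: the helper name train_test_split, the 25% test fraction, the fixed seed, and the toy rows (with "sales" as the dependent variable) are all assumptions, not part of the original notebook.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows and split them into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Made-up toy rows: "store" and "product" are independent variables,
# "sales" is the dependent variable we want to predict.
rows = [
    {"store": "A", "product": "soap", "sales": 120.0},
    {"store": "A", "product": "rice", "sales": 300.0},
    {"store": "B", "product": "soap", "sales": 100.0},
    {"store": "B", "product": "rice", "sales": 340.0},
    {"store": "C", "product": "soap", "sales": 110.0},
    {"store": "C", "product": "rice", "sales": 320.0},
    {"store": "D", "product": "soap", "sales": 130.0},
    {"store": "D", "product": "rice", "sales": 310.0},
]

train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "testing rows")
```

The benchmark is then built using only the training rows, and the testing rows are used only to measure how well it does.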
Create a benchmark model:- First, we will see how to create a benchmark for a regression problem, where we predict a continuous value. Suppose the problem is to build a predictive model for the sales of each product at a particular store. The simplest benchmark predicts the mean of the sales observed in the training data. We can improve it by taking the mean with respect to another variable, for example the mean sales per product or per store, and improve it further by taking more variables into account. This will become our benchmark model.
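As a rough sketch of the regression benchmark, the snippet below computes the overall mean of sales and then the mean per product. The rows are made-up toy values, and the names group_means and predict are hypothetical helpers chosen for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up training rows, for illustration only.
train = [
    {"store": "A", "product": "soap", "sales": 100.0},
    {"store": "B", "product": "soap", "sales": 120.0},
    {"store": "A", "product": "rice", "sales": 300.0},
    {"store": "B", "product": "rice", "sales": 340.0},
]

# Simplest benchmark: predict the overall mean for every row.
overall_mean = mean(r["sales"] for r in train)

def group_means(rows, key, target="sales"):
    """Mean of the target for each distinct value of one variable."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[target])
    return {k: mean(v) for k, v in groups.items()}

# Improved benchmark: mean of sales with respect to the product.
product_means = group_means(train, "product")

def predict(row):
    """Use the product mean, falling back to the overall mean for unseen products."""
    return product_means.get(row["product"], overall_mean)

print(overall_mean)                               # 215.0
print(predict({"store": "C", "product": "soap"}))  # 110.0
print(predict({"store": "C", "product": "tea"}))   # 215.0 (unseen product)
```

Grouping by more than one variable works the same way, with a tuple such as (store, product) as the grouping key.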
Now let us see how to create a benchmark for a classification problem. Take the problem of predicting whether a passenger on the Titanic survived or not. Here the simplest benchmark is the mode of survival, that is, predicting the most frequent outcome in the training data for every passenger. We can improve it by taking the mode with respect to gender, and improve it further by taking more variables into account. This will become our benchmark model.
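The classification benchmark can be sketched in the same way. The rows below are made-up toy values, not real Titanic records, and the column names "sex" and "survived" as well as the helpers most_common and predict are assumptions for this example.

```python
from collections import Counter, defaultdict

# Made-up training rows (1 = survived, 0 = did not survive).
train = [
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
]

def most_common(values):
    """Return the mode, i.e. the most frequent value."""
    return Counter(values).most_common(1)[0][0]

# Simplest benchmark: predict the overall mode for every passenger.
overall_mode = most_common(r["survived"] for r in train)

# Improved benchmark: mode of survival with respect to gender.
by_gender = defaultdict(list)
for r in train:
    by_gender[r["sex"]].append(r["survived"])
gender_modes = {sex: most_common(vals) for sex, vals in by_gender.items()}

def predict(row):
    """Use the gender mode, falling back to the overall mode for unseen values."""
    return gender_modes.get(row["sex"], overall_mode)

print(overall_mode)               # 0
print(gender_modes)               # {'male': 0, 'female': 1}
print(predict({"sex": "female"}))  # 1
```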
Evaluation of a model:- The evaluation method depends on the type of problem. The simplest way to evaluate a regression model is the mean absolute error, which is the sum of the absolute differences between the predicted and actual values, divided by the number of observations. A classification model can be evaluated through its accuracy, which is the number of correctly predicted observations divided by the total number of observations.
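Both metrics are short enough to write by hand. The following is a minimal sketch with hypothetical function names and made-up numbers; in practice the inputs would be the actual values from the testing part and the benchmark's predictions for them.

```python
def mean_absolute_error(actual, predicted):
    """Sum of |actual - predicted| divided by the number of observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def accuracy(actual, predicted):
    """Number of correctly predicted observations divided by the total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Regression example: the absolute errors are 10, 10 and 30.
mae = mean_absolute_error([100, 200, 300], [110, 190, 330])
print(round(mae, 2))  # 16.67

# Classification example: 3 of 4 predictions are correct.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(acc)  # 0.75
```

A lower mean absolute error is better, while a higher accuracy is better. A later model is worth keeping only if it beats the benchmark's score on the same testing data.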
Dataset and Jupyter notebook file for creating a benchmark model