
Understanding Hypothesis Generation and Data Extraction

Writer: TECH BUDDY






After problem definition, the next stage of predictive modeling is hypothesis generation. A hypothesis is an opinion or view about the problem you want to solve, usually a guess about which factors influence the outcome you are trying to predict.

For example, if you want to build a model that predicts house prices, think about which factors are related to the price of a house.


First, houses in some localities will be priced higher than houses in others.

Second, houses with a parking facility will be more expensive.

Third, houses with more balconies will cost more.
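One way to make hypotheses like these concrete is to write each one as a check you can run once data is available. The sketch below is a minimal illustration, not a prescribed method: it assumes each house is a dict with hypothetical fields `price`, `locality`, `has_parking` and `balconies`, and the locality name `"central"` and the two-balcony cut-off are made-up choices. Comparing group means is only a rough first look, not a statistical test.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

House = dict  # hypothetical record, e.g. {"price": ..., "locality": ..., ...}


@dataclass
class Hypothesis:
    statement: str
    # Returns True for houses the hypothesis expects to be MORE expensive.
    expects_higher_price: Callable[[House], bool]


def is_supported(h: Hypothesis, houses: list[House]) -> bool:
    """Rough check: does the 'expected higher' group have a higher mean price?"""
    higher = [x["price"] for x in houses if h.expects_higher_price(x)]
    rest = [x["price"] for x in houses if not h.expects_higher_price(x)]
    if not higher or not rest:
        return False  # both groups are needed for a comparison
    return mean(higher) > mean(rest)


# The three hypotheses above; "central" and the 2-balcony cut-off are made up.
HYPOTHESES = [
    Hypothesis("Houses in some localities cost more",
               lambda x: x["locality"] == "central"),
    Hypothesis("Houses with parking cost more",
               lambda x: x["has_parking"]),
    Hypothesis("Houses with more balconies cost more",
               lambda x: x["balconies"] >= 2),
]
```

Writing hypotheses this way forces you to name the exact variable each one needs, which is what tells you later which data to collect.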


After listing these hypotheses, you still don't know which of them are true, so it is natural to ask why this step matters. To see why, compare two approaches: one where you form hypotheses first and one where you don't. Without hypotheses, there is almost no limit to the data you could capture or to the time you could spend searching for more variables. You would go through every variable and study how it correlates with the others, without knowing exactly what you are looking for, exploring every possible variable and relationship in the hope of using them all. This is very difficult and time-consuming.
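A quick count shows why exhaustive exploration scales badly: with n variables there are n(n - 1)/2 distinct pairs whose correlation you might inspect. The sketch below uses 500 variables and, for comparison, a hypothetical shortlist of 5 variables suggested by hypotheses.

```python
from math import comb


def pairwise_checks(n_variables: int) -> int:
    """Number of distinct variable pairs, n * (n - 1) / 2."""
    return comb(n_variables, 2)


print(pairwise_checks(500))  # 124750
print(pairwise_checks(5))    # 10
```

The number of pairs grows roughly with the square of the number of variables, so narrowing the search with hypotheses pays off quickly.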


Now consider the other approach: you first list the hypotheses that might influence the problem. Next, you check which of the corresponding variables are readily available or can be collected. This gives you a set of smaller, specific pieces of analysis to work on. For example, when predicting whether a borrower will default, instead of first trying to understand all 500 available variables, you check whether the credit bureau provides the number of past defaults and use it in your analysis. This saves a lot of time and effort, and if you work through the hypotheses in order of their expected importance, you can finish the analysis in a fraction of the time.
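The workflow described above (keep the hypotheses whose variables you can actually get, then test them in order of expected importance) can be sketched in a few lines. Everything here is hypothetical: the field names such as `num_past_defaults`, the example hypotheses, and the importance scores, which stand in for an analyst's judgment.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    statement: str
    variable: str             # data field needed to test it (hypothetical name)
    expected_importance: int  # analyst's guess; higher means test it sooner


def analysis_plan(hypotheses, available_variables):
    """Keep only testable hypotheses and order them by expected importance."""
    testable = [h for h in hypotheses if h.variable in available_variables]
    return sorted(testable, key=lambda h: h.expected_importance, reverse=True)


# Hypothetical hypotheses for a loan-default model.
HYPOTHESES = [
    Hypothesis("Higher income lowers default risk", "income", 2),
    Hypothesis("Past defaults predict future defaults", "num_past_defaults", 3),
    Hypothesis("Frequent address changes raise risk", "address_changes", 1),
]

# Suppose only these fields are readily available or collectable.
available = {"income", "num_past_defaults"}

for h in analysis_plan(HYPOTHESES, available):
    print(h.expected_importance, h.statement)
```

Hypotheses whose variables are unavailable are dropped rather than deleted from your notes; if the data becomes collectable later, they can re-enter the plan.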



Data Extraction




Data extraction is the process of collecting records of past events so that data analysis can uncover repeating patterns. Predictive models are built from these patterns: different algorithms look for trends in the historical data and use them to predict future outcomes. Once you have generated your hypotheses, you know what you are looking for, and you can collect data from as many sources as needed to check whether each hypothesis holds. Looking at the data may also suggest new hypotheses that improve your model.

Suppose you want to build a model that predicts customer defaults. To build it, you could collect the customer's demographics, transaction history, payment history, and credit history. Each of these data sources can either support or contradict your hypotheses. The quality of a model depends on the data it is built from, so the data should be free from errors and contain relevant information.
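A minimal sketch of combining several extracts into one record per customer and flagging gaps, assuming each source has already been loaded as a dict mapping a customer ID to its fields. The source contents, field names such as `late_payments`, and the IDs below are all hypothetical, and only a few of the sources named above are shown for brevity. In practice a library such as pandas is commonly used for joins like this; plain dicts keep the example self-contained.

```python
from typing import Any

Record = dict[str, Any]


def merge_sources(*sources: dict[str, Record]) -> dict[str, Record]:
    """Combine several {customer_id: fields} sources into one record per customer.

    If two sources use the same field name, the later source wins.
    """
    merged: dict[str, Record] = {}
    for source in sources:
        for customer_id, fields in source.items():
            merged.setdefault(customer_id, {}).update(fields)
    return merged


def quality_issues(merged: dict[str, Record], required: list[str]) -> list[tuple[str, str]]:
    """Return (customer_id, field) pairs where a required field is missing or None."""
    return [
        (customer_id, field)
        for customer_id, record in merged.items()
        for field in required
        if record.get(field) is None
    ]


# Hypothetical extracts from three sources, keyed by customer ID.
demographics = {"c1": {"age": 35}, "c2": {"age": 52}}
payment_history = {"c1": {"late_payments": 0}}
credit_history = {"c1": {"past_defaults": 1}, "c2": {"past_defaults": None}}

customers = merge_sources(demographics, payment_history, credit_history)
print(quality_issues(customers, ["age", "late_payments", "past_defaults"]))
# [('c2', 'late_payments'), ('c2', 'past_defaults')]
```

Checking for missing values right after extraction is one simple way to act on the requirement that the data be free from errors before any model is trained on it.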















