
Univariate and Bivariate Analysis

Writer: TECH BUDDY




Univariate Analysis


What is a univariate analysis? What is the use of univariate analysis?

Univariate analysis focuses on a single variable at a time: we summarise the variable and use that summary to discover insights and anomalies. The exploration method depends on the type of variable, so let's discuss univariate analysis for two types of variables: continuous and categorical.


Univariate analysis for continuous variables


Univariate analysis of a continuous variable describes its central tendency (mean, median, mode) and its dispersion (standard deviation, variance, range, interquartile range). It also shows the shape of the distribution: whether it is symmetric, right-skewed, or left-skewed. It also helps in identifying missing values and outliers.
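As a minimal sketch, the measures named above can be computed one by one with pandas. The `ages` values are made up purely for illustration, with one large value added to create a right skew. pandas' `skew()` returns a positive number for a right-skewed (long right tail) distribution and a negative number for a left-skewed one.

```python
import pandas as pd

# Hypothetical data: one large value (80) stretches the right tail
ages = pd.Series([22, 25, 25, 27, 30, 31, 35, 80])

# Central tendency
mean = ages.mean()
median = ages.median()
mode = ages.mode().tolist()  # mode() can return several values, so keep a list

# Dispersion
std = ages.std()  # sample standard deviation (ddof=1)
iqr = ages.quantile(0.75) - ages.quantile(0.25)

# Shape: a positive skew means a longer right tail
skewness = ages.skew()

print(f"mean={mean:.3f} median={median} mode={mode}")
print(f"std={std:.2f} IQR={iqr}")
print(f"skew={skewness:.2f}")
```

Here the mean (34.375) sits well above the median (28.5), which is another hint of right skew.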


Methods of performing univariate analysis of a continuous variable


Tabular methods:- Mean, Median, standard deviation and missing values

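The describe() function returns the count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum in tabular form. The count covers only non-missing values, so missing values are counted separately with isnull().sum().

df.describe()
df.isnull().sum()  # number of missing values in each column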
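A small sketch of how `describe()` and missing values relate, using a made-up `var1` column with two gaps. `describe()` reports only the non-missing count, so `isnull().sum()` is needed to see how many values are absent.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values
df = pd.DataFrame({"var1": [10.0, 12.5, np.nan, 15.0, np.nan, 20.0]})

summary = df.describe()                      # count, mean, std, min, 25%, 50%, 75%, max
non_missing = summary.loc["count", "var1"]   # count excludes NaN
missing = df["var1"].isnull().sum()          # NaN values counted separately

print(summary)
print("non-missing:", int(non_missing), "missing:", int(missing))
```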

Graphical methods:- Distribution of variables, presence of outliers

A histogram shows the distribution of a continuous variable, and a boxplot highlights its outliers.

df['var1'].plot.hist()
df['var1'].plot.box()
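The outliers a boxplot draws are, by default, points lying more than 1.5 × IQR beyond the quartiles (matplotlib's default whisker length, and pandas' `plot.box()` draws its boxplot with matplotlib). The sketch below applies that rule numerically so the outliers can be listed rather than only seen. `iqr_outliers` is a hypothetical helper and the data are made up.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Hypothetical values: Q1 = 25, Q3 = 32, so anything above 42.5 is flagged
var1 = pd.Series([22, 25, 25, 27, 30, 31, 35, 80])
print(iqr_outliers(var1))
```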


Univariate analysis for categorical variables


Univariate analysis of a categorical variable identifies the absolute frequency of each category. Sometimes it is more useful to find the proportion of each category instead. Suppose that, in a house price prediction dataset, you want the number of houses that have a parking facility, or the percentage of houses that have one. Both can be found through univariate analysis.
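For the parking example, here is a minimal sketch with a hypothetical `houses` table and `parking` column. `value_counts()` gives the number of houses in each category, and `value_counts(normalize=True)` gives each category's share, which is multiplied by 100 here so it reads as a percentage.

```python
import pandas as pd

# Hypothetical house dataset with a yes/no 'parking' column
houses = pd.DataFrame({"parking": ["yes", "no", "yes", "yes", "no"]})

counts = houses["parking"].value_counts()                      # absolute frequency
shares = houses["parking"].value_counts(normalize=True) * 100  # percentage of houses

print(counts)
print(shares)
```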


Methods of performing univariate analysis of a categorical variable


Tabular method- frequency tables

The value_counts() function returns the frequency table. Passing normalize=True returns proportions instead of counts.

df['var1'].value_counts()

Graphical method- Barplots

A bar plot is used to visualize the frequency table.

df['var1'].value_counts().plot.bar()

Bivariate Analysis


What is a bivariate analysis? What is the use of bivariate analysis?

Bivariate analysis explores two variables together to find their empirical relationship, or to check whether the two variables are associated with each other. It helps in prediction, because one variable may be used to infer the other, and it also helps in detecting outliers.



Types of bivariate analysis


Continuous-continuous analysis:- This type is used to identify the relationship between two continuous variables. Example: does a person's weight increase with their height? This can be measured with correlation. Pearson correlation, the default in pandas, measures the strength and direction of the linear relationship between two continuous variables and ranges from -1 to 1. A positive value means the two variables tend to increase together, a negative value means one tends to decrease as the other increases, and a value near 0 means there is little linear relationship.

df['var1'].corr(df['var2'])
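To make the idea concrete, here is a sketch that computes the Pearson coefficient from its definition (covariance divided by the product of the standard deviations) and checks it against `Series.corr`, which uses Pearson unless another `method` is given. The height and weight values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical heights (cm) and weights (kg)
df = pd.DataFrame({
    "height": [150, 160, 165, 170, 180, 185],
    "weight": [50, 56, 61, 65, 72, 80],
})

x, y = df["height"], df["weight"]

# Pearson r = sum of co-deviations / sqrt(product of summed squared deviations)
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = x.corr(y)  # Series.corr defaults to method='pearson'

print(round(r_manual, 4), round(r_pandas, 4))
```

If the relationship is monotonic but not linear, `x.corr(y, method='spearman')` gives a rank-based alternative.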


Categorical-continuous analysis:- This type is used to identify the relationship between a categorical and a continuous variable. Example: is the mean age of males different from the mean age of females? A two-sample t-test is used to answer this question.

df.groupby('sex')['Age'].mean().plot.bar()
# import the two-sample t-test from scipy
from scipy.stats import ttest_ind
males = df[df['sex'] == 'male']
females = df[df['sex'] == 'female']
# drop missing ages first; NaN values would otherwise make the result NaN
ttest_ind(males['Age'].dropna(), females['Age'].dropna())
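By default `ttest_ind` performs Student's two-sample t-test, which assumes the two groups have equal variances and pools them. The sketch below computes that t statistic by hand from the pooled variance and compares it with scipy's result. The ages are made-up values, not the Titanic data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical ages for two groups
males = np.array([22.0, 35.0, 28.0, 40.0, 31.0, 27.0])
females = np.array([24.0, 30.0, 26.0, 29.0, 33.0])

n1, n2 = len(males), len(females)
v1, v2 = males.var(ddof=1), females.var(ddof=1)  # sample variances

# Pooled variance (Student's t-test assumes equal group variances)
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_manual = (males.mean() - females.mean()) / np.sqrt(pooled * (1 / n1 + 1 / n2))

result = ttest_ind(males, females)  # equal_var=True by default
print(t_manual, result.statistic, result.pvalue)
```

If the group variances look very different, `ttest_ind(..., equal_var=False)` runs Welch's t-test instead. A small p-value (commonly below 0.05) suggests the mean ages differ.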


Categorical-categorical analysis:- This type is used to identify the relationship between two categorical variables. Example: does gender have any effect on survival rate in the Titanic problem? The chi-square test of independence is used to answer this question.


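import pandas as pd
from scipy.stats import chi2_contingency
# contingency table of counts: one row per sex, one column per survival outcome
table = pd.crosstab(df['sex'], df['survived'])
# returns the chi-square statistic, p-value, degrees of freedom and expected counts
chi2_contingency(table)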
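The chi-square test compares each observed count in the crosstab with the count expected if sex and survival were independent (row total × column total ÷ grand total). The sketch below computes that statistic by hand on a tiny made-up table and checks it against `chi2_contingency`. For a 2×2 table scipy applies Yates' continuity correction by default, so `correction=False` is passed here to make the two numbers match.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical passenger records
df = pd.DataFrame({
    "sex": ["male"] * 6 + ["female"] * 4,
    "survived": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
})

table = pd.crosstab(df["sex"], df["survived"])
observed = table.to_numpy()

# Expected count per cell under independence: row total * column total / grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# correction=False turns off Yates' correction so the result matches the formula above
chi2, p, dof, expected_scipy = chi2_contingency(table, correction=False)
print(chi2_manual, chi2, p, dof)
```

The expected counts in this toy table are below 5, so it only illustrates the arithmetic. With real data that sparse, the chi-square approximation is unreliable and Fisher's exact test (`scipy.stats.fisher_exact`) is often used instead.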

The data set and Jupyter notebook for the data exploration of the Titanic problem are given below:-








