Data Analysis: Univariate, Bivariate, Multivariate
Univariate analysis is the simplest form of data analysis, in which the data being analyzed contains only one variable. Because it involves a single variable, it doesn't deal with causes or relationships. The main purpose of univariate analysis is to describe the data and find patterns that exist within it.
You can think of the variable as a characteristic that is recorded for each observation in your data. One example of a variable in univariate analysis might be “age”. Another might be “height”. Univariate analysis would not look at these two variables at the same time, nor would it look at the relationship between them.
Some ways you can describe patterns found in univariate data include looking at mean, mode, median, range, variance, maximum, minimum, quartiles, and standard deviation. Additionally, some ways you may display univariate data include frequency distribution tables, bar charts, histograms, frequency polygons, and pie charts.
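As a minimal sketch of the summary statistics listed above, the Python snippet below uses the standard-library `statistics` module (Python 3.8+ for `quantiles`) on a small, made-up list of ages; both the data and the `describe` helper are hypothetical, for illustration only. Note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1), while `pvariance` and `pstdev` give the population versions. The last lines build a simple frequency distribution table with `collections.Counter`.

```python
import statistics
from collections import Counter


def describe(values):
    """Return common univariate summary statistics for a list of numbers."""
    # Three cut points that split the sorted data into four parts (default "exclusive" method).
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "quartiles": (q1, q2, q3),
    }


# Hypothetical data: ages (in years) of ten survey respondents.
ages = [23, 35, 35, 41, 29, 52, 35, 47, 29, 60]

summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")

# Frequency distribution table: each distinct age and how often it occurs.
frequency = dict(sorted(Counter(ages).items()))
for age, count in frequency.items():
    print(f"{age}\t{count}")
```

Quartiles can be computed in several slightly different ways, so other tools may report somewhat different cut points for the same data.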
Reasons to Use Univariate Data
Data is gathered for the purpose of answering a question, or more specifically, a research question. Univariate data does not answer research questions about relationships between variables; rather, it is used to describe one characteristic or attribute that varies from observation to observation. For example, to describe how net worth varies, we would use univariate data to find the statistics that represent the center value for all American households, along with how the other values spread around that center value.
A researcher would want to conduct a univariate analysis for two purposes. The first purpose would be to answer a research question that calls for a descriptive study on how one characteristic or attribute varies, such as describing how net worth varies from American family to American family.
A second purpose would be to examine how each characteristic or attribute varies before including it in a study of the relationship between two variables (bivariate data) or among more than two variables (multivariate data). For example, it would be beneficial to examine how net worth per family varies before including it in an analysis that correlates it with a second variable, say, educational attainment.
Bivariate analysis is used to find out if there is a relationship between two different variables. Something as simple as a scatterplot, made by plotting one variable against the other on a Cartesian plane (think X and Y axes), can sometimes give you a picture of what the data is trying to tell you. If the points seem to follow a line or curve, that suggests a relationship or correlation between the two variables. For example, one might choose to plot caloric intake versus weight.
Bivariate analysis means the analysis of bivariate data. It is one of the simplest forms of statistical analysis, used to find out if there is a relationship between two sets of values.
Types of Bivariate Analysis
Common types of bivariate analysis include:
- Scatter plots
These give you a visual idea of the pattern that your variables follow.
- Regression Analysis
Regression analysis is a catch-all term for a wide variety of tools that you can use to determine how your data points might be related. In the image above, the points look like they could follow an exponential curve (as opposed to a straight line). Regression analysis can give you the equation for that curve or line. It can also give you the correlation coefficient.
- Correlation Coefficients
Correlation coefficients are usually calculated on a computer, although they can also be worked out by hand. The coefficient tells you whether, and how strongly, the variables are related. It ranges from -1 to +1: a value of 0 means the variables aren't correlated (i.e., there is no linear relationship between them), while a value of +1 or -1 means they are perfectly correlated (i.e., the points fall exactly on a straight line, rising for +1 and falling for -1).
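The two numerical tools above can be sketched in plain Python. This sketch assumes an ordinary least-squares straight-line fit for the regression and Pearson's r for the correlation coefficient (the most common choice, though not the only one). The `linear_fit` and `pearson_r` helpers and the calorie and weight values are hypothetical, invented for illustration.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares straight line: y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of centered cross-products and squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept


def pearson_r(xs, ys):
    """Pearson correlation: +1 or -1 for a perfect line, 0 for no linear trend."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: daily caloric intake (kcal) and body weight (kg).
calories = [1800, 2000, 2200, 2500, 2800, 3000]
weight = [60, 63, 66, 70, 75, 78]

slope, intercept = linear_fit(calories, weight)
r = pearson_r(calories, weight)
print(f"weight = {intercept:.2f} + {slope:.4f} * calories, r = {r:.3f}")
```

Pearson's r only measures how well the points fit a straight line; a strongly curved relationship, such as the exponential pattern mentioned above, can still produce a modest r. On Python 3.10 and later, `statistics.correlation` and `statistics.linear_regression` provide the same calculations from the standard library.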
Multivariate analysis is the analysis of three or more variables. There are many ways to perform multivariate analysis depending on your goals. Some of these methods include Additive Tree, Canonical Correlation Analysis, Cluster Analysis, Correspondence Analysis / Multiple Correspondence Analysis, Factor Analysis, Generalized Procrustes Analysis, MANOVA, Multidimensional Scaling, Multiple Regression Analysis, Partial Least Squares Regression, Principal Component Analysis / Regression / PARAFAC, and Redundancy Analysis.
Multivariate analysis methods are typically used for:
- Consumer and market research
- Quality control and quality assurance across a range of industries such as food and beverage, paint, pharmaceuticals, chemicals, energy, telecommunications, etc.
- Process optimization and process control
- Research and development
With multivariate analysis, you can:
- Obtain a summary or an overview of a table. This analysis is often called Principal Component Analysis or Factor Analysis. In the overview, it is possible to identify the dominant patterns in the data, such as groups, outliers, and trends. The patterns are typically displayed as two plots: a scores plot, which maps the rows (samples), and a loadings plot, which maps the columns (variables).
- Analyze groups in the table, how these groups differ, and to which group individual table rows belong. This type of analysis is called Classification and Discriminant Analysis.
- Find relationships between columns in data tables, for instance relationships between process operating conditions and product quality. The objective is to use one set of variables (columns) to predict another, for the purpose of optimization, and to find out which columns are important in the relationship. The corresponding analysis is called Multiple Regression Analysis or Partial Least Squares (PLS), depending on the size of the data table.
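As a rough illustration of using one set of columns to predict another, the sketch below fits a multiple regression by ordinary least squares. It solves the normal equations (X^T X) b = X^T y with a small Gaussian-elimination helper. The helper names (`solve`, `multiple_regression`, `predict`) and the synthetic "process" data, generated exactly from y = 1 + 2*x1 + 3*x2, are hypothetical. This is plain multiple regression, not PLS, and a library routine such as `numpy.linalg.lstsq` is numerically more robust in practice.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so that row operations apply to a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictor columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multiple_regression(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares; returns [b0, b1, ..., bk]."""
    # Prepend a column of ones so that b0 acts as the intercept.
    X = [[1.0] + list(row) for row in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Predicted response for one row of predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical process data: two operating conditions per run and a quality score.
conditions = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
quality = [9, 8, 19, 18, 26]

coefs = multiple_regression(conditions, quality)
print("intercept and coefficients:", [round(c, 6) for c in coefs])
```

Because the synthetic data lie exactly on a plane, the fit recovers the generating coefficients; real data would leave residuals, and the relative size of each coefficient (on comparably scaled columns) hints at which columns matter most.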