A set of statistical processes for estimating the relationships among variables.
Regression analysis is a powerful statistical method that allows you to examine the relationship between two or more variables of interest. While there are many types of regression analysis, at their core they all examine the influence of one or more independent variables on a dependent variable.
Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:
The simple linear regression model is expressed as Y = a + bX + e, where:
- Y is the dependent (outcome) variable,
- X is the independent (predictor) variable,
- a is the intercept (the expected value of Y when X = 0),
- b is the slope coefficient (the change in Y for a one-unit change in X), and
- e is the random error term.
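As a minimal sketch, the coefficients a and b can be estimated by ordinary least squares using the standard closed-form formulas. The data below is hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data: X = hours studied, Y = exam score
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Ordinary least squares estimates: slope b and intercept a
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

print(f"Y = {a:.2f} + {b:.2f}X")  # → Y = 47.70 + 4.10X
```

The slope is the covariance of X and Y divided by the variance of X, and the intercept ensures the fitted line passes through the point of means.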
Multiple linear regression is an extension of simple linear regression used to predict an outcome variable (Y) from two or more distinct predictor variables (X1, X2, ..., Xn). With two predictors the data is modeled as a plane in three-dimensional space, and with more predictors as a hyperplane in higher dimensions.
The multiple linear regression model is expressed as Y = a + b1X1 + b2X2 + ... + bnXn + e, where:
- Y is the dependent (outcome) variable,
- X1 through Xn are the independent (predictor) variables,
- a is the intercept,
- b1 through bn are the coefficients of the corresponding predictors, and
- e is the random error term.
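A hedged sketch of fitting this model: with a column of ones prepended to the predictor matrix, the intercept and coefficients can be estimated jointly by least squares. The two-predictor data below is fabricated so that Y = 1 + 2*X1 + 3*X2 holds exactly.

```python
import numpy as np

# Hypothetical data: two predictors, one outcome (Y = 1 + 2*X1 + 3*X2)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])

# Prepend a column of ones so the first coefficient is the intercept a
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ [a, b1, b2] ≈ Y
coef, *_ = np.linalg.lstsq(X_design, Y, rcond=None)
a, b1, b2 = coef
print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")  # → Y = 1.00 + 2.00*X1 + 3.00*X2
```

With real data the fit would not be exact; the least-squares solution minimizes the sum of squared residuals.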
The coefficient of determination, often denoted as R^2, is a statistical metric that measures the proportion of variance in the dependent variable that is predictable from the independent variable(s). It is an important tool for assessing the goodness of fit of the regression model.
For ordinary least squares with an intercept, R^2 lies between 0 and 1, where 0 indicates that the model predicts no better than the mean of the dependent variable and 1 indicates perfect prediction. A higher R^2 indicates a better fit to the sample data, but note that adding predictors can never decrease R^2, even when they carry no real explanatory power.
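Concretely, R^2 is computed as one minus the ratio of the residual sum of squares to the total sum of squares. The observed values and predictions below are made up for illustration.

```python
import numpy as np

# Hypothetical observed values and model predictions
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([3.2, 4.8, 7.1, 8.9, 11.0])

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # → 0.9975
```

An R^2 of about 0.9975 here means roughly 99.75% of the variance in y is accounted for by the model's predictions.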
There are several key assumptions that underpin the use of regression analysis:
- Linearity: the relationship between the independent and dependent variables is linear.
- Independence: the residuals (errors) are independent of one another.
- Homoscedasticity: the residuals have constant variance across all levels of the predictors.
- Normality: the residuals are approximately normally distributed.
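A rough way to begin checking assumptions such as linearity and homoscedasticity is to inspect the residuals of a fitted model; this sketch uses hypothetical data and simple summary statistics rather than formal diagnostic tests.

```python
import numpy as np

# Hypothetical data: fit a simple regression, then examine residuals
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
residuals = Y - (a + b * X)

# Rough checks: OLS residuals average ~0 by construction; a strong trend
# in their magnitude across X would suggest heteroscedasticity
print("mean residual:", residuals.mean())
print("residual spread:", residuals.std())
```

In practice, residual-versus-fitted plots and Q-Q plots give a much clearer picture than these summary numbers alone.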
By understanding these concepts, you will be able to build and interpret simple and multiple linear regression models, which are foundational to machine learning and predictive analytics.