- Understand the concept of correlation and regression.
- Compute and interpret the correlation coefficient.
- Learn the simple linear regression model.
- Estimate regression coefficients using least squares.
- Visualize regression lines and data correlation.
Definition of Correlation
Correlation measures the strength and direction of the relationship between two variables.
- If two variables increase together, they have a positive correlation.
- If one variable increases while the other decreases, they have a negative correlation.
- If there is no systematic relationship, they are uncorrelated.
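A quick numerical illustration of these three cases is sketched below; it is not part of the original notes, assumes NumPy is available, and uses made-up toy series to show how the sign of the correlation reflects the direction of the relationship.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y_pos = np.array([2, 4, 6, 8, 10])   # rises with x -> positive correlation
y_neg = np.array([10, 8, 6, 4, 2])   # falls as x rises -> negative correlation
y_none = np.array([5, 1, 8, 3, 6])   # no systematic pattern -> correlation near zero

print(np.corrcoef(x, y_pos)[0, 1])   # 1.0
print(np.corrcoef(x, y_neg)[0, 1])   # -1.0
print(np.corrcoef(x, y_none)[0, 1])  # small in magnitude (about 0.23 here)
```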
Correlation Coefficient
The **Pearson correlation coefficient** (\( r \)) measures the strength of a linear relationship:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
Interpretation of \( r \):
| Correlation Coefficient (\( r \)) | Interpretation |
|---|---|
| \( 1.0 \) | Perfect positive correlation |
| \( 0.5 \) to \( 1.0 \) | Strong positive correlation |
| \( 0.0 \) to \( 0.5 \) | Weak positive correlation |
| \( -0.5 \) to \( 0.0 \) | Weak negative correlation |
| \( -1.0 \) to \( -0.5 \) | Strong negative correlation |
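As a sanity check on the formula, the minimal sketch below (assuming NumPy; the helper name `pearson_r` is my own, and the data simply reuses the study-hours values from Example 1 later in this section) computes \( r \) directly from the definition and compares it with the library routine `np.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [2, 4, 6, 8, 10]   # study hours (same data as Example 1 below)
y = [60, 65, 70, 75, 80]
print(pearson_r(x, y))          # 1.0 -- the points lie exactly on a line
print(np.corrcoef(x, y)[0, 1])  # library result, should match
```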
Simple Linear Regression
Linear regression is a method for modeling the relationship between two variables using a straight-line equation:
\[ y = b_0 + b_1 x + \varepsilon \]
where:
- \( y \) = dependent variable (response)
- \( x \) = independent variable (predictor)
- \( b_0 \) = intercept (value of \( y \) when \( x = 0 \))
- \( b_1 \) = slope (change in \( y \) for a unit change in \( x \))
- \( \varepsilon \) = error term
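The roles of these terms can be seen by simulating data from the model. The sketch below is purely illustrative (it assumes NumPy and picks the intercept, slope, and error standard deviation arbitrarily): the line \( b_0 + b_1 x \) is deterministic, and \( \varepsilon \) adds random scatter around it.

```python
import numpy as np

rng = np.random.default_rng(42)

b0, b1 = 55.0, 2.5                       # assumed intercept and slope
x = np.linspace(0, 10, 20)               # predictor values
eps = rng.normal(0.0, 2.0, size=x.size)  # error term: random scatter around the line
y = b0 + b1 * x + eps                    # response = deterministic line + noise

print(y[:5])  # noisy observations near the line y = 55 + 2.5x
```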
Least Squares Estimation
The regression coefficients (\( b_0 \) and \( b_1 \)) are estimated using the **least squares method**:
\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]
Derivation
The sum of squared residuals is given by:
\[ S = \sum (y_i - (b_0 + b_1 x_i))^2 \]
Taking the derivatives with respect to \( b_0 \) and \( b_1 \), setting them equal to zero, and solving gives:
\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]
Visualization of Regression Line and Correlation
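A typical figure here shows the data as a scatter of points with the fitted regression line drawn through them. A minimal sketch of how such a plot could be produced, assuming NumPy and Matplotlib are available and reusing the study-hours data from Example 1 below:

```python
import numpy as np
import matplotlib.pyplot as plt

# Study-hours data from Example 1 below.
x = np.array([2, 4, 6, 8, 10], dtype=float)
y = np.array([60, 65, 70, 75, 80], dtype=float)

# Least-squares estimates from the closed-form expressions above.
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
r = np.corrcoef(x, y)[0, 1]

plt.scatter(x, y, label="observed data")
plt.plot(x, b0 + b1 * x, color="red", label=f"fitted line: y = {b0:.1f} + {b1:.1f}x")
plt.xlabel("Study Hours (x)")
plt.ylabel("Test Score (y)")
plt.title(f"Pearson r = {r:.2f}")
plt.legend()
plt.show()
```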
Examples
Example 1: A researcher collects the following data on study hours (\( x \)) and test scores (\( y \)):
| Study Hours (\( x \)) | Test Score (\( y \)) |
|---|---|
| 2 | 60 |
| 4 | 65 |
| 6 | 70 |
| 8 | 75 |
| 10 | 80 |
Compute the regression equation.
- \( \bar{x} = 6 \), \( \bar{y} = 70 \)
- \( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{100}{40} = 2.5 \)
- \( b_0 = \bar{y} - b_1 \bar{x} = 70 - 2.5 \times 6 = 55 \)
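These values can be reproduced in code. The sketch below (assuming NumPy) applies the least-squares formulas to the example data and cross-checks the result with `np.polyfit`.

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10], dtype=float)
y = np.array([60, 65, 70, 75, 80], dtype=float)

# Closed-form least-squares estimates.
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
print(b0, b1)            # 55.0 2.5

# Cross-check with NumPy's degree-1 polynomial fit.
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)  # 55.0 2.5 (up to floating-point rounding)
```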
Regression equation:
\[ y = 55 + 2.5x \]
Exercises
- Question 1: Given the dataset \( (1,2), (2,4), (3,6), (4,8) \), compute \( b_0 \) and \( b_1 \).
- Question 2: If the correlation coefficient between height and weight is \( r = 0.8 \), interpret its meaning.
- Question 3: Find the regression line for \( (1,3), (2,5), (3,7) \).
- Answer 1: \( b_0 = 0 \), \( b_1 = 2 \).
- Answer 2: \( r = 0.8 \) indicates a strong positive linear relationship: as height increases, weight tends to increase.
- Answer 3: \( y = 1 + 2x \).
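The numerical answers to Questions 1 and 3 can be checked with a small sketch (assuming NumPy; the helper name `least_squares` is my own) that applies the closed-form formulas to each dataset.

```python
import numpy as np

def least_squares(x, y):
    """Closed-form simple-regression estimates (b0, b1) from the formulas above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

print(least_squares([1, 2, 3, 4], [2, 4, 6, 8]))  # Question 1: (0.0, 2.0)
print(least_squares([1, 2, 3], [3, 5, 7]))        # Question 3: (1.0, 2.0)
```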