- Understand the concept of correlation and regression.
- Compute and interpret the correlation coefficient.
- Learn the simple linear regression model.
- Estimate regression coefficients using least squares.
- Visualize regression lines and data correlation.
Definition of Correlation
Correlation measures the strength and direction of the relationship between two variables.
- If two variables increase together, they have a positive correlation.
- If one variable increases while the other decreases, they have a negative correlation.
- If there is no systematic relationship, they are uncorrelated.
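A quick numerical illustration of these three cases is sketched below; it is not part of the original notes, assumes NumPy is available, and uses made-up toy series to show how the sign of the correlation reflects the direction of the relationship.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y_pos = np.array([2, 4, 6, 8, 10])   # rises with x -> positive correlation
y_neg = np.array([10, 8, 6, 4, 2])   # falls as x rises -> negative correlation
y_none = np.array([5, 1, 8, 3, 6])   # no systematic pattern -> correlation near zero

print(np.corrcoef(x, y_pos)[0, 1])   # 1.0
print(np.corrcoef(x, y_neg)[0, 1])   # -1.0
print(np.corrcoef(x, y_none)[0, 1])  # small in magnitude (about 0.23 here)
```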
Correlation Coefficient
The **Pearson correlation coefficient** (\( r \)) measures the strength of a linear relationship:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
Interpretation of \( r \):
| Correlation Coefficient (\( r \)) | Interpretation |
|---|---|
| \( 1.0 \) | Perfect positive correlation |
| \( 0.5 \) to \( 1.0 \) | Strong positive correlation |
| \( 0.0 \) to \( 0.5 \) | Weak positive correlation |
| \( -0.5 \) to \( 0.0 \) | Weak negative correlation |
| \( -1.0 \) to \( -0.5 \) | Strong negative correlation |
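As a sanity check on the formula, the minimal sketch below (assuming NumPy; the helper name `pearson_r` is my own, and the data simply reuses the study-hours values from Example 1 later in this section) computes \( r \) directly from the definition and compares it with the library routine `np.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [2, 4, 6, 8, 10]   # study hours (same data as Example 1 below)
y = [60, 65, 70, 75, 80]
print(pearson_r(x, y))          # 1.0 -- the points lie exactly on a line
print(np.corrcoef(x, y)[0, 1])  # library result, should match
```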
Simple Linear Regression
Linear regression is a method for modeling the relationship between two variables using a straight-line equation:
\[ y = b_0 + b_1 x + \varepsilon \]
where:
- \( y \) = dependent variable (response)
- \( x \) = independent variable (predictor)
- \( b_0 \) = intercept (value of \( y \) when \( x = 0 \))
- \( b_1 \) = slope (change in \( y \) for a unit change in \( x \))
- \( \varepsilon \) = error term
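The roles of these terms can be seen by simulating data from the model. The sketch below is purely illustrative (it assumes NumPy and picks the intercept, slope, and error standard deviation arbitrarily): the line \( b_0 + b_1 x \) is deterministic, and \( \varepsilon \) adds random scatter around it.

```python
import numpy as np

rng = np.random.default_rng(42)

b0, b1 = 55.0, 2.5                       # assumed intercept and slope
x = np.linspace(0, 10, 20)               # predictor values
eps = rng.normal(0.0, 2.0, size=x.size)  # error term: random scatter around the line
y = b0 + b1 * x + eps                    # response = deterministic line + noise

print(y[:5])  # noisy observations near the line y = 55 + 2.5x
```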
Least Squares Estimation
The regression coefficients (\( b_0 \) and \( b_1 \)) are estimated using the **least squares method**:
\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]
Derivation
The sum of squared residuals is given by:
\[ S = \sum (y_i - (b_0 + b_1 x_i))^2 \]
Taking the derivatives with respect to \( b_0 \) and \( b_1 \), setting them equal to zero, and solving gives:
\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]
Visualization of Regression Line and Correlation
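A typical figure here shows the data as a scatter of points with the fitted regression line drawn through them. A minimal sketch of how such a plot could be produced, assuming NumPy and Matplotlib are available and reusing the study-hours data from Example 1 below:

```python
import numpy as np
import matplotlib.pyplot as plt

# Study-hours data from Example 1 below.
x = np.array([2, 4, 6, 8, 10], dtype=float)
y = np.array([60, 65, 70, 75, 80], dtype=float)

# Least-squares estimates from the closed-form expressions above.
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
r = np.corrcoef(x, y)[0, 1]

plt.scatter(x, y, label="observed data")
plt.plot(x, b0 + b1 * x, color="red", label=f"fitted line: y = {b0:.1f} + {b1:.1f}x")
plt.xlabel("Study Hours (x)")
plt.ylabel("Test Score (y)")
plt.title(f"Pearson r = {r:.2f}")
plt.legend()
plt.show()
```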
Examples
Example 1: A researcher collects the following data on study hours (\( x \)) and test scores (\( y \)):
| Study Hours (\( x \)) | Test Score (\( y \)) |
|---|---|
| 2 | 60 |
| 4 | 65 |
| 6 | 70 |
| 8 | 75 |
| 10 | 80 |
Compute the regression equation.
- \( \bar{x} = 6 \), \( \bar{y} = 70 \)
- \( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{100}{40} = 2.5 \)
- \( b_0 = \bar{y} - b_1 \bar{x} = 70 - 2.5 \times 6 = 55 \)
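These values can be reproduced in code. The sketch below (assuming NumPy) applies the least-squares formulas to the example data and cross-checks the result with `np.polyfit`.

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10], dtype=float)
y = np.array([60, 65, 70, 75, 80], dtype=float)

# Closed-form least-squares estimates.
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
print(b0, b1)            # 55.0 2.5

# Cross-check with NumPy's degree-1 polynomial fit.
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)  # 55.0 2.5 (up to floating-point rounding)
```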
Regression equation:
\[ y = 55 + 2.5x \]
Exercises
- Question 1: Given the dataset \( (1,2), (2,4), (3,6), (4,8) \), compute \( b_0 \) and \( b_1 \).
- Question 2: If the correlation coefficient between height and weight is \( r = 0.8 \), interpret its meaning.
- Question 3: Find the regression line for \( (1,3), (2,5), (3,7) \).
- Answer 1: \( b_0 = 0 \), \( b_1 = 2 \).
- Answer 2: \( r = 0.8 \) indicates a strong positive linear relationship: as height increases, weight tends to increase.
- Answer 3: \( y = 1 + 2x \).
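The numerical answers to Questions 1 and 3 can be checked with a small sketch (assuming NumPy; the helper name `least_squares` is my own) that applies the closed-form formulas to each dataset.

```python
import numpy as np

def least_squares(x, y):
    """Closed-form simple-regression estimates (b0, b1) from the formulas above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

print(least_squares([1, 2, 3, 4], [2, 4, 6, 8]))  # Question 1: (0.0, 2.0)
print(least_squares([1, 2, 3], [3, 5, 7]))        # Question 3: (1.0, 2.0)
```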