- Understand the concept of multivariate statistics.
- Analyze relationships between multiple variables.
- Learn about multivariate probability distributions.
- Compute covariance and correlation matrices.
- Understand the multivariate normal distribution.
- Explore Principal Component Analysis (PCA).
Definition of Multivariate Statistics
Multivariate statistics involves analyzing multiple variables simultaneously to study their relationships.
- Univariate analysis: Single variable (e.g., height distribution).
- Bivariate analysis: Two variables (e.g., height vs. weight).
- Multivariate analysis: Three or more variables (e.g., height, weight, and age).
Covariance and Correlation Matrices
The covariance matrix describes the relationships between multiple variables:
\[ \Sigma = \begin{bmatrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) & \dots & \text{Cov}(X_1, X_n) \\ \text{Cov}(X_2, X_1) & \text{Var}(X_2) & \dots & \text{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(X_n, X_1) & \text{Cov}(X_n, X_2) & \dots & \text{Var}(X_n) \end{bmatrix} \]
The correlation matrix standardizes the relationships:
\[ R = \begin{bmatrix} 1 & \rho_{12} & \dots & \rho_{1n} \\ \rho_{21} & 1 & \dots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \dots & 1 \end{bmatrix} \]
Multivariate Probability Distributions
Multivariate probability distributions describe the joint behavior of multiple variables.
- Joint Probability Function: \( P(X_1, X_2, \dots, X_n) \).
- Marginal Distributions: Individual distributions obtained by summing/integrating over other variables.
- Conditional Distributions: Probability of one variable given values of others.
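For discrete variables, marginal and conditional distributions can be read directly off a joint probability table. A minimal sketch with a hypothetical two-variable table (the probabilities are illustrative):

```python
import numpy as np

# Hypothetical joint probability table P(X, Y): rows index values of X,
# columns index values of Y; all entries sum to 1.
joint = np.array([
    [0.10, 0.20],
    [0.30, 0.40],
])

# Marginal distributions: sum out the other variable
p_x = joint.sum(axis=1)   # P(X) = sum over y of P(X, y)
p_y = joint.sum(axis=0)   # P(Y) = sum over x of P(x, Y)

# Conditional distribution P(Y | X = x0): renormalize the row for x0
x0 = 1
p_y_given_x0 = joint[x0] / p_x[x0]

print(p_x, p_y, p_y_given_x0)
```

The same sum/renormalize pattern carries over to continuous variables, with integrals in place of sums.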
Multivariate Normal Distribution
The multivariate normal distribution is an extension of the normal distribution to multiple variables:
\[ f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp \left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \]
where:
- \( \boldsymbol{\mu} \) = mean vector
- \( \Sigma \) = covariance matrix
- \( \mathbf{x} \) = multivariate variable
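The density above can be evaluated directly with NumPy. In the sketch below, `mvn_pdf` is a hypothetical helper (not a library function) that implements the formula term by term, and the bivariate parameters are illustrative:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density at x, following the formula above."""
    n = len(mu)
    diff = x - mu
    # Normalizing constant: 1 / ((2*pi)^(n/2) * |Sigma|^(1/2))
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    # Quadratic form in the exponent: -1/2 (x - mu)^T Sigma^{-1} (x - mu)
    exponent = -0.5 * diff @ np.linalg.inv(Sigma) @ diff
    return norm_const * np.exp(exponent)

# Illustrative bivariate normal with correlated components
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))
```

At the mean the exponent vanishes, so the density equals the normalizing constant alone.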
Derivation
The multivariate normal distribution generalizes the univariate normal:
\[ P(X) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(X - \mu)^2}{2\sigma^2}} \]
Extending this to \( n \) dimensions leads to the matrix form:
\[ f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp \left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \]
Principal Component Analysis (PCA)
PCA is a technique for reducing dimensionality while preserving variance.
- Step 1: Compute the covariance matrix.
- Step 2: Find eigenvalues and eigenvectors.
- Step 3: Select top \( k \) principal components.
- Step 4: Transform data to new coordinate system.
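The four steps above can be sketched with NumPy's eigendecomposition; the dataset is made up for illustration:

```python
import numpy as np

# Illustrative data: rows are observations, columns are variables
X = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
])

# Step 1: center the data and compute the covariance matrix
Xc = X - X.mean(axis=0)
Sigma = np.cov(Xc, rowvar=False)

# Step 2: eigenvalues and eigenvectors (eigh, since Sigma is symmetric)
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Step 3: keep the top-k components (eigh returns ascending order)
k = 1
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:k]]

# Step 4: project the centered data onto the new coordinate system
scores = Xc @ components
print(scores)
```

A useful sanity check: the sample variance of the scores along the first component equals the largest eigenvalue of the covariance matrix.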
Examples
Example 1: Compute the covariance matrix for the dataset:
| X | Y | Z |
|---|---|---|
| 2 | 4 | 3 |
| 3 | 5 | 4 |
| 4 | 6 | 5 |
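Example 1 can be checked with NumPy (sample covariance, denominator \( n - 1 \)):

```python
import numpy as np

# Dataset from Example 1: columns X, Y, Z
data = np.array([
    [2, 4, 3],
    [3, 5, 4],
    [4, 6, 5],
], dtype=float)

# Sample covariance matrix (denominator n - 1, NumPy's default)
Sigma = np.cov(data, rowvar=False)
print(Sigma)
# Every variance and covariance equals 1, since each column
# increases by exactly 1 per row: the deviations from the mean
# are (-1, 0, 1) for all three variables.
```

The result is the 3×3 all-ones matrix, which also means every pairwise correlation is exactly 1.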
Exercises
- Question 1: Compute the correlation matrix for the dataset:
- \( (2,3), (3,4), (5,6) \).
- Question 2: Find the first principal component of:
- \( X_1 = (1,2,3), X_2 = (4,5,6) \).
- Question 3: Compute the eigenvalues of the covariance matrix:
- Answer 1: The points satisfy \( Y = X + 1 \) exactly, so the two variables are perfectly correlated and the correlation matrix is:
- \( \begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix} \).
- Answer 2: First principal component: \( (0.71, 0.71) \).
- Answer 3: Eigenvalues: \( \lambda_1 = 2.5, \lambda_2 = 0.5 \).
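Answer 2 can be verified numerically; the sketch below pairs the observations of \( X_1 \) and \( X_2 \) and extracts the leading eigenvector (the sign of an eigenvector is arbitrary, so \( (-0.71, -0.71) \) is equivalent):

```python
import numpy as np

# Observations from Question 2: rows pair X1 with X2
data = np.array([
    [1.0, 4.0],
    [2.0, 5.0],
    [3.0, 6.0],
])

# Covariance matrix of the two variables
Sigma = np.cov(data, rowvar=False)

# First principal component = eigenvector of the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(Sigma)
pc1 = eigvecs[:, np.argmax(eigvals)]
print(np.round(pc1, 2))  # (0.71, 0.71) up to sign
```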