PCA (Principal Component Analysis) Cheat Sheet
High-Level Overview:
PCA transforms possibly correlated features into linearly uncorrelated features called principal components. It reduces dimensionality by selecting the principal components that capture the most variance in the data.
Imagine you have a dataset of red nail polishes described by multiple features such as Brightness (of red), Shine Intensity, and Drying Time. PCA helps you turn these three features into fewer, more meaningful directions that summarize most of the variation in your polishes.
Key Concepts:
- Features: Original measurements, e.g. Brightness, Shine Intensity, Drying Time.
- Variance: How spread out the data is along a given direction (made concrete in the sketch after this list).
- Principal Components (PCs): New axes capturing maximum variance.
- Eigenvectors (of the covariance matrix): The directions of the new axes; the first points along the direction of maximum variance, and the rest follow in decreasing order of variance.
- Eigenvalues: Amount of variance captured by each eigenvector.
- Dimensionality Reduction: Keep top PCs (largest eigenvalues) to represent data with fewer dimensions.
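To make "variance along a direction" concrete, here is a minimal NumPy sketch (the 2-D points are invented for illustration): it projects the data onto a unit vector and measures the spread of the projections. The first eigenvector of the covariance matrix is simply the direction for which this number is largest, and its eigenvalue is that maximum variance.
import numpy as np

# Toy 2-D data, invented for illustration and already mean-centered
X = np.array([[ 1.0,  0.8],
              [-0.5, -0.4],
              [ 2.0,  1.7],
              [-2.5, -2.1]])

w = np.array([1.0, 1.0])
w = w / np.linalg.norm(w)      # unit-length direction

projections = X @ w            # 1-D coordinate of each point along w
print("Variance along w:", projections.var())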
Step-by-Step Summary:
- Collect Data: E.g., a table of red nail polishes with (Brightness, Shine Intensity, Drying Time).
- Center the Data: Subtract the mean from each feature.
- Compute Covariance Matrix: Measures how features vary together.
- Eigen Decomposition: Find the eigenvectors and eigenvalues of the covariance matrix.
- Select PCs: Sort by eigenvalue and keep the top components, which capture the most variance.
- Transform Data: Project the centered data onto these PCs to reduce dimensions (the NumPy sketch below walks through these steps end to end).
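Below is a minimal NumPy sketch of these steps, reusing the five polishes from the table further down. It skips standardization, so its numbers will differ slightly from the scikit-learn example later, which scales each feature first.
import numpy as np

# [Brightness, Shine Intensity, Drying Time] for five polishes
X = np.array([
    [0.8,  0.7,  0.5],
    [0.75, 0.9,  0.55],
    [0.9,  0.85, 0.6],
    [0.6,  0.65, 0.4],
    [0.7,  0.8,  0.45]
])

# 1. Center the data
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix (3 x 3)
cov = np.cov(X_centered, rowvar=False)

# 3. Eigen decomposition (eigh, since the covariance matrix is symmetric)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort by eigenvalue, largest first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Keep the top 2 components and project
W = eigenvectors[:, :2]        # 3 x 2 projection matrix
X_reduced = X_centered @ W     # 5 samples x 2 dimensions

print("Variance captured by 2 PCs:", eigenvalues[:2].sum() / eigenvalues.sum())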
Intuition with a Nail Polish Example:
With 3 features (Brightness (B), Shine (S), Drying Time (D)), data is in 3D space. PCA finds 3 new axes (PCs). The first PC might distinguish very bright & shiny polishes from dull ones. By keeping just the first two PCs, you go from 3D to 2D, capturing most of the variation while simplifying the data.
Why Does This Reduce Dimensions?
Each principal component is a weighted combination of all the original features, and the components are ordered so that the largest sources of variation come first. By keeping only the first few components, you represent most of the structure of the data in fewer dimensions.
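As a tiny illustration of "a weighted combination of all the original features": a polish's coordinate on PC1 is just a weighted sum of its centered feature values. The weights and feature values below are hypothetical, not the true loadings for this dataset.
import numpy as np

weights_pc1 = np.array([0.62, 0.58, 0.53])    # hypothetical loadings for B, S, D
x_centered  = np.array([0.05, -0.08, 0.02])   # one polish, mean-subtracted

pc1_score = weights_pc1 @ x_centered          # its coordinate on the first PC
print("PC1 coordinate:", pc1_score)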
Example Code (Python with scikit-learn):
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Example Data: Suppose we have features:
# [Brightness, Shine Intensity, Drying Time]
data = np.array([
    [0.8,  0.7,  0.5],
    [0.75, 0.9,  0.55],
    [0.9,  0.85, 0.6],
    [0.6,  0.65, 0.4],
    [0.7,  0.8,  0.45]
])
# 1. Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# 2. Initialize PCA (reduce from 3D to 2D)
pca = PCA(n_components=2)
# 3. Fit and transform the data
reduced_data = pca.fit_transform(data_scaled)
print("Original shape:", data.shape)
print("Transformed shape:", reduced_data.shape)
print("Principal Components:\n", pca.components_)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
Covariance Matrix (ASCII Illustration):
Imagine your raw data (B, S, D):
Data Matrix X (n x d):
-------------------------------------
| B S D |
-------------------------------------
| 0.8 0.7 0.5 |
| 0.75 0.9 0.55 |
| 0.9 0.85 0.6 |
| 0.6 0.65 0.4 |
| 0.7 0.8 0.45 |
-------------------------------------
n=5 samples, d=3 features
After centering, we compute the covariance matrix Σ (3x3):
Σ = | cov(B,B) cov(B,S) cov(B,D) |
| cov(S,B) cov(S,S) cov(S,D) |
| cov(D,B) cov(D,S) cov(D,D) |
Example values might look like:
Σ = | 0.02 0.015 0.01 |
| 0.015 0.03 0.012 |
| 0.01 0.012 0.025 |
This matrix captures how each pair of features varies together.
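If you want actual numbers rather than placeholders, NumPy computes Σ for the table above in one call (np.cov centers the data internally, so you can pass the raw matrix directly):
import numpy as np

X = np.array([
    [0.8,  0.7,  0.5],
    [0.75, 0.9,  0.55],
    [0.9,  0.85, 0.6],
    [0.6,  0.65, 0.4],
    [0.7,  0.8,  0.45]
])

Sigma = np.cov(X, rowvar=False)   # 3 x 3; rows and columns follow B, S, D
print(Sigma)                      # diagonal entries are each feature's variance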
Key Takeaways:
- PCA uses eigenvectors and eigenvalues of the covariance matrix to find new axes (PCs).
- Each PC is a linear combination of the original features.
- By focusing on the top PCs, you reduce dimensions while retaining most data variance.