Linear Regression Cheat Sheet

Regression · Supervised Learning

High-Level Overview:

Linear Regression is a simple supervised learning technique that models the relationship between one or more input features (independent variables) and a continuous output (dependent variable) using a linear function.

Machine Learning Engineers often use it as a baseline model to quickly understand the data or to compare against more complex methods.

Key Concepts:

  • Model: A line (in 2D) or hyperplane (in higher dimensions) that best fits the data.
  • Parameters: Weights (coefficients) and intercept that define the model’s line or plane.
  • Cost Function: A metric (often Mean Squared Error) to measure how well the model fits.
  • Optimization: Finding the parameters that minimize the cost function (e.g., via Gradient Descent).

Linear Model Equation:

\[ y = w_0 + w_1 x_1 + \ldots + w_n x_n \]

Here:

  • \(y\): Predicted value (target)
  • \(w_0\): Intercept (bias term)
  • \(w_1, \ldots, w_n\): Weights (coefficients) for each feature
  • \(x_1, \ldots, x_n\): Input features (independent variables)
Each weight \( w_j \) shows how much \( y \) changes when the corresponding feature \( x_j \) increases by one unit, with the other features held fixed.
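
A minimal sketch of this prediction in NumPy (the weights and feature values below are made up purely for illustration):

import numpy as np

# Hypothetical parameters and one input sample
w0 = 1.5                         # intercept (bias term)
w  = np.array([0.2, -0.4, 3.0])  # weights w1, w2, w3
x  = np.array([10.0, 5.0, 1.2])  # features x1, x2, x3

y_pred = w0 + np.dot(w, x)       # y = w0 + w1*x1 + w2*x2 + w3*x3
print(y_pred)                    # 1.5 + 2.0 - 2.0 + 3.6 = 5.1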

Cost Function (Mean Squared Error):

\[ J(\mathbf{w}) = \frac{1}{2m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2 \]

Here:

  • \(m\): Number of data samples
  • \(y_i\): Actual value of the target for the \(i\)-th sample
  • \(\hat{y}_i\): Predicted value for the \(i\)-th sample
\( J(\mathbf{w}) \) measures how far predictions are from the actual values on average; the extra factor of \( \tfrac{1}{2} \) is a common convention that cancels when differentiating, yielding the clean gradient used in the update rule below.
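
A quick NumPy sketch of this cost (the target and prediction arrays are made up):

import numpy as np

y     = np.array([3.0, 5.0, 7.0])       # actual targets y_i
y_hat = np.array([2.5, 5.5, 6.0])       # predictions y_hat_i

m = len(y)
J = np.sum((y - y_hat) ** 2) / (2 * m)  # cost as defined above
print(J)                                # 0.25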

Gradient Descent Update Rule:

\[ w_j := w_j - \alpha \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)x_{i,j} \]

Here:

  • \(\alpha\): Learning rate, controlling the size of each update step
  • \(x_{i,j}\): Value of feature \( j \) for the \(i\)-th sample
Each iteration adjusts weights to reduce the cost \( J(\mathbf{w}) \).
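
A minimal batch gradient descent sketch in NumPy (the synthetic data, learning rate, and iteration count are arbitrary choices for illustration):

import numpy as np

# Synthetic data: y ≈ 4 + 3*x plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.1, size=100)

Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a column of 1s for the bias w0
w = np.zeros(Xb.shape[1])                  # start from all-zero weights
alpha, m = 0.1, len(y)                     # learning rate and number of samples

for _ in range(1000):
    y_hat = Xb @ w                         # current predictions
    grad = Xb.T @ (y_hat - y) / m          # gradient of J(w)
    w -= alpha * grad                      # update rule from above

print(w)                                   # should be close to [4, 3]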

Normal Equation (Closed-form Solution):

\[ \mathbf{w} = (X^T X)^{-1} X^T y \]

Here:

  • \(X\): Design matrix of input features
  • \(y\): Vector of target values
The Normal Equation finds the optimal weights \(\mathbf{w}\) in one step, without iteration, but forming and inverting \(X^T X\) costs roughly \(O(n^3)\) in the number of features, so it becomes expensive for very large feature sets.
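
A sketch of the Normal Equation in NumPy, using the same small house-price data as the scikit-learn example further down and prepending a column of ones for the intercept:

import numpy as np

# Design matrix with a leading column of 1s (for w0) and one feature (house size)
X = np.array([[1.0,  800],
              [1.0, 1000],
              [1.0, 1200],
              [1.0, 1500],
              [1.0, 2000]])
y = np.array([180, 200, 240, 300, 360])

# w = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly
w = np.linalg.solve(X.T @ X, X.T @ y)
print("Intercept (w0):", w[0])
print("Slope (w1):", w[1])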

Step-by-Step Summary:

  1. Collect Data: Gather features and continuous target values.
  2. Define Model: Assume a linear relationship \(y = w_0 + \sum w_j x_j\).
  3. Choose Cost Function: MSE is common to measure prediction error.
  4. Optimize Parameters: Use Gradient Descent or the Normal Equation to find \(w_j\) that minimize MSE.
  5. Evaluate & Refine: Check performance (e.g., RMSE), adjust the learning rate, add features, or regularize if needed.
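
As a sketch of step 5, one way to estimate out-of-sample error with scikit-learn utilities (the synthetic data, split ratio, and random seed are arbitrary assumptions):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: 200 samples, 3 features, known true weights plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print("Test RMSE:", rmse)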

ML Engineer Perspective:

  • Use Linear Regression as a quick baseline.
  • Interpret weights to understand feature importance.
  • Compare with more complex models to see if complexity is justified.

Code Example (Python with scikit-learn):


import numpy as np
from sklearn.linear_model import LinearRegression

# Example: House size vs. House price
X = np.array([[800], [1000], [1200], [1500], [2000]])  # House sizes
y = np.array([180, 200, 240, 300, 360])                # Prices (in thousands)

model = LinearRegression()
model.fit(X, y)

pred_price = model.predict([[1300]])
print("Predicted price for 1300 sq ft:", pred_price[0])
print("Intercept (w0):", model.intercept_)
print("Slope (w1):", model.coef_[0])

For multiple features, provide a 2D array for X. The model finds a weight for each feature plus an intercept.
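
For instance, a sketch with two made-up features (size in square feet and number of bedrooms):

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: [size in sq ft, bedrooms] -> price in thousands
X = np.array([[800, 2], [1000, 2], [1200, 3], [1500, 3], [2000, 4]])
y = np.array([180, 200, 240, 300, 360])

model = LinearRegression()
model.fit(X, y)

print("Intercept (w0):", model.intercept_)
print("Weights (w1, w2):", model.coef_)            # one coefficient per feature
print("Prediction:", model.predict([[1300, 3]]))   # 1300 sq ft, 3 bedrooms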

Key Takeaways:

  • Linear Regression fits a linear model to predict a continuous variable.
  • MSE is a common cost function measuring prediction error.
  • Use Gradient Descent or the Normal Equation for parameter estimation.
  • Simple, interpretable baseline for many regression tasks.