Logistic Regression Cheat Sheet

Classification
Supervised Learning
High-Level Overview:

Logistic Regression is a supervised learning method used to predict a binary outcome (e.g., "premium" vs. "regular" nail polish). Instead of predicting a continuous value, it predicts the probability that a given instance belongs to a certain class.

Machine Learning Engineers often use Logistic Regression for quick classification baselines, as it’s interpretable and efficient for moderate-sized datasets.

Key Concepts:

  • Model Output: Probability that the instance belongs to the positive class (e.g., premium nail polish).
  • Sigmoid Function: Maps any real number to a probability between 0 and 1.
  • Parameters: Weights and an intercept, as in linear models, but interpreted in terms of odds (exponentiating a weight gives an odds ratio).
  • Cost Function: Binary Cross-Entropy (Log Loss) that measures how well the model fits the data.
  • Optimization: Adjust weights to minimize the log loss (often using Gradient Descent).

Logistic Model Equation:

\[ \hat{y} = \sigma(z) \quad \text{where} \quad z = w_0 + w_1 x_1 + \ldots + w_n x_n \]

Here:

  • \(\hat{y}\): Predicted probability of the positive class (e.g., probability that nail polish is premium)
  • \(z\): Linear combination of features (same form as in linear regression)
  • \(\sigma(z) = \frac{1}{1+e^{-z}}\): Sigmoid function that maps \(z\) to a probability in \([0,1]\)
  • \(w_0, w_1, ..., w_n\): Weights (parameters) to be learned
  • \(x_1, ..., x_n\): Features (e.g., Shine Intensity, Drying Time, Price)
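
To see these pieces in code, here is a minimal NumPy sketch of the model equation; the weights and the single feature vector are made-up illustrative values, not learned ones.

import numpy as np

def sigmoid(z):
    # Map any real number to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and intercept (illustrative values only, not learned)
w0 = -1.0
w = np.array([2.0, -1.5, 0.1])      # weights for [Shine, Drying, Price]

# One hypothetical polish: Shine=0.8, Drying=0.5, Price=15
x = np.array([0.8, 0.5, 15.0])

z = w0 + np.dot(w, x)               # linear combination of features
y_hat = sigmoid(z)                  # predicted probability of "premium"
print(z, y_hat)                     # roughly 1.35 and 0.79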

Cost Function (Binary Cross-Entropy Loss):

\[ J(\mathbf{w}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1-\hat{y}_i) \right] \]

Here:

  • \(m\): Number of training samples
  • \(y_i \in \{0,1\}\): Actual class label (e.g., 1 if premium, 0 if regular)
  • \(\hat{y}_i \in [0,1]\): Predicted probability for the \(i\)-th sample

This cost function penalizes confident wrong predictions heavily and encourages probabilities that match actual outcomes.
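
As a concrete check of the formula, the short sketch below evaluates the log loss on a handful of made-up labels and predicted probabilities; note how the one confident wrong prediction (true label 1, predicted 0.05) dominates the average.

import numpy as np

# Hypothetical labels and predicted probabilities for m = 4 samples
y     = np.array([1,   0,   1,    0])
y_hat = np.array([0.9, 0.2, 0.05, 0.4])

# Binary cross-entropy: mean of -[y*log(y_hat) + (1 - y)*log(1 - y_hat)]
per_sample = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(per_sample)         # the confident wrong prediction (0.05 for a true 1) costs ~3.0
print(per_sample.mean())  # the overall log loss J(w)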

Gradient Descent Update Rule:

\[ w_j := w_j - \alpha \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i) x_{i,j} \]

Here:

  • \(\alpha\): Learning rate
  • \(x_{i,j}\): Feature \(j\) of the \(i\)-th sample

The update rule has the same form as linear regression’s gradient descent, but here \(\hat{y}_i = \sigma(z_i)\) comes from the sigmoid function.
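
The following sketch performs a single batch update using this rule; the small design matrix, labels, starting weights, and learning rate are all assumed values for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: a leading column of 1s lets w[0] play the role of the intercept w0
X = np.array([
    [1.0, 0.8, 0.5],
    [1.0, 0.4, 0.7],
    [1.0, 0.9, 0.4],
])
y = np.array([1, 0, 1])

w = np.zeros(X.shape[1])   # start with all weights at zero
alpha = 0.1                # learning rate

# One gradient-descent step: w_j := w_j - alpha * (1/m) * sum_i (y_hat_i - y_i) * x_ij
y_hat = sigmoid(X @ w)
w = w - alpha * (X.T @ (y_hat - y)) / len(y)
print(w)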

Detailed Example (Nail Polish Classification):

Suppose you have a dataset of nail polishes with features like:

  • Shine Intensity (\(x_1\))
  • Drying Time (\(x_2\))
  • Price (\(x_3\))

You want to predict whether a nail polish is "premium" (1) or "regular" (0).

After collecting data, you have a table:

  • Nail Polish A: Shine=0.8, Drying=0.5, Price=15 (Premium=1)
  • Nail Polish B: Shine=0.4, Drying=0.7, Price=8 (Premium=0)
  • Nail Polish C: Shine=0.9, Drying=0.4, Price=20 (Premium=1)
  • Nail Polish D: Shine=0.6, Drying=0.6, Price=10 (Premium=0)
  • ... and so forth.

Logistic Regression finds weights \( w_0, w_1, w_2, w_3 \) to produce:

\[ \hat{y} = \sigma(w_0 + w_1 \cdot \text{Shine} + w_2 \cdot \text{Drying} + w_3 \cdot \text{Price}) \]

If \(\hat{y} > 0.5\), predict Premium (1); otherwise predict Regular (0).

By iteratively applying Gradient Descent, the model adjusts \(w_j\) to minimize the log loss. After training, you might find that very shiny, quick-drying, and higher-priced polishes have higher predicted probabilities of being premium.
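
A minimal end-to-end sketch of that training loop on the four example polishes is shown below. Price is divided by 10 purely as an assumed, rough rescaling so the toy loop behaves well with a fixed learning rate; the learning rate and iteration count are likewise illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Features: [Shine, Drying, Price/10]; labels: 1 = Premium, 0 = Regular
X = np.array([
    [0.8, 0.5, 1.5],
    [0.4, 0.7, 0.8],
    [0.9, 0.4, 2.0],
    [0.6, 0.6, 1.0],
])
y = np.array([1, 0, 1, 0])

# Prepend a column of 1s so w[0] serves as the intercept w0
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

w = np.zeros(Xb.shape[1])
alpha = 0.5

for _ in range(2000):                      # repeatedly apply the update rule
    y_hat = sigmoid(Xb @ w)
    w -= alpha * (Xb.T @ (y_hat - y)) / len(y)

print("Learned weights:", w)
print("Predicted probabilities:", sigmoid(Xb @ w))  # high for A and C, low for B and D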

Step-by-Step Summary:

  1. Collect & Prepare Data: Gather features and binary labels (e.g., premium vs. regular).
  2. Define Model: Use a linear combination of features inside a sigmoid function to predict probabilities.
  3. Choose Cost Function: Binary Cross-Entropy (Log Loss) to measure prediction quality.
  4. Optimize Parameters: Use Gradient Descent to find \( w_j \) that minimize log loss.
  5. Evaluate & Adjust: Check accuracy, precision, recall, etc. Adjust learning rate or features as needed.
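
For the evaluation in step 5, scikit-learn's metric helpers do the bookkeeping; the held-out labels and predictions below are placeholders for whatever your test set produces.

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true labels and model predictions on a held-out set
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # 5 of 6 correct
print("Precision:", precision_score(y_true, y_pred))  # 3 of 4 predicted premiums are truly premium
print("Recall:   ", recall_score(y_true, y_pred))     # all 3 true premiums were found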

ML Engineer Perspective:

  • Use Logistic Regression as a baseline for classification tasks.
  • Interpret coefficients to understand how each feature affects the odds of being premium (see the odds-ratio sketch after this list).
  • Compare with more complex models (like random forests or neural nets) to see if complexity is beneficial.
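
One common way to read the learned weights is through odds ratios. The sketch below uses made-up coefficient values (not output from any fitted model here) to show the interpretation.

import numpy as np

# Hypothetical learned weights for [Shine, Drying, Price] (illustrative values only)
coefficients = np.array([2.3, -1.1, 0.15])

# exp(weight) = multiplicative change in the odds of "premium" for a one-unit
# increase in that feature, holding the other features fixed
odds_ratios = np.exp(coefficients)
print(odds_ratios)   # e.g. exp(0.15) ≈ 1.16: each extra unit of Price multiplies the odds by ~1.16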

Code Example (Python with scikit-learn):


import numpy as np
from sklearn.linear_model import LogisticRegression

# Example Data: Nail polishes with features [Shine, Drying, Price]
X = np.array([
    [0.8, 0.5, 15],
    [0.4, 0.7,  8],
    [0.9, 0.4, 20],
    [0.6, 0.6, 10],
    [0.75, 0.5, 12]
])
# Labels: 1 for Premium, 0 for Regular
y = np.array([1, 0, 1, 0, 1])

model = LogisticRegression()
model.fit(X, y)

# Predict the probability of premium for a new polish: Shine=0.7, Drying=0.55, Price=14
new_polish = np.array([[0.7, 0.55, 14]])
prob_premium = model.predict_proba(new_polish)[0,1]
prediction = model.predict(new_polish)[0]

print("Predicted Probability (Premium):", prob_premium)
print("Predicted Class:", prediction)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

This example trains a logistic regression model on the nail polish dataset and predicts whether a new polish is likely to be premium.
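
As a small sanity check that connects the library back to the sigmoid equation above, the sketch below refits the same tiny dataset and then reproduces predict_proba by hand from the learned intercept and coefficients.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Refit the same toy data so this snippet stands alone
X = np.array([[0.8, 0.5, 15], [0.4, 0.7, 8], [0.9, 0.4, 20], [0.6, 0.6, 10], [0.75, 0.5, 12]])
y = np.array([1, 0, 1, 0, 1])
model = LogisticRegression().fit(X, y)

# For a binary model, the positive-class probability is the sigmoid of the linear combination
new_polish = np.array([0.7, 0.55, 14])
z = model.intercept_[0] + np.dot(model.coef_[0], new_polish)
print(1.0 / (1.0 + np.exp(-z)))                        # manual sigmoid
print(model.predict_proba(new_polish.reshape(1, -1))[0, 1])  # same value from scikit-learn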

Key Takeaways:

  • Logistic Regression predicts probabilities for binary classification.
  • The sigmoid function converts linear combinations of features into probabilities.
  • Optimize parameters using log loss and Gradient Descent.
  • A good baseline classifier that's easy to implement and interpret.