Logistic Regression
1. Logistic Regression Cheat Sheet
High-Level Overview:
Logistic Regression is a supervised learning method used to predict a binary outcome (e.g., "premium" vs. "regular" nail polish). Instead of predicting a continuous value, it predicts the probability that a given instance belongs to a certain class.
Machine Learning Engineers often use Logistic Regression for quick classification baselines, as it’s interpretable and efficient for moderate-sized datasets.
Key Concepts:
- Model Output: Probability that the instance belongs to the positive class (e.g., premium nail polish).
- Sigmoid Function: Maps any real number to a probability between 0 and 1.
- Parameters: Weights and an intercept, as in linear models, but interpreted in terms of odds.
- Cost Function: Binary Cross-Entropy (Log Loss) that measures how well the model fits the data.
- Optimization: Adjust weights to minimize the log loss (often using Gradient Descent).
Logistic Model Equation:
\[ \hat{y} = \sigma(z) \quad \text{where} \quad z = w_0 + w_1 x_1 + \ldots + w_n x_n \]
Here:
- \(\hat{y}\): Predicted probability of the positive class (e.g., probability that nail polish is premium)
- \(z\): Linear combination of features (same form as in linear regression)
- \(\sigma(z) = \frac{1}{1+e^{-z}}\): Sigmoid function that maps \(z\) to a probability in \((0,1)\)
- \(w_0, w_1, ..., w_n\): Weights (parameters) to be learned
- \(x_1, ..., x_n\): Features (e.g., Shine Intensity, Drying Time, Price)
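To make this concrete, here is a minimal NumPy sketch that computes \(z\) and \(\sigma(z)\) for a single sample; the weights and feature values below are made-up illustration numbers, not fitted parameters.
import numpy as np

def sigmoid(z):
    # Maps any real number to a value strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical (not fitted) parameters: intercept w0 plus one weight per feature
w0 = -3.0
w = np.array([2.0, -1.0, 0.2])   # weights for Shine, Drying, Price

x = np.array([0.8, 0.5, 15])     # one sample: Shine=0.8, Drying=0.5, Price=15
z = w0 + np.dot(w, x)            # linear combination, same form as linear regression
y_hat = sigmoid(z)               # predicted probability of the positive class
print("z =", z, "| predicted probability =", y_hat)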
Cost Function (Binary Cross-Entropy Loss):
\[ J(\mathbf{w}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1-\hat{y}_i) \right] \]
Here:
- \(m\): Number of training samples
- \(y_i \in \{0,1\}\): Actual class label (e.g., 1 if premium, 0 if regular)
- \(\hat{y}_i \in [0,1]\): Predicted probability for the \(i\)-th sample
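The loss can be computed directly from the formula above. A minimal sketch, using made-up labels and predicted probabilities; the small eps clip is a numerical safeguard, not part of the formula itself.
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from exactly 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Made-up labels and predicted probabilities for four samples
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])
print("Log loss:", binary_cross_entropy(y_true, y_pred))   # lower is better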
Gradient Descent Update Rule:
\[ w_j := w_j - \alpha \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i) x_{i,j} \]
Here:
- \(\alpha\): Learning rate
- \(x_{i,j}\): Feature \(j\) of the \(i\)-th sample
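Putting the pieces together, the sketch below trains logistic regression from scratch with batch Gradient Descent on a tiny made-up dataset; the learning rate and iteration count are arbitrary choices for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny made-up dataset: columns are Shine, Drying, Price; labels are 1=premium, 0=regular
X = np.array([[0.8, 0.5, 15],
              [0.4, 0.7, 8],
              [0.9, 0.4, 20],
              [0.6, 0.6, 10]])
y = np.array([1, 0, 1, 0])

m, n = X.shape
w = np.zeros(n)      # feature weights w_1 ... w_n
b = 0.0              # intercept w_0
alpha = 0.1          # learning rate (assumed; tune as needed)

for _ in range(2000):
    y_hat = sigmoid(X @ w + b)       # current predicted probabilities
    error = y_hat - y                # the (y_hat_i - y_i) term from the update rule
    w -= alpha * (X.T @ error) / m   # update each weight w_j
    b -= alpha * error.mean()        # update the intercept

print("Weights:", w, "Intercept:", b)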
Detailed Example (Nail Polish Classification):
Suppose you have a dataset of nail polishes with features like:
- Shine Intensity (\(x_1\))
- Drying Time (\(x_2\))
- Price (\(x_3\))
After collecting data, you have a table:
- Nail Polish A: Shine=0.8, Drying=0.5, Price=15 (Premium=1)
- Nail Polish B: Shine=0.4, Drying=0.7, Price=8 (Premium=0)
- Nail Polish C: Shine=0.9, Drying=0.4, Price=20 (Premium=1)
- Nail Polish D: Shine=0.6, Drying=0.6, Price=10 (Premium=0)
...and so forth.
Logistic Regression finds weights \( w_0, w_1, w_2, w_3 \) to produce:
\[ \hat{y} = \sigma(w_0 + w_1 \cdot \text{Shine} + w_2 \cdot \text{Drying} + w_3 \cdot \text{Price}) \]
If \(\hat{y} > 0.5\), predict Premium (1), else Regular (0).
By iteratively applying Gradient Descent, the model adjusts \(w_j\) to minimize the log loss. After training, you might find that very shiny, quick-drying, and higher-priced polishes have higher predicted probabilities of being premium.
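For instance, with purely illustrative (not fitted) weights \(w_0=-6\), \(w_1=3\), \(w_2=-2\), \(w_3=0.4\), Nail Polish A (Shine=0.8, Drying=0.5, Price=15) would get:
\[ z = -6 + 3 \cdot 0.8 + (-2) \cdot 0.5 + 0.4 \cdot 15 = 1.4, \qquad \hat{y} = \sigma(1.4) \approx 0.80 \]
Since \(0.80 > 0.5\), this polish would be classified as Premium (1).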
Step-by-Step Summary:
- Collect & Prepare Data: Gather features and binary labels (e.g., premium vs. regular).
- Define Model: Use a linear combination of features inside a sigmoid function to predict probabilities.
- Choose Cost Function: Binary Cross-Entropy (Log Loss) to measure prediction quality.
- Optimize Parameters: Use Gradient Descent to find \( w_j \) that minimize log loss.
- Evaluate & Adjust: Check accuracy, precision, recall, etc. Adjust learning rate or features as needed.
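For the evaluation step, a minimal scikit-learn sketch is shown below; the true labels and predictions are made-up values standing in for a held-out test set.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up true labels and model predictions for a small held-out set
y_true = np.array([1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0])

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))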
ML Engineer Perspective:
- Use Logistic Regression as a baseline for classification tasks.
- Interpret coefficients to understand how each feature affects the odds of being premium (see the odds-ratio sketch after this list).
- Compare with more complex models (like random forests or neural nets) to see if complexity is beneficial.
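Exponentiating a fitted weight gives an odds ratio: the multiplicative change in the odds of being premium for a one-unit increase in that feature, holding the others fixed. A minimal sketch, fit here on a tiny made-up dataset just so it runs end to end (the same idea applies to the model trained in the code example below):
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: Shine, Drying, Price; labels 1=premium, 0=regular
X = np.array([[0.8, 0.5, 15], [0.4, 0.7, 8], [0.9, 0.4, 20], [0.6, 0.6, 10]])
y = np.array([1, 0, 1, 0])
model = LogisticRegression().fit(X, y)

odds_ratios = np.exp(model.coef_[0])
for name, ratio in zip(["Shine", "Drying", "Price"], odds_ratios):
    print(f"{name}: odds multiplied by {ratio:.2f} per one-unit increase")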
Code Example (Python with scikit-learn):
import numpy as np
from sklearn.linear_model import LogisticRegression
# Example Data: Nail polishes with features [Shine, Drying, Price]
X = np.array([
[0.8, 0.5, 15],
[0.4, 0.7, 8],
[0.9, 0.4, 20],
[0.6, 0.6, 10],
[0.75, 0.5, 12]
])
# Labels: 1 for Premium, 0 for Regular
y = np.array([1, 0, 1, 0, 1])
model = LogisticRegression()
model.fit(X, y)
# Predict the probability of premium for a new polish: Shine=0.7, Drying=0.55, Price=14
new_polish = np.array([[0.7, 0.55, 14]])
prob_premium = model.predict_proba(new_polish)[0,1]
prediction = model.predict(new_polish)[0]
print("Predicted Probability (Premium):", prob_premium)
print("Predicted Class:", prediction)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
This example trains a logistic regression model on the nail polish dataset and predicts whether a new polish is likely to be premium.
Key Takeaways:
- Logistic Regression predicts probabilities for binary classification.
- The sigmoid function converts linear combinations of features into probabilities.
- Optimize parameters using log loss and Gradient Descent.
- A good baseline classifier that's easy to implement and interpret.