Logistic Regression

Sigmoid (Logistic) Function: Mapping to Probability

The sigmoid function takes any real number and squishes it onto a scale between 0 and 1. This makes it perfect for converting the output of a linear model into a probability.

Formula: $f(x) = \frac{1}{1 + e^{-x}}$
Input: Any real number from $-\infty$ to $+\infty$ .
Output: A value between 0 and 1 (a probability).
Purpose: It answers the question, "Given this input, what is the probability of the outcome?"
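
As a minimal sketch of the formula above in Python: the function name `sigmoid` and the branch on the sign of the input (added here only to avoid floating-point overflow for very negative inputs) are illustrative choices, not something the formula itself requires.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x))."""
    # Branch on the sign so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

for x in (-5.0, 0.0, 5.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.4f}")
```

Large negative inputs land close to 0, an input of 0 gives exactly 0.5, and large positive inputs land close to 1.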

Logit Function: Mapping to Log-Odds

The logit function is the inverse of the sigmoid function. It takes a probability and stretches it out onto an infinite scale. This is a two-step process that goes through odds.

1. From Probability to Odds

First, we convert the probability into odds, which is the ratio of the probability of an event happening to the probability of it not happening.

Probability ( $p$ ): A number between 0 and 1.
Odds Formula: $\text{Odds} = \frac{p}{1-p}$
Output: A value from 0 to $+\infty$ . For example, a probability of 0.8 corresponds to odds of $\frac{0.8}{0.2} = 4$ (or "4 to 1").
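
The conversion can be sketched in a few lines of Python. The helper name `odds` and the input check are assumptions made for this illustration.

```python
def odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) to odds p / (1 - p)."""
    # p = 1 would divide by zero, so it is rejected along with
    # values outside the probability range.
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

for p in (0.2, 0.5, 0.8):
    print(f"p = {p:.1f} -> odds = {odds(p):.2f}")
```

A probability of 0.5 gives odds of 1 (an even chance), probabilities below 0.5 give odds below 1, and probabilities above 0.5 give odds above 1.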

2. From Odds to Log-Odds (The Logit)

Next, the logit function simply takes the natural logarithm of the odds. This final step transforms the odds scale $[0, +\infty)$ to the log-odds scale $(-\infty, +\infty)$.

Logit Formula: $\text{logit}(p) = \ln(\text{Odds}) = \ln\left(\frac{p}{1-p}\right)$
Input: A probability $p$ strictly between 0 and 1 (the logit is undefined at exactly 0 or 1).
Output: A real number from $-\infty$ to $+\infty$ (the log-odds).
Purpose: It answers the question, "What value on the unbounded log-odds scale corresponds to this probability?"
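
A short Python sketch can check numerically that the logit undoes the sigmoid. The function names are illustrative, and `sigmoid` is written in its simplest form here, which is adequate for the moderate inputs used in this example.

```python
import math

def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)) of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    return math.log(p / (1.0 - p))

def sigmoid(x: float) -> float:
    """Logistic function, repeated so this example is self-contained."""
    return 1.0 / (1.0 + math.exp(-x))

for p in (0.2, 0.5, 0.8):
    z = logit(p)
    # Feeding the log-odds back through the sigmoid recovers p.
    print(f"p = {p:.1f} -> logit = {z:+.4f} -> sigmoid(logit) = {sigmoid(z):.4f}")
```

Probabilities below 0.5 map to negative log-odds, 0.5 maps to 0, and probabilities above 0.5 map to positive log-odds. Applying the sigmoid returns the original probability in each case, up to floating-point rounding.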

The Mismatch Problem

Linear models (like $y = mx + b$ ) are simple and powerful, but their output is unbounded. As the input $x$ increases or decreases, the output $y$ can go to positive or negative infinity.
Probability, on the other hand, is strictly bounded. It must be a number between 0 and 1.

You can't directly use an unbounded output as a bounded quantity. Simply declaring "let the output of the linear model be the probability" would fail, because the model could easily predict a probability of 1.5 or -0.2, which is nonsensical.
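
To make the mismatch concrete, here is a small Python sketch. The slope `m = 0.34` and intercept `b = -0.2` are arbitrary values picked for illustration, not coefficients fitted to any data.

```python
def linear(x: float, m: float = 0.34, b: float = -0.2) -> float:
    """A straight line y = m*x + b with arbitrary illustrative coefficients."""
    return m * x + b

for x in (0.0, 1.0, 2.0, 5.0):
    y = linear(x)
    # A valid probability must lie in the closed interval [0, 1].
    label = "valid probability" if 0.0 <= y <= 1.0 else "not a valid probability"
    print(f"x = {x:.1f} -> y = {y:+.2f} ({label})")
```

With these values the line gives -0.20 at $x = 0$ and about 1.50 at $x = 5$, so reading $y$ directly as a probability breaks down at both ends of the input range.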

The Solution: Change the Scale

Instead of changing the model, we change the scale of the target variable. The logit function ( $\ln\left(\frac{p}{1-p}\right)$ ) provides the perfect solution.

By converting the probability (0 to 1) into log-odds (negative to positive infinity), we create a new target variable that matches the unbounded nature of the linear model.
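
Written out as a worked derivation, assuming the single-feature line $y = mx + b$ from the previous section is the linear model, setting the log-odds equal to the line and solving for $p$ gives:

$$
\ln\left(\frac{p}{1-p}\right) = mx + b
\;\Longrightarrow\;
\frac{p}{1-p} = e^{mx+b}
\;\Longrightarrow\;
p = \frac{e^{mx+b}}{1 + e^{mx+b}} = \frac{1}{1 + e^{-(mx+b)}}
$$

The last expression is the sigmoid formula $f(x) = \frac{1}{1 + e^{-x}}$ evaluated at $mx + b$, which is why the two functions are inverses of each other.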

This allows a linear model to do what it does best: predict a value on an infinite scale. That value (the log-odds) can then be effortlessly converted back into a valid probability using the sigmoid function.
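
The whole pipeline can be sketched in Python as follows. The function `predict_probability` and the coefficients `m` and `b` are hypothetical and chosen only for illustration; a real logistic regression would estimate its coefficients from data.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function; the simple form is fine for the moderate inputs below."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x: float, m: float, b: float) -> float:
    """Treat the linear output m*x + b as log-odds and map it to a probability."""
    log_odds = m * x + b
    return sigmoid(log_odds)

# Hypothetical coefficients, not fitted to any data.
m, b = 0.34, -0.2

for x in (-10.0, 0.0, 5.0, 10.0):
    p = predict_probability(x, m, b)
    print(f"x = {x:+5.1f} -> log-odds = {m * x + b:+.2f} -> probability = {p:.4f}")
```

The log-odds column ranges freely over negative and positive values, while every entry in the probability column stays between 0 and 1.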

In short, the conversion creates a mathematical "bridge" that connects the unbounded world of linear models with the bounded world of probability. 🌉