Least square method: deciding model parameters by minimizing RSS (residual sum of squares)
ML (maximum likelihood) estimation: estimating model parameters by finding the parameter values that maximize the likelihood of making the observations given the parameters
MAP (maximum a posteriori) estimation


Bayes’ Theorem

    \[ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} \]

where A and B are events and P(B) ≠ 0.

P(A \mid B): conditional probability of event A given that B is true.
P(A)P(B): probability of event A, probability of event B without regarding each other.
P(B \mid A): conditional probability of event A given that B is true.

    \[ P(A \cap B) = P(B \cap A) \]

    \[ P(A)P(A \mid B) = P(B)P(B \mid A) \]

    \[ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} \]



Suppose that a probability of having disease A is 0.5%.
And suppose that a test is 99% sensitive (true positive rate), 95% specific (true negative rate).
If you’re detected as positive by the test. What is the probability that you have disease A?

    \[ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} = \frac{0.99 * 0.005}{0.99 * 0.005 + 0.05 * 0.995} \approx 0.090 \]

Even if you’re diagnosed as positive, the probability of having disease is only around 9%


Bayesian inference

\theta = (\mu, \sigma): parameters of probability distribution
x = (x_1, x_2, \dots, x_N): observed data (fixed)
f(\theta \mid x): posterior probability
f(x \mid \theta): likelihood
f(\theta): prior probability
f(x): marginal likelihood or normalization constant

    \[ f(\theta \mid x) = \frac{f(x \mid \theta) \, f(\theta)}{f(x)} \propto f(x \mid \theta)f(\theta) \]

Try to find \theta which maximizes f(\theta \mid x)