Binomial Distribution

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 − p). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent, so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation and is widely used.
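The quality of this approximation can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper names (`hypergeom_pmf`, `binom_pmf`, `max_abs_gap`), the sample size n = 10, and the 30% share of successes in the population are all illustrative assumptions that do not come from the text above. It prints the largest pointwise difference between the two probability mass functions for increasing population sizes N.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k successes) when drawing n items WITHOUT replacement
    from a population of N items, K of which are successes."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(k successes) in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def max_abs_gap(N, n=10, frac=0.3):
    """Largest pointwise difference between the two PMFs.
    n and frac are hypothetical values chosen for illustration."""
    K = round(N * frac)          # number of successes in the population
    p = K / N                    # matching per-draw success probability
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))

for N in (50, 500, 5000):
    print(N, max_abs_gap(N))
```

Under these assumptions the gap should shrink as N grows relative to n, which is the sense in which the binomial distribution approximates sampling without replacement.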

The binomial distribution is also commonly used to construct confidence intervals for proportions. When each trial has two possible outcomes, such as “pass” or “fail” for product tests, or “heads” or “tails” for coin tosses, the binomial distribution gives the probability of, for example, 5 passes and 1 fail in 6 product tests, or 2 heads and 2 tails in 4 coin tosses.
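The coin-toss case can be verified by brute-force enumeration. The sketch below assumes a fair coin (an assumption, since the text does not state the probability of heads), so that all 16 sequences of 4 tosses are equally likely; the variable names are illustrative.

```python
from itertools import product

# All sequences of 4 tosses; equally likely only under the fair-coin assumption.
outcomes = list(product("HT", repeat=4))

# Sequences with exactly 2 heads (and therefore 2 tails).
favourable = [seq for seq in outcomes if seq.count("H") == 2]

prob_two_heads = len(favourable) / len(outcomes)
print(len(favourable), len(outcomes), prob_two_heads)  # 6 16 0.375
```

Enumeration is only practical for small n, because the number of sequences doubles with each additional toss.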

In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0,1], we write X ~ B(n, p). The probability of getting exactly k successes in n independent Bernoulli trials is given by the probability mass function:

$f(k,n,p)=\Pr(k;n,p)=\Pr(X=k)={\binom {n}{k}}p^{k}(1-p)^{n-k}$

for k = 0, 1, 2, …, n, where

${\binom {n}{k}}={\frac {n!}{k!(n-k)!}}$

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: any one particular sequence containing k successes and n − k failures occurs with probability $p^{k}(1-p)^{n-k}$, because the trials are independent. However, the k successes can occur anywhere among the n trials, and there are ${\binom {n}{k}}$ different ways of distributing k successes in a sequence of n trials. Multiplying the two gives the probability mass function.
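The probability mass function translates almost directly into code. The following is a minimal sketch using Python's standard `math.comb`; the function name `binomial_pmf` is illustrative, and the pass probability of 0.9 in the product-test example is a hypothetical value, since the text does not specify one.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials,
    each succeeding with probability p."""
    if not 0 <= k <= n:
        return 0.0               # outside the support {0, 1, ..., n}
    # comb(n, k) counts the arrangements; p**k * (1-p)**(n-k) is the
    # probability of any single arrangement.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 2 heads in 4 tosses of a fair coin (p = 0.5 is an assumption).
print(binomial_pmf(2, 4, 0.5))   # 0.375

# 5 passes in 6 product tests, with a hypothetical pass probability of 0.9.
print(binomial_pmf(5, 6, 0.9))

# The probabilities over k = 0..n should sum to 1 up to rounding error.
print(sum(binomial_pmf(k, 6, 0.9) for k in range(7)))
```

For large n, the binomial coefficient becomes very large while the powers become very small, so a production implementation would typically work with logarithms or use a statistics library instead.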