3. Probability Distribution

Ways of Displaying Probability Distributions

Probability distributions can be shown in tables and graphs, or they can be described by a formula. For example, the binomial formula is used to calculate binomial probabilities.

The following table shows the probability distribution of a tomato packing plant receiving rotten tomatoes. Note that if you add all of the probabilities in the second row, they add up to 1 (.95 + .02 + .02 + .01 = 1).

[Table: probability distribution of rotten tomatoes received by the plant, with probabilities .95, .02, .02, and .01]
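A discrete distribution like the one in the table can be sketched as a simple mapping from outcomes to probabilities. The probabilities below come from the text; the outcome labels (0–3 rotten tomatoes per delivery) are illustrative assumptions.

```python
# A minimal sketch of a discrete probability distribution as a table.
# Keys are hypothetical outcomes (rotten tomatoes per delivery);
# values are the probabilities given in the text.
distribution = {0: 0.95, 1: 0.02, 2: 0.02, 3: 0.01}

# A valid probability distribution must sum to 1.
total = sum(distribution.values())
print(round(total, 10))  # ≈ 1.0
```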

The following graph shows a standard normal distribution, which is probably the most widely used probability distribution. The standard normal distribution is also known as the “bell curve.” Lots of natural phenomena fit the bell curve, including heights, weights and IQ scores. The normal curve is a continuous probability distribution, so instead of adding up individual probabilities under the curve we say that the total area under the curve is 1.

In a normal distribution, the percentage of scores you can expect to find within any given number of standard deviations from the mean is always the same.

Note: Finding the area under a curve requires a little integral calculus, which you won’t get into in elementary statistics. Therefore, you’ll have to take a leap of faith and just accept that the area under the curve is 1!
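You don't have to take that claim entirely on faith: the area can be checked numerically without any calculus, by adding up thin rectangles under the bell curve. This is a rough sketch, using a simple Riemann sum over the interval [-8, 8] (the tails beyond that are negligible).

```python
import math

def phi(x):
    """Standard normal density: e^(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Approximate the area under the curve with thin rectangles (width dx).
dx = 0.001
area = sum(phi(-8 + i * dx) * dx for i in range(int(16 / dx)))
print(round(area, 6))  # ≈ 1.0
```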

General definition

A probability distribution can be described in various forms, such as by a probability mass function or a cumulative distribution function. One of the most general descriptions, which applies for continuous and discrete variables, is by means of a probability function P : 𝒜 → ℝ whose input space 𝒜 is related to the sample space, and which gives a real number probability as its output.[7]

The probability function P can take as argument subsets of the sample space itself, as in the coin toss example, where the function P was defined so that P(heads) = 0.5 and P(tails) = 0.5. However, because of the widespread use of random variables, which transform the sample space into a set of numbers (e.g., ℝ, ℕ), it is more common to study probability distributions whose arguments are subsets of these particular kinds of sets (number sets),[8] and all probability distributions discussed in this article are of this type. It is common to denote by P(X ∈ E) the probability that a certain variable X belongs to a certain event E.[4][9]

The above probability function only characterizes a probability distribution if it satisfies all the Kolmogorov axioms, that is:

  1. P(X ∈ E) ≥ 0 for all E ∈ 𝒜, so the probability is non-negative
  2. P(X ∈ E) ≤ 1 for all E ∈ 𝒜, so no probability exceeds 1
  3. P(X ∈ ⊔ᵢ Eᵢ) = Σᵢ P(X ∈ Eᵢ) for any disjoint family of sets {Eᵢ}
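The axioms are easy to verify for a small concrete case. The sketch below checks all three for a fair six-sided die, where every event is a subset of {1, …, 6}; the die example itself is an illustrative assumption, not taken from the text.

```python
from itertools import combinations

# A fair six-sided die as a probability distribution.
outcomes = [1, 2, 3, 4, 5, 6]
P = {x: 1 / 6 for x in outcomes}

def prob(event):
    """P(X in E) for an event E given as a set of outcomes."""
    return sum(P[x] for x in event)

# Axioms 1 and 2: every event has probability between 0 and 1.
all_events = [set(e) for r in range(7) for e in combinations(outcomes, r)]
assert all(0 <= prob(e) <= 1 for e in all_events)

# Axiom 3 (additivity): probabilities of disjoint events add.
evens, odds = {2, 4, 6}, {1, 3, 5}
assert abs(prob(evens | odds) - (prob(evens) + prob(odds))) < 1e-12
print("all three axioms hold for the die")
```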

The concept of a probability function is made more rigorous by defining it as the element of a probability space (X, 𝒜, P), where X is the set of possible outcomes, 𝒜 is the set of all subsets E ⊂ X whose probability can be measured, and P is the probability function, or probability measure, that assigns a probability to each of these measurable subsets E ∈ 𝒜.[10]

Probability distributions are generally divided into two classes. A discrete probability distribution applies to scenarios where the set of possible outcomes is discrete (e.g. a coin toss, a roll of a die) and the probabilities are encoded by a discrete list of the probabilities of the outcomes; in this case the discrete probability distribution is known as a probability mass function. On the other hand, continuous probability distributions apply to scenarios where the set of possible outcomes can take on values in a continuous range (e.g. real numbers), such as the temperature on a given day. In the case of real numbers, the continuous probability distribution is described by the cumulative distribution function. In general, in the continuous case, probabilities are described by a probability density function, and the probability distribution is by definition the integral of the probability density function.[4][5][9] The normal distribution is a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures.
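The statement that a continuous distribution is the integral of its density can be checked numerically for the normal case. The sketch below integrates the standard normal density up to a point x and compares the result with the closed form via the error function; both should agree.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf_numeric(x, lo=-8.0, n=20000):
    """CDF as the integral of the density, via the midpoint rule."""
    dx = (x - lo) / n
    return sum(pdf(lo + (i + 0.5) * dx) * dx for i in range(n))

def cdf_exact(x):
    """Closed form for the standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(round(cdf_numeric(1.0), 4), round(cdf_exact(1.0), 4))  # both ≈ 0.8413
```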

A probability distribution whose sample space is one-dimensional (for example real numbers, list of labels, ordered labels or binary) is called univariate, while a distribution whose sample space is a vector space of dimension 2 or more is called multivariate. A univariate distribution gives the probabilities of a single random variable taking on various different values; a multivariate distribution (a joint probability distribution) gives the probabilities of a random vector – a list of two or more random variables – taking on various combinations of values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution. A commonly encountered multivariate distribution is the multivariate normal distribution.

Besides the probability function, the cumulative distribution function, the probability mass function and the probability density function, the moment generating function and the characteristic function also serve to identify a probability distribution, as they uniquely determine an underlying cumulative distribution function.

Types of Probability Distributions

There are many different types of probability distributions. Some of them include the normal distribution, chi-square distribution, binomial distribution, and Poisson distribution. The different probability distributions serve different purposes and represent different data generation processes. The binomial distribution, for example, evaluates the probability of an event occurring several times over a given number of trials, given the event’s probability in each trial. It may be generated by keeping track of how many free throws a basketball player makes in a game, where 1 = a basket and 0 = a miss. Another typical example would be to take a fair coin and figure the probability of that coin coming up heads in 10 straight flips. A binomial distribution is discrete, as opposed to continuous, since only 1 or 0 is a valid response to each trial.
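The coin example above can be computed directly from the binomial formula P(k) = C(n, k) · p^k · (1 − p)^(n − k), here with n = 10 flips and p = 0.5 for a fair coin.

```python
from math import comb

def binomial_pmf(k, n=10, p=0.5):
    """Probability of exactly k successes in n trials with per-trial probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(10))  # 10 heads in a row: (1/2)^10 = 0.0009765625
print(binomial_pmf(5))   # the most likely count, 5 heads: 0.24609375
print(sum(binomial_pmf(k) for k in range(11)))  # the pmf sums to 1.0
```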

The most commonly used distribution is the normal distribution, which appears frequently in finance, investing, science, and engineering. The normal distribution is fully characterized by its mean and standard deviation: it has zero skewness and a fixed kurtosis (kurtosis = 3). This makes the distribution symmetric, and it is depicted as a bell-shaped curve when plotted. The standard normal distribution has a mean (average) of zero and a standard deviation of 1.0. In a normal distribution, approximately 68% of the data collected will fall within +/- one standard deviation of the mean; approximately 95% within +/- two standard deviations; and 99.7% within +/- three standard deviations. Unlike the binomial distribution, the normal distribution is continuous, meaning that all possible values are represented (as opposed to just 0 and 1 with nothing in between).
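The 68/95/99.7 figures quoted above follow from the normal CDF: the probability of landing within k standard deviations of the mean is erf(k / √2), which can be checked directly.

```python
import math

# Probability that a normal variable falls within k standard deviations
# of its mean: P(|X - mu| <= k*sigma) = erf(k / sqrt(2)).
for k in (1, 2, 3):
    within = math.erf(k / math.sqrt(2))
    print(f"within {k} sd: {within:.2%}")  # ≈ 68.27%, 95.45%, 99.73%
```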

How Probability Distributions Work

Perhaps the most common probability distribution is the normal distribution, or “bell curve,” although several other distributions are commonly used. Typically, the data-generating process of some phenomenon dictates its probability distribution. For a continuous variable, the function that describes these probabilities is called the probability density function.

Probability distributions can also be used to create cumulative distribution functions (CDFs), which add up the probabilities of occurrences cumulatively; a CDF always starts at zero and ends at 100%.
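Building a CDF from a discrete distribution is just a running sum of probabilities. In the sketch below, the distribution itself (hypothetical daily returns and their probabilities) is an illustrative assumption; the point is that the running total ends at 1 (100%).

```python
# Hypothetical discrete distribution of daily returns: (value, probability).
pmf = [(-0.10, 0.05), (0.00, 0.25), (0.05, 0.40), (0.10, 0.25), (0.20, 0.05)]

# The CDF is the running sum of the probabilities.
running, cdf = 0.0, []
for value, p in pmf:
    running += p
    cdf.append((value, running))

print(round(cdf[-1][1], 10))  # the CDF ends at 1 (i.e. 100%)
```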

Academics, financial analysts and fund managers alike may determine a particular stock’s probability distribution to evaluate the possible expected returns that the stock may yield in the future. The stock’s history of returns, which can be measured from any time interval, will likely be composed of only a fraction of the stock’s returns, which will subject the analysis to sampling error. By increasing the sample size, this error can be dramatically reduced.
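The claim that a larger sample shrinks sampling error can be demonstrated with a quick simulation. The return model below (normal draws with mean 0.05 and standard deviation 0.2) is purely an illustrative assumption, not a statement about any real stock.

```python
import random
import statistics

random.seed(0)

def mean_spread(sample_size, trials=500):
    """Spread (std dev) of the sample mean across many simulated samples.

    Returns are simulated as normal draws purely for illustration.
    """
    means = [
        statistics.fmean(random.gauss(0.05, 0.2) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

# Larger samples give a tighter estimate of the true mean.
print(mean_spread(10) > mean_spread(100) > mean_spread(1000))  # True
```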


  • A probability distribution depicts the expected outcomes of possible values for a given data generating process.
  • Probability distributions come in many shapes with different characteristics, as defined by the mean, standard deviation, skewness, and kurtosis.
  • Investors use probability distributions to anticipate returns on assets such as stocks over time and to hedge their risk.
What Is a Probability Distribution?

A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range will be bounded between the minimum and maximum possible values, but precisely where the possible value is likely to be plotted on the probability distribution depends on a number of factors. These factors include the distribution’s mean (average), standard deviation, skewness, and kurtosis.