A probability distribution can be described in various forms, such as by a probability mass function or a cumulative distribution function. One of the most general descriptions, which applies to both continuous and discrete variables, is by means of a probability function {\displaystyle P\colon {\mathcal {A}}\rightarrow \mathbb {R} } whose **input space** {\displaystyle {\mathcal {A}}} is a collection of events (sets of possible outcomes) related to the sample space, and whose output is a real number, the **probability**.^{[7]}

The probability function *P* can take as arguments subsets of the sample space itself, as in the coin toss example, where the function *P* was defined so that *P*(heads) = 0.5 and *P*(tails) = 0.5. However, because of the widespread use of random variables, which map the outcomes of the sample space to numbers (e.g., in {\displaystyle \mathbb {R} } or {\displaystyle \mathbb {N} }), it is more common to study probability distributions whose arguments are subsets of these particular kinds of sets (number sets),^{[8]} and all probability distributions discussed in this article are of this type. It is common to denote by {\displaystyle P(X\in E)} the probability that a random variable *X* takes a value in a set *E*.^{[4]}^{[9]}
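Taking the coin toss as an example, here is a minimal sketch of both views. It assumes a fair coin and a hypothetical encoding heads → 1, tails → 0 (the encoding and the function names are illustrative, not taken from the text). *P* is first defined on subsets of the sample space; {\displaystyle P(X\in E)} is then computed by collecting every outcome whose numeric value lies in *E*:

```python
from fractions import Fraction

# Sample space of a single toss of a fair coin (fairness is an assumption).
OUTCOME_PROB = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

def P(event):
    """Probability of a subset (event) of the sample space."""
    return sum((OUTCOME_PROB[w] for w in event), Fraction(0))

def X(outcome):
    """Hypothetical random variable: heads -> 1, tails -> 0."""
    return 1 if outcome == "heads" else 0

def P_X_in(E):
    """P(X in E): total probability of the outcomes w with X(w) in E."""
    preimage = {w for w in OUTCOME_PROB if X(w) in E}
    return P(preimage)

print(P({"heads"}))    # 1/2
print(P_X_in({1}))     # 1/2
print(P_X_in({0, 1}))  # 1
print(P_X_in({2}))     # 0
```

Exact `Fraction` arithmetic is used so that probabilities such as 1/2 compare without floating-point rounding.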

The probability function characterizes a probability distribution only if it satisfies the Kolmogorov axioms, that is:

- {\displaystyle P(X\in E)\geq 0\;\forall E\in {\mathcal {A}}}, so the probability is non-negative
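- {\displaystyle P(X\in \Omega )=1}, where {\displaystyle \Omega } is the set of all possible values of {\displaystyle X}, so the total probability is {\displaystyle 1}; together with the other two axioms, this implies that no probability exceeds {\displaystyle 1}
- {\displaystyle P(X\in \bigsqcup _{i}E_{i})=\sum _{i}P(X\in E_{i})} for any countable disjoint family of sets {\displaystyle \{E_{i}\}}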
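On a finite set of outcomes these conditions can be checked by brute force over every event. The sketch below assumes a six-sided die and uses exact `Fraction` arithmetic; the function names are illustrative only. Besides non-negativity, the bound by 1 and additivity, it checks that the whole set of outcomes receives probability 1. The constant zero function shows why that last check matters: it is non-negative, never exceeds 1 and is additive, yet it is not a probability distribution.

```python
from fractions import Fraction
from itertools import chain, combinations

def all_events(outcomes):
    """Every subset of a finite set of outcomes (its power set), as frozensets."""
    items = list(outcomes)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def satisfies_axioms(outcomes, P):
    """Brute-force check of the axioms for a set function P on a finite space."""
    events = all_events(outcomes)
    omega = frozenset(outcomes)
    nonnegative = all(P(E) >= 0 for E in events)
    at_most_one = all(P(E) <= 1 for E in events)
    normalized = P(omega) == 1
    # On a finite space, countable additivity reduces to additivity over
    # pairs of disjoint events (longer finite unions follow by induction).
    additive = all(P(A | B) == P(A) + P(B)
                   for A in events for B in events if not (A & B))
    return nonnegative and at_most_one and normalized and additive

def fair(E):     # fair six-sided die: each face has probability 1/6
    return Fraction(len(E), 6)

def zero(E):     # non-negative, never above 1, additive, but total mass is 0
    return Fraction(0)

def squared(E):  # normalized, but not additive
    return Fraction(len(E), 6) ** 2

die = range(1, 7)
print(satisfies_axioms(die, fair))     # True
print(satisfies_axioms(die, zero))     # False
print(satisfies_axioms(die, squared))  # False
```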

The concept of a probability function is made more rigorous by defining it as one component of a probability space {\displaystyle (\Omega ,{\mathcal {A}},P)}, where {\displaystyle \Omega } is the set of possible outcomes, {\displaystyle {\mathcal {A}}} is the set of all subsets {\displaystyle E\subseteq \Omega } whose probability can be measured (a σ-algebra of events), and {\displaystyle P} is the probability function, or **probability measure**, that assigns a probability to each of these measurable subsets {\displaystyle E\in {\mathcal {A}}}.^{[10]}
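The family of measurable events need not contain every subset of the outcomes; it only has to contain the whole set and be closed under complements and countable unions (a σ-algebra). As a hedged illustration, assuming a fair die and a hypothetical observer who only learns whether the roll is even, the sketch below builds such a coarse family, checks the closure properties (on a finite family, pairwise unions suffice), and defines the probability measure only on those events:

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))  # outcomes of one die roll
EVEN, ODD = frozenset({2, 4, 6}), frozenset({1, 3, 5})

# Hypothetical coarse family of events: only "even" or "odd" can be observed.
A = {frozenset(), EVEN, ODD, OMEGA}

def is_sigma_algebra(omega, family):
    """Finite check: contains omega, closed under complement and pairwise union."""
    return (omega in family
            and all(omega - E in family for E in family)
            and all(E | F in family for E, F in combinations(family, 2)))

# The probability measure is defined only on the measurable events in A
# (a fair die is assumed, so each event gets |E| / 6).
P = {E: Fraction(len(E), 6) for E in A}

print(is_sigma_algebra(OMEGA, A))  # True
print(P[EVEN])                     # 1/2
print(frozenset({6}) in A)         # False: "rolled a six" is not measurable here
```

Taking the full power set instead of `A` would make every subset measurable; the coarse family shows that this is a modelling choice rather than a requirement.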

Probability distributions are generally divided into two classes. A **discrete probability distribution** applies to scenarios where the set of possible outcomes is discrete (e.g. a coin toss, a roll of a die), and the probabilities are given by a list of the probabilities of the individual outcomes; this list, viewed as a function of the outcome, is called the probability mass function. On the other hand, **continuous probability distributions** apply to scenarios where the set of possible outcomes can take on values in a continuous range (e.g. the real numbers), such as the temperature on a given day. For real-valued outcomes, a continuous probability distribution is completely described by its cumulative distribution function. Typically, in the continuous case, probabilities are described by a probability density function, and the probability of an event is obtained by integrating the density over that event.^{[4]}^{[5]}^{[9]} The normal distribution is a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may require the use of more general probability measures.
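To make the contrast concrete, here is a minimal sketch using only the Python standard library, with a fair die for the discrete case and the standard normal distribution for the continuous case. The helper names are illustrative, and the integral of the density is approximated with Simpson's rule rather than computed exactly; it is compared against a difference of the cumulative distribution function, written with `math.erf`:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Discrete case: the probability mass function lists each outcome's probability,
# and the probability of an event is a sum over its outcomes.
die_pmf = {k: 1 / 6 for k in range(1, 7)}
p_at_most_two = sum(p for k, p in die_pmf.items() if k <= 2)

# Continuous case: the probability of an interval is the integral of the density,
# which agrees with the difference of CDF values at the endpoints.
p_within_one_sd = simpson(normal_pdf, -1.0, 1.0)
p_from_cdf = normal_cdf(1.0) - normal_cdf(-1.0)

print(round(p_at_most_two, 4))    # 0.3333
print(round(p_within_one_sd, 4))  # 0.6827
print(round(p_from_cdf, 4))       # 0.6827
```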

A probability distribution whose sample space is one-dimensional (for example, real numbers, a list of labels, ordered labels, or binary outcomes) is called univariate, while a distribution whose sample space is a vector space of dimension 2 or more is called multivariate. A univariate distribution gives the probabilities of a single random variable taking on various values; a multivariate distribution (a joint probability distribution) gives the probabilities of a random vector (a list of two or more random variables) taking on various combinations of values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution. A commonly encountered multivariate distribution is the multivariate normal distribution.
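A small discrete sketch of the distinction, assuming fair coins and dice (the variable names are illustrative): the binomial distribution is a univariate distribution for a single count, while two dice rolled together give a random vector {\displaystyle (X_{1},X_{2})} with a joint probability mass function. Summing the joint probabilities over one coordinate recovers the univariate (marginal) distribution of the other:

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_pmf(k, n, p):
    """Univariate: probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Multivariate: joint pmf of (X1, X2) for two independent fair dice.
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

# An event that involves both coordinates of the random vector.
p_sum_seven = sum(p for (a, b), p in joint.items() if a + b == 7)

# Marginal distribution of X1: sum the joint pmf over all values of X2.
marginal_x1 = {}
for (a, _b), p in joint.items():
    marginal_x1[a] = marginal_x1.get(a, Fraction(0)) + p

print(binomial_pmf(2, 3, Fraction(1, 2)))  # 3/8
print(p_sum_seven)                         # 1/6
print(marginal_x1[4])                      # 1/6
```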

Besides the probability function, the cumulative distribution function, the probability mass function and the probability density function, the characteristic function and the moment generating function (when it is finite in an open interval around zero) also serve to identify a probability distribution, as they uniquely determine an underlying cumulative distribution function.
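As an illustration (not a proof of uniqueness), the sketch below computes the characteristic function {\displaystyle \operatorname {E} [e^{itX}]} and the moment generating function {\displaystyle \operatorname {E} [e^{tX}]} of a binomial distribution directly from its probability mass function. It checks the characteristic function against the known closed form {\displaystyle (1-p+pe^{it})^{n}} and recovers the mean {\displaystyle np} as the slope of the moment generating function at {\displaystyle t=0}. The parameters {\displaystyle n=10}, {\displaystyle p=0.3}, the point {\displaystyle t=0.7} and the finite-difference step are arbitrary choices:

```python
import cmath
import math

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def char_fn(t, n, p):
    """Characteristic function E[exp(i t X)], summed directly over the pmf."""
    return sum(binomial_pmf(k, n, p) * cmath.exp(1j * t * k) for k in range(n + 1))

def mgf(t, n, p):
    """Moment generating function E[exp(t X)], summed directly over the pmf."""
    return sum(binomial_pmf(k, n, p) * math.exp(t * k) for k in range(n + 1))

n, p = 10, 0.3
t = 0.7
closed_form_cf = (1 - p + p * cmath.exp(1j * t)) ** n
print(abs(char_fn(t, n, p) - closed_form_cf) < 1e-12)  # True

# The derivative of the MGF at t = 0 equals the mean, here n * p = 3.
# It is estimated with a central finite difference.
h = 1e-5
mean_estimate = (mgf(h, n, p) - mgf(-h, n, p)) / (2 * h)
print(round(mean_estimate, 6))  # 3.0
```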