3. Sampling Distribution

Factors that influence sampling distribution

The variability of a sampling distribution can be measured either by its variance or by its standard deviation, depending on the context and the inferences you are trying to draw. The standard deviation of the sampling distribution of the mean is also called the “standard error of the mean.” Both measure how spread out the sample statistics are around their mean.

There are three primary factors that influence the variability of a sampling distribution. They are:

The number observed in a population: This variable is represented by “N.” It is the number of observations in the entire group being studied.

The number observed in the sample: This variable is represented by “n.” It is the number of observations in a random sample drawn from that larger group.

The method of choosing the sample: How the samples are chosen (for example, with or without replacement) can account for variability in some cases.

Importance of Using a Sampling Distribution

Since populations are typically large, it is often impractical to measure every member. Instead, you randomly select a subset of the population, and the sampling distribution tells you how much a statistic computed from such a subset can be expected to vary from one sample to the next. This lets you quantify the uncertainty in your research or statistical data; it does not eliminate the variability.

It also makes the data easier to manage and builds a foundation for statistical inference, that is, for drawing conclusions about the whole population from a sample. Understanding statistical inference is important because it helps you understand how the values of a statistic are spread out and how likely the various outcomes are.

Practical Example

Suppose you want to find the average height of children at the age of 10 from each continent. You take random samples of 100 children from each continent, and you compute the mean for each sample group.

For example, in South America, you randomly select data about the heights of 10-year-old children, and you calculate the mean for 100 of the children. You also randomly select data from North America and calculate the mean height for one hundred 10-year-old children.

As you continue to find the average heights for each sample group of children from each continent, you can calculate the mean of the sampling distribution by finding the mean of all the average heights of each sample group. A sampling distribution can be built not only for the mean but also for other statistics, such as the standard deviation and variance.
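The procedure in this example can be sketched in a few lines of Python. This is only an illustration: the continent list, the mean heights, and the standard deviations below are invented placeholder values, not real measurements, and the names `continents` and `sample_mean_height` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters per continent: (mean height in cm, SD).
# These values are made up for illustration only.
continents = {
    "Africa": (137.0, 6.5),
    "Asia": (138.0, 6.5),
    "Europe": (140.0, 6.5),
    "North America": (139.5, 6.5),
    "Oceania": (139.0, 6.5),
    "South America": (138.5, 6.5),
}

def sample_mean_height(mu, sigma, n=100):
    """Simulate n heights and return their mean (one sample mean)."""
    heights = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(heights)

# One sample mean per continent, each based on 100 simulated children.
sample_means = {
    name: sample_mean_height(mu, sigma) for name, (mu, sigma) in continents.items()
}

# Mean and spread of the sample means.
mean_of_means = statistics.mean(list(sample_means.values()))
spread_of_means = statistics.stdev(list(sample_means.values()))

for name, value in sample_means.items():
    print(f"{name:<14} sample mean = {value:.1f} cm")
print(f"mean of sample means = {mean_of_means:.1f} cm")
print(f"standard deviation of sample means = {spread_of_means:.2f} cm")
```

With only six sample means the picture is coarse; drawing many more samples per continent would give a smoother view of the distribution.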

Types of Sampling Distribution

1. Sampling distribution of mean

As shown in the example above, you can calculate the mean of every sample group chosen from the population and plot out all the data points. When the samples are reasonably large, the graph will approximate a normal distribution, and its center will be the mean of the sampling distribution, which equals the mean of the entire population.

2. Sampling distribution of proportion

The sampling distribution of proportion gives you information about proportions in a population. You would select samples from the population and compute the sample proportion for each. The mean of all the sample proportions that you calculate from the sample groups is an estimate of the proportion of the entire population.
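As a rough illustration, the simulation below draws repeated samples of yes/no observations and compares the mean of the sample proportions with the population proportion. The population proportion of 0.30, the sample size, and the number of samples are assumptions chosen for the sketch, and the comparison uses the standard formula sqrt(p(1 - p)/n) for the standard error of a proportion.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_PROPORTION = 0.30  # assumed population proportion (hypothetical)
SAMPLE_SIZE = 200       # observations in each sample
NUM_SAMPLES = 2000      # number of repeated samples

def sample_proportion(p, n):
    """Return the fraction of 'successes' among n simulated yes/no observations."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n

# One sample proportion per repeated sample.
proportions = [
    sample_proportion(TRUE_PROPORTION, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)
]

mean_of_proportions = statistics.mean(proportions)
empirical_se = statistics.stdev(proportions)
theoretical_se = (TRUE_PROPORTION * (1 - TRUE_PROPORTION) / SAMPLE_SIZE) ** 0.5

print(f"mean of sample proportions: {mean_of_proportions:.4f}")
print(f"empirical standard error:   {empirical_se:.4f}")
print(f"sqrt(p(1-p)/n):             {theoretical_se:.4f}")
```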

3. T-distribution

The t-distribution is used when the sample size is small and the population standard deviation is unknown. It is used to estimate the population mean and to work with confidence intervals, tests of statistical differences, and linear regression.
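A minimal sketch of a t-based confidence interval for a small sample might look like this. The ten measurements are made-up values, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is hard-coded from a standard t-table to keep the example free of third-party dependencies; a statistics library would normally supply it.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (made-up values for illustration).
sample = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8, 5.4, 5.0]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)       # sample SD (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)  # estimated standard error of the mean

# Two-sided 95% critical value for n - 1 = 9 degrees of freedom,
# copied from a t-table (assumption: the sample size stays at 10).
T_CRITICAL_95_DF9 = 2.262

margin = T_CRITICAL_95_DF9 * standard_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"sample mean = {sample_mean:.2f}")
print(f"95% confidence interval = ({ci_low:.2f}, {ci_high:.2f})")
```

If the sample size changes, the hard-coded critical value has to change with it, because the t-distribution depends on the degrees of freedom.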

How Does it Work?

1) Select a random sample of a specific size from a given population.

2) Calculate a statistic for the sample, such as the mean, median, or standard deviation.

3) Repeat the two steps above many times, then develop a frequency distribution of the sample statistics that you calculated.

4) Plot the frequency distribution of the sample statistics that you developed from the step above. The resulting graph will be the sampling distribution.
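The steps above can be sketched as follows. The population here is simulated (10,000 values with an assumed mean of 50 and standard deviation of 10), the sample size and number of samples are arbitrary choices, and a simple text histogram stands in for a plot.

```python
import random
import statistics
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated values (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]

SAMPLE_SIZE = 30
NUM_SAMPLES = 1_000

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = random.sample(population, SAMPLE_SIZE)  # step 1: draw a random sample
    sample_means.append(statistics.mean(sample))     # step 2: compute the statistic

# Step 3: frequency distribution of the sample means,
# binned to the nearest whole number.
frequencies = Counter(round(m) for m in sample_means)

# Step 4: plot the frequency distribution (one '#' per 5 samples).
for value in sorted(frequencies):
    print(f"{value:>3} | {'#' * (frequencies[value] // 5)}")
```

Swapping `statistics.mean` for `statistics.median` or `statistics.stdev` in step 2 gives the sampling distribution of that statistic instead.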

Special Considerations

A population, or one sample set of numbers drawn from it, will often have a roughly normal distribution, although it does not have to. A sampling distribution built from only a few sets of observations, however, will not necessarily have a bell-curved shape.

In the newborn-weight example, the weights of babies in North America and in South America have a roughly normal distribution because some babies will be underweight (below the mean) or overweight (above the mean), with most babies falling in between (around the mean). If the average weight of newborns in North America is seven pounds, the sample mean weight in each of the 12 sets of sample observations recorded for North America will be close to seven pounds as well.

However, if you graph each of the averages calculated in each of the 12 sample groups, the resulting shape may look like a uniform distribution, and with so few points it is difficult to predict with certainty what the actual shape will turn out to be. The more samples the researcher draws from the population of over a million weight figures, and the larger each sample is, the more the graph will start forming a normal distribution.
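One way to see this tendency toward a bell shape is to measure the skewness of the sample means, which is near zero for a symmetric distribution. The sketch below deliberately uses a right-skewed (exponential) simulated population rather than the newborn weights, so the effect is visible. The `skewness` and `sample_means` helpers are hypothetical names written for this illustration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Skewness of a list of numbers; near 0 for a symmetric distribution."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(sample_size, num_samples=5_000):
    """Means of repeated samples drawn from a right-skewed exponential population."""
    means = []
    for _ in range(num_samples):
        draws = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(draws) / sample_size)
    return means

# Skewness of the sampling distribution for a small and a larger sample size.
skew_small = skewness(sample_means(2))
skew_large = skewness(sample_means(50))

print(f"skewness of sample means, n = 2:  {skew_small:.2f}")
print(f"skewness of sample means, n = 50: {skew_large:.2f}")
```

The skewness shrinks toward zero as the sample size grows, which is the behavior the central limit theorem describes.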

Understanding Sampling Distribution

A lot of data drawn and used by academicians, statisticians, researchers, marketers, analysts, etc. are actually samples, not populations. A sample is a subset of a population. For example, a medical researcher who wants to compare the average weight of all babies born in North America from 1995 to 2005 to those born in South America within the same time period cannot, within a reasonable amount of time, draw the data for the entire population of over a million childbirths that occurred over the ten-year time frame. He will instead only use the weights of, say, 100 babies in each continent to make a conclusion. The weights of the 200 babies used are the sample, and the average weight calculated is the sample mean.

Now suppose that instead of taking just one sample of 100 newborn weights from each continent, the medical researcher takes repeated random samples from the general population, and computes the sample mean for each sample group. So, for North America, he pulls up newborn weights recorded in the US, Canada and Mexico as follows: four samples of 100 from select hospitals in the US, five samples of 70 from Canada and three samples of 150 from Mexico, for a total of 1,200 weights of newborn babies grouped in 12 sets. He also collects a sample of 100 birth weights from each of the 12 countries in South America.

The distribution of the average weights computed for the sample sets is the sampling distribution of the mean. Not just the mean can be calculated from a sample. Other statistics, such as the standard deviation, variance, proportion, and range can be calculated from sample data. The standard deviation and variance measure the variability of the sampling distribution.

The number of observations in a population, the number of observations in a sample and the procedure used to draw the sample sets determine the variability of a sampling distribution. The standard deviation of a sampling distribution is called the standard error. While the mean of a sampling distribution is equal to the mean of the population, the standard error depends on the standard deviation of the population, the size of the population and the size of the sample.
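For simple random sampling, this dependence is commonly written as the formula below. The symbols are introduced here for illustration: σ is the population standard deviation, N the population size, n the sample size, and x̄ the sample mean.

```latex
\mathrm{SE}_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}
\quad \text{(sampling without replacement)},
\qquad
\mathrm{SE}_{\bar{x}} \approx \frac{\sigma}{\sqrt{n}}
\quad \text{when } N \gg n .
```

The second factor is the finite population correction. It is close to 1 when the sample is a small fraction of the population, which is why the population size often drops out in practice.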

Knowing how spread apart the means of the sample sets are from each other and from the population mean gives an indication of how close a given sample mean is likely to be to the population mean. The standard error of the sampling distribution decreases as the sample size increases.
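A short simulation can make the shrinking standard error concrete. The population mean of seven pounds comes from the newborn example, but the population standard deviation of 1.2 pounds, the normal shape of the simulated population, and the chosen sample sizes are assumptions made only for this sketch.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

POPULATION_MEAN = 7.0  # pounds, from the newborn-weight example
POPULATION_SD = 1.2    # pounds, assumed for illustration

def empirical_standard_error(sample_size, num_samples=2_000):
    """Standard deviation of the means of repeated simulated samples."""
    means = []
    for _ in range(num_samples):
        draws = [
            random.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(sample_size)
        ]
        means.append(sum(draws) / sample_size)
    return statistics.stdev(means)

# Empirical standard error for three sample sizes.
results = {n: empirical_standard_error(n) for n in (25, 100, 400)}

for n, se in results.items():
    expected = POPULATION_SD / math.sqrt(n)
    print(f"n = {n:>3}  empirical SE = {se:.3f}  sigma/sqrt(n) = {expected:.3f}")
```

Each fourfold increase in sample size roughly halves the standard error, in line with the square root in the denominator.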

What Is a Sampling Distribution?

A sampling distribution is a probability distribution of a statistic obtained from a large number of samples drawn from a specific population. The sampling distribution of a statistic is the distribution of frequencies of the range of different outcomes that could possibly occur for that statistic.

In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements. A population can thus be said to be an aggregate observation of subjects grouped together by a common feature.

  • A sampling distribution is the probability distribution of a statistic that is arrived at through repeated sampling from a larger population.
  • It describes the range of possible outcomes of a statistic, such as the mean or mode of some variable, as it truly exists in a population.
  • The majority of data analyzed by researchers are actually drawn from samples, and not populations.