Applied Statistics

Monday, July 14, 2014

Sampling Distributions of Sample Means

Sample Mean
Let the random variables \(X_1, X_2, \ldots, X_n\) denote a random sample from a population. The sample mean of these random variables is defined as

\[\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\]
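Consider the sampling distribution of the random variable \(\bar{X}\). At this point we cannot determine the shape of the sampling distribution, but we can determine its mean and variance. We know that the expectation of a linear combination of random variables is the linear combination of the expectations. Since each \(X_i\) has expectation \(\mu\), the population mean,

\[E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu\]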
Thus, the mean of the sampling distribution of the sample means is the population mean. If samples of n random and independent observations are repeatedly and independently drawn from a population, then as the number of samples becomes very large, the mean of the sample means approaches the true population mean.
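For independent observations, the variance of the sample mean, denoted by \(\sigma^2_{\bar{X}}\), is given by

\[\sigma^2_{\bar{X}} = \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}\]

where \(\sigma^2\) is the population variance. The corresponding standard deviation, called the standard error of \(\bar{X}\), is

\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\]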
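
As a quick illustration of these two results, the following Python sketch draws many samples from a normal population. It then compares the mean and standard deviation of the resulting sample means with \(\mu\) and \(\sigma/\sqrt{n}\). The population values (\(\mu = 50\), \(\sigma = 10\)), the sample size n = 25, and the helper name simulate_sample_means are hypothetical choices made only for this illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=1):
    """Return `reps` sample means, each computed from n independent draws
    from a normal population with mean mu and standard deviation sigma.
    (Hypothetical helper used only for this illustration.)"""
    rng = random.Random(seed)
    return [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(reps)]

means = simulate_sample_means(mu=50.0, sigma=10.0, n=25, reps=20000)

# The mean of the sample means should be close to mu = 50, and their
# standard deviation close to the standard error sigma / sqrt(n) = 10 / 5 = 2.
print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"sd of sample means:   {statistics.stdev(means):.3f}")
```

Increasing reps tightens both estimates around their theoretical values.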

Standard Normal Distribution for the Sample Means

Whenever the sampling distribution of the sample means is a normal distribution, we can compute a standardized normal random variable, Z, that has mean 0 and variance 1:

\[Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\]
Example: Suppose that the annual percentage salary increases for the chief executive officers of all midsize corporations are normally distributed with mean 12.2% and standard deviation 3.6%. A random sample of nine observations is obtained from this population and the sample mean computed. What is the probability that the sample mean will be less than 10%?
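
The example is not worked out above. Under its stated assumptions (normal population, \(\mu = 12.2\), \(\sigma = 3.6\), n = 9), the computation can be sketched in Python. The standard library's statistics.NormalDist stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.2, 3.6, 9        # population mean and sd (in percent), sample size
se = sigma / sqrt(n)               # standard error of the sample mean: 3.6 / 3 = 1.2
z = (10.0 - mu) / se               # standardized value of x-bar = 10
prob = NormalDist().cdf(z)         # P(X-bar < 10) = P(Z < z)

print(f"se = {se:.2f}, z = {z:.2f}, P(X-bar < 10) = {prob:.4f}")
```

This gives z of about -1.83 and a probability of about 0.033. Reading a normal table at the rounded value z = -1.83 gives 0.0336.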

Example: A spark plug manufacturer claims that the lives of its plugs are normally distributed with mean 36,000 miles and standard deviation 4,000 miles. A random sample of 16 plugs had an average life of 34,500 miles. If the manufacturer's claim is correct, what is the probability of finding a sample mean of 34,500 miles or less?
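
This example is also left unsolved. A minimal Python sketch of the same standardization, assuming the manufacturer's claimed mean and standard deviation, is:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 36_000, 4_000, 16   # claimed mean and sd of plug life (miles), sample size
xbar = 34_500                      # observed sample mean
se = sigma / sqrt(n)               # standard error: 4,000 / 4 = 1,000 miles
z = (xbar - mu) / se               # (34,500 - 36,000) / 1,000 = -1.5
prob = NormalDist().cdf(z)         # P(X-bar <= 34,500) if the claim is true

print(f"z = {z:.2f}, P = {prob:.4f}")
```

The result is z = -1.50, with a probability of about 0.0668 of observing a sample mean this low if the claim is correct.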

Statement of the Central Limit Theorem

Statement

Let \(X_1, X_2, \ldots, X_n\) be a set of n independent random variables having identical distributions with mean \(\mu\) and variance \(\sigma^2\), and with X as the sum and \(\bar{X}\) as the mean of these random variables. As n becomes large, the central limit theorem states that the distribution of

\[Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{X - n\mu}{\sqrt{n\sigma^2}}\]

approaches the standard normal distribution.
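
A small simulation can make the theorem concrete. The sketch below assumes a strongly skewed population: an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration. It draws repeated samples of size n = 36, standardizes each sample mean, and checks how often the result falls within ±1.96. For a standard normal variable, that happens about 95% of the time.

```python
import random
import statistics

def standardized_means(n, reps, seed=7):
    """Standardize the means of `reps` samples of size n drawn from an
    exponential population with mean 1 and standard deviation 1."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        zs.append((xbar - mu) / (sigma / n ** 0.5))
    return zs

zs = standardized_means(n=36, reps=20000)
inside = sum(abs(z) < 1.96 for z in zs) / len(zs)

print(f"mean of Z: {statistics.fmean(zs):.3f}")
print(f"sd of Z:   {statistics.stdev(zs):.3f}")
print(f"share with |Z| < 1.96: {inside:.3f}")   # close to 0.95 for a standard normal
```

Even though individual observations are far from normal, the standardized means behave approximately like a standard normal variable.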

Example: Antelope Coffee Inc. is considering the possibility of opening a gourmet coffee shop in a city. Previous research has indicated that its shops will be successful in cities of this size if the per capita annual income is above $60,000. It is also known that the standard deviation of income is $5,000. A random sample of 36 people was obtained, and the mean income was $62,300. Does this sample provide evidence to conclude that a shop should be opened?

Solution: The distribution of income is known to be skewed, but the central limit theorem enables us to conclude that the sample mean is approximately normally distributed. To answer the question, we need to determine the probability of obtaining a sample mean at least as high as \(\bar{x} = 62{,}300\) if the population mean is \(\mu = 60{,}000\).
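First, compute the standardized normal Z-statistic:

\[z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{62{,}300 - 60{,}000}{5{,}000/\sqrt{36}} = \frac{2{,}300}{833.33} = 2.76\]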
From the standard normal table we find that the probability of obtaining a Z value of 2.76 or larger is 0.0029. Because this probability is very small, we can conclude that the population mean income is likely not $60,000 but a larger value. This result provides strong evidence that the population mean income is higher than $60,000 and that the coffee shop is likely to be a success.
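
The same calculation can be sketched in Python, with statistics.NormalDist replacing the table lookup. The numbers are exactly those of the example.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 60_000, 5_000, 36   # hypothesized mean income, known sd, sample size
xbar = 62_300                       # observed sample mean income
z = (xbar - mu0) / (sigma / sqrt(n))
upper_tail = 1 - NormalDist().cdf(z)   # P(X-bar >= 62,300) if mu = 60,000

print(f"z = {z:.2f}, P(Z >= z) = {upper_tail:.4f}")
```

This reproduces z = 2.76 and an upper-tail probability of about 0.0029.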

Sampling Distributions of Sample Proportions

Let X be the number of successes in a binomial sample of n observations with parameter P. The parameter P is the proportion of the population members that have a characteristic of interest. We define the sample proportion as

\[\hat{p} = \frac{X}{n}\]
X is the sum of a set of n independent Bernoulli random variables, each with probability of success P. As a result, \(\hat{p}\) is the mean of a set of independent random variables. The central limit theorem can be used to argue that, for large n, the probability distribution of \(\hat{p}\) can be modeled as approximately normal.

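The mean and variance of the sampling distribution of the sample proportion \(\hat{p}\) can be obtained from the mean and variance of the number of successes, X:

\[E(X) = nP \qquad \operatorname{Var}(X) = nP(1 - P)\]

And, thus,

\[E(\hat{p}) = \frac{1}{n}E(X) = P \qquad \operatorname{Var}(\hat{p}) = \operatorname{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\operatorname{Var}(X) = \frac{P(1 - P)}{n}\]

We see that the mean of the distribution of \(\hat{p}\) is the population proportion, P. The variance of \(\hat{p}\) is the variance of the population distribution of the Bernoulli random variables, P(1 - P), divided by n.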

The standard deviation of \(\hat{p}\), which is the square root of the variance, is called its standard error:

\[\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}\]

Sampling Distribution of the Sample Proportion

Let \(\hat{p}\) be the sample proportion of successes in a random sample from a population with proportion of success P. Then
1. The sampling distribution of \(\hat{p}\) has mean P:
\[E(\hat{p}) = P\]
2. The sampling distribution of \(\hat{p}\) has standard deviation
\[\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}\]
3. If the sample size is large, the random variable
\[Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}}\]
is approximately distributed as a standard normal. This approximation is good if
\(nP(1 - P) > 9\)

Example: A random sample of 250 homes was taken from a large population of older homes to estimate the proportion of homes with unsafe wiring. If, in fact, 30% of the homes have unsafe wiring, what is the probability that the sample proportion will be between 25% and 35% of homes with unsafe wiring?
Solution: For this problem we have P = 0.30 and n = 250.

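We can compute the standard deviation of the sample proportion, \(\sigma_{\hat{p}}\), as

\[\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{0.30(1 - 0.30)}{250}} = 0.029\]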
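The required probability is

\[P(0.25 < \hat{p} < 0.35) = P\left(\frac{0.25 - 0.30}{0.029} < Z < \frac{0.35 - 0.30}{0.029}\right) = P(-1.72 < Z < 1.72) = 0.9146\]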
Thus, we see that the probability that the sample proportion is within the interval 0.25 to 0.35, given P = 0.30, is 0.9146. This interval is called a 91.46% acceptance interval.
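
A short Python sketch of this calculation follows. It uses statistics.NormalDist for the normal probabilities and shows both the unrounded computation and the table-style one with z rounded to 1.72.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.30, 250
se = sqrt(P * (1 - P) / n)            # standard error of p-hat, about 0.029
std_normal = NormalDist()

# Using the unrounded standard error:
z_hi = (0.35 - P) / se
z_lo = (0.25 - P) / se
prob = std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

# Table-style calculation with z rounded to two decimals (1.72), as in the text:
prob_table = std_normal.cdf(1.72) - std_normal.cdf(-1.72)

print(f"se = {se:.4f}, z = {z_hi:.3f}")
print(f"P(0.25 < p-hat < 0.35) = {prob:.4f} (unrounded z), {prob_table:.4f} (z = 1.72)")
```

The figure 0.9146 comes from rounding z to 1.72. Without that rounding, the probability is about 0.9155, which makes no practical difference here.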

Example: It has been estimated that 43% of business graduates believe that a course in business ethics is very important for imparting ethical values to students. Find the probability that more than one-half of a random sample of 80 business graduates have this belief.

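Solution: Here \(\sigma_{\hat{p}} = \sqrt{0.43(0.57)/80} = 0.0554\), so \(P(\hat{p} > 0.50) = P\left(Z > \frac{0.50 - 0.43}{0.0554}\right) = P(Z > 1.26) = 0.1038\). Thus, the probability that more than one-half of the sample believe in the value of business ethics courses is approximately 0.1.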
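
The same normal approximation can be checked in Python. The only inputs are P = 0.43 and n = 80 from the example.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.43, 80
se = sqrt(P * (1 - P) / n)           # standard error of p-hat, about 0.0554
z = (0.50 - P) / se                  # about 1.26
prob = 1 - NormalDist().cdf(z)       # P(p-hat > 0.50)

print(f"se = {se:.4f}, z = {z:.2f}, P(p-hat > 0.5) = {prob:.4f}")
```

With the unrounded z the probability is about 0.103, consistent with the table value of roughly 0.1.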

Sampling Distributions of Sample Variances

Sample Variance

Let \(x_1, x_2, \ldots, x_n\) be a random sample of observations from a population. The quantity

\[s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2\]

is called the sample variance, and its square root, s, is called the sample standard deviation. Given a specific random sample, we could compute the sample variance, and the sample variance would be different for each random sample because of differences in sample observations.
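
The definition translates directly into code. In this Python sketch, sample_variance is a hypothetical helper name. The data reuse the six years-of-experience values from the sampling-distribution example later in this post, purely as convenient numbers.

```python
import statistics

def sample_variance(xs):
    """s^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 6, 6, 7, 8]
s2 = sample_variance(data)
print(s2, s2 ** 0.5)                     # sample variance and sample standard deviation
print(statistics.variance(data))         # the standard library uses the same n - 1 divisor
```

Python's statistics.variance also divides by n - 1. statistics.pvariance divides by n and is meant for a whole population.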

Chi-Square Distribution of Sample and Population Variances

Given a random sample of n observations from a normally distributed population whose population variance is \(\sigma^2\) and whose resulting sample variance is \(s^2\), it can be shown that

\[\chi^2_{n-1} = \frac{(n - 1)s^2}{\sigma^2}\]

has a distribution known as the \(\chi^2\) (chi-square) distribution with n - 1 degrees of freedom.

Sampling Distribution of the Sample Variance

Let \(s^2\) denote the sample variance for a random sample of n observations from a population with variance \(\sigma^2\). Then

1. The sampling distribution of \(s^2\) has mean \(\sigma^2\):
\[E(s^2) = \sigma^2\]
2. The variance of the sampling distribution of \(s^2\) depends on the underlying population distribution. If that distribution is normal, then
\[\operatorname{Var}(s^2) = \frac{2\sigma^4}{n - 1}\]
3. If the population distribution is normal, then
\[\chi^2_{n-1} = \frac{(n - 1)s^2}{\sigma^2}\]
is distributed as \(\chi^2\) with n - 1 degrees of freedom.

Thus, if we have a random sample from a population with a normal distribution, we can make inferences about the population variance \(\sigma^2\) by using \(s^2\) and the chi-square distribution.
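
A Monte Carlo sketch can check results 1 and 2 for a normal population. The choice \(\sigma = 3.6\) and n = 6 mirrors the quality-control example below, but any values would do. For these values the formulas give \(E(s^2) = 12.96\) and \(\operatorname{Var}(s^2) = 2(12.96)^2/5 \approx 67.18\). The population mean is set to 0 because it does not affect \(s^2\). The helper name is hypothetical.

```python
import random
import statistics

def simulate_sample_variances(sigma, n, reps, seed=3):
    """Sample variances (n - 1 divisor) of `reps` normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        out.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return out

sigma, n = 3.6, 6
s2_values = simulate_sample_variances(sigma, n, reps=40000)

print(f"mean of s^2: {statistics.fmean(s2_values):.2f}  (theory {sigma**2:.2f})")
print(f"var of s^2:  {statistics.variance(s2_values):.2f}  (theory {2 * sigma**4 / (n - 1):.2f})")
```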

Example: George Samson is responsible for quality assurance at Integrated Electronics. He has asked you to establish a quality monitoring process for the manufacture of control device A. The variability of the electrical resistance, measured in ohms, is critical for this device. Manufacturing standards specify a standard deviation of 3.6 ohms, and the population distribution of resistance measures is normal. The monitoring process requires that a random sample of n = 6 observations be obtained from the population of devices and the sample variance computed. Determine an upper limit for the sample variance such that the probability of exceeding this limit, given a population standard deviation of 3.6, is less than 0.05.

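Solution: For this problem we have n = 6 and \(\sigma^2 = 3.6^2 = 12.96\). Using the chi-square distribution, we can state that

\[P(s^2 > K) = P\left(\frac{(n - 1)s^2}{\sigma^2} > \frac{(n - 1)K}{\sigma^2}\right) = P(\chi^2_5 > 11.07) = 0.05\]

where K is the desired upper limit and 11.07 is the upper 0.05 critical value of the chi-square distribution with 5 degrees of freedom from row 5 of Table 7. The required upper limit for \(s^2\), labeled K, can be obtained by solving

\[\frac{(n - 1)K}{\sigma^2} = 11.07 \quad\Longrightarrow\quad K = \frac{11.07 \times 12.96}{5} = 28.69\]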
If the sample variance, \(s^2\), from a random sample of size n = 6 exceeds 28.69, there is strong evidence to suspect that the population variance exceeds 12.96 and that the manufacturing process should be halted and appropriate adjustments performed.
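
The limit K can be computed directly, and a quick simulation can confirm that a process running at \(\sigma = 3.6\) crosses it about 5% of the time. In this sketch, the critical value 11.07 is hard-coded from the chi-square table. If SciPy is available, scipy.stats.chi2.ppf(0.95, 5) returns the same value to more decimal places.

```python
import random

chi2_crit = 11.07          # upper 0.05 critical value, chi-square with 5 df (from the table)
sigma, n = 3.6, 6
K = chi2_crit * sigma ** 2 / (n - 1)
print(f"upper limit K = {K:.2f}")

# Monte Carlo check: with sigma = 3.6, s^2 should exceed K about 5% of the time.
rng = random.Random(11)
reps, exceed = 40000, 0
for _ in range(reps):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    exceed += s2 > K          # True counts as 1
print(f"share of samples with s^2 > K: {exceed / reps:.3f}")
```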

Sampling from a Population

We often use samples instead of the entire population because the cost and time of measuring every item in the population would be prohibitive. Also, in some cases measurement requires destruction of individual items. In general, we achieve greater accuracy by carefully obtaining a random sample of the population instead of spending the resources to measure every item. There are two important reasons for this result. First, it is often very difficult to obtain and measure every item in a population, and, even if possible, the cost would be very high for a large population.

Simple Random Sample

Suppose that we want to select a sample of n objects from a population of N objects. A simple random sample is selected such that every object has an equal probability of being selected and the objects are selected independently; that is, the selection of one object does not change the probability of selecting any other object.

Simple random sampling can be implemented in many ways. We can place the N population items (for example, colored balls) in a large barrel and mix them thoroughly. Then, from this well-mixed barrel, we can select individual balls from different parts of the barrel. In practice, we often use random numbers to select objects that can be assigned some numerical value. Statistical software packages and spreadsheets have routines for obtaining random numbers, and these are generally used for most sampling studies.

To see how to use a random number table, suppose that we have 100 employees in a company and wish to interview a randomly chosen sample of 10. We could get such a random sample by assigning every employee a number from 00 to 99, consulting a random number table, and picking a systematic method of selecting two-digit numbers. In this case, let's do the following:

Go from the top to the bottom of the columns beginning with the left-hand column, and read only the first two digits in each row.
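
In practice, statistical software replaces the printed table. As a sketch of the same employee example, Python's random.sample selects 10 of the 100 two-digit labels without replacement, so every set of 10 employees is equally likely. The seed value is arbitrary and only makes the result reproducible.

```python
import random

employee_labels = [f"{i:02d}" for i in range(100)]   # "00", "01", ..., "99"

rng = random.Random(2014)                 # fixed seed only for reproducibility
chosen = rng.sample(employee_labels, 10)  # 10 distinct labels, no repeats

print(sorted(chosen))
```

Unlike reading a table by hand, random.sample never returns a repeated label, so no duplicates need to be skipped.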

Systematic Sampling

In systematic sampling, elements are selected from the population at a uniform interval that is measured in time, order, or space. If we want to interview every twentieth student on a college campus, we would choose a random starting point in the first 20 names in the student directory and then pick every twentieth name thereafter.
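
A minimal Python sketch of the "every twentieth student" rule follows. The directory of 1,000 names and the helper name systematic_sample are hypothetical and used only for illustration.

```python
import random

def systematic_sample(items, k, rng):
    """Pick a random start among the first k items, then every k-th item after it."""
    start = rng.randrange(k)       # random starting point in 0..k-1
    return items[start::k]

# Hypothetical directory of 1,000 student names, used only for illustration.
directory = [f"student_{i:04d}" for i in range(1000)]
sample = systematic_sample(directory, k=20, rng=random.Random(5))

print(len(sample), sample[:3])
```

Only the starting point is random. Once it is chosen, the rest of the sample is fixed, which is what distinguishes this design from simple random sampling.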

Stratified Sampling and Cluster Sampling

To use stratified sampling, we divide the population into relatively homogeneous groups, called strata. Then we use one of two approaches. Either we select at random from each stratum a specified number of elements corresponding to the proportion of that stratum in the population as a whole, or we draw an equal number of elements from each stratum and weight the results according to the stratum's proportion of the total population. With either approach, stratified sampling guarantees that every element in the population has a chance of being selected.

Stratified sampling is appropriate when the population is already divided into groups of different sizes and we wish to acknowledge this fact. Suppose that a physician’s patients are divided into four groups according to age.

Age group	Percentage of total
Birth—19 years	30
20—39	40
40—59	20
60 years and older	10

The physician wants to find out how many hours his patients sleep. To obtain an estimate of this characteristic of the population, he could take a random sample from each of the four age groups and weight each group's results according to the percentage of patients in that group. This would be an example of a stratified sample.
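
The following Python sketch covers both approaches for the physician's example, using the age-group shares in the table. proportional_allocation splits a total sample size across strata (the first approach). stratified_mean weights per-group sample means by each group's share, which the second approach requires because equal numbers are drawn from each stratum. Both function names are hypothetical, and the total of 200 patients is an arbitrary illustrative sample size.

```python
# Shares of the physician's patients by age group (from the table above).
shares = {"Birth-19": 0.30, "20-39": 0.40, "40-59": 0.20, "60+": 0.10}

def proportional_allocation(shares, total_n):
    """Number of patients to sample from each stratum in proportion to its share.
    Rounding may need adjustment when the shares do not divide total_n evenly."""
    return {group: round(share * total_n) for group, share in shares.items()}

def stratified_mean(stratum_means, shares):
    """Combine per-stratum sample means using the population shares as weights."""
    return sum(shares[g] * stratum_means[g] for g in shares)

print(proportional_allocation(shares, 200))
```

For the second approach, the mean hours of sleep measured in each group would be passed to stratified_mean, so that the oldest group (10% of patients) does not count as much as the largest group (40%).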

Cluster Sampling

In cluster sampling, we divide the population into groups, or clusters, and then select a random sample of these clusters. We assume that these individual clusters are representative of the population as a whole. If a market research team is attempting to determine by sampling the average number of television sets per household in a large city, they could use a city map to divide the territory into blocks and then choose a certain number of blocks (clusters) for interviewing. Every household in each of these blocks would be interviewed. A well-designed cluster sampling procedure can produce a more precise sample at considerably less cost than simple random sampling.
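
The block-by-block procedure can be sketched in Python as well. The city map below, 40 blocks of 5 households each, is hypothetical, as is the helper name cluster_sample. Randomness enters only through the choice of blocks. Every household in a chosen block is included.

```python
import random

def cluster_sample(clusters, m, rng):
    """Randomly choose m clusters and keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}

# Hypothetical city map: 40 blocks, each with 5 households (illustration only).
blocks = {f"block_{b:02d}": [f"block_{b:02d}/household_{h}" for h in range(5)]
          for b in range(40)}

selected = cluster_sample(blocks, m=4, rng=random.Random(9))
households_to_interview = [hh for members in selected.values() for hh in members]
print(len(selected), len(households_to_interview))
```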

Sampling Distributions

Consider a random sample selected from a population that is used to make an inference about some population characteristic, such as the population mean, \(\mu\), using a sample statistic, such as the sample mean, \(\bar{x}\). The inference is based on the realization that each random sample can produce a different value of \(\bar{x}\), and, thus, \(\bar{X}\) is a random variable. The sampling distribution of this statistic is the probability distribution of the sample means obtained from all possible samples of the same number of observations drawn from the population.

We illustrate the concept of a sampling distribution by considering the position of a supervisor with six employees, whose years of experience are

2   4   6   6   7   8

Two of these employees are to be chosen randomly for a particular work group. The mean of the years of experience for this population of six employees is

\[\mu = \frac{2 + 4 + 6 + 6 + 7 + 8}{6} = 5.5\]

Now, let us consider the mean number of years of experience of the two employees chosen randomly from the population of six. Fifteen (\(\binom{6}{2} = 15\)) possible different random samples could be selected. Table 1 shows all of the possible samples and associated sample means.
Table 1: Samples and sample means from the worker population, sample size n = 2.

Sample   Sample mean      Sample   Sample mean
2, 4     3.0              4, 8     6.0
2, 6     4.0              6, 6     6.0
2, 6     4.0              6, 7     6.5
2, 7     4.5              6, 8     7.0
2, 8     5.0              6, 7     6.5
4, 6     5.0              6, 8     7.0
4, 6     5.0              7, 8     7.5
4, 7     5.5
Each of the 15 samples in Table 1 has the same probability, 1/15, of being selected. Note that several samples have the same sample mean. For example, the sample mean 5.0 occurs three times, and, thus, the probability of obtaining a sample mean of 5.0 is 3/15. Table 2 presents the sampling distribution of the various sample means from the population, and the probability function is graphed in Figure 1.
Table 2: Sampling distribution of the sample means from the worker population, sample size n = 2.

Sample mean   Probability
3.0           1/15
4.0           2/15
4.5           1/15
5.0           3/15
5.5           1/15
6.0           2/15
6.5           2/15
7.0           2/15
7.5           1/15

We see that, while the number of years of experience for the six workers ranges from 2 to 8, the possible values of the sample mean have a range from only 3.0 to 7.5. In addition, more of the values lie in the central portion of the range.
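
Tables 1 and 2 can be reproduced by brute-force enumeration. The Python sketch below lists every sample of size 2 from the six employees, counts each distinct sample mean, and prints the probabilities as counts out of 15. The helper name sampling_distribution is hypothetical. Exact fractions are used so that means such as 4.5 are not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

experience = [2, 4, 6, 6, 7, 8]   # years of experience of the six employees

def sampling_distribution(population, n):
    """Count the sample means over all equally likely samples of size n drawn
    without replacement. combinations() works on positions, so the two
    employees with 6 years are treated as different people, as in Table 1."""
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return dict(sorted(counts.items())), len(samples)

counts, total = sampling_distribution(experience, 2)
for mean, count in counts.items():
    print(f"{float(mean):.1f}   {count}/{total}")

# The mean of the sampling distribution equals the population mean, 5.5.
mean_of_means = sum(m * c for m, c in counts.items()) / total
print(float(mean_of_means))
```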

Table 3 presents similar results for samples of size n = 5. Notice that the sample means are concentrated over a narrow range, and they are all close to the population mean \(\mu = 5.5\). We will always find this to be true: the sampling distribution becomes more concentrated around the population mean as the sample size increases. This important result provides a foundation for statistical inference.
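Table 3: Samples and sample means from the worker population, sample size n = 5.

Sample           Sample mean   Probability
2, 4, 6, 6, 7    5.0           1/6
2, 4, 6, 6, 8    5.2           1/6
2, 4, 6, 7, 8    5.4           2/6
2, 6, 6, 7, 8    5.8           1/6
4, 6, 6, 7, 8    6.2           1/6

(The sample 2, 4, 6, 7, 8 can be drawn in two ways, using either of the two employees with 6 years of experience, so the six possible samples have probabilities summing to 1.)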
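
The same enumeration extends to n = 5. The sketch below again uses hypothetical helper names. It lists the six possible samples, confirming that the mean 5.4 arises from two different samples. It then compares the variance of the sample means for n = 2 and n = 5 to show how the distribution tightens around \(\mu = 5.5\) as the sample size grows.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

experience = [2, 4, 6, 6, 7, 8]
mu = Fraction(sum(experience), len(experience))   # population mean, 11/2

def sample_means(population, n):
    """Means of all samples of size n drawn without replacement (by position)."""
    return [Fraction(sum(s), n) for s in combinations(population, n)]

def spread(means):
    """Variance of the equally likely sample means around mu."""
    return sum((m - mu) ** 2 for m in means) / len(means)

means5 = sample_means(experience, 5)
print(sorted(Counter(float(m) for m in means5).items()))   # (mean, count) pairs out of 6
print(float(spread(sample_means(experience, 2))), float(spread(means5)))
```

The variance of the sample means falls from 47/30 (about 1.57) for n = 2 to 47/300 (about 0.157) for n = 5. This is the concentration described above, expressed as a single number.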
