Applied Statistics: Sampling Distributions

Monday, July 14, 2014

Sampling Distributions of Sample Means

Sample Mean
Let the random variables $X_1, X_2, \ldots, X_n$ denote a random sample from a population. The sample mean of these random variables is defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Consider the sampling distribution of the random variable $\bar{X}$. At this point we cannot determine the shape of the sampling distribution, but we can determine its mean and variance. We know that the expectation of a linear combination of random variables is the linear combination of the expectations, so with $\mu$ denoting the population mean:

$$E(\bar{X}) = E\left(\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right) = \frac{n\mu}{n} = \mu$$

Thus, the mean of the sampling distribution of the sample means is the population mean. If samples of n random and independent observations are repeatedly and independently drawn from a population, then as the number of samples becomes very large, the mean of the sample means approaches the true population mean.
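The variance of the sample mean, denoted by $\sigma_{\bar{X}}^2$, follows from the independence of the observations. With $\sigma^2$ denoting the population variance, it is given by

$$\sigma_{\bar{X}}^2 = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{\sigma^2}{n}$$

and the corresponding standard deviation, called the standard error of $\bar{X}$, is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$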
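The two results for the sample mean (its mean equals the population mean, and its variance is the population variance divided by n) can be checked with a small simulation. This is only a sketch: the population values (mean 50, standard deviation 10), the sample size of 25, and the choice of a normal population are assumptions made for illustration, not figures from the text.

```python
import random
import statistics

random.seed(0)

MU, SIGMA = 50.0, 10.0   # hypothetical population mean and standard deviation
N = 25                   # observations per sample
NUM_SAMPLES = 20_000     # number of repeated samples

# Draw many independent samples and record each sample mean.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: {MU})")
print(f"variance of sample means: {var_of_means:.3f} (theory: {SIGMA**2 / N})")
print(f"standard error:           {var_of_means ** 0.5:.3f} (theory: {SIGMA / N ** 0.5})")
```

With these settings the simulated values should sit close to 50, 4, and 2, respectively, and the agreement should tighten as the number of samples grows.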

Standard Normal Distribution for the Sample Means

Whenever the sampling distribution of the sample means is a normal distribution, we can compute a standardized normal random variable, Z, that has mean 0 and variance 1:

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Example: Suppose that the annual percentage salary increases for the chief executive officers of all midsize corporations are normally distributed with mean 12.2% and standard deviation 3.6%. A random sample of nine observations is obtained from this population and the sample mean computed. What is the probability that the sample mean will be less than 10%?

Example: A spark plug manufacturer claims that the lives of its plugs are normally distributed with mean 36,000 miles and standard deviation 4,000 miles. A random sample of 16 plugs had an average life of 34,500 miles. If the manufacturer's claim is correct, what is the probability of finding a sample mean of 34,500 or less?
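Both examples above reduce to the same calculation: standardize the sample mean using the standard error and look up the lower-tail probability of the standard normal distribution. The following sketch does this with Python's `math.erf` in place of a printed table; the helper names `normal_cdf` and `prob_mean_below` are introduced here for illustration only.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_mean_below(x_bar, mu, sigma, n):
    """Return (Z, P(sample mean <= x_bar)) for a normal population."""
    z = (x_bar - mu) / (sigma / sqrt(n))
    return z, normal_cdf(z)

# CEO salary increases: mu = 12.2, sigma = 3.6, n = 9, threshold 10
z_salary, p_salary = prob_mean_below(10.0, 12.2, 3.6, 9)

# Spark plugs: mu = 36,000, sigma = 4,000, n = 16, observed mean 34,500
z_plug, p_plug = prob_mean_below(34_500, 36_000, 4_000, 16)

print(f"salary: Z = {z_salary:.2f}, probability = {p_salary:.4f}")
print(f"plugs:  Z = {z_plug:.2f}, probability = {p_plug:.4f}")
```

The salary example gives Z of about -1.83 and a probability of roughly 0.033; the spark plug example gives Z = -1.50 and a probability of roughly 0.067, so a sample mean this low would be fairly unusual if the claim were correct.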

Statement of the Central Limit Theorem

Statement

Let $X_1, X_2, \ldots, X_n$ be a set of n independent random variables having identical distributions with mean $\mu$ and variance $\sigma^2$, and with $X$ as the sum and $\bar{X}$ as the mean of these random variables. As n becomes large, the central limit theorem states that the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X - n\mu}{\sqrt{n\sigma^2}}$$

approaches the standard normal distribution.
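A simulation can make the theorem concrete. The sketch below draws samples from a strongly skewed population and checks that the standardized sample means behave like a standard normal variable. The exponential population, the sample size of 50, and the 1.96 cutoff are assumptions chosen for illustration.

```python
import random
from math import sqrt

random.seed(1)

RATE = 1.0            # exponential population (skewed): mean and std dev are both 1/RATE
MU = SIGMA = 1.0 / RATE
N = 50                # sample size
REPS = 20_000         # number of simulated samples

def standardized_mean():
    """Draw one sample of size N and return Z = (x_bar - mu) / (sigma / sqrt(n))."""
    x_bar = sum(random.expovariate(RATE) for _ in range(N)) / N
    return (x_bar - MU) / (SIGMA / sqrt(N))

z_values = [standardized_mean() for _ in range(REPS)]

# For a standard normal variable, about 95% of values lie within +/-1.96.
share_within = sum(abs(z) < 1.96 for z in z_values) / REPS
print(f"share of Z values within +/-1.96: {share_within:.3f}")
```

A share near 0.95 indicates that the normal approximation is already reasonable at this sample size, even though the individual observations are far from normal.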

Example: Antelope Coffee Inc. is considering the possibility of opening a gourmet coffee shop in a city. Previous research has indicated that its shops will be successful in cities of this size if the per capita annual income is above $60,000. It is also known that the standard deviation of income is $5,000. A random sample of 36 people was obtained and the mean income was $62,300. Does this sample provide evidence to conclude that a shop should be opened?

Solution: The distribution of income is known to be skewed, but the central limit theorem enables us to conclude that the sample mean is approximately normally distributed. To answer the question, we need to determine the probability of obtaining a sample mean at least as high as $\bar{x} = 62{,}300$ if the population mean is $\mu = 60{,}000$.
First, compute the standardized normal Z-statistic:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{62{,}300 - 60{,}000}{5{,}000/\sqrt{36}} = 2.76$$

From the standard normal table we find that the probability of obtaining a Z value of 2.76 or larger is 0.0029. Because this probability is very small, we can conclude that it is likely that the population mean income is not 60,000 but is a larger value. This result provides strong evidence that the population mean income is higher than $60,000 and that the coffee shop is likely to be a success.
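The table lookup can be reproduced numerically. This is a minimal sketch that uses `math.erf` for the standard normal distribution; the variable names are chosen here for illustration.

```python
from math import erf, sqrt

mu_0 = 60_000      # population mean assumed for the calculation
sigma = 5_000      # known population standard deviation
n = 36
x_bar = 62_300     # observed sample mean

z = (x_bar - mu_0) / (sigma / sqrt(n))
# Upper-tail probability P(Z >= z) for a standard normal variable.
p_upper = 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(f"Z = {z:.2f}, P(Z >= {z:.2f}) = {p_upper:.4f}")
```

The computed values agree with the Z of 2.76 and the tail probability of 0.0029 quoted above.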

Sampling Distributions of Sample Proportions

Let X be the number of successes in a binomial sample of n observations with parameter P. The parameter is the proportion of the population members that have a characteristic of interest. We define the sample proportion as

$$\hat{p} = \frac{X}{n}$$

X is the sum of a set of n independent Bernoulli random variables, each with probability of success P. As a result, $\hat{p}$ is the mean of a set of independent random variables. The central limit theorem can be used to argue that the probability distribution for $\hat{p}$ can be modeled as a normally distributed random variable.

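The mean and variance of the sampling distribution of the sample proportion $\hat{p}$ can be obtained from the mean and variance of the number of successes, X:

$$E(X) = nP \qquad \mathrm{Var}(X) = nP(1 - P)$$

And, thus,

$$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = P$$

We see that the mean of the distribution of $\hat{p}$ is the population proportion, P. The variance of $\hat{p}$ is the variance of the population distribution of the Bernoulli random variables divided by n,

$$\sigma_{\hat{p}}^2 = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{P(1 - P)}{n}$$

The standard deviation of $\hat{p}$, which is the square root of the variance, is called its standard error.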

Sampling Distribution of the Sample Proportion

Let $\hat{p}$ be the sample proportion of successes in a random sample from a population with proportion of success P. Then
1. The sampling distribution of $\hat{p}$ has mean P:
$$E(\hat{p}) = P$$
2. The sampling distribution of $\hat{p}$ has standard deviation
$$\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}$$
3. If the sample size is large, the random variable
$$Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}}$$
is approximately distributed as a standard normal. This approximation is good if
nP(1 – P) > 9
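Example: A random sample of 250 homes was taken from a large population of older homes to estimate the proportion of homes with unsafe wiring. If, in fact, 30% of the homes have unsafe wiring, what is the probability that the sample proportion will be between 25% and 35% of homes with unsafe wiring?
Solution: For this problem we have
P = 0.30 and n = 250

We can compute the standard deviation of the sample proportion, $\hat{p}$, as

$$\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{0.30(1 - 0.30)}{250}} = 0.029$$

The required probability is

$$P(0.25 < \hat{p} < 0.35) = P\left(\frac{0.25 - 0.30}{0.029} < Z < \frac{0.35 - 0.30}{0.029}\right) = P(-1.72 < Z < 1.72) = 0.9146$$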

Thus, we see that the probability that the sample proportion is within the interval 0.25 to 0.35, given P = 0.30, is 0.9146. This interval is called a 91.46% acceptance interval.
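The acceptance interval for the unsafe-wiring example can be recomputed without a table. The sketch below uses the normal approximation with unrounded intermediate values; `normal_cdf` is a helper defined here, not something from the text.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

P, n = 0.30, 250
lower, upper = 0.25, 0.35

se = sqrt(P * (1 - P) / n)               # standard error of the sample proportion
z_lower = (lower - P) / se
z_upper = (upper - P) / se
prob = normal_cdf(z_upper) - normal_cdf(z_lower)

print(f"n*P*(1-P) = {n * P * (1 - P):.1f}")   # rule-of-thumb check (> 9)
print(f"standard error = {se:.4f}")
print(f"P({lower} < p_hat < {upper}) = {prob:.4f}")
```

The result lands very close to 0.9146. Any small difference comes from the table-based calculation rounding the standard error to 0.029 and Z to 1.72.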

Example: It has been estimated that 43% of business graduates believe that a course in business ethics is very important for imparting ethical values to students. Find the probability that more than one-half of a random sample of 80 business graduates have this belief.

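Solution: Here P = 0.43 and n = 80, so $\sigma_{\hat{p}} = \sqrt{0.43(1 - 0.43)/80} \approx 0.055$ and $P(\hat{p} > 0.50) = P\left(Z > \frac{0.50 - 0.43}{0.055}\right) = P(Z > 1.27) \approx 0.10$. The probability that more than one-half of the sample believes in the value of business ethics courses is approximately 0.1.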
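For the business ethics example, the normal approximation can be set beside an exact binomial calculation. The exact check is an addition made here for comparison and is not part of the original solution; it relies only on X being binomial with n = 80 and P = 0.43, so that "more than one-half" means X of 41 or more.

```python
from math import comb, erf, sqrt

P, n = 0.43, 80

# Normal approximation: P(p_hat > 0.50)
se = sqrt(P * (1 - P) / n)
z = (0.50 - P) / se
approx = 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Exact binomial check: more than one-half of 80 means X >= 41.
exact = sum(comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(41, n + 1))

print(f"standard error = {se:.4f}, Z = {z:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The two figures need not match closely, because the normal calculation here omits a continuity correction, but both should be in the neighborhood of 0.1.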

Sampling Distributions of Sample Variances

Sample Variance

Let $x_1, x_2, \ldots, x_n$ be a random sample of observations from a population. The quantity

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

is called the sample variance, and its square root, s, is called the sample standard deviation. Given a specific random sample, we could compute the sample variance, and the sample variance would be different for each random sample because of differences in sample observations.
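As a quick illustration of the definition, the sketch below computes the sample variance by hand, dividing the sum of squared deviations by n - 1, and compares it with `statistics.variance` from the Python standard library. The six data values are hypothetical.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]   # hypothetical observations

n = len(data)
x_bar = sum(data) / n
# Sum of squared deviations from the sample mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
s = s2 ** 0.5

print(f"sample variance s^2 = {s2}")
print(f"sample standard deviation s = {s:.4f}")
print(f"matches statistics.variance: {abs(s2 - statistics.variance(data)) < 1e-12}")
```

A different random sample would give a different value of the sample variance, which is why it has a sampling distribution of its own.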

Chi-Square Distribution of Sample and Population Variances

Given a random sample of n observations from a normally distributed population whose population variance is $\sigma^2$ and whose resulting sample variance is $s^2$, it can be shown that

$$\frac{(n - 1)s^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2}$$

has a distribution known as the $\chi^2$ (chi-square) distribution with n – 1 degrees of freedom.

Sampling Distribution of the Sample Variance

Let $s^2$ denote the sample variance for a random sample of n observations from a population with a variance $\sigma^2$. Then

1. The sampling distribution of $s^2$ has mean $\sigma^2$:

$$E(s^2) = \sigma^2$$

2. The variance of the sampling distribution of $s^2$ depends on the underlying population distribution. If that distribution is normal, then

$$\mathrm{Var}(s^2) = \frac{2\sigma^4}{n - 1}$$

3. If the population distribution is normal, then

$$\frac{(n - 1)s^2}{\sigma^2}$$

is distributed as $\chi^2_{(n-1)}$.

Thus, if we have a random sample from a population with a normal distribution, we can make inferences about the population variance $\sigma^2$ by using $s^2$ and the chi-square distribution.
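The first two properties can be checked by simulation. The sketch below repeatedly samples from a normal population and compares the average and the variance of the resulting sample variances with the theoretical values. The population standard deviation of 3.6 and the sample size of 6 are assumptions chosen for illustration, and `sample_variance` is a helper defined here.

```python
import random
import statistics

random.seed(2)

SIGMA = 3.6           # assumed normal population standard deviation
N = 6                 # sample size
REPS = 40_000         # number of simulated samples

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

sample_variances = [
    sample_variance([random.gauss(0.0, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

mean_s2 = statistics.fmean(sample_variances)
var_s2 = statistics.pvariance(sample_variances)

print(f"mean of s^2:     {mean_s2:.2f} (theory: {SIGMA**2:.2f})")
print(f"variance of s^2: {var_s2:.2f} (theory: {2 * SIGMA**4 / (N - 1):.2f})")
```

The variance result depends on normality; with a different population shape, only the mean of the sample variances would be expected to match.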

Example: George Samson is responsible for quality assurance at Integrated Electronics. He has asked you to establish a quality monitoring process for the manufacture of control device A. The variability of the electrical resistance, measured in ohms, is critical for this device. Manufacturing standards specify a standard deviation of 3.6, and the population distribution of resistance measures is normal. The monitoring process requires that a random sample of n = 6 observations be obtained from the population of devices and the sample variance computed. Determine an upper limit for the sample variance such that the probability of exceeding this limit, given a population standard deviation of 3.6, is less than 0.05.

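Solution: For this problem we have n = 6 and $\sigma^2 = (3.6)^2 = 12.96$. Using the chi-square distribution, we can state that

$$P(s^2 > K) = P\left(\frac{(n - 1)s^2}{12.96} > 11.07\right) = 0.05$$

where K is the desired upper limit and $\chi^2_{5,\,0.05} = 11.07$ is the upper 0.05 critical value of the chi-square distribution with 5 degrees of freedom from row 5 of Table 7. The required upper limit for $s^2$, labeled as K, can be obtained by solving

$$\frac{(n - 1)K}{12.96} = 11.07 \quad\Rightarrow\quad K = \frac{(11.07)(12.96)}{6 - 1} = 28.69$$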

If the sample variance, $s^2$, from a random sample of size n = 6 exceeds 28.69, there is strong evidence to suspect that the population variance exceeds 12.96 and that the manufacturing process should be halted and appropriate adjustments performed.
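The upper limit can also be obtained without a printed table. The sketch below uses a closed-form expression for the upper tail of the chi-square distribution that holds specifically for 5 degrees of freedom, then finds the 0.05 critical value by bisection. The function names are introduced here for illustration; a statistics library would normally supply the critical value directly.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_5df(x):
    """Upper-tail probability P(chi-square with 5 df > x); closed form valid for 5 df only."""
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * (sqrt(x) + x ** 1.5 / 3)

def upper_critical_value(alpha, lo=0.0, hi=100.0):
    """Find x with chi2_sf_5df(x) = alpha by bisection (the tail decreases in x)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_5df(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 6
sigma2 = 3.6 ** 2                      # population variance, 12.96
crit = upper_critical_value(0.05)      # tabulated as 11.07
K = crit * sigma2 / (n - 1)            # upper limit for the sample variance

print(f"critical value = {crit:.2f}, upper limit K = {K:.2f}")
```

The computed critical value and limit should agree with the tabulated 11.07 and the limit of 28.69 to two decimal places.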
