Sample Statistics and Population Parameters
A sample statistic, or simply statistic, is any number computed from your sample data. The sample average, median, and standard deviation are all examples. Because it is based on data from a random sample (a random experiment), a statistic is a random variable. A statistic is known but random.
A population parameter, or simply parameter, is any number that could be computed for the entire population. The population mean and the population standard deviation are two examples. Because there is no randomness involved, a parameter is a fixed number; however, you usually do not have data for the whole population. A parameter is fixed but usually unknown.
There is often a natural correspondence between statistics and parameters. For each population parameter (a number you want to know but cannot know exactly), there is a sample statistic, computed from the data, that provides the best available information about that unknown parameter. The description of this sample statistic is called an estimator of the population parameter; the actual number computed from the data is the estimate. For example, the sample average is an estimator of the population mean, and in a particular case the estimate might be 18.3. The estimation error is the estimator (or estimate) minus the population parameter; it is usually unknown.
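The distinction between parameter, estimator, and estimate can be sketched in a short simulation. The population below is hypothetical (100,000 normally distributed values), chosen only so that the parameter is computable for illustration; in practice you would never have it.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 100,000 values (illustration only; in practice
# the full population is not available).
population = [random.gauss(mu=50, sigma=10) for _ in range(100_000)]
mu = statistics.mean(population)   # the parameter: fixed, normally unknown

# Draw one random sample and compute the statistic.
sample = random.sample(population, k=30)
x_bar = statistics.mean(sample)    # the estimate: known, but random

# The estimation error is the estimate minus the parameter; in practice it
# is unknown because the parameter is unknown.
error = x_bar - mu
print(f"estimate = {x_bar:.2f}, parameter = {mu:.2f}, error = {error:+.2f}")
```

Rerunning with a different seed gives a different sample, a different estimate, and a different error, which is exactly what "a statistic is a random variable" means.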
A desirable property of an estimator is that it be unbiased, meaning it does not systematically run above or below the population parameter. An estimator is unbiased if its mean value (the average of its sampling distribution) equals the population parameter.
Many commonly used statistical estimators are either unbiased or nearly unbiased. The sample average x̄ is an unbiased estimator of the population mean μ. For any given data set, x̄ will usually run somewhat high or low relative to μ, but if you repeated the sampling process many times and computed a new x̄ for each sample, the average result would be close to μ: the results are not consistently too high or too low.
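The "repeat the sampling process many times" argument can be checked directly. This sketch draws 2,000 samples from a hypothetical population and averages the resulting sample means; the population values here are made up for the demonstration.

```python
import random
import statistics

random.seed(0)

# Hypothetical population with a computable mean (illustration only).
population = [random.gauss(mu=100, sigma=15) for _ in range(50_000)]
mu = statistics.mean(population)

# Repeat the sampling process many times; each sample gives a new x-bar.
means = [statistics.mean(random.sample(population, k=25)) for _ in range(2_000)]

# Individual sample means run high or low, but their average sits close to
# mu, which is what "unbiased" means.
avg_of_means = statistics.mean(means)
print(f"mu = {mu:.2f}, average of 2000 sample means = {avg_of_means:.2f}")
```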
The sample standard deviation s is a biased estimator of the population standard deviation σ, although it is nearly unbiased. Its square, the sample variance s², is an unbiased estimator of the population variance σ². The sample proportion p̂ is an unbiased estimator of the population proportion p.
In statistics, we use sample statistics to estimate population parameters. So far we have seen point estimation: a single numeric value used to estimate a population parameter. The sample average x̄ is the point estimate of the population mean, the sample variance s² is the point estimate of the population variance, and the sample proportion p̂ is the point estimate of the population proportion. These estimates are our best guesses at the parameter's value.
Although point estimates carry a lot of information, they are only estimates, and sample statistics vary from one sample to the next: if we collected multiple samples, we would get many different values. In many situations it is reasonable to want a range of plausible values rather than a single number. Interval estimates provide exactly this: an interval estimate is an interval of plausible values for the unknown population parameter. The interval estimates most of us are familiar with come from political polling, where approval ratings and election leads are often reported as an interval. In these situations the interval estimate is usually given as a point estimate plus or minus a margin of error. The margin of error indicates how precise the point estimate is and reflects the uncertainty that remains due to errors we cannot control.
Before the 2017 Virginia gubernatorial election, a Quinnipiac University poll found that Ralph Northam, the Democratic candidate, was leading Ed Gillespie by 9 points, plus or minus 3.7 points. The point estimate for Northam's lead is 9 points and the margin of error is 3.7 points, so the poll suggested Northam could be leading by anywhere between 9 − 3.7 = 5.3 and 9 + 3.7 = 12.7 points.
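The arithmetic of "point estimate plus or minus a margin of error" is simple enough to sketch directly, using the poll numbers above.

```python
# Interval estimate = point estimate +/- margin of error.
# Numbers from the 2017 Quinnipiac poll discussed in the text.
point_estimate = 9.0    # Northam's lead, in points
margin_of_error = 3.7

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error
print(f"interval estimate for the lead: ({lower:.1f}, {upper:.1f}) points")
```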
Under ideal conditions, the margin of error indicates how confident we can be in our estimate of the population parameter. Suppose we have two interval estimates with identical point estimates but different margins of error: the first suggests plausible values between 4 and 8, while the second suggests plausible values between 2 and 10. We put more faith in the first, because its range of plausible values is narrower. Several factors affect the margin of error, including the variability in the data, the sample size, and, most importantly, the likelihood that the interval is "right".
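To see how sample size drives the margin of error, here is a minimal sketch using one common formula for the margin of error of a sample mean under ideal conditions, z · s / √n (with z ≈ 1.96 for 95% confidence). The standard deviation below is an assumed, hypothetical value chosen only for illustration.

```python
import math

z = 1.96    # multiplier for roughly 95% confidence
s = 12.0    # assumed sample standard deviation (hypothetical)

# Quadrupling the sample size halves the margin of error.
margins = []
for n in (25, 100, 400):
    moe = z * s / math.sqrt(n)
    margins.append(moe)
    print(f"n = {n:4d} -> margin of error = {moe:.2f}")
```

More variability (larger s) widens the interval, more data (larger n) narrows it, and demanding higher confidence (larger z) widens it again, which matches the three factors listed above.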