Hypothesis testing definition

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.

Statistical hypothesis test: The testing process

Statistical hypothesis testing plays a fundamental role in the statistical literature. There are two mathematically equivalent processes that can be used.

The usual line of reasoning is as follows:

  • There is an initial research hypothesis whose truth is unknown.
  • The first step is to state the relevant null and alternative hypotheses. This is important, as misstating the hypotheses will muddy the rest of the process.
  • The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence of the observations or about the form of their distributions. This is equally important, as invalid assumptions will mean that the results of the test are invalid.
  • Decide which test is appropriate, and state the relevant test statistic T.
  • Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic might follow a Student's t-distribution with known degrees of freedom, or a normal distribution with known mean and variance. If the distribution of the test statistic is completely fixed by the null hypothesis, we call the hypothesis simple; otherwise it is called composite.
  • Select a significance level (α), a probability threshold below which the null hypothesis will be rejected. Common values are 5% and 1%.
  • The distribution of the test statistic under the null hypothesis partitions the possible values of T into those for which the null hypothesis is rejected—the so-called critical region—and those for which it is not. The probability of the critical region is α. In the case of a composite null hypothesis, the maximal probability of the critical region is α.
  • Compute from the observations the observed value of the test statistic T.
  • Decide to either reject the null hypothesis in favor of the alternative or not reject it. The decision rule is to reject the null hypothesis H0 if the observed value is in the critical region, and not to reject the null hypothesis otherwise. (A worked sketch of this procedure follows the list.)
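As a concrete illustration, here is a minimal sketch of the critical-region procedure for a one-sample t-test. The data, the hypothesized mean, and the significance level are invented for illustration, and the scipy library is assumed to be available:

    # Critical-region formulation of a one-sample t-test (invented data).
    import numpy as np
    from scipy import stats

    observations = np.array([122.1, 118.4, 125.3, 121.0, 119.8, 124.2, 123.5, 120.9])
    mu0 = 120.0    # mean under the null hypothesis H0
    alpha = 0.05   # significance level, chosen before seeing the data

    n = len(observations)
    t_obs = (observations.mean() - mu0) / (observations.std(ddof=1) / np.sqrt(n))

    # Under H0 the statistic follows Student's t-distribution with n - 1
    # degrees of freedom; the critical region |T| > t_crit has probability alpha.
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

    if abs(t_obs) > t_crit:
        print(f"T = {t_obs:.3f} is in the critical region (|T| > {t_crit:.3f}): reject H0")
    else:
        print(f"T = {t_obs:.3f} is outside the critical region (|T| <= {t_crit:.3f}): do not reject H0")

With these particular numbers the observed statistic falls just short of the critical value, so the null hypothesis is not rejected.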

A common alternative formulation of this process goes as follows:

  • Compute from the observations the observed value of the test statistic T.
  • Calculate the p-value. This is the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed (the maximal probability of that event, if the hypothesis is composite).
  • Reject the null hypothesis, in favor of the alternative hypothesis, if and only if the p-value is less than (or equal to) the significance level threshold (α), for example 0.05 or 0.01. (A sketch of this formulation follows the list.)
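The sketch below reruns the same invented example in the p-value formulation, computing the p-value by hand and cross-checking it against scipy's built-in one-sample t-test:

    # p-value formulation of the same one-sample t-test (same invented data).
    import numpy as np
    from scipy import stats

    observations = np.array([122.1, 118.4, 125.3, 121.0, 119.8, 124.2, 123.5, 120.9])
    mu0 = 120.0
    alpha = 0.05

    n = len(observations)
    t_obs = (observations.mean() - mu0) / (observations.std(ddof=1) / np.sqrt(n))

    # p-value: the probability under H0 of a statistic at least as extreme as
    # the one observed; two-sided, so both tails count as "extreme".
    p_value = 2 * stats.t.sf(abs(t_obs), df=n - 1)

    # scipy performs both steps at once; the results should agree.
    t_check, p_check = stats.ttest_1samp(observations, popmean=mu0)

    print(f"T = {t_obs:.3f}, p = {p_value:.4f} (scipy: p = {p_check:.4f})")
    print("reject H0" if p_value <= alpha else "do not reject H0")

Here the p-value comes out just above 0.05, so the decision agrees with the critical-region sketch above.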
The former process was advantageous in the past, when only tables of test statistics at common probability thresholds were available: it allowed a decision to be made without calculating a probability, and it was adequate for classwork and for operational use, but it was deficient for reporting results. The latter process relied on extensive tables or on computational support that was not always available, but the explicit calculation of a probability is useful for reporting. Today the calculations are trivially performed with appropriate software.

The difference between the two processes can be seen by applying them to the radioactive suitcase example:

  • “The Geiger-counter reading is 10. The limit is 9. Check the suitcase.”
  • “The Geiger-counter reading is high; 97% of safe suitcases have lower readings. The limit is 95%. Check the suitcase.”
The former report is adequate; the latter gives a more detailed explanation of the data and of the reason why the suitcase is being checked.
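To make the contrast concrete, the following sketch produces both styles of report from a single assumed model. The normal distribution of safe-suitcase readings is entirely hypothetical, with its parameters back-solved so that the printed numbers roughly match those quoted above:

    # Both report styles for the suitcase example, under an invented
    # normal model of Geiger-counter readings from safe suitcases.
    from scipy import stats

    mu, sigma = 2.03, 4.24   # hypothetical parameters of safe readings
    alpha = 0.05
    reading = 10.0

    # Critical-value report: compare the reading against a pre-computed limit.
    limit = stats.norm.ppf(1 - alpha, loc=mu, scale=sigma)
    print(f"Reading {reading} vs limit {limit:.1f}: "
          f"{'check the suitcase' if reading > limit else 'pass'}")

    # p-value-style report: state what fraction of safe suitcases read lower.
    percentile = stats.norm.cdf(reading, loc=mu, scale=sigma)
    print(f"{percentile:.0%} of safe suitcases read lower (threshold {1 - alpha:.0%}): "
          f"{'check the suitcase' if percentile > 1 - alpha else 'pass'}")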

Not rejecting the null hypothesis does not mean the null hypothesis is “accepted” (see the Interpretation section).

The processes described here are perfectly adequate for computation, but they seriously neglect design-of-experiments considerations.

It is particularly critical that appropriate sample sizes be estimated before conducting the experiment.
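As one illustration, the sketch below solves the standard normal-approximation formula n = ((z_{1-alpha/2} + z_power) / d)^2 for the sample size of a two-sided one-sample z-test. The effect size, significance level, and target power are assumed values chosen purely for the example:

    # A-priori sample-size calculation via the normal approximation.
    import math
    from scipy import stats

    alpha, power, effect_size = 0.05, 0.80, 0.5   # illustrative choices

    z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-sided rejection quantile
    z_power = stats.norm.ppf(power)           # quantile for the desired power

    n = ((z_alpha + z_power) / effect_size) ** 2
    print(f"Required sample size: {math.ceil(n)}")   # 32 for these inputs

A t-based calculation (for example, with a dedicated power-analysis package) would be marginally more accurate at small sample sizes.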

The phrase “test of significance” was coined by statistician Ronald Fisher.
