How to Conduct a Simple Hypothesis Test in Six Sigma
I was teaching a Six Sigma Green Belt methods course in Washington, DC, when I was asked to simplify the basic road map for hypothesis testing. It seemed like a great idea to publish the result as a blog post so that more people can have access to this valuable information.
In the example below, we are testing whether or not there is a correlation between two continuous variables.
What is Hypothesis Testing?
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
What is an Alternative and Null Hypothesis?
Every statistical hypothesis test is framed around a null hypothesis and an alternative hypothesis. The null hypothesis predicts that there is no relationship or effect between variables. The alternative hypothesis states your research prediction that an effect or relationship exists.
Basic Hypothesis Testing Process
Step 1: Identify the question
Start with the question you want to answer. You’re looking for a simple question asking whether factor x has an impact on outcome y, where the answer is either ‘yes’ or ‘no’. Once you have that, you can develop your null and alternative hypotheses.
Step 2: Determine the Significance
The ideal sample would be the entire population that you are focusing on. In practice, it is rarely cost-effective to collect data for a large population, such as the whole of the United States. Instead, you need a sample that is representative of the population and large enough for the hypothesis being tested.
Decide on your level of confidence, that is, how sure you need to be before you conclude that a result is statistically significant. After you have decided this, you can calculate the alpha level, which is simply 1 minus the confidence level. Alpha is the risk you accept of rejecting a null hypothesis that is actually true. The standard level of confidence is 95% or 0.95. The standard alpha is therefore 5% or 0.05.
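As a quick illustration of the relationship between confidence level and alpha, here is a minimal Python sketch. The function name `alpha_from_confidence` is made up for this post and is not part of any statistics package.

```python
def alpha_from_confidence(confidence_level: float) -> float:
    """Return alpha = 1 - confidence level (hypothetical helper for illustration)."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence level must be between 0 and 1, e.g. 0.95")
    return 1.0 - confidence_level


# The standard 95% confidence level gives the standard alpha of 0.05.
alpha = alpha_from_confidence(0.95)
print(f"alpha = {alpha:.2f}")
```

Choosing a stricter confidence level, such as 0.99, simply lowers alpha to 0.01.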
Step 3: Select the Test
The test you choose for your sample is important. It depends on your hypotheses and the type of data you are using. You can use some basic questions to help you decide on the test that is best for your situation.
- What is the level of measurement used?
- How many different samples are used?
- What kind of analysis is required?
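The three questions above can be turned into a small decision helper. The sketch below is only a simplified illustration: the function `suggest_test`, its category labels, and the short list of tests it covers are assumptions made for this post, not a complete selection guide.

```python
def suggest_test(measurement: str, samples: int, analysis: str) -> str:
    """Suggest a common test from the three questions (simplified, hypothetical helper).

    measurement: "continuous" or "categorical"
    samples:     how many different samples or groups are involved
    analysis:    "relationship" or "compare means"
    """
    if analysis == "relationship":
        if measurement == "continuous":
            return "Pearson correlation"
        if measurement == "categorical":
            return "chi-square test of independence"
    if analysis == "compare means" and measurement == "continuous":
        if samples == 1:
            return "one-sample t-test"
        if samples == 2:
            return "two-sample t-test"
        if samples >= 3:
            return "one-way ANOVA"
    return "no suggestion in this simplified sketch"


# Two continuous variables and a question about their relationship:
print(suggest_test("continuous", 1, "relationship"))
```

For two continuous variables and a question about their relationship, the helper points to the Pearson correlation used in the example in this post.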
Step 4: Interpret Results
You’ll get your results once you run your data through your selected test. That’s not all! Next, you need to interpret the results. The p-value is one of the most important values provided by any statistical test. It gives you the probability of seeing results at least as extreme as yours if the null hypothesis were true. A small p-value means that your data would be unlikely under the null hypothesis.
Step 5: Make a Decision
You must then draw your conclusions and make a final decision based on your results. When you have interpreted your results, you can make one of two decisions:
- Reject the null hypothesis and support the alternative.
- Fail to reject the null hypothesis.
After you have made your decision, you will need to write a conclusion. It should include your original hypothesis, the sample that was used to test it, and the conclusion you reached.
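The decision rule can be sketched in a few lines of Python. The function name `decide` is hypothetical, and rejecting when the p-value is less than or equal to alpha is one common convention that this sketch assumes. The value 0.0228 is the p-value reported in the tire example in this post.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with alpha and return the decision (hypothetical helper)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must be between 0 and 1")
    if p_value <= alpha:
        # Data this extreme would be unlikely if the null hypothesis were true.
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.0228))              # p-value from the tire example, alpha = 0.05
print(decide(0.0228, alpha=0.01))  # same p-value with a stricter alpha
```

Note that the same p-value can lead to a different decision when alpha changes, which is why alpha should be chosen before the data is analyzed.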
Hypothesis Test Example
1. What is the practical question?
– Does an increase in tire pressure cause an increase in tread wear?
2. What is the X? (Input being Controlled)
– Tire Pressure
3. What is the Y? (What is being Measured)
– Tread Wear
4. Gather Data, Run the Analysis, and determine the Pearson Correlation Coefficient and P-Value
5. Run a Correlation Analysis (selecting the Pearson method). In this example the Pearson value (r) = 0.554 with a P-value = 0.0228
6. The results indicate that there is a medium correlation between the two factors. Because the P-value of 0.0228 is below the standard Alpha of 0.05, the correlation is statistically significant.
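For readers who want to see what a correlation analysis does behind the scenes, here is a minimal Python sketch using only the standard library. It computes the Pearson r, converts it to a t statistic with n - 2 degrees of freedom, and numerically integrates the t distribution to get a two-tailed p-value. The function names and the tire data are invented for illustration; they are not the data behind the r = 0.554 result above, and a statistics package would normally do this work for you.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value for a t statistic, by Simpson's rule.

    With t = sqrt(df) * tan(theta), the area under the t density between
    0 and |t| becomes C * integral of cos(theta)**(df - 1) over [0, theta_max].
    """
    theta_max = math.atan(abs(t_stat) / math.sqrt(df))
    if theta_max == 0.0:
        return 1.0
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta_max / steps  # steps must be even for Simpson's rule
    total = 1.0 + math.cos(theta_max) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = c * total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def correlation_test(x, y):
    """Return (r, p-value) for the null hypothesis of no linear correlation."""
    r = pearson_r(x, y)
    df = len(x) - 2
    if abs(r) >= 1.0:
        return r, 0.0  # perfectly linear data
    t_stat = r * math.sqrt(df / (1.0 - r * r))
    return r, two_tailed_p(t_stat, df)


# Hypothetical data, invented for illustration only.
tire_pressure = [28, 29, 30, 31, 32, 33, 34, 35, 36, 37]
tread_wear = [3.1, 2.9, 3.4, 3.2, 3.8, 3.5, 3.6, 4.1, 3.7, 4.3]

r, p = correlation_test(tire_pressure, tread_wear)
print(f"r = {r:.3f}, p-value = {p:.4f}")
```

The p-value is then compared with the chosen Alpha, exactly as in step 6.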
A guideline for R Values:
- between -0.2 and +0.2 indicates little or no relationship (the kind of value random chance alone can produce)
- between -0.8 and -1.0 or +0.8 and +1.0 is a strong relationship
- the remaining levels/values indicate there is a weak to moderate relationship
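The guideline above can be written as a small Python helper. The function name `describe_r` is made up for this post, and treating the boundary values of exactly 0.2 and 0.8 as inclusive is an assumption, since the guideline does not say which side they belong to.

```python
def describe_r(r: float) -> str:
    """Classify a Pearson r value using the rule-of-thumb ranges (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # the sign gives the direction, not the strength
    if size <= 0.2:
        return "little or no relationship"
    if size >= 0.8:
        return "strong relationship"
    return "weak to moderate relationship"


print(describe_r(0.554))  # the r value from the tire example
```

Because the guideline is symmetric, a value of -0.554 would be classified the same way, just with the relationship running in the opposite direction.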
Remember that correlation is not causation. A correlation analysis is only a test of the strength of the linear relationship between two continuous factors. The R-Value indicates the strength of the relationship, and the P-Value indicates the Alpha Level at which the relationship can be stated to be significant: the relationship is significant whenever the P-Value is below your chosen Alpha.