What is a Chi-Squared Test?

The Chi-Squared Test is used to determine whether an Attribute or Discrete “X” is associated with another Attribute or Discrete “Y.” The example below tests the hypothesis (an educated guess) that Loan Default Rates are related to Bank Branch.

The Chi-Squared Test estimates how likely it is that the observed distribution of data would arise by chance alone if the variables were independent. The chi-squared statistic is often called a “goodness of fit” statistic because it measures how well the observed distribution of data fits the distribution that would be expected if the variables were independent.
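For reference, the comparison is usually written as Pearson's chi-squared statistic, which sums, over every cell of the table, the squared gap between the observed count O and the expected count E, scaled by E. Here R_i is the total of row i, C_j the total of column j, and N the grand total (symbols chosen for this note):

```latex
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```

The larger the statistic, the further the observed data sit from what independence would predict.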

The Null and Alternative Hypotheses are:

  • Ho = Variable A and Variable B are independent
    • There is no effect on Defaulted Loans due to Bank Branches
  • Ha = Variable A and Variable B are not independent
    • There is an effect on Defaulted Loans due to Bank Branches

Chi-Squared Example

  • A Bank has 200 Loan Approvals
    • 100 Approvals from Branch “A”
    • 100 Approvals from Branch “B”
  • There are:
    • 5 Loans defaulted from Branch “A”
    • 9 Loans defaulted from Branch “B”
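The counts above can be arranged into a 2×2 observed table (branch by loan outcome). A minimal Python sketch; the variable names and the column order [not defaulted, defaulted] are illustrative choices, not Minitab's layout:

```python
# Example counts: 100 approvals per branch, with 5 and 9 defaults.
approvals = {"A": 100, "B": 100}
defaults = {"A": 5, "B": 9}

# Observed table, one row per branch: [not defaulted, defaulted]
observed = {
    branch: [approvals[branch] - defaults[branch], defaults[branch]]
    for branch in approvals
}

for branch, (ok, bad) in observed.items():
    print(f"Branch {branch}: {ok} not defaulted, {bad} defaulted, "
          f"default rate {bad / approvals[branch]:.0%}")
```

The raw default rates (5% vs. 9%) look different; the test asks whether that gap is larger than chance would explain.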

The Question: Is there a “real” difference in the default rates between the branches?

First, collect and summarize the data. This is the observed data.

Entered in Minitab as:

[Image: Chi-Squared Example, observed data as entered in Minitab]

Analyzed with Minitab's Cross Tabulation and Chi-Square analysis, the results are:

[Image: Chi-Squared Data Output, Minitab cross tabulation and chi-square results]

The expected value for each cell is calculated as follows:

Expected = Row total × (Column total / Grand total)

Example (Branch A, loans not defaulted): 100 × (186/200) = 93
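The same calculation can be applied to every cell at once. A minimal Python sketch, assuming the observed table is stored as a list of rows (Branch A, Branch B) with columns [not defaulted, defaulted]:

```python
# Observed 2x2 table: rows = Branch A, Branch B; columns = [not defaulted, defaulted]
observed = [[95, 5],
            [91, 9]]

row_totals = [sum(row) for row in observed]        # one total per branch
col_totals = [sum(col) for col in zip(*observed)]  # one total per outcome
grand_total = sum(row_totals)

# Expected = Row total * (Column total / Grand total).
# Multiplying before dividing keeps the arithmetic exact for these integers.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(expected)
```

Because both branches have the same row total, each branch gets the same expected counts: 93 not defaulted and 7 defaulted.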

The p-value of 0.268 is greater than 0.05, so we fail to reject the null hypothesis: the data do not show a statistically significant relationship between branch and loan results. A p-value < 0.05 would indicate that a relationship may exist.
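The p-value can be checked by hand. This is a sketch using only the Python standard library; it computes the uncorrected Pearson statistic and relies on the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom (the case for a 2×2 table). The 0.05 threshold is the one stated above:

```python
import math

# Observed 2x2 table: rows = Branch A, Branch B; columns = [not defaulted, defaulted]
observed = [[95, 5],
            [91, 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-squared statistic: sum over cells of (O - E)^2 / E
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (o - e) ** 2 / e

# Degrees of freedom for an r x c table: (r - 1) * (c - 1), which is 1 here
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid only for dof == 1: P(chi-squared > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_sq / 2))

alpha = 0.05
print(f"chi-square = {chi_sq:.3f}, df = {dof}, p = {p_value:.3f}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

This reproduces a p-value of about 0.268. If SciPy is installed, `scipy.stats.chi2_contingency(observed, correction=False)` should give the same statistic and p-value; its default `correction=True` applies Yates' continuity correction to 2×2 tables, which lowers the statistic and raises the p-value.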

Next, investigate each cell's contribution to the chi-squared statistic, (Observed - Expected)² / Expected; the cell with the largest contribution has the most influence on the result.
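One way to carry out this step, assuming the values of interest are each cell's contribution to chi-square, (O - E)²/E, is to compute the contributions and rank them. A minimal Python sketch; the branch and outcome labels are illustrative:

```python
branches = ["A", "B"]
outcomes = ["not defaulted", "defaulted"]
observed = [[95, 5],
            [91, 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell to the chi-squared statistic: (O - E)^2 / E
contributions = {}
for i, branch in enumerate(branches):
    for j, outcome in enumerate(outcomes):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions[(branch, outcome)] = (observed[i][j] - e) ** 2 / e

# List cells from largest to smallest contribution
for (branch, outcome), value in sorted(
        contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Branch {branch}, {outcome}: {value:.3f}")
```

In this data the two "defaulted" cells contribute about 0.571 each, far more than the "not defaulted" cells (about 0.043 each), so the default counts drive the statistic.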

Have you used a Chi-Squared Analysis in a Six Sigma Project? If so, can you give us a brief example of the analysis?