Show the relationship between Power and Sample Size. The Power of a comparison test is the probability of deciding that there is a significant difference when a difference actually exists.
The Power of a test determines whether the test is sensitive enough to detect actual (true) differences. Understand that more power and a larger sample size are necessary to detect smaller differences. Together with the sample size, the Alpha Risk, and the process variation, the Power determines the smallest difference the comparison test is capable of detecting.
Power = 1 - Beta Risk (Type II Error)
Confidence Level = 1 - Alpha Risk (Type I Error)
How to read the table shown above:
The first row of the table indicates that as the probability of a Type I error (Alpha Risk) increases, the Power increases and the probability of a Type II error (Beta Risk) decreases.
Power levels of 80-90% are typically considered to be effective, which is the same as a Beta Risk of 10-20%.
In other words, as the Producer is willing to reject more non-defective parts to ensure the defective parts are rejected, the probability of Consumers receiving any defects is reduced. The test becomes more "powerful" in protecting the Consumers, and the Consumer's Risk is perhaps the most important risk to guard against.
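The trade-off between Alpha Risk and Power can be sketched numerically. The snippet below is a minimal illustration, assuming a two-sided comparison of two means with equal group sizes and a common, known standard deviation (a normal approximation). The function name approx_power and the input values are hypothetical and are not taken from the table.

```python
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha):
    """Approximate power of a two-sided, two-sample comparison of means.

    Assumes equal group sizes, a common known standard deviation, and the
    normal approximation (hypothetical helper, for illustration only).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Difference in means expressed in standard errors of that difference
    shift = delta / (sigma * (2 / n_per_group) ** 0.5)
    # Ignores the negligible chance of rejecting in the wrong direction
    return z.cdf(shift - z_crit)

# Hypothetical illustration values: same difference, variation and sample size,
# only the Alpha Risk changes.
for alpha in (0.01, 0.05, 0.10):
    power = approx_power(delta=1.0, sigma=2.0, n_per_group=30, alpha=alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

With everything else held fixed, each increase in Alpha Risk raises the Power and lowers the Beta Risk, which is the pattern described for the first row of the table.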
Collecting data consumes time and resources; there is a tangible cost. It is important to collect enough data to detect the difference required but without creating waste by collecting excess data.
The level of Power needed should be determined by the GB/BB/MBB or a combination of them, along with input from the team. This value is normally higher as the application becomes more critical. Life-dependent, regulatory, and safety applications would require higher levels of power.
In critical applications, Beta Risk should be no higher than 5%, which gives a minimum Power level of 95%. For example, choosing a Power of 99% means that you are willing to accept a Beta Risk of 1%. There is a 1% chance that a decision is made that no parts are defective when there are in fact defective parts, and the Consumer will suffer.
Determining the level of Power is the starting point in determining the number of samples to be collected. Collecting this quantity of samples adds confidence to the test results and to the inferences made about the population.
Again, there is always an argument to achieve near perfect power (99.99999...%) because someone may feel their test or application is of premium importance. However, there is often a price for perfection, and this would require an impractical amount of data and resources. The more samples collected and analyzed, the stronger the power will be to detect smaller differences.
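To make the cost of extra Power concrete, the sketch below estimates the smallest difference in means that a test could detect for a given sample size and Power. It assumes a two-sided, two-sample comparison with equal group sizes and a known common standard deviation (normal approximation). The helper name min_detectable_difference and the input values are hypothetical.

```python
from statistics import NormalDist

def min_detectable_difference(n_per_group, sigma, alpha, power):
    """Approximate smallest mean difference detectable at the given power.

    Normal approximation for a two-sided, two-sample comparison with equal
    group sizes (hypothetical helper, for illustration only).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the Alpha Risk
    z_power = z.inv_cdf(power)           # quantile for Power = 1 - Beta Risk
    return (z_alpha + z_power) * sigma * (2 / n_per_group) ** 0.5

# Hypothetical process with sigma = 1.0 and Alpha Risk of 5%
for power in (0.80, 0.90, 0.99):
    for n in (10, 40, 160):
        d = min_detectable_difference(n, sigma=1.0, alpha=0.05, power=power)
        print(f"power={power:.2f}  n={n:>3}  detectable difference={d:.3f}")
```

Under this approximation the detectable difference shrinks with the square root of the sample size, so halving it takes roughly four times as many samples, and pushing the Power toward 100% raises the requirement further.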
Type I Error = Alpha Risk = Significance Level = Producer's Risk = False Positive
This is when the decision is made that there is a difference when the truth is there is not. In other words, parts have been determined defective (possibly scrapped) and they were not defective. The Producer suffered by losing stock and needing to make up the lost inventory.
Type II Error = Beta Risk = Consumer's Risk = False Negative
This is when the decision is made that there is not a difference when the truth is there is a difference. In other words, parts have been determined not defective and sent to the customer (or downstream operation) and they were defective. The Consumer suffered by receiving defects.
Determine the sample size needed to detect a mean shift of 0.049 on a process with standard deviation of 0.03924. Use alpha of 5% and beta of 10%. The mean of one set of 40 samples from a normally distributed set of data was 0.430 and the mean from another set of 40 samples from a normally distributed set of data was 0.381.
From the top picture, notice that 15 samples were needed to detect a difference of 0.049 at a Power of 90%. From the bottom picture, notice the true Power is >99% since the sample size was actually 40.
Keep in mind this example focuses on statistically detecting a shift in the mean. This does not indicate anything about the variation between the two sets of data. If this were a before and after analysis, it is possible that the variation increased afterward even though the mean shifted favorably. The emphasis of Six Sigma is on variation reduction, and the F-test is used in this case (normal data) to determine if there is a statistical change in the variation.
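The numbers in this example can be roughly checked by hand. The sketch below uses the normal approximation for a two-sided, two-sample comparison with equal group sizes, which is an assumption: the pictures presumably come from statistical software using the t distribution, so the approximation is expected to run slightly lower than the 15 shown. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()

def approx_sample_size(delta, sigma, alpha, beta):
    """Approximate samples per group to detect a mean shift of delta.

    Normal approximation, two-sided test, equal group sizes
    (hypothetical helper; t-based software gives a slightly larger n).
    """
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def approx_power(delta, sigma, n_per_group, alpha):
    """Approximate power for the same test at a given sample size."""
    shift = delta / (sigma * (2 / n_per_group) ** 0.5)
    return Z.cdf(shift - Z.inv_cdf(1 - alpha / 2))

# Values from the example: shift 0.049, standard deviation 0.03924,
# alpha 5%, beta 10%, and 40 samples actually collected per set.
n_required = approx_sample_size(delta=0.049, sigma=0.03924, alpha=0.05, beta=0.10)
power_at_40 = approx_power(delta=0.049, sigma=0.03924, n_per_group=40, alpha=0.05)

print("approximate samples needed per group:", n_required)
print(f"approximate power with 40 samples: {power_at_40:.4f}")
```

The approximation lands at 14 samples, one below the 15 in the picture, and it agrees that the Power with 40 samples is above 99%.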