Power and Sample Size

Theory

Power is the probability of detecting a given effect size under given sample conditions (size and variability). The larger the true difference, the more likely it is to be detected; the smaller the difference, the more data is needed to detect it.

Example)

The probability of distinguishing a .330 hitter from a .200 hitter in 25 at-bats is 0.75. → An experiment with n = 25 can be said to have a power of 0.75 (75%) for an effect size of 0.130.

Dictionary definition of power

  • The probability of accepting the alternative hypothesis when the alternative hypothesis is true

  • If the power is 90%, the probability of retaining the null hypothesis even though the alternative hypothesis is true (type II error, β error) is 10%.

  • Power = 1 - β
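
To make the relationship between power and β concrete, here is a minimal MATLAB sketch for a right-tailed z-test with a known standard deviation. All numbers (mu0, mu1, sigma, n, alpha) are illustrative assumptions rather than values from these notes, and norminv/normcdf require Statistics and Machine Learning Toolbox.

```matlab
% Illustrative values (assumptions, not taken from the notes)
mu0   = 100;    % mean under the null hypothesis
mu1   = 102;    % true mean under the alternative hypothesis
sigma = 5;      % known population standard deviation
n     = 40;     % sample size
alpha = 0.05;   % significance level (right-tailed test)

se    = sigma / sqrt(n);                 % standard error of the sample mean
xcrit = mu0 + norminv(1 - alpha) * se;   % reject H0 when the sample mean exceeds xcrit

betaErr = normcdf(xcrit, mu1, se);       % P(do not reject H0 | H1 true) = type II error
pwr     = 1 - betaErr;                   % P(reject H0 | H1 true) = power

fprintf('beta = %.4f, power = %.4f\n', betaErr, pwr);
```

The two quantities always sum to 1, so lowering β and raising power are the same goal.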

검쑍λ ₯을 μ™œ μ•Œμ•„μ•Ό ν•˜λŠ”κ°€?

  • κ²€μ •λ ₯ κ³„μ‚°μ˜ 주된 μš©λ„λŠ” ν‘œλ³Έν¬κΈ°κ°€ μ–΄λŠ 정도 ν•„μš”ν•œκ°€λ₯Ό μΆ”μ •ν•˜λŠ” κ²ƒμž„.

  • '효과크기'κ°€ ν‘œλ³Έν¬κΈ°λ₯Ό μ’Œμš°ν•¨! (κΈ°λŒ€ν•˜λŠ” 효과 크기가 μž‘μ„μˆ˜λ‘ ν‘œλ³Έμ‚¬μ΄μ¦ˆκ°€ μ¦κ°€λ˜μ–΄μ•Ό 함)
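
The dependence of sample size on effect size can be sketched numerically. The snippet below assumes a right-tailed z-test with known standard deviation, for which the required sample size is n = ((z_(1-alpha) + z_power) * sigma / delta)^2 rounded up. The values of sigma, alpha, target, and delta are hypothetical, and norminv requires Statistics and Machine Learning Toolbox.

```matlab
% Illustrative values (assumptions, not taken from the notes)
sigma  = 5;               % assumed population standard deviation
alpha  = 0.05;            % significance level (right-tailed test)
target = 0.80;            % desired power
delta  = [4 2 1 0.5];     % effect sizes to detect (difference in means)

% Required sample size for each effect size, rounded up to a whole observation
n = ceil(((norminv(1 - alpha) + norminv(target)) * sigma ./ delta).^2);

% fprintf walks the 2-by-4 matrix column by column: delta(1), n(1), delta(2), ...
fprintf('delta = %4.1f  ->  n = %d\n', [delta; n]);
```

Because delta enters the formula squared, halving the effect size roughly quadruples the required sample size.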

Four elements of a power/sample size calculation

  • Sample size

  • Effect size to be detected

  • Significance level for the hypothesis test

  • Power
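
As an illustration of how the four elements are tied together, consider the simplified case of a right-tailed z-test with known standard deviation σ. This setting is an assumption chosen because it has a closed form; the t-test used in the practice section does not reduce to one. Here n is the sample size, δ = μ1 - μ0 is the effect size, α is the significance level, 1 - β is the power, Φ is the standard normal CDF, and z_q is its q-quantile.

```latex
\begin{aligned}
\text{Power} = 1 - \beta
  &= \Phi\!\left(\frac{\delta\sqrt{n}}{\sigma} - z_{1-\alpha}\right),
  \qquad \delta = \mu_1 - \mu_0 \\[4pt]
n &= \left(\frac{\left(z_{1-\alpha} + z_{1-\beta}\right)\sigma}{\delta}\right)^{2}
\end{aligned}
```

The second line is the first one solved for n, which shows that fixing any three of the four elements (for a given σ) determines the fourth.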

Ref) Formulas for power calculation

Practice (MATLAB)

Ref) MATLAB sampsizepwr

sampsizepwr

sampsizepwr computes the sample size, power, or alternative parameter value for a hypothesis test, given the other two values. For example, you can compute the sample size required to obtain a particular power for a hypothesis test, given the parameter value of the alternative hypothesis.

Example 1

A company runs a manufacturing process that fills empty bottles with 100 mL of liquid. To monitor quality, the company randomly selects several bottles and measures the volume of liquid inside. Determine the sample size the company must use for a t-test to detect a difference between 100 mL and 102 mL with a power of 0.80, assuming a standard deviation of 5 mL (the second element of [100 5] in the call below).

nout = sampsizepwr('t', [100 5], 102, 0.80)

The company must test 52 bottles to detect the difference between a mean volume of 100 mL and 102 mL with a power of 0.80.
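
One way to sanity-check this result is to ask sampsizepwr for the power just below and at the reported sample size. This is a small sketch that reuses the same test settings; nCheck and pwrCheck are names introduced here for illustration.

```matlab
% Power of the two-sided t-test at n = 51 and n = 52 (same settings as above)
nCheck   = [51 52];
pwrCheck = sampsizepwr('t', [100 5], 102, [], nCheck);

% fprintf walks the 2-by-2 matrix column by column: n, power, n, power
fprintf('n = %d  ->  power = %.4f\n', [nCheck; pwrCheck]);
```

If 52 is the smallest adequate sample size, the power should sit below 0.80 at n = 51 and reach it at n = 52.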

Generate a power curve to visualize how the sample size affects the power of the test.

nn = 1:100;                                       % candidate sample sizes
pwrout = sampsizepwr('t', [100 5], 102, [], nn);  % power at each sample size

figure;
plot(nn, pwrout, 'b-', nout, 0.8, 'ro')
title('Power versus Sample Size')
xlabel('Sample Size')
ylabel('Power')

Example 2

An employee wants to buy a house near her office. She decides to eliminate from consideration any house that has a mean morning commute time greater than 20 minutes. The null hypothesis for this right-sided test is H0: μ = 20, and the alternative hypothesis is HA: μ > 20. The selected significance level is 0.05.

To determine the mean commute time, the employee takes a test drive from the house to her office during rush hour every morning for one week, so her total sample size is 5. She assumes that the standard deviation, σ, is equal to 5.

The employee decides that a true mean commute time of 25 minutes is too different from her targeted 20-minute limit, so she wants to detect a significant departure if the true mean is 25 minutes. Find the probability of incorrectly concluding that the mean commute time is no greater than 20 minutes.

Compute the power of the test, and then subtract the power from 1 to obtain β.

power = sampsizepwr('t',[20 5],25,[],5,'Alpha',0.05,'Tail','right')
beta = 1 - power
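
For readers who want to see what such a power value is made of, the following sketch recomputes it by hand under the usual assumptions for a right-tailed one-sample t-test: the test statistic follows a noncentral t distribution when the true mean is 25. The variable names are introduced here for illustration, and tinv and nctcdf require Statistics and Machine Learning Toolbox.

```matlab
% Settings taken from the commute-time example
mu0 = 20; sigma = 5; mu1 = 25; n = 5; alpha = 0.05;

df    = n - 1;                               % degrees of freedom
ncp   = (mu1 - mu0) / (sigma / sqrt(n));     % noncentrality parameter under HA
tcrit = tinv(1 - alpha, df);                 % critical value of the right-tailed test

pwrManual  = 1 - nctcdf(tcrit, df, ncp);     % P(T > tcrit | true mean = mu1)
betaManual = 1 - pwrManual;                  % type II error probability

% Same quantity from sampsizepwr, for comparison
pwrToolbox = sampsizepwr('t', [mu0 sigma], mu1, [], n, 'Alpha', alpha, 'Tail', 'right');

fprintf('manual power = %.4f, sampsizepwr power = %.4f, beta = %.4f\n', ...
        pwrManual, pwrToolbox, betaManual);
```

The two power values should agree up to numerical precision, which shows that β here is simply the noncentral t probability of staying below the critical value.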

The employee decides that this risk is too high, and she wants no more than a 0.01 probability of reaching an incorrect conclusion. Calculate the number of test drives the employee must take to obtain a power of 0.99.

nout = sampsizepwr('t',[20 5],25,0.99,[],'Tail','right')

The results indicate that she must take 18 test drives from a candidate house to achieve this power level.
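
To see how quickly the required number of test drives grows with the target power, the call above can be repeated for several targets. This is a sketch only: the list of targets is an arbitrary choice made here, and arrayfun is used so that each call to sampsizepwr receives a scalar power.

```matlab
% Hypothetical list of target power levels; the last one matches the example
targets = [0.80 0.90 0.95 0.99];

% Smallest sample size that reaches each target (right-tailed t-test, alpha = 0.05)
nNeeded = arrayfun(@(p) sampsizepwr('t', [20 5], 25, p, [], 'Tail', 'right'), targets);

% fprintf walks the 2-by-4 matrix column by column: target, n, target, n, ...
fprintf('power = %.2f  ->  n = %d\n', [targets; nNeeded]);
```

The last entry reproduces the case worked out above, and the earlier entries show how much smaller the sample can be when a lower power is acceptable.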

The employee decides that she only has time to take 10 test drives. She also accepts a 0.05 probability of making an incorrect conclusion. Calculate the smallest true parameter value that produces a detectable difference in mean commute time.

p1out = sampsizepwr('t',[20 5],[],0.95,10,'Tail','right')

Given the employee's target power level and sample size, her test detects a significant difference from a mean commute time of at least 25.6532 minutes.
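
A simple round-trip check is to feed the returned parameter value back into sampsizepwr and confirm that the power comes out at the requested level. The name pwrBack is introduced here for illustration.

```matlab
% Smallest detectable true mean for n = 10 and power = 0.95 (as in the example)
p1out = sampsizepwr('t', [20 5], [], 0.95, 10, 'Tail', 'right');

% Power of the same test when the true mean equals p1out
pwrBack = sampsizepwr('t', [20 5], p1out, [], 10, 'Tail', 'right');

fprintf('p1out = %.4f  ->  power = %.4f\n', p1out, pwrBack);
```

Because p1out is found numerically, the recovered power should match 0.95 only up to a small tolerance.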
