Power and Effect-size

# Power and Effect-size
## Experiment planning
### Matthew Crump
### 2018/07/20 (updated: 2019-03-20)

---

# Don't run an experiment that is designed to fail

---

# How? Power analysis, effect-size, and sample-size planning

---

# Overview

1. Z-score reminder
2. Effect-size
3. Power
4. Sample-size planning

---

# z-scores

---

# z-score review

Formula for z-score

`$z = \frac{\text{score} - \text{mean}}{\text{SD}}$`
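As a quick illustration of the formula, here is a minimal sketch in R. The scores, the mean of 100, and the SD of 15 are made-up values, assumed only for this example.

```r
# Hypothetical values, assumed for illustration only
scores  <- c(85, 100, 115, 130)
mean_x  <- 100   # assumed mean
sd_x    <- 15    # assumed standard deviation

# z = (score - mean) / SD
z <- (scores - mean_x) / sd_x
z
# → -1  0  1  2
```

A score of 130 is 2 standard deviations above the mean, so its z-score is 2.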

---

# what does z tell us?

A z-score tells us **how far away a score is from the mean, in standard deviation units**

![](6d_power_files/figure-html/unnamed-chunk-1-1.png)

---

# What is Z?

![](6d_power_files/figure-html/unnamed-chunk-2-1.png)

---

# What is Z?

![](6d_power_files/figure-html/unnamed-chunk-3-1.png)

---

# What is Z?

![](6d_power_files/figure-html/unnamed-chunk-4-1.png)

---

# review of z

- when z=1, the score is 1 sd from the mean
- when z=2, the score is 2 sd from the mean
- when z=3, the score is 3 sd from the mean

---

# Effect-size

---

# Effect-size

When we run an experiment, we are interested in whether the **manipulation** caused a difference in our **measurement**

If our **manipulation** causes a difference in our **measurement**, then there is an **effect**.

**Effect-size** refers to how big or small the effect is

---

# Measures of effect-size

There are many different measures of effect size. Consider the simplest measure for two groups, A and B.

**Mean difference**

The difference between the mean of A, and the mean of B, is a measure of the effect size.

- Large mean difference is a large effect
- Small mean difference is a small effect

---

# How big is big?

- What if the mean difference is 50, is that big or small?

- What if the mean difference is 1, is that big or small?

---

# Relative to what?

Mean differences can be interpreted if we know what the difference is relative to.

- mean A = 1000, mean B = 1050
  - difference=50
  - 5% increase, not so big
  
- mean A = 1, mean B = 2
  - difference=1
  - 100% increase, pretty big

---

# Cohen's D

Cohen's D expresses a mean difference between two samples in standard deviation units (like a z-score). This tells us how big the difference is relative to the variability in the measurement.

- D = .1 (mean difference is shifted by .1 SD)
- D = 1 (mean difference is shifted by 1 SD)
- D = 2 (mean difference is shifted by 2 SD)

---

# Cohen's D formula

The general idea is:

`$d = \frac{\text{MeanA}-\text{MeanB}}{SD}$`
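The formula can be sketched in R as follows. The slide leaves open which SD goes in the denominator; this sketch assumes the pooled standard deviation of two independent samples, which is one common choice. The function name `cohens_d` and the simulated data are hypothetical.

```r
# Cohen's d for two independent samples.
# Assumption: the SD in the formula is the pooled standard deviation.
cohens_d <- function(a, b) {
  n_a <- length(a)
  n_b <- length(b)
  pooled_sd <- sqrt(((n_a - 1) * var(a) + (n_b - 1) * var(b)) /
                      (n_a + n_b - 2))
  (mean(a) - mean(b)) / pooled_sd
}

# Simulated data with a true shift of 1 SD between A and B
set.seed(1)  # arbitrary seed, only so the example is reproducible
A <- rnorm(1000, mean = 1, sd = 1)
B <- rnorm(1000, mean = 0, sd = 1)
cohens_d(A, B)  # should land near 1, with some sampling error
```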

---

# M=0, SD=1, D=1

A = Black, B = Red, Cohen's D = 1

![](6d_power_files/figure-html/unnamed-chunk-5-1.png)

---

# M=100, SD=25, D=1

A = Black, B = Red, Cohen's D = 1

![](6d_power_files/figure-html/unnamed-chunk-6-1.png)

---

# No-difference

If there is no difference, how big is Cohen's D?

What do the distributions for A and B look like?

---

# No-difference

A and B come from the same distribution, no difference

![](6d_power_files/figure-html/unnamed-chunk-7-1.png)

---

# Interpreting Cohen's D

Cohen gives these recommendations:

- **Small**: d = .2  
- **Medium**: d =.5
- **Large**: d >= .8

Note that d's of 1 or more are really big: they shift the whole distribution by a whole standard deviation or more. That's a lot!

---

# D's in Psychology

Many effects in Psychology are **small**, with **d around .2**.

One reason is that we measure people, and people are highly variable.

---

# Power

---

# Power

**Power** is the probability of rejecting the null-hypothesis, **when there is a TRUE DIFFERENCE**

- **Power = .2**: you will reject the null-hypothesis 20% of the time (20/100 experiments)
- **Power = .8** (considered high power): you will reject the null-hypothesis 80% of the time (80/100 experiments)

---

# Power is a property of a design

Every design has its own **Power** to detect effects of different sizes.

The power of a design depends on:
  - sample-size (n)
  - Effect-size (d)
  - alpha-criterion

---

# General info about power

1. Increasing sample-size, increases power
2. Increasing effect-size, increases power
3. Raising alpha (making it easier to reject the null), increases power
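The direction of each effect can be checked numerically. This is a minimal sketch using `power.t.test()` from base R's stats package; the helper name `paired_power` and the particular values of n, d, and alpha are assumptions chosen for illustration. For a paired design, `sd` in `power.t.test()` is the SD of the difference scores, so d here is in those units.

```r
# Hypothetical helper: analytic power of a two-sided paired t-test.
# d is expressed in SD-of-differences units (sd = 1).
paired_power <- function(n, d, alpha = 0.05) {
  power.t.test(n = n, delta = d, sd = 1, sig.level = alpha,
               type = "paired")$power
}

baseline     <- paired_power(n = 10, d = 0.5)                # reference design
more_n       <- paired_power(n = 50, d = 0.5)                # larger sample
bigger_d     <- paired_power(n = 10, d = 1)                  # larger effect
strict_alpha <- paired_power(n = 10, d = 0.5, alpha = 0.01)  # harder to reject
loose_alpha  <- paired_power(n = 10, d = 0.5, alpha = 0.10)  # easier to reject

round(c(baseline = baseline, more_n = more_n, bigger_d = bigger_d,
        strict_alpha = strict_alpha, loose_alpha = loose_alpha), 3)
```

Compared with the baseline, power goes up with a larger n, a larger d, or a more lenient alpha, and goes down with a stricter alpha.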

---

# type I error

---

# Alternative Hypothesis (d > 0)

---

# B = type II error

---

# Power = 1-B

---

# Power and effect-size

---

# Paired-sample t-test (n=10)

- Get 1,000 t-values assuming null is true
- Get 1,000 t-values assuming alternative is true (d=1)

```r
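# Null: both conditions come from the same distribution (mean 0, SD 1)
t_null <- replicate(1000, t.test(rnorm(10, 0, 1),
                                 rnorm(10, 0, 1),
                                 paired = TRUE)$statistic)

# Alternative: condition means differ by 1 SD (d = 1)
t_alt <- replicate(1000, t.test(rnorm(10, 1, 1),
                                rnorm(10, 0, 1),
                                paired = TRUE)$statistic)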
```
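One way to turn simulated t-values into a power estimate is to count how often the alternative t-values pass the critical value. The slides do not show how power was computed, so this sketch is an assumption: it uses a one-tailed test at alpha = .05, and the seed is arbitrary. Because the two conditions are simulated independently, the difference scores have an SD of sqrt(2), which is what the analytic cross-check uses.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 10

# Simulate t-values under the alternative (means differ by 1 SD)
t_alt <- replicate(1000, t.test(rnorm(n, 1, 1),
                                rnorm(n, 0, 1),
                                paired = TRUE)$statistic)

# Assumption: one-tailed test, alpha = .05, df = n - 1
critical_t <- qt(0.95, df = n - 1)

# Power estimate = proportion of alternative t-values beyond the critical value
power_est <- mean(t_alt > critical_t)

# Analytic cross-check under the same assumptions
power_analytic <- power.t.test(n = n, delta = 1, sd = sqrt(2),
                               sig.level = 0.05, type = "paired",
                               alternative = "one.sided")$power

c(simulated = power_est, analytic = power_analytic)
```

With only 1,000 simulations, the estimate will bounce around a little from run to run.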

---

# look at both t-distributions

power = 0.665 to detect d=1

![](6d_power_files/figure-html/unnamed-chunk-14-1.png)

---

# Increase N from 10 to 50

- Get 1,000 t-values assuming null is true
- Get 1,000 t-values assuming alternative is true (d=1)

```r
t_null <- replicate(1000,t.test(rnorm(50,0,1),
                                rnorm(50,0,1),
                                paired=TRUE)$statistic)

t_alt <- replicate(1000,t.test(rnorm(50,1,1),
                                rnorm(50,0,1),
                                paired=TRUE)$statistic)
```

---

# n=50

power = 0.999 to detect d=1

![](6d_power_files/figure-html/unnamed-chunk-16-1.png)

---

# N=10, Increase d to 2

- Get 1,000 t-values assuming null is true
- Get 1,000 t-values assuming alternative is true (d=2)

```r
t_null <- replicate(1000,t.test(rnorm(10,0,1),
                                rnorm(10,0,1),
                                paired=TRUE)$statistic)

t_alt <- replicate(1000,t.test(rnorm(10,2,1),
                                rnorm(10,0,1),
                                paired=TRUE)$statistic)
```

---

# n=10, d=2

power = 0.993 to detect d=2

![](6d_power_files/figure-html/unnamed-chunk-18-1.png)

---

# Power curves

A specific design, e.g.,

- Paired samples t-test, with n = 10

has different levels of power to detect effects of different sizes. This can be shown on a power curve.
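A power curve like this can be sketched with `power.t.test()`. This is not necessarily how the plot in these slides was made; it assumes a two-sided paired t-test at alpha = .05, with d in SD-of-differences units, and the range of effect sizes is an arbitrary choice.

```r
# Effect sizes to evaluate (assumed range, for illustration)
effect_sizes <- seq(0.1, 2, by = 0.1)

# Analytic power of a paired t-test with n = 10 at each effect size
power_curve <- sapply(effect_sizes, function(d) {
  power.t.test(n = 10, delta = d, sd = 1, sig.level = 0.05,
               type = "paired")$power
})

plot(effect_sizes, power_curve, type = "l",
     xlab = "Effect size (d)", ylab = "Power",
     main = "Power curve, paired t-test, n = 10")
```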

---

# Power curve, t.test, n=10

![](6d_power_files/figure-html/unnamed-chunk-19-1.png)

---

# Sample-size planning

**How many subjects do you need for your experiment?**

1. Establish a minimum effect-size of interest
2. Conduct a power-analysis to show how power changes as a function of sample-size when detecting the minimum effect-size of interest

---

# Example

1. Minimum effect-size of interest, d = .2
2. Plot the power function
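These two steps can be sketched in R with `power.t.test()`. The design (two-sided paired t-test, alpha = .05), the 80% power target, and the range of sample sizes are assumptions for illustration; d = .2 is taken in SD-of-differences units.

```r
min_d <- 0.2  # minimum effect-size of interest

# Power at a range of sample sizes (assumed range)
sample_sizes <- seq(10, 400, by = 10)
power_by_n <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = min_d, sd = 1, sig.level = 0.05,
               type = "paired")$power
})

plot(sample_sizes, power_by_n, type = "l",
     xlab = "Sample size (n)", ylab = "Power",
     main = "Power to detect d = .2")
abline(h = 0.8, lty = 2)  # assumed target: 80% power

# Or solve directly for the n that reaches 80% power
needed <- power.t.test(delta = min_d, sd = 1, sig.level = 0.05,
                       power = 0.8, type = "paired")
ceiling(needed$n)  # round up to a whole number of subjects
```

Leaving out `n` and supplying `power` tells `power.t.test()` to solve for the sample size instead.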

---

# power as a function of n

![](6d_power_files/figure-html/unnamed-chunk-20-1.png)

---

# Don't run an experiment that is designed to fail

---

# How? Do a power analysis, beforehand

---

# Next class: Midterm Review

1. Quiz on t-tests, due next Monday
2. Midterm review next Monday
3. Midterm review sheet and info is posted on Blackboard