class: center, middle, inverse, title-slide

# t-tests
## one sample
### Matthew Crump
### 2018/07/20 (updated: 2018-10-09)

---
class: pink, center, middle, clear

# Student's t-test

---

# William Sealy Gosset

.pull-left[

- Creator of the t-test (1908)

- Worked for the Guinness brewery, published under a pseudonym (Student)

]

.pull-right[

<img src="figs/ttest/student.png" width="640" />

]

---
class: center

# The Guinness Problem

<img src="figs/ttest/guiness.png" width="220" />

---

# This Class

1. The t statistic
2. Experimental design and t-tests
3. One-sample t-test

---

# Common ratio in inferential statistics

Many inferential statistics have a common form

.center[
`\(\text{Inferential statistic}=\frac{\text{Measure of Effect}}{\text{Measure of Error}}\)`
]

Measure of effect = Some measure of the pattern in data

Measure of error = Some measure of random fluctuation in the data

---

# t-statistic (big idea)

(FYI, no one really knows what t stands for...)

.center[
`\(t = \frac{\text{Mean}}{\text{Standard Error of the Mean}}\)`
]

**Why would anyone bother dividing a mean by the SEM?**

---

# Confidence in mean

<table>
<thead>
<tr> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> SEM </th> <th style="text-align:right;"> t </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.1 </td> <td style="text-align:right;"> 50.0 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 10.0 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 5.0 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 1.0 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 10.0 </td> <td style="text-align:right;"> 0.5 </td> </tr>
</tbody>
</table>

---

# Two questions

1. What must be true if a t-value is less than 1?
2. What must be true if a t-value is greater than 1?

---

# a bit of R

```r
# t = mean divided by the standard error of the mean
my_t <- function(x){
  mean(x)/(sd(x)/sqrt(length(x)))
}
sample <- c(1,5,4,3,6,7)
my_t(sample)
```

```
## [1] 4.913538
```

```r
t.test(sample)$statistic
```

```
##        t 
## 4.913538
```

---

# The sampling distribution of t

1. Take a sample of size n from a normal population
2. Compute t
3. Repeat many times
4. Plot the distribution

---

# Simulating the t distribution

```r
ts <- c()
for(i in 1:1000){
  sample <- rnorm(10,0,1)            # n = 10 from normal(0,1)
  ts[i] <- t.test(sample)$statistic  # save the t for this sample
}
```

---

# Plotting the histogram

![](6a_ttest_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

# Formula for t-distribution

<img src="figs/ttest/tdist.png" width="1768" />

---

# t distributions

.pull-left[

- shaped like a normal
- **but**, more spread out
- depends on sample-size (df)
- blue is normal(0,1)
- red is t(df=1)
- green is t(df=2 and df=3)

]

.pull-right[

<img src="figs/ttest/tdist2.png" width="944" />

]

---

# ts and ps

.pull-left[

- t-distribution with 9 degrees of freedom
- one-directional test
- Only 5% of ts are larger than 1.8331129

]

.pull-right[

![](6a_ttest_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

]

---
class: center

# ts and ps the old way

<img src="figs/ttest/ttable.png" width="45%" />

---

# pt(): find p for a t

Use the `pt()` function to find the cumulative probability (p) of a t-value: the probability of getting a t anywhere from the smallest possible value up to that value of t.

- Must supply t-value, and df.
- For a t-distribution (df=9), what is the probability that a t-value will be 0 or smaller?

```r
pt(q=0,df=9)
```

```
## [1] 0.5
```

---

# pt(): more examples

- For a t-distribution (df=9), what is the probability that a t-value will be 1 or smaller?

```r
pt(q=1,df=9)
```

```
## [1] 0.8282818
```

- For a t-distribution (df=9), what is the probability that a t-value will be 1 or greater?
```r
1-pt(q=1,df=9)
```

```
## [1] 0.1717182
```

---

# qt(): find t for a p

Use the `qt()` function to find the t-value associated with a particular p-value.

- Must supply p-value, and df.
- For a t-distribution (df=9), what value of t has a cumulative probability of .5 (half of all ts are that value or smaller)?

```r
qt(p=0.5,df=9)
```

```
## [1] 0
```

---

# qt(): more examples

- For a t-distribution (df=9), what value of t or smaller occurs 95% of the time?

```r
qt(p=.95,df=9)
```

```
## [1] 1.833113
```

- For a t-distribution (df=9), what value of t or smaller occurs 5% of the time?

```r
qt(p=.05,df=9)
```

```
## [1] -1.833113
```

---

# comparing solutions

A simulated t-distribution gives p-values similar to the analytic answer (using `pt()`)

```r
# 10,000 simulated ts, each from a sample of n = 10 (df = 9)
all_ts<-replicate(10000,t.test(rnorm(10,0,1))$statistic)
# proportion of simulated ts that are 2 or smaller
length(all_ts[all_ts<=2])/10000
```

```
## [1] 0.9616
```

```r
pt(2,9)
```

```
## [1] 0.9617236
```

---
class: pink, center, middle, clear

# t-tests and designs

---

# Three kinds of t-tests

1. one-sample
2. paired-sample
3. independent-sample

---

# One-sample t-test

Purpose: Compare sample mean to a hypothetical population mean

---

# Paired-sample t-test

Purpose: Compare two sample means in a within-subjects design

Within-subjects design: Same subjects are measured across both levels of the experimental manipulation (independent variable)

---

# Independent-sample t-test

Purpose: Compare two sample means in a between-subjects design

Between-subjects design: Different subjects are measured across both levels of the experimental manipulation (independent variable)

---
class: pink, center, middle, clear

# One-sample t-test

---

# One-sample t-test

.pull-left[

Purpose: Compare sample mean to a hypothetical population mean

- `\(\bar{X}\)` = sample mean
- `\(u\)` = hypothetical population mean
- `\(s\)` = sample standard deviation (divide by n-1)
- `\(n\)` = sample-size

]

.pull-right[

`\(t = \frac{\bar{X}-u}{\text{SEM}}\)`

`\(t = \frac{\bar{X}-u}{\frac{s}{\sqrt{n}}}\)`

`\(s = \sqrt{\frac{\sum{(x_i-\bar{X})^2}}{n-1}}\)`

]

---

# An example

.pull-left[

Question: What population did this sample come from?

```r
mean(scores)
```

```
## [1] 0.704
```

```r
sd(scores)
```

```
## [1] 0.1681666
```

]

.pull-right[

```
 subjects   scores
---------  -------
        1     0.50
        2     0.56
        3     0.76
        4     0.80
        5     0.90
```

]

---

# Best guesses

Remember

1. The sample mean is our best estimate of the population mean
2. The sample standard deviation (dividing by n-1) is our best estimate of the population standard deviation

---

# One possibility

.pull-left[

.font70[Our sample statistics are consistent with the data coming from a normal distribution with the following mean and standard deviation]

```r
mean(scores)
```

```
## [1] 0.704
```

```r
sd(scores)
```

```
## [1] 0.1681666
```

]

.pull-right[

```
 subjects   scores
---------  -------
        1     0.50
        2     0.56
        3     0.76
        4     0.80
        5     0.90
```

]

---

# Testing other possibilities

The one-sample t-test allows us to test other possibilities. For example:

Could the data have come from a normal distribution with...

- mean = .25
- mean = .5
- mean = .75

---

# Conducting the t-test

Steps:

1. Compute the observed t-value `\(t_\text{observed}\)`
2. Set the alpha criterion (p < .05)
3. We will conduct a directional test
4. Find the probability that t could be `\(t_\text{observed}\)` or larger

---

# Computing t for one-sample test

Could the scores have come from a normal distribution with mean = .25?

`\(t = \frac{\bar{X}-u}{\frac{s}{\sqrt{n}}}\)`

```r
scores<-c(.5,.56,.76,.8,.9)
effect <- (mean(scores)-.25)   # sample mean minus hypothetical mean
error <- sd(scores)/sqrt(5)    # SEM, with n = 5
t <- effect/error
t
```

```
## [1] 6.036722
```

---

# Compute the associated p-value

Use `pt()`; df (degrees of freedom) is n-1.

```r
pt(t,df=4) # left side
```

```
## [1] 0.9981017
```

```r
1-pt(t,df=4) # right side
```

```
## [1] 0.001898315
```

---

# Looking at the evidence

- Our sample mean was 0.704
- Observed t was 6.0367217
- The associated p was 0.0018983

What does this mean?

If our sample came from a normal distribution with mean = .25 (using the sample sd, 0.1681666, as our estimate of the population sd), the probability of getting a t this large or larger (that is, a sample mean of 0.704 or greater) is 0.0018983.
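---

# Checking the p-value by simulation

A minimal sketch, not part of the original analysis: the one-directional p-value for the mean = .25 test can be checked by simulating the hypothesized population, in the same spirit as the simulated t distribution earlier. It assumes a normal population with mean = .25; the `set.seed()` value, the number of simulations, and the names `t_observed`, `sim_ts`, `p_simulated`, and `p_analytic` are illustrative choices.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

scores <- c(.5, .56, .76, .8, .9)
n <- length(scores)
t_observed <- (mean(scores) - .25) / (sd(scores) / sqrt(n))

# draw many samples of size n from the hypothesized population
# and compute the one-sample t for each of them
sim_ts <- replicate(100000, {
  x <- rnorm(n, mean = .25, sd = sd(scores))
  (mean(x) - .25) / (sd(x) / sqrt(n))
})

p_simulated <- mean(sim_ts >= t_observed)    # proportion of ts this large or larger
p_analytic  <- 1 - pt(t_observed, df = n - 1)

p_simulated
p_analytic
```

The two values should be close but not identical, because the simulated p depends on the random samples that were drawn.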
---

# Making a decision

Write up of results: We conducted a one-sample t-test comparing the sample mean (0.704) against a population mean of .25, t(4) = 6.04, p = 0.0019.

Our conclusion: We set an alpha criterion of p < .05. Because the observed p is smaller than .05, we reject the hypothesis that our sample came from a normal population with mean = .25 and sd = 0.17.

---

# t.test()

R has a t-test function that lets you do all three kinds of t-tests. Here is how you conduct a one-sample t-test using the function.

```r
scores<-c(.5,.56,.76,.8,.9)
t.test(scores, mu = .25, alternative="greater")
```

- alternative="greater" specifies a directional test: to find the probability of t or greater
- alternative="less" specifies a directional test: to find the probability of t or less

---

# t.test() output

```r
t.test(scores, mu=.25, alternative="greater")
```

```
## 
## 	One Sample t-test
## 
## data:  scores
## t = 6.0367, df = 4, p-value = 0.001898
## alternative hypothesis: true mean is greater than 0.25
## 95 percent confidence interval:
##  0.5436715       Inf
## sample estimates:
## mean of x 
##     0.704
```

---

# testing u = .5

```r
t.test(scores, mu=.5, alternative="greater")
```

```
## 
## 	One Sample t-test
## 
## data:  scores
## t = 2.7125, df = 4, p-value = 0.0267
## alternative hypothesis: true mean is greater than 0.5
## 95 percent confidence interval:
##  0.5436715       Inf
## sample estimates:
## mean of x 
##     0.704
```

---

# testing u = .75

```r
t.test(scores, mu=.75, alternative="greater")
```

```
## 
## 	One Sample t-test
## 
## data:  scores
## t = -0.61165, df = 4, p-value = 0.7131
## alternative hypothesis: true mean is greater than 0.75
## 95 percent confidence interval:
##  0.5436715       Inf
## sample estimates:
## mean of x 
##     0.704
```

---

# Extracting values

The `t.test()` function generates a bunch of output; sometimes you might want to extract just the t-value and the p-value.
```r
# save the test result, then pull out individual pieces by name
x <- t.test(scores, mu=.75, alternative="greater")
x$statistic
```

```
##          t 
## -0.6116502
```

```r
x$p.value
```

```
## [1] 0.7130873
```

---
class: pink, center, middle, clear

# Thinking ahead to the paired-samples t-test

---

# Consider this

Within-subjects experiment, n=5; all subjects are measured in levels A and B of the experiment.

<table>
<thead>
<tr> <th style="text-align:right;"> subjects </th> <th style="text-align:right;"> level_A </th> <th style="text-align:right;"> level_B </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 7 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 10 </td> </tr>
</tbody>
</table>

---

# Empirical question

Did the manipulation (A vs. B) cause a difference in the measure?
<table>
<thead>
<tr> <th style="text-align:right;"> subjects </th> <th style="text-align:right;"> level_A </th> <th style="text-align:right;"> level_B </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 7 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 10 </td> </tr>
</tbody>
</table>

---

# Difference scores

How could a one-sample t-test be used to analyze the difference scores?

<table>
<thead>
<tr> <th style="text-align:right;"> subjects </th> <th style="text-align:right;"> level_A </th> <th style="text-align:right;"> level_B </th> <th style="text-align:right;"> differences </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 3 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 3 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 5 </td> </tr>
</tbody>
</table>

---

# Next class: Paired-Sample t-test

1. Thursday, October 11th: paired-sample t-tests

---

# Reminder

1. Quiz 5 is due today, Tuesday, October 9th, by end of day (11:59pm).
2. Quiz for this week will be posted tonight or tomorrow.
3. No quiz next week (midterm review)

---
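# Difference scores in R (a sketch)

A minimal sketch of one possible answer to the question on the "Difference scores" slide, using the values from that table: treat the differences as a single sample and test them against a hypothetical mean of 0. The variable names are illustrative, and the two-sided default of `t.test()` is an assumption, since no direction was specified for this example.

```r
level_A <- c(1, 4, 3, 6, 5)
level_B <- c(4, 8, 7, 9, 10)
differences <- level_B - level_A   # 3 4 4 3 5, as in the table

# one-sample t-test: could the differences have come from
# a population with mean 0 (no effect of the manipulation)?
one_sample <- t.test(differences, mu = 0)
one_sample$statistic
one_sample$p.value

# the paired test on the raw columns gives the same t
paired <- t.test(level_B, level_A, paired = TRUE)
paired$statistic
```

Both calls divide the mean difference by the SEM of the differences, which is why the two t-values agree.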