class: center, middle, inverse, title-slide

# t-tests
## Paired sample
### Matthew Crump
### 2018/07/20 (updated: 2019-03-18)

---
class: pink, center, middle, clear

# t-tests and designs

---

# Three kinds of t-tests

1. one-sample
2. paired-sample
3. Independent sample

---

# One-sample t-test

Purpose: Compare sample mean to a hypothetical population mean

---

# Paired-sample t-test

Purpose: Compare two sample means in a within-subjects design

Within-subjects design: Same subjects are measured across both levels of the experimental manipulation (independent variable)

---

# Consider this

Within-subjects experiment, n=5, all subjects are measured in level A and B of the experiment.

<table>
<thead>
<tr> <th style="text-align:right;"> subjects </th> <th style="text-align:right;"> level_A </th> <th style="text-align:right;"> level_B </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 7 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 10 </td> </tr>
</tbody>
</table>

---

# Empirical question

Did the manipulation (A vs. B) cause a difference in the measure?

<table>
<thead>
<tr> <th style="text-align:right;"> subjects </th> <th style="text-align:right;"> level_A </th> <th style="text-align:right;"> level_B </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 7 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 10 </td> </tr>
</tbody>
</table>

---

# Difference scores

How could a t-test be used to analyze the difference scores?

<table>
<thead>
<tr> <th style="text-align:right;"> subjects </th> <th style="text-align:right;"> level_A </th> <th style="text-align:right;"> level_B </th> <th style="text-align:right;"> differences </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 3 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 3 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 5 </td> </tr>
</tbody>
</table>

---

# Paired samples t-test

.pull-left[

Purpose: Compare two means from paired samples

- `\(\bar{X}_D\)` = mean of difference scores
- `\(\mu_0\)` = hypothetical population mean of 0
- `\(s_D\)` = sample standard deviation of difference scores (divide by n-1)
- `\(n\)` = sample-size

]

.pull-right[

`\(t = \frac{\bar{X}_D-\mu_0}{\text{SEM}_D}\)`

`\(t = \frac{\bar{X}_D-\mu_0}{\frac{s_D}{\sqrt{n}}}\)`

`\(s = \sqrt{\frac{\sum{(x_i-\bar{X})^2}}{n-1}}\)`

]

---

# In other words

.pull-left[

A paired samples t-test is a one-sample t-test applied to the difference scores

We are testing the null-hypothesis that the differences have a mean of 0 (`\(\mu_0 = 0\)`).

]

.pull-right[

Observed `\(t\)` for paired samples test:

`\(t = \frac{\text{Mean of Difference scores}}{\text{SEM of Difference scores}}\)`

]

---

# Calculating Difference scores

Assume 5 subjects participated in both conditions (A and B) of an experiment.

```r
A <-c(1,4,3,6,5)
B <-c(4,8,7,9,10)
difference <- B-A
print(difference)
```

```
## [1] 3 4 4 3 5
```

---

# Calculating Mean and SEM

```r
A <-c(1,4,3,6,5)
B <-c(4,8,7,9,10)
difference <- B-A
# Calculate Mean
mean(difference)
```

```
## [1] 3.8
```

```r
# Calculate SEM
sd(difference)/sqrt(length(difference))
```

```
## [1] 0.3741657
```

---

# Calculate t (paired samples)

```r
A <-c(1,4,3,6,5)
B <-c(4,8,7,9,10)
difference <- B-A
mean_D <- mean(difference)
SEM_D <- sd(difference)/sqrt(length(difference))
# calculate t
mean_D/SEM_D
```

```
## [1] 10.15593
```

---

# using the t.test() function

There are two ways to use the t.test() function to calculate t for paired samples

1. Treat the data as difference scores

```r
A <-c(1,4,3,6,5)
B <-c(4,8,7,9,10)
difference <- B-A
t.test(difference)$statistic
```

```
##        t
## 10.15593
```

---

# using the t.test() function

2.
Use both variables for each sample, and set `paired=TRUE`

```r
A <-c(1,4,3,6,5)
B <-c(4,8,7,9,10)
t.test(A,B, paired=TRUE)$statistic
```

```
##         t
## -10.15593
```

Note: t is negative here because t.test() computes the differences as the first variable minus the second variable (A - B).

---
class: pink, center, middle, clear

# Hypothesis testing

---

# Where did t come from?

1. We can compute t for a paired sample

Next Steps for hypothesis testing:

Big Question: Could our observed t be produced by chance alone?

- How can we figure this out?

---

# Null distribution of t

Answer: We need to find out what kind of ts can be produced by chance alone

- we need to find the null distribution of t

---

# What is the null distribution of t?

Null distribution of t:

- the distribution of t values that would occur by chance alone if the experimental manipulation caused no difference in the sample means

---

# Hypothetical possibilities

.pull-left[

The null

- both samples come from the same distribution

![](6b_paired_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

]

.pull-right[

The "alternative"

- each sample comes from its own distribution

![](6b_paired_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

]

---

# Considering the null

.pull-left[

The null

- both samples come from the same distribution

![](6b_paired_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

]

.pull-right[

Question:

.font80[

- if we sampled two sets of scores from the **same distribution**, what would we expect for the sampling distribution of the mean difference scores?
- how about the sampling distribution of t?

]

]

---

# Observed vs.
critical t

**Observed t**:

- the t-value that you calculate from your data

**Critical t**:

- a t-value associated with the null-distribution
- depends on alpha, and df
- e.g., if alpha = .05, then t-values larger than critical t occur only 5% of the time by chance alone

---

# simulating t

The following slide lets you explore simulated null and alternative distributions for a paired sample t-test

1. pick n (sample-size)
2. choose mean of normal distribution for each sample
3. choose sd of normal distribution
4. choose number of simulations

---
class: center, middle, clear

<iframe style="width:100%;height:100%;border-style:none;" src="https://crumplab.shinyapps.io/pairedTtest/"></iframe>

---

# Questions to ask

1. When each sample comes from the same distribution, what is the average mean difference between the samples?

--

2. What happens to the range of the sampling distribution of mean differences as sample-size increases?

--

3. What happens to the range of the sampling distribution of mean differences as the standard deviation of the population increases?

---

# More questions

1. If the two samples are taken from the same distribution, what percent of the time will the observed t-value be greater than critical t?
2. What happens to observed t when the samples are taken from distributions with different means?
3. What are some ways (e.g., change sample-size, sd, mean difference) to ensure that observed t will generally be larger than critical t?

---

# Critical t

Critical t is set by two properties:

1. the alpha criterion
2. whether the test is directional (one-tailed) or non-directional (two-tailed)

---

# Directional test (reminder)

A directional test assumes that the experimental manipulation will cause a difference in a particular direction.

- mean for A > (greater than) mean for B
- mean for A < (less than) mean for B

---

# Critical t (one-tailed) example

Critical t for a directional (one-tailed) test

- alpha = 0.05, or 5%

Critical t is the t-value associated with a null-distribution where this t-value or larger occurs 5% of the time.

---

# Critical t (one-tailed)

<img src="figs/ttest/6critT-1.png" width="1792" />

---

# Critical t depends on df and alpha

.pull-left[

The table shows values of critical t for a one-tailed test

- alpha values of .10, .05, and .01
- degrees of freedom from 5 to 100

]

.pull-right[

<table>
<thead>
<tr> <th style="text-align:right;"> df </th> <th style="text-align:right;"> p_10 </th> <th style="text-align:right;"> p_05 </th> <th style="text-align:right;"> p_01 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 1.48 </td> <td style="text-align:right;"> 2.02 </td> <td style="text-align:right;"> 3.36 </td> </tr>
<tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 1.44 </td> <td style="text-align:right;"> 1.94 </td> <td style="text-align:right;"> 3.14 </td> </tr>
<tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 1.41 </td> <td style="text-align:right;"> 1.89 </td> <td style="text-align:right;"> 3.00 </td> </tr>
<tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 1.40 </td> <td style="text-align:right;"> 1.86 </td> <td style="text-align:right;"> 2.90 </td> </tr>
<tr> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 1.38 </td> <td style="text-align:right;"> 1.83 </td> <td style="text-align:right;"> 2.82 </td> </tr>
<tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 1.37 </td> <td style="text-align:right;"> 1.81 </td> <td style="text-align:right;"> 2.76 </td> </tr>
<tr> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 1.33 </td> <td style="text-align:right;"> 1.72 </td> <td
style="text-align:right;"> 2.53 </td> </tr>
<tr> <td style="text-align:right;"> 50 </td> <td style="text-align:right;"> 1.30 </td> <td style="text-align:right;"> 1.68 </td> <td style="text-align:right;"> 2.40 </td> </tr>
<tr> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> 1.29 </td> <td style="text-align:right;"> 1.66 </td> <td style="text-align:right;"> 2.36 </td> </tr>
</tbody>
</table>

]

---

# Non-Directional test

A non-directional test assumes that the experimental manipulation will cause **any** difference.

- mean for A != (will not equal) mean for B

E.g.,

- Mean for A could be bigger or smaller than mean for B

---

# Non-directional test (2-tailed)

<img src="figs/ttest/6twotailedt-1.png" width="1792" />

---

# Comparing critical t (1 vs 2 tailed)

.pull-left[

One-tailed

<table>
<thead>
<tr> <th style="text-align:right;"> df </th> <th style="text-align:right;"> p_10 </th> <th style="text-align:right;"> p_05 </th> <th style="text-align:right;"> p_01 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 1.48 </td> <td style="text-align:right;"> 2.02 </td> <td style="text-align:right;"> 3.36 </td> </tr>
<tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 1.37 </td> <td style="text-align:right;"> 1.81 </td> <td style="text-align:right;"> 2.76 </td> </tr>
<tr> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 1.33 </td> <td style="text-align:right;"> 1.72 </td> <td style="text-align:right;"> 2.53 </td> </tr>
<tr> <td style="text-align:right;"> 50 </td> <td style="text-align:right;"> 1.30 </td> <td style="text-align:right;"> 1.68 </td> <td style="text-align:right;"> 2.40 </td> </tr>
<tr> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> 1.29 </td> <td style="text-align:right;"> 1.66 </td> <td style="text-align:right;"> 2.36 </td> </tr>
</tbody>
</table>

]

.pull-right[

Two-tailed

<table>
<thead>
<tr> <th style="text-align:right;">
df </th> <th style="text-align:right;"> p_10 </th> <th style="text-align:right;"> p_05 </th> <th style="text-align:right;"> p_01 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 2.02 </td> <td style="text-align:right;"> 2.57 </td> <td style="text-align:right;"> 4.03 </td> </tr>
<tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 1.81 </td> <td style="text-align:right;"> 2.23 </td> <td style="text-align:right;"> 3.17 </td> </tr>
<tr> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 1.72 </td> <td style="text-align:right;"> 2.09 </td> <td style="text-align:right;"> 2.85 </td> </tr>
<tr> <td style="text-align:right;"> 50 </td> <td style="text-align:right;"> 1.68 </td> <td style="text-align:right;"> 2.01 </td> <td style="text-align:right;"> 2.68 </td> </tr>
<tr> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> 1.66 </td> <td style="text-align:right;"> 1.98 </td> <td style="text-align:right;"> 2.63 </td> </tr>
</tbody>
</table>

]

---

# Making decisions

.pull-left[

One-tailed

- reject null
  - observed t in green area
- fail to reject null
  - observed t in white area

]

.pull-right[

<img src="figs/ttest/6critT-1.png" width="100%" />

]

---

# Making decisions

.pull-left[

Two-tailed

- reject null
  - observed t in green area
- fail to reject null
  - observed t in white area

]

.pull-right[

<img src="figs/ttest/6twotailedt-1.png" width="100%" />

]

---
class: pink, center, middle, clear

# Example from lab

---

# Mehr, Song, and Spelke (2016)

<img src="figs/ttest/song.png" width="100%" />

---

# Research question

Do infants use melodies as a cue about social interaction?

If an infant heard and watched an unfamiliar adult singing a familiar melody, would they pay more attention to that person (by looking at them)?

---

# Study design

<img src="figs/ttest/design.png" width="90%" />

---

# Study predictions

<img src="figs/ttest/predictions.png" width="90%" />

---

# Data from first 5 infants

<img src="figs/ttest/first5.png" width="1725" />

---

# Means

<img src="figs/ttest/first5b.png" width="1731" />

---

# observed t

<img src="figs/ttest/first5c.png" width="2139" />

---

# r code

```r
baseline <- c(.44,.41,.75,.44,.47)
test <- c(.6,.68,.72,.28,.5)
```

---

# r results (two-tailed)

```r
t.test(test,baseline,paired=TRUE)
```

```
##
##  Paired t-test
##
## data:  test and baseline
## t = 0.72381, df = 4, p-value = 0.5092
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1531384  0.2611384
## sample estimates:
## mean of the differences
##                   0.054
```

---

# Interpretation (two-tailed)

The results from the two-tailed test were:

- t(4) = .724, p = .5092

Interpretation:

- p is the probability that the null-distribution produces a t with an **absolute value** of .724 or larger
- 50.92% of t-values from the null-distribution (assuming no difference) are larger than .724 or smaller than -.724.

---

# r results (one-tailed)

```r
t.test(test,baseline,paired=TRUE, alternative="greater")
```

```
##
##  Paired t-test
##
## data:  test and baseline
## t = 0.72381, df = 4, p-value = 0.2546
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.1050478        Inf
## sample estimates:
## mean of the differences
##                   0.054
```

---

# Interpretation (one-tailed)

The results from the one-tailed test were:

- t(4) = .724, p = .2546

Interpretation:

- p is the probability that the null-distribution produces a value of t=.724 or larger
- 25.46% of t-values from the null-distribution (assuming no difference) are larger than .724.

---

# Increasing N

We only looked at data from the first 5 infants...

We found that the observed t-value could easily have been produced by chance, and we did not reject the null hypothesis (p-values were not less than .05).

Let's see what happens when we use all of the data

---

# results using all of the data

<img src="figs/ttest/allinfants.png" width="1699" />

---

# Next class: Independent sample t-test

1. Tuesday, October 16th: Independent samples t-tests

---

# Reminder

1. Quiz for this week will be posted tonight (Thursday, Oct. 11), due next Thursday, Oct. 18, end of day.
2. Midterm review sheet will be posted before next class. I will announce on Blackboard.
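
---

# Appendix: checking t, critical t, and p in R

A minimal sketch, not part of the original lab materials, of how the numbers on the lab-example slides could be checked by hand. It uses base R only: `qt()` gives quantiles of the t distribution (critical t) and `pt()` gives tail probabilities (p-values). The simulation at the end rests on assumptions that are not in the slides: difference scores are drawn from a normal distribution with mean 0 and sd 1, and the seed and number of simulations are arbitrary. All variable names other than `baseline` and `test` are hypothetical.

```r
# Data from the "r code" slide (first 5 infants)
baseline <- c(.44, .41, .75, .44, .47)
test     <- c(.6, .68, .72, .28, .5)

# Paired design: work with the difference scores
difference <- test - baseline
n  <- length(difference)
df <- n - 1

# Observed t = mean of difference scores / SEM of difference scores
t_obs <- mean(difference) / (sd(difference) / sqrt(n))

# Critical t at alpha = .05, read from the t distribution with df = n - 1
alpha <- 0.05
crit_one_tailed <- qt(1 - alpha, df)      # all of alpha in the upper tail
crit_two_tailed <- qt(1 - alpha / 2, df)  # alpha split across both tails

# p-values: area of the null distribution at or beyond the observed t
p_one_tailed <- pt(t_obs, df, lower.tail = FALSE)
p_two_tailed <- 2 * pt(abs(t_obs), df, lower.tail = FALSE)

# Simulated null distribution of t
# (assumption: differences ~ Normal(mean = 0, sd = 1); seed is arbitrary)
set.seed(1)
sim_t <- replicate(10000, {
  d <- rnorm(n, mean = 0, sd = 1)
  mean(d) / (sd(d) / sqrt(n))
})

# Proportion of simulated null t-values beyond the one-tailed critical t
prop_beyond_crit <- mean(sim_t > crit_one_tailed)
```

If the sketch is right, `t_obs`, `p_two_tailed`, and `p_one_tailed` should agree with the `t.test()` output on the earlier slides, and `prop_beyond_crit` should land close to alpha.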