class: center, middle, inverse, title-slide # Descriptive Statistics ## What to do with lots of numbers ### Matthew Crump ### 2018/07/20 (updated: 2019-02-04) --- class: pink, center, middle, clear # What do lots of numbers look like? --- # Lots of numbers look like this <div class=rtable> <table> <tbody> <tr> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> -23 </td> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> 91 </td> <td style="text-align:right;"> -42 </td> <td style="text-align:right;"> 34 </td> <td style="text-align:right;"> -59 </td> <td style="text-align:right;"> -50 </td> <td style="text-align:right;"> -80 </td> <td style="text-align:right;"> -71 </td> <td style="text-align:right;"> 35 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> 49 </td> <td style="text-align:right;"> -14 </td> <td style="text-align:right;"> -91 </td> <td style="text-align:right;"> 48 </td> <td style="text-align:right;"> 24 </td> <td style="text-align:right;"> -49 </td> </tr> <tr> <td style="text-align:right;"> -53 </td> <td style="text-align:right;"> -1 </td> <td style="text-align:right;"> 25 </td> <td style="text-align:right;"> -33 </td> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> -48 </td> <td style="text-align:right;"> -49 </td> <td style="text-align:right;"> -25 </td> <td style="text-align:right;"> -75 </td> <td style="text-align:right;"> 81 </td> <td style="text-align:right;"> -69 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:right;"> 79 </td> <td style="text-align:right;"> -85 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:right;"> -69 </td> <td style="text-align:right;"> 98 </td> <td style="text-align:right;"> -11 </td> <td style="text-align:right;"> 89 </td> <td style="text-align:right;"> -24 </td> </tr> <tr> <td style="text-align:right;"> -60 
</td> <td style="text-align:right;"> -95 </td> <td style="text-align:right;"> -18 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> -55 </td> <td style="text-align:right;"> -14 </td> <td style="text-align:right;"> -51 </td> <td style="text-align:right;"> 49 </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> -71 </td> <td style="text-align:right;"> 91 </td> <td style="text-align:right;"> 77 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> -81 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 86 </td> <td style="text-align:right;"> -32 </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> 92 </td> <td style="text-align:right;"> -42 </td> <td style="text-align:right;"> -89 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> -22 </td> <td style="text-align:right;"> 85 </td> <td style="text-align:right;"> -57 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 54 </td> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> 76 </td> <td style="text-align:right;"> -11 </td> <td style="text-align:right;"> 83 </td> <td style="text-align:right;"> -60 </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 93 </td> <td style="text-align:right;"> -53 </td> </tr> <tr> <td style="text-align:right;"> 43 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> -68 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> -68 </td> <td style="text-align:right;"> 51 </td> <td style="text-align:right;"> 85 </td> <td style="text-align:right;"> 
-58 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> 38 </td> <td style="text-align:right;"> -34 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:right;"> -52 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> -10 </td> <td style="text-align:right;"> -34 </td> <td style="text-align:right;"> -42 </td> </tr> <tr> <td style="text-align:right;"> -29 </td> <td style="text-align:right;"> -8 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> -20 </td> <td style="text-align:right;"> 99 </td> <td style="text-align:right;"> 24 </td> <td style="text-align:right;"> -30 </td> <td style="text-align:right;"> -1 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 46 </td> <td style="text-align:right;"> -11 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 67 </td> <td style="text-align:right;"> -17 </td> <td style="text-align:right;"> -48 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> -62 </td> <td style="text-align:right;"> -86 </td> </tr> <tr> <td style="text-align:right;"> -34 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> -36 </td> <td style="text-align:right;"> -24 </td> <td style="text-align:right;"> -28 </td> <td style="text-align:right;"> -9 </td> <td style="text-align:right;"> -13 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> -3 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> -63 </td> <td style="text-align:right;"> -28 </td> <td style="text-align:right;"> -18 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 92 </td> <td style="text-align:right;"> 28 </td> <td 
style="text-align:right;"> -94 </td> <td style="text-align:right;"> -25 </td> </tr> <tr> <td style="text-align:right;"> -96 </td> <td style="text-align:right;"> -10 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> 26 </td> <td style="text-align:right;"> 93 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> -90 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:right;"> -19 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> -27 </td> <td style="text-align:right;"> -67 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -19 </td> <td style="text-align:right;"> -46 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 48 </td> </tr> <tr> <td style="text-align:right;"> -10 </td> <td style="text-align:right;"> -89 </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> -31 </td> <td style="text-align:right;"> -45 </td> <td style="text-align:right;"> 98 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> -48 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 98 </td> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> -32 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> -2 </td> <td style="text-align:right;"> -99 </td> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> -80 </td> </tr> <tr> <td style="text-align:right;"> -63 </td> <td style="text-align:right;"> -52 </td> <td style="text-align:right;"> 54 </td> <td style="text-align:right;"> -55 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 
57 </td> <td style="text-align:right;"> -49 </td> <td style="text-align:right;"> 92 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> -54 </td> <td style="text-align:right;"> -95 </td> <td style="text-align:right;"> -73 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> -71 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> -1 </td> <td style="text-align:right;"> 8 </td> </tr> </tbody> </table> </div> --- # What can we say about them? We can see they aren't all the same. Not much else really. Looking at a bunch of numbers is hard work. <div class=rtable> <table> <tbody> <tr> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> -23 </td> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> 91 </td> <td style="text-align:right;"> -42 </td> <td style="text-align:right;"> 34 </td> <td style="text-align:right;"> -59 </td> <td style="text-align:right;"> -50 </td> <td style="text-align:right;"> -80 </td> <td style="text-align:right;"> -71 </td> <td style="text-align:right;"> 35 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> 49 </td> <td style="text-align:right;"> -14 </td> <td style="text-align:right;"> -91 </td> <td style="text-align:right;"> 48 </td> <td style="text-align:right;"> 24 </td> <td style="text-align:right;"> -49 </td> </tr> <tr> <td style="text-align:right;"> -53 </td> <td style="text-align:right;"> -1 </td> <td style="text-align:right;"> 25 </td> <td style="text-align:right;"> -33 </td> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> -48 </td> <td style="text-align:right;"> -49 </td> <td style="text-align:right;"> -25 </td> <td style="text-align:right;"> -75 </td> <td style="text-align:right;"> 81 </td> <td style="text-align:right;"> 
-69 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:right;"> 79 </td> <td style="text-align:right;"> -85 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:right;"> -69 </td> <td style="text-align:right;"> 98 </td> <td style="text-align:right;"> -11 </td> <td style="text-align:right;"> 89 </td> <td style="text-align:right;"> -24 </td> </tr> <tr> <td style="text-align:right;"> -60 </td> <td style="text-align:right;"> -95 </td> <td style="text-align:right;"> -18 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> -55 </td> <td style="text-align:right;"> -14 </td> <td style="text-align:right;"> -51 </td> <td style="text-align:right;"> 49 </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> -71 </td> <td style="text-align:right;"> 91 </td> <td style="text-align:right;"> 77 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> -81 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 86 </td> <td style="text-align:right;"> -32 </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> 92 </td> <td style="text-align:right;"> -42 </td> <td style="text-align:right;"> -89 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> -22 </td> <td style="text-align:right;"> 85 </td> <td style="text-align:right;"> -57 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 54 </td> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> 76 </td> <td style="text-align:right;"> -11 </td> <td style="text-align:right;"> 83 </td> <td style="text-align:right;"> -60 </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 93 </td> <td 
style="text-align:right;"> -53 </td> </tr> <tr> <td style="text-align:right;"> 43 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> -68 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> -68 </td> <td style="text-align:right;"> 51 </td> <td style="text-align:right;"> 85 </td> <td style="text-align:right;"> -58 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> 38 </td> <td style="text-align:right;"> -34 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:right;"> -52 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> -10 </td> <td style="text-align:right;"> -34 </td> <td style="text-align:right;"> -42 </td> </tr> <tr> <td style="text-align:right;"> -29 </td> <td style="text-align:right;"> -8 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> -20 </td> <td style="text-align:right;"> 99 </td> <td style="text-align:right;"> 24 </td> <td style="text-align:right;"> -30 </td> <td style="text-align:right;"> -1 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 46 </td> <td style="text-align:right;"> -11 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 67 </td> <td style="text-align:right;"> -17 </td> <td style="text-align:right;"> -48 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> -62 </td> <td style="text-align:right;"> -86 </td> </tr> <tr> <td style="text-align:right;"> -34 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> -36 </td> <td style="text-align:right;"> -24 </td> <td style="text-align:right;"> -28 </td> <td style="text-align:right;"> -9 </td> <td 
style="text-align:right;"> -13 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> -3 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> -63 </td> <td style="text-align:right;"> -28 </td> <td style="text-align:right;"> -18 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 92 </td> <td style="text-align:right;"> 28 </td> <td style="text-align:right;"> -94 </td> <td style="text-align:right;"> -25 </td> </tr> <tr> <td style="text-align:right;"> -96 </td> <td style="text-align:right;"> -10 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> 26 </td> <td style="text-align:right;"> 93 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> -90 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:right;"> -19 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> -27 </td> <td style="text-align:right;"> -67 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -19 </td> <td style="text-align:right;"> -46 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 48 </td> </tr> <tr> <td style="text-align:right;"> -10 </td> <td style="text-align:right;"> -89 </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> -31 </td> <td style="text-align:right;"> -45 </td> <td style="text-align:right;"> 98 </td> <td style="text-align:right;"> -56 </td> <td style="text-align:right;"> -48 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 98 </td> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> -32 </td> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> -2 </td> <td style="text-align:right;"> -99 </td> 
<td style="text-align:right;"> 31 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> -80 </td> </tr> <tr> <td style="text-align:right;"> -63 </td> <td style="text-align:right;"> -52 </td> <td style="text-align:right;"> 54 </td> <td style="text-align:right;"> -55 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> -49 </td> <td style="text-align:right;"> 92 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> -54 </td> <td style="text-align:right;"> -95 </td> <td style="text-align:right;"> -73 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> -71 </td> <td style="text-align:right;"> -61 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> -1 </td> <td style="text-align:right;"> 8 </td> </tr> </tbody> </table> </div> --- # Summary numbers It would be nice to reduce the big set of numbers down to a few numbers that we can look at and make sense of. **Sameness (Central Tendency)** - What are all the numbers close to? **Differentness (Variance)** - How different are the numbers? --- # Descriptive Statistics - Give us summaries of big sets of numbers - Useful single numbers to look at - They tell us about patterns of sameness and differentness --- class: pink, center, middle, clear # Graph the numbers to get a better look --- # Dot plot (unordered) Graphing the numbers gives a quick and dirty sense of what they are like. 
Here are 200 numbers presented as dots <img src="2-Descriptives_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- # Dot plot (ordered) Sorting the numbers from smallest to largest <img src="2-Descriptives_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- # Histograms Histograms count how many numbers fall inside specific ranges (bins) <img src="2-Descriptives_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- # Histograms Bars show you which bins have more or fewer numbers in them <img src="2-Descriptives_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> --- # So what are these numbers like? What single number would you say best describes most of these numbers? <img src="2-Descriptives_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> --- # Question Is the red or blue value a better summary of all the numbers? <img src="2-Descriptives_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> --- class: pink, center, middle, clear # Measures of Central Tendency --- # Central Tendency 1. **Central tendency** should describe what most of the data is like -- 2. We want our summary number to be most like the other numbers. We want it to be a **representative value** -- 3. There are **multiple measures** of central tendency -- 4. They have **different properties** -- 5. Some work better than others depending on the data --- class: pink, center, middle, clear # Mode --- # Mode The mode is the most frequently occurring number > 1 1 2 2 3 4 5 6 7 7 7 7 7 - The mode is 7 because 7 occurs most often - Find the mode by counting the occurrences of each number; the mode is the number that occurs most often - If there is a tie, there are two or more modes (one for each number that ties) --- # Finding the Mode in R We make 25 numbers. How do we get R to find the mode? 
```r #make some numbers a <- round(rnorm(n=25, mean=24, sd=5)) ``` Here is one random draw of the 25 numbers in `a` (`rnorm()` gives different numbers each time it runs): 22 22 30 30 27 23 27 15 20 19 26 20 17 21 28 28 21 33 18 26 28 21 25 24 21 --- # Finding the Mode in R The `table()` function counts how many times each number occurs ```r table(a) ``` ``` ## a ## 15 17 18 19 20 21 22 23 24 25 26 27 28 30 33 ## 1 1 1 1 2 4 2 1 1 1 2 2 3 2 1 ``` We can see that 21 occurs the most --- # Custom function for the mode in R You can always write your own function for the mode. This one is called `my_mode`. If several numbers tie, it returns only the one that appears first in `x`. ```r my_mode <- function(x) { ux <- unique(x) ux[which.max(tabulate(match(x, ux)))] } my_mode(a) ``` ``` ## [1] 21 ``` --- # Thinking about the mode 1. Tells us the most frequent number(s) -- 2. Is it representative of all the numbers? -- 3. When would the mode be a good thing to know? --- class: pink, center, middle, clear # Median --- # Median The median is the middle number > 1 1 2 2 3 4 **5** 6 7 7 7 7 7 - The median is 5 because it is the middle number - Find the median by ordering the numbers from smallest to largest, then take the number in the middle --- # Median (even number of numbers) If there is an even number of numbers, find the two in the middle and take the number halfway between them (their average) > 1 2 3 **4** **5** 6 7 8 - The median is 4.5 because 4.5 is halfway between the two middle numbers, 4 and 5 --- # Finding the Median in R Put some numbers in a variable. The `c()` function combines numbers. ```r #make some numbers a <- c(1,1,2,2,3,4,5,6,7,7,7,7,7) ``` The 13 numbers in `a`: 1 1 2 2 3 4 5 6 7 7 7 7 7 ```r median(a) ``` ``` ## [1] 5 ``` --- # median() R has a median function. ``` median(my_variable) ``` The median function computes the median of a variable that contains numbers ```r a<-c(1,2,3,4,5,6,7) median(a) ``` ``` ## [1] 4 ``` --- # median() You can also put the numbers directly inside the median function using the `c()` function ```r median(c(1,2,3,4,5,6,7)) ``` ``` ## [1] 4 ``` --- # Thinking about the median 1. 
Tells us the number in the middle of the ordered numbers -- 2. Is it representative of all the numbers? -- 3. When would the median be a good thing to know? --- class: pink, center, middle, clear # Mean --- # Mean The mean (also called the average) is the sum of the numbers divided by how many numbers there are `\(\text{Mean} = \frac{\text{sum of numbers}}{\text{number of numbers}}\)` > 1 1 2 2 3 4 5 6 7 7 7 7 7 - Sum = 1+1+2+2+3+4+5+6+7+7+7+7+7 = 59 - Number of numbers = 13 - Mean = 59/13 = 4.538462 --- # Mean `\(\text{Mean} = \bar{X} = \frac{\sum_{i=1}^{i=N}{x_i}}{N}\)` - `\(\bar{X}\)` (read "X bar") symbolizes the mean - `\(\sum_{i=1}^{i=N}{x_i}\)` = summation notation - `\(x\)` = all the numbers (1,2,3,4...) - `\(i\)` = an index that steps through the numbers in x, from the first (i = 1) to the last (i = N) - `\(N\)` = the number of numbers - `\(\sum\)` = instruction to add up numbers --- # Summation example `\(x = 4,7,9\)`, so `\(N = 3\)` `\(\sum_{i=1}^{i=N}{x_i} = x_1 + x_2 + x_3\)`, where `\(x_1 = 4\)`, `\(x_2 = 7\)`, and `\(x_3 = 9\)` `\(\sum_{i=1}^{i=N}{x_i} = 4 + 7 + 9 = 20\)` --- # Mean in a table <table> <thead> <tr> <th style="text-align:left;"> index </th> <th style="text-align:left;"> x </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 4 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 7 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 9 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 8 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:left;"> 30 </td> </tr> <tr> <td style="text-align:left;"> N </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> Mean </td> <td style="text-align:left;"> 6 </td> </tr> </tbody> </table> --- # The mean equally divides the sum <table> <thead> <tr> <th
style="text-align:left;"> index </th> <th style="text-align:left;"> x </th> <th style="text-align:left;"> equal_parts </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:left;"> 30 </td> <td style="text-align:left;"> 30 </td> </tr> <tr> <td style="text-align:left;"> N </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> Mean </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 6 </td> </tr> </tbody> </table> --- # The mean is the balancing point .pull-left[ - deviation = score minus mean - sum of deviations will always equal zero ] .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> index </th> <th style="text-align:left;"> x </th> <th style="text-align:left;"> deviations </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> -2 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> -4 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 3 </td> </tr> <tr> <td 
style="text-align:left;"> 5 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:left;"> 30 </td> <td style="text-align:left;"> 0 </td> </tr> <tr> <td style="text-align:left;"> N </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> Mean </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 0 </td> </tr> </tbody> </table> ] --- # Finding the Mean in R Use the `mean()` function ```r #make some numbers a <- c(1,1,2,2,3,4,5,6,7,7,7,7,7) mean(a) ``` ``` ## [1] 4.538462 ``` --- # sum() and length() - `sum()` adds up the numbers - `length()` counts how many numbers are in the variable ```r a<-c(1,2,3,4,5,6,7) sum(a) ``` ``` ## [1] 28 ``` ```r length(a) ``` ``` ## [1] 7 ``` --- # Mean = sum()/length() ```r a<-c(1,2,3,4,5,6,7) sum(a)/length(a) ``` ``` ## [1] 4 ``` --- # Thinking about the Mean 1. The mean divides the total sum into equal parts. -- 2. Is it representative of all the numbers? -- 3. When would the mean be a good thing to know? --- class: pink, center, middle, clear # Do descriptive statistics for central tendency actually describe the data? ## It depends on the data --- # Histogram shape: Bell-Shaped Mean (Red), Median (Green), Mode (Blue) <img src="2-Descriptives_files/figure-html/unnamed-chunk-24-1.png" width="450px" style="display: block; margin: auto;" /> --- # Right-skewed <img src="2-Descriptives_files/figure-html/unnamed-chunk-25-1.png" width="450px" style="display: block; margin: auto;" /> --- # Outliers Outliers are very large or very small values that are unusual compared to the rest of the data <img src="2-Descriptives_files/figure-html/unnamed-chunk-26-1.png" width="400px" style="display: block; margin: auto;" /> --- # Mean, Median, and outliers The mean is strongly influenced by outliers; the median is much less affected.
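A quick way to check this in R is to compute both statistics with and without one extreme value. The ten scores below are made up for illustration (they are not the data behind the plots); the added outlier is 2000, the same size as the big number in these slides.

```r
# Made-up scores for illustration (not the data behind the plots)
scores <- c(10, 12, 11, 13, 9, 10, 12, 11, 10, 12)

# The same scores plus one extreme value (an outlier)
with_outlier <- c(scores, 2000)

mean(scores)          # 11
median(scores)        # 11

mean(with_outlier)    # about 191.8, because the 2000 is part of the sum
median(with_outlier)  # 11, the middle of the ordered numbers does not change here
```

The mean jumps from 11 to about 191.8, while the median stays at 11, because the median only depends on which number sits in the middle once the numbers are ordered.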
<img src="2-Descriptives_files/figure-html/unnamed-chunk-27-1.png" width="400px" style="display: block; margin: auto;" /> --- # Zooming in The big number (2000) makes the mean much bigger, because it is included in the sum. <img src="2-Descriptives_files/figure-html/unnamed-chunk-28-1.png" width="400px" style="display: block; margin: auto;" /> --- class: pink, center, middle, clear # Always plot your data --- # Big ideas 1. Descriptive statistics help us reduce a large pile of numbers to a few numbers that "describe the data" -- 2. The mode, median, and mean are descriptive statistics for central tendency in the data (meant to represent what most of the numbers are like) -- 3. Measures of central tendency can be "off" by quite a bit depending on the shape of the data, so we need to look at the data to see whether they are appropriate --- # Next class: Variation 1. Today we looked at measures of central tendency that capture "sameness" in the data 2. When the data are variable (have different values), the measures of central tendency will always have some **error**. 3. Next class we will look at descriptive statistics for summarizing the amount of **error** (how much the numbers differ) --- # Reminder 1. Quiz 1 for last week is due tonight @ 11:59pm. If you do not complete the quiz, you will receive 0 points. 2. Quiz 2 for this week begins today and is due next Monday by end of day @ 11:59pm