Chapter 10 “psych” Package

Author: Zoren Degtyarev

10.1 Procedures for Psychological, Psychometric, and Personality Research

10.2 Introduction

A sim.anova is a simulation function that falls under an all incompasing package called “psych.” The sim.anova function enables one to simulate a three way ANOVA and generate a matrix design which will include three independent variables (independent variable 1,2 and 3). The simulated independent variables can have different number of levels as well as effect sizes for every main effect as well as interaction. The sim.anova function can also simulate one or multiple dependent variables. Since psychological research requires lots of time as well as resources, the sim.anova function is a very useful tool that enables a researcher to simulate and inspect data prior to conducting an actual experiment. In this tutorial I will demonstrate how to generate an analysis of variance with three independent variables, I will also demonstrate how to produce an output of an ANOVA table, and lastly how to use a ggplot2 package to plot the means.

10.3 Install (“psych”)

The first step to simulating data is to make sure that you have installed a “psych” package. This can be done in the packages window.

10.3.1 Creating simulated data for three way ANOVA

The fist step in creating a simulated data for 3-way ANOVA is to create a data frame function (data.df), this is where your sim.anova function will be placed (data.df <- sim.anova). Since the sim.anova function can contain different arguments such as the effect size as well as different levels of independent variables, the first step is to identify these different arguments. The effect sizes are identified as “es1” “es2” and es3 for the effect size of the first, second and third independent variable. The argument n stands for the sample size in each cell. The argument n1, n2 and n3 stand for numbers of levels in the independent variables one, two and three. By default the sim.anova function sets the effect sizes to zero, and levels in each independent variables to 2. All simulations will be a between subject designs. If you are interested in the

library(psych)
data.df <- sim.anova()  
describe(data.df)
##      vars  n mean   sd median trimmed  mad   min  max range skew kurtosis
## IV1*    1 16  1.5 0.52   1.50    1.50 0.74  1.00 2.00  1.00  0.0    -2.12
## IV2*    2 16  1.5 0.52   1.50    1.50 0.74  1.00 2.00  1.00  0.0    -2.12
## IV3*    3 16  1.5 0.52   1.50    1.50 0.74  1.00 2.00  1.00  0.0    -2.12
## DV      4 16  0.0 1.07   0.13   -0.03 1.05 -1.64 2.07  3.71  0.1    -1.15
##        se
## IV1* 0.13
## IV2* 0.13
## IV3* 0.13
## DV   0.27

In the above example we generated a 3 way ANOVA, where by default all of the effect sizes are zero for all independent variables and by default there are 2 levels in each independent variable. Also by default the sample size per cell is 2, since there are a total of 8 cells the n=16 as seen above.

data.df <- sim.anova(es1=1, es2 = 0, es3 = 3)  
describe(data.df)
##      vars  n mean   sd median trimmed  mad   min  max range skew kurtosis
## IV1*    1 16 1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00 0.00    -2.12
## IV2*    2 16 1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00 0.00    -2.12
## IV3*    3 16 1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00 0.00    -2.12
## DV      4 16 0.04 3.21  -0.32    0.04 3.50 -5.32 5.45 10.77 0.12    -1.32
##        se
## IV1* 0.13
## IV2* 0.13
## IV3* 0.13
## DV   0.80

In the above example we manipulated the effect sizes for independent variabl 1,2 and 3

data.df <- sim.anova(es1=1, es2 = 0, es3 = 3, es12 = 2, es13 = 2)  
describe(data.df)
##      vars  n  mean   sd median trimmed  mad   min  max range skew kurtosis
## IV1*    1 16  1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00 0.00    -2.12
## IV2*    2 16  1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00 0.00    -2.12
## IV3*    3 16  1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00 0.00    -2.12
## DV      4 16 -0.38 4.41  -0.41   -0.54 4.24 -6.99 8.43 15.42 0.39    -0.89
##        se
## IV1* 0.13
## IV2* 0.13
## IV3* 0.13
## DV   1.10

In the above example we also manipulated the effect sizes of the interaction between independent variable 1 and 2 (es12), as well as the effect size for interaction between IV1 and IV2

data.df <- sim.anova(es1=1, es2 = 0, es3 = 3, es12 = 2, es13 = 2, n=3)  
describe(data.df)
##      vars  n  mean   sd median trimmed  mad   min  max range skew kurtosis
## IV1*    1 24  1.50 0.51   1.50    1.50 0.74  1.00 2.00  1.00  0.0    -2.08
## IV2*    2 24  1.50 0.51   1.50    1.50 0.74  1.00 2.00  1.00  0.0    -2.08
## IV3*    3 24  1.50 0.51   1.50    1.50 0.74  1.00 2.00  1.00  0.0    -2.08
## DV      4 24 -0.13 4.13  -0.11   -0.27 4.70 -6.58 7.64 14.22  0.3    -0.96
##        se
## IV1* 0.10
## IV2* 0.10
## IV3* 0.10
## DV   0.84

The above example was manipulated to have a cell sample size equal to 3 (n=3)

data.df <- sim.anova(es1=1, es2 = 0, es3 = 3, es12 = .5, es13 = 2, n1=3, n2=4, n3=5)  
describe(data.df)
##      vars   n mean   sd median trimmed  mad   min   max range skew
## IV1*    1 120 2.00 0.82   2.00    2.00 1.48  1.00  3.00  2.00 0.00
## IV2*    2 120 2.50 1.12   2.50    2.50 1.48  1.00  4.00  3.00 0.00
## IV3*    3 120 3.00 1.42   3.00    3.00 1.48  1.00  5.00  4.00 0.00
## DV      4 120 0.13 3.99  -0.31   -0.06 3.33 -8.01 10.46 18.47 0.46
##      kurtosis   se
## IV1*    -1.52 0.07
## IV2*    -1.39 0.10
## IV3*    -1.33 0.13
## DV      -0.06 0.36

By manipulating the levels in each independent variable in the above example to n1=3, n2=4, n3=5; we can see that the sample size in each independent variable increased to 120.

10.4 Producing an ANOVA table

It is very useful to see the results of the 3 way ANOVA simulation in the table. To do so in the code below we will create an ANOVA tabble by using the “aov” or the “lm” function. Let’s take a look at an example below

summary(aov(DV~IV1*IV2*IV3,data=data.df)) 
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## IV1          2  108.6   54.31  62.895 1.88e-15 ***
## IV2          3    0.8    0.27   0.317 0.812892    
## IV3          4 1201.0  300.24 347.710  < 2e-16 ***
## IV1:IV2      6   26.3    4.38   5.077 0.000286 ***
## IV1:IV3      8  469.1   58.64  67.910  < 2e-16 ***
## IV2:IV3     12   12.8    1.07   1.237 0.280408    
## IV1:IV2:IV3 24   21.1    0.88   1.017 0.461067    
## Residuals   60   51.8    0.86                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the above example we first incorporated a “summary” function, which is used to generate summaries of various statistical models, the “aov” function enables one to generate a succinc description of the ANOVA table which includes degrees of freedom, sum of squares, mean squares, F values, and the probabilities of the F values. The table summary is very usefull as one can inspect which independent variables as well as interactions are statistically significant.

summary(lm(DV~IV1*IV2*IV3,data=data.df)) #compare with the aov model
## 
## Call:
## lm(formula = DV ~ IV1 * IV2 * IV3, data = data.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6560 -0.3548  0.0000  0.3548  1.6560 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.66427    0.65707  -2.533 0.013946 *  
## IV10             -3.49814    0.92923  -3.765 0.000382 ***
## IV11             -5.90623    0.92923  -6.356 3.10e-08 ***
## IV2-1            -0.80378    0.92923  -0.865 0.390488    
## IV21              0.20407    0.92923   0.220 0.826916    
## IV23             -1.21676    0.92923  -1.309 0.195382    
## IV3-1             0.26894    0.92923   0.289 0.773253    
## IV30              1.44410    0.92923   1.554 0.125427    
## IV31              1.96471    0.92923   2.114 0.038651 *  
## IV32              1.38202    0.92923   1.487 0.142179    
## IV10:IV2-1        0.86154    1.31413   0.656 0.514593    
## IV11:IV2-1        1.36088    1.31413   1.036 0.304558    
## IV10:IV21         1.60769    1.31413   1.223 0.225968    
## IV11:IV21         0.03478    1.31413   0.026 0.978971    
## IV10:IV23         1.63626    1.31413   1.245 0.217927    
## IV11:IV23         3.84153    1.31413   2.923 0.004880 ** 
## IV10:IV3-1        3.48285    1.31413   2.650 0.010268 *  
## IV11:IV3-1        4.12221    1.31413   3.137 0.002646 ** 
## IV10:IV30         3.58240    1.31413   2.726 0.008389 ** 
## IV11:IV30         7.80727    1.31413   5.941 1.54e-07 ***
## IV10:IV31         6.12665    1.31413   4.662 1.79e-05 ***
## IV11:IV31        10.25065    1.31413   7.800 1.07e-10 ***
## IV10:IV32         8.54586    1.31413   6.503 1.75e-08 ***
## IV11:IV32        14.29862    1.31413  10.881 7.95e-16 ***
## IV2-1:IV3-1       1.30243    1.31413   0.991 0.325620    
## IV21:IV3-1       -0.80996    1.31413  -0.616 0.539997    
## IV23:IV3-1       -0.46193    1.31413  -0.352 0.726438    
## IV2-1:IV30        1.45008    1.31413   1.103 0.274237    
## IV21:IV30        -1.72755    1.31413  -1.315 0.193650    
## IV23:IV30         1.12272    1.31413   0.854 0.396316    
## IV2-1:IV31       -0.43237    1.31413  -0.329 0.743290    
## IV21:IV31        -0.52972    1.31413  -0.403 0.688314    
## IV23:IV31        -0.40268    1.31413  -0.306 0.760345    
## IV2-1:IV32        1.83402    1.31413   1.396 0.167975    
## IV21:IV32        -0.45365    1.31413  -0.345 0.731145    
## IV23:IV32         0.15269    1.31413   0.116 0.907891    
## IV10:IV2-1:IV3-1 -2.25221    1.85847  -1.212 0.230315    
## IV11:IV2-1:IV3-1 -1.30386    1.85847  -0.702 0.485655    
## IV10:IV21:IV3-1  -0.94391    1.85847  -0.508 0.613387    
## IV11:IV21:IV3-1   1.63604    1.85847   0.880 0.382199    
## IV10:IV23:IV3-1  -1.62986    1.85847  -0.877 0.383990    
## IV11:IV23:IV3-1  -1.96050    1.85847  -1.055 0.295701    
## IV10:IV2-1:IV30  -1.20641    1.85847  -0.649 0.518725    
## IV11:IV2-1:IV30  -1.52705    1.85847  -0.822 0.414518    
## IV10:IV21:IV30    0.78631    1.85847   0.423 0.673739    
## IV11:IV21:IV30    1.34731    1.85847   0.725 0.471297    
## IV10:IV23:IV30   -0.83140    1.85847  -0.447 0.656228    
## IV11:IV23:IV30   -3.86763    1.85847  -2.081 0.041702 *  
## IV10:IV2-1:IV31  -1.23386    1.85847  -0.664 0.509290    
## IV11:IV2-1:IV31   0.59493    1.85847   0.320 0.749992    
## IV10:IV21:IV31   -1.36971    1.85847  -0.737 0.463989    
## IV11:IV21:IV31    0.09510    1.85847   0.051 0.959360    
## IV10:IV23:IV31   -0.65619    1.85847  -0.353 0.725264    
## IV11:IV23:IV31   -0.16497    1.85847  -0.089 0.929561    
## IV10:IV2-1:IV32  -2.43409    1.85847  -1.310 0.195280    
## IV11:IV2-1:IV32  -3.01053    1.85847  -1.620 0.110499    
## IV10:IV21:IV32   -0.98478    1.85847  -0.530 0.598145    
## IV11:IV21:IV32    1.23285    1.85847   0.663 0.509634    
## IV10:IV23:IV32   -0.49529    1.85847  -0.267 0.790763    
## IV11:IV23:IV32   -0.45313    1.85847  -0.244 0.808202    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9292 on 60 degrees of freedom
## Multiple R-squared:  0.9726, Adjusted R-squared:  0.9457 
## F-statistic: 36.11 on 59 and 60 DF,  p-value: < 2.2e-16

The “aov” function is not the only function to generate an ANOVA table. The “aov” function is a subset of an “lm” function. As seen in the above example the “lm” function gives more comprehensive breakdown of the the 3 way anova simulation.

10.5 Graphing the means of using ggplot2 function

ggplot2 function is a very usefull function as it enabbles one to visualize your data. The examples below will illustrate how to graph the means of the independent variables.

The first step in graphing the means is to first isolate the column under the title “mean”. To do we have to use the “describe” function, which summarizes basic statistics. By using square brackets [,] we can isolate the exact column in our data frame. In the example below we are interested in means which fall under the 3rd column. by using [,3] we are able to isolate the 3rd column

describe(data.df)[,3]
##  
## 
##  NA  Variables      13  Observations
## ---------------------------------------------------------------------------
## Error: $ operator is invalid for atomic vectors

The example above illustrates the generated means. The next step is to insert the means into a new variable, in the example below that variable is “means_variable”

means_variable <- describe(data.df)[,3]

The example above created our new variable with the independent variable means as well as dependent variable mean, and is stored in the ’Global Environment

In the next step we are ready to graph the means using the ggplot2 function. The first step is to install the ggplot2 package. This can be accomplished by either installing the ggplot2 in the “Packages” tab on your right or by using install.packages(“ggplot2”) function. Installing the ggplot2 can also be done by using the function install.packages(“tidyverse”), both functions will install the package for you.

As seen in the example below your line of code should always start with library(ggplot2)

The next step is to encode your vactor as a factor,in the example below, it is c(“IV1”,“IV2”,“IV3”,“DV”), followed by putting this factor into a new function, in the example below called “variables”.

Next, a new variable “Means” has to be created. This function will hold the means that were isolated earlier in the data frame and put in the function “means_variable”

Finally the variable plot_df is created in order to hold your data frame with IV’s and DV: “data.frame”

Now we are ready to plot the basic bar graph by using ggplot function, used to declare your input data frame which includes a set of plot common aesthetics to be graphed. The hight of the bars, position and color of your bars and outline of the bars can be modified to your specifiations.

library(ggplot2)
variables <- rep(as.factor(c("IV1","IV2","IV3","DV"))) 
Means <- means_variable

plot_df <- data.frame(variables,
                  
                      Means)
## Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors): cannot coerce class '"describe"' to a data.frame
# basic bar graph

ggplot(plot_df, aes(x=variables,y=Means, 
                   ))+
  geom_bar(stat="identity", 
           position="dodge",
           color="red")
## Error in ggplot(plot_df, aes(x = variables, y = Means, )): object 'plot_df' not found

THe above example shows a basic bar graph of the IV’s and DV of the simulated ANOVA

10.5.1 Another way to graph the means is to pull the actual numerical means from the “means_variable”

In the example below the numerical mean values are manually inserted into the “Means” variable

library(ggplot2)
variables <- rep(as.factor(c("IV1","IV2","IV3","DV"))) 
Means <- c(2, 2.5, 3, -0.107)

plot_df <- data.frame(variables,
                  
                      Means)

# basic bar graph

ggplot(plot_df, aes(x=variables,y=Means, # aestetics definition has gotten 
                   # color=factor_two))+  # color here means the border collor
                   ))+
  geom_bar(stat="identity", 
           position="dodge",
           color="red")

10.6 Summary

sim.anova is part of an all encompasing package called “psych”. It is a very useful function in psychological research, used for simulating a 3 way ANOVA with an arbitrary amounts of sample sizes, effect sizes as well as levels in each independent variable. In addition this function, the simulated data can be used to create an ANOVA table using an “aov” or “lm” function. Furthermore, by using the ggplot2 function the statistical data such as the means of the variables can be depicted graphically.

10.7 Citations

Revelle, W. (n.d.). Psych. Retrieved June 24, 2019, from https://www.rdocumentation.org/packages/psych/versions/1.8.12/topics/sim.anova

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, http://ggplot2.org.