Assignment

Your assignment is to create an R package containing at least two functions. You will also be uploading your R package to a new github repository, and learning how to share R packages through Github.

Steps

  1. Create a new R project in a new directory. Choose the option to create an R package.
  2. The R folder contains an example hello() function in a .r file. Add two new .r files, each containing one of your two functions.
  3. The man folder contains an .Rd file that provides an example for writing help documentation for the hello() function. Create two more .Rd files and write help documentation for your two new functions
  4. Build and load your package to make sure it is working locally.
  5. Create a new github repository on your github account. Do this online first. After you have created the new empty repository wait a couple seconds. You should then able to view the new repository in github desktop. Clone the repository to your local computer. Copy all of the R package files that you created into the github folder and push to the cloud.
  6. Edit the readme file on the github repository to give instructions about how someone can download and install the package. For an example, see the following github repository, which shows how a github repo containing an R package can be downloaded and installed using the console

https://github.com/CrumpLab/crumplabr

You can put any two (or more) functions into your R package. If you can’t think of something to do, then I suggest you write a standard deviation function, and a variance function, both for the population, not a sample. The sd() and var() functions in base R both use n-1 in the denominator by default. You would need to right your own functions that divide by N for the population standard deviation and variance.

Background

R is free open-source software. Many people have contributed to the development of R by creating a distributing packages. Packages are a way to collect and share R functions with other people.

About R packages

R-studio has a tab called Packages in the lower right panel. If you click on the packages tab you can see all of the packages that are currently installed.

Viewing package contents

You can click the package name in blue to view the contents of the package. You should see some links to documentation, and a list of individual functions, this will show up in the help menu. You can click individual functions to view their help document.

Loading a package

In order to use the functions in a package, you need to load the package using the library() function.

For example, if you have the ggplot2 library installed, you can load it using:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4

Installing a package from CRAN

You must first install packages before the library command will load the package. Ensure you are connected to the internet. Then, click install under the Packages tab. Generally, you will be installing from CRAN. This is an online repository of R packages. Type the name of the package, and if it is available, you should see the name begin to autocomplete. It is generally a good idea to click “install dependencies”, this will install additional packages that may be required.

Other tidbits

  1. The R packages you install are stored with your version of R. Remember that R is a separate program from R-studio. R-studio runs on top of R, and acts as a GUI (graphic user interface) for R. As a result, if you delete or re-install R (something you should do once in a while especially to upgrade your version of R), then all of the previous packages you had installed will be deleted, and you will have to re-install them.

  2. The library() function loads the package into R’s memory. You can check the package is loaded by looking at the list of packages. You should see a checkmark indicating the package is loaded. You can also load a package directly without using library, just by clicking the box next to the package.

Writing Functions help

In this assignment you will be asked to write two custom functions. They can do whatever you want them to do. Here are some reminders about writing functions in R

Basic function syntax

my_function <- function(input){
  #body of code
  return(output)
}

my_function is the name of the function. R knows you are writing a function because the function(){} command is called. Inputs to the function fo inside the (). The body of the function is placed between {}. The return() function is used inside in order to output the results of the function.

example, add 1 to an input

# write the function
add_one <- function(x){
  return(x+1)
}

# test the function
add_one(5)
## [1] 6

Notice the body of the function only contains one line. This is a simple function, so we can return the answer without much additional work. Here is the same function with some extra commenting and steps.

add_one <- function(x){
  # add one to x, and save the output
  save_result <- x+1
  # output the contents of save_result
  return(save_result)
}

It is also possible to write everything on one line. Intriguingly, the {} are not necessary here:

add_one <- function(x) return(x+1)

It is good practice to write functions so they are clear and easy to read. Use comments when necessary, but if the code is self-explanatory, then don’t use them

no inputs

Functions don’t need inputs to return outputs. For example, let’s say you wanted a custom function to roll a dice, and get a random number from one to 6.

# the following code returns a randomly sampled value from 1 to 6
sample(1:6,1)
## [1] 4
# we can put this in a function
roll_dice <- function(){
  return(sample(1:6,1))
}

# run the function with no input
roll_dice()
## [1] 2

What is wrong with this function?

Let’s say you have some numbers, and you want to write a function to get the sum. What is wrong with the function below?

my_numbers <- c(1,4,3,4,5,6)

my_sum <- function(x){
  get_sum <- sum(my_numbers)
  return(get_sum)
}

multiple inputs

Functions can take multiple inputs. Let’s say you want to find the mean of numbers between a range (a minimum and maximum value). You will need three inputs one to define the input of numbers, and then two to define a min and max.

ranged_mean <- function(x, min_val, max_val){
  restricted_values <- x[x>min_val &
                           x < max_val]
  return(mean(restricted_values))
}

some_numbers<-c(3,4,3,2,3,4,5,6,7,8,8,8,9,8)
ranged_mean(some_numbers,2,4)
## [1] 3

multiple outputs

A function can output multiple values and variables. From the above, let’s say you wanted to output three kinds of information.

  1. The origingal set of numbers
  2. The restricted set of numbers that are within the range
  3. The mean of the restricted set of numbers

Inside our function we can collect all of this information in a list, and the return the list

ranged_mean2 <- function(x, min_val, max_val){
  restricted_values <- x[x>min_val &
                           x < max_val]
  outputs <- list(original_values = x,
                  restricted_values = restricted_values,
                  restricted_mean = mean(restricted_values))
  return(outputs)
}

some_numbers<-c(3,4,3,2,3,4,5,6,7,8,8,8,9,8)
ranged_mean2(some_numbers,2,4)
## $original_values
##  [1] 3 4 3 2 3 4 5 6 7 8 8 8 9 8
## 
## $restricted_values
## [1] 3 3 3
## 
## $restricted_mean
## [1] 3
# putting the results into a separate value, and accessing the parts

stored_answer <- ranged_mean2(some_numbers,2,4)

# return everything
stored_answer
## $original_values
##  [1] 3 4 3 2 3 4 5 6 7 8 8 8 9 8
## 
## $restricted_values
## [1] 3 3 3
## 
## $restricted_mean
## [1] 3
# return original values
stored_answer$original_values
##  [1] 3 4 3 2 3 4 5 6 7 8 8 8 9 8
# return restricted values
stored_answer$restricted_values
## [1] 3 3 3
# return restricted mean
stored_answer$restricted_mean
## [1] 3