2 Getting Started with R

2.1 The RStudio interface

The first screen we see in R should look like this:

The first screen we see when we open RStudio

Here, there are three main window panes. The “Console” window is where we type the code to tell the computer to do stuff.

The “Environment-History-Connections” window has three tabs. “Environment” shows the objects that we save during our session (I will explain this soon); “History” shows a record of all the code we ask the computer to run; and “Connections” (which we will not use here) shows us the connections we have to remote databases.

The “Files-Plots-Packages” window has several tabs. “Files” shows us all the files in our current working directory, which is the default location where R will look for files we want to load and where it will put any files we save. “Plots” will display the plots we make in R. “Packages” shows us the packages installed in our computer and whether they are loaded in our current session. “Help” allows us to search and read the documentation for packages and functions. “Viewer” can display content that usually belongs to a web browser (i.e., html files).

We can tell R to do stuff by typing commands in the line that begins with > in the console window. This line is called “command prompt”. Let’s begin with something simple:

1 + 1

[1] 2

The [1] that appears next to the result informs us that the line begins with the first value in the result. This information is helpful when we run commands that produce multiple values, like in this case:

80:110

 [1]  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98
[20]  99 100 101 102 103 104 105 106 107 108 109 110

If we type an incomplete command, R will assume that we are still writing and will show a +. This + is not an arithmetical operator; it is a prompt for us to continue writing in the next line and will not go away unless we complete the command or we press the “Escape” key. For example:

> 42 -
+
+
+ 1
[1] 41

If R does not understand a command, it will display an error message.

> 7 % 2
Error: unexpected input in "7 % 2"
>

In this case, what I meant was

7 %% 2

[1] 1

2.2 Using R as a calculator

R can perform basic arithmetic operations. Type the following expressions at the command prompt (the line that begins with >) in the R console window.

5 + 3

[1] 8

5 - 3

[1] 2

5*3

[1] 15

5/3

[1] 1.666667

R can also do exponentiation with ^¹; modulus, also known as remainder from division, with %%; and integer division with %/%.

If our math is amiss, R will inform us (but not necessarily with an error message):

0/0

[1] NaN

-1/0

[1] -Inf

R also has logical operators that will return “TRUE” or “FALSE”.

5 == 6

[1] FALSE

5 != 6

[1] TRUE

5 < 6

[1] TRUE

The logical operators “and” (&) and “or” (|) allow us to combine multiple logical statements:

(5 < 6) & (7 == 8)

[1] FALSE

(5 < 6) | (7 == 8)

[1] TRUE

Practice

Think of an integer, double it, add six, divide it in half, subtract the number you started with, and then square it. If your code is correct, the answer should be nine.

Be careful with comparisons and floating point arithmetic:

(.1 + .2) == .3

[1] FALSE

(.1 + .2) - .3

[1] 5.551115e-17

This imprecision is not R’s fault. All computers have limited space to store numbers and decimal positions, so they often use numbers with “hidden” decimals even when it looks like they use round numbers. For example, what looks like a 3 may actually be 3.000005. You can find more information about this imprecision in FAQ (7.31).

Being able to do these computations is nice, but what happens if we want to use a result later without redoing or retyping everything? R can save numbers (and more) using something called “objects”.

2.3 Objects

In simple terms, an object is a named piece of information that we can access whenever we want. A single number, a word, a plot, and a set of instructions can all be objects if we name them. To create an object, we must first choose a name, then type an arrow <- (this is a “lower than” symbol < followed by a “minus” sign -), and then write the value that we want to save.

my_object <- 42

We can also use the “equal to” sign = instead of the arrow.

my_object = 42

In this context, both <- and = are acting as “assignment operators” because they assign the value 42 to the name my_object. In R, the symbol = has several common uses, so if you want to avoid confusion, I suggest you use <-. But you can pick whichever symbol you like the most as long as you are consistent.

Now the environment window shows that there is an object called my_object, which we can access at anytime by typing its name.

my_object

[1] 42

my_object has a number as its value, so we can do mathematical operations with it.

my_object + 7

[1] 49

my_object^my_object

[1] 1.501309e+68

We can also create objects based on other objects.

a <- 10
b <- a
a

[1] 10

[1] 10

Be careful: if we try to reuse a name, R will overwrite the previous object without notifying us in any way.

c <- 2
c

[1] 2

c <- 711
c

[1] 711

However, if an object is the copy of another object, overwriting the original does not affect the copy; and overwriting the copy does not affect the original.

a <- 10
b <- a
b

[1] 10

a <- 5
b

[1] 10

We can name our objects in almost any way we want. The only rules are:

Names can be a combination of letters, digits, periods . and underscores _.
Names cannot include white spaces.
If a name starts with a period ., it cannot be followed by a digit.
Names cannot start with a number or an underscore _.
Names are case sensitive (for example, age, Age and AGE are three different objects).
Reserved words (like TRUE, FALSE, NULL, if, …) cannot be used as names.

Also, here are a few suggestions that can save you many hours of frustration:

Avoid giving your object the same name as a built-in function.
If you need to create objects with multiple words in their name, separate them with an underscore (my_object) or a dot (my.object), or capitalize the different words (MyObject). Choose whichever format you like most and be consistent with it.
Use informative names. Writing names like x or value1 is quick and easy—but it is also vague. Your code will be clearer if your objects have names that describe their contents and purpose. Your colleagues and your future self will really appreciate it².

2.4 Using functions

R has many built-in functions to do anything from basic math to text manipulation to advanced statistical models. R thinks of functions as objects that store commands that we can apply to an input. For example, we can input a number to round it, or to calculate its square root, its factorial, or its natural logarithm. To use a function, we must write the name of a function followed by the input we want it to use inside a pair of round brackets.

round(2.1415)
sqrt(9)
factorial(7)
log(100)

The pieces of information inside the round brackets are called “arguments”. Some functions, like sqrt() and factorial(), only use one argument (the number that we want to use). But other functions, like round() and log(), can take more arguments (separated by commas) to modify their behavior. For example, round() accepts a second argument that corresponds to the number of decimal places that we want to keep.

round(2.1415, 0)

[1] 2

round(2.1415, 2)

[1] 2.14

The order of these arguments is important (what happens if we try to run round(2, 2.1415)?) Fortunately, arguments have names that we can use to specify which data to use in each case. So, we can write

round(x = 2.1415, digits = 3)

[1] 2.142

Even better, we can pass arguments in any order as long as we name them all.

round(digits = 3, x = 2.1415)

[1] 2.142

Note that using the wrong name in a function will likely yield an error message.

round(x = 2.1415, basket = 3)

Error in round(x = 2.1415, basket = 3): unused argument (basket = 3)

A quick way to check what arguments we can use is to invoke args(), which takes the name of a function as its argument:

args(round)

function (x, digits = 0, ...) 
NULL

The output shows that round() has two arguments: x, which is the number we want to round; and digits, which determines how many decimal places to keep. The output also shows that digits is already set to zero. This means that round will use 0 as the default value for digits, i.e., it will not keep any decimal places. We can omit arguments with default values and R will automatically fill them for us in the background.

I encourage you to name all the arguments you define when using a function, or at least the ones after the first argument. It is hard to remember all the arguments that can go into more complicated functions, and it is even harder to remember them in the correct order. It is easy to accidentally switch unnamed arguments and get wrong results without even noticing. This is why naming the arguments will prevent errors and clarify the function’s behavior.

We can also run functions inside other functions. This works because R first runs the innermost function and then uses the result in the outermost function.

sqrt(round(16.483, digits = 0))

[1] 4

The functions we have used are simple and intuitive to use. But sometimes we need to know more about a function before we can use it. Where can we find explanations to help us?

2.5 Getting help

The help() function allows us to access R’s built-in help information on any function. For example, to open the help page for round(), we do

help(round)

A shorter way of writing this is to use ? before the name of the function.

?round

After you run the code, the help page is displayed in the “Help” tab in the “Files-Plots-Packages” pane (usually in the bottom right of RStudio).

As a novice user, help pages may seem arcane—perhaps because they aim for shortness and use technical vocabulary. But this short vocabulary makes (most of) the explanations precise, so we can use the information we need without having to read the entire document. Also, all help pages are organized similarly, so we don’t have to relearn how to navigate them. With a bit of practice, you will be able to find exactly what you need in mere seconds.

The first line of the help document displays the name of the function and the package that contains the function. Other sections are:

Description: summarizes the function.
Usage: names the arguments associated with the function and their possible default values.
Arguments: describes the purpose and format of each input.
Details: explicates what the function does and how it does it.
Value: if applicable, describes the type and structure of the object that the function returns.
See Also: suggests other help pages with similar or related content.
Examples: shows code to illustrate how to use the function. We can copy and paste examples in the console, and we can access them at any time by using the example() function (e.g., example("round")).

How to get started with help pages?

Start by reading the help pages of functions that you already understand. This will teach you how to understand the structure of the pages and will familiarize you with the jargon. As you use R you will likely need other, more complicated functions, so reading more help pages will happen almost naturally.

Keep in mind that help pages explain the code, not the underlying concepts. If you don’t know what it means to round a number, reading the documentation for round() will not help you.

help() is useful if we know the full name of the function. But if all we remember is a key word in the name, we can search through R’s help system using help.search()

help.search("round")

Note that in this case we have to use quotes " " around the name of the function. We can also use the shortcut ??

??round

As before, the ‘Help’ tab in RStudio will display the results of the search. help.search() looks for the pattern in the help documentation, code demonstrations, and package vignettes and displays the results as clickable links that we can follow.

Another useful function is apropos(), which lists all functions containing a specified character string. Note that apropos() returns names even when the word we used is only part of a larger word, like in “surround”. For example, to find all functions with round in their name, we use³

apropos("round")

[1] "round"        "round.Date"   "round.POSIXt"

Another useful function is RSiteSearch(), which allows us to search for keywords and phrases in function help pages and vignettes for all CRAN packages, and in CRAN task views. This way we can access the online search engine directly from the Console and display the results in our web browser.

RSiteSearch("regression")

Until now, the console satisfied all our coding needs. But if we wanted to reuse a command from before, we would have to either rewrite it from scratch, or look tediously through our command history in the “Environment-History-Connections” window. Besides, once we close this session in RStudio, all of our work will be gone—like tears in the rain. It would be much better if we wrote our code in a place where we can easily review and modify it.

2.6 Working with scripts

A script is a plain text file where we write our code. With a script we can edit, proofread, and reuse code; we can easily write multi-line code (which is cumbersome to do in the console); and we can save our code so we can come back to it later or share it with others. I think the best way to code in R is to use a script, and I strongly suggest you always use one.

To open a script, click on File > New File > R script in the menu bar on the upper left-hand side of the screen.

We can run a line of code in the script using the Run button in the upper right-hand side of the script window (see picture below). R will run whichever line of code your cursor is on. We can also run all of the code in the script by clicking the Source button that’s next to Run. Since I find clicking buttons slow, I use a keyboard shortcut to run a line of code: Control + Enter in Windows and Linux; Command + Enter in Mac.

To save your script, you can click on the blue square⁴ right below the tab with the script name. The first time you save your script, R will ask you to choose a name and a location to save your file in. I strongly suggest that you pick a name and a location that you will remember easily in the future.

Moving forward, I will write all my code in a script and will assume you are doing the same. Trust me, it’s for your own benefit.

Now that we know how to preserve our hard-gained code, we can do more elaborate work. Previously we worked with one number at the time. But we can also work with groups of numbers (and of other stuff) by using something called “vectors”.

2.7 Working with vectors

In R, an ordered group of numbers is called a vector. To create a vector of numbers, we need to use the function c() (short for “combine”). The arguments of c() are the numbers you want to use in the vector, in the order you want to use them.

my_vec <- c(5, 3, 7, 1, 1, 8) # This is a vector of numbers
my_vec

[1] 5 3 7 1 1 8

Comments

Did you see the text I added next to the code? R will ignore everything in a line that comes after a hashtag #, also known as commenting symbol. Comments can explain confusing chunks of code, or warn future users about potential problems.

Vectors can also contain other types of data, like words:

vector_of_words <- c("monday", "lemon")
vector_of_words

[1] "monday" "lemon"

Note that we must enclose words in quotation marks to let R know that we want to use “monday” and “lemon” as values and not as names for objects. See what happens if we forget the quotation marks:

vector_of_words <- c(monday, lemon)

Error: object 'monday' not found

Later on we will work more with vectors of words, but for now let’s focus on vectors of numbers.

2.7.1 Named vectors

We can name the elements of a vector if we add a name followed by an equal size before the value that we want to save:

my_named_vec <- c(tigers = 2, lions = 5)
my_named_vec

tigers  lions 
     2      5

Named vectors help us store more information in the same object. They also give us another way of accessing the elements inside the vector.

2.7.2 Operations with numerical vectors

R is a “vectorized” language, which means that it can often operate on an entire vector of numbers as easily as on a single number. All the logical and mathematical functions we used before work with vectors:

my_vec / 11

[1] 0.45454545 0.27272727 0.63636364 0.09090909 0.09090909 0.72727273

my_vec <= 7

[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

log(my_vec)

[1] 1.609438 1.098612 1.945910 0.000000 0.000000 2.079442

my_vec*my_vec

[1] 25  9 49  1  1 64

In the last example, R did not obey the rules of linear algebra to multiply two vectors. Instead, R used “element-wise execution”, which means that R applied the same operation to each element of the vector independently of the other elements.

To decide how to apply element-wise execution, R considers the length of the vectors, which refers to the number of elements inside them. When we use two vectors with the same length, R will line up the vectors and perform a sequence of individual operations. For instance, in my_vec*my_vec, R multiplies the first element of vector 1 by the first element of vector 2, then the second element of vector 1 by the second element of vector 2, and so on, until all elements are multiplied. The result will be a new vector with the same length as the first two.

When we use two vectors with different lengths, R will repeat the shorter vector until it has as many elements as the longer vector, and then it will do the same as before. For example:

my_vec * c(1, 2)

[1]  5  6  7  2  1 16

If the length of the short vector does not divide evenly into the length of the long vector, R will do an incomplete repeat of the shorter vector and return a warning.

my_vec * c(1, 2, 3, 4)

Warning in my_vec * c(1, 2, 3, 4): longer object length is not a multiple of
shorter object length

[1]  5  6 21  4  1 16

Repeating the numbers of the vector is known as “vector recycling”, and it helps R do element-wise operations.

Element-wise execution allows us to apply the same instruction to all elements in an object rather than one element at a time. When working with a data set, element-wise execution will ensure that values from one observation or case are only paired with values from the same observation or case. Element-wise execution also facilitates writing our functions in R.

R can do vector and matrix multiplications, but we have to ask for them explicitly. For example, to get the inner product, we need the operator %*%; and to get the outer product, we need %o%. Don’t worry if you are not familiar with matrix operations; you won’t need them in these notes.

2.7.3 Extracting elements

We can access specific elements of vectors using the square bracket [ ] notation. To use it, we first write the name of the vector we want to extract from, followed by the square brackets with an index of the element we wish to extract. This index can be a position, a logical statement, or a name (if available). To extract elements based on their position we simply write the position number inside the [ ]. For example, to extract the 3rd value of my_vec, we use

my_vec[3]

[1] 7

We can store this value in another object.

value_3 <- my_vec[3]

We can also extract multiple elements at the time by using a vector of indices inside the square brackets:

my_vec[c(1, 3, 5)]

[1] 5 7 1

Or we can use : notation. Remember that : helps us create a sequence of values. For example,

3:6

[1] 3 4 5 6

So,

my_vec[3:6]

[1] 7 1 1 8

Note

In R, the positional index starts at 1, so to call the first element of a vector we need to use [1]. In many other programming languages (like Python and C++), the positional index starts at 0.

If the elements of the vector have names, we can extract a value using its name (surrounded by quotes) instead of a positional index.

my_named_vec

tigers  lions 
     2      5

my_named_vec["lions"]

lions 
    5

Another convenient way to extract elements from a vector is to use a logical expression as an index. For example, to extract all elements greater than 4 from my_vec, we do

my_vec[my_vec > 4]

[1] 5 7 8

This works because R uses element-wise execution even for logical statements. So, my_vec > 4 asks if each item of my_vec meets the condition “greater than four” and returns the corresponding vector of TRUE and FALSE. Then, when we add this result to the square brackets, R examines each element of my_vec asking “should I extract this element?”. If the answer is TRUE, the value is extracted; if it’s FALSE, the value is ignored. Under the hood, using my_vec > 4 is equivalent to

my_vec[c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)]

[1] 5 7 8

Two useful functions to extract values of a vector are head(), which extracts the first few values of a vector; and tail(), which gives us the last few values of a vector. Try them yourself!

2.7.4 Replacing elements

We can replace the elements of a vector by combining the square bracket notation with the assignment operator. For example, to replace the second element of my_vec, we do

my_vec[2] <- 99
my_vec

[1]  5 99  7  1  1  8

To replace multiple elements with the same value, say, elements 5 and 6, we do

my_vec[c(5, 6)] <- 55
my_vec

[1]  5 99  7  1 55 55

R can also do element-wise replacement:

my_vec[c(5, 6)] <- c(100, 200)
my_vec

[1]   5  99   7   1 100 200

What happens if you try to replace two values with a vector that has three (or more) values?

Logical expressions help us replace values that meet specific conditions without having to find them ourselves.

my_vec[my_vec > 44] <- -1
my_vec

[1]  5 -1  7  1 -1 -1

2.7.5 Reordering elements of vectors

To sort the elements of a vector from lowest to highest, we can use sort()

sorted_vec <- sort(my_vec)
sorted_vec

[1] -1 -1 -1  1  5  7

If we want to sort from highest to lowest, we need to set the optional argument decreasing to TRUE.

sorted_vec_decreasing <- sort(my_vec, decreasing = TRUE)
sorted_vec_decreasing

[1]  7  5  1 -1 -1 -1

Another option is to first use sort() and then reverse the sorted vector using rev().

sorted_vec_decreasing <- rev(sort(my_vec))
sorted_vec_decreasing

[1]  7  5  1 -1 -1 -1

A more useful feature of vectors is that we can reorder their elements based on the values of other vectors. To show this, let’s first create a vector of city names and a vector of (my guess of) their typical daily temperatures in degrees Fahrenheit.

cities <- c("Tokyo", "Cairo", "Mexico City", "Helsinki")
temps_fahr <- c(50, 90, 65, -10)

Now imagine that we want to order the cities from coldest to hottest. The first step is to use order() to create a new variable called “temps_order”.

temps_order <- order(temps_fahr)
temps_order

[1] 4 1 3 2

This output says that the lowest value in temps_fahr is in the fourth position, the second lowest value is in the first position, and so on. We can think of temps_order as a vector of positional indices of temperatures in ascending order. Now we can use these indices to reorder the vector of cities.

cities_ordered <- cities[temps_order]
cities_ordered

[1] "Helsinki"    "Tokyo"       "Mexico City" "Cairo"

Ta-da!

These vector manipulations can do more than dazzle your friends. Imagine that we have a data set with two columns of data and that we want to sort it based on the values of the first column. If we use sort() on each column separately, the values of each column will become uncoupled from each other. Instead, we can use order() on one column to make a vector of positional indices. Then we can use these indices on both columns to keep the values of each coupled in the original way.

So far we have relied on R’s built-in capabilities to do everything we need. But sometimes we need to do something for which there isn’t a built-in function. In these cases, we can get inventive and write a function ourselves.

2.8 Writing our own functions

Functions, as you may remember, are objects that store commands. The three basic parts of a function are name, code to implement, and arguments. To assemble these parts, we can use the function() function (yes, really) followed by a pair of curly brackets {}:

my_function <- function() {}

function() will run the code that we write inside the curly brackets. This code is called the body of the function. Let’s try something simple, like adding 1 + 1:

simple_function <- function() {
    1 + 1 # This function is very simple
}

Indents make code more readable

Indenting the body of the function helps the reader notice that the code is only supposed to run inside the function. Indentation doesn’t affect our functions, but it is helpful and widespread among R coders.

To run our function, we have to write its name followed by round brackets, just like with any other function:

simple_function()

[1] 2

Remember to write the round brackets even if they are empty. The round brackets make R run the code inside the function. If we don’t write these brackets, R will show us the code inside the function (try it!).

Let’s write a function to convert Fahrenheit degrees to Celsius. We want to use this function with different temperatures, so we need to include an argument that will tell R which temperature to convert each time.

fahr_to_cels <- function(temperature) {
    (temperature - 32) / 1.8
}
fahr_to_cels(27)

[1] -2.777778

Now let’s try something a bit more complicated: solving a quadratic equation. If we have an equation of the form \(ax^2 + bx + c = 0\), then the solutions are given by \(x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\). We can write a function that will apply this formula for us.

solve_quadratic <- function(a, b, c) {
    # Quadratic equations can have two solutions
    solution_1 <- (-b + sqrt(b^2 - 4*a*c)) / 2*a # First solution
    solution_2 <- (-b - sqrt(b^2 - 4*a*c)) / 2*a # Second solution
}

Now all we need is to identify values of \(a, b,\) and \(c\) to pass as arguments.

solve_quadratic(a = 1, b = -1, c = -3)

Why didn’t our function show a result? When we run a function, R runs the code in the body and returns the result of the last line of code. In solve_quadratic(), the last line saves the second solution but doesn’t show it. We need more code to ensure that the last line displays both solutions. Since we are using a script (right?), it is easy add one more line to our function:

solve_quadratic <- function(a, b, c) {
    # Quadratic equations can have two solutions
    solution_1 <- (-b + sqrt(b^2 - 4*a*c)) / 2*a # First solution
    solution_2 <- (-b - sqrt(b^2 - 4*a*c)) / 2*a # Second solution
    c(solution_1, solution_2) # Show vector of solutions
}
solve_quadratic(a = 1, b = -1, c = -3)

[1]  2.302776 -1.302776

A more explicit way of ensuring our function will display its result is to use the return() statement inside the function:

solve_quadratic <- function(a, b, c) {
    # Quadratic equations can have two solutions
    solution_1 <- (-b + sqrt(b^2 - 4*a*c)) / 2*a # First solution
    solution_2 <- (-b - sqrt(b^2 - 4*a*c)) / 2*a # Second solution
    return(c(solution_1, solution_2)) # Return vector of solutions
}
solve_quadratic(a = 1, b = -1, c = -3)

[1]  2.302776 -1.302776

return() can appear anywhere in the body and the function output will still be whatever return() contains. Note that running return(solution_1, solution_2) will produce an error because the result of a function must always be a single object. This is why we combined both solutions from solve_quadratic() into a single vector.

Our function can have as many arguments as we like. It is enough to add their names, separated by commas, in the parentheses that follow the function. When the function runs, R will replace each argument name in the function body with the corresponding value that we supply. If we don’t supply a value, R will replace the argument name with the argument’s default value (if we defined one).

We can also use our functions to create other functions. Imagine that we want to multiply the solutions to our equation by an arbitrary value (maybe because we want to convert the units of \(x\) to something else). And let’s pretend that, by default, we want to double the solutions. We can write another function that first calls solve_quadratic() and then multiplies the result by our arbitrary number.

multiply_solutions = function(a, b, c, multiplier = 2) {
    solve_quadratic(a, b, c) * multiplier
}
multiply_solutions(a = 1, b = -1, c = -3)

[1]  4.605551 -2.605551

multiply_solutions(a = 1, b = -1, c = -3, multiplier = 10)

[1]  23.02776 -13.02776

Objects created in functions disappear

All of the objects that we create inside a function will disappear after it finishes running. Only the output will remain, and to save it we need to assign it to an object.

Being able to write our own functions is great, but we don’t need to reinvent the wheel every time we need to do something that is not available in R’s default version. We can easily download packages from CRAN’s online repositories to get many useful functions.

2.9 Acquiring external packages

To install a package from CRAN, we can use the install.packages() function. For example, if we wanted to install the package readxl (for loading .xslx files), we would need:

install.packages("readxl", dependencies = TRUE)

The argument dependencies tells R whether it should also download other packages that readxl needs to work.

R may ask you to select a CRAN mirror which, simply put, refers to the location of the servers you want to download from. Choose a mirror close to where you are.

After installing a package, we need to load it into R before we can use its functions. To load the package readxl, we need to use the function library(), which will also load any other packages required to load readxl and may print additional information.

library("readxl")

Whenever we start a new R session we need to load the packages we need. If we try to run a function without loading its package first, R will not be able to find it and will return an error message.

Writing all our library() statements at the top of our R scripts is almost always good because it helps us know that we need to load the libraries at the start our sessions. It also lets others know quickly that they will need to have those libraries installed to be able to use our code.

A library can contain many objects, but sometimes we only need one or two of its functions. To avoid loading the entire library, we can access the specific function directly by specifying the package name followed by two colons and then the function name. For example:

readxl::read_xlsx("fake_data_file.xlsx")

2.10 Exercise

Let’s try to practice all of the basic features of R that you just learned. Write a function that can simulate the roll of a pair of six-sided dice (let’s call them red and blue) an arbitrary number of times. This function should return a vector with the values of the red die that were strictly larger than the corresponding values of the blue die. Hint: to simulate rolling a die, you can use the function sample().

Exercise step by step

Step 1: define a function that takes one argument, num_rolls, representing the number of times to roll the dice.
Step 2: create two objects called red and blue to store the results from the dice rolls.
Step 3: simulate the dice rolls using function sample() (read its help page if you need to).
Step 4: create a vector of indices that identifies the values in the red die that were larger than the values in the blue die.
Step 5: use this vector of indices to extract the values from the red die.
Step 6: make sure that your function returns the values you extracted in step 5.

2.11 References

Most of this section is based on “Hands-On Programming with R”, by Garret Grolemund; and on “An Introduction to R”, by Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau.

** also works for exponentiation, but it’s better to stick to ^.↩︎
This page offers more advice for choosing names in code.↩︎
This output should show 6 names instead of 3. I can not fix the output in the notes, but the code works fine inside RStudio.↩︎
Young reader, this square is a floppy disk; not-so-young reader, I feel your pain.↩︎