Description	Keyboard shortcut (RStudio, Windows)
Clear console	Ctrl+L
Interrupt currently executing command	Esc
Run current line/selection	Ctrl+Enter
Show help for function at cursor	F1
Attempt completion	Tab or Ctrl+Space
Navigate history	Up/Down arrow

1. Atomic Vectors

There are five common types of atomic vectors that I’ll discuss in detail: logical, integer, double (often called numeric), complex, and character. (A sixth type, raw, is rarely used.)

Given a vector, you can determine its type with typeof(), or check if it’s a specific type with an “is” function: is.character(), is.double(), is.integer(), is.logical(), or, more generally, is.atomic().

Atomic vectors are usually created with c(), short for combine.

1.1 Logical vectors

# Use TRUE and FALSE (or T and F) to create logical vectors
TRUE;FALSE

## [1] TRUE

## [1] FALSE

T;F

## [1] TRUE

## [1] FALSE
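T and F are convenient shorthand, but they are ordinary variables that the base package binds to TRUE and FALSE, not reserved words, so code can silently reassign them. A minimal sketch of why scripts usually spell out TRUE and FALSE:

```r
T                # TRUE by default
T <- 0           # allowed: T is just a variable, unlike TRUE
isTRUE(T)        # FALSE now
rm(T)            # remove the shadowing variable so base's T is visible again
isTRUE(T)        # TRUE
# TRUE <- 0      # would fail: TRUE is a reserved word
```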

logical(5)

## [1] FALSE FALSE FALSE FALSE FALSE

c(TRUE,TRUE,FALSE)

## [1]  TRUE  TRUE FALSE

as.logical("false")

## [1] FALSE

is.logical(c(FALSE,FALSE,TRUE))

## [1] TRUE

1.2 Integer vectors

2L # With the L suffix, you get an integer rather than a double

## [1] 2

integer(5)

## [1] 0 0 0 0 0

c(1L,2L,3L)

## [1] 1 2 3

1L:5L

## [1] 1 2 3 4 5

is.integer(c(1,2,3))

## [1] FALSE

is.integer(c(1L,2L,3L))

## [1] TRUE
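The L suffix matters because integers and doubles are different types even when they hold the same value. A short sketch: == compares values after coercion, while identical() also compares types.

```r
typeof(2L)         # "integer"
typeof(2)          # "double"
2L == 2            # TRUE: same value after coercion
identical(2L, 2)   # FALSE: different types
```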

1.3 Numeric vectors

1.5

## [1] 1.5

numeric(5)

## [1] 0 0 0 0 0

c(1,2,3,4.5)

## [1] 1.0 2.0 3.0 4.5

c(1,2,c(2,3))

## [1] 1 2 2 3

1:5

## [1] 1 2 3 4 5

seq(1,10,2)

## [1] 1 3 5 7 9

1+1:5

## [1] 2 3 4 5 6
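Two details of these examples are easy to miss. First, the colon operator returns an integer vector when its start value is a whole number, so 1:5 is integer even though it sits under "numeric", while seq(1, 10, 2) returns a double. Second, the colon binds more tightly than binary + but less tightly than unary minus. A quick check:

```r
typeof(1:5)            # "integer"
typeof(seq(1, 10, 2))  # "double"
1 + 1:5                # 1 + (1:5)  -> 2 3 4 5 6
(1 + 1):5              # 2:5        -> 2 3 4 5
-1:2                   # (-1):2     -> -1 0 1 2
-(1:2)                 # -1 -2
```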

dbl <- c(1, 2.5, 4.5)
typeof(dbl)

## [1] "double"

class(dbl)

## [1] "numeric"

is.numeric(dbl)

## [1] TRUE

is.double(dbl)

## [1] TRUE

is.atomic(dbl)

## [1] TRUE
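is.numeric() is broader than is.double(): it returns TRUE for both integer and double vectors, which makes it the better test when you only care whether something is a number. A small comparison:

```r
int <- c(1L, 2L, 3L)
dbl <- c(1, 2.5, 4.5)
is.numeric(int); is.numeric(dbl)   # TRUE TRUE
is.double(int);  is.double(dbl)    # FALSE TRUE
class(int)                         # "integer"
```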

1.4 Complex vectors

1+1i  # 1+i will not work: R would look for an object named i
complex(5)
c(1+1i,1+2i,1+3i)
is.complex(c(1+1i,1+2i,1+3i))
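Base R provides helpers for taking complex numbers apart. The sketch below also shows why the complex type is occasionally useful: the square root of a negative double is NaN (with a warning), but the same value as a complex number has a well-defined root.

```r
z <- 3+4i
Re(z)     # 3
Im(z)     # 4
Mod(z)    # 5, the modulus sqrt(3^2 + 4^2)
Conj(z)   # 3-4i
sqrt(as.complex(-1))  # 0+1i
# sqrt(-1) on a double returns NaN with a warning
```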

1.5 Character vectors

"hello, world!"

## [1] "hello, world!"

character(3)

## [1] "" "" ""

c("Hello","World")

## [1] "Hello" "World"

c('This','is','something')

## [1] "This"      "is"        "something"

"This is a 'character' \"enclosed\" in double quotes"

## [1] "This is a 'character' \"enclosed\" in double quotes"
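The backslashes in that output are only how print() displays the string; they are not part of its content. A short sketch: writeLines() shows the text itself, nchar() counts each escaped quote as one character, and a single-quoted string can contain double quotes without escaping.

```r
s <- "This is a 'character' \"enclosed\" in double quotes"
print(s)      # shows the escapes, as at the console
writeLines(s) # shows the actual text, with no backslashes
nchar(s)      # 49: each \" counts as one character
identical('say "hi"', "say \"hi\"")  # TRUE
```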

2.1 Coercion-Logical vectors

All elements of an atomic vector must be the same type, so when you combine different types they are coerced to the most flexible type. Types from least to most flexible are: logical, integer, double, complex, and character. You can also convert types explicitly with the as.*() functions: as.logical(), as.integer(), as.double() (or as.numeric()), as.complex(), and as.character().
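Combining values with c() shows the ordering directly, with complex sitting between double and character. Each result takes the type of its most flexible element:

```r
typeof(c(TRUE, 1L))     # "integer"
typeof(c(1L, 1.5))      # "double"
typeof(c(1.5, 1i))      # "complex"
typeof(c(1i, "a"))      # "character"
c(TRUE, 1.5, "a")       # "TRUE" "1.5" "a"
```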

x <- c(FALSE, TRUE, FALSE, TRUE)
as.numeric(x)

## [1] 0 1 0 1

as.integer(x)

## [1] 0 1 0 1

as.character(x)

## [1] "FALSE" "TRUE"  "FALSE" "TRUE"

as.complex(x)

## [1] 0+0i 1+0i 0+0i 1+0i

2.2 Coercion-Integer vectors

x <- 1L:5L
as.logical(x)

## [1] TRUE TRUE TRUE TRUE TRUE

as.numeric(x)

## [1] 1 2 3 4 5

as.character(x)

## [1] "1" "2" "3" "4" "5"

as.complex(x)

## [1] 1+0i 2+0i 3+0i 4+0i 5+0i

2.3 Coercion-Numeric vectors

x <- c(1.5,2.3,7.9,0.1)
as.logical(x)

## [1] TRUE TRUE TRUE TRUE

as.integer(x)

## [1] 1 2 7 0

as.character(x)

## [1] "1.5" "2.3" "7.9" "0.1"

as.complex(x)

## [1] 1.5+0i 2.3+0i 7.9+0i 0.1+0i
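The as.integer() result above (7.9 becomes 7) shows that conversion to integer truncates toward zero rather than rounding; round first if you want the nearest whole number. Likewise, as.logical() maps only zero to FALSE. A small check:

```r
x <- c(-1.7, -0.2, 0, 1.7)
as.integer(x)            # -1  0  0  1: truncation toward zero
as.integer(round(x))     # -2  0  0  2: rounding first
as.logical(x)            # TRUE TRUE FALSE TRUE: only 0 becomes FALSE
```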

2.4 Coercion-Complex vectors

x <- c(1+2i, 3+1i, 6-0.5i)
as.logical(x)

## [1] TRUE TRUE TRUE

as.integer(x)

## Warning: imaginary parts discarded in coercion

## [1] 1 3 6

as.numeric(x)

## Warning: imaginary parts discarded in coercion

## [1] 1 3 6

as.character(x)

## [1] "1+2i"   "3+1i"   "6-0.5i"

2.5 Coercion-Character vectors

x <- c("1","a","3","d")
as.logical(x)

## [1] NA NA NA NA

as.integer(x)

## Warning: NAs introduced by coercion

## [1]  1 NA  3 NA

as.numeric(x)

## Warning: NAs introduced by coercion

## [1]  1 NA  3 NA

as.complex(x)

## Warning: NAs introduced by coercion

## [1] 1+0i   NA 3+0i   NA

2.6 Coercion(cont’d)

c("a", 1)

## [1] "a" "1"

TRUE + 1

## [1] 2

Coercion often happens automatically. Most mathematical functions (+, log, abs, etc.) coerce to a double or integer, and most logical operations (&, |, any, etc.) coerce to a logical. You will usually get a warning message if the coercion might lose information.
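A common, deliberate use of this coercion is counting with logical vectors: arithmetic treats TRUE as 1 and FALSE as 0, so sum() counts the matches and mean() gives their proportion. A minimal sketch:

```r
x <- c(3, 8, 1, 10)
x > 2          # TRUE TRUE FALSE TRUE
sum(x > 2)     # 3: number of elements greater than 2
mean(x > 2)    # 0.75: proportion of elements greater than 2
```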

Q1: Is c('h','i') == "hi" TRUE or FALSE?
Q2: Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?

3. Lists

(x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9)))

## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1]  TRUE FALSE  TRUE
## 
## [[4]]
## [1] 2.3 5.9

str(x)

## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.3 5.9

(l1 <- list(a=1,b=2,c=3)) # same as `as.list(c(a=1,b=2,c=3))`

## $a
## [1] 1
## 
## $b
## [1] 2
## 
## $c
## [1] 3

Lists differ from atomic vectors because their elements can be of any type, including other lists.

Because a list can contain other lists, lists are sometimes called recursive vectors.
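A short sketch of what "recursive" means in practice: is.recursive() is TRUE for lists and FALSE for atomic vectors, and a nested list counts as a single element of its parent.

```r
x <- list(list(list(list())))
str(x)
is.recursive(x)     # TRUE
is.recursive(1:3)   # FALSE: atomic vectors cannot contain other vectors
y <- list(1, "a", list(TRUE, 2.5))
length(y)           # 3: the nested list counts as one element
```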

You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().

x <- list(a=1,b=2,c=3)
# unlist() simplifies the list to a vector containing all of its atomic components
unlist(x)

## a b c 
## 1 2 3

l4 <- list(a=c(1,2),b=c(2,3,4),c="hello")
unlist(l4)

##      a1      a2      b1      b2      b3       c 
##     "1"     "2"     "2"     "3"     "4" "hello"
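unlist() also takes a use.names argument for dropping the generated names, and by default it flattens nested lists all the way down. A small sketch:

```r
l4 <- list(a = c(1, 2), b = c(2, 3, 4), c = "hello")
unlist(l4, use.names = FALSE)       # "1" "2" "2" "3" "4" "hello", no names
unlist(list(1, list(2, list(3))))   # 1 2 3: nested lists are flattened too
```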

Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists.

mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)

## [1] TRUE
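Because these objects are lists, ordinary list tools such as names() and $ work on them. A quick sketch using the same model and the built-in mtcars data:

```r
mod <- lm(mpg ~ wt, data = mtcars)
"coefficients" %in% names(mod)  # TRUE
mod$coefficients                # intercept and slope, accessed like any list element
is.list(mtcars)                 # TRUE: a data frame is a list of columns
```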

4. Attributes

All objects can have arbitrary additional attributes, used to store metadata about the object. Attributes can be thought of as a named list (with unique names).

y <- 1:10; attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")

## [1] "This is a vector"

attributes(y)

## $my_attribute
## [1] "This is a vector"

str(attributes(y))

## List of 1
##  $ my_attribute: chr "This is a vector"

The structure() function returns a new object with modified attributes:

structure(1:10, my_attribute = "This is a vector")

##  [1]  1  2  3  4  5  6  7  8  9 10
## attr(,"my_attribute")
## [1] "This is a vector"

By default, most attributes are lost when modifying a vector.

attributes(y[1])

## NULL

attributes(sum(y))

## NULL

The only attributes not lost are the three most important:

Names, a character vector giving each element a name.
Dimensions, used to turn vectors into matrices and arrays.
Class, used to implement the S3 object system.

Each of these attributes has a specific accessor function to get and set values. When working with these attributes, use names(x), class(x), and dim(x), not attr(x, "names"), attr(x, "class"), and attr(x, "dim").
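A short sketch of the difference: after subsetting, the names survive but an arbitrary attribute does not, and names are changed through the names() accessor.

```r
v <- structure(c(a = 1, b = 2, c = 3), my_attribute = "metadata")
attributes(v)         # $names and $my_attribute
attributes(v[1:2])    # only $names survives subsetting
names(v) <- toupper(names(v))   # preferred over attr(v, "names") <- ...
names(v)                        # "A" "B" "C"
```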

4.1 Attributes-Names

You can name a vector in three ways:

When creating it: x <- c(a = 1, b = 2, c = 3).
By modifying an existing vector in place: x <- 1:3; names(x) <- c("a", "b", "c"). (attr(x, "names") <- c("a", "b", "c") also works, but names() is preferred.)
By creating a modified copy of a vector: x <- setNames(1:3, c("a", "b", "c")).

Names don’t have to be unique. However, character subsetting is the most important reason to use names and it is most useful when the names are unique.
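Why unique names matter for character subsetting: with duplicates, x["a"] returns only the first matching element. A small sketch:

```r
x <- c(a = 1, a = 2, b = 3)
x["a"]               # a 1: only the first element named "a"
x[names(x) == "a"]   # both elements named "a"
```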

Not all elements of a vector need to have a name. If some names are missing, names() will return an empty string for those elements. If all names are missing, names() will return NULL.

y <- c(a = 1, 2, 3)
names(y)

## [1] "a" ""  ""

z <- c(1, 2, 3)
names(z)

## NULL

You can create a new vector without names using unname(x), or remove names in place with names(x) <- NULL.

4.2 Attributes-Dimensions

The dim attribute turns a vector into a matrix or an array.

x <- 1:10
dim(x) <- c(2,5,1)
dim(x)

## [1] 2 5 1
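Whether the result counts as a matrix depends only on the number of dimensions, and setting dim to NULL turns it back into a plain vector. A sketch using a separate variable m:

```r
m <- 1:10
dim(m) <- c(2, 5)
is.matrix(m)    # TRUE: two dimensions
dim(m) <- c(2, 5, 1)
is.matrix(m)    # FALSE
is.array(m)     # TRUE
dim(m) <- NULL  # back to a plain vector
is.array(m)     # FALSE
```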

4.3 Attributes-Class

An object can have any class, including one you define yourself, and it can have more than one class.

R has a simple generic function mechanism that can be used for an object-oriented style of programming. Method dispatch takes place based on the class of the first argument to the generic function.

class(x) <- "A"
class(x)

## [1] "A"
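A minimal sketch of that dispatch mechanism (S3). The generic describe() and its methods are hypothetical, defined only for this illustration: UseMethod() picks a method from the class of the first argument, tries each class in order when there are several, and falls back to the default method.

```r
# describe() is a hypothetical generic, not part of base R
describe <- function(obj, ...) UseMethod("describe")
describe.A <- function(obj, ...) "an object of class A"
describe.default <- function(obj, ...) "some other object"

obj <- structure(1:10, class = "A")
describe(obj)       # "an object of class A"
describe(1:10)      # "some other object"
class(obj) <- c("B", "A")   # more than one class
describe(obj)       # no describe.B, so dispatch falls through to describe.A
```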

5. Factors

A factor is a vector that can contain only predefined values, and is used to store categorical data.
Factors are built on top of integer vectors using two attributes: the class(), “factor”, which makes them behave differently from regular integer vectors, and the levels(), which defines the set of allowed values.

x <- factor(c("a", "b", "b", "a"))
x

## [1] a b b a
## Levels: a b

class(x)

## [1] "factor"

levels(x)

## [1] "a" "b"
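A short sketch of what "built on top of integer vectors" and "predefined values" mean: the underlying type is integer, the values are codes into levels(), and assigning a value that is not a level produces NA with a warning.

```r
x <- factor(c("a", "b", "b", "a"))
typeof(x)        # "integer"
as.integer(x)    # 1 2 2 1: codes pointing into levels(x)
unclass(x)       # the codes plus a "levels" attribute
x[2] <- "c"      # warns: "c" is not one of the levels
x                # the second element is now NA
```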

Sometimes when a data frame is read directly from a file, a column you’d thought would produce a numeric vector instead produces a factor (in R 4.0.0 and later, where stringsAsFactors defaults to FALSE, it produces a character vector instead). This is caused by a non-numeric value in the column, often a missing value encoded in a special way like . or -.

# Reading in "text" instead of from a file here:
cat("value\n12\n1\n.\n9")

## value
## 12
## 1
## .
## 9

z <- read.csv(text = "value\n12\n1\n.\n9", stringsAsFactors = TRUE) # the default before R 4.0.0
str(z)

## 'data.frame':    4 obs. of  1 variable:
##  $ value: Factor w/ 4 levels ".","1","12","9": 3 2 1 4

as.double(z$value)

## [1] 3 2 1 4

# Oops, that's not right: 3 2 1 4 are the factor's integer codes (positions in its levels), not the values we read in!
class(z$value)

## [1] "factor"

# We can fix it now:
as.double(as.character(z$value))

## Warning: NAs introduced by coercion

## [1] 12  1 NA  9

# Or change how we read it in:
z <- read.csv(text = "value\n12\n1\n.\n9", na.strings=".")
class(z$value)

## [1] "integer"
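R FAQ 7.10 recommends a slightly more efficient alternative to as.double(as.character(f)): convert the levels once, then index them with the factor, whose integer codes select the right level for each element. A sketch on the same data:

```r
f <- factor(c("12", "1", ".", "9"))
vals <- as.numeric(levels(f))[f]   # warns because "." cannot be converted
vals                               # 12 1 NA 9
```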

Before R 4.0.0, data.frame() and base data loading functions such as read.csv() and read.table() converted character vectors to factors by default. This is suboptimal, because there’s no way for those functions to know the set of all possible levels or their optimal order. Instead, use the argument stringsAsFactors = FALSE (the default since R 4.0.0) to suppress this behaviour, and then manually convert character vectors to factors using your knowledge of the data.
While factors look (and often behave) like character vectors, they are actually integers. Be careful when treating them like strings. It’s usually best to explicitly convert factors to character vectors if you need string-like behaviour.

What happens to a factor when you modify its levels?

f1 <- factor(letters[1:5])
levels(f1) <- rev(levels(f1)) # rev() reverses the order of elements

What does this code do? How do f2 and f3 differ from f1?

f2 <- rev(factor(letters[1:5]))

f3 <- factor(letters[1:5], levels = rev(letters[1:5]))

6. Matrices and arrays

All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.

# Two scalar arguments to specify rows and columns
a <- matrix(1:6, ncol = 3, nrow = 2)
# One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))

dim(a); dim(b)

## [1] 2 3

## [1] 2 3 2
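matrix() fills its values column by column unless you ask for byrow = TRUE. A quick check on the same a:

```r
a <- matrix(1:6, ncol = 3, nrow = 2)
a[, 1]     # 1 2: the first column is filled first
a[1, ]     # 1 3 5
matrix(1:6, ncol = 3, byrow = TRUE)[1, ]   # 1 2 3: filled by rows instead
```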

length() and names() have higher-dimensional generalisations:

length() generalises to nrow() and ncol() for matrices, and dim() for arrays.
names() generalises to rownames() and colnames() for matrices, and dimnames(), a list of character vectors, for arrays.

length(a)
nrow(a)
ncol(a)
rownames(a) <- c("A", "B")
colnames(a) <- c("a", "b", "c")
a

length(b)
dim(b)
dimnames(b) <- list(c("one", "two"), c("a", "b", "c"), c("A", "B"))
b

matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,byrow=TRUE, dimnames=list(c("r1","r2","r3"),c("c1","c2","c3")))

array(c(0,1,2,3,4,5,6,7,8,9),dim=c(1,5,2), dimnames=list(c("r1"),c("c1","c2","c3","c4","c5"),c("k1","k2")))

c() generalises to cbind() and rbind() for matrices, and to abind() (provided by the abind package) for arrays. You can transpose a matrix with t(); the generalised equivalent for arrays is aperm().
You can test if an object is a matrix or array using is.matrix() and is.array(), or by looking at the length of the dim(). as.matrix() and as.array() make it easy to turn an existing vector into a matrix or array.
Vectors are not the only 1-dimensional data structure. You can have matrices with a single row or single column, or arrays with a single dimension.
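A sketch of the base functions mentioned above, checked through the dimensions they produce (abind() is left out because it needs the abind package):

```r
m <- matrix(1:6, nrow = 2)
dim(cbind(m, 7:8))     # 2 4: a column appended
dim(rbind(m, 7:9))     # 3 3: a row appended
dim(t(m))              # 3 2: transposed
arr <- array(1:24, c(2, 3, 4))
dim(aperm(arr))                  # 4 3 2: by default the dimensions are reversed
dim(aperm(arr, c(2, 1, 3)))      # 3 2 4: swap the first two, like t()
length(dim(arr))                 # 3
```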

str(1:3)                   # 1d vector

##  int [1:3] 1 2 3

str(matrix(1:3, ncol = 1)) # column vector

##  int [1:3, 1] 1 2 3

str(matrix(1:3, nrow = 1)) # row vector

##  int [1, 1:3] 1 2 3

str(array(1:3, 3))         # "array" vector

##  int [1:3(1d)] 1 2 3

7. Data frames

A data frame is the most common way of storing data in R, and if used systematically makes data analysis easier. Under the hood, a data frame is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list.
This means that a data frame has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a data frame is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.
You can subset a data frame like a 1d structure (where it behaves like a list), or a 2d structure (where it behaves like a matrix).
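A small sketch of these list-like and matrix-like properties: names() and colnames() agree, length() equals ncol(), single-bracket subsetting with one index keeps a data frame, and matrix-style subsetting of a single column simplifies to a vector.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
names(df); colnames(df)   # "x" "y" both times
length(df); ncol(df)      # 2 2
nrow(df)                  # 3
df["x"]                   # list-style: a one-column data frame
df[, "x"]                 # matrix-style: simplified to the vector 1 2 3
```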

7.1 Data frames-Creation

You create a data frame using data.frame(), which takes named vectors as input:

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df

##   x y
## 1 1 a
## 2 2 b
## 3 3 c

Beware data.frame()’s default behaviour in R versions before 4.0.0, which turns strings into factors. Use stringsAsFactors = FALSE to suppress it; since R 4.0.0 this is the default.

7.2 Data frames-Testing

Because a data.frame is an S3 class, its type reflects the underlying vector used to build it: the list. To check if an object is a data frame, use class() or test explicitly with is.data.frame():

typeof(df)

## [1] "list"

class(df)

## [1] "data.frame"

is.data.frame(df)

## [1] TRUE

7.3 Data frames-Combination

You can combine data frames using cbind() and rbind():

cbind(df, data.frame(z = 3:1))

##   x y z
## 1 1 a 3
## 2 2 b 2
## 3 3 c 1

rbind(df, data.frame(x = 10, y = "z"))

##    x y
## 1  1 a
## 2  2 b
## 3  3 c
## 4 10 z

When combining column-wise, the number of rows must match, but row names are ignored. When combining row-wise, both the number and names of columns must match. Use plyr::rbind.fill() to combine data frames that don’t have the same columns.
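A sketch of the row-wise rule: mismatched column names raise an error (caught here with tryCatch() so the example keeps running), while columns supplied in a different order are matched by name.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
res <- tryCatch(rbind(df, data.frame(x = 10, z = "z")),
                error = function(e) e)
inherits(res, "error")   # TRUE: column names do not match
rbind(df, data.frame(y = "d", x = 4))   # matched by name: x = 1 2 3 4
```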

7.4 Data frames-Combination(cont’d)

It’s a common mistake to try and create a data frame by cbind()ing vectors together. This doesn’t work because cbind() will create a matrix unless one of the arguments is already a data frame. Instead use data.frame() directly:

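# cbind() on two vectors builds a matrix, and a matrix holds a single type,
# so the integer column a is coerced to character along with b
bad <- data.frame(cbind(a = 1:2, b = c("a", "b")))
str(bad)
# data.frame() keeps each column's own type
good <- data.frame(a = 1:2, b = c("a", "b"),
  stringsAsFactors = FALSE)
str(good)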

7.5 Special columns

Since a data frame is a list of vectors, it is possible for a data frame to have a column that is a list:

df <- data.frame(x = 1:3)
df$y <- list(1:2, 1:3, 1:4)
str(df)

## 'data.frame':    3 obs. of  2 variables:
##  $ x: int  1 2 3
##  $ y:List of 3
##   ..$ : int  1 2
##   ..$ : int  1 2 3
##   ..$ : int  1 2 3 4

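# Wrapping the list in I() stops data.frame() from treating each element as a separate column
dfl <- data.frame(x = 1:3, y = I(list(1:2, 1:3, 1:4)))
str(dfl)
dfl[2, "y"]   # row 2 of the list column y, returned as a one-element list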

Use list and array columns with caution: many functions that work with data frames assume that all columns are atomic vectors.

7.6 Data frames- Coercion

You can coerce an object to a data frame with as.data.frame():

A vector will create a one-column data frame.
A list will create one column for each element; it’s an error if they’re not all the same length.
A matrix will create a data frame with the same number of columns and rows.
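A quick check of the vector and list rules above; the unequal-length case is caught with tryCatch() so the example keeps running.

```r
as.data.frame(c(1, 9, 5))                      # one column, three rows
as.data.frame(list(a = 1:2, b = c("x", "y")))  # one column per list element
res <- tryCatch(as.data.frame(list(a = 1:2, b = 1:3)),
                error = function(e) e)
inherits(res, "error")   # TRUE: the elements imply differing numbers of rows
```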

7.7 Coercion-matrix, data.frame, vector

m <- matrix(1:6, ncol = 3, nrow = 2)
d <- data.frame(x = 1:3,y = c("a", "b", "c"),stringsAsFactors = FALSE)

as.data.frame(m)

##   V1 V2 V3
## 1  1  3  5
## 2  2  4  6

as.matrix(d)

##      x   y  
## [1,] "1" "a"
## [2,] "2" "b"
## [3,] "3" "c"

as.character(m)

## [1] "1" "2" "3" "4" "5" "6"

as.numeric(m)

## [1] 1 2 3 4 5 6

as.list(m)

## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5
## 
## [[6]]
## [1] 6

as.list(d)

## $x
## [1] 1 2 3
## 
## $y
## [1] "a" "b" "c"

v <- c(1,9,5)
class(as.matrix(v)) # R >= 4.0.0; older versions print only "matrix"

## [1] "matrix" "array"

class(as.array(v))

## [1] "array"

class(as.data.frame(v))

## [1] "data.frame"

Chapter 2. Data Structures

0.1 Review of Chapter 1

1. Atomic Vectors

1.1 Logical vectors

1.2 Integer vectors

1.3 Numeric vectors

1.4 Complex vectors

1.5 Character vectors

2.1 Coercion-Logical vectors

2.2 Coercion-Integer vectors

2.3 Coercion-Numeric vectors

2.4 Coercion-Complex vectors

2.5 Coercion-Character vectors

2.6 Coercion(cont’d)

3. Lists

4. Attributes

4.1 Attributes-Names

4.2 Attributes-Dimensions

4.3 Attributes-Class

5. Factors

6. Matrices and arrays

7. Data frames

7.1 Data frames-Creation

7.2 Data frames-Testing

7.3 Data frames-Combination

7.4 Data frames-Combination(cont’d)

7.5 Special columns

7.6 Data frames- Coercion

7.7 Coercion-matrix, data.frame, vector

7.8 Other useful functions