Chapter 2. Data Structures

Xiaojun Sun

Thursday, October 09, 2014

0.1 Review of Chapter 1

Description                             Keyboard (Windows)
Clear console                           Ctrl+L
Interrupt currently executing command   Esc
Run current line/selection              Ctrl+Enter
Show help for function at cursor        F1
Attempt completion                      Tab or Ctrl+Space
Navigate history                        Up/Down arrow

0.1 Review of Chapter 1 (cont’d)

# assign
x <- 1
str(x)

# get help
?plot

# working directory
getwd()
setwd("E:/Project/WISE R Club/LearnR")

# options
?options
options(digits=3)

# packages
install.packages("ggplot2")
library(ggplot2)

1. Atomic Vectors

There are five common types of atomic vectors that I’ll discuss in detail: logical, integer, double (often called numeric), complex and character.

1.1 Logical vectors

# Use TRUE and FALSE (or T and F) to create logical vectors
TRUE;FALSE
## [1] TRUE
## [1] FALSE
T;F
## [1] TRUE
## [1] FALSE
logical(5)
## [1] FALSE FALSE FALSE FALSE FALSE
c(TRUE,TRUE,FALSE)
## [1]  TRUE  TRUE FALSE
as.logical("false")
## [1] FALSE
is.logical(c(FALSE,FALSE,TRUE))
## [1] TRUE

1.2 Integer vectors

2L # With the L suffix, you get an integer rather than a double
## [1] 2
integer(5)
## [1] 0 0 0 0 0
c(1L,2L,3L)
## [1] 1 2 3
1L:5L
## [1] 1 2 3 4 5
is.integer(c(1,2,3))
## [1] FALSE
is.integer(c(1L,2L,3L))
## [1] TRUE

1.3 Numeric vectors

1.5
## [1] 1.5
numeric(5)
## [1] 0 0 0 0 0
c(1,2,3,4.5)
## [1] 1.0 2.0 3.0 4.5
c(1,2,c(2,3))
## [1] 1 2 2 3
1:5
## [1] 1 2 3 4 5
seq(1,10,2)
## [1] 1 3 5 7 9
1+1:5
## [1] 2 3 4 5 6

1.3 Numeric vectors(cont’d)

dbl <- c(1, 2.5, 4.5)
typeof(dbl) 
## [1] "double"
class(dbl)
## [1] "numeric"
is.numeric(dbl)
## [1] TRUE
is.double(dbl)
## [1] TRUE
is.atomic(dbl)
## [1] TRUE

1.4 Complex vectors

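1+1i  # 1+i will not work: the imaginary unit must be written with a coefficient
## [1] 1+1i
complex(5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
c(1+1i,1+2i,1+3i)
## [1] 1+1i 1+2i 1+3i
is.complex(c(1+1i,1+2i,1+3i))
## [1] TRUE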

1.5 Character vectors

"hello, world!"
## [1] "hello, world!"
character(3)
## [1] "" "" ""
c("Hello","World")
## [1] "Hello" "World"
c('This','is','something')
## [1] "This"      "is"        "something"
"This is a 'character' \"enclosed\" in double quotes"
## [1] "This is a 'character' \"enclosed\" in double quotes"

2.1 Coercion-Logical vectors

All elements of an atomic vector must be the same type, so when you attempt to combine different types they will be coerced to the most flexible type. Types from least to most flexible are: logical, integer, double, complex, and character. You can convert an object to a given type explicitly with the as.*() functions, such as as.numeric() or as.character().

x <- c(FALSE, TRUE, FALSE, TRUE)
as.numeric(x)
## [1] 0 1 0 1
as.integer(x)
## [1] 0 1 0 1
as.character(x)
## [1] "FALSE" "TRUE"  "FALSE" "TRUE"
as.complex(x)
## [1] 0+0i 1+0i 0+0i 1+0i

2.2 Coercion-Integer vectors

x <- 1L:5L
as.logical(x)
## [1] TRUE TRUE TRUE TRUE TRUE
as.numeric(x)
## [1] 1 2 3 4 5
as.character(x)
## [1] "1" "2" "3" "4" "5"
as.complex(x)
## [1] 1+0i 2+0i 3+0i 4+0i 5+0i

2.3 Coercion-Numeric vectors

x <- c(1.5,2.3,7.9,0.1)
as.logical(x)
## [1] TRUE TRUE TRUE TRUE
as.integer(x)
## [1] 1 2 7 0
as.character(x)
## [1] "1.5" "2.3" "7.9" "0.1"
as.complex(x)
## [1] 1.5+0i 2.3+0i 7.9+0i 0.1+0i

2.4 Coercion-Complex vectors

x <- c(1+2i, 3+1i, 6-0.5i)
as.logical(x)
## [1] TRUE TRUE TRUE
as.integer(x)
## Warning: imaginary parts discarded in coercion
## [1] 1 3 6
as.numeric(x)
## Warning: imaginary parts discarded in coercion
## [1] 1 3 6
as.character(x)
## [1] "1+2i"   "3+1i"   "6-0.5i"

2.5 Coercion-Character vectors

x <- c("1","a","3","d")
as.logical(x)
## [1] NA NA NA NA
as.integer(x)
## Warning: NAs introduced by coercion
## [1]  1 NA  3 NA
as.numeric(x)
## Warning: NAs introduced by coercion
## [1]  1 NA  3 NA
as.complex(x)
## Warning: NAs introduced by coercion
## [1] 1+0i   NA 3+0i   NA

2.6 Coercion(cont’d)

c("a", 1)
## [1] "a" "1"
TRUE + 1
## [1] 2
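
The coercion order can be checked directly with typeof(). The following is a small sketch (the variable names are illustrative and not part of the original slides) that also shows why logical vectors are convenient to sum or average:

```r
# c() coerces its arguments to the most flexible type present
t1 <- typeof(c(TRUE, 1L))     # logical + integer   -> "integer"
t2 <- typeof(c(1L, 2.5))      # integer + double    -> "double"
t3 <- typeof(c(2.5, 1+1i))    # double + complex    -> "complex"
t4 <- typeof(c(1+1i, "a"))    # complex + character -> "character"

# In arithmetic, TRUE counts as 1 and FALSE as 0
flags <- c(FALSE, TRUE, FALSE, TRUE)
n_true <- sum(flags)          # number of TRUE values
share_true <- mean(flags)     # proportion of TRUE values
```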

3. Lists

(x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9)))
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1]  TRUE FALSE  TRUE
## 
## [[4]]
## [1] 2.3 5.9
str(x)
## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.3 5.9
(l1 <- list(a=1,b=2,c=3)) # same as `as.list(c(a=1,b=2,c=3))`
## $a
## [1] 1
## 
## $b
## [1] 2
## 
## $c
## [1] 3

3. Lists(cont’d)

x <- list(a=1,b=2,c=3)
# unlist() flattens a list into a vector containing all of its atomic components
unlist(x)
## a b c 
## 1 2 3
l4 <- list(a=c(1,2),b=c(2,3,4),c="hello")
unlist(l4)
##      a1      a2      b1      b2      b3       c 
##     "1"     "2"     "2"     "3"     "4" "hello"

3. Lists(cont’d)

Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists.

mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)
## [1] TRUE
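
Because both objects are lists, the usual list tools apply to them. A brief sketch, using the built-in mtcars data (the variable names below are illustrative):

```r
mod <- lm(mpg ~ wt, data = mtcars)

# A data frame is a list with one element per column
df_is_list <- is.list(mtcars)
n_cols <- length(mtcars)          # number of columns, not rows

# A fitted model is a list of named components
comp_names <- names(mod)          # e.g. "coefficients", "residuals", ...
slope <- mod$coefficients[["wt"]] # extract one component like any list element
```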

4. Attributes

All objects can have arbitrary additional attributes, used to store metadata about the object. Attributes can be thought of as a named list (with unique names).

y <- 1:10; attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
attributes(y)
## $my_attribute
## [1] "This is a vector"
str(attributes(y))
## List of 1
##  $ my_attribute: chr "This is a vector"

4. Attributes(cont’d)

The structure() function returns a new object with modified attributes:

structure(1:10, my_attribute = "This is a vector")
##  [1]  1  2  3  4  5  6  7  8  9 10
## attr(,"my_attribute")
## [1] "This is a vector"

By default, most attributes are lost when modifying a vector.

attributes(y[1])
## NULL
attributes(sum(y))
## NULL

4. Attributes(cont’d)

The only attributes not lost are the three most important: names, dimensions, and class.

Each of these attributes has a specific accessor function to get and set values. When working with these attributes, use names(x), class(x), and dim(x), not attr(x, "names"), attr(x, "class"), and attr(x, "dim").
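
As a small sketch of this behaviour (the vector and attribute name are illustrative), subsetting with `[` keeps names but drops a custom attribute. Exactly which attributes survive depends on the operation, so treat this as one example rather than a general rule:

```r
y <- structure(c(a = 1, b = 2, c = 3), my_attribute = "This is a vector")

sub <- y[1:2]
kept_names <- names(sub)                 # names survive subsetting
lost_attr <- attr(sub, "my_attribute")   # the custom attribute is gone (NULL)
```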

4.1 Attributes-Names

You can name a vector in three ways:

Names don’t have to be unique. However, character subsetting is the most important reason to use names and it is most useful when the names are unique.
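
As a sketch, three standard ways to attach names are: at creation, by assigning to names() in place, and by making a named copy with setNames(). This particular set is an assumption, since the slide does not spell the three ways out:

```r
# 1. When creating the vector
x1 <- c(a = 1, b = 2, c = 3)

# 2. By modifying an existing vector in place
x2 <- 1:3
names(x2) <- c("a", "b", "c")

# 3. By creating a modified copy
x3 <- setNames(1:3, c("a", "b", "c"))

# Character subsetting then works on any of them
b_value <- x1[["b"]]
```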

4.1 Attributes-Names(cont’d)

Not all elements of a vector need to have a name. If some names are missing, names() will return an empty string for those elements. If all names are missing, names() will return NULL.

y <- c(a = 1, 2, 3)
names(y)
## [1] "a" ""  ""
z <- c(1, 2, 3)
names(z)
## NULL

You can create a new vector without names using unname(x), or remove names in place with names(x) <- NULL.
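
The difference between the two approaches can be seen in a short sketch (variable names are illustrative): unname() leaves the original untouched, while assigning NULL changes it.

```r
x <- c(a = 1, b = 2, c = 3)

x_bare <- unname(x)               # a copy without names
names_after_unname <- names(x)    # x itself still has its names

names(x) <- NULL                  # now the names are removed from x itself
```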

4.2 Attributes-Dimensions

The dim attribute provides a way to turn a vector into a matrix or array.

x <- 1:10
dim(x) <- c(2,5,1)
dim(x)
## [1] 2 5 1
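
A minimal sketch of the same idea in two dimensions (variable names are illustrative): setting dim on a vector gives the same shape and values as calling matrix(), and the product of the dimensions has to match the vector's length.

```r
x <- 1:10
dim(x) <- c(2, 5)             # x is now a 2 x 5 matrix, filled column by column

m <- matrix(1:10, nrow = 2)   # the usual constructor
same_shape <- identical(dim(x), dim(m))
same_values <- all(x == m)

# Dimensions whose product differs from length(x) are rejected
bad_dim <- tryCatch({
  z <- 1:10
  dim(z) <- c(3, 4)
  "ok"
}, error = function(e) "error")
```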

4.3 Attributes-Class

An object can have more than one class, and you can define classes of your own.

R has a simple generic-function mechanism that supports an object-oriented style of programming. Method dispatch is based on the class of the first argument to the generic function.

class(x) <- "A"
class(x)
## [1] "A"
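
To illustrate dispatch on the class attribute, here is a minimal sketch. The generic describe() and its methods are hypothetical names invented for this example; only UseMethod(), class() and inherits() are part of R itself.

```r
# A hypothetical generic: UseMethod() dispatches on the class of the first argument
describe <- function(x, ...) UseMethod("describe")
describe.A <- function(x, ...) "an object of class A"
describe.default <- function(x, ...) "some other object"

x <- 1:10
class(x) <- "A"
r_a <- describe(x)          # dispatches to describe.A
r_default <- describe(1:10) # no class attribute, so describe.default is used

# With more than one class, methods are tried from left to right
class(x) <- c("B", "A")
r_ba <- describe(x)         # there is no describe.B, so describe.A is used
is_a <- inherits(x, "A")
```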

5. Factors

x <- factor(c("a", "b", "b", "a"))
x
## [1] a b b a
## Levels: a b
class(x)
## [1] "factor"
levels(x)
## [1] "a" "b"
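
Under the hood a factor is an integer vector with a levels attribute. The following sketch (the `sex` example is illustrative) shows the integer codes and how fixed levels are still counted even when they do not occur in the data:

```r
x <- factor(c("a", "b", "b", "a"))
x_type <- typeof(x)        # the storage type is integer
x_codes <- as.integer(x)   # position of each value within levels(x)

# Levels can be fixed in advance, even if some never appear
sex <- factor(c("m", "m", "m"), levels = c("m", "f"))
counts <- table(sex)       # "f" is reported with a count of zero
```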

5. Factors(cont’d)

Sometimes when a data frame is read directly from a file, a column you thought would produce a numeric vector instead produces a factor (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE; newer versions produce a character vector instead). This is caused by a non-numeric value in the column, often a missing value encoded in a special way such as . or -.

# Reading in "text" instead of from a file here:
cat("value\n12\n1\n.\n9")
## value
## 12
## 1
## .
## 9
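# stringsAsFactors = TRUE is stated explicitly; it was the default before R 4.0.0
z <- read.csv(text = "value\n12\n1\n.\n9", stringsAsFactors = TRUE)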
str(z)
## 'data.frame':    4 obs. of  1 variable:
##  $ value: Factor w/ 4 levels ".","1","12","9": 3 2 1 4
as.double(z$value)
## [1] 3 2 1 4
# Oops, that's not right: 3 2 1 4 are the levels of a factor, not the values we read in!
class(z$value)
## [1] "factor"
# We can fix it now:
as.double(as.character(z$value))
## Warning: NAs introduced by coercion
## [1] 12  1 NA  9
# Or change how we read it in:
z <- read.csv(text = "value\n12\n1\n.\n9", na.strings=".")
class(z$value)
## [1] "integer"

5. Factors(cont’d)

6. Matrices and arrays

All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.

# Two scalar arguments to specify rows and columns
a <- matrix(1:6, ncol = 3, nrow = 2)
# One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))

dim(a); dim(b)
## [1] 2 3
## [1] 2 3 2

6. Matrices and arrays(cont’d)

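length() and names() have high-dimensional generalisations: length() generalises to nrow() and ncol() for matrices and dim() for arrays; names() generalises to rownames() and colnames() for matrices and dimnames() for arrays.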

6. Matrices and arrays(cont’d)

length(a)
nrow(a)
ncol(a)
rownames(a) <- c("A", "B")
colnames(a) <- c("a", "b", "c")
a

length(b)
dim(b)
dimnames(b) <- list(c("one", "two"), c("a", "b", "c"), c("A", "B"))
b

matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,byrow=TRUE, dimnames=list(c("r1","r2","r3"),c("c1","c2","c3")))

array(c(0,1,2,3,4,5,6,7,8,9),dim=c(1,5,2), dimnames=list(c("r1"),c("c1","c2","c3","c4","c5"),c("k1","k2")))

6. Matrices and arrays(cont’d)

str(1:3)                   # 1d vector
##  int [1:3] 1 2 3
str(matrix(1:3, ncol = 1)) # column vector
##  int [1:3, 1] 1 2 3
str(matrix(1:3, nrow = 1)) # row vector
##  int [1, 1:3] 1 2 3
str(array(1:3, 3))         # "array" vector
##  int [1:3(1d)] 1 2 3

7. Data frames

7.1 Data frames-Creation

You create a data frame using data.frame(), which takes named vectors as input:

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df
##   x y
## 1 1 a
## 2 2 b
## 3 3 c

7.2 Data frames-Testing

Because a data.frame is an S3 class, its type reflects the underlying vector used to build it: the list. To check if an object is a data frame, use class() or test explicitly with is.data.frame():

typeof(df)
## [1] "list"
class(df)
## [1] "data.frame"
is.data.frame(df)
## [1] TRUE
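
Since the underlying type is a list, list operations apply to a data frame column by column. A short sketch with the same df as above (the result variable names are illustrative):

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

n_elements <- length(df)   # number of columns, as for any list
col_names <- names(df)     # list names are the column names
x_col <- df[["x"]]         # list-style extraction returns the column vector
shape <- c(nrow(df), ncol(df))
```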

7.3 Data frames-Combination

You can combine data frames using cbind() and rbind():

cbind(df, data.frame(z = 3:1))
##   x y z
## 1 1 a 3
## 2 2 b 2
## 3 3 c 1
rbind(df, data.frame(x = 10, y = "z"))
##    x y
## 1  1 a
## 2  2 b
## 3  3 c
## 4 10 z

When combining column-wise, the number of rows must match, but row names are ignored. When combining row-wise, both the number and names of columns must match. Use plyr::rbind.fill() to combine data frames that don’t have the same columns.
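
The row-wise rule can be sketched in base R as follows (variable names are illustrative). Columns are matched by name, so their order may differ, but a column name missing from the first data frame triggers an error:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Same column names in a different order: rbind() matches them by name
reordered <- rbind(df, data.frame(y = "z", x = 10))

# Different column names: rbind() signals an error, caught here for illustration
mismatch <- tryCatch(
  rbind(df, data.frame(x = 10, w = "z")),
  error = function(e) "error"
)
```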

7.4 Data frames-Combination(cont’d)

It’s a common mistake to try to create a data frame by cbind()ing vectors together. This doesn’t work because cbind() will create a matrix unless one of the arguments is already a data frame, and a matrix coerces all its values to a single type. Instead use data.frame() directly:

bad <- data.frame(cbind(a = 1:2, b = c("a", "b")))
str(bad)
good <- data.frame(a = 1:2, b = c("a", "b"),
  stringsAsFactors = FALSE)
str(good)
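
The reason the first version goes wrong can be seen by inspecting the intermediate matrix. A small sketch (the names `m`, `a_type` and `b_type` are illustrative):

```r
# cbind() on plain vectors builds a matrix, so every value is coerced to one type
m <- cbind(a = 1:2, b = c("a", "b"))
m_is_matrix <- is.matrix(m)
m_type <- typeof(m)          # the integers have become character strings

# data.frame() keeps each column's own type
good <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = FALSE)
a_type <- typeof(good$a)
b_type <- typeof(good$b)
```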

7.5 Special columns

Since a data frame is a list of vectors, it is possible for a data frame to have a column that is a list:

df <- data.frame(x = 1:3)
df$y <- list(1:2, 1:3, 1:4)
str(df)
## 'data.frame':    3 obs. of  2 variables:
##  $ x: int  1 2 3
##  $ y:List of 3
##   ..$ : int  1 2
##   ..$ : int  1 2 3
##   ..$ : int  1 2 3 4

Or, using I() so that data.frame() treats the list as a single column:

dfl <- data.frame(x = 1:3, y = I(list(1:2, 1:3, 1:4)))
str(dfl)
dfl[2, "y"]

Use list and array columns with caution: many functions that work with data frames assume that all columns are atomic vectors.

7.6 Data frames- Coercion

You can coerce an object to a data frame with as.data.frame():
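
As a sketch of what as.data.frame() does with different inputs (variable names are illustrative): a vector becomes a single column, a list gives one column per element, and a matrix keeps its rows and columns. A list whose elements differ in length is rejected.

```r
from_vector <- as.data.frame(c(1, 9, 5))                    # 3 rows, 1 column
from_list <- as.data.frame(list(a = 1:2, b = c("x", "y")))  # 2 rows, 2 columns
from_matrix <- as.data.frame(matrix(1:6, nrow = 2))         # 2 rows, 3 columns

# List elements of unequal length cannot form a data frame
unequal <- tryCatch(
  as.data.frame(list(a = 1:2, b = 1:3)),
  error = function(e) "error"
)
```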

7.7 Coercion-matrix, data.frame, vector

m <- matrix(1:6, ncol = 3, nrow = 2)
d <- data.frame(x = 1:3,y = c("a", "b", "c"),stringsAsFactors = FALSE)

as.data.frame(m)
##   V1 V2 V3
## 1  1  3  5
## 2  2  4  6
as.matrix(d)
##      x   y  
## [1,] "1" "a"
## [2,] "2" "b"
## [3,] "3" "c"
as.character(m)
## [1] "1" "2" "3" "4" "5" "6"
as.numeric(m)
## [1] 1 2 3 4 5 6

7.7 Coercion-matrix, data.frame, vector(cont’d)

as.list(m)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5
## 
## [[6]]
## [1] 6
as.list(d)
## $x
## [1] 1 2 3
## 
## $y
## [1] "a" "b" "c"

7.7 Coercion-matrix, data.frame, vector(cont’d)

v <- c(1,9,5)
class(as.matrix(v)) # in R >= 4.0.0 this returns c("matrix", "array")
## [1] "matrix"
class(as.array(v))
## [1] "array"
class(as.data.frame(v))
## [1] "data.frame"

7.8 Other useful functions

ls()       # list current objects
rm(object) # delete an object (here `object` is a placeholder name)

head(mtcars) # print the first 6 rows of mtcars
tail(mtcars) # print the last 6 rows of mtcars

newobject <- edit(object) # edit a copy and save it as newobject
fix(object)               # edit in place


Thank you!