Xiaojun Sun
Thursday, October 09, 2014
| Description | Keyboard shortcut (Windows) |
|---|---|
| Clear console | Ctrl+L |
| Interrupt currently executing command | Esc |
| Run current line/selection | Ctrl+Enter |
| Show help for function at cursor | F1 |
| Attempt completion | Tab or Ctrl+Space |
| Navigate history | Up/Down arrow |
# assign
x <- 1
str(x)
# get help
?plot
# working directory
getwd()
setwd("E:/Project/WISE R Club/LearnR")
# options
?options
options(digits=3)
# packages
install.packages("ggplot2")
library(ggplot2)
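options() returns the previous values of any options it changes, so a temporary change is easy to undo. Here is a minimal sketch; the variable name old is only illustrative:

```r
# Change the number of significant digits printed, keeping the old value
old <- options(digits = 3)
print(pi)            # printed with 3 significant digits: 3.14
# Restore the setting that was in place before
options(old)
getOption("digits")  # back to the previous value (7 by default)
```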
There are five common types of atomic vectors that I’ll discuss in detail: logical, integer, double (often called numeric), complex and character.
Given a vector, you can determine its type with typeof(), or check if it’s a specific type with an “is” function: is.character(), is.double(), is.integer(), is.logical(), is.complex(), or, more generally, is.atomic().
Atomic vectors are usually created with c(), short for combine.
# Use TRUE and FALSE (or T and F) to create logical vectors
TRUE;FALSE
## [1] TRUE
## [1] FALSE
T;F
## [1] TRUE
## [1] FALSE
logical(5)
## [1] FALSE FALSE FALSE FALSE FALSE
c(TRUE,TRUE,FALSE)
## [1] TRUE TRUE FALSE
as.logical("false")
## [1] FALSE
is.logical(c(FALSE,FALSE,TRUE))
## [1] TRUE
2L # With the L suffix, you get an integer rather than a double
## [1] 2
integer(5)
## [1] 0 0 0 0 0
c(1L,2L,3L)
## [1] 1 2 3
1L:5L
## [1] 1 2 3 4 5
is.integer(c(1,2,3))
## [1] FALSE
is.integer(c(1L,2L,3L))
## [1] TRUE
1.5
## [1] 1.5
numeric(5)
## [1] 0 0 0 0 0
c(1,2,3,4.5)
## [1] 1.0 2.0 3.0 4.5
c(1,2,c(2,3))
## [1] 1 2 2 3
1:5
## [1] 1 2 3 4 5
seq(1,10,2)
## [1] 1 3 5 7 9
1+1:5
## [1] 2 3 4 5 6
dbl <- c(1, 2.5, 4.5)
typeof(dbl)
## [1] "double"
class(dbl)
## [1] "numeric"
is.numeric(dbl)
## [1] TRUE
is.double(dbl)
## [1] TRUE
is.atomic(dbl)
## [1] TRUE
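1+1i # 1+i will not work: a bare i is treated as an object name, not the imaginary unit
## [1] 1+1i
complex(5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
c(1+1i,1+2i,1+3i)
## [1] 1+1i 1+2i 1+3i
is.complex(c(1+1i,1+2i,1+3i))
## [1] TRUE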
"hello, world!"
## [1] "hello, world!"
character(3)
## [1] "" "" ""
c("Hello","World")
## [1] "Hello" "World"
c('This','is','something')
## [1] "This" "is" "something"
"This is a 'character' \"enclosed\" in double quotes"
## [1] "This is a 'character' \"enclosed\" in double quotes"
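To see all five types side by side, you can pass one example value of each through typeof() and is.atomic(). This is a small sketch, and the list name examples is only for illustration:

```r
# One example value of each atomic type
examples <- list(logical = TRUE, integer = 1L, double = 1.5,
                 complex = 1+1i, character = "a")
# typeof() reports the internal storage type of each element
types <- vapply(examples, typeof, character(1))
types
# Each example is an atomic vector (of length 1)
vapply(examples, is.atomic, logical(1))
```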
All elements of an atomic vector must be the same type, so when you attempt to combine different types they will be coerced to the most flexible type. Types from least to most flexible are: logical, integer, double, complex, and character. You can always convert the type of an object explicitly with an as.*() function such as as.numeric(), as.integer() or as.character().
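You can check the hierarchy directly by combining values of two neighbouring types with c() and asking typeof() for the result. In this sketch each line moves one step up the ladder:

```r
# Combining two types with c() yields the more flexible one
typeof(c(TRUE, 1L))    # logical + integer   -> "integer"
typeof(c(1L, 1.5))     # integer + double    -> "double"
typeof(c(1.5, 1+1i))   # double + complex    -> "complex"
typeof(c(1+1i, "a"))   # complex + character -> "character"
c(TRUE, 2L, 3.5)       # TRUE becomes 1, so the result is 1.0 2.0 3.5
```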
x <- c(FALSE, TRUE, FALSE, TRUE)
as.numeric(x)
## [1] 0 1 0 1
as.integer(x)
## [1] 0 1 0 1
as.character(x)
## [1] "FALSE" "TRUE" "FALSE" "TRUE"
as.complex(x)
## [1] 0+0i 1+0i 0+0i 1+0i
x <- 1L:5L
as.logical(x)
## [1] TRUE TRUE TRUE TRUE TRUE
as.numeric(x)
## [1] 1 2 3 4 5
as.character(x)
## [1] "1" "2" "3" "4" "5"
as.complex(x)
## [1] 1+0i 2+0i 3+0i 4+0i 5+0i
x <- c(1.5,2.3,7.9,0.1)
as.logical(x)
## [1] TRUE TRUE TRUE TRUE
as.integer(x)
## [1] 1 2 7 0
as.character(x)
## [1] "1.5" "2.3" "7.9" "0.1"
as.complex(x)
## [1] 1.5+0i 2.3+0i 7.9+0i 0.1+0i
x <- c(1+2i, 3+1i, 6-0.5i)
as.logical(x)
## [1] TRUE TRUE TRUE
as.integer(x)
## Warning: imaginary parts discarded in coercion
## [1] 1 3 6
as.numeric(x)
## Warning: imaginary parts discarded in coercion
## [1] 1 3 6
as.character(x)
## [1] "1+2i" "3+1i" "6-0.5i"
x <- c("1","a","3","d")
as.logical(x)
## [1] NA NA NA NA
as.integer(x)
## Warning: NAs introduced by coercion
## [1] 1 NA 3 NA
as.numeric(x)
## Warning: NAs introduced by coercion
## [1] 1 NA 3 NA
as.complex(x)
## Warning: NAs introduced by coercion
## [1] 1+0i NA 3+0i NA
c("a", 1)
## [1] "a" "1"
TRUE + 1
## [1] 2
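When a logical vector is coerced to an integer or double, TRUE becomes 1 and FALSE becomes 0. Coercion often happens automatically: most mathematical functions (+, log, abs, etc.) will coerce to a double or integer, and most logical operations (&, |, any, etc.) will coerce to a logical. You will usually get a warning message if the coercion might lose information.
Q1: Is c('h','i') == "hi" TRUE or FALSE?
Q2: Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?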
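A few one-liners show the automatic coercion rules in action. This is a sketch only. sum() and mean() count and average TRUE values, and arithmetic turns a logical into a number. Logical operators turn numbers into logicals, where 0 is FALSE and anything else is TRUE:

```r
x <- c(TRUE, FALSE, TRUE, TRUE)
# Arithmetic and summary functions treat TRUE as 1 and FALSE as 0
sum(x)             # number of TRUE values: 3
mean(x)            # proportion of TRUE values: 0.75
typeof(TRUE + 1L)  # "integer"
typeof(TRUE + 1)   # "double"
log(TRUE)          # log(1) = 0
# Logical operators coerce numbers to logical
!c(0, 3)           # TRUE FALSE
c(0, 2) & TRUE     # FALSE TRUE
```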
(x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9)))
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE FALSE TRUE
##
## [[4]]
## [1] 2.3 5.9
str(x)
## List of 4
## $ : int [1:3] 1 2 3
## $ : chr "a"
## $ : logi [1:3] TRUE FALSE TRUE
## $ : num [1:2] 2.3 5.9
(l1 <- list(a=1,b=2,c=3)) # same as `as.list(c(a=1,b=2,c=3))`
## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## [1] 3
You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().
x <- list(a=1,b=2,c=3)
unlist(x) # Given a list structure x, unlist simplifies it to produce a vector which contains all the atomic components which occur in x.
## a b c
## 1 2 3
l4 <- list(a=c(1,2),b=c(2,3,4),c="hello")
unlist(l4)
## a1 a2 b1 b2 b3 c
## "1" "2" "2" "3" "4" "hello"
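unlist() also works on nested lists, and by default it flattens every level. The sketch below uses a made-up list called nested and also shows the recursive and use.names arguments:

```r
nested <- list(1, list(2, list(3, "four")))
# Flattens all levels and coerces with c()'s rules: "1" "2" "3" "four"
unlist(nested)
# recursive = FALSE removes only one level of nesting, so a list remains
str(unlist(nested, recursive = FALSE))
# use.names = FALSE drops the names that unlist() would otherwise generate
unlist(list(a = 1:2, b = 3), use.names = FALSE)  # 1 2 3
```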
Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists.
mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)
## [1] TRUE
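The fitted model is an ordinary list with an S3 class on top, so the usual list tools work on it. Here is a short sketch using the same mtcars model:

```r
mod <- lm(mpg ~ wt, data = mtcars)
typeof(mod)                             # "list": the underlying storage
class(mod)                              # "lm": the S3 class layered on top
head(names(mod), 3)                     # "coefficients" "residuals" "effects"
mod[["coefficients"]]                   # extract a component as from any list
identical(mod$coefficients, coef(mod))  # TRUE
is.list(mtcars)                         # TRUE: data frames are lists too
```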
All objects can have arbitrary additional attributes, used to store metadata about the object. Attributes can be thought of as a named list (with unique names).
y <- 1:10; attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
attributes(y)
## $my_attribute
## [1] "This is a vector"
str(attributes(y))
## List of 1
## $ my_attribute: chr "This is a vector"
The structure() function returns a new object with modified attributes:
structure(1:10, my_attribute = "This is a vector")
## [1] 1 2 3 4 5 6 7 8 9 10
## attr(,"my_attribute")
## [1] "This is a vector"
By default, most attributes are lost when modifying a vector.
attributes(y[1])
## NULL
attributes(sum(y))
## NULL
The only attributes not lost are the three most important: names, dimensions (dim) and class.
Each of these attributes has a specific accessor function to get and set values. When working with these attributes, use names(x), class(x), and dim(x), not attr(x, "names"), attr(x, "class"), and attr(x, "dim").
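The sketch below contrasts the special attributes with an arbitrary one. Subsetting keeps names and arithmetic keeps dim, while a custom attribute is dropped. The attribute name note is made up for the example:

```r
x <- c(a = 1, b = 2, c = 3)
attr(x, "note") <- "custom metadata"   # an arbitrary, made-up attribute
identical(names(x), attr(x, "names"))  # TRUE, but names(x) is the preferred accessor
y <- x[1:2]
names(y)         # "a" "b": names survive subsetting
attr(y, "note")  # NULL: the custom attribute is dropped
m <- matrix(1:6, nrow = 2)
dim(m * 2)       # 2 3: dim survives arithmetic
```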
You can name a vector in three ways:
When creating it: x <- c(a = 1, b = 2, c = 3).
By modifying an existing vector in place: x <- 1:3; names(x) <- c("a", "b", "c"). (attr(x, which = "names") <- c("a", "b", "c") also works, but names() is preferred.)
By creating a modified copy of a vector: x <- setNames(1:3, c("a", "b", "c")).
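All three approaches produce the same object. In this sketch, c() is given integer values (1L, 2L, 3L) so that it matches the integer vector 1:3 exactly:

```r
x1 <- c(a = 1L, b = 2L, c = 3L)           # name at creation
x2 <- 1:3; names(x2) <- c("a", "b", "c")  # modify in place
x3 <- setNames(1:3, c("a", "b", "c"))     # modified copy
identical(x1, x2) && identical(x2, x3)    # TRUE
```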
Names don’t have to be unique. However, character subsetting is the most important reason to use names and it is most useful when the names are unique.
Not all elements of a vector need to have a name. If some names are missing, names() will return an empty string for those elements. If all names are missing, names() will return NULL.
y <- c(a = 1, 2, 3)
names(y)
## [1] "a" "" ""
z <- c(1, 2, 3)
names(z)
## NULL
You can create a new vector without names using unname(x), or remove names in place with names(x) <- NULL.
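Here is a sketch of how duplicated names behave with character subsetting, followed by the two ways to remove names:

```r
x <- c(a = 1, b = 2, a = 3)   # duplicated names are allowed
first_a <- x["a"]             # character subsetting returns only the first match
first_a                       # a = 1
all_a <- x[names(x) == "a"]   # a logical test picks up every element named "a"
all_a                         # a = 1, a = 3
unname(x)                     # a new, unnamed copy: 1 2 3
names(x) <- NULL              # remove the names in place
names(x)                      # NULL
```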
The dim attribute provides a way to convert a vector to a matrix or array.
x <- 1:10
dim(x) <- c(2,5,1)
dim(x)
## [1] 2 5 1
An object can have any class, and more than one class; you can also create classes yourself by assigning to class().
R possesses a simple generic function mechanism which can be used for an object-oriented style of programming. Method dispatch takes place based on the class of the first argument to the generic function.
class(x) <- "A"
class(x)
## [1] "A"
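A tiny S3 generic shows how dispatch works. describe() and its methods are hypothetical names written for this illustration. UseMethod() calls the method for the first class in class(x) that has one, and falls back to describe.default() otherwise:

```r
# A hypothetical generic that dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")
# Method used when class(x) includes "A"
describe.A <- function(x, ...) paste("an object of class A with", length(x), "elements")
# Fallback used for every other class
describe.default <- function(x, ...) paste("a plain", typeof(x), "object")

x <- structure(1:10, class = "A")
describe(x)              # "an object of class A with 10 elements"
describe(1:10)           # "a plain integer object"
class(x) <- c("B", "A")  # two classes: "B" has no method, so describe.A() is used
describe(x)
```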
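A factor is a vector that can contain only predefined values, and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class(), “factor”, which makes them behave differently from regular integer vectors, and the levels(), which defines the set of allowed values.
x <- factor(c("a", "b", "b", "a"))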
x
## [1] a b b a
## Levels: a b
class(x)
## [1] "factor"
levels(x)
## [1] "a" "b"
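A factor is an integer vector with levels and class attributes, so you can inspect its codes and attributes directly. The second part of this sketch shows one practical benefit of fixed levels: table() also reports levels that never occur. The variable sex and its values are illustrative:

```r
x <- factor(c("a", "b", "b", "a"))
typeof(x)      # "integer": the values are stored as integer codes
as.integer(x)  # 1 2 2 1: each code indexes into levels(x)
attributes(x)  # the two attributes: $levels ("a" "b") and $class ("factor")
# Fixed levels make unused categories visible
sex <- factor(c("m", "m", "m"), levels = c("m", "f"))
table(sex)     # m: 3, f: 0
```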
Sometimes when a data frame is read directly from a file, a column you’d thought would produce a numeric vector instead produces a factor. This is caused by a non-numeric value in the column, often a missing value encoded in a special way like "." or "-".
# Reading in "text" instead of from a file here:
cat("value\n12\n1\n.\n9")
## value
## 12
## 1
## .
## 9
z <- read.csv(text = "value\n12\n1\n.\n9", stringsAsFactors = TRUE) # TRUE was the default before R 4.0.0
str(z)
## 'data.frame': 4 obs. of 1 variable:
## $ value: Factor w/ 4 levels ".","1","12","9": 3 2 1 4
as.double(z$value)
## [1] 3 2 1 4
# Oops, that's not right: 3 2 1 4 are the factor's integer codes (positions in its levels), not the values we read in!
class(z$value)
## [1] "factor"
# We can fix it now:
as.double(as.character(z$value))
## Warning: NAs introduced by coercion
## [1] 12 1 NA 9
# Or change how we read it in:
z <- read.csv(text = "value\n12\n1\n.\n9", na.strings=".")
class(z$value)
## [1] "integer"
Unfortunately, most data loading functions in R used to convert character vectors to factors automatically (stringsAsFactors defaulted to TRUE before R 4.0.0). This is suboptimal, because there’s no way for those functions to know the set of all possible levels or their optimal order. Instead, use the argument stringsAsFactors = FALSE to suppress this behaviour, and then manually convert character vectors to factors using your knowledge of the data.
While factors look (and often behave) like character vectors, they are actually integers. Be careful when treating them like strings. It’s usually best to explicitly convert factors to character vectors if you need string-like behaviour.
What happens to a factor when you modify its levels?
f1 <- factor(letters[1:5])
levels(f1) <- rev(levels(f1[1:5])) #rev(): reverse elements
What does this code do? How do f2 and f3 differ from f1?
f2 <- rev(factor(letters[1:5]))
f3 <- factor(letters[1:5], levels = rev(letters[1:5]))
All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.
# Two scalar arguments to specify rows and columns
a <- matrix(1:6, ncol = 3, nrow = 2)
# One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))
dim(a); dim(b)
## [1] 2 3
## [1] 2 3 2
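Two consequences of these rules are sketched below. A matrix holds a single type, so mixing numbers and strings leaves everything as character. A matrix is also simply an array with two dimensions:

```r
m <- matrix(c(1, 2, "a", "b"), nrow = 2)
typeof(m)     # "character": c() has already coerced the numbers
b <- array(1:12, c(2, 3, 2))
b[2, 3, 2]    # 12: arrays are filled with the first index varying fastest
is.matrix(b)  # FALSE: b has three dimensions
a <- matrix(1:6, ncol = 3, nrow = 2)
is.array(a)   # TRUE: a matrix is a 2-d array
```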
length() and names() have high-dimensional generalisations:
length() generalises to nrow() and ncol() for matrices, and dim() for arrays.
names() generalises to rownames() and colnames() for matrices, and dimnames(), a list of character vectors, for arrays.
length(a)
nrow(a)
ncol(a)
rownames(a) <- c("A", "B")
colnames(a) <- c("a", "b", "c")
a
length(b)
dim(b)
dimnames(b) <- list(c("one", "two"), c("a", "b", "c"), c("A", "B"))
b
matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,byrow=TRUE, dimnames=list(c("r1","r2","r3"),c("c1","c2","c3")))
array(c(0,1,2,3,4,5,6,7,8,9),dim=c(1,5,2), dimnames=list(c("r1"),c("c1","c2","c3","c4","c5"),c("k1","k2")))
c() generalises to cbind() and rbind() for matrices, and to abind() (provided by the abind package) for arrays. You can transpose a matrix with t(); the generalised equivalent for arrays is aperm().
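You can check the generalised combining and transposing functions through the dimensions they produce. This sketch uses base R only; abind() is left out because it needs the abind package:

```r
a <- matrix(1:6, nrow = 2)
dim(cbind(a, 7:8))         # 2 4: one extra column
dim(rbind(a, 7:9))         # 3 3: one extra row
dim(t(a))                  # 3 2: rows and columns swapped
b <- array(1:12, c(2, 3, 2))
dim(aperm(b, c(2, 1, 3)))  # 3 2 2: the first two dimensions swapped
all(aperm(a) == t(a))      # TRUE: by default aperm() reverses the dimensions
```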
You can test if an object is a matrix or array using is.matrix() and is.array(), or by looking at the length of the dim(). as.matrix() and as.array() make it easy to turn an existing vector into a matrix or array.
Vectors are not the only 1-dimensional data structure. You can have matrices with a single row or single column, or arrays with a single dimension.
str(1:3) # 1d vector
## int [1:3] 1 2 3
str(matrix(1:3, ncol = 1)) # column vector
## int [1:3, 1] 1 2 3
str(matrix(1:3, nrow = 1)) # row vector
## int [1, 1:3] 1 2 3
str(array(1:3, 3)) # "array" vector
## int [1:3(1d)] 1 2 3
A data frame is the most common way of storing data in R, and if used systematically makes data analysis easier. Under the hood, a data frame is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list.
This means that a data frame has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a data frame is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.
You can subset a data frame like a 1d structure (where it behaves like a list), or a 2d structure (where it behaves like a matrix).
You create a data frame using data.frame(), which takes named vectors as input:
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df
## x y
## 1 1 a
## 2 2 b
## 3 3 c
Beware of data.frame()’s default behaviour of turning strings into factors (the default before R 4.0.0). Use stringsAsFactors = FALSE to suppress this behaviour.
Because a data.frame is an S3 class, its type reflects the underlying vector used to build it: the list. To check if an object is a data frame, use class() or test explicitly with is.data.frame():
typeof(df)
## [1] "list"
class(df)
## [1] "data.frame"
is.data.frame(df)
## [1] TRUE
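The list/matrix duality described above shows up both in the dimension helpers and in subsetting. Here is a short sketch; stringsAsFactors = FALSE is passed so the result is the same in every R version:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
# Dimensions: length() counts columns, like ncol()
length(df) == ncol(df)               # TRUE
identical(names(df), colnames(df))   # TRUE
nrow(df)                             # 3
# 1-d (list-style) subsetting
df[1]                                # a one-column data frame
df[[1]]                              # the column itself: 1 2 3
# 2-d (matrix-style) subsetting
df[, 1]                              # a single column simplifies to a vector
df[df$x > 1, "y"]                    # "b" "c"
```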
You can combine data frames using cbind() and rbind():
cbind(df, data.frame(z = 3:1))
## x y z
## 1 1 a 3
## 2 2 b 2
## 3 3 c 1
rbind(df, data.frame(x = 10, y = "z"))
## x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 10 z
When combining column-wise, the number of rows must match, but row names are ignored. When combining row-wise, both the number and names of columns must match. Use plyr::rbind.fill() to combine data frames that don’t have the same columns.
It’s a common mistake to try to create a data frame by cbind()ing vectors together. This doesn’t work because cbind() will create a matrix unless one of the arguments is already a data frame. Instead, use data.frame() directly:
bad <- data.frame(cbind(a = 1:2, b = c("a", "b"))) # cbind() first builds a character matrix, so column a is no longer numeric
str(bad)
good <- data.frame(a = 1:2, b = c("a", "b"),
stringsAsFactors = FALSE)
str(good)
Since a data frame is a list of vectors, it is possible for a data frame to have a column that is a list:
df <- data.frame(x = 1:3)
df$y <- list(1:2, 1:3, 1:4)
str(df)
## 'data.frame': 3 obs. of 2 variables:
## $ x: int 1 2 3
## $ y:List of 3
## ..$ : int 1 2
## ..$ : int 1 2 3
## ..$ : int 1 2 3 4
Alternatively, create the list column directly in data.frame() by protecting the list with I():
dfl <- data.frame(x = 1:3, y = I(list(1:2, 1:3, 1:4)))
str(dfl)
dfl[2, "y"]
Use list and array columns with caution: many functions that work with data frames assume that all columns are atomic vectors.
You can coerce an object to a data frame with as.data.frame():
A vector will create a one-column data frame.
A list will create one column for each element; it’s an error if they’re not all the same length.
A matrix will create a data frame with the same number of columns and rows.
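The sketch below walks through the three conversion rules. The list with element lengths 2 and 3 cannot form equal-length columns, so the error is caught with tryCatch() to show its message:

```r
ncol(as.data.frame(1:3))                       # 1: a vector gives one column
d <- as.data.frame(list(a = 1:2, b = c("x", "y")))
names(d)                                       # "a" "b": one column per list element
msg <- tryCatch(as.data.frame(list(a = 1:2, b = 1:3)),
                error = function(e) conditionMessage(e))
msg                                            # "arguments imply differing number of rows: 2, 3"
dim(as.data.frame(matrix(1:6, nrow = 2)))      # 2 3: same rows and columns as the matrix
```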
m <- matrix(1:6, ncol = 3, nrow = 2)
d <- data.frame(x = 1:3,y = c("a", "b", "c"),stringsAsFactors = FALSE)
as.data.frame(m)
## V1 V2 V3
## 1 1 3 5
## 2 2 4 6
as.matrix(d)
## x y
## [1,] "1" "a"
## [2,] "2" "b"
## [3,] "3" "c"
as.character(m)
## [1] "1" "2" "3" "4" "5" "6"
as.numeric(m)
## [1] 1 2 3 4 5 6
as.list(m)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
##
## [[5]]
## [1] 5
##
## [[6]]
## [1] 6
as.list(d)
## $x
## [1] 1 2 3
##
## $y
## [1] "a" "b" "c"
v <- c(1,9,5)
class(as.matrix(v))
## [1] "matrix"
# In R 4.0.0 and later this prints "matrix" "array", because matrices now also have class "array"
class(as.array(v))
## [1] "array"
class(as.data.frame(v))
## [1] "data.frame"
ls() # list current objects
rm(object) # delete an object
head(mtcars) # print the first 6 rows of mtcars
tail(mtcars) # print the last 6 rows of mtcars
newobject <- edit(object) # edit copy and save as newobject
fix(object) # edit in place
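Here is a small sketch of ls() and rm() in action, using throwaway objects with the hypothetical names tmp1 and tmp2. rm() also accepts a character vector of names through its list argument, so it combines well with ls(pattern = ...):

```r
tmp1 <- 1
tmp2 <- "a"
"tmp1" %in% ls()                 # TRUE: the object is listed
rm(tmp1)                         # delete a single object
exists("tmp1")                   # FALSE
rm(list = ls(pattern = "^tmp"))  # delete every remaining object whose name starts with "tmp"
exists("tmp2")                   # FALSE
nrow(head(mtcars, n = 3))        # 3: head() and tail() take the number of rows as n
```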