Dear all WISE R Club members. :smile: Here comes the solution to assignment2. Because assignment 2 is more difficult than assignment 1, I advise you to read this solution manual more carefully to make sure you understand every single code.


Question 1

I define the factorial function in this way.

Fac <- function(n) if (n==1|n==0) 1 else n*Fac(n-1)

Now test it!

c(Fac(0),Fac(3),Fac(5))
## [1]   1   6 120

Question 2

To start with, I read the grades.txt dataset to create data frame.

grades <- read.table("grades.txt", header=TRUE,stringsAsFactors=FALSE)

Question 2.1

To plot the boxplot, we should reshape the data frame grades at first.

library(reshape2)
gradex <- melt(grades,id="name")

We can see what is in the new data frame gradex.

head(gradex)
##     name variable value
## 1 丁聪华     Math    63
## 2 孙蝶妃     Math    73
## 3 许娇翔     Math    78
## 4 崔子希     Math    74
## 5 樊瑶芳     Math    81
## 6 阮恭琴     Math    88
str(gradex)
## 'data.frame':    900 obs. of  3 variables:
##  $ name    : chr  "丁聪华" "孙蝶妃" "许娇翔" "崔子希" ...
##  $ variable: Factor w/ 3 levels "Math","Chinese",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ value   : int  63 73 78 74 81 88 84 96 78 96 ...

Then we plot the boxplot just like Figure 6.12.

boxplot(value ~ variable, data=gradex, col="lightblue",
        main="The boxplot of the test scores", 
        xlab="Subject", ylab="Test Scores")

the boxplot of test scores

Question 2.2

To calculate Sum and get the 80th, 60th, 40th and 20th percentile of it,

Sum <- rowSums(grades[,2:4])
x <- quantile(Sum, c(0.8,0.6,0.4,0.2))

To grade the students,

grades <- cbind(grades, Sum)
grades$grade[Sum >= x[1]] <-"A+"
grades$grade[Sum < x[1] & Sum >= x[2]] <-"A"
grades$grade[Sum < x[2] & Sum >= x[3]] <-"B"
grades$grade[Sum < x[3] & Sum >= x[4]] <-"C"
grades$grade[Sum < x[4]] <-"D" 
str(grades)
## 'data.frame':    300 obs. of  6 variables:
##  $ name   : chr  "丁聪华" "孙蝶妃" "许娇翔" "崔子希" ...
##  $ Math   : int  63 73 78 74 81 88 84 96 78 96 ...
##  $ Chinese: int  80 81 88 76 84 83 85 89 82 76 ...
##  $ English: int  77 74 82 86 86 92 92 98 81 91 ...
##  $ Sum    : num  220 228 248 236 251 263 261 283 241 263 ...
##  $ grade  : chr  "D" "D" "B" "D" ...

Question 2.3

Grade <- function(nm){
  names <- grades$name
  grades$grade[nm==names]
}
## see a few examples
c(Grade("王书棋"), Grade("姚仕能"), Grade("夏 良"))
## [1] "C" "B" "D"

Question 2.4

Now I draw the pie chart of the grade.

y <- table(grades$grade)
lb <- paste(names(y), " ", y, sep="")
pie(y, labels=lb, main= "Pie chart of the grade \n (with sample size)")
box()

the pie chart of grade

Question 2.5

Now calculate the mean and standard error for each subject in each group.

options(digits=2)
dstats <- function(x)(c(mean=mean(x), sd=sd(x)))
aggregate(grades[,2:4], by=list(grade=grades$grade), dstats)
##   grade Math.mean Math.sd Chinese.mean Chinese.sd English.mean English.sd
## 1     A      83.9     6.7         82.2        5.6         87.3        5.6
## 2    A+      90.1     6.4         83.3        4.3         92.4        5.3
## 3     B      80.6     5.7         80.9        5.7         84.8        4.7
## 4     C      76.5     5.9         82.1        4.7         81.5        5.1
## 5     D      70.7     6.0         81.6        5.4         74.7        6.5

Question 3

Question 3.1

Regress Math on Chinese and English.

fit<-lm(Math~Chinese+English,data=grades)
summary(fit)
## 
## Call:
## lm(formula = Math ~ Chinese + English, data = grades)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20.59  -5.53  -0.32   5.50  20.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  80.0629     9.6124    8.33  3.0e-15 ***
## Chinese      -0.3751     0.0919   -4.08  5.7e-05 ***
## English       0.3719     0.0596    6.24  1.5e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.2 on 297 degrees of freedom
## Multiple R-squared:  0.176,  Adjusted R-squared:  0.171 
## F-statistic: 31.8 on 2 and 297 DF,  p-value: 3.14e-13
par(mfrow=c(2,2))
plot(fit)

plot of chunk regression

Question 3.2

Now I plot a histogram of the studentized residuals below and superimposes a normal curve, kernel density curve and rug plot.

z <- rstudent(fit)
hist(z, breaks=13, freq=FALSE,ylim=c(0,0.43), xlab="Studentized Residual",main="Distribution of Errors")
rug(jitter(z), col="brown")
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE, col="blue", lwd=2)
lines(density(z)$x, density(z)$y, col="red", lwd=2, lty=2)
legend("topright",legend = c( "Normal Curve", "Kernel Density Curve"),lty=1:2, col=c("blue","red"), cex=1)
box()

plot of chunk plotresidual

 

Return to the homepage.


WISE R Club project is proudly maintained by XiaojunSun.