This post shows an R Markdown document that I wrote for one of my undergraduate assignments, converted into a blog post with the R package hugodown.

Question 8a

Import the College.csv dataset using read.csv().

getwd()
#> [1] "/home/richard/Insync/hochuan97@gmail.com/Google Drive/1School/2122Sem1/ST3248/Homework questions/Tutorial1"
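# point R at the folder containing College.csv (this absolute path is specific to my machine; adjust it for yours)
setwd("/home/richard/Insync/hochuan97@gmail.com/Google Drive/1School/2122Sem1/ST3248/Homework questions/Tutorial1")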
college <- read.csv("College.csv")
head(college)
#>                              X Private Apps Accept Enroll Top10perc Top25perc
#> 1 Abilene Christian University     Yes 1660   1232    721        23        52
#> 2           Adelphi University     Yes 2186   1924    512        16        29
#> 3               Adrian College     Yes 1428   1097    336        22        50
#> 4          Agnes Scott College     Yes  417    349    137        60        89
#> 5    Alaska Pacific University     Yes  193    146     55        16        44
#> 6            Albertson College     Yes  587    479    158        38        62
#>   F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal
#> 1        2885         537     7440       3300   450     2200  70       78
#> 2        2683        1227    12280       6450   750     1500  29       30
#> 3        1036          99    11250       3750   400     1165  53       66
#> 4         510          63    12960       5450   450      875  92       97
#> 5         249         869     7560       4120   800     1500  76       72
#> 6         678          41    13500       3335   500      675  67       73
#>   S.F.Ratio perc.alumni Expend Grad.Rate
#> 1      18.1          12   7041        60
#> 2      12.2          16  10527        56
#> 3      12.9          30   8735        54
#> 4       7.7          37  19016        59
#> 5      11.9           2  10922        15
#> 6       9.4          11   9727        55

Question 8b

Rename the rows using the first column of the dataset (the college names), then drop that column.

rownames(college) <- college[, 1] # use the first column (college names) as row names
college <- college[, -1]           # drop the now-redundant name column
head(college)
#>                              Private Apps Accept Enroll Top10perc Top25perc
#> Abilene Christian University     Yes 1660   1232    721        23        52
#> Adelphi University               Yes 2186   1924    512        16        29
#> Adrian College                   Yes 1428   1097    336        22        50
#> Agnes Scott College              Yes  417    349    137        60        89
#> Alaska Pacific University        Yes  193    146     55        16        44
#> Albertson College                Yes  587    479    158        38        62
#>                              F.Undergrad P.Undergrad Outstate Room.Board Books
#> Abilene Christian University        2885         537     7440       3300   450
#> Adelphi University                  2683        1227    12280       6450   750
#> Adrian College                      1036          99    11250       3750   400
#> Agnes Scott College                  510          63    12960       5450   450
#> Alaska Pacific University            249         869     7560       4120   800
#> Albertson College                    678          41    13500       3335   500
#>                              Personal PhD Terminal S.F.Ratio perc.alumni Expend
#> Abilene Christian University     2200  70       78      18.1          12   7041
#> Adelphi University               1500  29       30      12.2          16  10527
#> Adrian College                   1165  53       66      12.9          30   8735
#> Agnes Scott College               875  92       97       7.7          37  19016
#> Alaska Pacific University        1500  76       72      11.9           2  10922
#> Albertson College                 675  67       73       9.4          11   9727
#>                              Grad.Rate
#> Abilene Christian University        60
#> Adelphi University                  56
#> Adrian College                      54
#> Agnes Scott College                 59
#> Alaska Pacific University           15
#> Albertson College                   55

Comments: the first column is now Private, and each row is named after its university.
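Steps 8a and 8b can also be collapsed into one call: read.csv() passes row.names through to read.table(), and row.names = 1 uses the first column as the row names and drops it from the data. A minimal sketch, assuming a small inline CSV (three colleges and a few columns copied from the head() output above) in place of College.csv; the names csv_text and college_small are just for illustration.

```r
# Inline stand-in for College.csv: three rows and four columns from head() above.
# The empty first header field mimics the unnamed name column that read.csv() called "X".
csv_text <- ',Private,Apps,Accept
Abilene Christian University,Yes,1660,1232
Adelphi University,Yes,2186,1924
Adrian College,Yes,1428,1097'

# row.names = 1: use column 1 as row names and remove it from the columns
college_small <- read.csv(text = csv_text, row.names = 1)
college_small
```

With the real file, the equivalent would be read.csv("College.csv", row.names = 1).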

Question 8c i

Produce a numerical summary of the variables in the dataset.

summary(college)
#>    Private               Apps           Accept          Enroll    
#>  Length:777         Min.   :   81   Min.   :   72   Min.   :  35  
#>  Class :character   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242  
#>  Mode  :character   Median : 1558   Median : 1110   Median : 434  
#>                     Mean   : 3002   Mean   : 2019   Mean   : 780  
#>                     3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902  
#>                     Max.   :48094   Max.   :26330   Max.   :6392  
#>    Top10perc       Top25perc      F.Undergrad     P.Undergrad     
#>  Min.   : 1.00   Min.   :  9.0   Min.   :  139   Min.   :    1.0  
#>  1st Qu.:15.00   1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0  
#>  Median :23.00   Median : 54.0   Median : 1707   Median :  353.0  
#>  Mean   :27.56   Mean   : 55.8   Mean   : 3700   Mean   :  855.3  
#>  3rd Qu.:35.00   3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0  
#>  Max.   :96.00   Max.   :100.0   Max.   :31643   Max.   :21836.0  
#>     Outstate       Room.Board       Books           Personal   
#>  Min.   : 2340   Min.   :1780   Min.   :  96.0   Min.   : 250  
#>  1st Qu.: 7320   1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850  
#>  Median : 9990   Median :4200   Median : 500.0   Median :1200  
#>  Mean   :10441   Mean   :4358   Mean   : 549.4   Mean   :1341  
#>  3rd Qu.:12925   3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700  
#>  Max.   :21700   Max.   :8124   Max.   :2340.0   Max.   :6800  
#>       PhD            Terminal       S.F.Ratio      perc.alumni   
#>  Min.   :  8.00   Min.   : 24.0   Min.   : 2.50   Min.   : 0.00  
#>  1st Qu.: 62.00   1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00  
#>  Median : 75.00   Median : 82.0   Median :13.60   Median :21.00  
#>  Mean   : 72.66   Mean   : 79.7   Mean   :14.09   Mean   :22.74  
#>  3rd Qu.: 85.00   3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00  
#>  Max.   :103.00   Max.   :100.0   Max.   :39.80   Max.   :64.00  
#>      Expend        Grad.Rate     
#>  Min.   : 3186   Min.   : 10.00  
#>  1st Qu.: 6751   1st Qu.: 53.00  
#>  Median : 8377   Median : 65.00  
#>  Mean   : 9660   Mean   : 65.46  
#>  3rd Qu.:10830   3rd Qu.: 78.00  
#>  Max.   :56233   Max.   :118.00
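The summary also flags a data-quality issue: PhD and Grad.Rate are percentages, yet their maxima are 103 and 118. A minimal sketch for listing the offending rows; the helper over_100 is hypothetical, and the inline data frame holds only the six colleges shown by head() above (none of which exceed 100), standing in for the full college data frame.

```r
# Hypothetical helper: rows where any of the given percentage columns exceeds 100
over_100 <- function(df, cols = c("PhD", "Grad.Rate")) {
  df[apply(df[, cols, drop = FALSE] > 100, 1, any), cols, drop = FALSE]
}

# The six colleges shown by head() above (PhD and Grad.Rate only)
college_head <- data.frame(
  PhD = c(70, 29, 53, 92, 76, 67),
  Grad.Rate = c(60, 56, 54, 59, 15, 55),
  row.names = c("Abilene Christian University", "Adelphi University",
                "Adrian College", "Agnes Scott College",
                "Alaska Pacific University", "Albertson College")
)

over_100(college_head)   # zero rows: none of these six exceed 100
# over_100(college)      # on the full data, lists the rows behind the 103 and 118 maxima
```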

Question 8c ii

Produce a scatterplot matrix of the first 10 columns.

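college$Private <- as.factor(college$Private) # convert Private from character to factor; plot() then draws boxplots for it in 8c iii
pairs(college[, 1:10]) # scatterplot matrix of columns 1 to 10 (Private through Room.Board)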
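Private starts out as character because, since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE; that is why summary() reports it with Length/Class/Mode rather than counts. Passing stringsAsFactors = TRUE at import time produces the factor directly (note that it converts every character column, not just Private). A minimal sketch, assuming a small inline CSV built from rows shown by head() above; the object names are illustrative only.

```r
# Inline stand-in for College.csv (three rows, two columns from head() above)
csv_text <- ',Private,Apps
Abilene Christian University,Yes,1660
Adelphi University,Yes,2186
Adrian College,Yes,1428'

# Default since R 4.0.0: character columns stay character
as_chr <- read.csv(text = csv_text, row.names = 1)
class(as_chr$Private)

# stringsAsFactors = TRUE converts character columns to factors while reading
as_fct <- read.csv(text = csv_text, row.names = 1, stringsAsFactors = TRUE)
class(as_fct$Private)
summary(as_fct$Private)  # counts per level instead of Length/Class/Mode
```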

Question 8c iii

Boxplots of Outstate versus Private.

plot(college$Private, college$Outstate,
     xlab = 'Private',
     ylab = 'Outstate')

Question 8c iv

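Create a new qualitative variable, Elite, which is "Yes" when more than 50% of a university's new students came from the top 10% of their high school class (Top10perc > 50) and "No" otherwise.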

Elite <- rep("No", nrow(college)) # start with "No" for every college
Elite[college$Top10perc > 50] <- "Yes" # set "Yes" where Top10perc exceeds 50
Elite <- as.factor(Elite) # convert from character to factor
college <- data.frame(college, Elite) # append Elite as a new column
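The four lines above can also be written as a single ifelse() call. A minimal sketch, using the Top10perc values of the six colleges shown by head() above in place of the full data frame:

```r
# Top10perc for the six colleges shown by head() above
top10perc <- c(23, 16, 22, 60, 16, 38)

# "Yes" where Top10perc exceeds 50, "No" otherwise;
# fixing the levels keeps both "No" and "Yes" even if one group is empty
elite <- factor(ifelse(top10perc > 50, "Yes", "No"), levels = c("No", "Yes"))
table(elite)
```

On the full data, college$Elite <- factor(ifelse(college$Top10perc > 50, "Yes", "No"), levels = c("No", "Yes")) should give the same counts as summary(college$Elite).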

Check how many universities are classified as elite.

summary(college$Elite) # 78 elite universities
#>  No Yes 
#> 699  78

Comments: there are 78 elite universities and 699 non-elite ones.

Boxplot of Outstate versus Elite.

plot(college$Elite, college$Outstate,
     xlab = 'Elite',
     ylab = 'Outstate')
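To put numbers on the boxplot, tapply() computes a statistic of Outstate within each Elite group; the median corresponds to the thick line in each box. A minimal sketch, assuming only the six colleges shown by head() above, so the figures are not those of the full dataset:

```r
# Outstate and Top10perc for the six colleges shown by head() above
outstate  <- c(7440, 12280, 11250, 12960, 7560, 13500)
top10perc <- c(23, 16, 22, 60, 16, 38)
elite <- factor(ifelse(top10perc > 50, "Yes", "No"), levels = c("No", "Yes"))

# Median out-of-state tuition within each Elite group
tapply(outstate, elite, median)
```

On the full data, tapply(college$Outstate, college$Elite, median) would be the equivalent call.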

Question 8c v

Create histograms of a few quantitative variables with differing numbers of bins.

par(mfrow = c(2, 2)) # 2 x 2 grid: each group of four histograms below fills one page
# Apps
hist(college$Apps, breaks = 10, main = "Application histogram, bin 10")
hist(college$Apps, breaks = 50, main = "Application histogram, bin 50")
hist(college$Apps, breaks = 100, main = "Application histogram, bin 100")
hist(college$Apps, breaks = 500, main = "Application histogram, bin 500")

# Top25perc
hist(college$Top25perc, breaks = 10, main = "Top 25 histogram, bin 10")
hist(college$Top25perc, breaks = 50, main = "Top 25 histogram, bin 50")
hist(college$Top25perc, breaks = 100, main = "Top 25 histogram, bin 100")
hist(college$Top25perc, breaks = 500, main = "Top 25 histogram, bin 500")

# PhD
hist(college$PhD, breaks = 10, main = "PhD histogram, bin 10")
hist(college$PhD, breaks = 50, main = "PhD histogram, bin 50")
hist(college$PhD, breaks = 100, main = "PhD histogram, bin 100")
hist(college$PhD, breaks = 500, main = "PhD histogram, bin 500")
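When breaks is a single number, hist() treats it only as a suggestion: the breakpoints are rounded to "pretty" values, so the number of bins drawn can differ from the number in each title. A minimal sketch for checking this without drawing anything, using plot = FALSE; the helper actual_bins is hypothetical, and the Apps values are those of the six colleges shown by head() above.

```r
# Hypothetical helper: number of bins hist() actually uses for each requested count
actual_bins <- function(x, requested = c(10, 50, 100, 500)) {
  sapply(requested, function(b) {
    h <- hist(x, breaks = b, plot = FALSE)  # compute the histogram only, no plot
    length(h$counts)                        # one count per bin
  })
}

# Apps for the six colleges shown by head() above
apps <- c(1660, 2186, 1428, 417, 193, 587)
actual_bins(apps)
```

On the full data, actual_bins(college$PhD) would show how the requested and actual bin counts compare for the PhD histograms.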

Question 8c vi

Continued exploration.

The scatterplot matrix shows two groups of associated variables: Apps, Accept, Enroll and F.Undergrad are positively associated with one another, as are Top10perc and Top25perc.

The boxplots also show that out-of-state tuition (Outstate) tends to be higher at private universities and at elite universities, so whether a university is private, or elite, appears to explain part of the variation in out-of-state tuition.
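The associations read off the scatterplot matrix can be quantified with cor(). A minimal sketch, assuming only the six colleges shown by head() above (far too few rows for reliable estimates); on the full data frame, cor(college[, c("Apps", "Accept", "Enroll", "F.Undergrad", "Top10perc", "Top25perc")]) would be the equivalent call.

```r
# Six colleges from head() above; columns named as in College.csv
college_head <- data.frame(
  Apps        = c(1660, 2186, 1428, 417, 193, 587),
  Accept      = c(1232, 1924, 1097, 349, 146, 479),
  Enroll      = c(721, 512, 336, 137, 55, 158),
  F.Undergrad = c(2885, 2683, 1036, 510, 249, 678),
  Top10perc   = c(23, 16, 22, 60, 16, 38),
  Top25perc   = c(52, 29, 50, 89, 44, 62)
)

# Pairwise Pearson correlations, rounded for readability
round(cor(college_head), 2)
```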