This post demonstrates an exported R Markdown document that I wrote for one of my undergraduate assignments, converted into blog-post form with the R package hugodown.

Question 8a

Import the College.csv dataset using read.csv

college <- read.csv("College.csv")

Question 8b

Rename the rows using the first column of the dataset, which holds the university names

rownames(college) <- college[, 1] # select all rows of first col
college <- college[, -1] # drop the first column, now stored as row names
comments: the first column is now Private, and each row is named after its university

Question 8c i

Produce a numerical summary of the variables in the data set
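The code for this step is not shown here. A minimal sketch, assuming the `college` data frame built in 8b, would be a single call to `summary()`. The `toy` data frame below is a made-up stand-in (hypothetical names and values) so that the snippet runs on its own.

```r
# Toy stand-in for the real data; names and values are made up for illustration.
toy <- data.frame(
  Private  = c("Yes", "No", "Yes", "No"),
  Apps     = c(1660, 2186, 1428, 417),
  Outstate = c(7440, 12280, 11250, 12960),
  row.names = c("College A", "College B", "College C", "College D")
)

# summary() reports min, quartiles, median, mean and max for numeric columns.
# In R 4.0 or later a character column such as Private is only described
# by its length, class and mode.
toy_summary <- summary(toy)
print(toy_summary)

# On the real data this would simply be: summary(college)
```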

Question 8c ii

Produce a scatterplot matrix of the first 10 columns

college$Private <- as.factor(college$Private) # convert Private to a factor so it can be plotted
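The plotting call itself is not shown for this step. A minimal sketch, assuming the usual `pairs()` function from base R, follows. The factor conversion matters because in R 4.0 or later `read.csv` keeps text columns as character, and `pairs()` stops with an error on a character column. The `toy` data frame is a made-up stand-in with only four columns.

```r
# Toy stand-in for the real data; values are made up for illustration.
toy <- data.frame(
  Private = c("Yes", "No", "Yes", "No", "Yes", "No"),
  Apps    = c(1660, 2186, 1428, 417, 193, 587),
  Accept  = c(1232, 1924, 1097, 349, 146, 479),
  Enroll  = c(721, 512, 336, 137, 55, 158)
)

# Convert the text column to a factor; pairs() plots a factor via its integer codes.
toy$Private <- as.factor(toy$Private)

# Scatterplot matrix of every column against every other column.
pairs(toy[, 1:4])

# On the real data this would be: pairs(college[, 1:10])
```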

Question 8c iii

Boxplots of Outstate versus Private

plot(college$Private, college$Outstate,
     xlab = 'Private',
     ylab = 'Outstate')

Question 8c iv

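Create a new qualitative variable, Elite, marking universities where more than 50% of students come from the top 10% of their high school class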

Elite <- rep("No", nrow(college)) # vector of "No", one entry per row of college
Elite[college$Top10perc > 50] <- "Yes" # switch to "Yes" where Top10perc exceeds 50
Elite <- as.factor(Elite) # convert from character to factor
college <- data.frame(college, Elite) # append Elite as a new column

Check how many elite universities there are

summary(college$Elite) # 78 elite universities
comments: there are 78 elite universities
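As an aside, the same kind of variable can be built in one step with `ifelse()`. This is only a sketch of an alternative, not the method used above; the `top10` vector is made up to show how the threshold behaves at the boundary.

```r
# Made-up Top10perc values; note that 50 itself is not "> 50".
top10 <- c(23, 16, 60, 89, 50, 51)

# Build the factor directly, fixing the level order so "No" comes first.
elite <- factor(ifelse(top10 > 50, "Yes", "No"), levels = c("No", "Yes"))

# table() counts each level, much like summary() does on a factor.
elite_counts <- table(elite)
print(elite_counts)

# On the real data: college$Elite <- factor(ifelse(college$Top10perc > 50, "Yes", "No"))
```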

Boxplots of Outstate versus Elite

plot(college$Elite, college$Outstate,
     xlab = 'Elite',
     ylab = 'Outstate')

Question 8c v

Create histograms for a few quantitative variables with differing numbers of bins

par(mfrow = c(2, 2)) # arrange plots in a 2 x 2 grid
# Apps
hist(college$Apps, breaks = 10, main = "Application histogram, bin 10")
hist(college$Apps, breaks = 50, main = "Application histogram, bin 50")
hist(college$Apps, breaks = 100, main = "Application histogram, bin 100")
hist(college$Apps, breaks = 500, main = "Application histogram, bin 500")

# Top25perc
hist(college$Top25perc, breaks = 10, main = "Top 25 histogram, bin 10")
hist(college$Top25perc, breaks = 50, main = "Top 25 histogram, bin 50")
hist(college$Top25perc, breaks = 100, main = "Top 25 histogram, bin 100")
hist(college$Top25perc, breaks = 500, main = "Top 25 histogram, bin 500")

# PhD
hist(college$PhD, breaks = 10, main = "PhD histogram, bin 10")
hist(college$PhD, breaks = 50, main = "PhD histogram, bin 50")
hist(college$PhD, breaks = 100, main = "PhD histogram, bin 100")
hist(college$PhD, breaks = 500, main = "PhD histogram, bin 500")
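The repeated calls above can also be written as a loop over the bin counts. The sketch below is one possible way to do it, using made-up right-skewed values (`apps`) in place of `college$Apps`. Note that a single number passed to `breaks` is only a suggestion: `hist()` rounds the break points to "pretty" values, so the actual number of bars can differ from the number requested.

```r
set.seed(42)
# Made-up, right-skewed values standing in for college$Apps.
apps <- rexp(200, rate = 1 / 3000)

bin_counts <- c(10, 50, 100, 500)

# Arrange plots in a 2 x 2 grid and remember the previous settings.
old_par <- par(mfrow = c(2, 2))

# Draw one histogram per requested bin count; hist() also returns
# an object describing the bins it actually used.
hists <- lapply(bin_counts, function(b) {
  hist(apps, breaks = b, main = paste("Application histogram, bin", b))
})

# Restore the previous plotting layout.
par(old_par)
```

The number of bins actually drawn can be checked with `length(hists[[1]]$counts)` and compared with the requested value.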

Question 8c vi

Continued exploration.

From the scatterplot matrix, several groups of variables appear to be associated. Apps, Accept, Enroll and F.Undergrad are associated with each other, as are Top10perc and Top25perc.
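A visual impression like this can be backed up with a correlation matrix. The sketch below uses made-up data in which the columns are built to depend on each other, so it only illustrates the call, not the actual values for the College data.

```r
set.seed(7)
# Made-up data: Accept is derived from Apps, and Enroll from Accept.
apps   <- round(runif(50, 500, 10000))
accept <- round(apps * runif(50, 0.5, 0.9))
enroll <- round(accept * runif(50, 0.2, 0.5))
toy <- data.frame(Apps = apps, Accept = accept, Enroll = enroll)

# Pearson correlation between every pair of columns.
cor_mat <- cor(toy)
print(round(cor_mat, 2))

# On the real data, for example:
# cor(college[, c("Apps", "Accept", "Enroll", "F.Undergrad", "Top10perc", "Top25perc")])
```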

The boxplots also show that out-of-state tuition (Outstate) is higher for private universities and for elite universities. Out-of-state tuition can therefore be partly explained by whether a university is private and by whether it is elite.
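To put numbers on a boxplot comparison, the group medians can be computed with `tapply()`. This is a minimal sketch on made-up values; the figures below are not taken from the College data.

```r
# Made-up tuition values for three private and three public universities.
toy <- data.frame(
  Private  = factor(c("Yes", "Yes", "No", "No", "Yes", "No")),
  Outstate = c(12000, 15000, 7000, 8000, 18000, 6500)
)

# Median Outstate within each level of Private.
med <- tapply(toy$Outstate, toy$Private, median)
print(med)

# On the real data:
# tapply(college$Outstate, college$Private, median)
# tapply(college$Outstate, college$Elite, median)
```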