---
title: "Biplots of Data"
output: html_notebook
---
Group Members:\
Rachel G: rachamin12@gmail.com\
James Hick: redsoxfan765@gmail.com\
Quientin Morrison: morriq@rpi.edu
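### Import Libraries:
```{r}
library(ggbiplot)  # ggplot2-based biplots of prcomp() results
library(readr)     # read_csv()
library(devtools)  # only needed to install ggbiplot, e.g. devtools::install_github("vqv/ggbiplot")
library(ggplot2)   # coord_cartesian(), ggtitle(), colour scales
```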
### Read the Data:
This reads the CSV files published from Google Drive: `chemsdata` is loaded from chemsrus.csv and `targetdata` from chemstest.csv.\
`suppressMessages()` hides readr's column-parsing messages; `chemsdata` and `targetdata` are still created in the coding environment.
```{r}
chemsdata <- suppressMessages(read_csv(url("https://docs.google.com/spreadsheets/d/1arHUuWJrVjpZboLOJa97iIbPzCszX8stE-fbYhw2OCA/pub?gid=1533528387&single=true&output=csv")))
targetdata <- suppressMessages(read_csv(url("https://docs.google.com/spreadsheets/d/1NtoMaw06IlCDJ3k9Rxtcv01u2wUwdAh8D5rajytiNpQ/pub?gid=899540023&single=true&output=csv")))
```
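Judging from the column slicing used for the biplots, both tables are assumed to hold an ID in the first column, numeric descriptors in between, and a class label in the last column. The sketch below wraps that preprocessing in a checked function. `split_chem_table()` and the toy table are hypothetical illustrations, not part of the original notebook. The checks cover what the later steps need: `rownames<-` requires unique IDs, and `prcomp()` and `kmeans()` require numeric columns.

```{r}
# Hypothetical helper (not in the original notebook): drop the first (ID)
# and last (label) columns and keep the IDs as row names.
split_chem_table <- function(df) {
  ids <- as.character(df[[1]])
  stopifnot(!anyDuplicated(ids))  # row names must be unique
  x <- as.data.frame(df[, 2:(ncol(df) - 1), drop = FALSE])
  # prcomp() and kmeans() only work on numeric columns
  stopifnot(all(vapply(x, is.numeric, logical(1))))
  rownames(x) <- ids
  x
}

# Toy table with the assumed layout: ID, two descriptors, class label
toy <- read.csv(text = "ID,d1,d2,class
c1,0.5,1.2,RB
c2,1.5,0.3,NRB
c3,2.0,2.2,RB")
b_toy <- split_chem_table(toy)
dim(b_toy)
rownames(b_toy)
```

On the notebook's data the same call would be `split_chem_table(chemsdata)`. It should stop with an error if the sheet layout differs from this assumption.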
### Biplots:
Build data frames from chemsrus.csv and chemstest.csv, then run principal component analysis (PCA) and k-means clustering to draw biplots colored by cluster.\
The purpose of this is just to get us some extra points for visualization in case we miss out on small details; it's quick and easy.
```{r}
# Drop the first (ID) and last (class label) columns; keep the IDs as row names
b_chem <- as.data.frame(chemsdata[ , 2:(ncol(chemsdata)-1)])
rownames(b_chem) <- chemsdata[[1]]
b_target <- as.data.frame(targetdata[ , 2:(ncol(targetdata)-1)])
rownames(b_target) <- targetdata[[1]]

# k-means with 2 clusters, chosen to match the two biodegradability classes.
# Note: clustering runs on the unscaled descriptors, unlike the PCA below.
set.seed(1)  # arbitrary seed; k-means starts are random
k_chem <- kmeans(b_chem, 2, nstart = 5)
k_target <- kmeans(b_target, 2, nstart = 5)

# Principal component analysis on centered and scaled descriptors
p_chem <- prcomp(b_chem, retx = TRUE, center = TRUE, scale. = TRUE)
p_target <- prcomp(b_target, retx = TRUE, center = TRUE, scale. = TRUE)

plot_1 <- ggbiplot(p_chem, choices = 1:2,
                   labels = rownames(b_chem),
                   labels.size = 1.5,
                   obs.scale = 1,
                   groups = factor(k_chem$cluster),  # factor -> discrete colours
                   alpha = 0,  # fully transparent points; labels carry the display
                   var.axes = TRUE,
                   var.scale = 1,
                   varname.size = 2.5,
                   varname.adjust = 6.0)
# coord_cartesian() zooms without dropping data; limits are c(min, max)
plot_1 + coord_cartesian(xlim = c(-11, 13), ylim = c(-8, 16)) +
  ggtitle("Colored Biplot of chemsrus.csv") +
  scale_color_manual(values = rainbow(2), name = "cluster")

plot_2 <- ggbiplot(p_target, choices = 1:2,
                   labels = rownames(b_target),
                   labels.size = 1.5,
                   obs.scale = 1,
                   groups = factor(k_target$cluster),
                   alpha = 0,
                   var.axes = TRUE,
                   var.scale = 1,
                   varname.size = 2.0,
                   varname.adjust = 4.0)
plot_2 + coord_cartesian(xlim = c(-8, 10), ylim = c(-6, 5)) +
  ggtitle("Colored Biplot of chemstest.csv") +
  scale_color_manual(values = rainbow(2), name = "cluster")

# Zoom in on the outlying observations of chemstest.csv
plot_2 + coord_cartesian(xlim = c(-1, 1), ylim = c(-14, -12)) +
  ggtitle("Outliers of chemstest.csv") +
  scale_color_manual(values = rainbow(2), name = "cluster")
```
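The published sheets may not always be reachable, so here is a self-contained sketch of the same steps (scaled PCA plus 2-cluster k-means) in base R. It runs on R's built-in `USArrests` data, which is a stand-in and not the chemistry data. It also computes the share of variance each component carries. That number shows how much of the data a PC1/PC2 biplot actually represents. One deliberate difference from the notebook: k-means runs here on the scaled columns, so that no single large-valued variable dominates the Euclidean distances. The seed and the arrow scaling factor are arbitrary choices.

```{r}
# Stand-in data: R's built-in USArrests (50 states x 4 numeric columns)
set.seed(42)  # arbitrary seed; kmeans() picks random starting centers

x <- USArrests
p <- prcomp(x, retx = TRUE, center = TRUE, scale. = TRUE)

# Cluster on scaled columns (the notebook clusters on unscaled ones)
k <- kmeans(scale(x), centers = 2, nstart = 5)

# Proportion of total variance per principal component;
# a PC1/PC2 biplot shows only the first two entries
var_explained <- p$sdev^2 / sum(p$sdev^2)
print(round(var_explained, 3))

# PC1/PC2 scores coloured by cluster, plus loading arrows
# (multiplied by an arbitrary factor of 2 so they are visible)
plot(p$x[, 1:2], col = k$cluster, pch = 19, xlab = "PC1", ylab = "PC2",
     main = "USArrests: PCA scores coloured by k-means cluster")
arrows(0, 0, 2 * p$rotation[, 1], 2 * p$rotation[, 2], length = 0.08)
text(2.2 * p$rotation[, 1], 2.2 * p$rotation[, 2],
     rownames(p$rotation), cex = 0.7)
```

For the notebook's own data, the corresponding check would be `summary(p_chem)`. It reports the same proportions for the chemsrus.csv components.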