---
title: "Biplots of Data"
output: html_notebook
---

Group Members: \
Rachel G: rachamin12@gmail.com, \
James Hick: redsoxfan765@gmail.com, \
Quientin Morrison: morriq@rpi.edu

### Import Libraries:
```{r}
library(ggbiplot)
library(readr)
library(devtools)
library(ggplot2)
```
### Read the Data:
This reads the published Google Drive URLs for the two data sets: `chemsdata` holds chemsrus.csv and `targetdata` holds chemstest.csv.\
The function `suppressMessages()` hides the parsing output; `chemsdata` and `targetdata` are still created in the coding environment.

```{r}
chemsdata <- suppressMessages(read_csv(url("https://docs.google.com/spreadsheets/d/1arHUuWJrVjpZboLOJa97iIbPzCszX8stE-fbYhw2OCA/pub?gid=1533528387&single=true&output=csv")))

targetdata <- suppressMessages(read_csv(url("https://docs.google.com/spreadsheets/d/1NtoMaw06IlCDJ3k9Rxtcv01u2wUwdAh8D5rajytiNpQ/pub?gid=899540023&single=true&output=csv")))

```
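
As an alternative to wrapping the call in `suppressMessages()`, the sketch below silences only the column-specification message through the `show_col_types` argument. It assumes readr 2.0 or later, and the inline CSV (`demo_csv`, with made-up column names and values) is a hypothetical stand-in for the published sheets, not the real data.

```{r}
library(readr)

# Hypothetical inline CSV standing in for the published sheets (not the real data)
demo_csv <- I("id,x1,x2,class\nA,1.5,2,1\nB,0.3,4,0\n")

# show_col_types = FALSE hides the column-specification message (readr >= 2.0)
demo <- read_csv(demo_csv, show_col_types = FALSE)

# Confirm the data frame exists even though nothing was printed while parsing
dim(demo)
```

Unlike `suppressMessages()`, this leaves other messages from the call visible, which can help when a URL fails to load.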
### Biplots:
Create a set of data frames from chemsrus.csv and chemstest.csv, then run PCA and k-means clustering on them to produce colored biplots.\
The goal is to earn some extra points for visualization and to catch small details we might otherwise miss; it is quick and easy.

```{r}
b_chem <- as.data.frame(chemsdata[ , 2:(ncol(chemsdata)-1)])  # strip the first and last columns and convert to a data frame
rownames(b_chem) <- chemsdata[[1]]  # restore the row names from the first column

b_target <- as.data.frame(targetdata[ , 2:(ncol(targetdata)-1)])  # same treatment for chemstest.csv
rownames(b_target) <- targetdata[[1]]

k_chem <- kmeans(b_chem,2,nstart=5)     # I chose 2 clusters given biodegradability
k_target <- kmeans(b_target,2,nstart=5)

p_chem <- prcomp(b_chem,retx=TRUE,center=TRUE,scale.=TRUE) # run principal component analysis on the modified chemsrus.csv data
p_target <- prcomp(b_target,retx=TRUE,center=TRUE,scale.=TRUE)

plot_1 <- ggbiplot(p_chem,choices= 1:2,
                   labels=rownames(b_chem),
                   labels.size=1.5,
                   obs.scale=1,
                   groups=k_chem$cluster,  # color observations by k-means cluster
                   alpha=0,
                   var.axes=TRUE,
                   var.scale=1,
                   varname.size=2.5,
                   varname.adjust=6.0)

# coord_cartesian() expects each limit as a length-2 vector c(min, max)
plot_1 + coord_cartesian(xlim=c(-11, 13), ylim=c(-8, 16)) + ggtitle("Colored Biplot of chemsrus.csv") + scale_color_gradientn(colors=rainbow(2))

plot_2 <- ggbiplot(p_target,choices= 1:2,
                   labels=rownames(b_target),
                   labels.size=1.5,
                   obs.scale=1,
                   groups=k_target$cluster,  # color observations by k-means cluster
                   alpha=0,
                   var.axes=TRUE,
                   var.scale=1,
                   varname.size=2.0,
                   varname.adjust=4.0)

plot_2 + coord_cartesian(xlim=c(-8, 10), ylim=c(-6, 5)) + ggtitle("Colored Biplot of chemstest.csv") + scale_color_gradientn(colors=rainbow(2))

# Zoom in on the region holding the outlying observations
plot_2 + coord_cartesian(xlim=c(-1, 1), ylim=c(-14, -12)) + ggtitle("Outliers of chemstest.csv") + scale_color_gradientn(colors=rainbow(2))
```
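
Because each biplot shows only the first two principal components, it can help to check how much of the total variance those two components carry. The sketch below is a minimal illustration using the built-in `USArrests` data as a stand-in (an assumption; the same lines should apply to `p_chem` or `p_target`), and the names `p_demo`, `var_explained`, and `pc12_share` are hypothetical.

```{r}
# USArrests is a built-in dataset used here only as a stand-in for b_chem
p_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by each principal component
var_explained <- p_demo$sdev^2 / sum(p_demo$sdev^2)

# Share of the variance that a PC1 vs PC2 biplot can display
pc12_share <- sum(var_explained[1:2])

round(var_explained, 3)
pc12_share
```

If `pc12_share` is low for the real data, points that look close together in the biplot may still be far apart along the components that are not drawn.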