flutedaddyfunk

Biplots of Data.Rmd

Apr 23rd, 2017
---
title: "Biplots of Data"
output: html_notebook
---

Group Members: \
Rachel G: rachamin12@gmail.com, \
James Hick: redsoxfan765@gmail.com, \
Quientin Morrison: morriq@rpi.edu

### Import Libraries:
```{r}
library(ggbiplot)
library(readr)
library(devtools)
library(ggplot2)
```
### Read the Data:
This reads the published Google Drive URLs into `chemsdata` (chemsrus.csv) and `targetdata` (chemstest.csv).\
The function suppressMessages() hides the parsing output; chemsdata and targetdata are still created in the coding environment.

```{r}
chemsdata <- suppressMessages(read_csv(url("https://docs.google.com/spreadsheets/d/1arHUuWJrVjpZboLOJa97iIbPzCszX8stE-fbYhw2OCA/pub?gid=1533528387&single=true&output=csv")))

targetdata <- suppressMessages(read_csv(url("https://docs.google.com/spreadsheets/d/1NtoMaw06IlCDJ3k9Rxtcv01u2wUwdAh8D5rajytiNpQ/pub?gid=899540023&single=true&output=csv")))

```
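
To see why the assignments survive `suppressMessages()`, here is a minimal offline sketch. `noisy_read()` is a hypothetical stand-in for `read_csv()` that prints a message and returns a data frame; it is not part of readr and the values in it are made up.

```{r}
# Hypothetical stand-in for read_csv(): emits a message, then returns a data frame.
noisy_read <- function() {
  message("Parsed with column specification: ...")
  data.frame(id = c("a", "b", "c"), value = c(1.5, 2.0, 3.5))
}

# suppressMessages() silences the message but still passes the return value
# through, so the assignment behaves as it would without the wrapper.
toy_read <- suppressMessages(noisy_read())

exists("toy_read")  # check that the object is in the environment
dim(toy_read)       # rows and columns of the returned data frame
```

Only messages are hidden this way; warnings and errors from the reader would still be shown.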
### Biplots:
Create a set of data frames from chemsrus.csv and chemstest.csv, then run PCA and k-means clustering to produce colored biplots.\
The purpose is to earn some extra points for visualization in case we miss small details elsewhere. It's quick and easy.

```{r}
# Strip the first and last columns and store the rest as a data frame
b_chem <- as.data.frame(chemsdata[, 2:(ncol(chemsdata) - 1)])
rownames(b_chem) <- chemsdata[[1]]  # restore the row names from the first column

b_target <- as.data.frame(targetdata[, 2:(ncol(targetdata) - 1)])
rownames(b_target) <- targetdata[[1]]

# Two clusters, chosen given biodegradability
k_chem <- kmeans(b_chem, centers = 2, nstart = 5)
k_target <- kmeans(b_target, centers = 2, nstart = 5)

# Run principal component analysis on the modified chemsrus.csv and chemstest.csv data
p_chem <- prcomp(b_chem, retx = TRUE, center = TRUE, scale. = TRUE)
p_target <- prcomp(b_target, retx = TRUE, center = TRUE, scale. = TRUE)

plot_1 <- ggbiplot(p_chem, choices = 1:2,
                   labels = rownames(b_chem),
                   labels.size = 1.5,
                   obs.scale = 1,
                   groups = k_chem$cluster,  # color observations by k-means cluster
                   alpha = 0,                # only affects points; observations are drawn as text labels here
                   var.axes = TRUE,
                   var.scale = 1,
                   varname.size = 2.5,
                   varname.adjust = 6.0)

# coord_cartesian() takes each limit as a length-2 vector: c(min, max)
plot_1 + coord_cartesian(xlim = c(-11, 13), ylim = c(-8, 16)) + ggtitle("Colored Biplot of chemsrus.csv") + scale_color_gradientn(colors = rainbow(2))

plot_2 <- ggbiplot(p_target, choices = 1:2,
                   labels = rownames(b_target),
                   labels.size = 1.5,
                   obs.scale = 1,
                   groups = k_target$cluster,
                   alpha = 0,
                   var.axes = TRUE,
                   var.scale = 1,
                   varname.size = 2.0,
                   varname.adjust = 4.0)

plot_2 + coord_cartesian(xlim = c(-8, 10), ylim = c(-6, 5)) + ggtitle("Colored Biplot of chemstest.csv") + scale_color_gradientn(colors = rainbow(2))

# Zoom in on the region that holds the outliers
plot_2 + coord_cartesian(xlim = c(-1, 1), ylim = c(-14, -12)) + ggtitle("Outliers of chemstest.csv") + scale_color_gradientn(colors = rainbow(2))
```
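
A biplot of PC1 against PC2 only shows as much structure as those two components capture, so it can help to check the share of variance they explain and to list the farthest points by score instead of hunting for them by eye. The sketch below is one way to do that with base R only. It runs on a small synthetic data frame `toy` (hypothetical, with the row names in the first column and an extra column last, mirroring the layout the code above assumes for the CSVs); none of the numbers come from the real data.

```{r}
set.seed(42)  # the synthetic data and k-means starts are random; fix the seed for repeatability

# Synthetic stand-in for the CSVs (assumed layout: names first, extra column last).
toy <- data.frame(
  id    = paste0("chem", 1:20),
  x1    = rnorm(20),
  x2    = rnorm(20),
  x3    = rnorm(20),
  label = rep(c(0, 1), each = 10)
)

# Same preparation as above: drop the first and last columns, keep names as row names.
b_toy <- toy[, 2:(ncol(toy) - 1)]
rownames(b_toy) <- toy$id

k_toy <- kmeans(b_toy, centers = 2, nstart = 5)
p_toy <- prcomp(b_toy, retx = TRUE, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component.
var_explained <- p_toy$sdev^2 / sum(p_toy$sdev^2)
round(cumsum(var_explained), 3)  # cumulative share; the second entry covers PC1 + PC2

# Cluster sizes from k-means.
table(k_toy$cluster)

# The three observations farthest from the origin in the PC1-PC2 plane,
# as candidates for a closer look.
scores <- p_toy$x[, 1:2]
farthest <- head(sort(sqrt(rowSums(scores^2)), decreasing = TRUE), 3)
farthest
```

Replacing `b_toy` with `b_chem` or `b_target` would apply the same checks to the real data, assuming those frames are built as in the chunk above.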