---
title: "Analysis test"
output: html_document
---

```{r, echo=FALSE, warning=FALSE}
# This code chunk simply makes sure that all the libraries used here are installed, it will not be shown in the report (notice echo = FALSE).
packages <- c("readxl", "knitr", "tidyr", "dplyr", "ggplot2", "plotly")
if (length(missing_pkgs <- setdiff(packages, rownames(installed.packages()))) > 0) {
  message("Installing missing package(s): ", paste(missing_pkgs, collapse = ", "))
  install.packages(missing_pkgs)
}
```

This is a simple example analysis of data including import from Excel, data structuring and plotting. The data in this case happens to be optical density data over time (replicate growth curves for a microorganism) but the nature of the data matters little to the basics introduced.

## Import OD data

```{r}
library(readxl) # fast excel reader
#library(googlesheets) # fast google spreadsheet reader (not used here but could be useful)
data.raw <- read_excel("example.xlsx", skip = 1)
```

#### Show the raw data

```{r}
library(knitr) # the package that renders R markdown and has some good additional functionality
kable(data.raw)
```

### Restructuring the data

Turning the wide format Excel data into *long* format. Note: here we make use of the pipe operator `%>%`, which just simplifies chaining operations.

```{r}
library(tidyr) # for restructuring data very easily
data.long <- data.raw %>% gather(sample, OD600, -Time)
# data.long <- gather(data.raw, sample, OD600, -Time) # this would be identical without using %>%
```
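
To see what the wide-to-long step does without needing the Excel file, here is a minimal sketch on a made-up table. The data frame `wide` and its values are hypothetical and only stand in for `data.raw`; the block is written as a plain (non-evaluated) `r` fence so it does not affect the report.

```r
library(tidyr)

# Hypothetical wide table: one row per time point, one column per sample
wide <- data.frame(
  Time    = c(0, 1, 2),
  sample1 = c(0.05, 0.10, 0.21),
  sample2 = c(0.04, 0.09, 0.19)
)

# Collect every column except Time into sample/OD600 pairs
long <- gather(wide, sample, OD600, -Time)
long
```

Each of the three time points now appears once per sample, so the long table has six rows and the columns `Time`, `sample`, and `OD600`. Newer versions of tidyr recommend `pivot_longer()` for the same task, but `gather()` still works.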

Introducing time in hours.

```{r}
library(dplyr, warn.conflicts = FALSE) # powerful for doing calculations on data (by group, etc.)
data.long <- data.long %>% mutate(time.hrs = as.numeric(Time - Time[1], units = "hours"))
```
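
The conversion relies on how base R handles time differences: subtracting two date-times gives a `difftime`, and `as.numeric(..., units = "hours")` converts it to a plain number in the requested unit. A small sketch with made-up timestamps (the values are hypothetical, and it assumes the `Time` column is read in as a date-time):

```r
# Hypothetical timestamps standing in for the Time column
Time <- as.POSIXct(
  c("2018-02-19 09:00:00", "2018-02-19 10:30:00", "2018-02-20 09:00:00"),
  tz = "UTC"
)

elapsed  <- Time - Time[1]                        # difftime; its unit is picked automatically
time.hrs <- as.numeric(elapsed, units = "hours")  # force hours regardless of that unit
time.hrs
```

Asking for the unit explicitly matters because the automatic unit of a `difftime` depends on the size of the differences, so a bare `as.numeric(elapsed)` could return seconds for one data set and days for another.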

First plot of all the data

```{r}
library(ggplot2) # powerful plotting package for aesthetics driven plotting

p1 <-
  ggplot(data.long) + # initiate plot
  aes(x = time.hrs, y = OD600, color = sample) + # setup aesthetic mappings
  geom_point(size = 5) # add points to plot
print(p1) # output plot
```

### Combining data by adding sample meta information from the spreadsheet's second tab

```{r}
data.info <- read_excel("example.xlsx", sheet = "info")
```

Show all information (these are the experimental conditions for each sample)

```{r}
kable(data.info)
```

Combine OD data with sample information.

```{r}
data.all <- merge(data.long, data.info, by = "sample")
```
  80.  
  81. ### Show us the datas
  82.  
  83. Reuse same plot using `%+%` to substitute the original data set with a new one and changing the color to be determined based on the new information we added (but keep everything else about the plot the same).
  84.  
  85. ```{r}
  86. p1 %+% data.all %+% aes(color = substrate)
  87. ```
  88.  
  89. ### Summarize data
  90.  
  91. To make the figure a little bit easier to navigate, we're going to summarize the data for each condition (combine the replicates) and replot it with an error band showing the whole range of data points for each condition. We could reuse the plot `p1` again, but for clarity are constructing the plot from scratch instead.
  92.  
  93. ```{r}
  94. data.sum <- data.all %>%
  95. group_by(time.hrs, substrate) %>%
  96. summarize(
  97. OD600.avg = mean(OD600),
  98. OD600.min = min(OD600),
  99. OD600.max = max(OD600))
  100. data.sum %>% head() %>% kable() # show the first couple of lines
  101.  
  102. p2 <- ggplot(data.sum) + # initiate plot
  103. aes(x = time.hrs, y = OD600.avg, ymin = OD600.min, ymax = OD600.max,
  104. fill = substrate) + # setup global aesthetic mappings
  105. geom_ribbon(alpha = 0.3) + # value range (uses ymin and ymax, and fill for color)
  106. geom_line() + # connect averages (uses y)
  107. geom_point(shape = 21, size = 5) + # add points for averages (uses y and fill for color)
  108. theme_bw() + # style plot
  109. labs(title = "My plot", x = "Time [h]", y = "OD600", color = "Condition") # add labels
  110.  
  111. print(p2)
  112. ```

*Note that we could also have had ggplot do the whole statistical summarizing for us using `stat_summary`, but it's often helpful to have these values separately for other calculations and purposes.*
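
For comparison, a rough sketch of the `stat_summary` route on made-up replicate data. The data frame `toy` is hypothetical, and the `fun`, `fun.min`, and `fun.max` arguments assume ggplot2 3.3.0 or newer; this is an illustration, not the approach used in this report.

```r
library(ggplot2)

# Hypothetical data: two substrates, two replicates each, three time points
toy <- data.frame(
  time.hrs  = rep(c(0, 1, 2), times = 4),
  substrate = rep(c("glucose", "acetate"), each = 6),
  OD600     = c(0.05, 0.10, 0.20, 0.07, 0.14, 0.26,
                0.04, 0.06, 0.09, 0.06, 0.08, 0.13)
)

p <- ggplot(toy) +
  aes(x = time.hrs, y = OD600, fill = substrate) +
  # range band: min and max of the replicates at each time point
  stat_summary(fun = mean, fun.min = min, fun.max = max,
               geom = "ribbon", alpha = 0.3) +
  # averages as a line and as filled points
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point", shape = 21, size = 5)
print(p)
```

Here ggplot computes the mean, minimum, and maximum per time point and substrate while drawing, so no separate summary table exists afterwards.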

Now we could, for example, focus on a subset of the data but reuse the same plot, using `%+%` to substitute the original data set with a new one (keeping everything else about the plot the same).

```{r}
p2 %+% filter(data.sum, !grepl("background", substrate), time.hrs < 25)
```

Save this plot automatically as a PDF by setting specific plot options in the R code chunk.

```{r this-is-my-plot, dev="pdf", fig.width=7, fig.height=5, fig.path="./"}
print(p2)
```

#### Interactive plot

Last, you can make simple interactive (javascript) plots out of your original ggplots (plotly does not yet work great for all ggplot features but it's a start for easy visualization). You can of course construct plotly plots without ggplot for more customization too but that's for another time.

```{r}
library(plotly, warn.conflicts = FALSE)
ggplotly(p1)
```