Dec 11th, 2017
---
title: "Impute missing fish lengths"
output: html_document
---

```{r}
library(tidyr)
library(fitdistrplus)
# Load dplyr last: fitdistrplus attaches MASS, whose select() would
# otherwise mask dplyr::select()
library(dplyr)
```

# Make fake data

```{r}
# 50 fish at a single site, each randomly assigned to one of three species
fish_data <- data.frame(site = 1,
                        species = sample(c("A", "B", "C"), 50, TRUE),
                        stringsAsFactors = FALSE)
# True mean length per species, used only to simulate the lengths
species_means <- data.frame(species = c("A", "B", "C"),
                            mean_length = c(10, 20, 30),
                            stringsAsFactors = FALSE)
```

To simulate the NA values in your data, I use a `mutate()` call with `ifelse()` to randomly set about 10% of the lengths to `NA`.

```{r}
fish_data_s <- fish_data %>%
  left_join(species_means, "species") %>%
  group_by(species) %>%
  # Draw lengths around each species' mean with a standard deviation of 2
  mutate(length = round(rnorm(n(), mean_length, 2), 2)) %>%
  select(-mean_length) %>%
  # Each length has a 10% chance of being replaced by NA
  mutate(length = ifelse(runif(n()) > 0.90, NA_real_, length))
```
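
As a quick check of the masking step, here is a minimal base R sketch on a larger toy vector. The seed, the vector size, and the names `toy_lengths` and `toy_lengths_na` are assumptions for illustration only. It shows that the share of `NA` values is close to, but not exactly, 10%, because each value is masked independently.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible

# Hypothetical toy lengths, simulated the same way as above
toy_lengths <- round(rnorm(1000, mean = 20, sd = 2), 2)

# Each value is independently replaced by NA with probability 0.10
toy_lengths_na <- ifelse(runif(length(toy_lengths)) > 0.90, NA_real_, toy_lengths)

# Share of missing values: close to 0.10, but it varies from run to run
mean(is.na(toy_lengths_na))
```

With only 50 fish, as in the fake data, the realized share of missing values can differ noticeably from 10%.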

# Fit a normal distribution to each site and species

I use the `fitdist()` function from fitdistrplus to fit a univariate normal distribution to the observed lengths for each site and species. To expand the fit into columns for later use, `unlist()` flattens the fitted object into a named vector, `t()` turns that vector into a single row, and `data.frame()` converts the row into columns. I then keep only the estimated mean and standard deviation.

```{r, warning=FALSE}
models <- fish_data_s %>%
  drop_na() %>%
  group_by(site, species) %>%
  # Fit a normal per group and flatten the fit object into a one-row data frame
  do(data.frame(t(unlist(fitdist(.$length, "norm"))))) %>%
  select(site, species, estimate.mean, estimate.sd)
```
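
For a normal distribution, the maximum likelihood estimates have a closed form: the sample mean, and the standard deviation computed with `n` rather than `n - 1` in the denominator. The sketch below computes them per group in base R as a sanity check. It is not the method used above, and the helper `normal_mle()`, the `toy` data frame, and the seed are hypothetical names and values chosen for illustration. The numerical fit from `fitdist()` should land close to these values, though I have not compared them here.

```r
# Closed-form maximum likelihood estimates for a normal distribution
normal_mle <- function(x) {
  x <- x[!is.na(x)]                      # use observed values only
  mu <- mean(x)
  c(mean = mu, sd = sqrt(mean((x - mu)^2)))  # n in the denominator, not n - 1
}

set.seed(1)  # assumed seed, only to make the sketch reproducible
toy <- data.frame(species = rep(c("A", "B"), each = 200),
                  length = c(rnorm(200, 10, 2), rnorm(200, 20, 2)))

# One row per species, with columns "mean" and "sd"
estimates <- t(sapply(split(toy$length, toy$species), normal_mle))
estimates
```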

# Impute the missing values

Now that I have a fitted normal distribution for each site and species, I use another `mutate()` with `ifelse()` to draw a random value from that group's estimated distribution for every missing length. Rows with an observed length get `NA` in `imputed_length`, so the original and imputed values stay in separate columns. The `as.numeric()` calls are there because flattening the fit with `unlist()` can leave the estimates stored as text.

```{r}
fish_data_s %>%
  left_join(models, c("site", "species")) %>%
  group_by(site, species) %>%
  # Draw from the fitted normal only where the length is missing
  mutate(imputed_length = ifelse(is.na(length),
                                 rnorm(n(), as.numeric(estimate.mean), as.numeric(estimate.sd)),
                                 NA_real_)) %>%
  select(-estimate.mean, -estimate.sd)
```
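
If a single complete column is more convenient than separate observed and imputed columns, the two can be merged afterwards. The sketch below shows one way in base R on a small toy data frame. The values, the assumed estimates `est_mean` and `est_sd`, and the column name `length_filled` are all hypothetical.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible

# Hypothetical toy data: two of the five lengths are missing
toy <- data.frame(length = c(9.8, NA, 10.4, NA, 11.1))
est_mean <- 10  # assumed fitted mean for this group
est_sd <- 2     # assumed fitted standard deviation for this group

# Same pattern as above: imputed values only where length is missing
toy$imputed_length <- ifelse(is.na(toy$length),
                             rnorm(nrow(toy), est_mean, est_sd),
                             NA_real_)

# Merge: keep the observed length where present, else the imputed value
toy$length_filled <- ifelse(is.na(toy$length), toy$imputed_length, toy$length)
toy
```

Inside a dplyr pipeline, `coalesce(length, imputed_length)` in a `mutate()` call gives the same merged column.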