---
title: "Impute missing fish lengths"
output: html_document
---

```{r}
library(tidyr)
library(fitdistrplus)  # attaches MASS, which also exports select()
library(dplyr)         # load last so dplyr::select() masks MASS::select()
```
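
The load-order comment is about name masking. In the fitdistrplus versions I'm aware of, MASS is attached as a dependency, and MASS also exports a `select()` function. Whichever package is attached last wins when you call plain `select()`. Below is a minimal sketch of how to check which one you get, plus the load-order-proof alternative of namespacing the call explicitly. The toy data frame `df` is hypothetical.

```r
library(fitdistrplus)  # attaches MASS (and its select()) as a dependency
library(dplyr)         # attached last, so plain select() now means dplyr's

# Which package does the unqualified name `select` resolve to?
environmentName(environment(select))

# Writing dplyr::select() sidesteps the masking question entirely
df <- data.frame(a = 1:3, b = 4:6)  # hypothetical toy data
dplyr::select(df, a)
```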

# Make fake data
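
```{r}
set.seed(42)  # make the simulated data reproducible

# One site, 50 fish drawn at random from three species
fish_data <- data.frame(site = 1,
                        species = sample(c("A", "B", "C"), 50, replace = TRUE),
                        stringsAsFactors = FALSE)

# True mean length for each species, used to simulate lengths
species_means <- data.frame(species = c("A", "B", "C"),
                            mean_length = c(10, 20, 30),
                            stringsAsFactors = FALSE)
```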
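
Next I simulate a length for each fish from a normal distribution around its species mean (sd = 2). To mimic the NA values in your data, a second `mutate()` call uses `ifelse()` to set roughly 10% of the lengths to NA at random.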

```{r}
fish_data_s <- fish_data %>%
  left_join(species_means, by = "species") %>%
  group_by(species) %>%
  # Draw each length from N(species mean, sd = 2), rounded to 2 decimals
  mutate(length = round(rnorm(n(), mean_length, 2), 2)) %>%
  select(-mean_length) %>%
  # Set about 10% of lengths to NA at random
  mutate(length = ifelse(runif(n()) > 0.90, NA, length)) %>%
  ungroup()
```
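
Because the NAs are drawn at random, the number missing per species changes from run to run. The sketch below, on hypothetical toy data (1000 fish, a single made-up mean of 20), applies the same `runif(n()) > 0.90` rule and counts missing values per species with `summarise()`. The column names `n_fish` and `n_missing` are only illustrative. With this many rows, the overall share of NAs should land close to 10%.

```r
library(dplyr)

set.seed(1)
# Hypothetical toy data: 1000 lengths, then the same runif() > 0.90 rule
toy <- data.frame(species = sample(c("A", "B", "C"), 1000, replace = TRUE),
                  length = rnorm(1000, 20, 2))
toy <- toy %>%
  group_by(species) %>%
  mutate(length = ifelse(runif(n()) > 0.90, NA, length)) %>%
  ungroup()

# Count fish and missing lengths per species
missing_summary <- toy %>%
  group_by(species) %>%
  summarise(n_fish = n(), n_missing = sum(is.na(length)), .groups = "drop")
missing_summary
```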

# Fit a normal distribution to each site and species
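
I use `fitdist()` to fit a univariate normal distribution to the observed lengths for each site and species. `unlist()` flattens each fitted model object into a named vector, which I transpose into a one-row data frame so that its elements become columns. Finally, I keep only the estimated mean and standard deviation.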
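
```{r, warning=FALSE}
models <- fish_data_s %>%
  drop_na() %>%                # fit only on observed lengths
  group_by(site, species) %>%
  # Flatten each fitdist object into a one-row data frame; its `estimate`
  # element becomes the columns estimate.mean and estimate.sd. unlist()
  # coerces everything to character, so keep strings rather than factors
  # (they are converted with as.numeric() when imputing)
  do(data.frame(t(unlist(fitdist(.$length, "norm"))),
                stringsAsFactors = FALSE)) %>%
  select(site, species, estimate.mean, estimate.sd)
```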
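
For the normal distribution, the maximum-likelihood estimates that `fitdist()` reports (its default method is `"mle"`) have a closed form. The mean estimate is the sample mean. The sd estimate divides by n rather than n - 1, so `estimate.sd` is slightly smaller than what `sd()` returns. A small sketch to check this on a hypothetical toy sample:

```r
library(fitdistrplus)

set.seed(7)
x <- rnorm(30, mean = 20, sd = 2)  # hypothetical toy sample

fit <- fitdist(x, "norm")          # default method is maximum likelihood
fit$estimate                       # named vector: mean, sd

# Closed-form normal MLEs for comparison
n <- length(x)
mle_mean <- mean(x)
mle_sd <- sqrt(sum((x - mle_mean)^2) / n)  # divides by n, not n - 1
c(mean = mle_mean, sd = mle_sd, sd_unbiased = sd(x))
```

With small groups the difference between the two sd estimates is noticeable, which is worth keeping in mind if some site/species combinations have only a handful of fish.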

# Impute the missing values
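
Now that I have a fitted normal distribution for each site and species, I join the parameter estimates back onto the data. I then use another `mutate()` with `ifelse()` to draw a random value from the matching fitted distribution wherever a length is missing. The draws go into a new `imputed_length` column, so the observed `length` values stay untouched.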

```{r}
fish_data_s %>%
  left_join(models, by = c("site", "species")) %>%
  group_by(site, species) %>%
  # rnorm() draws n() values per group; ifelse() keeps a draw only where
  # length is missing, so imputed_length is NA for observed rows
  mutate(imputed_length = ifelse(is.na(length),
                                 rnorm(n(), as.numeric(estimate.mean),
                                       as.numeric(estimate.sd)),
                                 NA)) %>%
  ungroup() %>%
  select(-estimate.mean, -estimate.sd)
```
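
The imputation step relies on `ifelse()` being element-wise. `rnorm()` draws one value for every row, and `ifelse()` keeps a draw only where the test is `TRUE`. If you would rather fill the gaps in `length` itself instead of adding a column, the same idea works with the observed value as the third argument. Below is a minimal sketch that uses a hypothetical helper `impute_normal()` and made-up parameters (mean 20, sd 2):

```r
# Hypothetical helper: replace NAs in x with draws from N(mu, sigma),
# leaving observed values untouched
impute_normal <- function(x, mu, sigma) {
  draws <- rnorm(length(x), mean = mu, sd = sigma)  # one draw per element
  ifelse(is.na(x), draws, x)                        # keep draws only at NAs
}

set.seed(3)
lengths <- c(19.8, NA, 21.3, NA, 20.4)
filled <- impute_normal(lengths, mu = 20, sigma = 2)
filled
```

Inside the grouped pipeline, this would be `mutate(length = impute_normal(length, as.numeric(estimate.mean), as.numeric(estimate.sd)))`. Keeping a separate `imputed_length` column, as above, makes it easier to tell observed and imputed values apart later.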