---
title: "A new analysis workflow"
output: github_document
---

# Organize your data processing program with MECE pieces

*MECE = mutually exclusive, collectively exhaustive. The term comes from McKinsey.*

## Summary

When processing an input dataset, avoid creating a chain of numbered copies with
names like data1, data2, data3, which is hard to maintain. Instead, create mutually
exclusive pieces and merge them together at the end.

# Details

You often read in a dataset and then need to manipulate it in several steps.
The approach commonly taught in school is to save the result of each step as a new numbered copy.
It looks something like this:

```{r}
library(dplyr)  # provides %>% and mutate()

asl <- read.csv("asl.csv")
asl1 <- asl %>% mutate(newvar = oldvar / 12)  # step 1: derive a new variable
asl2 <- asl1 %>% mutate(usubjid = pt)         # step 2: copy the subject ID
asl_final <- asl2
```

Problems with this approach:

* It is hard to keep track of all of these numbered pieces.
* If you insert, remove, or reorder a step, you have to renumber every object that comes after it.

# A better approach

At each step, create a mutually exclusive data frame that contains only what that
step needs (typically the subject key plus the new variables). At the end, merge
all the pieces together. Use informative names for these pieces.

Advantages:

* No need to constantly reorder and rename pieces that end in numbers.

Here is a real example:

```{r}
# get_csv() is kept from the original; it is presumably a project helper
asl <- get_csv("data/clinical/asl.csv")

## STEP 1: process the input dataset using MECE pieces

## one piece:
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(...) %>%
  select(-studyid)

## another piece:
asl_new_censor_vars <- asl %>%
  select(usubjid, oscnsr, pfscnsr) %>%
  mutate(...)

## another piece:
asl_biomarker_flags <- asl %>%
  select(usubjid) %>%
  left_join(...) %>%
  mutate(...)

## STEP 2: at the end, join the mutually exclusive pieces
## (with no `by` argument, left_join() matches on all shared columns)
asl_edited <- asl %>%
  left_join(asl_study_flags) %>%
  left_join(asl_biomarker_flags) %>%
  left_join(asl_new_censor_vars)
```
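
The example above elides the `mutate(...)` details, so here is a minimal runnable sketch of the same pattern on toy data. The data values and the derived variables `in_study_a` and `os_event` are invented for illustration and are not from the original analysis. The sketch assumes `usubjid` uniquely identifies a row, drops the source columns from each piece so that only the key is shared, and joins with an explicit `by = "usubjid"`.

```r
library(dplyr)

# Toy stand-in for the input dataset (all values are made up)
asl <- data.frame(
  usubjid = c("S1", "S2", "S3"),
  studyid = c("STUDY_A", "STUDY_B", "STUDY_A"),
  oscnsr  = c(0, 1, 0),
  stringsAsFactors = FALSE
)

## STEP 1: build each piece from the input, keeping only the key + new variables

# Piece 1: a study flag (hypothetical variable)
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(in_study_a = studyid == "STUDY_A") %>%
  select(-studyid)

# Piece 2: an event indicator derived from the censoring flag (hypothetical variable)
asl_new_censor_vars <- asl %>%
  select(usubjid, oscnsr) %>%
  mutate(os_event = 1 - oscnsr) %>%
  select(-oscnsr)

# Sanity check: apart from the key, no two pieces (and no piece and the input)
# define the same column, i.e. the pieces are mutually exclusive
pieces <- list(asl_study_flags, asl_new_censor_vars)
new_cols <- unlist(lapply(pieces, function(p) setdiff(names(p), "usubjid")))
stopifnot(anyDuplicated(new_cols) == 0, !any(new_cols %in% names(asl)))

## STEP 2: join the pieces back onto the input by the key
asl_edited <- asl %>%
  left_join(asl_study_flags, by = "usubjid") %>%
  left_join(asl_new_censor_vars, by = "usubjid")

# The joins should add columns without adding or dropping rows
stopifnot(nrow(asl_edited) == nrow(asl))
```

Because each piece is built directly from `asl` rather than from the previous piece, a piece can be added, removed, or reordered without touching the others; only the final chain of joins changes.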