a guest
Jul 28th, 2017
---
title: "A new analysis workflow"
output: github_document
---

# Organize your data processing program with MECE pieces

*MECE = mutually exclusive, collectively exhaustive. The term comes from McKinsey.*

## Summary

When processing an input dataset, avoid creating a chain of numbered copies with
names like data1, data2, data3. Instead, create mutually exclusive pieces with
informative names, and merge them together at the end.

## Details

You often read in a dataset and then need to manipulate it in several steps.
The way this is often taught in school is to save the result of each step as a
new numbered copy of the data.

That looks something like this:

```{r}
library(dplyr)  # provides %>% and mutate()

asl <- read.csv("asl.csv")

# Each step saves another numbered copy of the full dataset.
asl1 <- asl %>% mutate(newvar = oldvar / 12)

asl2 <- asl1 %>% mutate(usubjid = pt)

asl_final <- asl2
```

Problems with this approach:

* It is hard to keep track of all the numbered intermediate copies.
* If you add, remove, or reorder a step, you have to renumber every copy after it.

## A better approach

At each step, create a mutually exclusive data frame that contains only what
that step needs. Give each piece an informative name, and at the end merge all
the pieces together.

Advantages:

* There is no need to constantly reorder and rename pieces that end in numbers.

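One reading of "mutually exclusive" here is that no two pieces create the same column, so the only thing they share is the join key. The following is a minimal base R sketch of such a check. `find_overlapping_columns()` and the toy data frames are hypothetical names invented for illustration, and `usubjid` is assumed to be the subject key.

```r
# Hypothetical helper (not part of any package): report column names that
# appear in more than one piece, ignoring the join key.
find_overlapping_columns <- function(pieces, key = "usubjid") {
  # Collect the non-key column names of every piece.
  cols <- unlist(lapply(pieces, function(piece) setdiff(names(piece), key)))
  # Any name seen more than once breaks mutual exclusivity.
  unique(cols[duplicated(cols)])
}

# Toy pieces with made-up columns, for illustration only.
study_flags <- data.frame(usubjid = c("A", "B"), is_phase3 = c(TRUE, FALSE))
censor_vars <- data.frame(usubjid = c("A", "B"), os_event = c(1, 0))
clashing    <- data.frame(usubjid = c("A", "B"), os_event = c(0, 0))

# No shared columns besides the key: the result is empty.
find_overlapping_columns(list(study_flags, censor_vars))

# Two pieces both define os_event, so it is reported.
find_overlapping_columns(list(study_flags, censor_vars, clashing))
```

Running a check like this just before the final join is one way to catch a piece that accidentally redefines a variable owned by another piece.
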
Here is a real example:

```{r}
# get_csv() is not defined in this document; presumably a project-specific reader.
asl <- get_csv("data/clinical/asl.csv")

## STEP 1: process the input dataset using MECE pieces

## one piece:
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(...) %>%
  select(-studyid)

## another piece:
asl_new_censor_vars <- asl %>%
  select(usubjid, oscnsr, pfscnsr) %>%
  mutate(...)

## another piece:
asl_biomarker_flags <- asl %>%
  select(usubjid) %>%
  left_join(...) %>%
  mutate(...)

## STEP 2: at the end, join the mutually exclusive pieces
## (with no `by` argument, left_join() joins on all columns the tables share)
asl_edited <- asl %>%
  left_join(asl_study_flags) %>%
  left_join(asl_biomarker_flags) %>%
  left_join(asl_new_censor_vars)
```
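
The example above elides the `mutate()` and `left_join()` arguments, so it cannot be run as written. A minimal runnable sketch of the same pattern might look like the following. The input data and the derived variables `is_study1` and `os_event` are made up for illustration, the coding of `oscnsr` (1 = censored) is an assumption, and dplyr is assumed to be installed. Each piece drops its source columns and each join names its key with `by = "usubjid"`, so the pieces share nothing but the key.

```r
library(dplyr)

# Made-up input standing in for asl; the values are illustrative only.
asl <- data.frame(
  usubjid = c("P1", "P2", "P3"),
  studyid = c("S1", "S2", "S1"),
  oscnsr  = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Piece 1: a study flag, keeping only the key and the new column.
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(is_study1 = studyid == "S1") %>%
  select(-studyid)

# Piece 2: an event indicator derived from the censoring variable
# (assumes oscnsr is coded 1 = censored, 0 = event).
asl_new_censor_vars <- asl %>%
  select(usubjid, oscnsr) %>%
  mutate(os_event = 1 - oscnsr) %>%
  select(-oscnsr)

# Final step: join the pieces back onto the input by the subject key.
asl_edited <- asl %>%
  left_join(asl_study_flags, by = "usubjid") %>%
  left_join(asl_new_censor_vars, by = "usubjid")

names(asl_edited)
```

Because each piece carries only `usubjid` and its own new columns, a piece can be added, removed, or reordered without renaming or editing the others.
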