a guest
Dec 11th, 2018
   userID     browser     slot  n_clicks  n_queries  n_nonclk_queries
0       1  Browser #2      exp        23         32                19
1       3  Browser #4      exp         3          4                 2
2       5  Browser #4      exp        29         35                16
3       6  Browser #4  control        12          6                 0
4       7  Browser #4      exp        54         68                30

from scipy import stats

# data is the DataFrame shown above; compare click counts between the two slots
exp_clicks = data[data["slot"] == 'exp'].n_clicks
cntrl_clicks = data[data["slot"] == 'control'].n_clicks

stats.mannwhitneyu(exp_clicks, cntrl_clicks)

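The snippet above assumes that data already exists. A minimal self-contained sketch, assuming the five sample rows from the table above and a two-sided alternative (the paste does not say which alternative is intended), could look like this:

```python
import pandas as pd
from scipy import stats

# Sample rows copied from the table above (one row per user in this sample).
data = pd.DataFrame({
    "userID": [1, 3, 5, 6, 7],
    "browser": ["Browser #2", "Browser #4", "Browser #4", "Browser #4", "Browser #4"],
    "slot": ["exp", "exp", "exp", "control", "exp"],
    "n_clicks": [23, 3, 29, 12, 54],
    "n_queries": [32, 4, 35, 6, 68],
    "n_nonclk_queries": [19, 2, 16, 0, 30],
})

# Split the click counts by experiment slot.
exp_clicks = data.loc[data["slot"] == "exp", "n_clicks"]
cntrl_clicks = data.loc[data["slot"] == "control", "n_clicks"]

# alternative="two-sided" is an assumption made for this sketch.
result = stats.mannwhitneyu(exp_clicks, cntrl_clicks, alternative="two-sided")
print(result.pvalue)
```

With only one control row in this sample the p-value carries little meaning; the sketch only shows the call shape.
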
In [221]: df
Out[221]:
   userID      browser     slot  n_clicks  n_queries  n_nonclk_queries
0       1   Browser #2      exp        23         32                19
1       1  Browser #33      exp       100        100               100
2       3   Browser #4      exp         3          4                 2
3       5   Browser #4      exp        29         35                16
4       6   Browser #4  control        12          6                 0

In [222]: df.groupby('userID', as_index=False).sum(numeric_only=True)
Out[222]:
   userID  n_clicks  n_queries  n_nonclk_queries
0       1       123        132               119
1       3         3          4                 2
2       5        29         35                16
3       6        12          6                 0

In [2]: df
Out[2]:
   userID      browser     slot  n_clicks  n_queries  n_nonclk_queries
0       1   Browser #2      exp        23         32                19
1       1  Browser #22      exp       100        100               100
2       1  Browser #33  control       200        200               200
3       3   Browser #4      exp         3          4                 2
4       5   Browser #4      exp        29         35                16
5       6   Browser #4  control        12          6                 0

In [5]: df.groupby(['userID','slot'], as_index=False).sum(numeric_only=True)
Out[5]:
   userID     slot  n_clicks  n_queries  n_nonclk_queries
0       1  control       200        200               200
1       1      exp       123        132               119
2       3      exp         3          4                 2
3       5      exp        29         35                16
4       6  control        12          6                 0

In [24]: funcs = {c:'sum' for c in df.select_dtypes(include='number').drop(columns='userID').columns}

In [25]: funcs
Out[25]: {'n_clicks': 'sum', 'n_queries': 'sum', 'n_nonclk_queries': 'sum'}

In [26]: funcs['slot'] = lambda x: x.values.tolist()

In [27]: df.groupby('userID', as_index=False).agg(funcs)
Out[27]:
   userID  n_clicks  n_queries  n_nonclk_queries                 slot
0       1       323        332               319  [exp, exp, control]
1       3         3          4                 2                [exp]
2       5        29         35                16                [exp]
3       6        12          6                 0            [control]

In [28]: funcs['slot'] = 'first'

In [29]: funcs
Out[29]:
{'n_clicks': 'sum',
 'n_queries': 'sum',
 'n_nonclk_queries': 'sum',
 'slot': 'first'}

In [30]: df.groupby('userID', as_index=False).agg(funcs)
Out[30]:
   userID  n_clicks  n_queries  n_nonclk_queries     slot
0       1       323        332               319      exp
1       3         3          4                 2      exp
2       5        29         35                16      exp
3       6        12          6                 0  control
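
Aggregating slot with 'first' labels user 1 as exp even though that user also has a control row (200 clicks), so the per-user totals mix both slots. A minimal sketch for flagging such users before aggregating, assuming they should simply be excluded (other policies are equally possible) and using pandas named aggregation (pandas 0.25 or later) in place of the funcs dict:

```python
import pandas as pd

# Rows copied from the six-row df shown above.
df = pd.DataFrame({
    "userID": [1, 1, 1, 3, 5, 6],
    "browser": ["Browser #2", "Browser #22", "Browser #33",
                "Browser #4", "Browser #4", "Browser #4"],
    "slot": ["exp", "exp", "control", "exp", "exp", "control"],
    "n_clicks": [23, 100, 200, 3, 29, 12],
    "n_queries": [32, 100, 200, 4, 35, 6],
    "n_nonclk_queries": [19, 100, 200, 2, 16, 0],
})

# Users whose rows span more than one slot; 'first' would hide this.
slots_per_user = df.groupby("userID")["slot"].nunique()
mixed_users = slots_per_user[slots_per_user > 1].index.tolist()
print(mixed_users)  # [1]

# Assumption: drop mixed users, then aggregate one row per remaining user.
clean = df[~df["userID"].isin(mixed_users)]
per_user = clean.groupby("userID", as_index=False).agg(
    slot=("slot", "first"),
    n_clicks=("n_clicks", "sum"),
    n_queries=("n_queries", "sum"),
    n_nonclk_queries=("n_nonclk_queries", "sum"),
)
print(per_user)
```
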