-    userID     browser     slot  n_clicks  n_queries  n_nonclk_queries
- 0       1  Browser #2      exp        23         32                19
- 1       3  Browser #4      exp         3          4                 2
- 2       5  Browser #4      exp        29         35                16
- 3       6  Browser #4  control        12          6                 0
- 4       7  Browser #4      exp        54         68                30
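- from scipy import stats
- # `data` is the DataFrame listed above; split click counts by experiment slot
- exp_clicks = data[data["slot"] == 'exp'].n_clicks
- cntrl_clicks = data[data["slot"] == 'control'].n_clicks
- # Mann-Whitney U test comparing the two click distributions
- stats.mannwhitneyu(exp_clicks, cntrl_clicks)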
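The snippet above assumes a DataFrame named `data` already exists. A minimal self-contained sketch, built only from the five sample rows in the listing (a real analysis would load the full dataset), might look like this. Passing `alternative="two-sided"` explicitly is an assumption about the intended hypothesis, not something the paste states.

```python
import pandas as pd
from scipy import stats

# Sample rows copied from the listing; not the full dataset.
data = pd.DataFrame({
    "userID": [1, 3, 5, 6, 7],
    "browser": ["Browser #2", "Browser #4", "Browser #4", "Browser #4", "Browser #4"],
    "slot": ["exp", "exp", "exp", "control", "exp"],
    "n_clicks": [23, 3, 29, 12, 54],
    "n_queries": [32, 4, 35, 6, 68],
    "n_nonclk_queries": [19, 2, 16, 0, 30],
})

# Split the click counts by experiment slot.
exp_clicks = data.loc[data["slot"] == "exp", "n_clicks"]
cntrl_clicks = data.loc[data["slot"] == "control", "n_clicks"]

# Assumption: a two-sided alternative is what is wanted here.
result = stats.mannwhitneyu(exp_clicks, cntrl_clicks, alternative="two-sided")
print(result.statistic, result.pvalue)
```

With a single control row in the sample, the p-value is not meaningful; the sketch only shows the mechanics of the call.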
- In [221]: df
- Out[221]:
-    userID      browser     slot  n_clicks  n_queries  n_nonclk_queries
- 0       1   Browser #2      exp        23         32                19
- 1       1  Browser #33      exp       100        100               100
- 2       3   Browser #4      exp         3          4                 2
- 3       5   Browser #4      exp        29         35                16
- 4       6   Browser #4  control        12          6                 0
- In [222]: df.groupby('userID', as_index=False).sum(numeric_only=True)
- Out[222]:
-    userID  n_clicks  n_queries  n_nonclk_queries
- 0       1       123        132               119
- 1       3         3          4                 2
- 2       5        29         35                16
- 3       6        12          6                 0
- In [2]: df
- Out[2]:
-    userID      browser     slot  n_clicks  n_queries  n_nonclk_queries
- 0       1   Browser #2      exp        23         32                19
- 1       1  Browser #22      exp       100        100               100
- 2       1  Browser #33  control       200        200               200
- 3       3   Browser #4      exp         3          4                 2
- 4       5   Browser #4      exp        29         35                16
- 5       6   Browser #4  control        12          6                 0
- In [5]: df.groupby(['userID','slot'], as_index=False).sum(numeric_only=True)
- Out[5]:
-    userID     slot  n_clicks  n_queries  n_nonclk_queries
- 0       1  control       200        200               200
- 1       1      exp       123        132               119
- 2       3      exp         3          4                 2
- 3       5      exp        29         35                16
- 4       6  control        12          6                 0
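The per-user, per-slot aggregation above can be reproduced as a standalone script. The sketch below rebuilds the six-row frame from the listing and selects the count columns explicitly before summing, so the string column `browser` is never passed to `sum`. The name `count_cols` is introduced here for illustration only.

```python
import pandas as pd

# Six sample rows copied from the listing above.
df = pd.DataFrame({
    "userID": [1, 1, 1, 3, 5, 6],
    "browser": ["Browser #2", "Browser #22", "Browser #33",
                "Browser #4", "Browser #4", "Browser #4"],
    "slot": ["exp", "exp", "control", "exp", "exp", "control"],
    "n_clicks": [23, 100, 200, 3, 29, 12],
    "n_queries": [32, 100, 200, 4, 35, 6],
    "n_nonclk_queries": [19, 100, 200, 2, 16, 0],
})

# Hypothetical helper name: the numeric columns to be summed.
count_cols = ["n_clicks", "n_queries", "n_nonclk_queries"]

# One row per (userID, slot) pair, with the counts summed across browsers.
per_user_slot = df.groupby(["userID", "slot"], as_index=False)[count_cols].sum()
print(per_user_slot)
```

Grouping on both keys keeps user 1's `exp` and `control` activity in separate rows, which matters if the two slots are compared afterwards.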
- In [24]: funcs = {c:'sum' for c in df.select_dtypes(include='number').drop(columns='userID').columns}
- In [25]: funcs
- Out[25]: {'n_clicks': 'sum', 'n_queries': 'sum', 'n_nonclk_queries': 'sum'}
- In [26]: funcs['slot'] = lambda x: x.values.tolist()
- In [27]: df.groupby('userID', as_index=False).agg(funcs)
- Out[27]:
-    userID  n_clicks  n_queries  n_nonclk_queries                 slot
- 0       1       323        332               319  [exp, exp, control]
- 1       3         3          4                 2                [exp]
- 2       5        29         35                16                [exp]
- 3       6        12          6                 0            [control]
- In [28]: funcs['slot'] = 'first'
- In [29]: funcs
- Out[29]:
- {'n_clicks': 'sum',
- 'n_queries': 'sum',
- 'n_nonclk_queries': 'sum',
- 'slot': 'first'}
- In [30]: df.groupby('userID', as_index=False).agg(funcs)
- Out[30]:
-    userID  n_clicks  n_queries  n_nonclk_queries     slot
- 0       1       323        332               319      exp
- 1       3         3          4                 2      exp
- 2       5        29         35                16      exp
- 3       6        12          6                 0  control
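The session above mixes aggregation functions per column by passing a dict to `agg`. A condensed, runnable sketch of the same idea follows, using the six sample rows from the listing. It swaps the lambda for the built-in `list`, which is a substitution made here rather than the paste's exact code, and the names `per_user` and `per_user_slots` are illustrative.

```python
import pandas as pd

# Six sample rows copied from the listing above.
df = pd.DataFrame({
    "userID": [1, 1, 1, 3, 5, 6],
    "browser": ["Browser #2", "Browser #22", "Browser #33",
                "Browser #4", "Browser #4", "Browser #4"],
    "slot": ["exp", "exp", "control", "exp", "exp", "control"],
    "n_clicks": [23, 100, 200, 3, 29, 12],
    "n_queries": [32, 100, 200, 4, 35, 6],
    "n_nonclk_queries": [19, 100, 200, 2, 16, 0],
})

# Sum every numeric column except the grouping key.
numeric_cols = df.select_dtypes(include="number").columns.drop("userID")
funcs = {c: "sum" for c in numeric_cols}

# Variant 1: keep only the first slot seen for each user.
per_user = df.groupby("userID", as_index=False).agg({**funcs, "slot": "first"})

# Variant 2: collect every slot value for each user into a list.
per_user_slots = df.groupby("userID", as_index=False).agg({**funcs, "slot": list})

print(per_user)
print(per_user_slots)
```

Note that `'first'` reports user 1 as `exp` even though that user also has a `control` row, so the list variant is the safer way to spot users who appear in both slots.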