jack06215

[pandas] find missing data in each row

Sep 26th, 2020
import numpy as np
import pandas as pd

def find_nan_columns(df):
  """List, for each row that has missing values, the columns that are missing."""
  # Positions of missing cells. df.isna() also handles non-numeric columns,
  # whereas np.isnan(df) fails on them.
  ix_row, ix_col = np.where(df.isna().to_numpy())
  if ix_row.size == 0:
    # Nothing is missing: return an empty frame with the same two columns.
    return pd.DataFrame({'Row': ix_row, 'Missing data': []})
  # Map positional column indices to column labels.
  labels = df.columns.to_numpy()[ix_col]

  # np.where reports positions in row-major order, so ix_row is already sorted;
  # split the labels at the first occurrence of each new row index.
  ukeys, index = np.unique(ix_row, return_index=True)
  arrays = np.split(labels, index[1:])
  # 'Row' holds positional row numbers, not df.index labels.
  df2 = pd.DataFrame({'Row': ukeys,
                      'Missing data': [list(a) for a in arrays]})
  return df2
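
For comparison, the same per-row listing can be sketched with a boolean mask over the column labels. This is an alternative sketch, not the paste author's method: the helper name `missing_by_row` and the sample frame are made up for illustration, and 'Row' is assumed to mean the positional row number, as in the function above.

```python
import numpy as np
import pandas as pd

def missing_by_row(df):
    # Hypothetical helper: boolean mask of missing cells, for any dtype.
    mask = df.isna().to_numpy()
    # Positional numbers of the rows that contain at least one missing value.
    rows = np.flatnonzero(mask.any(axis=1))
    return pd.DataFrame({
        'Row': rows,
        # Select the column labels where this row's mask is True.
        'Missing data': [df.columns[mask[r]].tolist() for r in rows],
    })

# Made-up sample data mixing numeric and object columns.
sample = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [np.nan, np.nan, 6.0],
    'c': ['x', 'y', None],
})
result = missing_by_row(sample)
print(result)
```

Rows without any missing value are left out of the result, so a fully populated frame yields an empty table.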