jack06215

[pandas] generate dataframe with nan

Sep 26th, 2020
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

perc_nan = 0.3       # fraction of all cells to replace with NaN
n_samples = 300000
n_features = 5

# make_classification returns (X, y); only the feature matrix X is used here.
X, _ = make_classification(n_samples=n_samples, n_features=n_features)
columns = ['DataPoint_{0}'.format(i) for i in range(n_features)]

# Pick distinct flat indices so that exactly round(size * perc_nan) cells become NaN.
n_nan = int(round(X.size * perc_nan))
nan_idx = np.random.choice(X.size, n_nan, replace=False)
X.flat[nan_idx] = np.nan  # .flat always writes into X itself, never into a copy

df = pd.DataFrame(X, columns=columns)
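
The same idea can be wrapped in a reusable function and checked on a small frame. This is only a sketch: the helper name `inject_nan` and its `seed` parameter are hypothetical and not part of the original paste, and it assumes every column can be cast to float.

```python
import numpy as np
import pandas as pd


def inject_nan(df, frac, seed=None):
    """Return a copy of df with round(df.size * frac) random cells set to NaN.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy(dtype=float, copy=True)  # private copy, safe to modify
    n_nan = int(round(values.size * frac))
    idx = rng.choice(values.size, size=n_nan, replace=False)  # distinct cells
    values.flat[idx] = np.nan
    return pd.DataFrame(values, index=df.index, columns=df.columns)


# Small demo frame: 4 rows x 5 columns, no missing values to start with.
demo = pd.DataFrame(
    np.arange(20, dtype=float).reshape(4, 5),
    columns=['DataPoint_{0}'.format(i) for i in range(5)],
)
result = inject_nan(demo, frac=0.3, seed=0)
n_missing = int(result.isna().sum().sum())
print(n_missing, "of", result.size, "cells are NaN")
```

Because the indices are drawn without replacement, the number of NaN cells is exact rather than approximate, and the input frame is left unchanged.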