CAROJASQ

5 predictors gpu log

Oct 23rd, 2018
➜ src git:(master) ✗ time python massive_multilinear_regresions.py -i ../TestData/pronos_ordered_cleaned.csv -mp 5 -np 1 -d gpu
Running calculations on GPU
Doing regressions for 1 predictors (54) regressions
Number of possible combinations are 54, batch size is 2447000
Generating 2447000 combs for this batch
Processing from 0 to 54 regressions in this batch
For this batch 0 models are invalid
Doing regressions for 2 predictors (1431) regressions
Number of possible combinations are 1431, batch size is 1819000
Generating 1819000 combs for this batch
Processing from 0 to 1431 regressions in this batch
For this batch 0 models are invalid
Doing regressions for 3 predictors (24804) regressions
Number of possible combinations are 24804, batch size is 1441000
Generating 1441000 combs for this batch
Processing from 0 to 24804 regressions in this batch
For this batch 0 models are invalid
Doing regressions for 4 predictors (316251) regressions
Number of possible combinations are 316251, batch size is 1190000
Generating 1190000 combs for this batch
Processing from 0 to 316251 regressions in this batch
For this batch 0 models are invalid
Doing regressions for 5 predictors (3162510) regressions
Number of possible combinations are 3162510, batch size is 1010000
Generating 1010000 combs for this batch
Processing from 0 to 1010000 regressions in this batch
For this batch 0 models are invalid
Processing from 1010000 to 2020000 regressions in this batch
For this batch 0 models are invalid
Processing from 2020000 to 3030000 regressions in this batch
For this batch 0 models are invalid
Processing from 3030000 to 3162510 regressions in this batch
For this batch 0 models are invalid
3505050 Regressions has been done, tt 44.3007283211, te: 38.7705149651
Using GPU to do regressions took 115.843286037
}
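
The per-k regression counts reported above follow directly from choosing k predictor columns out of the 54 candidates (the constant column is excluded from the choice), i.e. C(54, k). A minimal sketch that reproduces those counts, using only the standard library (the ncr name mirrors the helper used in the profiled code below):

from math import factorial

def ncr(n, r):
    # binomial coefficient: n choose r
    return factorial(n) // (factorial(r) * factorial(n - r))

for k in range(1, 6):
    # prints 54, 1431, 24804, 316251, 3162510 for k = 1..5
    print("{} predictors: {} combinations".format(k, ncr(54, k)))

The total, 54 + 1431 + 24804 + 316251 + 3162510 = 3505050, matches the "3505050 Regressions has been done" line in the log.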
Timer unit: 1e-06 s

Total time: 112.334 s
File: /home/nvera/Cris/HMMMR/src/batched_regression.py
Function: find_best_models_gpu at line 182

Line # Hits Time Per Hit % Time Line Contents
==============================================================
182 def find_best_models_gpu(file_name='../TestData/Y=2X1+3X2+4X3+5_with_shitty.csv', min_predictors=1, max_predictors=4, metric=None, window=None, handle=None, max_batch_size=None, **kwargs):
183 """
184
185 :param file_name: File name containing the data, in the following format
186 Columns: Contains data for N-2 predictors, 1 column full of 1s and 1 column with outcome data
187 Columns 1 to N-2 contain the predictor data
188 The N-1 column is always full of 1s (due to the constant in the model)
189 The N column contains the Y data
190 Rows: The first row contains the names of the predictors
191 The next rows contain the observations (they need to be real values, no empty/nan values are allowed)
192 :param max_predictors: Max number of predictors to test in the regression. Should be N-2 at most
193 :return: Ordered array (by RMSE) of tuples containing (predictors_combination, RMSE)
194 """
195 1 2.0 2.0 0.0 tt = te = 0 # total time
196 1 142510.0 142510.0 0.1 handle = handle if handle else cublas.cublasCreate()
197 1 12348.0 12348.0 0.0 XY = np.loadtxt(open(file_name, "rb"), delimiter=",", skiprows=1, dtype=np.float32)
198 1 107.0 107.0 0.0 X = np.delete(XY, XY.shape[1] - 1, 1)
199 1 3.0 3.0 0.0 Y = XY[:, -1]
200 1 1.0 1.0 0.0 combs_rmse = None
201 1 1.0 1.0 0.0 done_regressions = 0
202 1 14.0 14.0 0.0 with open(file_name, 'rb') as f:
203 1 50.0 50.0 0.0 col_names = np.array(f.readline().strip().split(','))
204 6 29.0 4.8 0.0 for n_predictors in range(min_predictors, max_predictors+1):
205 5 1555.0 311.0 0.0 _print_memory_usage("Initial State: ")
206 5 1424.0 284.8 0.0 max_batch_size = _get_max_batch_size(n_predictors+1, Y.size)
207 5 126.0 25.2 0.0 iterator = get_combinatorial_iterator(X, n_predictors)
208 5 15.0 3.0 0.0 index_combinations = get_column_index_combinations(iterator, X, max_batch_size=max_batch_size) # n predictors - 1 constant
209 5 125.0 25.0 0.0 s_i = ncr(X.shape[1]-1, n_predictors) # Number of possible combinations
210 5 202.0 40.4 0.0 print "Doing regressions for {} predictors ({}) regressions".format(n_predictors, s_i)
211 5 66.0 13.2 0.0 print "Number of possible combinations are {}, batch size is {}".format(s_i, max_batch_size)
212 5 8.0 1.6 0.0 i = 0
213 13 5903998.0 454153.7 5.3 for current_combinations in index_combinations:
214 8 378.0 47.2 0.0 print "Processing from {} to {} regressions in this batch".format(i, i + len(current_combinations))
215 8 30.0 3.8 0.0 ss = time()
216 8 21754923.0 2719365.4 19.4 Xs = get_X_matrices_from_combinations(X, current_combinations)
217 8 16147017.0 2018377.1 14.4 XTs = get_Xt_matrices_from_combinations(X.T, current_combinations)
218 8 868421.0 108552.6 0.8 YsObs = get_Ys_matrices(Y, len(current_combinations))
219 8 72.0 9.0 0.0 te += time() - ss
220 8 8.0 1.0 0.0 ss = time()
221 8 44300627.0 5537578.4 39.4 regression_results = massive_multilineal_regresion(Xs, XTs, YsObs, handle=handle)
222 8 73.0 9.1 0.0 tt += time() - ss
223 8 1638067.0 204758.4 1.5 regression_results['predictors_combinations'] = np.array(current_combinations, dtype=np.int32)
224 # If the matrix has no inverse then the model is invalid
225 8 7818.0 977.2 0.0 invalid_models = np.where(regression_results['inv_results'].get() != 0)[0]
226 8 643.0 80.4 0.0 print "For this batch {} models are invalid".format(len(invalid_models))
227 # Cleaning invalid model results
228 8 95627.0 11953.4 0.1 regression_results['predictors_combinations'] = np.delete(regression_results['predictors_combinations'], invalid_models, 0)
229 8 126946.0 15868.2 0.1 regression_results['beta_coefficients'] = np.delete(regression_results['beta_coefficients'], invalid_models, 0)
230 8 7296.0 912.0 0.0 regression_results['rmse'] = np.delete(regression_results['rmse'], invalid_models, 0)
231 8 1256186.0 157023.2 1.1 regression_results['ys_sim'] = np.delete(regression_results['ys_sim'], invalid_models, 0)
232 8 902768.0 112846.0 0.8 regression_results['ys_obs'] = np.delete(regression_results['ys_obs'], invalid_models, 0)
233 3505058 8580682.0 2.4 7.6 combinations_cols_names = np.array([col_names[x] for x in regression_results['predictors_combinations']])
234 8 33.0 4.1 0.0 if combs_rmse is None:
235 1 167.0 167.0 0.0 combs_rmse = np.array(list(zip(combinations_cols_names, regression_results['rmse'])))
236 else:
237 7 4060554.0 580079.1 3.6 combs_rmse = np.vstack((combs_rmse, np.array(list(zip(combinations_cols_names, regression_results['rmse'])))))
238 8 43.0 5.4 0.0 i += len(current_combinations)
239 8 12.0 1.5 0.0 done_regressions += len(current_combinations)
240 1 55.0 55.0 0.0 print "{} Regressions has been done, tt {}, te: {}".format(done_regressions, tt, te)
241 1 6523239.0 6523239.0 5.8 ordered_combs = combs_rmse[combs_rmse[:, 1].argsort()]
242 1 3.0 3.0 0.0 return ordered_combs
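
The docstring above spells out the expected input layout: predictor columns first, then a column of 1s for the intercept, then the outcome column, with the column names in the header row. As an illustration only (hypothetical names and values, not taken from pronos_ordered_cleaned.csv), a file in that format could be prepared like this:

import numpy as np
import pandas as pd

# Hypothetical raw data: two predictors and an outcome column.
df = pd.DataFrame({"X1": [1.0, 2.0, 3.0],
                   "X2": [4.0, 5.0, 6.0],
                   "Y":  [10.0, 13.0, 16.0]})
# Insert the intercept column of 1s just before Y, as the docstring requires.
df.insert(len(df.columns) - 1, "ones", np.ones(len(df), dtype=np.float32))
# The header row carries the column names; the rows below are the observations.
df.to_csv("example_input.csv", index=False)

np.loadtxt(..., skiprows=1) then yields the XY array that lines 197-199 split into X (predictors plus the 1s column) and Y (the last column).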
Total time: 0 s
File: /home/nvera/Cris/HMMMR/src/numpy_multiple_regression.py
Function: find_best_models_cpu at line 78

Line # Hits Time Per Hit % Time Line Contents
==============================================================
78 def find_best_models_cpu(file_name='../TestData/Y=2X1+3X2+4X3+5_with_shitty.csv', min_predictors=1, max_predictors=4, handle=None, **kwargs):
79 """
80
81 :param file_name: File name containing the data, in the following format
82 Columns: Contains data for N-2 predictors, 1 column full of 1s and 1 column with outcome data
83 Columns 1 to N-2 contain the predictor data
84 The N-1 column is always full of 1s (due to the constant in the model)
85 The N column contains the Y data
86 Rows: The first row contains the names of the predictors
87 The next rows contain the observations (they need to be real values, no empty/nan values are allowed)
88 :param max_predictors: Max number of predictors to test in the regression. Should be N-2 at most
89 :return: Ordered array (by RMSE) of tuples containing (predictors_combination, RMSE)
90 """
91 XY = np.loadtxt(open(file_name, "rb"), delimiter=",", skiprows=1, dtype=np.float32)
92 X = np.delete(XY, XY.shape[1] - 1, 1)
93 Y = XY[:, -1]
94 combs_rmse = None
95 done_regressions = 0
96 invalid_regressions = 0
97 with open(file_name, 'rb') as f:
98 col_names = np.array(f.readline().strip().split(','))
99 for n_predictors in range(min_predictors, max_predictors+1):
100 index_combinations = get_column_index_combinations(X, n_predictors) # n predictors - 1 constant
101 s_i = ncr(X.shape[1]-1, n_predictors) # Number of possible combinations
102 print "Doing regressions for {} predictors ({}) regressions".format(n_predictors, s_i)
103 for comb in index_combinations:
104 try:
105 X1, X1t = get_X_Xt_matrix(X, comb)
106 regression = numpy_regression(X1, X1t, Y)
107 combinations_cols_names = np.array([col_names[x] for x in comb])
108 result = np.array([[combinations_cols_names, regression['metric']]])
109
110 if combs_rmse is None:
111 combs_rmse = result
112 else:
113 combs_rmse = np.vstack([combs_rmse, result])
114 except:
115 invalid_regressions += 1
116 done_regressions += s_i
117 print "{} Regressions has been done, {} invalid".format(done_regressions, invalid_regressions)
118 ordered_combs = combs_rmse[combs_rmse[:, 1].argsort()]
119 return ordered_combs
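
The CPU path was not exercised in this run (Total time: 0 s), and numpy_regression / get_X_Xt_matrix are not included in this paste. As a reference point only, a minimal single-combination ordinary-least-squares fit of the kind the loop above performs per combination, solved via the normal equations, might look like the following sketch (an assumption, not the project's actual numpy_regression):

import numpy as np

def ols_rmse(X1, y):
    # X1 already includes the column of 1s, so the intercept is one of the betas.
    # Normal equations: beta = (X1^T X1)^{-1} X1^T y
    XtX = X1.T.dot(X1)
    beta = np.linalg.solve(XtX, X1.T.dot(y))  # raises LinAlgError when XtX is singular
    residuals = y - X1.dot(beta)
    rmse = np.sqrt(np.mean(residuals ** 2))
    return {"beta": beta, "metric": rmse}

A singular X1^T X1 (no inverse) is what both implementations treat as an "invalid" model: the GPU path drops those rows via inv_results, and the CPU path presumably counts them in invalid_regressions through the except branch.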
Total time: 460.077 s
File: massive_multilinear_regresions.py
Function: perform_regressions at line 53

Line # Hits Time Per Hit % Time Line Contents
==============================================================
53 @do_profile(follow=[find_best_models_gpu, find_best_models_cpu])
54 def perform_regressions():
55 1 2.0 2.0 0.0 start_time = time()
56 1 3219.0 3219.0 0.0 input_file, window, max_predictors, min_predictors, metric, output_file, device, max_batch_size = parse_arguments()
57 1 2.0 2.0 0.0 if device == "gpu":
58 1 42.0 42.0 0.0 print "Running calculations on GPU"
59 1 115839985.0 115839985.0 25.2 ordered_combs = find_best_models_gpu(file_name=input_file, min_predictors=min_predictors, max_predictors=max_predictors, metric=metric, window=window, max_batch_size=max_batch_size)
60 1 72.0 72.0 0.0 print "Using GPU to do regressions took {}".format(time() - start_time)
61 elif device == "cpu":
62 ordered_combs = find_best_models_cpu(file_name=input_file, min_predictors=min_predictors, max_predictors=max_predictors, metric=metric, window=window, max_batch_size=max_batch_size)
63 1 361859.0 361859.0 0.1 df = pd.DataFrame(ordered_combs)
64 1 343871695.0 343871695.0 74.7 df.to_csv("/tmp/{}".format(output_file))
python massive_multilinear_regresions.py -i -mp 5 -np 1 -d gpu 429,16s user 41,83s system 99% cpu 7:53,19 total
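
The "Timer unit" reports above come from the @do_profile(follow=[...]) decorator applied to perform_regressions; that decorator is not part of this paste or of the standard library. A common way to build such a wrapper on top of line_profiler's LineProfiler, shown here only as a sketch of what it might look like, is:

from functools import wraps
from line_profiler import LineProfiler

def do_profile(follow=()):
    # Decorator factory: profile the wrapped function line by line,
    # plus every function listed in `follow`, and print the report on exit.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            profiler = LineProfiler()
            profiler.add_function(func)
            for f in follow:
                profiler.add_function(f)
            profiler.enable_by_count()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable_by_count()
                profiler.print_stats()
        return wrapper
    return decorator

Under that scheme, functions passed in follow are instrumented even when they are never called, which would explain the "Total time: 0 s" entry for find_best_models_cpu in this GPU-only run.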