Julia mixed-type matrix to DataFrame conversion benchmark

a guest
Sep 30th, 2017
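
# Appears to target Julia 0.6 with a DataArrays-based DataFrames release (0.10 or earlier);
# indices, DataFrames.inlinetable and df[col] = v are not available in Julia 1.x / DataFrames 1.x.
using DataFrames, DataStructures, DataArrays

m = [
    "parName"   "region"    "forType"       "value";
    "vol"       "AL"        "broadL_highF"  3.3055628012;
    "vol"       "AL"        "con_highF"     2.1360975151;
    "vol"       "AQ"        "broadL_highF"  5.81984502;
    "vol"       "AQ"        "con_highF"     8.1462998309;
]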
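
Because the literal mixes strings and floats, Julia stores `m` as a `Matrix{Any}`, so every column sliced out of it is a `Vector{Any}` even when all of its values share one type. Narrowing that element type is the job each conversion function has to do. A quick check, assuming Julia 1.x; `sample` is a hypothetical excerpt of `m`:

```julia
# Hypothetical excerpt of m: the header row plus two data rows
sample = [
    "parName" "region" "forType"      "value"
    "vol"     "AL"     "broadL_highF" 3.3055628012
    "vol"     "AQ"     "con_highF"    8.1462998309
]

println(eltype(sample))            # Any: strings and floats share a single element type
println(typeof(sample[2:end, 4]))  # the value column is a Vector{Any}, not a Vector{Float64}
```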

function toDf1(m)
    # Header row -> column names; remaining rows -> one (initially Any-typed) vector per column
    h    = [Symbol(c) for c in m[1,:]]
    vals = m[2:end, :]
    df   = convert(DataFrame, OrderedDict(zip(h, [vals[:,i] for i in 1:size(vals,2)])))
    for c in names(df)
        # Try to convert the column from Any to Int64, Float64 or String (in that order);
        # if none succeeds, the column is left unchanged
        try
            df[c] = convert(DataArrays.DataArray{Int64,1}, df[c])
        catch
            try
                df[c] = convert(DataArrays.DataArray{Float64,1}, df[c])
            catch
                try
                    df[c] = convert(DataArrays.DataArray{String,1}, df[c])
                catch
                end
            end
        end
    end
    return df
end
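
The try-in-order cascade of toDf1 can be sketched in Julia 1.x with Base only. This is an illustration under assumptions, not the paste's code: `narrow_column` and `matrix_to_columns` are hypothetical helpers, `convert(Vector{T}, col)` replaces the DataArrays conversion, and the result is a NamedTuple of typed vectors rather than a DataFrame.

```julia
# Hypothetical helper: try Int64, then Float64, then String; keep the column if none fits.
function narrow_column(col::AbstractVector)
    for T in (Int64, Float64, String)
        try
            return convert(Vector{T}, col)  # throws if any element cannot be converted to T
        catch err
            # InexactError: e.g. 3.3 -> Int64; MethodError: e.g. "vol" -> Float64
            err isa Union{InexactError, MethodError} || rethrow()
        end
    end
    return col
end

# Hypothetical helper: header row -> field names, remaining rows -> narrowed columns
function matrix_to_columns(m::AbstractMatrix)
    hdr  = Tuple(Symbol.(m[1, :]))
    cols = Tuple(narrow_column(m[2:end, j]) for j in axes(m, 2))
    return NamedTuple{hdr}(cols)
end

sample = [
    "parName" "region" "value"
    "vol"     "AL"     3.3055628012
    "vol"     "AQ"     5.81984502
]
cols = matrix_to_columns(sample)
```

A NamedTuple of vectors is a Tables.jl column table, so with DataFrames 1.x `DataFrame(cols)` should turn it into a DataFrame.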
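
function toDf2(m)
    # Serialize the matrix to tab-separated text (one line per row), then let the
    # DataFrames text parser infer the column types while reading it back
    s  = join([join([m[i,j] for j in indices(m, 2)], '\t') for i in indices(m, 1)], '\n')
    df = DataFrames.inlinetable(s; separator='\t', header=true)
    return df
end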
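
toDf2 takes a different route: print the matrix as tab-separated text and let a parser infer the column types. The same round trip in Julia 1.x Base, with a deliberately minimal hypothetical parser (`parse_tsv`) standing in for a real CSV reader; it assumes no tabs, newlines or quoting inside fields:

```julia
# Serialize: one line per row, tab between fields (axes replaces the Julia 0.6 `indices`)
to_tsv(m::AbstractMatrix) = join([join(m[i, :], '\t') for i in axes(m, 1)], '\n')

# Hypothetical minimal parser: the first line is the header; each column becomes
# Vector{Int64} or Vector{Float64} if every field parses as one, else Vector{String}.
function parse_tsv(s::AbstractString)
    rows = [split(line, '\t') for line in split(s, '\n')]
    hdr  = Tuple(Symbol.(rows[1]))
    cols = map(1:length(hdr)) do j
        raw  = [String(r[j]) for r in rows[2:end]]
        ints = tryparse.(Int64, raw)
        all(!isnothing, ints) && return Vector{Int64}(ints)
        floats = tryparse.(Float64, raw)
        all(!isnothing, floats) && return Vector{Float64}(floats)
        return raw
    end
    return NamedTuple{hdr}(Tuple(cols))
end

sample = [
    "parName" "region" "value"
    "vol"     "AL"     3.3055628012
    "vol"     "AQ"     5.81984502
]
t = parse_tsv(to_tsv(sample))
```

A side effect of going through text: in this sketch a string field such as `"42"` comes back as an `Int64`, because the text form no longer records the original element type.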

function toDf3(m)
    df = DataFrame()
    for (ind,s) in enumerate(Symbol.(m[1,:])) # convert the header row to symbols and iterate over them
        # If all values in the column have the same type, use it; otherwise fall back to Any
        T = typeof(m[2,ind])
        T = all(typeof.(m[2:end,ind]) .== T) ? T : Any
        # Convert the column to that type (taken from the first data row) and add it to the DataFrame
        df[s] = T.(m[2:end,ind])
    end
    return df
end
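
toDf3's rule (use the first value's type if every value in the column has it, otherwise `Any`) can be sketched in Julia 1.x as follows. `infer_column` is a hypothetical helper, and it uses `convert` instead of the `T.(...)` broadcast for the reason given in the comment:

```julia
# Hypothetical helper: keep the first value's concrete type if the whole column
# shares it, otherwise fall back to Any.
function infer_column(col::AbstractVector)
    T = typeof(first(col))
    T = all(x -> typeof(x) === T, col) ? T : Any
    # convert(Vector{T}, col) also works when T === Any; a broadcast T.(col) would
    # call the constructor Any(x), which Julia 1.x does not define.
    return convert(Vector{T}, col)
end

sample = [
    "parName" "value"
    "vol"     3.3055628012
    "vol"     2.1360975151
]
cols = [infer_column(sample[2:end, j]) for j in axes(sample, 2)]
```

This check is strict: a column holding `1` and `2.0` stays `Any`, whereas `convert(Vector{Int64}, col)`, as a try-in-order cascade would call, succeeds and yields `[1, 2]`.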

# Timings are from a second call of each function (the first call includes JIT compilation);
# further calls give roughly the same numbers.
@time toDf1(m) # 0.000946 seconds (336 allocations: 19.811 KiB)
@time toDf2(m) # 0.000194 seconds (306 allocations: 17.406 KiB)
@time toDf3(m) # 0.001820 seconds (445 allocations: 35.297 KiB)
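
Single `@time` calls at this sub-millisecond scale can be noisy. A slightly more robust pattern, sketched here in Base Julia with a hypothetical `best_time` helper, is to warm up once and keep the fastest of many runs. BenchmarkTools.jl's `@btime` automates this; interpolating the argument, as in `@btime toDf2($m)`, keeps the global-variable lookup out of the measurement.

```julia
# Hypothetical helper: call f once so compilation is excluded, then return the
# fastest of n timed calls, in seconds.
function best_time(f, args...; n::Int = 100)
    f(args...)  # warm-up call: compiles f for these argument types
    return minimum(@elapsed(f(args...)) for _ in 1:n)
end

sample = [
    "parName" "value"
    "vol"     3.3055628012
    "vol"     2.1360975151
]
# Example workload: narrow the value column from Any to Float64 via broadcasting
t = best_time(x -> identity.(x[2:end, 2]), sample)
println("best of 100 runs: ", t, " s")
```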