Untitled

a guest, Jan 11th, 2019
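import pandas as pd
import dask.dataframe as dd

# Fragment from inside a method: `self` and the pandas DataFrame `df`
# (datetime-indexed) are defined by the surrounding code.
mdf = dd.read_parquet(self.local_location + self.megafile, engine='pyarrow')
inx = df.index.unique()
start1 = '2016-01-01'
end1 = pd.to_datetime(inx.values.min()).strftime('%Y-%m-%d')
start2 = pd.to_datetime(inx.values.max()).strftime('%Y-%m-%d')
end2 = '2029-01-01'
# Rows of the stored file that fall before and after the dates covered by df.
mdf1 = mdf.loc[start1:end1]
mdf2 = mdf.loc[start2:end2]

# Keep only the non-empty pieces and size each one from its own memory usage
# (the original sized mdf2 from mdf1, and left df_usage1 undefined when mdf1 was empty).
parts, df_usage = [], 0
for piece in (mdf1, mdf2):
    if len(piece) > 0:
        parts.append(piece)
        df_usage += 1 + piece.memory_usage(deep=True).sum().compute() // 100000001

# dd.from_pandas(df) with neither npartitions nor chunksize raises the ValueError
# shown below, so pass npartitions explicitly.
df_parts = 1 + df.memory_usage(deep=True).sum() // 100000001
parts.append(dd.from_pandas(df, npartitions=df_parts))

# dd.concat plus repartition stands in for append(..., npartitions=...);
# depending on the dask version, overlapping divisions may need interleave_partitions=True.
mdf1 = dd.concat(parts).repartition(npartitions=df_usage + df_parts)

{ValueError}Exactly one of npartitions and chunksize must be specified.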