Posted by a guest, Jan 11th, 2019
import dask.dataframe as dd
import pandas as pd

# `df` is the new pandas DataFrame to splice into the on-disk "megafile";
# it is defined elsewhere in the surrounding method.
mdf = dd.read_parquet(self.local_location + self.megafile, engine='pyarrow')
inx = df.index.unique()
start1 = '2016-01-01'
end1 = pd.to_datetime(inx.values.min()).strftime('%Y-%m-%d')
start2 = pd.to_datetime(inx.values.max()).strftime('%Y-%m-%d')
end2 = '2029-01-01'
mdf1 = mdf.loc[start1:end1]   # rows before the new data
mdf2 = mdf.loc[start2:end2]   # rows after the new data

# Aim for roughly 100 MB per partition when the new rows are converted.
df_parts = int(1 + df.memory_usage(deep=True).sum() // 100_000_001)

if len(mdf1) > 0:
    if len(mdf2) > 0:
        # Bug fixes: size/convert mdf2 on its own, not from mdf1's memory
        # usage, and drop the npartitions argument — Dask's .append() does
        # not accept it. (Newer Dask/pandas remove .append() entirely;
        # dd.concat([...]) is the modern spelling.)
        mdf1 = mdf1.append(mdf2)
    mdf1 = mdf1.append(df)
else:
    # dd.from_pandas() requires exactly one of npartitions or chunksize;
    # calling it with neither is what raised:
    #   ValueError: Exactly one of npartitions and chunksize must be specified.
    mdf1 = dd.from_pandas(df, npartitions=df_parts)
    if len(mdf2) > 0:
        # df is already in mdf1 here, so it must not be appended again below.
        mdf1 = mdf1.append(mdf2)