Untitled

a guest
Dec 18th, 2014
from datetime import datetime

# Assumes `sc` is an existing SparkContext and `theta_join` is defined elsewhere.

# Intervals, each with a start and an end.
S = sc.parallelize([
    {'start': datetime(2000, 10, 11), 'end': datetime(2001, 1, 1)},
    {'start': datetime(2001, 1, 1), 'end': datetime(2002, 1, 1)},
    {'start': datetime(2002, 1, 1), 'end': datetime(2003, 1, 1)},
    {'start': datetime(2003, 1, 1), 'end': datetime(2004, 1, 1)},
    {'start': datetime(2004, 1, 1), 'end': datetime(2005, 1, 1)},
])

# Events, each with a single timestamp.
T = sc.parallelize([
    {'timestamp': datetime(2000, 6, 5)},
    {'timestamp': datetime(2002, 6, 5)},
    {'timestamp': datetime(2003, 6, 5)},
    {'timestamp': datetime(2002, 7, 5)},
    {'timestamp': datetime(2010, 7, 5)},
])

# Key each row by the fields used in the join condition.
S = S.map(lambda r: ((r['start'], r['end']), r))
T = T.map(lambda r: (r['timestamp'], r))

# As written, the condition treats its first argument as the timestamp key
# and its second as the (start, end) key: match when start < timestamp < end.
join_condition = lambda ts, interval: interval[0] < ts < interval[1]
results = theta_join(S, T, join_condition).collect()
for r in results:
    print(r[1])  # the original rows

"""

({u'timestamp': datetime.datetime(2002, 6, 5, 0, 0)}, {u'start': datetime.datetime(2002, 1, 1, 0, 0), u'end': datetime.datetime(2003, 1, 1, 0, 0)})
({u'timestamp': datetime.datetime(2003, 6, 5, 0, 0)}, {u'start': datetime.datetime(2003, 1, 1, 0, 0), u'end': datetime.datetime(2004, 1, 1, 0, 0)})
({u'timestamp': datetime.datetime(2002, 7, 5, 0, 0)}, {u'start': datetime.datetime(2002, 1, 1, 0, 0), u'end': datetime.datetime(2003, 1, 1, 0, 0)})
"""
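
The paste calls theta_join but never defines it. The following is only a minimal sketch of what such a helper might look like, not the author's implementation. It assumes both inputs are RDDs of (key, row) pairs, that the condition receives the key of the second RDD first and the key of the first RDD second (which is how the condition in the paste indexes its arguments), and that each result has the shape ((key, key), (row, row)), matching the row order in the printed output. All of these are assumptions inferred from the paste.

```python
def theta_join(left, right, condition):
    """Naive theta join of two keyed RDDs (hypothetical sketch).

    Assumptions, not confirmed by the paste:
      - `left` and `right` are RDDs of (key, row) pairs;
      - `condition(right_key, left_key)` returns True for matching pairs;
      - each result is ((right_key, left_key), (right_row, left_row)).
    """
    # Build every (right, left) combination; cost grows as |left| * |right|.
    pairs = right.cartesian(left)
    # Keep only the combinations whose keys satisfy the join condition.
    matched = pairs.filter(lambda p: condition(p[0][0], p[1][0]))
    # Reshape to (combined key, combined row) so r[1] yields the original rows.
    return matched.map(lambda p: ((p[0][0], p[1][0]), (p[0][1], p[1][1])))
```

A cartesian product followed by a filter is the simplest way to express an arbitrary join predicate, but it compares every pair of rows, so it is only practical for small inputs like the ones in this paste.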