Untitled

a guest
Oct 18th, 2017
--------------------------------------------

- All Data Monthly -

--------------------------------------------

Local date-time is [2017-10-03 04:05:18.531]

--------------------------------------------

C NUMBER |SR LOCATION |COUNTY

1234 |SFO |IND
4567 |CA |US
and the data continues..

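# sc (a SparkContext) and hdfs_path are not defined in this paste; they are
# assumed to be defined elsewhere (the pyspark shell provides sc).

def data_cleaning(dt):
    # Drop carriage returns, split the line on '|' and keep the fields
    # from index 56 onward.
    s = str()
    for i in dt.replace('\r', '').strip().split('|')[56:]:
        # Strip carriage returns, '---' runs and surrounding whitespace,
        # then append the field followed by a '|' separator.
        s = s + i.replace('\r', '').replace('---', '').strip() + '|'

    return s

def download():
    # Read the file as an RDD of lines, requesting at least 3 partitions.
    rdd_data = sc.textFile(hdfs_path, 3)
    print(rdd_data.getNumPartitions())
    # data_cleaning is applied to each line independently.
    print(rdd_data.map(lambda x: data_cleaning(x)).collect())

download()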
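
Since sc.textFile produces one record per line of the input file, data_cleaning only ever sees a single line at a time. The sketch below runs the same cleaning logic on one sample row without Spark, to show what the [56:] slice does to a three-field line. It assumes the 'r' in the replace calls was meant to be a carriage return ('\r'); the start parameter and the sample string are hypothetical, added only for illustration.

```python
def data_cleaning(dt, start=56):
    # `start` is a hypothetical parameter for illustration; the paste hard-codes 56.
    # Assumption: '\r' (carriage return) is what the replace calls should strip.
    s = str()
    for i in dt.replace('\r', '').strip().split('|')[start:]:
        s = s + i.replace('\r', '').replace('---', '').strip() + '|'
    return s

# One data row from the sample above, with an assumed trailing carriage return.
sample = "1234 |SFO |IND\r"

from_56 = data_cleaning(sample)          # slice starts past the last field
from_0 = data_cleaning(sample, start=0)  # keep every field of the line

print(repr(from_56))
print(repr(from_0))
```

If the intent was to skip the header block at the top of the file, the per-field slice cannot do that on its own, because it counts fields within one line rather than lines or fields across the whole file.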