--------------------------------------------
- All Data Monthly -
--------------------------------------------
Local date-time is [2017-10-03 04:05:18.531]
--------------------------------------------
C NUMBER |SR LOCATION |COUNTY
1234 |SFO |IND
4567 |CA |US
(and the data continues...)
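The report opens with a banner (dash rules, a title and a `Local date-time` stamp) before the pipe-delimited header row. As a minimal sketch, assuming the banner always looks like the sample above, the timestamp and column names can be pulled out in plain Python. `parse_banner` and `SAMPLE` are hypothetical names, not part of the original paste.

```python
import re
from datetime import datetime

# Hypothetical copy of the sample report shown above.
SAMPLE = """--------------------------------------------
- All Data Monthly -
--------------------------------------------
Local date-time is [2017-10-03 04:05:18.531]
--------------------------------------------
C NUMBER |SR LOCATION |COUNTY
1234 |SFO |IND
4567 |CA |US
"""

def parse_banner(text):
    """Return (timestamp, column_names) from the banner and the header row.

    Assumes the stamp reads 'Local date-time is [YYYY-MM-DD HH:MM:SS.fff]'
    and that the first line containing '|' is the column header.
    """
    timestamp = None
    for line in text.splitlines():
        match = re.search(r"Local date-time is \[(.+?)\]", line)
        if match:
            # %f accepts 1 to 6 digits, so '.531' becomes 531000 microseconds.
            timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        elif "|" in line:
            columns = [field.strip() for field in line.split("|")]
            return timestamp, columns
    return timestamp, []

ts, cols = parse_banner(SAMPLE)
print(ts)    # 2017-10-03 04:05:18.531000
print(cols)  # ['C NUMBER', 'SR LOCATION', 'COUNTY']
```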
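
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
hdfs_path = "hdfs:///path/to/report.txt"  # hypothetical placeholder; the real path is not given

def data_cleaning(dt):
    # Remove carriage returns, split the line on '|' and keep the fields from index 56 on.
    # sc.textFile yields one line per record, so a line with fewer than 57 fields returns ''.
    s = ''
    for i in dt.replace('\r', '').strip().split('|')[56:]:
        s = s + i.replace('\r', '').replace('---', '').strip() + '|'
    return s

def download():
    rdd_data = sc.textFile(hdfs_path, 3)  # ask for at least 3 partitions
    print(rdd_data.getNumPartitions())
    print(rdd_data.map(data_cleaning).collect())

download()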
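Because `sc.textFile` hands each line to `map` separately, the banner lines can be dropped with `filter` before the fields are cleaned. The sketch below assumes that data rows, and only data rows plus the header, contain a `|`, and that the header row starts with `C NUMBER`; `is_data_line`, `clean_line` and `HEADER_PREFIX` are hypothetical helpers. They are plain Python, so they can be checked without a cluster.

```python
HEADER_PREFIX = "C NUMBER"  # assumed first column name, taken from the sample header row

def is_data_line(line):
    """Keep only pipe-delimited rows that are not the column header."""
    return "|" in line and not line.strip().startswith(HEADER_PREFIX)

def clean_line(line):
    """Drop carriage returns and trim the spaces around every '|'-separated field."""
    fields = line.replace("\r", "").split("|")
    return "|".join(field.strip() for field in fields)

# A few lines from the sample report, one with a trailing carriage return.
lines = [
    "--------------------------------------------",
    "Local date-time is [2017-10-03 04:05:18.531]",
    "C NUMBER |SR LOCATION |COUNTY",
    "1234 |SFO |IND\r",
    "4567 |CA |US",
]
print([clean_line(l) for l in lines if is_data_line(l)])
# ['1234|SFO|IND', '4567|CA|US']
```

On the RDD the same helpers would be chained as `sc.textFile(hdfs_path, 3).filter(is_data_line).map(clean_line).collect()`, which avoids relying on a fixed field offset such as `[56:]`.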