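import json

# sc is an existing SparkContext (e.g. the one the pyspark shell creates).
# Each line of the file is expected to hold one complete JSON object (JSON Lines).
dataset_json = sc.textFile("data/my_data.json")
dataset = dataset_json.map(json.loads)
# Cache the parsed records, since several actions below reuse them
dataset.persist()
dataset.take(2)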
# Output:
[{'movie': 'movie_name1',
  'release_date': '2011-01-11T10:26:12Z',
  'actor': 'actor_name1'},
 {'movie': 'movie_name2',
  'release_date': '2010-04-08T04:14:23Z',
  'actor': 'actor_name2'}]
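Because `sc.textFile` yields one string per line and `json.loads` is applied to each string, the input must be in JSON Lines form (one complete object per line); a pretty-printed multi-line JSON file would fail to parse. A minimal pure-Python sketch of the same per-line parsing, useful for checking a sample without a SparkContext. The `SAMPLE` text and the `parse_json_lines` helper are hypothetical, and skipping blank lines is an added assumption.

```python
import json

# Hypothetical sample in JSON Lines format: one complete JSON object per line,
# mirroring the two records returned by dataset.take(2)
SAMPLE = """\
{"movie": "movie_name1", "release_date": "2011-01-11T10:26:12Z", "actor": "actor_name1"}
{"movie": "movie_name2", "release_date": "2010-04-08T04:14:23Z", "actor": "actor_name2"}
"""


def parse_json_lines(lines):
    """Parse each non-blank line into a dict, like dataset_json.map(json.loads)."""
    # Skipping blank lines is an assumption; json.loads("") would raise an error
    return [json.loads(line) for line in lines if line.strip()]


records = parse_json_lines(SAMPLE.splitlines())
print(records[0]["movie"])  # movie_name1
print(len(records))  # 2
```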
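# Python dicts have no lookup() method (RDD.lookup() exists only on key-value pair RDDs),
# so filter(lambda line: line.lookup('release_date')) fails as soon as an action runs.
# To extract the field, use map(); to keep only records that have it, use
# dataset.filter(lambda r: 'release_date' in r) instead.
dataset2 = dataset.map(lambda record: record['release_date'])
dataset2.first()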
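Once the `release_date` strings are extracted, they can be parsed into timestamps. A minimal sketch in plain Python; a function like this could be passed to `dataset.map(...)`. The `release_year` helper is hypothetical, and the format string is an assumption inferred from the two sample values (ISO 8601 with a trailing `Z` for UTC).

```python
from datetime import datetime, timezone

# Hypothetical records mirroring the dataset.take(2) output
records = [
    {"movie": "movie_name1", "release_date": "2011-01-11T10:26:12Z", "actor": "actor_name1"},
    {"movie": "movie_name2", "release_date": "2010-04-08T04:14:23Z", "actor": "actor_name2"},
]


def release_year(record):
    """Return the release year, or None if the record lacks a release_date."""
    raw = record.get("release_date")  # dict.get avoids KeyError on missing fields
    if raw is None:
        return None
    # The trailing 'Z' means UTC: match it literally, then attach the timezone
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return parsed.year


# Same shape as dataset.filter(lambda r: 'release_date' in r).map(release_year)
years = [release_year(r) for r in records if "release_date" in r]
print(years)  # [2011, 2010]
```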
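# filter() keeps or drops whole records based on whether the predicate is truthy;
# a non-empty dict's keys() is always truthy, so this returns every record unchanged
# rather than the attribute names.
attributes = dataset.filter(lambda x: x.keys())
attributes.take(2)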
# Output:
[{'movie': 'movie_name1',
  'release_date': '2011-01-11T10:26:12Z',
  'actor': 'actor_name1'},
 {'movie': 'movie_name2',
  'release_date': '2010-04-08T04:14:23Z',
  'actor': 'actor_name2'}]
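To get the attribute names themselves, each record has to be transformed rather than filtered; in PySpark that would be `dataset.flatMap(lambda x: x.keys()).distinct().collect()`. A minimal pure-Python sketch contrasting the two behaviours on hypothetical records mirroring the output above; sorting the names is an added choice, since `distinct()` does not guarantee any order.

```python
# Hypothetical records mirroring the dataset.take(2) output
records = [
    {"movie": "movie_name1", "release_date": "2011-01-11T10:26:12Z", "actor": "actor_name1"},
    {"movie": "movie_name2", "release_date": "2010-04-08T04:14:23Z", "actor": "actor_name2"},
]

# filter(): the predicate's truthiness decides whether a whole record is kept.
# A non-empty dict's keys() view is truthy, so every record survives unchanged.
filtered = [r for r in records if r.keys()]

# flatMap(keys) + distinct(): emit every key of every record, then de-duplicate.
# Sorted here because Spark's distinct() makes no ordering guarantee.
attribute_names = sorted({key for r in records for key in r.keys()})
print(attribute_names)  # ['actor', 'movie', 'release_date']
```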