Python Twitter JSON parsing

Apr 4th, 2011
"""
Simple Python example showing how to parse JSON-formatted Twitter messages and metadata
(i.e. data produced by the Twitter status tracking API).

This script builds Python lists containing the messages, locations and time zones
of all tweets in a single JSON file (one JSON object per line).

Author: Geert Barentsen - 4 April (#dotastro)
"""

import sys
import simplejson

# Input argument is the filename of the JSON ASCII file from the Twitter API
filename = sys.argv[1]

tweets_text = []      # Text of every tweet
tweets_location = []  # Location of every tweet (free-text field, not always accurate or given)
tweets_timezone = []  # Time zone name of every tweet

# Loop over all lines; each line is expected to hold one JSON object
with open(filename, "r") as f:
    for line in f:
        try:
            tweet = simplejson.loads(line)
        except ValueError:
            # Skip blank or malformed lines
            continue

        # Ignore retweets and non-tweet messages (anything without a "text" field)
        if "retweeted_status" in tweet or "text" not in tweet:
            continue

        # Fetch the text of the tweet, lower-cased
        text = tweet["text"].lower()

        # Ignore 'manual' retweets, i.e. messages starting with RT
        if text.startswith("rt "):
            continue

        tweets_text.append(text)
        tweets_location.append(tweet["user"]["location"])
        tweets_timezone.append(tweet["user"]["time_zone"])

# Show result
print(tweets_text)
print(tweets_location)
print(tweets_timezone)
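
The filtering logic above can also be tried without a data file. The following is a minimal sketch, not the author's code: the function name `parse_tweets` and the sample lines are invented for illustration, and it assumes Python 3 with the standard-library `json` module standing in for `simplejson`. It shows why a manual retweet is better detected with `startswith` than with `find`: the text "start here" contains "rt " in the middle, so a substring search would drop it by mistake.

```python
import json  # assumption: stdlib json used in place of simplejson


def parse_tweets(lines):
    """Return (texts, locations, timezones) for the original tweets in `lines`.

    Hypothetical helper: same filtering steps as the script, wrapped in a function.
    """
    texts, locations, timezones = [], [], []
    for line in lines:
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # blank or malformed line
        if not isinstance(tweet, dict):
            continue  # valid JSON, but not an object
        if "retweeted_status" in tweet or "text" not in tweet:
            continue  # API retweet, or a non-tweet message
        text = tweet["text"].lower()
        if text.startswith("rt "):
            continue  # 'manual' retweet
        user = tweet.get("user") or {}  # tolerate a missing user block
        texts.append(text)
        locations.append(user.get("location"))
        timezones.append(user.get("time_zone"))
    return texts, locations, timezones


# Invented sample input; real data would come from the Twitter API dump.
sample_lines = [
    '{"text": "Hello #dotastro", "user": {"location": "Oxford", "time_zone": "London"}}',
    '{"text": "RT @someone: Hello #dotastro", "user": {"location": null, "time_zone": null}}',
    '{"text": "Start here", "user": {"location": "Leiden", "time_zone": "Amsterdam"}}',
    '{"delete": {"status": {"id": 1}}}',
    'not json',
    '',
]

texts, locations, timezones = parse_tweets(sample_lines)
print(texts)
print(locations)
print(timezones)
```

With these sample lines, only the first and third entries are kept: the second is a manual retweet, the fourth has no "text" field, and the last two are not valid JSON.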