TankorSmash

idiot's homework

Nov 8th, 2012
# This is a one-time thing: I've had the busiest week of my life and this is due in 3 hours. I have the first of the 3 functions done.
'''
  There should be several functions in this module.  Three are
  already provided, in case they are useful:
     datextract
     eightdigs
     fieldict
  These are functions from a previous homework which might be handy.

  Essential tricks:

     CSV FILES

     One of the data files is a Comma Separated Values (CSV) file
     (see http://en.wikipedia.org/wiki/Comma-separated_values if needed)

     Python has a module, the csv module, for reading and writing csv files.
     Some information is found in these two links:
         http://docs.python.org/2/library/csv.html
         http://www.doughellmann.com/PyMOTW/csv/

     In case you don't read these, the brief example is this:

        import csv
        F = open("somedata.csv")  # open a CSV file
        csvF = csv.reader(F)      # makes a "csv reader" object
        for row in csvF:
           print row              # row is a list of the CSV fields (per line)

     The beauty of this csv module is that it can handle ugly CSV records like:

        Washer Assembly,2504,"on order","2,405,318"

     Notice that this has four fields, separated by commas.  But we cannot use
     an expression like  line.split(',') to get the four fields!  The reason is
     that Python will try to also split the last field, which contains commas.
     The csv reader is smarter.  It will respect the quoted fields.  (The opening
     quote must come right after the comma for this to work; if the file has
     spaces after the commas, pass skipinitialspace=True to csv.reader.)

     Each row that a csv reader produces is a list of strings.

     So how can you convert a string like '2,405,318' to a number?
     There are two simple ideas (the number is the fourth field, so index 3):

     1.  x = row[3].split(',')
         x = ''.join(x)  # comma is gone!
         x = int(x)
     2.  x = row[3].replace(',','')  # replace comma by empty
         x = int(x)


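As a small self-contained check of the CSV ideas above, here is a sketch. It assumes a list of strings standing in for an open file (csv.reader accepts any iterable of lines); the names tight, spaced, naive, row, row2 and quantity are made up for illustration.

```python
import csv

# A list of lines stands in for an open file; csv.reader accepts any
# iterable that yields one line at a time.
tight = ['Washer Assembly,2504,"on order","2,405,318"']
spaced = ['Washer Assembly, 2504, "on order", "2,405,318"']

naive = tight[0].split(',')      # also splits inside the quoted number
row = next(csv.reader(tight))    # keeps each quoted field in one piece

# With spaces after the commas, the quotes are only recognized when
# skipinitialspace=True is passed to the reader.
row2 = next(csv.reader(spaced, skipinitialspace=True))

# Remove the thousands separators before converting to an integer.
quantity = int(row[3].replace(',', ''))
```

Here naive ends up with six pieces, while row and row2 both hold the four intended fields.
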
     SORTING BY FIELD

     Suppose you have a list of tuples, like M = [("X",50,3),("Y",3,6),("J",35,0)]
     What is needed, however, is to make a sorted version of M, sorted by the second
     item of the tuples.  That is, we want N = [("Y",3,6),("J",35,0),("X",50,3)].

     The problem is that if we just write N = sorted(M), we will get the tuples
     sorted by the first item, so N would be  [("J",35,0),("X",50,3),("Y",3,6)]

     Is there some way to tell Python's sort which of the items to use for sorting?
     YES!  There's even a page on the subject:
        http://wiki.python.org/moin/HowTo/Sorting/
     But a brief example is helpful here.  The idea is to use keyword arguments
     and another Python module, the operator module.

     Here's the example:

        from operator import itemgetter  # used to customize sorting
        N = sorted(M,key=itemgetter(1))  # says to use item 1 (0 is first item)

     This will give us the needed result in variable N.  What if, instead, we
     wanted the result to be in decreasing order, rather than increasing order?
     Another keyword argument does that:

        N = sorted(M,key=itemgetter(1),reverse=True)


     DICTIONARY ACCUMULATION

     What if we need to build a dictionary where the key comes from some part
     of a record in a file, and the value is the number of records that have
     the same thing for that part?  Maybe, if we are counting states (with
     two-letter abbreviations), the dictionary might be something like this:

         {'CA':620978, 'NY':583719, 'IA':2149}

     This dictionary could be the result of reading through a data file that
     had 620,978 records for California and 583,719 records for New York (plus
     some for Iowa).  As an example of creating this dictionary, consider a
     data file with the state abbreviation as the first field of each record.

        import sys
        D = { }  # empty dictionary for accumulation
        for line in sys.stdin:  # data file is standard input
           st = line.split()[0] # get state abbreviation
           if st not in D:
              D[st] = 1   # first time for this state, count is 1
           else:
              D[st] += 1

     There is another way to do the same thing, using a more advanced idea:
     the get() method of the dictionary type, which has a default value argument.

        import sys
        D = { }  # empty dictionary for accumulation
        for line in sys.stdin:  # data file is standard input
           st = line.split()[0] # get state abbreviation
           D[st] = D.get(st,0) + 1

     What you see above is D.get(st,0), which attempts to get the value D[st],
     but will return 0 if st is not in the dictionary.  The trick here is that
     0+1 is 1, which is the right value to store into D[st] for the first time
     a state abbreviation is found while reading the file.  It is a tricky
     idea, which some Python programmers like.


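The same accumulation can also be written with collections.Counter from the standard library (Python 2.7 and later). This is only a sketch of an alternative, not something the assignment asks for; the sample lines and the names lines, D and top2 are made up for illustration.

```python
from collections import Counter

# Made-up sample lines; the state abbreviation is the first field.
lines = ["CA a", "NY b", "CA c", "IA d", "CA e", "NY f"]

# Counter counts how many times each state abbreviation appears.
D = Counter(line.split()[0] for line in lines)

# most_common(n) returns the n largest (key, count) pairs, largest first.
top2 = D.most_common(2)
```

A Counter behaves like a dictionary, so D["CA"] still gives the count for California.
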
     DATETIME.DATE BREAKDOWN

     Suppose G is a datetime.date object, for instance
         import datetime
         G = datetime.date(2012,12,1)  # This is 1st December, 2012
     In a program, can you get the year, month and day as integers
     out of the datetime.date object G?  Yes, it's easy:

         1 + G.year  # G.year is an integer, equal to the year
                     # expression above is "next year"

     Similarly, G.month is the month as an integer, and G.day is the day.


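A short sketch of how these attributes might be used to pick out the dates that fall in one particular month. The list of dates and the names dates, december and ordered are made up for illustration.

```python
import datetime

# Made-up dates, only for illustration.
dates = [datetime.date(2012, 12, 1),
         datetime.date(2012, 11, 30),
         datetime.date(2012, 12, 25)]

# Keep only the dates that fall in December 2012.
december = [d for d in dates if d.year == 2012 and d.month == 12]

# date objects compare chronologically, so sorted() gives date order.
ordered = sorted(dates)
```
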
The task is to write three functions, citypop, eventfreq, and manuftop10.

See the docstrings below for an explanation of what is expected.  Test
cases follow:

 >>> citypopdict = citypop()
 >>> len(citypopdict)
 4991
 >>> citypopdict[ ('DES MOINES','IA') ]
 197052
 >>> citypopdict[ ('CORALVILLE','IA') ]
 18478
 >>> citypopdict[ ('STOCKTON','CA') ]
 287037
 >>> evlist = eventfreq(1995,1)
 >>> len(evlist)
 17
 >>> evlist[0]
 (datetime.date(1995, 1, 1), 5)
 >>> evlist[14]
 (datetime.date(1995, 1, 15), 1)
 >>> len(eventfreq(1994,12))
 22
 >>> len(eventfreq(2012,2))
 0
 >>> manlist = manuftop10()
 >>> len(manlist)
 10
 >>> manlist[3]
 ('HONDA (AMERICAN HONDA MOTOR CO.)', 67)
 >>> manlist[8]
 ('MITSUBISHI MOTORS NORTH AMERICA, INC.', 16)

'''

def datextract(S):
    # split an eight-digit string YYYYMMDD into (year, month, day) integers
    return (int(S[:4]),int(S[4:6]),int(S[6:]))

def eightdigs(S):
    # True only if S is a string of exactly eight decimal digits
    return type(S)==str and len(S)==8 and all([c in "0123456789" for c in S])

def citylist(filename):
    # list of the city field (column 12) of every tab-separated record
    with open(filename) as FileObject:
        X = []
        for line in FileObject:
            T = line.strip().split('\t')
            city = T[12].strip()
            X.append(city)
    return X

def statecount(filename):
    # dictionary of state (column 13) -> number of records for that state
    with open(filename) as FileObject:
        D = { }
        for line in FileObject:
            T = line.strip().split('\t')
            state = T[13]
            D[state] = 1 + D.get(state,0)
    return D

def fieldict(filename):
    '''
    Returns a dictionary with record ID (integer) as
    key, and a tuple as value.  The tuple has this form:
           (manufacturer, date, crash, city, state)
    where date is a datetime.date object, crash is a boolean,
    and other tuple items are strings.
    '''
    import datetime
    D = { }
    with open(filename) as FileObject:
        for line in FileObject:
            T = line.strip().split('\t')
            manuf, date, crash, city, state = T[2], T[7], T[6], T[12], T[13]
            manuf, date, city, state = manuf.strip(), date.strip(), city.strip(), state.strip()
            if eightdigs(date):
                y, m, d = datextract(date)
                date = datetime.date(y,m,d)
            else:
                date = datetime.date(1,1,1)
            crash = (crash == "Y")
            D[int(T[0])] = (manuf,date,crash,city,state)
    return D

def citypop():
    '''
    Read Top5000Population.txt and return a dictionary
    of (city,state) as key, and population as value.
    For compatibility with DOT data, convert city to
    uppercase and truncate to at most 12 characters.

    BE CAREFUL that the city field might need to
    have trailing spaces removed (otherwise the test
    cases could fail)
    '''
    from csv import reader
    D = {}
    with open("Top5000Population.txt") as F:
        for city, state, population in reader(F):
            # truncate to 12 characters, then drop any trailing space
            # left behind by the truncation (see the docstring warning)
            city = city.upper()[:12].rstrip()
            D[(city, state)] = int(population.replace(',',''))
    return D

def eventfreq(year,month):
    '''
    Read DOT1000.txt and return a list of (d,ct)
    pairs, where d is a date object of the form
       datetime.date(A,B,C)
    having A equal to the year argument and
    B equal to the month argument to eventfreq(year,month).
    The ct part of each pair is the number of records
    that had a date equal to datetime.date(A,B,C).

    One more requirement:  sort the returned list
    in increasing order by date (the sorted function will
    do this for you)

    Use fieldict("DOT1000.txt") to get the dictionary
    of tuples used for building the list of pairs
    that eventfreq(year,month) will return.
    '''
    pass

def manuftop10():
    '''
    This function returns a list of ten pairs.  Each
    pair has the form (man,ct) where man is a string
    equal to a manufacturer name and ct is the number
    of records in DOT1000.txt with that manufacturer.
    In addition, the ten pairs returned are the "top 10"
    (in decreasing order by count) of all the manufacturers
    in the file.  Use fieldict("DOT1000.txt") to get
    the dictionary of tuples used for building the list
    of pairs.
    '''
    pass

if __name__ == "__main__":
    import doctest
    doctest.testmod()
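
One possible way to approach the two unfinished functions, combining the counting and sorting tricks from the module docstring. This is only a sketch under assumptions: the helper names eventfreq_from and manuftop10_from are hypothetical, they take the dictionary as an argument instead of calling fieldict("DOT1000.txt") themselves, and the sample records are made up so the example runs without the data file. How ties at the tenth place should be broken is not stated in the docstring, so the sketch keeps whatever order sorted() leaves them in.

```python
import datetime
from collections import Counter
from operator import itemgetter

def eventfreq_from(records, year, month):
    # records is shaped like the fieldict() result:
    # record ID -> (manufacturer, date, crash, city, state)
    counts = Counter(rec[1] for rec in records.values()
                     if rec[1].year == year and rec[1].month == month)
    return sorted(counts.items())  # (date, count) pairs in date order

def manuftop10_from(records):
    counts = Counter(rec[0] for rec in records.values())
    # sort by count, largest first, and keep the first ten pairs
    ranked = sorted(counts.items(), key=itemgetter(1), reverse=True)
    return ranked[:10]

# Made-up records, only to exercise the two helpers.
sample = {
    1: ("ACME", datetime.date(1995, 1, 1), False, "AMES", "IA"),
    2: ("ACME", datetime.date(1995, 1, 1), True, "AMES", "IA"),
    3: ("BOLT", datetime.date(1995, 1, 15), False, "STOCKTON", "CA"),
    4: ("ACME", datetime.date(1995, 2, 3), False, "AMES", "IA"),
}
jan = eventfreq_from(sample, 1995, 1)
top = manuftop10_from(sample)
```

Under these assumptions, eventfreq(year,month) could return eventfreq_from(fieldict("DOT1000.txt"), year, month), and manuftop10() could do the same with manuftop10_from.
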