- This is a one time thing, I've had the busiest week of my life and this is due in 3 hours...I have the first of 3 functions done
- '''
- There should be several functions in this module. Three are
- already provided, in case they are useful:
- datextract
- eightdigs
- fieldict
- These are functions from a previous homework which might be handy.
- Essential tricks:
- CSV FILES
- One of the data files is a Comma Separated File
- (see http://en.wikipedia.org/wiki/Comma-separated_values if needed)
- Python has a module, the csv module, for reading and writing csv files.
- Some information is found in these two links:
- http://docs.python.org/2/library/csv.html
- http://www.doughellmann.com/PyMOTW/csv/
- In case you don't read these, the brief example is this:
- import csv
- F = open("somedata.csv") # open a CSV file
- csvF = csv.reader(F) # makes a "csv reader" object
- for row in csvF:
-     print(row) # row is a list of the CSV fields (per line)
- The beauty of this csv module is that it can handle ugly CSV records like:
- Washer Assembly, 2504, "on order", "2,405,318"
- Notice that this has four fields, separated by commas. But we cannot use
- an expression like line.split(',') to get the four fields! The reason is
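- that split would also break up the last field, which contains commas.
- The csv reader is smarter. It will respect the quoted fields. (When a space
- follows each comma, as in the record above, create the reader with
- skipinitialspace=True so that the quotes are recognized.)
- Each row that a csv reader produces is a list of strings.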
- So how can you convert a string like '2,405,318' to a number?
- There are two simple ideas (the number is the fourth field, so its index is 3):
- 1. x = row[3].split(',')
-    x = ''.join(x) # comma is gone!
-    x = int(x)
- 2. x = row[3].replace(',','') # replace comma by empty
-    x = int(x)
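The CSV trick above can be tried without a data file, since csv.reader accepts any iterable of strings. This is a minimal sketch, not part of the assignment: the list named lines is a made-up stand-in for an open file, and skipinitialspace=True is used because the sample record has a space after each comma.

```python
import csv

# Made-up stand-in for an open file: csv.reader accepts any iterable of strings.
lines = ['Washer Assembly, 2504, "on order", "2,405,318"']

# Naive splitting breaks the quoted number into pieces.
naive = lines[0].split(',')

# skipinitialspace=True drops the space after each comma, so the
# quotes that follow are recognized as field quotes.
rows = list(csv.reader(lines, skipinitialspace=True))
row = rows[0]                          # a list of four strings

# Remove the thousands separators, then convert to an integer.
count = int(row[3].replace(',', ''))

print(len(naive), len(row), count)
```

Here naive has six pieces while row has the intended four fields, and count is the integer 2405318.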
- SORTING BY FIELD
- Suppose you have a list of tuples, like M = [("X",50,3),("Y",3,6),("J",35,0)]
- Suppose we want a sorted version of M, ordered by the second
- item of each tuple. That is, we want N = [("Y",3,6),("J",35,0),("X",50,3)].
- The problem is that if we just write N = sorted(M), we will get the tuples
- sorted by the first item, so N would be [("J",35,0),("X",50,3),("Y",3,6)]
- Is there some way to tell Python's sort which of the items to use for sorting?
- YES! There's even a page on the subject:
- http://wiki.python.org/moin/HowTo/Sorting/
- But a brief example is helpful here. The idea is to use keyword arguments
- and another Python module, the operator module.
- Here's the example:
- from operator import itemgetter # used to customize sorting
- N = sorted(M,key=itemgetter(1)) # says to use item 1 (0 is first item)
- This will give us the needed result in variable N. What if, instead, we
- wanted the result to be in decreasing order, rather than increasing order?
- Another keyword argument does that:
- N = sorted(M,key=itemgetter(1),reverse=True)
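As a small runnable check of the sorting idea, the sketch below reuses the list M from the text. The lambda form and the slice are additions for illustration: a lambda is an equivalent way to write the key, and slicing a reverse-sorted list is one way to keep only the largest few entries.

```python
from operator import itemgetter

M = [("X", 50, 3), ("Y", 3, 6), ("J", 35, 0)]

# Increasing order by the second item of each tuple.
by_second = sorted(M, key=itemgetter(1))

# The same ordering, with the key written as a lambda instead of itemgetter.
by_second_lambda = sorted(M, key=lambda t: t[1])

# Decreasing order, then keep only the first two entries ("top 2").
top2 = sorted(M, key=itemgetter(1), reverse=True)[:2]

print(by_second)
print(top2)
```

Note that sorted returns a new list and leaves M unchanged.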
- DICTIONARY ACCUMULATION
- What if we need to build a dictionary where the key comes from some part
- of a record in a file, and the value is the number of records that have
- the same thing for that part. Maybe, if we are counting states (with
- two-letter abbreviations), the dictionary might be something like this:
- {'CA':620978, 'NY':583719, 'IA':2149}
- This dictionary could be the result of reading through a data file that
- had 620,978 records for California and 583,719 records for New York (plus
- some for Iowa). As an example of creating this dictionary, consider a
- data file with the state abbreviation as the first field of each record.
- import sys
- D = { } # empty dictionary for accumulation
- for line in sys.stdin: # data file is standard input
-     st = line.split()[0] # get state abbreviation
-     if st not in D:
-         D[st] = 1 # first time for this state, count is 1
-     else:
-         D[st] += 1
- There is another way to do the same thing, using a more advanced idea:
- the get() method of the dictionary type, which has a default value argument.
- D = { } # empty dictionary for accumulation
- for line in sys.stdin: # data file is standard input
-     st = line.split()[0] # get state abbreviation
-     D[st] = D.get(st,0) + 1
- What you see above is D.get(st,0), which attempts to get the value D[st],
- but will return 0 if st is not in the dictionary. The trick here is that
- 0+1 is 1, which is the right value to store into D[st] for the first time
- a state abbreviation is found while reading the data file. It is a tricky
- idea, which some Python programmers like.
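The accumulation pattern can be exercised on a few in-memory lines instead of standard input. In this sketch the list named lines is made-up sample data. collections.Counter, from the standard library, is shown only as an alternative that gives the same counts; the assignment text itself uses the get() approach.

```python
from collections import Counter

# Made-up records; the state abbreviation is the first field of each line.
lines = ["CA 1001 widget", "NY 1002 gadget", "CA 1003 gizmo", "IA 1004 widget"]

# Accumulate counts with dict.get and a default of 0.
D = {}
for line in lines:
    st = line.split()[0]
    D[st] = D.get(st, 0) + 1

# Alternative: Counter does the same bookkeeping internally.
C = Counter(line.split()[0] for line in lines)

print(D)
print(dict(C) == D)
```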
- DATETIME.DATE BREAKDOWN
- Suppose G is a datetime.date object, for instance
- import datetime
- G = datetime.date(2012,12,1) # This is 1st December, 2012
- In a program, can you get the year, month and day as integers
- out of the datetime.date object G? Yes, it's easy:
- 1 + G.year # G.year is an integer, equal to the year
- # expression above is "next year"
- Similarly, G.month is the month as an integer, and G.day is the day.
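The attributes above are enough to pick out the dates that fall in a given year and month. The sketch below uses a made-up list of dates. It also relies on two properties of datetime.date objects: they compare in calendar order, so sorted works on them directly, and they are hashable, so they can serve as dictionary keys.

```python
import datetime

G = datetime.date(2012, 12, 1)
parts = (G.year, G.month, G.day)       # three plain integers

# Made-up dates, deliberately out of order.
dates = [
    datetime.date(1995, 1, 15),
    datetime.date(1994, 12, 31),
    datetime.date(1995, 1, 1),
]

# Keep only the dates in January 1995, then sort them in increasing order.
jan95 = sorted(d for d in dates if d.year == 1995 and d.month == 1)

print(parts)
print(jan95)
```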
- The task is to write three functions, citypop, eventfreq, and manuftop10.
- See the docstrings below for an explanation of what is expected. Test
- cases follow:
- >>> citypopdict = citypop()
- >>> len(citypopdict)
- 4991
- >>> citypopdict[ ('DES MOINES','IA') ]
- 197052
- >>> citypopdict[ ('CORALVILLE','IA') ]
- 18478
- >>> citypopdict[ ('STOCKTON','CA') ]
- 287037
- >>> evlist = eventfreq(1995,1)
- >>> len(evlist)
- 17
- >>> evlist[0]
- (datetime.date(1995, 1, 1), 5)
- >>> evlist[14]
- (datetime.date(1995, 1, 15), 1)
- >>> len(eventfreq(1994,12))
- 22
- >>> len(eventfreq(2012,2))
- 0
- >>> manlist = manuftop10()
- >>> len(manlist)
- 10
- >>> manlist[3]
- ('HONDA (AMERICAN HONDA MOTOR CO.)', 67)
- >>> manlist[8]
- ('MITSUBISHI MOTORS NORTH AMERICA, INC.', 16)
- '''
- def datextract(S):
-     return (int(S[:4]),int(S[4:6]),int(S[6:]))
- def eightdigs(S):
-     return type(S)==str and len(S)==8 and all([c in "0123456789" for c in S])
- def citylist(filename):
-     with open(filename) as FileObject:
-         X = []
-         for line in FileObject:
-             T = line.strip().split('\t')
-             city = T[12].strip()
-             X.append(city)
-         return X
- def statecount(filename):
-     with open(filename) as FileObject:
-         D = { }
-         for line in FileObject:
-             T = line.strip().split('\t')
-             state = T[13]
-             D[state] = 1 + D.get(state,0)
-         return D
- def fieldict(filename):
-     '''
-     Returns a dictionary with record ID (integer) as
-     key, and a tuple as value. The tuple has this form:
-     (manufacturer, date, crash, city, state)
-     where date is a datetime.date object, crash is a boolean,
-     and other tuple items are strings.
-     '''
-     import datetime
-     D = { }
-     with open(filename) as FileObject:
-         for line in FileObject:
-             R = { }
-             T = line.strip().split('\t')
-             manuf, date, crash, city, state = T[2], T[7], T[6], T[12], T[13]
-             manuf, date, city, state = manuf.strip(), date.strip(), city.strip(), state.strip()
-             if eightdigs(date):
-                 y, m, d = datextract(date)
-                 date = datetime.date(y,m,d)
-             else:
-                 date = datetime.date(1,1,1)
-             crash = (crash == "Y")
-             D[int(T[0])] = (manuf,date,crash,city,state)
-     return D
- def citypop():
-     '''
-     Read Top5000Population.txt and return a dictionary
-     of (city,state) as key, and population as value.
-     For compatibility with DOT data, convert city to
-     uppercase and truncate to at most 12 characters.
-     BE CAREFUL that the city field might need to
-     have trailing spaces removed (otherwise the test
-     cases could fail)
-     '''
-     from csv import reader
-     D = {}
-     with open("Top5000Population.txt") as F:
-         for city, state, population in reader(F):
-             city = city.upper()[:12].strip() # truncate, then drop trailing spaces
-             D[(city, state)] = int(population.replace(',',''))
-     return D
- def eventfreq(year,month):
-     '''
-     Read DOT1000.txt and return a list of (d,ct)
-     pairs, where d is a date object of the form
-     datetime.date(A,B,C)
-     having A equal to the year argument and
-     B equal to the month argument to eventfreq(year,month).
-     The ct part of each pair is the number of records
-     that had a date equal to datetime.date(A,B,C).
-     One more requirement: sort the returned list
-     in increasing order by date (the sorted function will
-     do this for you)
-     Use fieldict("DOT1000.txt") to get the dictionary
-     of tuples used for building the list of pairs
-     that eventfreq(year,month) will return.
-     '''
-     pass
- def manuftop10():
-     '''
-     This function returns a list of ten pairs. Each
-     pair has the form (man,ct) where man is a string
-     equal to a manufacturer name and ct is the number
-     of records in DOT1000.txt with that manufacturer.
-     In addition, the ten pairs returned are the "top 10"
-     (in decreasing order by count) of all the manufacturers
-     in the file. Use fieldict("DOT1000.txt") to get
-     the dictionary of tuples used for building the list
-     of pairs.
-     '''
-     pass
- if __name__ == "__main__":
-     import doctest
-     doctest.testmod()
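The two unfinished functions, eventfreq and manuftop10, mostly combine the tricks from the module docstring: count with a dictionary, then sort. The sketch below shows one possible shape for that logic on made-up records. The names eventfreq_sketch and topmanuf_sketch, the records dictionary, and the n parameter are all hypothetical. The real functions would instead read fieldict("DOT1000.txt"), and this sketch has not been checked against that file or the doctests.

```python
import datetime
from operator import itemgetter

# Made-up records in the shape fieldict() documents:
# id -> (manufacturer, date, crash, city, state)
records = {
    1: ("ACME", datetime.date(1995, 1, 1), False, "DES MOINES", "IA"),
    2: ("ACME", datetime.date(1995, 1, 1), True, "STOCKTON", "CA"),
    3: ("ZENITH", datetime.date(1995, 1, 15), False, "CORALVILLE", "IA"),
    4: ("ACME", datetime.date(1994, 12, 31), False, "DES MOINES", "IA"),
}

def eventfreq_sketch(records, year, month):
    # Count records per date, keeping only the requested year and month.
    counts = {}
    for manuf, date, crash, city, state in records.values():
        if date.year == year and date.month == month:
            counts[date] = counts.get(date, 0) + 1
    # (date, count) pairs sort by date first, which is the required order.
    return sorted(counts.items())

def topmanuf_sketch(records, n=10):
    # Count records per manufacturer (item 0 of each tuple).
    counts = {}
    for rec in records.values():
        counts[rec[0]] = counts.get(rec[0], 0) + 1
    # Sort pairs by count, largest first, and keep at most n of them.
    return sorted(counts.items(), key=itemgetter(1), reverse=True)[:n]

print(eventfreq_sketch(records, 1995, 1))
print(topmanuf_sketch(records))
```

Records whose date could not be parsed are stored by fieldict as datetime.date(1,1,1), so a year and month filter like the one above skips them. How ties between manufacturers with equal counts should be ordered is not specified in the docstrings, and this sketch does not address it.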