Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- Drew Davis <r.drew.davis@gmail.com>
- 1/19/12
- to Timothy
- Well this has been fun! Attached is my improved version of problem22.
- The first big change was to add a NameScorer class. When you instantiate
- it, you tell it the alphabet of symbols it'll be asked to process.
- The first symbol is counted as value 1, the 2nd symbol is given a value
- of 2, etc. The __init__ routine of NameScorer builds the valuedict,
- which when you come down to it, is just an implementation detail of
- how we compute the score for a name. Assuming ASCII and using the
- ord(c)-ord('A') approach is another way to do it. The new approach
- removes the valuedict from main(). That looked out of place in main()
- and I either had to make it global (which is considered bad form) or
- do what my previous version did: pass the valuedict in to the nameScore
- routine each time it was called!. Happy to be rid of that.
- What I learned as I wrote my first class is that a self. qualifier is
- very important when trying to refer to a symbol that is part of the
- instance of the class. multiple times the tests aborted with a message
- that there's no such global symbol as valdict. I'd have to correct
- the code to say self.valdict and hit my forehead while exclaiming Doh!
- I'm going to have to look for a short cut like import has so you don't
- have to repeatedly qualify a symbol reference with the module name.
- The next big change didn't actually survive to the final version. I had
- "for" loops that iterated over the alphabet or the name list, but I also
- had a counter that had to tick along with each iteration. That meant an
- initialization statement before the "for" and an incrementing statement in
- the body of the loop. There isn't wonderful documentation of the syntax
- in the Python books I have here, but you can say for (i, name) in <list
- of tuples> and thus have both the names and the counter automatically
- handled by the "for". That left almost nothing inside the "for" and it
- was then that I realized, the initialization of the value dictionary,
- could be handled by a "list comprehension" generating the tuples that
- then just become an argument to dict(). And the totaling of the weighted name
- scores in main could be done by invoking sum(i*ns.score(name) for (i, name) in
- zip(range(1,len(nameList)+1), namelist))
- The books I have here say that current Python implementations implement
- list comprehensions like that by actually generating the list of tuples
- and iterating over that. Apparently the intent is that some day the
- tuples will be generated as needed. But heck. Memory is plentiful.
- zip. by the way is a function that given 2 lists, generates tuples by
- pairing together the i-th element in one list with the i-th element in
- the other list. It only generates as many tuples as the shorter of the
- 2 lists, but both my lists were the same length.
- Somewhat invisible in the code is that while making these changes,
- I learned a few more basic tricks with the git version control system.
- I checked in yesterday's problem22.py to git. That version was implicitly
- the master branch.
- Then I created an experimental branch (yeah, this is overkill for a one
- person operation, but remember I'm doing this to learn). Each time I
- had a working change to problem22.py, I'd check it into the experimental
- branch. My last incremental change was in recognition of the ease of
- getting old versions thanks to git and frequent well commented logs of
- what changed. So I decided that my old habit of leaving test prints,
- etc. embedded into the code, albeit commented out, was just making it harder
- to read the program. So I weeded that commented-out scaffolding out of
- the final version.
- One disapointment was I was hoping the list comprehensions would get
- interpreted faster than the for loops, albeit at the price of using
- more memory. But it runs no faster, maybe even 10ms slower, but that's
- in the noise range, so more tests would be needed to pin down what is
- actually happening. I did accidently give myself one clue. I added
- print statements to show the type of the grammar variable and of the
- grammar.parsefile(...) result that I had to convert to a vanilla list
- before I could sort the list. I was lazy, so rather than tear apart
- the return(list(grammar.parsefile(...))) so I'd have an intermediate
- result that I could pass to type() to see what it was, I left the return
- statement alone and preceded it with a print statement that changed
- list(...) to type(...). That meant I parsed the file twice. The total
- CPU time nearly doubled, so without resorting to grand profiling tools
- (which I confess I don't yet know anyhow - and profiling a 320ms run is not
- such a great thing to do!), but we have some evidence that the vast majority
- of the run time is going into pyparsing as it reads and parses the 5000
- or so names in the input file. So hoping that tweeking the other loops and
- argument passing would improve CPU time was naive on my part. Oops.
- Oh, and the type of the grammar is "pyparsing.And", which agrees withe the
- top level of delimited list generating an expression using the overloaded
- + operator. I'm quite certain that the type of "grammar" is a subclass
- of parser element, so a grander grammer could be built up with this as
- just one part of it, Mercifully, I didn't need anything grander.
- The type of grammar.parseFile(...) is 'pyparsing.ParseResults". I didn't
- dig deeper to see why the thing looks like a list, but won't sort.
- It occurs to me that sort depends heavily on the comparison operators
- and presumably copy operators to swap values around. Perhaps pyparsing
- overloads those operators for its ParseResults type. If and when future
- problems cause me to revisit pyparsing, I'll take a closer look.
- Attached is a PDF file of the listing of the "final" program as produced
- by the Eric IDE (in color!). The entire listing now handily fits on a
- single page.
- I ran the file through sloccount and it says 15 non-comment, non-blank
- lines of source code. It also estimated that creating 15 lines of Python
- code like this would take about 0.65 months, which I figure is 2-3 weeks,
- which is scarily close to right. But if I wasn't taking the time to
- learn a new language and its modules and git, all at the same time,
- I'd like to believe I could knock out 15 lines of code in much less time.
- (Of course, working on my own, my time is not getting sapped into
- Affirmative Action meetings, design reviews, questions from the peanut
- gallery and all those other things that motivated me to work at home when
- I was doing real design or real coding. So maybe in a real corporate
- environment, that 2-3 week estimate still would apply, even for a seasoned
- Pythonista, which I don't claim to be yet!)
- My one regret still is that I'm still not doing test-driven-development.
- My programs don't have built-in self-testing capability to watch out for
- any regressions as changes are made. I'll have to force myself to try
- that stuff in some future problems. At this point, I'm not even sure
- where the test code would go. A separate file in place of "__main__".
- And, just to keep life interesting, Python has multiple unit-test modules
- available, so picking one intelligently will take some work. One of the
- unit-test packages is called "nose". If I pick that one will you go "Ewww!"
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement