Guy_Lalonde

entries histo

Feb 28th, 2015
  1. import numpy as np
  2. import pandas
  3. import matplotlib.pyplot as plt
  4.  
  5. def entries_histogram(turnstile_weather):
  6.     '''
  7.     Before we perform any analysis, it might be useful to take a
  8.     look at the data we're hoping to analyze. More specifically, let's
  9.     examine the hourly entries in our NYC subway data and determine what
  10.     distribution the data follows. This data is stored in a dataframe
  11.     called turnstile_weather under the ['ENTRIESn_hourly'] column.
  12.  
  13.     Let's plot two histograms on the same axes to show hourly
  14.     entries when raining vs. when not raining. Here's an example of how
  15.     to plot histograms with pandas and matplotlib:
  16.     turnstile_weather['column_to_graph'].hist()
  17.  
  18.     Your histogram may look similar to the bar graph in the instructor notes below.
  19.  
  20.     You can read a bit about using matplotlib and pandas to plot histograms here:
  21.     http://pandas.pydata.org/pandas-docs/stable/visualization.html#histograms
  22.  
  23.     You can see the information contained within the turnstile weather data here:
  24.     https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv
  25.     '''
  26.  
  27.  
  28.     plt.figure()
  29.  
  30.     turnstile_rain = turnstile_weather[turnstile_weather['rain']==1]
  31.     entries_rain = turnstile_rain['ENTRIESn_hourly']
  32.     turnstile_norain = turnstile_weather[turnstile_weather['rain']==0]
  33.     entries_norain = turnstile_norain['ENTRIESn_hourly']
  34.  
  35.     df = pandas.DataFrame({'no_rain':entries_norain, 'rain':entries_rain}, columns = ['no_rain', 'rain'])
  36.  
  37.     df.plot(kind='hist', bins= 50, range = (0,4000), alpha=0.5)
  38.     plt.ylabel('Frequency')
  39.     plt.xlabel('Number of Entries/ Hour')
  40.     plt.title('Histogram of Turnstile Entries per Hour: Rainy Days vs. Non-Rainy Days')
  41.     plt.text(2160, 2000, r'note x-axis is truncated')
  42.  
  43.     plt.show()
  44.  
  45. turnstile_weather = pandas.read_csv('turnstile_data_master_with_weather.csv')
  46. entries_histogram(turnstile_weather)
  47.  
  48.  
  49. guyrjacdesimac8:wrangling Poodlewood$ python rain_histoB.py
  50. Traceback (most recent call last):
  51.   File "rain_histoB.py", line 46, in <module>
  52.     entries_histogram(turnstile_weather)
  53.   File "rain_histoB.py", line 37, in entries_histogram
  54.     df.plot(kind='hist', bins= 50, range = (0,4000), alpha=0.5)
  55.   File "/Users/Poodlewood/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2090, in plot_frame
  56.     raise ValueError('Invalid chart type given %s' % kind)
  57. ValueError: Invalid chart type given hist
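
The traceback shows that the installed pandas does not accept kind='hist' in DataFrame.plot. As far as I know, that chart kind was only added in pandas 0.15.0, so an older install raises exactly this ValueError. One possible workaround that avoids DataFrame.plot altogether is to draw both histograms with matplotlib directly. The sketch below assumes the same 'rain' and 'ENTRIESn_hourly' columns as the script above. The function name entries_histogram_compat and its bins/value_range parameters are hypothetical, chosen only for illustration.

```python
import matplotlib.pyplot as plt
import pandas


def entries_histogram_compat(turnstile_weather, bins=50, value_range=(0, 4000)):
    """Overlay rain / no-rain histograms without DataFrame.plot(kind='hist').

    Hypothetical helper: a sketch of a workaround, not the course's
    reference solution.
    """
    entries = turnstile_weather['ENTRIESn_hourly']
    # Split hourly entries by the rain flag, as in the original script.
    entries_norain = entries[turnstile_weather['rain'] == 0].dropna()
    entries_rain = entries[turnstile_weather['rain'] == 1].dropna()

    fig, ax = plt.subplots()
    # Axes.hist takes bins, range, alpha and label directly, so both
    # series share identical bin edges and can be overlaid.
    ax.hist(entries_norain.values, bins=bins, range=value_range,
            alpha=0.5, label='no_rain')
    ax.hist(entries_rain.values, bins=bins, range=value_range,
            alpha=0.5, label='rain')

    ax.set_ylabel('Frequency')
    ax.set_xlabel('Number of Entries/ Hour')
    ax.set_title('Histogram of Turnstile Entries per Hour: '
                 'Rainy Days vs. Non-Rainy Days')
    ax.legend()
    return ax
```

To use it in place of the original call, something like ax = entries_histogram_compat(pandas.read_csv('turnstile_data_master_with_weather.csv')) followed by plt.show() should work. Upgrading pandas to a version that supports kind='hist' would be the other option, in which case the original script can stay unchanged.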