import numpy as np
import pandas
import matplotlib.pyplot as plt


def entries_histogram(turnstile_weather):
    '''
    Before we perform any analysis, it might be useful to take a
    look at the data we're hoping to analyze. More specifically, let's
    examine the hourly entries in our NYC subway data and determine what
    distribution the data follows. This data is stored in a dataframe
    called turnstile_weather under the ['ENTRIESn_hourly'] column.

    Let's plot two histograms on the same axes to show hourly
    entries when raining vs. when not raining. Here's an example of how
    to plot histograms with pandas and matplotlib:
    turnstile_weather['column_to_graph'].hist()

    Your histogram may look similar to the bar graph in the instructor notes below.

    You can read a bit about using matplotlib and pandas to plot histograms here:
    http://pandas.pydata.org/pandas-docs/stable/visualization.html#histograms

    You can see the information contained within the turnstile weather data here:
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv
    '''
    plt.figure()
    # Split the hourly entries into rainy and non-rainy rows.
    turnstile_rain = turnstile_weather[turnstile_weather['rain']==1]
    entries_rain = turnstile_rain['ENTRIESn_hourly']
    turnstile_norain = turnstile_weather[turnstile_weather['rain']==0]
    entries_norain = turnstile_norain['ENTRIESn_hourly']
    # Put both series side by side so they share one set of axes.
    df = pandas.DataFrame({'no_rain':entries_norain, 'rain':entries_rain}, columns = ['no_rain', 'rain'])
    # This is the call that raises the ValueError shown in the traceback.
    df.plot(kind='hist', bins= 50, range = (0,4000), alpha=0.5)
    plt.ylabel('Frequency')
    plt.xlabel('Number of Entries/ Hour')
    plt.title('Histogram of Turnstile Entries per Hour: Rainy Days vs. Non-Rainy Days')
    plt.text(2160, 2000, r'note x-axis is truncated')
    plt.show()


turnstile_weather = pandas.read_csv('turnstile_data_master_with_weather.csv')
entries_histogram(turnstile_weather)
guyrjacdesimac8:wrangling Poodlewood$ python rain_histoB.py
Traceback (most recent call last):
  File "rain_histoB.py", line 46, in <module>
    entries_histogram(turnstile_weather)
  File "rain_histoB.py", line 37, in entries_histogram
    df.plot(kind='hist', bins= 50, range = (0,4000), alpha=0.5)
  File "/Users/Poodlewood/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2090, in plot_frame
    raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist
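
The error suggests the installed pandas does not recognise kind='hist' in DataFrame.plot; as far as I know that option only arrived in pandas 0.15.0, so an older Anaconda install would reject it. One possible workaround, sketched below, is to call matplotlib's hist directly on each series, which does not depend on the pandas version. The function name entries_histogram_compat and the tiny sample frame are made up for illustration; only the 'rain' and 'ENTRIESn_hourly' column names come from the script above. Printing pandas.__version__ should confirm whether the version really is the cause.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas


def entries_histogram_compat(turnstile_weather, bins=50, value_range=(0, 4000)):
    """Overlay rain / no-rain histograms without using DataFrame.plot(kind='hist').

    Hypothetical helper: same split as the original script, but drawn with
    matplotlib's Axes.hist so it does not depend on the pandas version.
    """
    entries_rain = turnstile_weather[turnstile_weather['rain'] == 1]['ENTRIESn_hourly']
    entries_norain = turnstile_weather[turnstile_weather['rain'] == 0]['ENTRIESn_hourly']

    fig, ax = plt.subplots()
    # Same bins and range for both calls so the bars line up on shared axes.
    ax.hist(entries_norain.values, bins=bins, range=value_range, alpha=0.5, label='no_rain')
    ax.hist(entries_rain.values, bins=bins, range=value_range, alpha=0.5, label='rain')
    ax.set_ylabel('Frequency')
    ax.set_xlabel('Number of Entries/ Hour')
    ax.set_title('Histogram of Turnstile Entries per Hour: Rainy Days vs. Non-Rainy Days')
    ax.legend()
    return ax


# Made-up sample rows standing in for turnstile_data_master_with_weather.csv.
sample = pandas.DataFrame({
    'rain': [0, 0, 1, 1, 1],
    'ENTRIESn_hourly': [100.0, 2500.0, 300.0, 1200.0, 5000.0],
})
ax = entries_histogram_compat(sample)
```

With the real data, pass the frame returned by pandas.read_csv instead of sample and finish with plt.show() (dropping the Agg line) or ax.figure.savefig(...). Values above 4000 fall outside the range argument and are simply left out of the plot, which matches the "x-axis is truncated" note in the original.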