  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
  7. "### Class 7- Starter code\n"
  8. ]
  9. },
  10. {
  11. "cell_type": "code",
  12. "execution_count": 1,
  13. "metadata": {
  14. "collapsed": false
  15. },
  16. "outputs": [],
  17. "source": [
  18. "import numpy as np\n",
  19. "import pandas as pd\n",
  20. "from sklearn import linear_model, metrics\n"
  21. ]
  22. },
  23. {
  24. "cell_type": "markdown",
  25. "metadata": {},
  26. "source": [
  27. "### Create sample data and fit a model"
  28. ]
  29. },
  30. {
  31. "cell_type": "code",
  32. "execution_count": 2,
  33. "metadata": {
  34. "collapsed": false
  35. },
  36. "outputs": [],
  37. "source": [
  38. "df = pd.DataFrame({'x': range(100), 'y': range(100)})\n",
  39. "biased_df = df.copy()\n",
  40. "biased_df.loc[:20, 'x'] = 1\n",
  41. "biased_df.loc[:20, 'y'] = 1\n",
  42. "\n",
  43. "def append_jitter(series):\n",
  44. " jitter = np.random.random_sample(size=100)\n",
  45. " return series + jitter\n",
  46. "\n",
  47. "df['x'] = append_jitter(df.x)\n",
  48. "df['y'] = append_jitter(df.y)\n",
  49. "\n",
  50. "biased_df['x'] = append_jitter(biased_df.x)\n",
  51. "biased_df['y'] = append_jitter(biased_df.y)\n"
  52. ]
  53. },
  54. {
  55. "cell_type": "code",
  56. "execution_count": 3,
  57. "metadata": {
  58. "collapsed": false
  59. },
  60. "outputs": [
  61. {
  62. "name": "stdout",
  63. "output_type": "stream",
  64. "text": [
  65. "0.168113519476\n"
  66. ]
  67. }
  68. ],
  69. "source": [
  70. "## fit\n",
  71. "lm = linear_model.LinearRegression().fit(df[['x']], df['y'])\n",
  72. "print metrics.mean_squared_error(df['y'], lm.predict(df[['x']]))\n"
  73. ]
  74. },
  75. {
  76. "cell_type": "code",
  77. "execution_count": 4,
  78. "metadata": {
  79. "collapsed": false
  80. },
  81. "outputs": [
  82. {
  83. "name": "stdout",
  84. "output_type": "stream",
  85. "text": [
  86. "0.17157553965\n"
  87. ]
  88. }
  89. ],
  90. "source": [
  91. "## biased fit\n",
  92. "lm = linear_model.LinearRegression().fit(biased_df[['x']], biased_df['y'])\n",
  93. "print metrics.mean_squared_error(df['y'], lm.predict(df[['x']]))"
  94. ]
  95. },
  96. {
  97. "cell_type": "markdown",
  98. "metadata": {},
  99. "source": [
  100. "## Cross validation\n",
  101. "#### Intro to cross validation with bike share data from last time. We will be modeling casual ridership. "
  102. ]
  103. },
  104. {
  105. "cell_type": "code",
  106. "execution_count": 5,
  107. "metadata": {
  108. "collapsed": false
  109. },
  110. "outputs": [],
  111. "source": [
  112. "from sklearn import cross_validation\n",
  113. "wd = '../../assets/dataset/'\n",
  114. "bikeshare = pd.read_csv(wd + 'bikeshare.csv')"
  115. ]
  116. },
  117. {
  118. "cell_type": "markdown",
  119. "metadata": {},
  120. "source": [
  121. "#### Create dummy variables and set outcome (dependent) variable"
  122. ]
  123. },
  124. {
  125. "cell_type": "code",
  126. "execution_count": 6,
  127. "metadata": {
  128. "collapsed": true
  129. },
  130. "outputs": [],
  131. "source": [
  132. "weather = pd.get_dummies(bikeshare.weathersit, prefix='weather')\n",
  133. "modeldata = bikeshare[['temp', 'hum']].join(weather[['weather_1', 'weather_2', 'weather_3']])\n",
  134. "y = bikeshare.casual "
  135. ]
  136. },
  137. {
  138. "cell_type": "markdown",
  139. "metadata": {},
  140. "source": [
  141. "#### Create a cross valiation with 5 folds"
  142. ]
  143. },
  144. {
  145. "cell_type": "code",
  146. "execution_count": 7,
  147. "metadata": {
  148. "collapsed": true
  149. },
  150. "outputs": [],
  151. "source": [
  152. "kf = cross_validation.KFold(len(modeldata), n_folds=5, shuffle=True)"
  153. ]
  154. },
  155. {
  156. "cell_type": "code",
  157. "execution_count": 8,
  158. "metadata": {
  159. "collapsed": false
  160. },
  161. "outputs": [
  162. {
  163. "name": "stdout",
  164. "output_type": "stream",
  165. "text": [
  166. "~~~~ CROSS VALIDATION each fold ~~~~\n",
  167. "Model 1\n",
  168. "MSE: 1808.27525551\n",
  169. "R2: 0.311880442375\n",
  170. "Model 2\n",
  171. "MSE: 1675.17498999\n",
  172. "R2: 0.311926484007\n",
  173. "Model 3\n",
  174. "MSE: 1641.19957929\n",
  175. "R2: 0.311884067469\n",
  176. "Model 4\n",
  177. "MSE: 1698.31203534\n",
  178. "R2: 0.311899972438\n",
  179. "Model 5\n",
  180. "MSE: 1544.00046427\n",
  181. "R2: 0.311895007988\n",
  182. "~~~~ SUMMARY OF CROSS VALIDATION ~~~~\n",
  183. "Mean of MSE for all folds: 1673.39246488\n",
  184. "Mean of R2 for all folds: 0.311897194855\n"
  185. ]
  186. }
  187. ],
  188. "source": [
  189. "mse_values = []\n",
  190. "scores = []\n",
  191. "n= 0\n",
  192. "print \"~~~~ CROSS VALIDATION each fold ~~~~\"\n",
  193. "for train_index, test_index in kf:\n",
  194. " lm = linear_model.LinearRegression().fit(modeldata.iloc[train_index], y.iloc[train_index])\n",
  195. " mse_values.append(metrics.mean_squared_error(y.iloc[test_index], lm.predict(modeldata.iloc[test_index])))\n",
  196. " scores.append(lm.score(modeldata, y))\n",
  197. " n+=1\n",
  198. " print 'Model', n\n",
  199. " print 'MSE:', mse_values[n-1]\n",
  200. " print 'R2:', scores[n-1]\n",
  201. "\n",
  202. "\n",
  203. "print \"~~~~ SUMMARY OF CROSS VALIDATION ~~~~\"\n",
  204. "print 'Mean of MSE for all folds:', np.mean(mse_values)\n",
  205. "print 'Mean of R2 for all folds:', np.mean(scores)"
  206. ]
  207. },
  208. {
  209. "cell_type": "code",
  210. "execution_count": 9,
  211. "metadata": {
  212. "collapsed": false
  213. },
  214. "outputs": [
  215. {
  216. "name": "stdout",
  217. "output_type": "stream",
  218. "text": [
  219. "~~~~ Single Model ~~~~\n",
  220. "MSE of single model: 1672.58110765\n",
  221. "R2: 0.311934605989\n"
  222. ]
  223. }
  224. ],
  225. "source": [
  226. "lm = linear_model.LinearRegression().fit(modeldata, y)\n",
  227. "print \"~~~~ Single Model ~~~~\"\n",
  228. "print 'MSE of single model:', metrics.mean_squared_error(y, lm.predict(modeldata))\n",
  229. "print 'R2: ', lm.score(modeldata, y)"
  230. ]
  231. },
  232. {
  233. "cell_type": "markdown",
  234. "metadata": {},
  235. "source": [
  236. "### Check\n",
  237. "While the cross validated approach here generated more overall error, which of the two approaches would predict new data more accurately: the single model or the cross validated, averaged one? Why?\n"
  238. ]
  239. },
  240. {
  241. "cell_type": "markdown",
  242. "metadata": {},
  243. "source": [
  244. "Cross-validation would predict new data more accurately, as it swaps bias error for generalized error (variance). We would rather describe future trends well enough than past trends (or training data?) perfectly. "
  245. ]
  246. },
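{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that idea (added for illustration, not part of the original starter code): hold out a portion of the bike-share data, fit on the rest, and compare in-sample MSE to held-out MSE. The split fraction and the resulting numbers are arbitrary; the point is that in-sample error tends to understate error on new data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## illustrative sketch: in-sample error vs. held-out error\n",
"## (uses the modeldata / y defined above and the old sklearn cross_validation API)\n",
"X_train, X_test, y_train, y_test = cross_validation.train_test_split(\n",
" modeldata, y, test_size=0.3)\n",
"lm = linear_model.LinearRegression().fit(X_train, y_train)\n",
"print 'train MSE:', metrics.mean_squared_error(y_train, lm.predict(X_train))\n",
"print 'test MSE: ', metrics.mean_squared_error(y_test, lm.predict(X_test))"
]
},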
  247. {
  248. "cell_type": "markdown",
  249. "metadata": {},
  250. "source": [
  251. "### There are ways to improve our model with regularization. \n",
  252. "Let's check out the effects on MSE and R2"
  253. ]
  254. },
  255. {
  256. "cell_type": "code",
  257. "execution_count": 10,
  258. "metadata": {
  259. "collapsed": false
  260. },
  261. "outputs": [
  262. {
  263. "name": "stdout",
  264. "output_type": "stream",
  265. "text": [
  266. "~~~ OLS ~~~\n",
  267. "OLS MSE: 1672.58110765\n",
  268. "OLS R2: 0.311934605989\n",
  269. "~~~ Lasso ~~~\n",
  270. "Lasso MSE: 1725.41581608\n",
  271. "Lasso R2: 0.290199495922\n",
  272. "~~~ Ridge ~~~\n",
  273. "Ridge MSE: 1672.60490113\n",
  274. "Ridge R2: 0.311924817843\n"
  275. ]
  276. }
  277. ],
  278. "source": [
  279. "lm = linear_model.LinearRegression().fit(modeldata, y)\n",
  280. "print \"~~~ OLS ~~~\"\n",
  281. "print 'OLS MSE: ', metrics.mean_squared_error(y, lm.predict(modeldata))\n",
  282. "print 'OLS R2:', lm.score(modeldata, y)\n",
  283. "\n",
  284. "lm = linear_model.Lasso().fit(modeldata, y)\n",
  285. "print \"~~~ Lasso ~~~\"\n",
  286. "print 'Lasso MSE: ', metrics.mean_squared_error(y, lm.predict(modeldata))\n",
  287. "print 'Lasso R2:', lm.score(modeldata, y)\n",
  288. "\n",
  289. "lm = linear_model.Ridge().fit(modeldata, y)\n",
  290. "print \"~~~ Ridge ~~~\"\n",
  291. "print 'Ridge MSE: ', metrics.mean_squared_error(y, lm.predict(modeldata))\n",
  292. "print 'Ridge R2:', lm.score(modeldata, y)"
  293. ]
  294. },
  295. {
  296. "cell_type": "markdown",
  297. "metadata": {},
  298. "source": [
  299. "### Figuring out the alphas can be done by \"hand\""
  300. ]
  301. },
  302. {
  303. "cell_type": "code",
  304. "execution_count": 11,
  305. "metadata": {
  306. "collapsed": false
  307. },
  308. "outputs": [
  309. {
  310. "name": "stdout",
  311. "output_type": "stream",
  312. "text": [
  313. "Alpha: 1e-10\n",
  314. "[ 112.68901765 -84.01121684 -24.68489063 -21.00314493 -21.71893628]\n",
  315. "1672.58110765\n",
  316. "Alpha: 1e-09\n",
  317. "[ 112.68901765 -84.01121684 -24.6848906 -21.00314491 -21.71893626]\n",
  318. "1672.58110765\n",
  319. "Alpha: 1e-08\n",
  320. "[ 112.68901765 -84.01121684 -24.6848904 -21.00314471 -21.71893606]\n",
  321. "1672.58110765\n",
  322. "Alpha: 1e-07\n",
  323. "[ 112.68901763 -84.01121682 -24.68488837 -21.00314268 -21.71893403]\n",
  324. "1672.58110765\n",
  325. "Alpha: 1e-06\n",
  326. "[ 112.68901745 -84.01121667 -24.68486804 -21.00312237 -21.71891373]\n",
  327. "1672.58110765\n",
  328. "Alpha: 1e-05\n",
  329. "[ 112.68901562 -84.01121509 -24.68466472 -21.00291929 -21.71871079]\n",
  330. "1672.58110765\n",
  331. "Alpha: 0.0001\n",
  332. "[ 112.68899732 -84.01119938 -24.68263174 -21.00088873 -21.71668161]\n",
  333. "1672.58110765\n",
  334. "Alpha: 0.001\n",
  335. "[ 112.68881437 -84.01104228 -24.66232204 -20.98060316 -21.69640993]\n",
  336. "1672.58110774\n",
  337. "Alpha: 0.01\n",
  338. "[ 112.68698753 -84.00947323 -24.46121539 -20.77973778 -21.49568404]\n",
  339. "1672.58111645\n",
  340. "Alpha: 0.1\n",
  341. "[ 112.66896732 -83.99396383 -22.63109556 -18.95202277 -19.66942371]\n",
  342. "1672.58185208\n",
  343. "Alpha: 1.0\n",
  344. "[ 112.50129738 -83.84805622 -13.38214934 -9.72671278 -10.46162477]\n",
  345. "1672.60490113\n",
  346. "Alpha: 10.0\n",
  347. "[ 110.96062533 -82.49604961 -3.94431741 -0.51765034 -1.45024412]\n",
  348. "1672.83347262\n",
  349. "Alpha: 100.0\n",
  350. "[ 97.69060562 -71.17602377 -0.31585194 1.18284675 -1.33281591]\n",
  351. "1686.31830362\n",
  352. "Alpha: 1000.0\n",
  353. "[ 44.59923075 -30.85843772 5.07876321 0.05369643 -5.107457 ]\n",
  354. "1937.81576044\n",
  355. "Alpha: 10000.0\n",
  356. "[ 7.03007064 -5.07733082 3.29039029 -1.2136063 -2.06842808]\n",
  357. "2314.83675678\n",
  358. "Alpha: 100000.0\n",
  359. "[ 0.75195708 -0.56490872 0.52067881 -0.25075496 -0.26895254]\n",
  360. "2415.77806566\n",
  361. "Alpha: 1000000.0\n",
  362. "[ 0.07576571 -0.05727511 0.05520142 -0.0273591 -0.02774349]\n",
  363. "2429.28026459\n",
  364. "Alpha: 10000000.0\n",
  365. "[ 0.00758239 -0.00573569 0.0055535 -0.00276043 -0.00278317]\n",
  366. "2430.68891798\n",
  367. "Alpha: 100000000.0\n",
  368. "[ 0.0007583 -0.00057365 0.00055569 -0.00027629 -0.00027841]\n",
  369. "2430.83041212\n",
  370. "Alpha: 1000000000.0\n",
  371. "[ 7.58303020e-05 -5.73659720e-05 5.55719458e-05 -2.76314619e-05\n",
  372. " -2.78414555e-05]\n",
  373. "2430.84456787\n",
  374. "Alpha: 10000000000.0\n",
  375. "[ 7.58303603e-06 -5.73660542e-06 5.55722818e-06 -2.76317091e-06\n",
  376. " -2.78415441e-06]\n",
  377. "2430.84598351\n"
  378. ]
  379. }
  380. ],
  381. "source": [
  382. "alphas = np.logspace(-10, 10, 21)\n",
  383. "for a in alphas:\n",
  384. " print 'Alpha:', a\n",
  385. " lm = linear_model.Ridge(alpha=a)\n",
  386. " lm.fit(modeldata, y)\n",
  387. " print lm.coef_\n",
  388. " print metrics.mean_squared_error(y, lm.predict(modeldata))"
  389. ]
  390. },
  391. {
  392. "cell_type": "markdown",
  393. "metadata": {},
  394. "source": [
  395. "### Or we can use grid search to make this faster"
  396. ]
  397. },
  398. {
  399. "cell_type": "code",
  400. "execution_count": 12,
  401. "metadata": {
  402. "collapsed": false
  403. },
  404. "outputs": [
  405. {
  406. "data": {
  407. "text/plain": [
  408. "GridSearchCV(cv=None, error_score='raise',\n",
  409. " estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,\n",
  410. " normalize=False, random_state=None, solver='auto', tol=0.001),\n",
  411. " fit_params={}, iid=True, n_jobs=1,\n",
  412. " param_grid={'alpha': array([ 1.00000e-10, 1.00000e-09, 1.00000e-08, 1.00000e-07,\n",
  413. " 1.00000e-06, 1.00000e-05, 1.00000e-04, 1.00000e-03,\n",
  414. " 1.00000e-02, 1.00000e-01, 1.00000e+00, 1.00000e+01,\n",
  415. " 1.00000e+02, 1.00000e+03, 1.00000e+04, 1.00000e+05,\n",
  416. " 1.00000e+06, 1.00000e+07, 1.00000e+08, 1.00000e+09,\n",
  417. " 1.00000e+10])},\n",
  418. " pre_dispatch='2*n_jobs', refit=True, scoring='mean_squared_error',\n",
  419. " verbose=0)"
  420. ]
  421. },
  422. "execution_count": 12,
  423. "metadata": {},
  424. "output_type": "execute_result"
  425. }
  426. ],
  427. "source": [
  428. "from sklearn import grid_search\n",
  429. "\n",
  430. "alphas = np.logspace(-10, 10, 21)\n",
  431. "gs = grid_search.GridSearchCV(\n",
  432. " estimator=linear_model.Ridge(),\n",
  433. " param_grid={'alpha': alphas},\n",
  434. " scoring='mean_squared_error')\n",
  435. "\n",
  436. "gs.fit(modeldata, y)\n"
  437. ]
  438. },
  439. {
  440. "cell_type": "markdown",
  441. "metadata": {},
  442. "source": [
  443. "##### Best score "
  444. ]
  445. },
  446. {
  447. "cell_type": "code",
  448. "execution_count": 13,
  449. "metadata": {
  450. "collapsed": false
  451. },
  452. "outputs": [
  453. {
  454. "data": {
  455. "text/plain": [
  456. "-1814.0936913337957"
  457. ]
  458. },
  459. "execution_count": 13,
  460. "metadata": {},
  461. "output_type": "execute_result"
  462. }
  463. ],
  464. "source": [
  465. "gs.best_score_ "
  466. ]
  467. },
  468. {
  469. "cell_type": "markdown",
  470. "metadata": {},
  471. "source": [
  472. "##### mean squared error here comes in negative, so let's make it positive."
  473. ]
  474. },
  475. {
  476. "cell_type": "code",
  477. "execution_count": 14,
  478. "metadata": {
  479. "collapsed": false
  480. },
  481. "outputs": [
  482. {
  483. "data": {
  484. "text/plain": [
  485. "1814.0936913337957"
  486. ]
  487. },
  488. "execution_count": 14,
  489. "metadata": {},
  490. "output_type": "execute_result"
  491. }
  492. ],
  493. "source": [
  494. "-gs.best_score_ "
  495. ]
  496. },
  497. {
  498. "cell_type": "markdown",
  499. "metadata": {},
  500. "source": [
  501. "##### explains which grid_search setup worked best"
  502. ]
  503. },
  504. {
  505. "cell_type": "code",
  506. "execution_count": 15,
  507. "metadata": {
  508. "collapsed": false
  509. },
  510. "outputs": [
  511. {
  512. "data": {
  513. "text/plain": [
  514. "Ridge(alpha=10.0, copy_X=True, fit_intercept=True, max_iter=None,\n",
  515. " normalize=False, random_state=None, solver='auto', tol=0.001)"
  516. ]
  517. },
  518. "execution_count": 15,
  519. "metadata": {},
  520. "output_type": "execute_result"
  521. }
  522. ],
  523. "source": [
  524. "gs.best_estimator_ "
  525. ]
  526. },
  527. {
  528. "cell_type": "markdown",
  529. "metadata": {},
  530. "source": [
  531. "##### shows all the grid pairings and their performances."
  532. ]
  533. },
  534. {
  535. "cell_type": "code",
  536. "execution_count": 16,
  537. "metadata": {
  538. "collapsed": false
  539. },
  540. "outputs": [
  541. {
  542. "name": "stdout",
  543. "output_type": "stream",
  544. "text": [
  545. "mean: -1817.58711, std: 542.14315, params: {'alpha': 1e-10}\n",
  546. "mean: -1817.58711, std: 542.14315, params: {'alpha': 1.0000000000000001e-09}\n",
  547. "mean: -1817.58711, std: 542.14315, params: {'alpha': 1e-08}\n",
  548. "mean: -1817.58711, std: 542.14315, params: {'alpha': 9.9999999999999995e-08}\n",
  549. "mean: -1817.58711, std: 542.14315, params: {'alpha': 9.9999999999999995e-07}\n",
  550. "mean: -1817.58711, std: 542.14317, params: {'alpha': 1.0000000000000001e-05}\n",
  551. "mean: -1817.58707, std: 542.14331, params: {'alpha': 0.0001}\n",
  552. "mean: -1817.58663, std: 542.14477, params: {'alpha': 0.001}\n",
  553. "mean: -1817.58230, std: 542.15933, params: {'alpha': 0.01}\n",
  554. "mean: -1817.54318, std: 542.30102, params: {'alpha': 0.10000000000000001}\n",
  555. "mean: -1817.20111, std: 543.63587, params: {'alpha': 1.0}\n",
  556. "mean: -1814.09369, std: 556.35563, params: {'alpha': 10.0}\n",
  557. "mean: -1818.51694, std: 653.68607, params: {'alpha': 100.0}\n",
  558. "mean: -2125.58777, std: 872.45270, params: {'alpha': 1000.0}\n",
  559. "mean: -2458.08836, std: 951.30428, params: {'alpha': 10000.0}\n",
  560. "mean: -2532.21151, std: 962.80083, params: {'alpha': 100000.0}\n",
  561. "mean: -2541.38479, std: 963.98339, params: {'alpha': 1000000.0}\n",
  562. "mean: -2542.32833, std: 964.10141, params: {'alpha': 10000000.0}\n",
  563. "mean: -2542.42296, std: 964.11321, params: {'alpha': 100000000.0}\n",
  564. "mean: -2542.43242, std: 964.11439, params: {'alpha': 1000000000.0}\n",
  565. "mean: -2542.43337, std: 964.11450, params: {'alpha': 10000000000.0}\n"
  566. ]
  567. }
  568. ],
  569. "source": [
  570. "for s in gs.grid_scores_:\n",
  571. " print s"
  572. ]
  573. },
  574. {
  575. "cell_type": "markdown",
  576. "metadata": {},
  577. "source": [
  578. "### ACTIVITY"
  579. ]
  580. },
  581. {
  582. "cell_type": "markdown",
  583. "metadata": {},
  584. "source": [
  585. "## Gradient Descent"
  586. ]
  587. },
  588. {
  589. "cell_type": "code",
  590. "execution_count": 17,
  591. "metadata": {
  592. "collapsed": false,
  593. "scrolled": true
  594. },
  595. "outputs": [
  596. {
  597. "name": "stdout",
  598. "output_type": "stream",
  599. "text": [
  600. "5.2 is better than 6.2\n",
  601. "found better solution! using 5.2\n",
  602. "4.2 is better than 5.2\n",
  603. "found better solution! using 4.2\n",
  604. "3.2 is better than 4.2\n",
  605. "found better solution! using 3.2\n",
  606. "2.2 is better than 3.2\n",
  607. "found better solution! using 2.2\n",
  608. "1.2 is better than 2.2\n",
  609. "found better solution! using 1.2\n",
  610. "0.2 is better than 1.2\n",
  611. "found better solution! using 0.2\n",
  612. "6.0 is closest to 6.2\n"
  613. ]
  614. }
  615. ],
  616. "source": [
  617. "num_to_approach, start, steps, optimized = 6.2, 0., [-1, 1], False\n",
  618. "\n",
  619. "while not optimized:\n",
  620. " current_distance = num_to_approach - start\n",
  621. " got_better = False\n",
  622. " next_steps = [start + i for i in steps]\n",
  623. " for n in next_steps:\n",
  624. " distance = np.abs(num_to_approach - n)\n",
  625. " if distance < current_distance:\n",
  626. " got_better = True\n",
  627. " print distance, 'is better than', current_distance\n",
  628. " current_distance = distance\n",
  629. " start = n\n",
  630. " if got_better:\n",
  631. " print 'found better solution! using', current_distance\n",
  632. " else:\n",
  633. " optimized = True\n",
  634. " print start, 'is closest to', num_to_approach"
  635. ]
  636. },
  637. {
  638. "cell_type": "markdown",
  639. "metadata": {},
  640. "source": [
  641. "### Bonus: \n",
  642. "implement a stopping point, similar to what n_iter would do in gradient descent when we've reached \"good enough\""
  643. ]
  644. },
  645. {
  646. "cell_type": "code",
  647. "execution_count": 18,
  648. "metadata": {
  649. "collapsed": false
  650. },
  651. "outputs": [
  652. {
  653. "name": "stdout",
  654. "output_type": "stream",
  655. "text": [
  656. "15.2 is better than 16.2\n",
  657. "found better solution! using 15.2\n",
  658. "14.2 is better than 15.2\n",
  659. "found better solution! using 14.2\n",
  660. "13.2 is better than 14.2\n",
  661. "found better solution! using 13.2\n",
  662. "12.2 is better than 13.2\n",
  663. "found better solution! using 12.2\n",
  664. "11.2 is better than 12.2\n",
  665. "found better solution! using 11.2\n",
  666. "10.2 is better than 11.2\n",
  667. "found better solution! using 10.2\n",
  668. "9.2 is better than 10.2\n",
  669. "found better solution! using 9.2\n",
  670. "8.2 is better than 9.2\n",
  671. "found better solution! using 8.2\n",
  672. "7.2 is better than 8.2\n",
  673. "found better solution! using 7.2\n",
  674. "6.2 is better than 7.2\n",
  675. "found better solution! using 6.2\n",
  676. "5.2 is better than 6.2\n",
  677. "found better solution! using 5.2\n",
  678. "\n",
  679. "Exceeded maximum iterations!\n",
  680. "Stopping gradient descent...\n"
  681. ]
  682. }
  683. ],
  684. "source": [
  685. "num_to_approach, start, steps, optimized, max_iter = 6.2, -10., [-1, 1], False, 10\n",
  686. "n_iter = 0\n",
  687. "\n",
  688. "while not optimized:\n",
  689. " current_distance = num_to_approach - start\n",
  690. " got_better = False\n",
  691. " next_steps = [start + i for i in steps]\n",
  692. " for n in next_steps:\n",
  693. " distance = np.abs(num_to_approach - n)\n",
  694. " if distance < current_distance:\n",
  695. " got_better = True\n",
  696. " print distance, 'is better than', current_distance\n",
  697. " current_distance = distance\n",
  698. " start = n\n",
  699. " if got_better:\n",
  700. " print 'found better solution! using', current_distance\n",
  701. " else:\n",
  702. " optimized = True\n",
  703. " print start, 'is closest to', num_to_approach\n",
  704. " \n",
  705. " n_iter += 1\n",
  706. " \n",
  707. " if n_iter > max_iter:\n",
  708. " print \"\\nExceeded maximum iterations!\\nStopping gradient descent...\"\n",
  709. " optimized = True\n",
  710. "\n"
  711. ]
  712. },
  713. {
  714. "cell_type": "markdown",
  715. "metadata": {},
  716. "source": [
  717. "## Demo: Application of Gradient Descent "
  718. ]
  719. },
  720. {
  721. "cell_type": "code",
  722. "execution_count": 19,
  723. "metadata": {
  724. "collapsed": false
  725. },
  726. "outputs": [
  727. {
  728. "name": "stdout",
  729. "output_type": "stream",
  730. "text": [
  731. "-- Epoch 1\n",
  732. "Norm: 73.53, NNZs: 5, Bias: 14.490061, T: 17379, Avg. loss: 1005.162964\n",
  733. "Total training time: 0.01 seconds.\n",
  734. "-- Epoch 2\n",
  735. "Norm: 98.87, NNZs: 5, Bias: 17.287780, T: 34758, Avg. loss: 948.414396\n",
  736. "Total training time: 0.01 seconds.\n",
  737. "-- Epoch 3\n",
  738. "Norm: 112.96, NNZs: 5, Bias: 18.692687, T: 52137, Avg. loss: 918.970590\n",
  739. "Total training time: 0.01 seconds.\n",
  740. "-- Epoch 4\n",
  741. "Norm: 121.51, NNZs: 5, Bias: 19.512785, T: 69516, Avg. loss: 901.276514\n",
  742. "Total training time: 0.01 seconds.\n",
  743. "-- Epoch 5\n",
  744. "Norm: 127.03, NNZs: 5, Bias: 20.075823, T: 86895, Avg. loss: 889.599148\n",
  745. "Total training time: 0.02 seconds.\n",
  746. "\n",
  747. "Gradient Descent R2: 0.308305395581\n",
  748. "Gradient Descent MSE: 1681.40315977\n"
  749. ]
  750. }
  751. ],
  752. "source": [
  753. "lm = linear_model.SGDRegressor(verbose=1)\n",
  754. "lm.fit(modeldata, y)\n",
  755. "print \"\\nGradient Descent R2:\", lm.score(modeldata, y)\n",
  756. "print \"Gradient Descent MSE:\", metrics.mean_squared_error(y, lm.predict(modeldata))"
  757. ]
  758. },
  759. {
  760. "cell_type": "markdown",
  761. "metadata": {},
  762. "source": [
  763. "### Check: Untuned, how well did gradient descent perform compared to OLS?"
  764. ]
  765. },
  766. {
  767. "cell_type": "markdown",
  768. "metadata": {},
  769. "source": [
  770. "Comparing the two:\n",
  771. "\n",
  772. " OLS MSE: 1672.58110765\n",
  773. " OLS R2: 0.311934605989\n",
  774. "\n",
  775. " Gradient Descent R2: 0.307785992311\n",
  776. " Gradient Descent MSE: 1682.6657492\n",
  777. "\n",
  778. "Untuned gradient descent performed slightly worse in both MSE and R-squared."
  779. ]
  780. },
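{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because SGDRegressor is stochastic, the exact numbers move around between runs. A small sketch (added for illustration; the random_state value is arbitrary) that pins the seed so the comparison with OLS is repeatable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## sketch: fix the random seed so the OLS vs. gradient descent comparison is repeatable\n",
"ols = linear_model.LinearRegression().fit(modeldata, y)\n",
"sgd = linear_model.SGDRegressor(random_state=42).fit(modeldata, y)\n",
"print 'OLS MSE:', metrics.mean_squared_error(y, ols.predict(modeldata))\n",
"print 'SGD MSE:', metrics.mean_squared_error(y, sgd.predict(modeldata))"
]
},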
  781. {
  782. "cell_type": "markdown",
  783. "metadata": {},
  784. "source": [
  785. "# Independent Practice: Bike data revisited\n",
  786. "\n",
  787. "There are tons of ways to approach a regression problem. The regularization techniques appended to ordinary least squares optimizes the size of coefficients to best account for error. Gradient Descent also introduces learning rate (how aggressively do we solve the problem), epsilon (at what point do we say the error margin is acceptable), and iterations (when should we stop no matter what?)\n",
  788. "\n",
  789. "For this deliverable, our goals are to:\n",
  790. "\n",
  791. "- implement the gradient descent approach to our bike-share modeling problem,\n",
  792. "- show how gradient descent solves and optimizes the solution,\n",
  793. "- demonstrate the grid_search module!\n",
  794. "\n",
  795. "While exploring the Gradient Descent regressor object, you'll build a grid search using the stochastic gradient descent estimator for the bike-share data set. Continue with either the model you evaluated last class or the simpler one from today. In particular, be sure to implement the \"param_grid\" in the grid search to get answers for the following questions:\n",
  796. "\n",
  797. "- With a set of alpha values between 10^-10 and 10^-1, how does the mean squared error change?\n",
  798. "- Based on the data, we know when to properly use l1 vs l2 regularization. By using a grid search with l1_ratios between 0 and 1 (increasing every 0.05), does that statement hold true? If not, did gradient descent have enough iterations?\n",
  799. "- How do these results change when you alter the learning rate (eta0)?\n",
  800. "\n",
  801. "**Bonus**: Can you see the advantages and disadvantages of using gradient descent after finishing this exercise?"
  802. ]
  803. },
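{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before the starter code, a minimal sketch of the SGDRegressor knobs mentioned above (added for illustration; the values shown are placeholders, not recommendations):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## illustrative sketch of the parameters the grid searches below will tune\n",
"sgd = linear_model.SGDRegressor(\n",
" eta0=0.01, # initial learning rate: how aggressively we step\n",
" learning_rate='invscaling', # how the step size shrinks over iterations\n",
" n_iter=5, # passes over the data before stopping\n",
" penalty='elasticnet', # blend of l1 and l2 regularization\n",
" l1_ratio=0.15, # 0 = pure l2, 1 = pure l1\n",
" alpha=0.0001) # regularization strength\n",
"sgd.fit(modeldata, y)\n",
"print 'sketch MSE:', metrics.mean_squared_error(y, sgd.predict(modeldata))"
]
},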
  804. {
  805. "cell_type": "markdown",
  806. "metadata": {},
  807. "source": [
  808. "### Starter Code\n",
  809. "\n",
  810. "- With a set of alpha values between 10^-10 and 10^-1, how does the mean squared error change?"
  811. ]
  812. },
  813. {
  814. "cell_type": "code",
  815. "execution_count": 20,
  816. "metadata": {
  817. "collapsed": false
  818. },
  819. "outputs": [
  820. {
  821. "name": "stdout",
  822. "output_type": "stream",
  823. "text": [
  824. "BEST ESTIMATOR\n",
  825. "1687.88462245\n",
  826. "SGDRegressor(alpha=1.0000000000000001e-05, average=False, epsilon=0.1,\n",
  827. " eta0=0.01, fit_intercept=True, l1_ratio=0.15,\n",
  828. " learning_rate='invscaling', loss='squared_loss', n_iter=5,\n",
  829. " penalty='l2', power_t=0.25, random_state=None, shuffle=True,\n",
  830. " verbose=0, warm_start=False)\n",
  831. "ALL ESTIMATORS\n",
  832. "[mean: -1690.00089, std: 83.71968, params: {'alpha': 1e-10}, mean: -1690.52355, std: 86.15990, params: {'alpha': 1.0000000000000001e-09}, mean: -1689.36280, std: 87.97990, params: {'alpha': 1e-08}, mean: -1689.67661, std: 86.58292, params: {'alpha': 9.9999999999999995e-08}, mean: -1688.35663, std: 86.38886, params: {'alpha': 9.9999999999999995e-07}, mean: -1687.88462, std: 85.34930, params: {'alpha': 1.0000000000000001e-05}, mean: -1690.12154, std: 85.66560, params: {'alpha': 0.0001}, mean: -1693.91600, std: 87.55442, params: {'alpha': 0.001}, mean: -1728.73018, std: 89.25997, params: {'alpha': 0.01}, mean: -2053.85245, std: 99.39294, params: {'alpha': 0.10000000000000001}]\n"
  833. ]
  834. }
  835. ],
  836. "source": [
  837. "alphas = np.logspace(-10, -1, 10)\n",
  838. "params = {'alpha': alphas}\n",
  839. "\n",
  840. "gs = grid_search.GridSearchCV(\n",
  841. " estimator=linear_model.SGDRegressor(),\n",
  842. " cv=cross_validation.KFold(len(modeldata), n_folds=5, shuffle=True),\n",
  843. " param_grid=params,\n",
  844. " scoring='mean_squared_error',\n",
  845. " )\n",
  846. "\n",
  847. "gs.fit(modeldata, y)\n",
  848. "\n",
  849. "print 'BEST ESTIMATOR'\n",
  850. "print -gs.best_score_\n",
  851. "print gs.best_estimator_\n",
  852. "print 'ALL ESTIMATORS'\n",
  853. "print gs.grid_scores_"
  854. ]
  855. },
  856. {
  857. "cell_type": "markdown",
  858. "metadata": {},
  859. "source": [
  860. "Changing the alphas between 10^-10 and 10^-1, we get a lowest MSE at 10^-8 of 1687.88. At 10^-10 MSE was 1690.00 and at 10^-1 the MSE was the highest value of 2053.85. Clearly picking the selecting the correct regularization parameter is important. Will set alpha to 10^-08 going forward.\n",
  861. "\n",
  862. "Next we will look at l1 vs l2:\n",
  863. "- Based on the data, we know when to properly use l1 vs l2 regularization. By using a grid search with l1_ratios between 0 and 1 (increasing every 0.05), does that statement hold true? If not, did gradient descent have enough iterations?\n"
  864. ]
  865. },
  866. {
  867. "cell_type": "code",
  868. "execution_count": 21,
  869. "metadata": {
  870. "collapsed": false
  871. },
  872. "outputs": [
  873. {
  874. "name": "stdout",
  875. "output_type": "stream",
  876. "text": [
  877. "BEST ESTIMATOR\n",
  878. "1688.43060467\n",
  879. "SGDRegressor(alpha=1e-08, average=False, epsilon=0.1, eta0=0.01,\n",
  880. " fit_intercept=True, l1_ratio=0.75, learning_rate='invscaling',\n",
  881. " loss='squared_loss', n_iter=5, penalty='l2', power_t=0.25,\n",
  882. " random_state=None, shuffle=True, verbose=0, warm_start=False)\n",
  883. "\n",
  884. "ALL ESTIMATORS\n",
  885. "[mean: -1689.50994, std: 56.84150, params: {'l1_ratio': 0.0}, mean: -1689.78229, std: 55.62580, params: {'l1_ratio': 0.050000000000000003}, mean: -1689.11004, std: 57.74534, params: {'l1_ratio': 0.10000000000000001}, mean: -1690.72398, std: 58.51519, params: {'l1_ratio': 0.15000000000000002}, mean: -1688.77832, std: 57.24667, params: {'l1_ratio': 0.20000000000000001}, mean: -1691.31072, std: 58.06187, params: {'l1_ratio': 0.25}, mean: -1690.51925, std: 59.51318, params: {'l1_ratio': 0.30000000000000004}, mean: -1690.12908, std: 56.28294, params: {'l1_ratio': 0.35000000000000003}, mean: -1689.26135, std: 58.15524, params: {'l1_ratio': 0.40000000000000002}, mean: -1688.75290, std: 57.99243, params: {'l1_ratio': 0.45000000000000001}, mean: -1689.45895, std: 57.81882, params: {'l1_ratio': 0.5}, mean: -1688.79888, std: 57.59656, params: {'l1_ratio': 0.55000000000000004}, mean: -1689.48729, std: 57.10687, params: {'l1_ratio': 0.60000000000000009}, mean: -1689.19351, std: 57.64491, params: {'l1_ratio': 0.65000000000000002}, mean: -1689.78649, std: 57.38352, params: {'l1_ratio': 0.70000000000000007}, mean: -1688.43060, std: 57.29220, params: {'l1_ratio': 0.75}, mean: -1689.41114, std: 57.81888, params: {'l1_ratio': 0.80000000000000004}, mean: -1689.36202, std: 57.49003, params: {'l1_ratio': 0.85000000000000009}, mean: -1689.15929, std: 56.94886, params: {'l1_ratio': 0.90000000000000002}, mean: -1689.29226, std: 57.48814, params: {'l1_ratio': 0.95000000000000007}, mean: -1689.21732, std: 57.47393, params: {'l1_ratio': 1.0}]\n"
  886. ]
  887. }
  888. ],
  889. "source": [
  890. "l1s = np.linspace(0, 1, num=21)\n",
  891. "params = {'l1_ratio': l1s}\n",
  892. "\n",
  893. "gs = grid_search.GridSearchCV(\n",
  894. " estimator=linear_model.SGDRegressor(alpha=0.00000001),\n",
  895. " cv=cross_validation.KFold(len(modeldata), n_folds=5, shuffle=True),\n",
  896. " param_grid=params,\n",
  897. " scoring='mean_squared_error',\n",
  898. " )\n",
  899. "\n",
  900. "gs.fit(modeldata, y)\n",
  901. "\n",
  902. "print 'BEST ESTIMATOR'\n",
  903. "print -gs.best_score_\n",
  904. "print gs.best_estimator_\n",
  905. "print '\\nALL ESTIMATORS'\n",
  906. "print gs.grid_scores_"
  907. ]
  908. },
  909. {
  910. "cell_type": "markdown",
  911. "metadata": {},
  912. "source": [
  913. "Running the above a number of times resulted in l1_ratios ranging from 0.15 to 0.85, at the default 5 iterations. Will now try to increase the number of iterations to attempt to find a stable l1_ratio:"
  914. ]
  915. },
  916. {
  917. "cell_type": "code",
  918. "execution_count": 22,
  919. "metadata": {
  920. "collapsed": false
  921. },
  922. "outputs": [
  923. {
  924. "name": "stdout",
  925. "output_type": "stream",
  926. "text": [
  927. "BEST ESTIMATOR\n",
  928. "1673.20055258\n",
  929. "SGDRegressor(alpha=1e-08, average=False, epsilon=0.1, eta0=0.01,\n",
  930. " fit_intercept=True, l1_ratio=0.65000000000000002,\n",
  931. " learning_rate='invscaling', loss='squared_loss', n_iter=20,\n",
  932. " penalty='l2', power_t=0.25, random_state=None, shuffle=True,\n",
  933. " verbose=0, warm_start=False)\n"
  934. ]
  935. }
  936. ],
  937. "source": [
  938. "l1s = np.linspace(0, 1, num=21)\n",
  939. "n_iters = (10, 20, 50, 100)\n",
  940. "params = {'l1_ratio': l1s, 'n_iter': n_iters}\n",
  941. "\n",
  942. "gs = grid_search.GridSearchCV(\n",
  943. " estimator=linear_model.SGDRegressor(alpha=0.00000001),\n",
  944. " cv=cross_validation.KFold(len(modeldata), n_folds=5, shuffle=True),\n",
  945. " param_grid=params,\n",
  946. " scoring='mean_squared_error',\n",
  947. " )\n",
  948. "\n",
  949. "gs.fit(modeldata, y)\n",
  950. "\n",
  951. "print 'BEST ESTIMATOR'\n",
  952. "print -gs.best_score_\n",
  953. "print gs.best_estimator_\n",
  954. "#print 'ALL ESTIMATORS'\n",
  955. "#print gs.grid_scores_"
  956. ]
  957. },
  958. {
  959. "cell_type": "markdown",
  960. "metadata": {
  961. "collapsed": true
  962. },
  963. "source": [
  964. "Running the above in an attempt to find a stable l1_ratio was not successful. All runs resulted in a MSE of about 1673, with l1_ratios varying anywhere from 0.1 to 0.95 and number of iterations from 20 to 100. Given that the l1_ratio appears to be dependent on the particular shuffling from a given run, it does not look like further effort should be expended in optimizing the l1_ratio, particularly with similiar MSE results. I'll set n_iters at 20 as that was the minimum value I saw for the Best Estimator and higher iterations did not markedly change results.\n",
  965. "\n",
  966. "Next will look at learning rates:"
  967. ]
  968. },
  969. {
  970. "cell_type": "code",
  971. "execution_count": 23,
  972. "metadata": {
  973. "collapsed": false
  974. },
  975. "outputs": [
  976. {
  977. "name": "stdout",
  978. "output_type": "stream",
  979. "text": [
  980. "BEST ESTIMATOR\n",
  981. "1673.89830475\n",
  982. "SGDRegressor(alpha=1e-08, average=False, epsilon=0.1, eta0=0.01,\n",
  983. " fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',\n",
  984. " loss='squared_loss', n_iter=20, penalty='l2', power_t=0.25,\n",
  985. " random_state=None, shuffle=True, verbose=0, warm_start=False)\n",
  986. "ALL ESTIMATORS\n",
  987. "[mean: -1934.35399, std: 400.13078, params: {'learning_rate': 'constant', 'eta0': 0.1}, mean: -26352639739801690830857568256.00000, std: 16865968182123518421972811776.00000, params: {'learning_rate': 'optimal', 'eta0': 0.1}, mean: -1681.45153, std: 77.83622, params: {'learning_rate': 'invscaling', 'eta0': 0.1}, mean: -1976.93762, std: 377.64384, params: {'learning_rate': 'constant', 'eta0': 0.05}, mean: -22804907109609210721452687360.00000, std: 11842527670805083618562539520.00000, params: {'learning_rate': 'optimal', 'eta0': 0.05}, mean: -1675.15116, std: 78.44786, params: {'learning_rate': 'invscaling', 'eta0': 0.05}, mean: -1694.36804, std: 83.75146, params: {'learning_rate': 'constant', 'eta0': 0.01}, mean: -12861899041372012389230182400.00000, std: 10163577213498355691307401216.00000, params: {'learning_rate': 'optimal', 'eta0': 0.01}, mean: -1673.89830, std: 77.48118, params: {'learning_rate': 'invscaling', 'eta0': 0.01}]\n"
  988. ]
  989. }
  990. ],
  991. "source": [
  992. "learning_rates = ('constant', 'optimal', 'invscaling')\n",
  993. "eta0s = (0.1, 0.05, 0.01)\n",
  994. "\n",
  995. "params = {'learning_rate': learning_rates, 'eta0': eta0s}\n",
  996. "\n",
  997. "gs = grid_search.GridSearchCV(\n",
  998. " estimator=linear_model.SGDRegressor(alpha=0.00000001, n_iter=20),\n",
  999. " cv=cross_validation.KFold(len(modeldata), n_folds=5, shuffle=True),\n",
  1000. " param_grid=params,\n",
  1001. " scoring='mean_squared_error',\n",
  1002. " )\n",
  1003. "\n",
  1004. "gs.fit(modeldata, y)\n",
  1005. "\n",
  1006. "print 'BEST ESTIMATOR'\n",
  1007. "print -gs.best_score_\n",
  1008. "print gs.best_estimator_\n",
  1009. "print 'ALL ESTIMATORS'\n",
  1010. "print gs.grid_scores_"
  1011. ]
  1012. },
  1013. {
  1014. "cell_type": "markdown",
  1015. "metadata": {
  1016. "collapsed": true
  1017. },
  1018. "source": [
  1019. "To answer the question: How do these results change when you alter the learning rate (eta0)?\n",
  1020. "\n",
  1021. "Varying learning rates between 0.1, 0.05, and .01 along with selecting between constant, optimal, and invscaling, resulted in MSE ranging from about from expected values in the range of 1673 to around 1700 for various eta0s and constant or invscaling. But absurdly high MSEs were found with optimal learning rate, in the range of 10^30! Obviously, something with the 'optimal' selection is not playing well with the selection of eta0. It may be that optimal has a default fixed eta0 and attempt to vary it leads to these non-useful results.\n",
  1022. "\n",
  1023. "In any case, repeated runs suggest that an invscaling selection along with eta0 of 0.01 resulted in the lowest MSE.\n",
  1024. "\n",
  1025. "Putting this all together for a final model:"
  1026. ]
  1027. },
  1028. {
  1029. "cell_type": "code",
  1030. "execution_count": 24,
  1031. "metadata": {
  1032. "collapsed": false
  1033. },
  1034. "outputs": [
  1035. {
  1036. "name": "stdout",
  1037. "output_type": "stream",
  1038. "text": [
  1039. "BEST ESTIMATOR\n",
  1040. "1674.35011068\n",
  1041. "SGDRegressor(alpha=1e-08, average=False, epsilon=0.1, eta0=0.01,\n",
  1042. " fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',\n",
  1043. " loss='squared_loss', n_iter=20, penalty='l2', power_t=0.25,\n",
  1044. " random_state=None, shuffle=True, verbose=0, warm_start=False)\n",
  1045. "ALL ESTIMATORS\n",
  1046. "[mean: -1785.90070, std: 97.78826, params: {'learning_rate': 'constant', 'eta0': 0.1}, mean: -11277800250012638309148262400.00000, std: 6777965403180558106704740352.00000, params: {'learning_rate': 'optimal', 'eta0': 0.1}, mean: -1687.49211, std: 109.20730, params: {'learning_rate': 'invscaling', 'eta0': 0.1}, mean: -1744.31843, std: 109.50611, params: {'learning_rate': 'constant', 'eta0': 0.05}, mean: -11050721264616227253038088192.00000, std: 8124596001136890987856003072.00000, params: {'learning_rate': 'optimal', 'eta0': 0.05}, mean: -1676.31982, std: 104.01003, params: {'learning_rate': 'invscaling', 'eta0': 0.05}, mean: -1704.08559, std: 127.64098, params: {'learning_rate': 'constant', 'eta0': 0.01}, mean: -19492282714118071886198865920.00000, std: 10900841562942524212643889152.00000, params: {'learning_rate': 'optimal', 'eta0': 0.01}, mean: -1674.35011, std: 103.74864, params: {'learning_rate': 'invscaling', 'eta0': 0.01}]\n"
  1047. ]
  1048. }
  1049. ],
  1050. "source": [
  1051. "gs = grid_search.GridSearchCV(\n",
  1052. " estimator=linear_model.SGDRegressor(alpha=0.00000001, n_iter=20, learning_rate='invscaling', eta0=0.01),\n",
  1053. " cv=cross_validation.KFold(len(modeldata), n_folds=5, shuffle=True),\n",
  1054. " param_grid=params,\n",
  1055. " scoring='mean_squared_error',\n",
  1056. " )\n",
  1057. "\n",
  1058. "gs.fit(modeldata, y)\n",
  1059. "\n",
  1060. "print 'BEST ESTIMATOR'\n",
  1061. "print -gs.best_score_\n",
  1062. "print gs.best_estimator_\n",
  1063. "print 'ALL ESTIMATORS'\n",
  1064. "print gs.grid_scores_"
  1065. ]
  1066. },
  1067. {
  1068. "cell_type": "markdown",
  1069. "metadata": {},
  1070. "source": [
  1071. "Repeated runs resulted in an MSE ranging from 1673 to 1675.\n",
  1072. "\n",
  1073. "### Observations and Conclusions\n",
  1074. "\n",
  1075. "One notable result is that the optimizations converged on a MSE of about 1673 once we started varying the number of iterations and the l1 ratio. Futher optimization with learning rates did not improve the error, although the error could be made markedly worse with the 'optimal' learning rate selection.\n",
  1076. "\n",
  1077. "I suspect that the area of diminishing returns has been reached: additional effort on optimization is unlikely to result in further significant decreases in MSE. This is likely due to the fundamental nature of Gradient Descent: it seeks to minimize the error in a step-wise fashion, converging on a solution of a given set of data. If there is no perfect solution, than the result of Gradient Descent will have some inherent level of error. In this case, a linear regression is able to fit the data well but not perfectly, hence the results we saw with the Gradient Descent."
  1078. ]
  1079. }
  1080. ],
  1081. "metadata": {
  1082. "anaconda-cloud": {},
  1083. "kernelspec": {
  1084. "display_name": "Python [default]",
  1085. "language": "python",
  1086. "name": "python2"
  1087. },
  1088. "language_info": {
  1089. "codemirror_mode": {
  1090. "name": "ipython",
  1091. "version": 2
  1092. },
  1093. "file_extension": ".py",
  1094. "mimetype": "text/x-python",
  1095. "name": "python",
  1096. "nbconvert_exporter": "python",
  1097. "pygments_lexer": "ipython2",
  1098. "version": "2.7.12"
  1099. }
  1100. },
  1101. "nbformat": 4,
  1102. "nbformat_minor": 0
  1103. }