# Usually, hyperparameter tuning is combined with cross-validation. Sometimes, we want to run cross-validation on its own to see whether a candidate model generalizes well enough on a dataset. To this end, let's use a linear regressor as an example.
#
# Code Listing 9.01. Import all the necessary packages for the cross-validation example. We use the first 150 data points of the diabetes dataset and make a linear regressor with default parameters.
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lr = linear_model.LinearRegression()
# Next, we will use the cross_validate() method to apply cross-validation to the linear regressor.
#
# Code Listing 9.02. Use the cross_validate() method to apply 5-fold cross-validation on the linear regressor, and then print the test scores.
scores = cross_validate(lr, X, y, cv=5, scoring=('r2', 'neg_mean_squared_error'),
                        return_train_score=True)
print("negative mean squared errors: ", scores["test_neg_mean_squared_error"])
print("r2 scores: ", scores["test_r2"])
negative mean squared errors:  [-2547.29219945 -4523.25983124 -2301.49369105 -4378.07848216 -2409.19372015]
r2 scores:  [0.36324841 0.28239194 0.4211776 0.30071196 0.61240533]
# We use 5-fold cross-validation with r2 and negative mean squared error as the metrics. As we can see from the output, the linear regressor performs differently on each fold. That is why cross-validation helps us observe how the performance varies when the data changes.
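#
# To summarize that spread across folds, one option (not part of the original listing; a minimal sketch that reuses the scores dictionary from Code Listing 9.02) is to report the mean and standard deviation of each metric:
import numpy as np
r2_folds = scores["test_r2"]
mse_folds = -scores["test_neg_mean_squared_error"]  # flip the sign back to a positive MSE
print("r2:  %.3f +/- %.3f" % (np.mean(r2_folds), np.std(r2_folds)))
print("MSE: %.1f +/- %.1f" % (np.mean(mse_folds), np.std(mse_folds)))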
#
# Now we can try to incorporate hyperparameter tuning and see how it improves the performance over a model with default parameters. We know random forest models perform very well with default settings. Can we still make improvements with hyperparameter tuning and cross-validation?
#
# First, we fetch the California housing dataset for this exercise. As usual, we randomly take 80% of the data for training.
#
# Code Listing 9.03. Fetch the California housing dataset and split it into training/test sets.
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
california_housing_bunch = fetch_california_housing()
california_housing_X, california_housing_y = california_housing_bunch.data, california_housing_bunch.target
x_train, x_test, y_train, y_test = train_test_split(california_housing_X, california_housing_y, test_size=0.2)
# For the second step, we need to create a basic estimator. If you want to fix some hyperparameters, you can set them at this stage. For example, we can create kNN regressors with the number of neighbors fixed in advance:
#
from sklearn.neighbors import KNeighborsRegressor
# train a kNN regressor with k = 10
knn_10_regr = KNeighborsRegressor(n_neighbors=10)
knn_10_regr.fit(x_train, y_train)
# train a kNN regressor with k = 100 and distance-based weighting
knn_100_regr = KNeighborsRegressor(n_neighbors=100, weights="distance")
knn_100_regr.fit(x_train, y_train)
# The basic estimator we will actually tune with the random search below is a random forest regressor with default settings.
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor()
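#
# As a quick sanity check (not part of the original listings; a minimal sketch that reuses the estimators defined above), we can compare the regressors on the held-out test split using their default score() method, which returns r2 for regressors:
rf.fit(x_train, y_train)
print("kNN, k=10:     ", knn_10_regr.score(x_test, y_test))
print("kNN, k=100:    ", knn_100_regr.score(x_test, y_test))
print("random forest: ", rf.score(x_test, y_test))
# The exact numbers vary from run to run because train_test_split() was called without a fixed random_state.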
# Now it is time to create a hyperparameter grid for a random search.
#
# Code Listing 9.05. Create a hyperparameter grid for 3 parameters of RandomForestRegressor: n_estimators, max_depth, and bootstrap.
#
# Number of trees in the random forest
n_estimators = [int(x) for x in np.linspace(start=600, stop=2000, num=15)]
# Maximum number of levels in each tree
max_depth = [int(x) for x in np.linspace(10, 80, num=8)]
max_depth.append(None)
# Method of selecting samples for training each tree
bootstrap = [True, False]
random_grid = {'n_estimators': n_estimators,
               'max_depth': max_depth,
               'bootstrap': bootstrap}
# NumPy's linspace() method creates a list of evenly spaced numbers between a predefined start and stop. Now we can start the random search with cross-validation: once we have a basic regressor, we pass it as the estimator parameter of RandomizedSearchCV(), which randomly tries different combinations of the hyperparameters we want to test.
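#
# Side note (not in the original listing): the values in param_distributions do not have to be discrete lists. RandomizedSearchCV also accepts any object with an rvs() method, such as scipy.stats distributions. A minimal sketch, assuming scipy is installed, that samples n_estimators from a distribution instead:
from scipy.stats import randint
random_grid_dist = {'n_estimators': randint(600, 2001),  # any integer in [600, 2000]
                    'max_depth': max_depth,
                    'bootstrap': bootstrap}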
#
# Code Listing 9.06. Use training data to process the randomized search and cross-validation.
#
# Random search of parameters, using 3-fold cross-validation,
# search across 10 different combinations, and use all available cores
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, n_iter=10, cv=3, n_jobs=-1)
# Fit the random search model
rf_random.fit(x_train, y_train)
# Note that every value in param_distributions must be a list or a distribution object, never a single number. A common mistake is to map a parameter name to a bare value, e.g. {'n_neighbors': 10}; fit() then stops with:
# TypeError: Parameter value is not iterable or distribution (key='n_neighbors', value=10)
# Every value in random_grid above is a list, so the search runs its 10 sampled candidates normally.
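#
# Once the search finishes, we can inspect which combination was picked and evaluate the tuned model on the held-out test set. A minimal sketch (not part of the original listings), using the mean_squared_error and r2_score functions imported in Code Listing 9.03:
print("best hyperparameters: ", rf_random.best_params_)
best_rf = rf_random.best_estimator_  # refit on the full training set by default (refit=True)
y_pred = best_rf.predict(x_test)
print("test MSE: ", mean_squared_error(y_test, y_pred))
print("test r2:  ", r2_score(y_test, y_pred))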