Untitled

a guest
Nov 20th, 2019
  1. {
  2. "cell_type": "code",
  3. "execution_count": 36,
  4. "metadata": {},
  5. "outputs": [
  6. {
  7. "name": "stdout",
  8. "output_type": "stream",
  9. "text": ".. _boston_dataset:\n\nBoston house prices dataset\n---------------------------\n\n**Data Set Characteristics:** \n\n :Number of Instances: 506 \n\n :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.\n\n :Attribute Information (in order):\n - CRIM per capita crime rate by town\n - ZN proportion of residential land zoned for lots over 25,000 sq.ft.\n - INDUS proportion of non-retail business acres per town\n - CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n - NOX nitric oxides concentration (parts per 10 million)\n - RM average number of rooms per dwelling\n - AGE proportion of owner-occupied units built prior to 1940\n - DIS weighted distances to five Boston employment centres\n - RAD index of accessibility to radial highways\n - TAX full-value property-tax rate per $10,000\n - PTRATIO pupil-teacher ratio by town\n - B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n - LSTAT % lower status of the population\n - MEDV Median value of owner-occupied homes in $1000's\n\n :Missing Attribute Values: None\n\n :Creator: Harrison, D. and Rubinfeld, D.L.\n\nThis is a copy of UCI ML housing dataset.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/housing/\n\n\nThis dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.\n\nThe Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic\nprices and the demand for clean air', J. Environ. Economics & Management,\nvol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics\n...', Wiley, 1980. N.B. Various transformations are used in the table on\npages 244-261 of the latter.\n\nThe Boston house-price data has been used in many machine learning papers that address regression\nproblems. \n \n.. topic:: References\n\n - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.\n - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.\n\n"
  10. }
  11. ],
  12. "source": [
  13. "from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this cell needs an older release\n",
  14. "boston_market_data = load_boston()\n",
  15. "print(boston_market_data['DESCR'])"
  16. ]
  17. },
  18. {
  19. "cell_type": "code",
  20. "execution_count": 39,
  21. "metadata": {},
  22. "outputs": [],
  23. "source": [
  24. "from sklearn.model_selection import train_test_split\n",
  25. "boston_market_train_data, boston_market_test_data, \\\n",
  26. "boston_market_train_target, boston_market_test_target = \\\n",
  27. "train_test_split(boston_market_data['data'],boston_market_data['target'], test_size=0.1, random_state=10)"
  28. ]
  29. },
  30. {
  31. "cell_type": "code",
  32. "execution_count": 40,
  33. "metadata": {},
  34. "outputs": [
  35. {
  36. "name": "stdout",
  37. "output_type": "stream",
  38. "text": "Training dataset:\nboston_market_train_data: (455, 13)\nboston_market_train_target: (455,)\n"
  39. }
  40. ],
  41. "source": [
  42. "print(\"Training dataset:\")\n",
  43. "print(\"boston_market_train_data:\", boston_market_train_data.shape)\n",
  44. "print(\"boston_market_train_target:\", boston_market_train_target.shape)"
  45. ]
  46. },
  47. {
  48. "cell_type": "code",
  49. "execution_count": 41,
  50. "metadata": {},
  51. "outputs": [
  52. {
  53. "name": "stdout",
  54. "output_type": "stream",
  55. "text": "Test dataset:\nboston_market_test_data: (51, 13)\nboston_market_test_target: (51,)\n"
  56. }
  57. ],
  58. "source": [
  59. "print(\"Test dataset:\")\n",
  60. "print(\"boston_market_test_data:\", boston_market_test_data.shape)\n",
  61. "print(\"boston_market_test_target:\", boston_market_test_target.shape)"
  62. ]
  63. },
  64. {
  65. "cell_type": "code",
  66. "execution_count": 42,
  67. "metadata": {},
  68. "outputs": [
  69. {
  70. "data": {
  71. "text/plain": "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
  72. },
  73. "execution_count": 42,
  74. "metadata": {},
  75. "output_type": "execute_result"
  76. }
  77. ],
  78. "source": [
  79. "from sklearn.linear_model import LinearRegression\n",
  80. "\n",
  81. "linear_regression = LinearRegression()\n",
  82. "linear_regression.fit(boston_market_train_data, boston_market_train_target)"
  83. ]
  84. },
  85. {
  86. "cell_type": "code",
  87. "execution_count": 44,
  88. "metadata": {},
  89. "outputs": [
  90. {
  91. "name": "stdout",
  92. "output_type": "stream",
  93. "text": "Mean squared error of a learned model: 27.96\n"
  94. }
  95. ],
  96. "source": [
  97. "from sklearn.metrics import mean_squared_error\n",
  98. "print(\"Mean squared error of a learned model: %.2f\" % \n",
  99. " mean_squared_error(boston_market_test_target, linear_regression.predict(boston_market_test_data)))"
  100. ]
  101. },
  102. {
  103. "cell_type": "code",
  104. "execution_count": 46,
  105. "metadata": {},
  106. "outputs": [
  107. {
  108. "name": "stdout",
  109. "output_type": "stream",
  110. "text": "Coefficient of determination (R^2): 0.72\n"
  111. }
  112. ],
  113. "source": [
  114. "from sklearn.metrics import r2_score\n",
  115. "print('Coefficient of determination (R^2): %.2f' % r2_score(boston_market_test_target, linear_regression.predict(boston_market_test_data)))"
  116. ]
  117. },
  125. {
  126. "cell_type": "code",
  127. "execution_count": 47,
  128. "metadata": {},
  129. "outputs": [
  130. {
  131. "name": "stdout",
  132. "output_type": "stream",
  133. "text": "[ 0.60217169 0.60398145 0.35873597 -1.10867706]\n"
  134. }
  135. ],
  136. "source": [
  137. "from sklearn.model_selection import cross_val_score\n",
  138. "scores = cross_val_score(LinearRegression(), boston_market_data['data'], boston_market_data['target'], cv=4)\n",
  139. "print(scores)"
  140. ]
  141. },
  142. {
  143. "cell_type": "code",
  144. "execution_count": 48,
  145. "metadata": {},
  146. "outputs": [
  147. {
  148. "name": "stdout",
  149. "output_type": "stream",
  150. "text": "Model predicted for data under index 5 value [16.73451257]\nReal value for data under index 5 is 14.3\n"
  151. }
  152. ],
  153. "source": [
  154. "sample_index = 5  # renamed from `id` to avoid shadowing the built-in id()\n",
  155. "linear_regression_prediction = linear_regression.predict(boston_market_test_data[sample_index,:].reshape(1,-1))\n",
  156. "print(\"Model predicted for data under index {0} value {1}\".format(sample_index, linear_regression_prediction))\n",
  157. "print(\"Real value for data under index {0} is {1}\".format(sample_index, boston_market_test_target[sample_index]))"
  158. ]
  159. },
  160. {
  161. "cell_type": "markdown",
  162. "metadata": {},
  163. "source": [
  164. "# References\n",
  165. "__All images (unless otherwise stated) are from the book__: Raschka, Sebastian. Python Machine Learning. Birmingham, UK: Packt Publishing, 2015. ISBN 1783555130\n",
  166. "\n",
  167. "If you are using your own computer,\n",
  168. "install the required packages: \n",
  169. "```conda install numpy pandas scikit-learn matplotlib seaborn```"
  170. ]
  171. },
  172. {
  173. "cell_type": "code",
  174. "execution_count": null,
  175. "metadata": {},
  176. "outputs": [],
  177. "source": []
  178. }
  179. ],
  180. "metadata": {
  181. "kernelspec": {
  182. "display_name": "Python 3",
  183. "language": "python",
  184. "name": "python3"
  185. },
  186. "language_info": {
  187. "codemirror_mode": {
  188. "name": "ipython",
  189. "version": 3
  190. },
  191. "file_extension": ".py",
  192. "mimetype": "text/x-python",
  193. "name": "python",
  194. "nbconvert_exporter": "python",
  195. "pygments_lexer": "ipython3",
  196. "version": "3.7.5"
  197. }
  198. },
  199. "nbformat": 4,
  200. "nbformat_minor": 2
  201. }
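The notebook reports a mean squared error and an R^2 score on the test split, and one of the four cross-validation scores is negative. The sketch below spells out both metrics in plain Python to show how R^2 can drop below zero: it happens whenever the model's squared error exceeds that of always predicting the mean of the targets. The function names and the toy numbers are made up for illustration; this is not scikit-learn's implementation, only the textbook formulas.

```python
# Illustrative re-implementations of the two metrics used in the notebook.
# Names ending in _sketch are hypothetical, not part of scikit-learn.

def mean_squared_error_sketch(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # mean-baseline error
    return 1.0 - ss_res / ss_tot


# Toy values invented for this example (not taken from the Boston data).
y_true = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.0, 2.0, 3.0, 5.0]  # close to the targets
bad_pred = [4.0, 3.0, 2.0, 1.0]   # worse than predicting the mean

mse_good = mean_squared_error_sketch(y_true, good_pred)
r2_good = r2_score_sketch(y_true, good_pred)
r2_bad = r2_score_sketch(y_true, bad_pred)
print(mse_good, r2_good, r2_bad)
```

A negative score therefore does not indicate a bug in the scorer; it means the fitted model did worse on that fold than a constant prediction would have.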
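The `cross_val_score(..., cv=4)` call in the notebook passes an integer `cv` with a regressor, in which case scikit-learn uses `KFold` without shuffling, so every test fold is a contiguous block of rows. The sketch below reproduces only that index arithmetic for the 506 rows of this dataset. The helper name `contiguous_folds` is hypothetical, and whether row ordering is what drives the negative score on the last fold is a plausible guess rather than something the notebook verifies.

```python
def contiguous_folds(n_samples, n_splits):
    """Return (start, stop) row ranges of unshuffled k-fold test folds.

    The first n_samples % n_splits folds receive one extra sample,
    matching the fold sizes documented for scikit-learn's KFold.
    """
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        folds.append((start, start + size))  # stop index is exclusive
        start += size
    return folds


folds = contiguous_folds(506, 4)
for k, (start, stop) in enumerate(folds):
    print("fold", k, "is tested on rows", start, "to", stop - 1)
```

If the rows are grouped in some systematic way, the last fold may differ from the rows used for training. Passing a splitter such as `KFold(n_splits=4, shuffle=True, random_state=10)` as `cv` is one way to check this.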