a guest
Aug 22nd, 2019
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sklearn, XGBoost"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## sklearn.ensemble.RandomForestClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# sklearn.cross_validation and sklearn.learning_curve were removed in scikit-learn 0.20;\n",
    "# model_selection provides cross_val_score, so it is aliased to the old module name here.\n",
    "from sklearn import ensemble, metrics\n",
    "from sklearn import model_selection as cross_validation\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import xgboost as xgb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%pylab inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Kaggle competition: https://www.kaggle.com/c/bioresponse\n",
    "\n",
    "Data: https://www.kaggle.com/c/bioresponse/data\n",
    "\n",
    "Given a molecule's characteristics, the task is to predict whether it will elicit a biological response.\n",
    "\n",
    "The features are normalized.\n",
    "\n",
    "This demonstration uses the training set from the original data (train.csv); the data file is attached."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "bioresponce = pd.read_csv('bioresponse.csv', header=0, sep=',')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "bioresponce.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Target: the Activity column\n",
    "bioresponce_target = bioresponce.Activity.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Features: every column after the first (Activity is assumed to be column 0)\n",
    "bioresponce_data = bioresponce.iloc[:, 1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### RandomForestClassifier model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Quality as a function of the number of trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# list(...) keeps this working on Python 3, where range() is not a list\n",
    "n_trees = [1] + list(range(10, 55, 5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# 3-fold cross-validation accuracy of a random forest for each forest size\n",
    "scoring = []\n",
    "for n_tree in n_trees:\n",
    "    estimator = ensemble.RandomForestClassifier(n_estimators=n_tree, min_samples_split=5, random_state=1)\n",
    "    score = cross_validation.cross_val_score(estimator, bioresponce_data, bioresponce_target,\n",
    "                                             scoring='accuracy', cv=3)\n",
    "    scoring.append(score)\n",
    "scoring = np.asarray(scoring)  # one row per forest size, one column per fold"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "scoring"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "pylab.plot(n_trees, scoring.mean(axis=1), marker='.', label='RandomForest')\n",
    "pylab.grid(True)\n",
    "pylab.xlabel('n_trees')\n",
    "pylab.ylabel('score')\n",
    "pylab.title('Accuracy score')\n",
    "pylab.legend(loc='lower right')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Learning curves for deeper trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# Same 3-fold cross-validation, now with gradient-boosted trees from XGBoost\n",
    "xgb_scoring = []\n",
    "for n_tree in n_trees:\n",
    "    estimator = xgb.XGBClassifier(learning_rate=0.1, max_depth=5, n_estimators=n_tree, min_child_weight=3)\n",
    "    score = cross_validation.cross_val_score(estimator, bioresponce_data, bioresponce_target,\n",
    "                                             scoring='accuracy', cv=3)\n",
    "    xgb_scoring.append(score)\n",
    "xgb_scoring = np.asarray(xgb_scoring)  # one row per ensemble size, one column per fold"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "xgb_scoring"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "pylab.plot(n_trees, scoring.mean(axis=1), marker='.', label='RandomForest')\n",
    "pylab.plot(n_trees, xgb_scoring.mean(axis=1), marker='.', label='XGBoost')\n",
    "pylab.grid(True)\n",
    "pylab.xlabel('n_trees')\n",
    "pylab.ylabel('score')\n",
    "pylab.title('Accuracy score')\n",
    "pylab.legend(loc='lower right')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### **If you are interested in xgboost:**\n",
    "Python API: http://xgboost.readthedocs.org/en/latest/python/python_api.html\n",
    "\n",
    "Installation: http://xgboost.readthedocs.io/en/latest/build.html"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}