Untitled

a guest
Jul 17th, 2018
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# <font color='black'>Machine Learning for Game Balancing</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='black'>Step 1: Importing Libraries and Reading Data</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first step is to import pandas so we can load the Excel file into a DataFrame. Make sure you have pandas, xlrd, and scikit-learn installed! (xlrd 2.0 and later no longer reads `.xlsx` files, so with pandas 1.2 or later install openpyxl instead.)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"match_data = pd.read_excel('match_data.xlsx')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Player A Class</th>\n",
" <th>Player A Style</th>\n",
" <th>Player B Class</th>\n",
" <th>Player B Style</th>\n",
" <th>Player A Win</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Warrior</td>\n",
" <td>Neutral</td>\n",
" <td>Rouge</td>\n",
" <td>Neutral</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Warrior</td>\n",
" <td>Offensive</td>\n",
" <td>Rouge</td>\n",
" <td>Offensive</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Warrior</td>\n",
" <td>Defensive</td>\n",
" <td>Rouge</td>\n",
" <td>Defensive</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Warrior</td>\n",
" <td>Neutral</td>\n",
" <td>Rouge</td>\n",
" <td>Neutral</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Warrior</td>\n",
" <td>Offensive</td>\n",
" <td>Rouge</td>\n",
" <td>Offensive</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Player A Class Player A Style Player B Class Player B Style Player A Win\n",
"0 Warrior Neutral Rouge Neutral 0\n",
"1 Warrior Offensive Rouge Offensive 1\n",
"2 Warrior Defensive Rouge Defensive 1\n",
"3 Warrior Neutral Rouge Neutral 0\n",
"4 Warrior Offensive Rouge Offensive 1"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"match_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='black'>Step 2: Creating Dummy Variables and Data Concatenation</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
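"The next step is to create dummy (one-hot encoded) variables for the features we want to feed into our model. For this example, we need dummy variables for every categorical feature in our data set: each player's class and each player's style. We start with the class of each player (A and B) and concatenate those dummies into the DataFrame class_data. Because of `drop_first=True`, one category per column is dropped (here, Mage), so a Mage shows up as 0 in both of its player's class columns."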
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Rouge</th>\n",
" <th>Warrior</th>\n",
" <th>Rouge</th>\n",
" <th>Warrior</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Rouge Warrior Rouge Warrior\n",
"0 0 1 1 0\n",
"1 0 1 1 0\n",
"2 0 1 1 0\n",
"3 0 1 1 0\n",
"4 0 1 1 0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# One-hot encode each categorical column; drop_first=True drops the first (alphabetical) category\n",
"class_a_dummies = pd.get_dummies(match_data['Player A Class'], drop_first=True)\n",
"class_b_dummies = pd.get_dummies(match_data['Player B Class'], drop_first=True)\n",
"# The style dummies are not used by this first, class-only model\n",
"style_a_dummies = pd.get_dummies(match_data['Player A Style'], drop_first=True)\n",
"style_b_dummies = pd.get_dummies(match_data['Player B Style'], drop_first=True)\n",
"# Player A's class columns followed by Player B's (the column names repeat)\n",
"class_data = pd.concat([class_a_dummies, class_b_dummies], axis=1)\n",
"class_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='black'>Step 3: Train Test Splits</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we split our data into training and testing sets with train_test_split, which we import from sklearn. A common choice is to use 70% of the data to train the model and the remaining 30% to test it."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X = class_data # The matchup as an input for our model\n",
"y = match_data['Player A Win'] # The outcome of each matchup\n",
"# 70/30 split; without random_state, the split (and the scores below) change on every run\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3) # Creation of actual train / testing data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='black'>Step 4: Model Selection</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As explained in the post, we will be using a support vector machine classifier (SVC), which we import from sklearn. We create an instance of the model and then train it on our training data. For more information, read the support vector machine section in step 4 of the post."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n",
" decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',\n",
" max_iter=-1, probability=False, random_state=None, shrinking=True,\n",
" tol=0.001, verbose=False)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.svm import SVC\n",
"svc = SVC() # create an object of the model\n",
"svc.fit(X_train, y_train) # train the model on our training data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='black'>Step 5: Predicting Match-up Outcomes</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using our trained model, we now predict the outcomes of the test matchups and measure the model's performance with a classification report. We import classification_report from sklearn and store the predictions in the variable pred."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.66 0.66 0.66 121\n",
" 1 0.65 0.65 0.65 117\n",
"\n",
"avg / total 0.66 0.66 0.66 238\n",
"\n"
]
}
],
"source": [
"from sklearn.metrics import classification_report\n",
"pred = svc.predict(X_test) # test the model on our testing data\n",
"print(classification_report(y_test, pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We should see approximately 65% accuracy (the weighted-average recall in the `avg / total` row equals accuracy). Now, what can we infer from this? The two outcomes are nearly balanced in the test set (121 losses vs. 117 wins for Player A), so always guessing the more common outcome would score only about 51%. The model does noticeably better than that, so is our game unbalanced, or is it balanced? Because the model predicts outcomes with decent accuracy, we know there is a pattern in our data. One of two things could be happening:\n",
"\n",
"1. The classes are balanced to a satisfactory degree, but certain matchups are favored over others.\n",
"2. A certain class is underpowered or overpowered, which makes outcomes more predictable.\n",
"\n",
"Let's dive a bit deeper to see if we can determine which one is true."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='black'>Step 6: Deeper Analysis on Match-ups</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we know there is some predictability in the matchups, let's see whether particular classes show more of it. Let's take the Warrior and Mage matches (with the filter below, every match in which Player A is a Warrior or a Mage) and build a model to predict the outcome of these matches specifically."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Keeps every match in which Player A is a Warrior or a Mage, whatever Player B's class\n",
"warrior_mage = match_data[(match_data['Player A Class'] == 'Warrior') | (match_data['Player A Class'] == 'Mage')]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Player A Class</th>\n",
" <th>Player A Style</th>\n",
" <th>Player B Class</th>\n",
" <th>Player B Style</th>\n",
" <th>Player A Win</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Warrior</td>\n",
" <td>Neutral</td>\n",
" <td>Rouge</td>\n",
" <td>Neutral</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Warrior</td>\n",
" <td>Offensive</td>\n",
" <td>Rouge</td>\n",
" <td>Offensive</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Warrior</td>\n",
" <td>Defensive</td>\n",
" <td>Rouge</td>\n",
" <td>Defensive</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Warrior</td>\n",
" <td>Neutral</td>\n",
" <td>Rouge</td>\n",
" <td>Neutral</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Warrior</td>\n",
" <td>Offensive</td>\n",
" <td>Rouge</td>\n",
" <td>Offensive</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Player A Class Player A Style Player B Class Player B Style Player A Win\n",
"0 Warrior Neutral Rouge Neutral 0\n",
"1 Warrior Offensive Rouge Offensive 1\n",
"2 Warrior Defensive Rouge Defensive 1\n",
"3 Warrior Neutral Rouge Neutral 0\n",
"4 Warrior Offensive Rouge Offensive 1"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"warrior_mage.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.59 0.64 0.61 78\n",
" 1 0.62 0.57 0.59 81\n",
"\n",
"avg / total 0.61 0.60 0.60 159\n",
"\n"
]
}
],
"source": [
"# Same class-only pipeline as Steps 2 to 5, applied to the warrior_mage subset\n",
"class_a_dummies = pd.get_dummies(warrior_mage['Player A Class'], drop_first=True)\n",
"class_b_dummies = pd.get_dummies(warrior_mage['Player B Class'], drop_first=True)\n",
"\n",
"aggregate_data = pd.concat([class_a_dummies, class_b_dummies], axis=1)\n",
"\n",
"X = aggregate_data\n",
"y = warrior_mage['Player A Win']\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)\n",
"\n",
"svc = SVC()\n",
"svc.fit(X_train, y_train)\n",
"pred = svc.predict(X_test)\n",
"print(classification_report(y_test, pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By repeating the steps above on the Warrior and Mage matches, we get roughly the same result (about 60% accuracy): there is a degree of predictability within each matchup, but we can't yet tell what causes it."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 0.82 0.90 76\n",
" 1 0.86 1.00 0.92 83\n",
"\n",
"avg / total 0.92 0.91 0.91 159\n",
"\n"
]
}
],
"source": [
"# Add each player's style dummies to the class dummies from the previous cell\n",
"style_a_dummies = pd.get_dummies(warrior_mage['Player A Style'], drop_first=True)\n",
"style_b_dummies = pd.get_dummies(warrior_mage['Player B Style'], drop_first=True)\n",
"aggregate_data = pd.concat([class_a_dummies, class_b_dummies, style_a_dummies, style_b_dummies], axis=1)\n",
"\n",
"X = aggregate_data\n",
"y = warrior_mage['Player A Win']\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)\n",
"\n",
"svc = SVC()\n",
"svc.fit(X_train, y_train)\n",
"pred = svc.predict(X_test)\n",
"print(classification_report(y_test, pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Repeating the steps above on the Warrior and Mage matches, but including each player's style of play as an input, raises the accuracy to about 91%! Adding style greatly increases the predictability of each matchup, so we can infer that the predictability in these matchups is driven largely by the style of play used with each class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='black'>Conclusion</font> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is, in my mind at least, an interesting way to look at the accuracy of model predictions in machine learning. We don't necessarily want to engineer features to squeeze more accuracy out of the model; instead, we want to understand why the model produces the outputs it does. By breaking down the data by class, and then by class and play style, we can observe that certain styles have an advantage over others. This is where balancing comes into play: we can treat the model's accuracy as a quantitative measure of balance.\n",
"\n",
"Allow me to present this with another example. Suppose we observe that Mages tend to win against Warriors in most styles of play, but that Warriors can sometimes beat Mages, regardless of the Mage's style, when the Warriors play offensively. The developer can then buff the Warriors' offensive capabilities and measure the overall effect on the game by comparing the model's accuracy before and after the balance change. Furthermore, the developer can track the overall state of the game by using model accuracy as a measure of game balance.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
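In Step 2 of the notebook, `class_a_dummies` and `class_b_dummies` come out with identical column names (`Rouge` and `Warrior` twice), so in `class_data` Player A's class and Player B's class can only be told apart by position. A minimal sketch of one way to keep the names distinct, assuming a few hypothetical rows in the same shape as `match_data.xlsx`: passing `columns=` to `pd.get_dummies` prefixes each dummy with the name of the column it came from.

```python
import pandas as pd

# Hypothetical rows shaped like match_data.xlsx ('Rouge' is spelled as in that file)
match_data = pd.DataFrame({
    'Player A Class': ['Warrior', 'Warrior', 'Mage', 'Rouge'],
    'Player A Style': ['Neutral', 'Offensive', 'Defensive', 'Neutral'],
    'Player B Class': ['Rouge', 'Mage', 'Warrior', 'Mage'],
    'Player B Style': ['Neutral', 'Offensive', 'Defensive', 'Offensive'],
    'Player A Win': [0, 1, 1, 0],
})

feature_cols = ['Player A Class', 'Player B Class', 'Player A Style', 'Player B Style']

# With columns=, each dummy is named '<source column>_<category>', so
# Player A's and Player B's dummies no longer share names.
# drop_first=True still drops the alphabetically first category of each column.
encoded = pd.get_dummies(match_data[feature_cols], columns=feature_cols,
                         drop_first=True, dtype=int)

print(list(encoded.columns))
```

Here Mage is the dropped category for both class columns and Defensive for both style columns, so a Mage or a Defensive player shows up as zeros in the corresponding pair of columns. The same frame could stand in for `class_data`, or for `aggregate_data` in Step 6.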
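The split in Step 3 sets no `random_state`, so each run draws a different 30% test set and the scores in the classification reports shift from run to run. A sketch, with hypothetical random 0/1 features standing in for `class_data`, of fixing the seed and passing `stratify=y` so that the training and testing sets keep the same win/loss ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for class_data (X) and match_data['Player A Win'] (y)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 4))
y = np.array([0, 1] * 50)

# random_state fixes the shuffle; stratify=y keeps the 50/50 outcome ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(len(X_train), len(X_test), y_test.mean())
```

Calling `train_test_split` again with the same `random_state` returns the same split, which keeps run-to-run comparisons (such as the before-and-after comparison suggested in the Conclusion) from being muddied by a different test set each time.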
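Step 5 reads the roughly 65% score against a 50% coin flip. A sharper yardstick is a model that always predicts the more common outcome, which scikit-learn provides as `DummyClassifier`. A sketch on hypothetical synthetic matchups in which, by construction, Player A wins exactly when the first of four made-up 0/1 features is 1. It also checks that accuracy equals the weighted-average recall, the `avg / total` recall printed by the older `classification_report` above (recent scikit-learn versions print an `accuracy` row directly):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical matchups: four made-up 0/1 features; Player A wins exactly when feature 0 is 1
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 4))
y = X[:, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Baseline: always predict the outcome that is most frequent in the training set
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

svc = SVC().fit(X_train, y_train)
pred = svc.predict(X_test)
svc_acc = accuracy_score(y_test, pred)

# Recall averaged with weights equal to each class's support is the same number as accuracy
weighted_recall = recall_score(y_test, pred, average='weighted')

print(f'baseline: {baseline_acc:.2f}  svc: {svc_acc:.2f}  weighted recall: {weighted_recall:.2f}')
```

On the real data, the gap between the SVC and this baseline, rather than the raw score alone, is what indicates how predictable the matchups are.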
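The Step 6 filter only looks at `Player A Class`, which is why `warrior_mage.head()` still lists Warrior-versus-Rouge matches. If the goal is Warrior-versus-Mage games specifically, here is a sketch of a head-to-head filter on hypothetical rows, which requires one Warrior and one Mage and accepts either seating order:

```python
import pandas as pd

# Hypothetical rows shaped like match_data.xlsx ('Rouge' is spelled as in that file)
match_data = pd.DataFrame({
    'Player A Class': ['Warrior', 'Warrior', 'Mage', 'Mage', 'Rouge'],
    'Player B Class': ['Rouge', 'Mage', 'Warrior', 'Mage', 'Warrior'],
    'Player A Win': [0, 1, 0, 1, 1],
})

a = match_data['Player A Class']
b = match_data['Player B Class']

# The notebook's filter: Player A is a Warrior or a Mage, against any opponent
player_a_only = match_data[a.isin(['Warrior', 'Mage'])]

# Head-to-head filter: one Warrior and one Mage, in either seat
head_to_head = match_data[((a == 'Warrior') & (b == 'Mage')) |
                          ((a == 'Mage') & (b == 'Warrior'))]

print(len(player_a_only), len(head_to_head))
```

With only head-to-head games left, Player B's class is fully determined by Player A's, so a single class dummy carries all the class information. The recorded scores in Step 6 come from the broader Player-A-only subset, so this filter would produce different numbers.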
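Step 6 compares a class-only model with a class-and-style model on a single random split of about 160 test rows. A sketch of the same comparison with shuffled 5-fold cross-validation, on hypothetical synthetic matches in which only the styles decide the winner under a made-up rock-paper-scissors rule (each style beats exactly one other style, and mirror matches count as losses for Player A):

```python
import itertools

import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

classes = ['Warrior', 'Mage']
styles = ['Defensive', 'Neutral', 'Offensive']
# Hypothetical rule: Player A wins only when its style beats Player B's style
beats = {'Offensive': 'Neutral', 'Neutral': 'Defensive', 'Defensive': 'Offensive'}

rows = []
for a_cls, b_cls, a_sty, b_sty in itertools.product(classes, classes, styles, styles):
    for _ in range(10):  # ten copies of every class/style combination
        rows.append({'Player A Class': a_cls, 'Player B Class': b_cls,
                     'Player A Style': a_sty, 'Player B Style': b_sty,
                     'Player A Win': int(beats[a_sty] == b_sty)})
matches = pd.DataFrame(rows)
y = matches['Player A Win']

# Shuffled, stratified folds so every fold sees a mix of all combinations
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def cv_accuracy(cols):
    """Mean cross-validated SVC accuracy using dummy variables of the given columns."""
    X = pd.get_dummies(matches[cols], columns=cols, drop_first=True, dtype=int)
    return cross_val_score(SVC(), X, y, cv=folds).mean()


class_only = cv_accuracy(['Player A Class', 'Player B Class'])
class_and_style = cv_accuracy(['Player A Class', 'Player B Class',
                               'Player A Style', 'Player B Style'])
print(f'class only: {class_only:.2f}  class + style: {class_and_style:.2f}')
```

Because the class columns carry no signal here by construction, the class-only score should stay near the share of losses (about 67%), while adding style lets the SVC recover the rule. On real match data, the size of this gap is one way to quantify how much of the predictability comes from style rather than class.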
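The Conclusion's example (Warriors beating Mages only when they play offensively) is ultimately a claim about win rates for particular class and style pairings. Alongside model accuracy, here is a sketch of tabulating Player A's win rate and number of games for every pairing, assuming hypothetical rows in the same shape as `match_data.xlsx`:

```python
import pandas as pd

# Hypothetical results; in practice this would be match_data (or warrior_mage)
match_data = pd.DataFrame({
    'Player A Class': ['Warrior', 'Warrior', 'Warrior', 'Mage', 'Mage', 'Mage'],
    'Player A Style': ['Offensive', 'Offensive', 'Defensive', 'Neutral', 'Neutral', 'Defensive'],
    'Player B Class': ['Mage', 'Mage', 'Mage', 'Warrior', 'Warrior', 'Warrior'],
    'Player B Style': ['Neutral', 'Neutral', 'Neutral', 'Defensive', 'Defensive', 'Offensive'],
    'Player A Win': [1, 1, 0, 1, 1, 0],
})

keys = ['Player A Class', 'Player A Style', 'Player B Class', 'Player B Style']

# The mean of a 0/1 column is a win rate; the count shows how many games back it up
win_rates = (match_data.groupby(keys)['Player A Win']
             .agg(win_rate='mean', games='count')
             .reset_index()
             .sort_values('win_rate', ascending=False))

print(win_rates.to_string(index=False))
```

A pairing whose win rate stays far from 0.5 over many games is a candidate for a buff or nerf, and rerunning the table on matches played after a balance change shows whether that pairing moved toward 0.5. The `games` column is there because a win rate over one or two games says very little.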