{
"worksheets": [
{
"cells": [
{
"metadata": {},
"cell_type": "heading",
"source": "Intro:",
"level": 1
},
{
"metadata": {},
"cell_type": "markdown",
  14. "source": "####Purpose: This notebook provides scaffolding to support the creation of a data model or ontology from the contents of IPython notebooks.\n\nThe best data models or ontologies of an organizing system are going to be those designed by the developers of said system. This IPython notebook is designed to interactively support the creation of a data model or ontology by scanning notebooks for concepts that that it may be instructive to include in a model. This notebook reads the example IPython notebooks in BigBang and processes their underlying JSON structure, computing some key statistics about each notebook and processing candidates for description. Candidates for description are then compared with those listed in a CSV file, if they are not listed on the CSV file (perhaps because they are additions to the notebooks), the user of this notebook will be prompted to flag them to add to a description file, or pass over them. Then, candidates flagged for addition are presented to the user for a description. The description is then saved to another CSV file. \n\nThis creates the groundwork for a maintenance notebook that can assist in ensuring notebooks are well-commented, adding to the afforded understandability and accessibility of the notebooks, and well-documented at the more abstract level of a model. \n\nThis may serve as a forcing function to push function definitions out of the IPython notebooks and into the library code. It may identify not only those key aspects of the BigBang ecosystem that deserve descriptions, but also identify parameters that may be easily adjusted by users experimenting with the capabilities of BigBang. \n\nThis notebook also very naively extracts tokens used in the markdown cells of the notebooks. This could be expanded to help maintain a controlled vocabulary throughout the notebooks, in addition to controls over function and variable names in the code. This can serve as a check not just for controlled vocabulary, but also to gauge whether interaction and experimentation is explicitly discussed in the markup of various notebooks. Extended appropriately, this could support editing language used throughout the notebooks or support the addition of comments in-line.\n\n###Interactions:\n- ####1: [Run audit_notebook to compute statistics on each notebook in BigBang examples](#1:) \n- ####2: [Extract key aspects of a specified notebook](#2:) \n- ####3: [Interactively approve or reject candidates for description](#3:) \n- ####4: [Interactively provide descriptions of candidates](#4:) "
},
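{
"metadata": {},
"cell_type": "markdown",
"source": "A minimal sketch of the nbformat v3 layout that audit_notebook (defined below) assumes: cells live under `worksheets[0]['cells']`, code cells keep their text under `input`, and markdown cells keep theirs under `source`. The fragment in the next cell is hypothetical, not one of the BigBang notebooks."
},
{
"metadata": {},
"cell_type": "code",
"input": "# A minimal, hypothetical nbformat v3 fragment illustrating the layout this notebook walks:\nexample_nb = {u'worksheets': [{'cells': [\n    {'cell_type': 'code', 'input': 'x = 1', 'language': 'python'},\n    {'cell_type': 'markdown', 'source': 'A markdown cell describing x.'}\n]}]}\n\n# The same traversal audit_notebook uses:\nfor cell in example_nb[u'worksheets'][0]['cells']:\n    print cell['cell_type']",
"outputs": [],
"language": "python",
"trusted": true,
"collapsed": false
},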
{
"metadata": {},
"cell_type": "code",
  19. "input": "# To Run:\n# Load this notebook to the bigbang examples folder. Run from bigbang as home directory.\n\n# This notebook can be tested outside of the BigBang framework, by situating a notebook \n# to be audited in an examples folder or by changing the filepath in the last line of this cell\n\nimport csv\nfrom __future__ import division\nfrom IPython.nbformat import current as nbformat\nfrom IPython.nbconvert import PythonExporter\nfrom os import listdir\n\n# Loads the names of all IPython notebooks in the examples directory:\nnotebooks = [f for f in listdir('../examples/') if f.endswith('ipynb')]",
  20. "prompt_number": 1,
  21. "outputs": [],
  22. "language": "python",
  23. "trusted": true,
  24. "collapsed": false
  25. },
  26. {
  27. "metadata": {},
  28. "cell_type": "markdown",
  29. "source": "####The audit_notebook function extracts key statistics and lines with variables defined, numerical parameters, and functions when defined."
},
{
"metadata": {},
"cell_type": "code",
  34. "input": "def audit_notebook(notebook, full = False):\n # Create path for notebook:\n filepath = '../examples/'+notebook\n \n # Open notebook and read as JSON\n with open(filepath) as fh:\n nb = nbformat.reads_json(fh.read())\n \n \n # Extract information from code cells:\n code_cells = []\n for each in nb[u'worksheets'][0]['cells']:\n if each['cell_type'] == 'code':\n code_cells.append(each['input'].encode('UTF-8'))\n lines = []\n for cell in code_cells:\n lines.extend(cell.splitlines())\n token_list = []\n import_list = []\n function_list = []\n var_list = []\n numerical_parameters = []\n commented_line = 0\n stripped_lines = [line.strip() for line in lines]\n for line in stripped_lines:\n comment_flag = False\n\n tokens = line.split()\n if len(tokens) > 0:\n \n # Find all imports\n if tokens[0] == 'import' or tokens[0] == 'from':\n import_list.append(' '.join(tokens))\n \n # Find all functions\n if tokens[0] == 'def':\n function_list.append(' '.join(tokens))\n \n else:\n if len(tokens) > 1:\n for t in tokens:\n if t[0] == '#':\n commented_line += 1\n comment_flag = True\n break\n if not comment_flag: \n # Find defined variables\n if tokens[1] == '=':\n var_list.append(tokens[0])\n if not comment_flag:\n for t in tokens:\n\n # Find numerical parameters:\n for n, char in enumerate(t):\n if char.isdigit():\n if len(t) > 1 and not t[n-1].isalpha():\n numerical_parameters.append(line)\n else:\n token_list.append(t)\n\n # Extract markdown source and identify individual tokens \n markdown_cells = []\n for each in nb[u'worksheets'][0]['cells']:\n if each['cell_type'] == 'markdown':\n markdown_cells.append(each['source'].encode('UTF-8'))\n markdown_tokens = []\n for markdown in markdown_cells:\n markdown_tokens.extend(markdown.split())\n \n # Determine total lines of code:\n lines_of_code = len([l for l in lines if len(l) > 0])\n\n # Create printout\n print 'Notebook: %s' % notebook\n print '\\t%s imports' % len(import_list)\n print '\\t%s function(s)' % len(function_list)\n print '\\t%s lines of code' % lines_of_code\n print '\\t%s variables defined' % len(var_list)\n print '\\t%s numerical parameters determined' % len(numerical_parameters)\n print '\\t%s%% of lines are commented' % round(commented_line/lines_of_code*100,2)\n print '\\t%s words in markdown comments per line' % round(len(markdown_tokens)/lines_of_code,2)\n print\n \n if full:\n print 'Imports:'\n for import_ in import_list:\n print '\\t'+import_\n print\n if len(function_list) > 0:\n print 'Functions:'\n for function in function_list:\n print '\\t'+function[4:-1]\n print\n print 'Defined Variables:'\n for var in var_list:\n print '\\t'+var\n print\n print 'Numerical Parameters:'\n for numpar in numerical_parameters:\n print '\\t'+numpar\n return {'var':var_list, 'function':function_list, 'numpar':numerical_parameters}",
  35. "prompt_number": 2,
  36. "outputs": [],
  37. "language": "python",
  38. "trusted": true,
  39. "collapsed": false
},
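{
"metadata": {},
"cell_type": "markdown",
"source": "A minimal sketch, using hypothetical sample lines, of the first-token heuristics audit_notebook applies above: `import`/`from` marks an import, `def` marks a function definition, and a second token of `=` marks a variable definition; remaining uncommented lines are scanned for digits not preceded by a letter, which flag numerical parameters."
},
{
"metadata": {},
"cell_type": "code",
"input": "# Hypothetical sample lines, classified with the same first-token heuristics as audit_notebook:\nsample_lines = ['import math', 'def f(x):', 'rate = 0.5', 'plt.figure(figsize=(15, 12))']\nfor line in sample_lines:\n    tokens = line.split()\n    if tokens[0] in ('import', 'from'):\n        kind = 'import'\n    elif tokens[0] == 'def':\n        kind = 'function definition'\n    elif len(tokens) > 1 and tokens[1] == '=':\n        kind = 'variable definition'\n    else:\n        kind = 'other (scanned for numerical parameters)'\n    print '%-30s -> %s' % (line, kind)",
"outputs": [],
"language": "python",
"trusted": true,
"collapsed": false
},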
{
"metadata": {},
"cell_type": "heading",
"source": "1:",
"level": 1
},
{
"metadata": {},
"cell_type": "markdown",
"source": "##Run this to see key statistics describing each notebook in BigBang examples:\n[return to Intro](#Intro:)"
},
{
"metadata": {},
"cell_type": "code",
  55. "input": "for notebook in notebooks:\n audit_notebook(notebook)",
  56. "prompt_number": 3,
  57. "outputs": [
  58. {
  59. "output_type": "stream",
  60. "text": "Notebook: Analyze Senders.ipynb\n\t12 imports\n\t0 function(s)\n\t66 lines of code\n\t13 variables defined\n\t49 numerical parameters determined\n\t13.64% of lines are commented\n\t6.68 words in markdown comments per line\n\nNotebook: Auditing Fernando.ipynb\n\t4 imports\n\t0 function(s)\n\t23 lines of code\n\t10 variables defined\n\t15 numerical parameters determined\n\t0.0% of lines are commented\n\t9.26 words in markdown comments per line\n\nNotebook: Cohort Visualization.ipynb\n\t5 imports\n\t0 function(s)\n\t38 lines of code\n\t12 variables defined\n\t34 numerical parameters determined\n\t23.68% of lines are commented\n\t8.08 words in markdown comments per line\n\nNotebook: Corr between centrality and community 0.1.ipynb\n\t13 imports\n\t1 function(s)\n\t95 lines of code\n\t27 variables defined\n\t28 numerical parameters determined\n\t9.47% of lines are commented\n\t0.85 words in markdown comments per line\n\nNotebook: Git Collection.ipynb\n\t6 imports\n\t1 function(s)\n\t59 lines of code\n\t34 variables defined\n\t48 numerical parameters determined\n\t3.39% of lines are commented\n\t6.34 words in markdown comments per line\n\nNotebook: model_development.ipynb\n\t5 imports\n\t1 function(s)\n\t187 lines of code\n\t39 variables defined\n\t36 numerical parameters determined\n\t13.37% of lines are commented\n\t2.64 words in markdown comments per line\n\nNotebook: Plot Activity.ipynb\n\t14 imports\n\t0 function(s)\n\t81 lines of code\n\t15 variables defined\n\t53 numerical parameters determined\n\t20.99% of lines are commented\n\t5.02 words in markdown comments per line\n\nNotebook: Show Interaction Graph.ipynb\n\t10 imports\n\t2 function(s)\n\t59 lines of code\n\t15 variables defined\n\t32 numerical parameters determined\n\t8.47% of lines are commented\n\t2.8 words in markdown comments per line\n\nNotebook: Threads-research-in-progress.ipynb\n\t6 imports\n\t0 function(s)\n\t82 lines of code\n\t20 variables defined\n\t29 numerical parameters determined\n\t1.22% of lines are commented\n\t1.63 words in markdown comments per line\n\nNotebook: Threads.ipynb\n\t5 imports\n\t0 function(s)\n\t38 lines of code\n\t9 variables defined\n\t12 numerical parameters determined\n\t2.63% of lines are commented\n\t4.34 words in markdown comments per line\n\n",
  61. "stream": "stdout"
  62. }
  63. ],
  64. "language": "python",
  65. "trusted": true,
  66. "collapsed": false
  67. },
  68. {
  69. "metadata": {},
  70. "cell_type": "heading",
  71. "source": "2:",
  72. "level": 1
  73. },
  74. {
  75. "metadata": {},
  76. "cell_type": "markdown",
  77. "source": "##Run this to see extract key aspects of a specific notebook:\n[return to Intro](#Intro:)"
},
{
"metadata": {},
"cell_type": "code",
"input": "# Specify any of the notebooks listed in the output of this cell as the notebook_name in the cell below:\nprint notebooks",
"prompt_number": 5,
"outputs": [
{
"output_type": "stream",
"text": "['Analyze Senders.ipynb', 'Auditing Fernando.ipynb', 'Cohort Visualization.ipynb', 'Corr between centrality and community 0.1.ipynb', 'Git Collection.ipynb', 'model_development.ipynb', 'Plot Activity.ipynb', 'Show Interaction Graph.ipynb', 'Threads-research-in-progress.ipynb', 'Threads.ipynb']\n",
"stream": "stdout"
}
],
"language": "python",
"trusted": true,
"collapsed": false
},
{
"metadata": {},
"cell_type": "code",
  98. "input": "# If you want to extract the imports, defined variables, or lines of numerical parameters\n# change the notebook name below:\nnotebook_name = \"Analyze Senders.ipynb\"\nmd_dict = audit_notebook(notebook_name, full = True)",
  99. "prompt_number": 6,
  100. "outputs": [
  101. {
  102. "output_type": "stream",
  103. "text": "Notebook: Analyze Senders.ipynb\n\t12 imports\n\t0 function(s)\n\t66 lines of code\n\t13 variables defined\n\t49 numerical parameters determined\n\t13.64% of lines are commented\n\t6.68 words in markdown comments per line\n\nImports:\n\timport bigbang.mailman as mailman\n\timport bigbang.graph as graph\n\timport bigbang.process as process\n\tfrom bigbang.parse import get_date\n\timport pandas as pd\n\timport datetime\n\timport matplotlib.pyplot as plt\n\timport numpy as np\n\timport math\n\timport pytz\n\timport pickle\n\timport os\n\n\nDefined Variables:\n\turls\n\tmlists\n\tactivities\n\tfig\n\tconsolidates\n\tc\n\tlev_c\n\tlevc_corner\n\tfig\n\ttc\n\tgrouped\n\tdomain_groups\n\tdomain_messages_sum\n\nNumerical Parameters:\n\tta[-10:].plot(kind='barh', width=1)\n\tta[-10:].plot(kind='barh', width=1)\n\tta[-10:].plot(kind='barh', width=1)\n\tfig = plt.figure(figsize=(15, 12))\n\tfig = plt.figure(figsize=(15, 12))\n\tfig = plt.figure(figsize=(15, 12))\n\tfig = plt.figure(figsize=(15, 12))\n\tplt.yticks(np.arange(0.5, len(levdf_corner.index), 1), levdf_corner.index)\n\tplt.yticks(np.arange(0.5, len(levdf_corner.index), 1), levdf_corner.index)\n\tplt.yticks(np.arange(0.5, len(levdf_corner.index), 1), levdf_corner.index)\n\tplt.xticks(np.arange(0.5, len(levdf_corner.columns), 1), levdf_corner.columns, rotation='vertical')\n\tplt.xticks(np.arange(0.5, len(levdf_corner.columns), 1), levdf_corner.columns, rotation='vertical')\n\tplt.xticks(np.arange(0.5, len(levdf_corner.columns), 1), levdf_corner.columns, rotation='vertical')\n\tfor index, value in levdf.loc[levdf[col] < 10, col].iteritems():\n\tfor index, value in levdf.loc[levdf[col] < 10, col].iteritems():\n\tlevc_corner = lev_c.iloc[:25,:25]\n\tlevc_corner = lev_c.iloc[:25,:25]\n\tlevc_corner = lev_c.iloc[:25,:25]\n\tlevc_corner = lev_c.iloc[:25,:25]\n\tfig = plt.figure(figsize=(15, 12))\n\tfig = plt.figure(figsize=(15, 12))\n\tfig = plt.figure(figsize=(15, 12))\n\tfig = plt.figure(figsize=(15, 12))\n\tplt.yticks(np.arange(0.5, len(levc_corner.index), 1), levc_corner.index)\n\tplt.yticks(np.arange(0.5, len(levc_corner.index), 1), levc_corner.index)\n\tplt.yticks(np.arange(0.5, len(levc_corner.index), 1), levc_corner.index)\n\tplt.xticks(np.arange(0.5, len(levc_corner.columns), 1), levc_corner.columns, rotation='vertical')\n\tplt.xticks(np.arange(0.5, len(levc_corner.columns), 1), levc_corner.columns, rotation='vertical')\n\tplt.xticks(np.arange(0.5, len(levc_corner.columns), 1), levc_corner.columns, rotation='vertical')\n\tfig, axes = plt.subplots(nrows=2, figsize=(15, 12))\n\tfig, axes = plt.subplots(nrows=2, figsize=(15, 12))\n\tfig, axes = plt.subplots(nrows=2, figsize=(15, 12))\n\tfig, axes = plt.subplots(nrows=2, figsize=(15, 12))\n\tfig, axes = plt.subplots(nrows=2, figsize=(15, 12))\n\tta[-20:].plot(kind='barh',ax=axes[0], width=1, title='Before consolidation')\n\tta[-20:].plot(kind='barh',ax=axes[0], width=1, title='Before consolidation')\n\tta[-20:].plot(kind='barh',ax=axes[0], width=1, title='Before consolidation')\n\tta[-20:].plot(kind='barh',ax=axes[0], width=1, title='Before consolidation')\n\ttc = c.sum(0)\n\ttc[-20:].plot(kind='barh',ax=axes[1], width=1, title='After consolidation')\n\ttc[-20:].plot(kind='barh',ax=axes[1], width=1, title='After consolidation')\n\ttc[-20:].plot(kind='barh',ax=axes[1], width=1, title='After consolidation')\n\ttc[-20:].plot(kind='barh',ax=axes[1], width=1, title='After consolidation')\n\tdomain_groups[-20:].plot(kind='barh', width=1, title=\"Number of participants at 
domain\")\n\tdomain_groups[-20:].plot(kind='barh', width=1, title=\"Number of participants at domain\")\n\tdomain_groups[-20:].plot(kind='barh', width=1, title=\"Number of participants at domain\")\n\tdomain_messages_sum[-20:].plot(kind='barh', width=1, title=\"Number of messages from domain\")\n\tdomain_messages_sum[-20:].plot(kind='barh', width=1, title=\"Number of messages from domain\")\n\tdomain_messages_sum[-20:].plot(kind='barh', width=1, title=\"Number of messages from domain\")\n",
  104. "stream": "stdout"
  105. }
  106. ],
  107. "language": "python",
  108. "trusted": true,
  109. "collapsed": false
  110. },
  111. {
  112. "metadata": {},
  113. "cell_type": "heading",
  114. "source": "3:",
  115. "level": 1
  116. },
  117. {
  118. "metadata": {},
  119. "cell_type": "markdown",
  120. "source": "##Interactively approve or reject candidates from notebook specified in part 2 for description\nAdd or pass in the raw_input field that appears below the cell:\n\n[return to Intro](#Intro:)"
},
{
"metadata": {},
"cell_type": "code",
  125. "input": "try:\n with open('model_development.csv', mode='r') as fin:\n reader = csv.reader(fin)\n in_md_dict = {rows[0]:rows[1] for rows in reader}\nexcept:\n print \"model_development.csv must be created...\"\n print\n in_md_dict = {}\n \n# To add item to model development queue enter 'a' to pass enter 'p':\n# The printout will provide candidate and then a tuple pair of the candidates status, which should be blank and the type\n# Example: \"Currently: length ('', var )\"\n# Types: var = defined variable, numpar = numerical parameter\n\nout_md_dict = in_md_dict\nfor key, value in md_dict.items():\n for val in value:\n if val in in_md_dict.keys():\n try: \n a, b = in_md_dict[val].split()\n status, typ = a[2], b[1:-2]\n except Exception as e: \n #print e\n #print val\n status, typ = in_md_dict[val]\n if status == \"''\":\n print 'Currently:', val,in_md_dict[val] \n status = raw_input(\"Add (a) or Pass (p)\")\n out_md_dict[val] = (status, key)\n print\n else:\n print 'Currently:', val, '(\\'\\',', key, ')' \n status = raw_input(\"Add (a) or Pass (p)\")\n out_md_dict[val] = (status, key)\n print\n \nwith open('model_development.csv', 'wb') as fout:\n writer = csv.writer(fout)\n for key, value in out_md_dict.items():\n writer.writerow([key, value])",
  126. "prompt_number": 7,
  127. "outputs": [
  128. {
  129. "output_type": "stream",
  130. "text": "Currently: mlists ('', var )\n",
  131. "stream": "stdout"
  132. },
  133. {
  134. "output_type": "stream",
  135. "name": "stdout",
  136. "text": "Add (a) or Pass (p)p\n",
  137. "stream": "stdout"
  138. },
  139. {
  140. "output_type": "stream",
  141. "text": "\nCurrently: activities ('', var )\n",
  142. "stream": "stdout"
  143. },
  144. {
  145. "output_type": "stream",
  146. "name": "stdout",
  147. "text": "Add (a) or Pass (p)p\n",
  148. "stream": "stdout"
  149. },
  150. {
  151. "output_type": "stream",
  152. "text": "\nCurrently: fig ('', var )\n",
  153. "stream": "stdout"
  154. },
  155. {
  156. "output_type": "stream",
  157. "name": "stdout",
  158. "text": "Add (a) or Pass (p)p\n",
  159. "stream": "stdout"
  160. },
  161. {
  162. "output_type": "stream",
  163. "text": "\nCurrently: consolidates ('', var )\n",
  164. "stream": "stdout"
  165. },
  166. {
  167. "output_type": "stream",
  168. "name": "stdout",
  169. "text": "Add (a) or Pass (p)a\n",
  170. "stream": "stdout"
  171. },
  172. {
  173. "output_type": "stream",
  174. "text": "\nCurrently: c ('', var )\n",
  175. "stream": "stdout"
  176. },
  177. {
  178. "output_type": "stream",
  179. "name": "stdout",
  180. "text": "Add (a) or Pass (p)a\n",
  181. "stream": "stdout"
  182. },
  183. {
  184. "output_type": "stream",
  185. "text": "\nCurrently: lev_c ('', var )\n",
  186. "stream": "stdout"
  187. },
  188. {
  189. "output_type": "stream",
  190. "name": "stdout",
  191. "text": "Add (a) or Pass (p)a\n",
  192. "stream": "stdout"
  193. },
  194. {
  195. "output_type": "stream",
  196. "text": "\nCurrently: levc_corner ('', var )\n",
  197. "stream": "stdout"
  198. },
  199. {
  200. "output_type": "stream",
  201. "name": "stdout",
  202. "text": "Add (a) or Pass (p)p\n",
  203. "stream": "stdout"
  204. },
  205. {
  206. "output_type": "stream",
  207. "text": "\nCurrently: tc ('', var )\n",
  208. "stream": "stdout"
  209. },
  210. {
  211. "output_type": "stream",
  212. "name": "stdout",
  213. "text": "Add (a) or Pass (p)p\n",
  214. "stream": "stdout"
  215. },
  216. {
  217. "output_type": "stream",
  218. "text": "\nCurrently: grouped ('', var )\n",
  219. "stream": "stdout"
  220. },
  221. {
  222. "output_type": "stream",
  223. "name": "stdout",
  224. "text": "Add (a) or Pass (p)p\n",
  225. "stream": "stdout"
  226. },
  227. {
  228. "output_type": "stream",
  229. "text": "\nCurrently: domain_groups ('', var )\n",
  230. "stream": "stdout"
  231. },
  232. {
  233. "output_type": "stream",
  234. "name": "stdout",
  235. "text": "Add (a) or Pass (p)a\n",
  236. "stream": "stdout"
  237. },
  238. {
  239. "output_type": "stream",
  240. "text": "\nCurrently: domain_messages_sum ('', var )\n",
  241. "stream": "stdout"
  242. },
  243. {
  244. "output_type": "stream",
  245. "name": "stdout",
  246. "text": "Add (a) or Pass (p)p\n",
  247. "stream": "stdout"
  248. },
  249. {
  250. "output_type": "stream",
  251. "text": "\nCurrently: ta[-10:].plot(kind='barh', width=1) ('', numpar )\n",
  252. "stream": "stdout"
  253. },
  254. {
  255. "output_type": "stream",
  256. "name": "stdout",
  257. "text": "Add (a) or Pass (p)p\n",
  258. "stream": "stdout"
  259. },
  260. {
  261. "output_type": "stream",
  262. "text": "\nCurrently: fig = plt.figure(figsize=(15, 12)) ('', numpar )\n",
  263. "stream": "stdout"
  264. },
  265. {
  266. "output_type": "stream",
  267. "name": "stdout",
  268. "text": "Add (a) or Pass (p)p\n",
  269. "stream": "stdout"
  270. },
  271. {
  272. "output_type": "stream",
  273. "text": "\nCurrently: plt.yticks(np.arange(0.5, len(levdf_corner.index), 1), levdf_corner.index) ('', numpar )\n",
  274. "stream": "stdout"
  275. },
  276. {
  277. "output_type": "stream",
  278. "name": "stdout",
  279. "text": "Add (a) or Pass (p)p\n",
  280. "stream": "stdout"
  281. },
  282. {
  283. "output_type": "stream",
  284. "text": "\nCurrently: plt.xticks(np.arange(0.5, len(levdf_corner.columns), 1), levdf_corner.columns, rotation='vertical') ('', numpar )\n",
  285. "stream": "stdout"
  286. },
  287. {
  288. "output_type": "stream",
  289. "name": "stdout",
  290. "text": "Add (a) or Pass (p)p\n",
  291. "stream": "stdout"
  292. },
  293. {
  294. "output_type": "stream",
  295. "text": "\nCurrently: for index, value in levdf.loc[levdf[col] < 10, col].iteritems(): ('', numpar )\n",
  296. "stream": "stdout"
  297. },
  298. {
  299. "output_type": "stream",
  300. "name": "stdout",
  301. "text": "Add (a) or Pass (p)p\n",
  302. "stream": "stdout"
  303. },
  304. {
  305. "output_type": "stream",
  306. "text": "\nCurrently: levc_corner = lev_c.iloc[:25,:25] ('', numpar )\n",
  307. "stream": "stdout"
  308. },
  309. {
  310. "output_type": "stream",
  311. "name": "stdout",
  312. "text": "Add (a) or Pass (p)p\n",
  313. "stream": "stdout"
  314. },
  315. {
  316. "output_type": "stream",
  317. "text": "\nCurrently: plt.yticks(np.arange(0.5, len(levc_corner.index), 1), levc_corner.index) ('', numpar )\n",
  318. "stream": "stdout"
  319. },
  320. {
  321. "output_type": "stream",
  322. "name": "stdout",
  323. "text": "Add (a) or Pass (p)p\n",
  324. "stream": "stdout"
  325. },
  326. {
  327. "output_type": "stream",
  328. "text": "\nCurrently: plt.xticks(np.arange(0.5, len(levc_corner.columns), 1), levc_corner.columns, rotation='vertical') ('', numpar )\n",
  329. "stream": "stdout"
  330. },
  331. {
  332. "output_type": "stream",
  333. "name": "stdout",
  334. "text": "Add (a) or Pass (p)p\n",
  335. "stream": "stdout"
  336. },
  337. {
  338. "output_type": "stream",
  339. "text": "\nCurrently: fig, axes = plt.subplots(nrows=2, figsize=(15, 12)) ('', numpar )\n",
  340. "stream": "stdout"
  341. },
  342. {
  343. "output_type": "stream",
  344. "name": "stdout",
  345. "text": "Add (a) or Pass (p)p\n",
  346. "stream": "stdout"
  347. },
  348. {
  349. "output_type": "stream",
  350. "text": "\nCurrently: ta[-20:].plot(kind='barh',ax=axes[0], width=1, title='Before consolidation') ('', numpar )\n",
  351. "stream": "stdout"
  352. },
  353. {
  354. "output_type": "stream",
  355. "name": "stdout",
  356. "text": "Add (a) or Pass (p)p\n",
  357. "stream": "stdout"
  358. },
  359. {
  360. "output_type": "stream",
  361. "text": "\nCurrently: tc = c.sum(0) ('', numpar )\n",
  362. "stream": "stdout"
  363. },
  364. {
  365. "output_type": "stream",
  366. "name": "stdout",
  367. "text": "Add (a) or Pass (p)p\n",
  368. "stream": "stdout"
  369. },
  370. {
  371. "output_type": "stream",
  372. "text": "\nCurrently: tc[-20:].plot(kind='barh',ax=axes[1], width=1, title='After consolidation') ('', numpar )\n",
  373. "stream": "stdout"
  374. },
  375. {
  376. "output_type": "stream",
  377. "name": "stdout",
  378. "text": "Add (a) or Pass (p)p\n",
  379. "stream": "stdout"
  380. },
  381. {
  382. "output_type": "stream",
  383. "text": "\nCurrently: domain_groups[-20:].plot(kind='barh', width=1, title=\"Number of participants at domain\") ('', numpar )\n",
  384. "stream": "stdout"
  385. },
  386. {
  387. "output_type": "stream",
  388. "name": "stdout",
  389. "text": "Add (a) or Pass (p)p\n",
  390. "stream": "stdout"
  391. },
  392. {
  393. "output_type": "stream",
  394. "text": "\nCurrently: domain_messages_sum[-20:].plot(kind='barh', width=1, title=\"Number of messages from domain\") ('', numpar )\n",
  395. "stream": "stdout"
  396. },
  397. {
  398. "output_type": "stream",
  399. "name": "stdout",
  400. "text": "Add (a) or Pass (p)p\n",
  401. "stream": "stdout"
  402. },
  403. {
  404. "output_type": "stream",
  405. "text": "\n",
  406. "stream": "stdout"
  407. }
  408. ],
  409. "language": "python",
  410. "trusted": true,
  411. "collapsed": false
},
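{
"metadata": {},
"cell_type": "markdown",
"source": "Because the cell above stores each `(status, type)` tuple through `csv.writer`, the second CSV column holds the tuple's string repr (e.g. `('a', 'var')`), which is why parts 3 and 4 recover the fields positionally with `split()` and slicing. A sturdier (hypothetical) alternative would parse the repr back into a real tuple:"
},
{
"metadata": {},
"cell_type": "code",
"input": "# Hypothetical alternative to the positional slicing used above: parse the stored\n# tuple repr back into a real tuple. Assumes the \"('a', 'var')\" format written by csv.writer.\nimport ast\n\nstored = \"('a', 'var')\"\nstatus, typ = ast.literal_eval(stored)\nprint status, typ",
"outputs": [],
"language": "python",
"trusted": true,
"collapsed": false
},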
{
"metadata": {},
"cell_type": "heading",
"source": "4:",
"level": 1
},
{
"metadata": {},
"cell_type": "markdown",
  422. "source": "##Interactively provide descriptions of candidates\nThis will process all candidates flagged for various notebooks in part 3.\n\nEnter description in the raw_input field that appears below the cell:\n\n[return to Intro](#Intro:)"
},
{
"metadata": {},
"cell_type": "code",
  427. "input": "try:\n with open('item_descriptions.csv', mode='r') as fin:\n reader = csv.reader(fin)\n in_dict = {rows[0]:rows[1] for rows in reader}\nexcept:\n print \"item_descriptions.csv must be created...\"\n print\n in_dict = {}\n \nwith open('model_development.csv', mode='r') as fin:\n reader = csv.reader(fin)\n in_md_dict = {rows[0]:rows[1] for rows in reader}\n \nout_dict = in_dict\nfor key, value in in_md_dict.items():\n a, b = value.split()\n status, typ = a[2], b[1:-2]\n \n if status == 'a':\n print 'Item:', key,value \n description = raw_input(\"Enter description: \")\n out_md_dict[key] = ('d', typ) # d for done\n out_dict[key] = (description, typ)\n \nwith open('model_development.csv', 'wb') as fout:\n writer = csv.writer(fout)\n for key, value in out_md_dict.items():\n writer.writerow([key, value])\n\nwith open('item_descriptions.csv', 'wb') as fout:\n writer = csv.writer(fout)\n for key, value in out_dict.items():\n writer.writerow([key, value])",
  428. "prompt_number": 8,
  429. "outputs": [
  430. {
  431. "output_type": "stream",
  432. "text": "Item: consolidates ('a', 'var')\n",
  433. "stream": "stdout"
  434. },
  435. {
  436. "output_type": "stream",
  437. "name": "stdout",
  438. "text": "Enter description: The variable for consolidated entity after entity resolution.\n",
  439. "stream": "stdout"
  440. },
  441. {
  442. "output_type": "stream",
  443. "text": "Item: lev_c ('a', 'var')\n",
  444. "stream": "stdout"
  445. },
  446. {
  447. "output_type": "stream",
  448. "name": "stdout",
  449. "text": "Enter description: The variable defined within the Process library code.\n",
  450. "stream": "stdout"
  451. },
  452. {
  453. "output_type": "stream",
  454. "text": "Item: c ('a', 'var')\n",
  455. "stream": "stdout"
  456. },
  457. {
  458. "output_type": "stream",
  459. "name": "stdout",
  460. "text": "Enter description: A list of consolidated senders after processing by the Process library code, this list contains (...).\n",
  461. "stream": "stdout"
  462. },
  463. {
  464. "output_type": "stream",
  465. "text": "Item: domain_groups ('a', 'var')\n",
  466. "stream": "stdout"
  467. },
  468. {
  469. "output_type": "stream",
  470. "name": "stdout",
  471. "text": "Enter description: Email addresses grouped by domain name.\n",
  472. "stream": "stdout"
  473. }
  474. ],
  475. "language": "python",
  476. "trusted": true,
  477. "collapsed": false
},
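{
"metadata": {},
"cell_type": "markdown",
"source": "A minimal (hypothetical) follow-up for reviewing what has been saved: print each item with its stored `(description, type)` value. Assumes item_descriptions.csv exists in the working directory with the two-column layout written above."
},
{
"metadata": {},
"cell_type": "code",
"input": "# Hypothetical review step: print each described item and its stored (description, type) value.\n# Assumes item_descriptions.csv exists with the two-column layout written above.\nimport csv\n\nwith open('item_descriptions.csv', mode='r') as fin:\n    for item, value in csv.reader(fin):\n        print '%s: %s' % (item, value)",
"outputs": [],
"language": "python",
"trusted": true,
"collapsed": false
}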
],
"metadata": {}
}
],
"metadata": {
"name": "",
"signature": "sha256:aa4449af8938c56713bde41136c7e4706d81416724b0d037a8b90c1f3b83c553"
},
"nbformat": 3
}