Jun 26th, 2017
  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
  7. "# Demo for the cohorts mini-gsutil library"
  8. ]
  9. },
  10. {
  11. "cell_type": "code",
  12. "execution_count": 1,
  13. "metadata": {},
  14. "outputs": [],
  15. "source": [
  16. "import os\n",
  17. "\n",
  18. "%reload_ext autoreload\n",
  19. "%autoreload 2\n",
  20. "\n",
  21. "import cohorts\n",
  22. "from cohorts.io import gcloud_storage"
  23. ]
  24. },
  25. {
  26. "cell_type": "markdown",
  27. "metadata": {},
  28. "source": [
  29. "Create a bare-bones file to test the new functionality. We will populate it manually for now, then try to fetch it, modify it, and put it back onto Google Storage (GS):"
  30. ]
  31. },
  32. {
  33. "cell_type": "code",
  34. "execution_count": 2,
  35. "metadata": {},
  36. "outputs": [
  37. {
  38. "name": "stdout",
  39. "output_type": "stream",
  40. "text": [
  41. "Copying file:///tmp/gsio.txt [Content-Type=text/plain]...\n",
  42. "/ [1 files][ 13.0 B/ 13.0 B] \n",
  43. "Operation completed over 1 objects/13.0 B. \n",
  44. "Cohorts test\n"
  45. ]
  46. }
  47. ],
  48. "source": [
  49. "# file URI on the GS\n",
  50. "gsuri = \"gs://arman-hammerlab/test/gsio.txt\"\n",
  51. "org_tmp = \"/tmp/gsio.txt\"\n",
  52. "\n",
  53. "!echo 'Cohorts test' > {org_tmp}\n",
  54. "!gsutil cp {org_tmp} {gsuri}\n",
  55. "!gsutil cat {gsuri}"
  56. ]
  57. },
  58. {
  59. "cell_type": "markdown",
  60. "metadata": {},
  61. "source": [
  62. "Now, let's see if we can pull it without breaking anything:"
  63. ]
  64. },
  65. {
  66. "cell_type": "code",
  67. "execution_count": 3,
  68. "metadata": {},
  69. "outputs": [],
  70. "source": [
  71. "# Initialize the class that will help us with those:\n",
  72. "gcio = gcloud_storage.GoogleStorageIO() # Look ma, no passwords or anything!\n",
  73. "# and it handles the auth part very smoothly, which is great.\n",
  74. "\n",
  75. "gcio.download_to_path(gsuri=gsuri, localpath=\"{}.dl\".format(org_tmp))"
  76. ]
  77. },
  78. {
  79. "cell_type": "code",
  80. "execution_count": 4,
  81. "metadata": {},
  82. "outputs": [
  83. {
  84. "name": "stdout",
  85. "output_type": "stream",
  86. "text": [
  87. "Cohorts test\r\n"
  88. ]
  89. }
  90. ],
  91. "source": [
  92. "!cat {org_tmp}.dl"
  93. ]
  94. },
  95. {
  96. "cell_type": "markdown",
  97. "metadata": {},
  98. "source": [
  99. "Weee! So we can pull files to our local machine, which is useful; but even more useful would be to open them as if they were local files, change the content, and seamlessly put it back. We will do this in Python's context-manager style:"
  100. ]
  101. },
  102. {
  103. "cell_type": "code",
  104. "execution_count": 5,
  105. "metadata": {},
  106. "outputs": [
  107. {
  108. "name": "stdout",
  109. "output_type": "stream",
  110. "text": [
  111. "Cohorts test\n",
  112. "\n"
  113. ]
  114. }
  115. ],
  116. "source": [
  117. "# This is our context class that hands you a file handle\n",
  118. "# once you are in and takes care of the rest of the legwork (download/upload)\n",
  119. "# for you:\n",
  120. "from cohorts.io.gcloud_storage import GoogleStorageFile\n",
  121. "\n",
  122. "# The following is similar to what we do with `with open(file, mode)`:\n",
  123. "with GoogleStorageFile(gcio=gcio, gsuri=gsuri, mode='r+') as fgcs:\n",
  124. " print(fgcs.read()) # See the contents first\n",
  125. " # The mode is `r+` and `read()` has moved the position to the end,\n",
  126. " # so this write lands after the existing content (like an append):\n",
  127. " fgcs.write(\"Additional text\") \n",
  128. "# And we are out of the context"
  129. ]
  130. },
  131. {
  132. "cell_type": "markdown",
  133. "metadata": {},
  134. "source": [
  135. "All right! No errors, and the output shows only the content we printed. Now let's check that things are OK on the GS side:"
  136. ]
  137. },
  138. {
  139. "cell_type": "code",
  140. "execution_count": 6,
  141. "metadata": {},
  142. "outputs": [
  143. {
  144. "name": "stdout",
  145. "output_type": "stream",
  146. "text": [
  147. "Cohorts test\r\n",
  148. "Additional text"
  149. ]
  150. }
  151. ],
  152. "source": [
  153. "!gsutil cat {gsuri}"
  154. ]
  155. },
  156. {
  157. "cell_type": "markdown",
  158. "metadata": {},
  159. "source": [
  160. "And we have a few other utils that might come in handy, too:"
  161. ]
  162. },
  163. {
  164. "cell_type": "code",
  165. "execution_count": 7,
  166. "metadata": {},
  167. "outputs": [
  168. {
  169. "name": "stdout",
  170. "output_type": "stream",
  171. "text": [
  172. "<Blob: arman-hammerlab, CIBERSORT.R>\n",
  173. "<Blob: arman-hammerlab, gs://arman-hammerlab/test/gsio.txt>\n",
  174. "<Blob: arman-hammerlab, lowresp/SI9946.bam>\n",
  175. "<Blob: arman-hammerlab, lowresp/SI9946.bam.bai>\n",
  176. "<Blob: arman-hammerlab, rcc/reDEFAULT.bam>\n",
  177. "<Blob: arman-hammerlab, rcc/reDEFAULT.bam.bai>\n",
  178. "<Blob: arman-hammerlab, rcc/reL0-sorted.bam>\n",
  179. "<Blob: arman-hammerlab, rcc/reL0-sorted.bam.bai>\n",
  180. "<Blob: arman-hammerlab, rcc/reL3-sorted.bam>\n",
  181. "<Blob: arman-hammerlab, rcc/reL3-sorted.bam.bai>\n",
  182. "<Blob: arman-hammerlab, rcc/tumor.bam>\n",
  183. "<Blob: arman-hammerlab, rcc/tumor.bam.bai>\n",
  184. "<Blob: arman-hammerlab, test-htslib/sample-readwrite.bam>\n",
  185. "<Blob: arman-hammerlab, test-htslib/sample.bam>\n",
  186. "<Blob: arman-hammerlab, test/C509.TCGA-78-7536-10A-01D-2063-08.1_gdc_realn.bam>\n",
  187. "<Blob: arman-hammerlab, test/a9fa477c-8299-41e4-aa5d-e0ae6801a201_gdc_realn_rehead.bam>\n",
  188. "<Blob: arman-hammerlab, test/b37decoy.fasta>\n",
  189. "<Blob: arman-hammerlab, test/bla>\n",
  190. "<Blob: arman-hammerlab, test/cnv>\n",
  191. "<Blob: arman-hammerlab, test/d295cec33ca4e862fc06cb13aa9afa3d26ed6aa7cfc46893691aafbc404b250en_PGDX2811T_Excleaned-b2fq-PE_R1.fastq>\n",
  192. "<Blob: arman-hammerlab, test/gsio.txt>\n",
  193. "<Blob: arman-hammerlab, test/gsio.txt.upl>\n",
  194. "<Blob: arman-hammerlab, test/tumor>\n"
  195. ]
  196. }
  197. ],
  198. "source": [
  199. "# Listing files, maybe?\n",
  200. "for afile in gcio.list_files(\"gs://arman-hammerlab/test\"):\n",
  201. " print(afile)"
  202. ]
  203. },
  204. {
  205. "cell_type": "markdown",
  206. "metadata": {},
  207. "source": [
  208. "We have another one: not as powerful as gsutil, but it covers a small, frequent task that we keep doing over and over again: uploading a file."
  209. ]
  210. },
  211. {
  212. "cell_type": "code",
  213. "execution_count": 8,
  214. "metadata": {},
  215. "outputs": [
  216. {
  217. "name": "stdout",
  218. "output_type": "stream",
  219. "text": [
  220. "Cohorts test\r\n",
  221. "Yet another addition\r\n"
  222. ]
  223. }
  224. ],
  225. "source": [
  226. "!echo \"Yet another addition\" >> {org_tmp}\n",
  227. "gcio.upload_file(localpath=org_tmp, gsuri=\"{}.upl\".format(gsuri))\n",
  228. "!gsutil cat {gsuri}.upl"
  229. ]
  230. }
  231. ],
  232. "metadata": {
  233. "kernelspec": {
  234. "display_name": "Python 3",
  235. "language": "python",
  236. "name": "python3"
  237. },
  238. "language_info": {
  239. "codemirror_mode": {
  240. "name": "ipython",
  241. "version": 3
  242. },
  243. "file_extension": ".py",
  244. "mimetype": "text/x-python",
  245. "name": "python",
  246. "nbconvert_exporter": "python",
  247. "pygments_lexer": "ipython3",
  248. "version": "3.6.1"
  249. }
  250. },
  251. "nbformat": 4,
  252. "nbformat_minor": 2
  253. }
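
The `GoogleStorageFile` cell in the notebook above follows a download, edit, upload pattern. The sketch below is one way such a context manager might be written; it is not the `cohorts` implementation. The name `remote_file` and the `download`/`upload` callables are hypothetical stand-ins (for calls like `gcio.download_to_path` and `gcio.upload_file`), and a local temporary directory plays the role of the bucket so the example runs without any Google Storage credentials.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def remote_file(download, upload, mode="r"):
    """Yield a local file handle backed by a remote object.

    `download(local_path)` and `upload(local_path)` are hypothetical
    callables, assumed to copy the remote object to and from disk.
    """
    fd, local_path = tempfile.mkstemp()
    os.close(fd)
    try:
        download(local_path)                # fetch the remote copy first
        with open(local_path, mode) as handle:
            yield handle                    # caller works on the local copy
        # Push the file back only if the mode allowed writing.
        if any(flag in mode for flag in "wa+"):
            upload(local_path)
    finally:
        os.remove(local_path)               # always drop the temporary copy


# A temporary directory stands in for the bucket (assumption for the demo).
bucket_dir = tempfile.mkdtemp()
remote_path = os.path.join(bucket_dir, "gsio.txt")
with open(remote_path, "w") as f:
    f.write("Cohorts test\n")


def download(local_path):
    shutil.copyfile(remote_path, local_path)


def upload(local_path):
    shutil.copyfile(local_path, remote_path)


with remote_file(download, upload, mode="r+") as handle:
    first = handle.read()                   # moves the position to the end
    handle.write("Additional text")         # so this lands after the old text

with open(remote_path) as f:
    result = f.read()

shutil.rmtree(bucket_dir)                   # clean up the fake bucket
print(repr(result))
```

If the body of the `with` block raises, the upload step is skipped and only the cleanup runs, so a failed edit does not overwrite the remote copy. Whether `GoogleStorageFile` behaves the same way on errors is not something the notebook shows.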