{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Knet RNN example"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# After installing and starting Julia, run the following to install the required packages:\n",
"# Pkg.init(); Pkg.update()\n",
"# for p in (\"CUDAdrv\",\"IJulia\",\"PyCall\",\"JLD2\",\"Knet\"); Pkg.add(p); end\n",
"# Pkg.checkout(\"Knet\",\"ilkarman\") # make sure we have the right Knet version\n",
"# Pkg.build(\"Knet\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"using Knet\n",
"True=true # so we can read the python params\n",
"include(\"common/params_lstm.py\");"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OS: Linux\n",
"Julia: 0.6.2\n",
"Knet: 0.9.0+\n",
"GPU: Tesla K80\n",
"\n"
]
}
],
"source": [
"println(\"OS: \", Sys.KERNEL)\n",
"println(\"Julia: \", VERSION)\n",
"println(\"Knet: \", Pkg.installed(\"Knet\"))\n",
"println(\"GPU: \", readstring(`nvidia-smi --query-gpu=name --format=csv,noheader`))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"```\n",
"rnninit(inputSize, hiddenSize; opts...)\n",
"```\n",
"\n",
"Return an `(r,w)` pair where `r` is an RNN struct and `w` is a single weight array that includes all matrices and biases for the RNN. Keyword arguments:\n",
"\n",
" * `rnnType=:lstm`: Type of RNN: One of :relu, :tanh, :lstm, :gru.\n",
" * `numLayers=1`: Number of RNN layers.\n",
" * `bidirectional=false`: Create a bidirectional RNN if `true`.\n",
" * `dropout=0.0`: Dropout probability. Ignored if `numLayers==1`.\n",
" * `skipInput=false`: Do not multiply the input with a matrix if `true`.\n",
" * `dataType=Float32`: Data type to use for weights.\n",
" * `algo=0`: Algorithm to use, see CUDNN docs for details.\n",
" * `seed=0`: Random number seed. Uses `time()` if 0.\n",
" * `winit=xavier`: Weight initialization method for matrices.\n",
" * `binit=zeros`: Weight initialization method for bias vectors.\n",
" * `usegpu=(gpu()>=0)`: GPU used by default if one exists.\n",
"\n",
"RNNs compute the output h[t] for a given iteration from the recurrent input h[t-1] and the previous layer input x[t] given matrices W, R and biases bW, bR from the following equations:\n",
"\n",
"`:relu` and `:tanh`: Single gate RNN with activation function f:\n",
"\n",
"```\n",
"h[t] = f(W * x[t] .+ R * h[t-1] .+ bW .+ bR)\n",
"```\n",
"\n",
"`:gru`: Gated recurrent unit:\n",
"\n",
"```\n",
"i[t] = sigm(Wi * x[t] .+ Ri * h[t-1] .+ bWi .+ bRi) # input gate\n",
"r[t] = sigm(Wr * x[t] .+ Rr * h[t-1] .+ bWr .+ bRr) # reset gate\n",
"n[t] = tanh(Wn * x[t] .+ r[t] .* (Rn * h[t-1] .+ bRn) .+ bWn) # new gate\n",
"h[t] = (1 - i[t]) .* n[t] .+ i[t] .* h[t-1]\n",
"```\n",
"\n",
"`:lstm`: Long short term memory unit with no peephole connections:\n",
"\n",
"```\n",
"i[t] = sigm(Wi * x[t] .+ Ri * h[t-1] .+ bWi .+ bRi) # input gate\n",
"f[t] = sigm(Wf * x[t] .+ Rf * h[t-1] .+ bWf .+ bRf) # forget gate\n",
"o[t] = sigm(Wo * x[t] .+ Ro * h[t-1] .+ bWo .+ bRo) # output gate\n",
"n[t] = tanh(Wn * x[t] .+ Rn * h[t-1] .+ bWn .+ bRn) # new gate\n",
"c[t] = f[t] .* c[t-1] .+ i[t] .* n[t] # cell output\n",
"h[t] = o[t] .* tanh(c[t])\n",
"```\n"
],
"text/plain": [
"```\n",
"rnninit(inputSize, hiddenSize; opts...)\n",
"```\n",
"\n",
"Return an `(r,w)` pair where `r` is an RNN struct and `w` is a single weight array that includes all matrices and biases for the RNN. Keyword arguments:\n",
"\n",
" * `rnnType=:lstm`: Type of RNN: One of :relu, :tanh, :lstm, :gru.\n",
" * `numLayers=1`: Number of RNN layers.\n",
" * `bidirectional=false`: Create a bidirectional RNN if `true`.\n",
" * `dropout=0.0`: Dropout probability. Ignored if `numLayers==1`.\n",
" * `skipInput=false`: Do not multiply the input with a matrix if `true`.\n",
" * `dataType=Float32`: Data type to use for weights.\n",
" * `algo=0`: Algorithm to use, see CUDNN docs for details.\n",
" * `seed=0`: Random number seed. Uses `time()` if 0.\n",
" * `winit=xavier`: Weight initialization method for matrices.\n",
" * `binit=zeros`: Weight initialization method for bias vectors.\n",
" * `usegpu=(gpu()>=0)`: GPU used by default if one exists.\n",
"\n",
"RNNs compute the output h[t] for a given iteration from the recurrent input h[t-1] and the previous layer input x[t] given matrices W, R and biases bW, bR from the following equations:\n",
"\n",
"`:relu` and `:tanh`: Single gate RNN with activation function f:\n",
"\n",
"```\n",
"h[t] = f(W * x[t] .+ R * h[t-1] .+ bW .+ bR)\n",
"```\n",
"\n",
"`:gru`: Gated recurrent unit:\n",
"\n",
"```\n",
"i[t] = sigm(Wi * x[t] .+ Ri * h[t-1] .+ bWi .+ bRi) # input gate\n",
"r[t] = sigm(Wr * x[t] .+ Rr * h[t-1] .+ bWr .+ bRr) # reset gate\n",
"n[t] = tanh(Wn * x[t] .+ r[t] .* (Rn * h[t-1] .+ bRn) .+ bWn) # new gate\n",
"h[t] = (1 - i[t]) .* n[t] .+ i[t] .* h[t-1]\n",
"```\n",
"\n",
"`:lstm`: Long short term memory unit with no peephole connections:\n",
"\n",
"```\n",
"i[t] = sigm(Wi * x[t] .+ Ri * h[t-1] .+ bWi .+ bRi) # input gate\n",
"f[t] = sigm(Wf * x[t] .+ Rf * h[t-1] .+ bWf .+ bRf) # forget gate\n",
"o[t] = sigm(Wo * x[t] .+ Ro * h[t-1] .+ bWo .+ bRo) # output gate\n",
"n[t] = tanh(Wn * x[t] .+ Rn * h[t-1] .+ bWn .+ bRn) # new gate\n",
"c[t] = f[t] .* c[t-1] .+ i[t] .* n[t] # cell output\n",
"h[t] = o[t] .* tanh(c[t])\n",
"```\n"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@doc rnninit"
]
},
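{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of `rnninit`, with made-up sizes (inputSize=10, hiddenSize=20): it builds a 1-layer GRU and shows that all gate matrices and biases come back packed in a single weight array."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example: build a small GRU and inspect its packed weights.\n",
"r,w = rnninit(10, 20; rnnType=:gru)  # r: RNN spec, w: one array packing Wi,Wr,Wn,Ri,Rr,Rn and biases\n",
"println(summary(w))"
]
},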
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# define model\n",
"function initmodel()\n",
" rnnSpec,rnnWeights = rnninit(EMBEDSIZE,NUMHIDDEN; rnnType=:gru)\n",
" inputMatrix = KnetArray(xavier(Float32,EMBEDSIZE,MAXFEATURES))\n",
" outputMatrix = KnetArray(xavier(Float32,2,NUMHIDDEN))\n",
" return rnnSpec,(rnnWeights,inputMatrix,outputMatrix)\n",
"end;"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"```\n",
"rnnforw(r, w, x[, hx, cx]; batchSizes, hy, cy)\n",
"```\n",
"\n",
"Returns a tuple (y,hyout,cyout,rs) given rnn `r`, weights `w`, input `x` and optionally the initial hidden and cell states `hx` and `cx` (`cx` is only used in LSTMs). `r` and `w` should come from a previous call to `rnninit`. Both `hx` and `cx` are optional, they are treated as zero arrays if not provided. The output `y` contains the hidden states of the final layer for each time step, `hyout` and `cyout` give the final hidden and cell states for all layers, `rs` is a buffer the RNN needs for its gradient calculation.\n",
"\n",
"The boolean keyword arguments `hy` and `cy` control whether `hyout` and `cyout` will be output. By default `hy = (hx!=nothing)` and `cy = (cx!=nothing && r.mode==2)`, i.e. a hidden state will be output if one is provided as input and for cell state we also require an LSTM. If `hy`/`cy` is `false`, `hyout`/`cyout` will be `nothing`. `batchSizes` can be an integer array that specifies non-uniform batch sizes as explained below. By default `batchSizes=nothing` and the same batch size, `size(x,2)`, is used for all time steps.\n",
"\n",
"The input and output dimensions are:\n",
"\n",
" * `x`: (X,[B,T])\n",
" * `y`: (H/2H,[B,T])\n",
" * `hx`,`cx`,`hyout`,`cyout`: (H,B,L/2L)\n",
" * `batchSizes`: `nothing` or `Vector{Int}(T)`\n",
"\n",
"where X is inputSize, H is hiddenSize, B is batchSize, T is seqLength, L is numLayers. `x` can be 1, 2, or 3 dimensional. If `batchSizes==nothing`, a 1-D `x` represents a single instance, a 2-D `x` represents a single minibatch, and a 3-D `x` represents a sequence of identically sized minibatches. If `batchSizes` is an array of (non-increasing) integers, it gives us the batch size for each time step in the sequence, in which case `sum(batchSizes)` should equal `div(length(x),size(x,1))`. `y` has the same dimensionality as `x`, differing only in its first dimension, which is H if the RNN is unidirectional, 2H if bidirectional. Hidden vectors `hx`, `cx`, `hyout`, `cyout` all have size (H,B1,L) for unidirectional RNNs, and (H,B1,2L) for bidirectional RNNs where B1 is the size of the first minibatch.\n"
],
"text/plain": [
"```\n",
"rnnforw(r, w, x[, hx, cx]; batchSizes, hy, cy)\n",
"```\n",
"\n",
"Returns a tuple (y,hyout,cyout,rs) given rnn `r`, weights `w`, input `x` and optionally the initial hidden and cell states `hx` and `cx` (`cx` is only used in LSTMs). `r` and `w` should come from a previous call to `rnninit`. Both `hx` and `cx` are optional, they are treated as zero arrays if not provided. The output `y` contains the hidden states of the final layer for each time step, `hyout` and `cyout` give the final hidden and cell states for all layers, `rs` is a buffer the RNN needs for its gradient calculation.\n",
"\n",
"The boolean keyword arguments `hy` and `cy` control whether `hyout` and `cyout` will be output. By default `hy = (hx!=nothing)` and `cy = (cx!=nothing && r.mode==2)`, i.e. a hidden state will be output if one is provided as input and for cell state we also require an LSTM. If `hy`/`cy` is `false`, `hyout`/`cyout` will be `nothing`. `batchSizes` can be an integer array that specifies non-uniform batch sizes as explained below. By default `batchSizes=nothing` and the same batch size, `size(x,2)`, is used for all time steps.\n",
"\n",
"The input and output dimensions are:\n",
"\n",
" * `x`: (X,[B,T])\n",
" * `y`: (H/2H,[B,T])\n",
" * `hx`,`cx`,`hyout`,`cyout`: (H,B,L/2L)\n",
" * `batchSizes`: `nothing` or `Vector{Int}(T)`\n",
"\n",
"where X is inputSize, H is hiddenSize, B is batchSize, T is seqLength, L is numLayers. `x` can be 1, 2, or 3 dimensional. If `batchSizes==nothing`, a 1-D `x` represents a single instance, a 2-D `x` represents a single minibatch, and a 3-D `x` represents a sequence of identically sized minibatches. If `batchSizes` is an array of (non-increasing) integers, it gives us the batch size for each time step in the sequence, in which case `sum(batchSizes)` should equal `div(length(x),size(x,1))`. `y` has the same dimensionality as `x`, differing only in its first dimension, which is H if the RNN is unidirectional, 2H if bidirectional. Hidden vectors `hx`, `cx`, `hyout`, `cyout` all have size (H,B1,L) for unidirectional RNNs, and (H,B1,2L) for bidirectional RNNs where B1 is the size of the first minibatch.\n"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@doc rnnforw"
]
},
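{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of `rnnforw` on the small GRU above, with made-up sizes X=10, B=4, T=5: `y` comes back as (H,B,T), and `hyout`/`cyout` are `nothing` because no initial states are passed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example: forward the small GRU over a random 5-step sequence.\n",
"x = KnetArray(randn(Float32,10,4,5))  # (X,B,T)\n",
"y,hyout,cyout,rs = rnnforw(r, w, x)   # y is (H,B,T) = (20,4,5)\n",
"println(summary(y))"
]
},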
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# define loss and its gradient\n",
"function predict(weights, inputs, rnnSpec)\n",
" rnnWeights, inputMatrix, outputMatrix = weights # (1,1,W), (X,V), (2,H)\n",
" indices = hcat(inputs...)' # (B,T)\n",
" rnnInput = inputMatrix[:,indices] # (X,B,T)\n",
" rnnOutput = rnnforw(rnnSpec, rnnWeights, rnnInput)[1] # (H,B,T)\n",
" return outputMatrix * rnnOutput[:,:,end] # (2,H) * (H,B) = (2,B)\n",
"end\n",
"\n",
"loss(w,x,y,r)=nll(predict(w,x,r),y)\n",
"lossgradient = grad(loss);"
]
},
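{
"cell_type": "markdown",
"metadata": {},
"source": [
"How one training step uses these pieces (a commented sketch; `weights`, `rnnSpec` and `optim` are only defined a few cells below): `lossgradient` takes the same arguments as `loss` but returns gradients with respect to the first argument, in the same nested structure as `weights`, which `update!` then applies using the Adam state in `optim`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical single step on one minibatch (x,y), runnable once the cells below have executed:\n",
"# J = loss(weights, x, y, rnnSpec)             # scalar negative log likelihood via nll\n",
"# grads = lossgradient(weights, x, y, rnnSpec) # same nested structure as weights\n",
"# update!(weights, grads, optim)               # one Adam step"
]
},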
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mLoading IMDB...\n",
"\u001b[39m"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 10.383756 seconds (15.84 M allocations: 830.528 MiB, 4.02% gc time)\n",
"25000-element Array{Array{Int32,1},1}\n",
"25000-element Array{Int8,1}\n",
"25000-element Array{Array{Int32,1},1}\n",
"25000-element Array{Int8,1}\n"
]
}
],
"source": [
"# load data\n",
"include(Knet.dir(\"data\",\"imdb.jl\"))\n",
"@time (xtrn,ytrn,xtst,ytst,imdbdict)=imdb(maxlen=MAXLEN,maxval=MAXFEATURES)\n",
"for d in (xtrn,ytrn,xtst,ytst); println(summary(d)); end"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"150-element Array{String,1}:\n",
" \"sharp\" \n",
" \"engrossing\" \n",
" \"and\" \n",
" \"perceptive\" \n",
" \"examination\"\n",
" \"of\" \n",
" \"suburban\" \n",
" \"angst\" \n",
" \"and\" \n",
" \"the\" \n",
" \"limitations\"\n",
" \"of\" \n",
" \"the\" \n",
" ⋮ \n",
" \"both\" \n",
" \"on\" \n",
" \"the\" \n",
" \"money\" \n",
" \"solid\" \n",
" \"and\" \n",
" \"effective\" \n",
" \"recommended\"\n",
" \"viewing\" \n",
" \"for\" \n",
" \"sarno\" \n",
" \"fans\" "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"imdbarray = Array{String}(88584)\n",
"for (k,v) in imdbdict; imdbarray[v]=k; end\n",
"imdbarray[xtrn[1]]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# prepare for training\n",
"weights = nothing; knetgc(); # Reclaim memory from previous run\n",
"rnnSpec,weights = initmodel()\n",
"optim = optimizers(weights, Adam; lr=LR, beta1=BETA_1, beta2=BETA_2, eps=EPS);"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 16.016910 seconds (2.05 M allocations: 137.400 MiB, 3.30% gc time)\n"
]
}
],
"source": [
"# cold start\n",
"@time for (x,y) in minibatch(xtrn,ytrn,BATCHSIZE;shuffle=true)\n",
" grads = lossgradient(weights,x,y,rnnSpec)\n",
" update!(weights, grads, optim)\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# prepare for training\n",
"weights = nothing; knetgc(); # Reclaim memory from previous run\n",
"rnnSpec,weights = initmodel()\n",
"optim = optimizers(weights, Adam; lr=LR, beta1=BETA_1, beta2=BETA_2, eps=EPS);"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mTraining...\n",
"\u001b[39m"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 10.263796 seconds (358.68 k allocations: 45.038 MiB, 4.63% gc time)\n",
" 9.550875 seconds (354.17 k allocations: 44.687 MiB, 6.23% gc time)\n",
" 9.575668 seconds (354.89 k allocations: 44.699 MiB, 6.32% gc time)\n",
" 29.397045 seconds (1.07 M allocations: 134.575 MiB, 5.70% gc time)\n"
]
}
],
"source": [
"# 29s\n",
"info(\"Training...\")\n",
"@time for epoch in 1:EPOCHS\n",
" @time for (x,y) in minibatch(xtrn,ytrn,BATCHSIZE;shuffle=true)\n",
" grads = lossgradient(weights,x,y,rnnSpec)\n",
" update!(weights, grads, optim)\n",
" end\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mTesting...\n",
"\u001b[39m"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3.780345 seconds (737.73 k allocations: 70.577 MiB, 3.82% gc time)\n"
]
},
{
"data": {
"text/plain": [
"0.8530448717948718"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"info(\"Testing...\")\n",
"@time accuracy(weights, minibatch(xtst,ytst,BATCHSIZE), (w,x)->predict(w,x,rnnSpec))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 0.6.2",
"language": "julia",
"name": "julia-0.6"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "0.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}