```python
import torch
import torch.nn as nn

from utils import idx2onehot


class AE(nn.Module):

    def __init__(self, encoder_layer_sizes, latent_size, decoder_layer_sizes,
                 conditional=False, num_labels=0):

        super().__init__()

        if conditional:
            assert num_labels > 0

        assert type(encoder_layer_sizes) == list
        assert type(latent_size) == int
        assert type(decoder_layer_sizes) == list

        self.latent_size = latent_size

        self.encoder = Encoder(
            encoder_layer_sizes, latent_size, conditional, num_labels)
        self.decoder = Decoder(
            decoder_layer_sizes, latent_size, conditional, num_labels)

    def forward(self, x, c=None):

        # Flatten image batches (e.g. [B, 1, 28, 28]) to [B, 784].
        if x.dim() > 2:
            x = x.view(-1, 28 * 28)

        z = self.encoder(x, c)
        recon_x = self.decoder(z, c)

        return recon_x, z

    def inference(self, device, n=1, c=None):

        # Decode n latent codes drawn from a standard normal prior.
        batch_size = n
        z = torch.randn([batch_size, self.latent_size]).to(device)

        recon_x = self.decoder(z, c)

        return recon_x


class Encoder(nn.Module):

    def __init__(self, layer_sizes, latent_size, conditional, num_labels):

        super().__init__()

        self.conditional = conditional
        if self.conditional:
            layer_sizes[0] += num_labels

        self.MLP = nn.Sequential()

        for i, (in_size, out_size) in enumerate(zip(layer_sizes[:-1],
                                                    layer_sizes[1:])):
            print(i, ": ", in_size, out_size)
            self.MLP.add_module(name="L{:d}".format(i),
                                module=nn.Linear(in_size, out_size))
            # i runs from 0 to len(layer_sizes) - 2, so this condition is always
            # true: every encoder Linear layer is followed by a ReLU.
            if i != len(layer_sizes):
                print("ReLU added @ Encoder")
                self.MLP.add_module(name="A{:d}".format(i),
                                    module=nn.ReLU())
                # self.MLP.add_module(name="BN{:d}".format(i),
                #                     module=nn.BatchNorm1d(out_size))

        # Final linear projection to the latent code (no non-linearity).
        self.linear = nn.Linear(layer_sizes[-1], latent_size)

    def forward(self, x, c=None):

        if self.conditional:
            c = idx2onehot(c, n=10)
            x = torch.cat((x, c), dim=-1)

        x = self.MLP(x)
        z = self.linear(x)

        return z


class Decoder(nn.Module):

    def __init__(self, layer_sizes, latent_size, conditional, num_labels):

        super().__init__()

        self.MLP = nn.Sequential()

        self.conditional = conditional
        if self.conditional:
            input_size = latent_size + num_labels
        else:
            input_size = latent_size

        for i, (in_size, out_size) in enumerate(
                zip([input_size] + layer_sizes[:-1], layer_sizes)):
            print(i, ": ", in_size, out_size)
            self.MLP.add_module(
                name="L{:d}".format(i), module=nn.Linear(in_size, out_size))
            if i + 1 < len(layer_sizes):
                # The first Linear (latent -> first hidden size) is not followed
                # by a ReLU; the remaining hidden Linear layers are.
                if i != 0:
                    print("ReLU added @ Decoder")
                    self.MLP.add_module(name="A{:d}".format(i), module=nn.ReLU())
                    # self.MLP.add_module(name="BN{:d}".format(i),
                    #                     module=nn.BatchNorm1d(out_size))
            else:
                # Last layer: map back to pixel space in [0, 1].
                print("Sig step")
                self.MLP.add_module(name="sigmoid", module=nn.Sigmoid())

    def forward(self, z, c):

        if self.conditional:
            c = idx2onehot(c, n=10)
            z = torch.cat((z, c), dim=-1)

        x = self.MLP(z)

        return x
```

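The paste does not include a training or sampling script; the snippet below is only a minimal usage sketch of the classes above. The layer sizes, batch shape, and latent size are assumptions, chosen to match the MNIST-style `[784, 256]` setup used in the experiments.

```python
import torch

# Unconditional AE: 784 -> 256 (+ReLU) -> latent -> 256 -> 784 (sigmoid output).
model = AE(encoder_layer_sizes=[784, 256],
           latent_size=16,
           decoder_layer_sizes=[256, 784])

x = torch.rand(8, 1, 28, 28)                 # fake batch of MNIST-shaped images
recon_x, z = model(x)                        # recon_x: [8, 784], z: [8, 16]

samples = model.inference(torch.device("cpu"), n=4)   # decode 4 random latent codes
```
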
## Goal of the Project
The goal of the project is a way to determine the `optimal number of latent dimensions`.

First, the project introduces linearity and non-linearity and postulates the assumption that a linear map corresponds to `one` dimension, and that this dimension can be split into `two` non-overlapping dimensions by a single ReLU-based non-linearity.

Therefore, this project argues that the optimal number of latent dimensions primarily does `not depend on the data distribution itself`, but on `the network structure`; more specifically, on the `total number of dimensions that the model is able to express`. The paper calls this total number of dimensions the **model dimension**.

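Under this assumption the model dimension is simple arithmetic over the hidden-layer widths. A small illustration follows; the helper name is hypothetical, and `layer_sizes` follows the `encoder_layer_sizes` convention used in the code above.

```python
def model_dimension(layer_sizes):
    # Each hidden ReLU unit splits one linear dimension into two non-overlapping
    # pieces, so the model dimension is twice the number of hidden ReLU units.
    # e.g. model_dimension([784, 256]) == 512, model_dimension([784, 32, 32]) == 128
    return 2 * sum(layer_sizes[1:])
```
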
Once the model dimension is set, one can train the network and check whether it is possible to over-fit it on the given data. If the data points are over-fit at some point during training, the network can be considered "expressive enough for the data distribution". If it does not over-fit, one can enlarge the **model dimension** and repeat the over-fitting check.

## To-do
Define "over-fit". The classification threshold for over-fitting depends on the experiment.
- At which epoch of the training process should over-fitting be determined?

## Caution
It is better to use the whole dataset when determining the "model dimension", since the question is how much non-linearity the collected or targeted data domain requires.

## Convergence Determination Metric
When the EpochAVGLoss does not change by more than 1% over 5 consecutive epochs (checked from the first epoch onward), we consider the training loss to have converged.

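A minimal sketch of this rule as a helper function; the function name and the exact sliding-window reading of "for 5 epochs" are assumptions, not something the notes specify.

```python
def has_converged(epoch_avg_losses, patience=5, rel_tol=0.01):
    # epoch_avg_losses: EpochAVGLoss values recorded from the first epoch onward.
    # Converged when none of the last `patience` epochs deviates by more than
    # rel_tol (1%) from the epoch just before that window.
    if len(epoch_avg_losses) < patience + 1:
        return False
    window = epoch_avg_losses[-(patience + 1):]
    reference = window[0]
    return all(abs(loss - reference) <= rel_tol * abs(reference)
               for loss in window[1:])
```
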
## Experiment Workflow

#### Exp_1: 1 ReLU applied to 256 dimensions (then a linear transformation to LatentDim)

By the assumption, the **model dimension** is 512 (256 * 2). Thus, we verify the assumption by

1) checking for a sequential decrease of the Loss at a fixed training epoch while sequentially increasing the LatentDim.

with `1 * (MLP + ReLU) + LatentDim 1`

Epoch 09/10 Batch 0937/937, Loss 165.5437

with `1 * (MLP + ReLU) + LatentDim 2`

Epoch 09/10 Batch 0937/937, Loss 150.2990

with `1 * (MLP + ReLU) + LatentDim 3`

Epoch 09/10 Batch 0937/937, Loss 133.2206

with `1 * (MLP + ReLU) + LatentDim 4`

Epoch 09/10 Batch 0937/937, Loss 138.1151

with `1 * (MLP + ReLU) + LatentDim 8`

Epoch 09/10 Batch 0937/937, Loss 110.9839

with `1 * (MLP + ReLU) + LatentDim 16`

Epoch 09/10 Batch 0937/937, Loss 89.6707

with `1 * (MLP + ReLU) + LatentDim 32`

Epoch 09/10 Batch 0937/937, Loss 72.5663

with `1 * (MLP + ReLU) + LatentDim 64`

Epoch 09/10 Batch 0937/937, Loss 54.2545

> ... since the model converges at LatentDim 64 with Loss 52, we shrink the ReLU_InputDim down to 32 (go to Exp_3).

with `1 * (MLP + ReLU) + LatentDim 128`

Epoch 09/10 Batch 0937/937, Loss 54.3565

with `1 * (MLP + ReLU) + LatentDim 256`

Epoch 09/10 Batch 0937/937, Loss 52.3050

> ... the loss must keep decreasing; write code to automate this job (see the sketch at the end of this experiment).

with `1 * (MLP + ReLU) + LatentDim 512`

Epoch 09/10 Batch 0937/937, Loss 53.2412

> ... Check whether, for any LatentDim > 512, the Loss at the fixed training epoch no longer decreases.

with `1 * (MLP + ReLU) + LatentDim 1024`

Epoch 09/10 Batch 0937/937, Loss 54.3255

> As you can see, even with the LatentDim `doubled`, the LossAtFixedStep does not decrease, which means the model dimension is already saturated.
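
The sweep above can be automated as the note asks. The sketch below is one way to do it under assumptions the paste does not confirm: the optimizer, learning rate, batch size, and per-sample summed BCE reconstruction loss are all illustrative choices, not the author's script.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def loss_at_fixed_epoch(latent_dim, epochs=10, device="cpu"):
    # Train a fresh 1 * (MLP + ReLU) AE for a fixed number of epochs and
    # return the epoch-average reconstruction loss of the last epoch.
    loader = DataLoader(
        datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=64, shuffle=True)
    model = AE([784, 256], latent_dim, [256, 784]).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    bce = nn.BCELoss(reduction="sum")
    total, batches = 0.0, 0
    for _ in range(epochs):
        total, batches = 0.0, 0
        for x, _ in loader:
            x = x.to(device).view(-1, 784)
            recon_x, _ = model(x)
            loss = bce(recon_x, x) / x.size(0)   # summed BCE per sample
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total, batches = total + loss.item(), batches + 1
    return total / batches

# Double the LatentDim until the loss at the fixed epoch stops decreasing.
for ld in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    print("LatentDim", ld, "->", loss_at_fixed_epoch(ld))
```
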
#### Exp_2: Introduce twice the model dimension via a second ReLU

with `2 * (MLP + ReLU) + LatentDim 1024`

> Epoch 09/10 Batch 0937/937, Loss 57.9039

(Without bias, the sequential ReLUs do not work.)

#### Exp_3: Shrink the ReLU InputDim down to 32 while keeping latentDim 64

### Summary of Algorithm

    If convergeLoss != 0:
        if modelDim > latentDim:
            enlarge latentDim
        if modelDim <= latentDim:
            increase #ReLUs

* modelDim = 2 * num_ReLUs

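A hedged Python restatement of this rule; the helper name, the near-zero tolerance, and the choice to grow capacity by doubling are illustrative assumptions, not something the notes prescribe.

```python
def next_config(layer_sizes, latent_dim, converge_loss, tol=1e-3):
    # layer_sizes: encoder layer sizes, e.g. [784, 32]; hidden widths carry the ReLUs.
    # modelDim = 2 * num_ReLUs (the input layer contributes no ReLU units).
    model_dim = 2 * sum(layer_sizes[1:])
    if converge_loss <= tol:                      # convergeLoss ~ 0: capacity is enough
        return layer_sizes, latent_dim
    if model_dim > latent_dim:                    # enlarge latentDim
        return layer_sizes, latent_dim * 2
    # modelDim <= latentDim: increase #ReLUs by widening the hidden layer
    return layer_sizes[:-1] + [layer_sizes[-1] * 2], latent_dim
```

For example, with layerSize [784, 32] and latentDim 64, modelDim is 64 <= latentDim, so the rule widens the hidden layer to [784, 64]; in the log below, that widening is the step that first brought the convergeLoss from 80 down to 65.
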
To verify this:

@ exp latentDim 64, convergeLoss 80, layerSize [784, 32]:
if one increases the latentDim, the convergeLoss should not drop below 80.

Let's check:
@ exp latentDim 128, convergeLoss 80, layerSize [784, 32], convergeLoss 80

Now stack a second ReLU layer, [784, 32, 32], which presumably represents 128 dimensions:
@ exp latentDim 128, convergeLoss 80, layerSize [784, 32, 32], convergeLoss 80 (still the same)

As you can see, without enlarging the foremost hidden dimension, the deeper ReLU does not help. This is consistent with Raghu (2017).

Now make it wider, [784, 64]:
@ exp_1555829642 latentDim 128, convergeLoss 80, layerSize [784, 64], the convergeLoss 65 < 80

Make it wider still, [784, 128]:
@ exp_1555829642 latentDim 128, convergeLoss 55, layerSize [784, 128], the convergeLoss 55 < 80

And wider again, [784, 256]:
@ exp_1555832143 latentDim 128, convergeLoss 55, layerSize [784, 256], the convergeLoss 55 = 55

Maybe the problem is latentDim; make sure the latentDim is sufficient:
@ exp_1555832638 latentDim 256, convergeLoss 55, layerSize [784, 256], the convergeLoss 55 = 55

===> Question! How can the latentDim be determined with less effort, without going through this cumbersome experimental procedure?

Again checking whether the latentDim is sufficient:
@ exp_1555832638 latentDim 128, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 65 > 55
@ exp_1555832638 latentDim 256, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 68 > 55
@ exp_1555832638 latentDim 64, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 68 > 55
@ exp_1555832638 latentDim 128, convergeLoss 60, layerSize [784, 256, 128], the convergeLoss 60 > 55
@ exp_1555834546 latentDim 64, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 55 = 55

=====> decreasing the latentDim makes the model learn better (Q1)

@ exp_1555834546 latentDim 32, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 60 > 55

The configurations that currently reach 55 are:
- [784, 128], ld 128
- [784, 128], ld 256
- [784, 256, 256], ld 64

@ 1555843696, ld64 [784, 128, 128], convergeLoss 60 > 55
@ 1555844254, ld128 [784, 128, 128], convergeLoss 64 > 55
@ 1555844254, ld32 [784, 128, 128], convergeLoss 66 > 55

Not sure why, but when the network is deeper, too large a latent space decreases the learning efficiency (Q1).

@ exp_1555832638 latentDim 32, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 55 = 55

Maybe, if the modelDim is too big and the latentDim is too small, as seen in exp [784, 32, 32], training might not work. Thus, we raise the latentDim in that same setting from 128 to 256:
@ exp_1555830495 convergeLoss 80 (still the same)