\subsection{Discussion and Empirical Difference between RaE and \algopt}\label{subsec:discussion}
% \iseb{let's focus on the other sections first...}
% We have some hand-waving arguments:
% \begin{itemize}
%  \item if the mask converges (we observe almost convergence), then our pruned iterate ends up at a local minima; but for RaE only the non-pruned iterate converges to a local minima; it is highly unlikely that the pruned would be good as well \seb{we could also keep the mask fix at the end, to get a better performance, do we do this?}\dan{yes, we tried finetuning the fixed mask, there is a quite significant improvement (Table~\ref{tab:sota_dnns_cifar10_unstructured_pruning_baseline_performance_extra_training}), but incorporating it in the method description can be overcomplicated for the general story}\seb{somehow this is covered by the retraining phase, so also the other guys get a local minima; I agree we should mention this, but probably is not the only key point...}
%  \item avoids sharp minima, no good theory for that
%  \item $\rightarrow$ or maybe we just focus on the advantages (faster, better), and don't try to cook up too much illustrative math...

% \end{itemize}

Here we discuss the qualitative difference that leads to the superior performance of \algopt over RaE in practice. First, Figure~\ref{fig:2D_example} shows that \algopt tends to oscillate between several local minima, whereas RaE, even with finetuning, converges to a single solution that is not necessarily close to the optimal one. We believe that this oscillating behavior helps \algopt explore more masks and thus find a better local minimum (which can be improved even further, as shown in Table~\ref{tab:sota_dnns_cifar10_unstructured_pruning_baseline_performance_extra_training}). However, since the space of all masks is exponentially large, the model examines only a small fraction of potential masks during training. We therefore analyzed empirically how drastically the masks change between reparametrizations and how likely a masked weight is to become active in the later stages of training. In Figure~\ref{fig:wideresnet28_2_cifar10_unstructured_masks_last_change}, for each weight we computed the \textbf{last} epoch at which its mask value changed, and for each epoch we plot how many weights still change from that epoch onward. For example, at sparsity ratio 95\%, after epoch 157 (i.e., with 43 epochs remaining), only 5\% of the mask elements were still changing. This suggests that, to some extent, the masks converge early in training, and the model subsequently searches for an optimal solution among a small number of masks.

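% The last-change statistic described above can be computed from a history of
% per-epoch mask snapshots. A minimal NumPy sketch, assuming the binary masks
% were saved once per epoch (the function name and input layout are
% illustrative, not part of the method itself):
% \begin{verbatim}
import numpy as np

def last_change_curve(mask_history):
    """mask_history: array of shape (T, n) with one binary mask per epoch.

    Returns an array c of length T where c[t] is the fraction of mask
    elements whose last change happens at epoch t or later, i.e. the
    quantity plotted on the y-axis of the mask-convergence figure.
    """
    masks = np.asarray(mask_history, dtype=bool)
    T, _ = masks.shape
    # flips[t] marks the elements that change between epoch t and t+1
    flips = masks[1:] != masks[:-1]                      # shape (T-1, n)
    ever_flips = flips.any(axis=0)
    # index of the last flip, found as the first True in the reversed history
    rev_argmax = flips[::-1].argmax(axis=0)
    last_change = np.where(ever_flips, (T - 1) - rev_argmax, 0)
    # fraction of elements still changing at epoch t or later
    return np.array([(last_change >= t).mean() for t in range(T)])
% \end{verbatim}
% For instance, an element that flips for the last time at epoch 157 of a
% 200-epoch run contributes to c[t] for all t <= 157 and to none afterwards.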
\begin{figure*}[!h]
   \centering
   \subfigure[Mask convergence over the whole training]{
   \includegraphics[width=0.46\textwidth]{figures/last_change_from0.pdf}
       \label{fig:wideresnet28_2_cifar10_unstructured_masks_last_change_from0}
   }
   \hfill
   \subfigure[Mask convergence over the last 80 epochs]{
       \includegraphics[width=0.46\textwidth]{figures/last_change_from120.pdf}
       \label{fig:wideresnet28_2_cifar10_unstructured_masks_last_change_from120}
   }
   \vspace{-1em}
   \caption{\small{
        These plots show the dynamics of mask convergence when training the model at different sparsity levels.
        The $y$-axis shows how many mask elements are still changing \textbf{after} a given epoch (the $x$-axis).
        The left plot covers the whole training, while the right plot shows the same data for the last 80 epochs
        (after the gradual pruning phase, once the sparsity ratio has reached its final constant value).
   }}
   \label{fig:wideresnet28_2_cifar10_unstructured_masks_last_change}
\end{figure*}