\documentclass[a4paper,11pt]{article}
\title{HW-1 REPORT}
\author{Eral, Mustafa}
\date{March 2015}
\usepackage{amsthm, mathtools}
\usepackage{graphicx}
\usepackage{caption}
\usepackage{placeins}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{corollary}[theorem]{Corollary}
\setlength{\parindent}{24pt}
\begin{document}
\maketitle
\section{Theoretical Estimates}
\subsection{All-to-All Broadcast Algorithm on a Ring}
Each node first sends to one of its neighbors the data it needs to broadcast. In each subsequent step, it forwards the data received from one neighbor to its other neighbor. Each node thus receives all $(p-1)$ pieces of information from the other nodes in $(p-1)$ steps. Pseudocode is as follows:
\bigskip
\noindent\textit{procedure ALL-TO-ALL-BC-RING(my\_id, my\_msg, p, result)\\
begin\\
\indent left := (my\_id - 1) mod p;\\
\indent right := (my\_id + 1) mod p;\\
\indent result := my\_msg;\\
\indent msg := result;\\
\indent for i := 1 to p - 1 do\\
\indent\indent send msg to right;\\
\indent\indent receive msg from left;\\
\indent\indent result := result $\cup$ msg;\\
\indent end for;\\
end ALL-TO-ALL-BC-RING}
\bigskip
\subsubsection{Wall Time Estimation}
If we assume each iteration of the loop in the pseudocode above takes time $t_w$ (receiving a message from the left neighbor and sending the previously received message to the right neighbor), then for $p$ processors on the ring the algorithm takes $(p-1)t_w$ time.
\subsection{All-to-All Broadcast Algorithm on a Hypercube}
The algorithm takes $\log p$ steps. In each iteration, nodes communicate in pairs such that the labels of the nodes communicating with each other in the $i$th iteration differ in the $i$th least significant bit of their binary representations. At each step, the size of the message to be transmitted doubles. Pseudocode is as follows:
\bigskip
\noindent\textit{procedure ALL-TO-ALL-BC-HCUBE(my\_id, my\_msg, d, result)\\
begin\\
\indent result := my\_msg;\\
\indent for i := 0 to d - 1 do\\
\indent\indent partner := my\_id XOR $2^i$;\\
\indent\indent send result to partner;\\
\indent\indent receive msg from partner;\\
\indent\indent result := result $\cup$ msg;\\
\indent end for;\\
end ALL-TO-ALL-BC-HCUBE}
\bigskip
\subsubsection{Wall Time Estimation}
The algorithm needs $\log p$ steps to complete. Assuming each individual message has size $m$ bytes (it is \texttt{sizeof(int)} in the code), the time to send a message at the $i$th step is $t_s + 2^{i-1}t_w m$, where $t_s$ is the start-up time spent on each send and $t_w$ is the per-byte transfer time. The total time for the all-to-all broadcast is then
\[
\sum_{i=1}^{\log p} \left(t_s + 2^{i-1} t_w m\right) = t_s \log p + t_w m (p-1).
\]
Since the message size $m$ is very small in this experiment, we can ignore the term $t_w m (p-1)$. Thus we roughly expect an execution time of $t_s \log p$, which is proportional to $\log p$.
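The $(p-1)$ factor comes from a geometric series; written out step by step:
\[
\sum_{i=1}^{\log p} \left(t_s + 2^{i-1} t_w m\right)
  = t_s \log p + t_w m \sum_{i=1}^{\log p} 2^{i-1}
  = t_s \log p + t_w m\,\bigl(2^{\log p} - 1\bigr)
  = t_s \log p + t_w m\,(p - 1).
\]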
\section{Execution Results}
Note that when obtaining wall times, each combination was run 100 times and the average of these wall times was taken. Execution times are given in the tables and the resulting plots in the figures. The results show that when the number of nodes is doubled, the execution time increases much more rapidly. For that reason, the execution time for the two-node case is not shown in the figures below, so that the other cases are displayed more clearly.
\subsection{Using MPI\_Allgather}
\begin{table}[h]
\begin{center}
\begin{tabular}{|c|c|}
\hline
Number of processors & Execution time (ms) \\
\hline
1 node, 2 proc & 0.002 \\
1 node, 4 proc & 0.004 \\
1 node, 8 proc & 0.008 \\
2 nodes, 16 proc & 0.012 \\
\hline
\end{tabular}
\end{center}
\caption{Execution times when the built-in MPI\_Allgather function is used}
\end{table}
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{buildin.eps}
\caption{Wall time vs.\ $p$ when the built-in MPI\_Allgather function is used}
\end{figure}
\FloatBarrier
\subsection{Using Ring Topology}
\begin{table}[h]
\begin{center}
\begin{tabular}{|c|c|}
\hline
Number of processors & Execution time (ms) \\
\hline
1 node, 2 proc & 0.001 \\
1 node, 4 proc & 0.005 \\
1 node, 8 proc & 0.007 \\
2 nodes, 16 proc & 270.570 \\
\hline
\end{tabular}
\end{center}
\caption{Execution times when the ring topology is used}
\end{table}
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{ring.eps}
\caption{Wall time vs.\ $p$ when the ring topology is used}
\end{figure}
\FloatBarrier
\subsection{Using Hypercube Topology}
\begin{table}[h]
\begin{center}
\begin{tabular}{|c|c|}
\hline
Number of processors & Execution time (ms) \\
\hline
1 node, 2 proc & 0.001 \\
1 node, 4 proc & 0.003 \\
1 node, 8 proc & 0.005 \\
2 nodes, 16 proc & 155.763 \\
\hline
\end{tabular}
\end{center}
\caption{Execution times when the hypercube topology is used}
\end{table}
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{cube.eps}
\caption{Wall time vs.\ $p$ when the hypercube topology is used}
\end{figure}
\FloatBarrier
\section{Conclusion}
Although the ring and hypercube implementations are successful in the sense that they give correct results, the built-in MPI function performs better, especially when the processes are on different nodes. This is because communication between different nodes is more costly, since it requires the external link between nodes, whereas within a single node processes can use shared memory to transfer data, which is much faster. The built-in function remains fast even when the number of nodes is doubled, probably because its implementation does not rely on MPI\_Send and MPI\_Recv but on lower-level primitives that make better use of the underlying hardware.
Note also that in the ring implementation the wall time increases more rapidly with $p$ than in the hypercube implementation (apart from the jump when the number of nodes increases), because the ring time is linear in $p$ while the hypercube time grows as $\log p$.
\end{document}