diff --git a/report.tex b/report.tex index d23ecd1..ed99d29 100644 --- a/report.tex +++ b/report.tex @@ -1,11 +1,10 @@ -\documentclass{article} -\usepackage[margin=1in]{geometry} +\documentclass[conference,twocolumn]{IEEEtran} \usepackage[skip=0.2\baselineskip]{caption} \usepackage{longtable} \usepackage{booktabs} \usepackage{graphicx} -\title{High Performance Computer Architecture Final Project} -\author{Danila Fedorin} +\title{ECE 570 Final Project Report} +\author{Danila Fedorin\\fedorind@oregonstate.edu} \begin{document} \maketitle @@ -19,24 +18,23 @@ Results are grouped by benchmark to make it easier to compare various branch prediction algorithms. \begin{figure}[h] -\begin{longtable}[]{@{}llllll@{}} +\begin{tabular}[]{@{}llllll@{}} \toprule Benchkmark & Taken & Not Taken & Bimod & 2 level & -Combined\tabularnewline +Comb \\ \midrule -\endhead -Anagram & .3126 & .3126 & .9613 & .8717 & .9742\tabularnewline -GCC & .4049 & .4049 & .8661 & .7668 & .8793\tabularnewline -Go & .3782 & .3782 & .7822 & .6768 & .7906\tabularnewline +Anagram & .3126 & .3126 & .9613 & .8717 & .9742 \\ +Go & .3782 & .3782 & .7822 & .6768 & .7906 \\ +GCC & .4049 & .4049 & .8661 & .7668 & .8793 \\ \bottomrule -\end{longtable} +\end{tabular} \caption{Address prediction rates of various predictors} \label{fig:ap1} \end{figure} \begin{figure}[h] \begin{center} - \includegraphics[width=0.65\linewidth]{ap1.png} + \includegraphics[width=\linewidth]{ap1.png} \end{center} \caption{Address prediction rates by benchmark} \label{fig:ap1graph} @@ -53,6 +51,59 @@ a combination of the other two stateful predictors, performs better than its constituents, since it's able to switch to a better-performing predictor as needed. +I was confused why the \emph{Taken} and \emph{Not Taken} +predictors had identical address prediction rates. 
I would +have expected the \emph{Taken} predictor to correctly predict +more addresses, since structures like loops will typically +have more ``taken'' branches than ``not taken'' ones. At first, +I thought that this was explained by both stateless predictors +having no BTB - functions like \texttt{bpred\_update} +do not initialize these tables, and they are not used for +prediction. However, this shouldn't entirely account for the identical +numbers of address hits - after all, the \emph{Taken} predictor +should always return the expected target address, while the +\emph{Not Taken} predictor should, in the case of conditional +jumps, return \texttt{PC+1}. This seems consistent with the code. + +However, I think I see what is happening. I looked at the following +fragment from \texttt{sim-outorder.c} (which was \textbf{not} added by me): + +\begin{verbatim} +bpred_lookup(pred, + /* branch address */fetch_regs_PC, + /* target address */ + /* FIXME: not computed */0, + ... +\end{verbatim} + +It seems as though the target address is always predicted to be zero, +because it is not computed at the time of this function call. The +text ``FIXME'' indicates that this may be a bug or temporary issue. +This prediction, in turn, seems to mean that the \emph{Taken} branch predictor +will return \texttt{0} in all cases. I confirmed that this is the case by adding a call +to \texttt{printf} to the \texttt{BPredTaken} case of \texttt{bpred\_lookup}. + +To me, this seems like an issue, because code for other predictors uses +\texttt{0} to represent ``not taken''. Consider, for instance, the following +snippet from later on in the same function: + +\begin{verbatim} +return + ((*(dir_update_ptr->pdir1) >= pred_cutoff) + ? /* taken */ pbtb->target + : /* not taken */ 0); +\end{verbatim} + +Zero here is clearly used to denote ``not taken''. So, it seems as though +all in all, \emph{Taken} always returns ``not taken''. Amusingly, +the same will be the case with \emph{Not Taken}. 
It returns \texttt{PC+1} +in the case of conditional jumps (which is equivalent to returning zero, +since the code in \texttt{sim-outorder.c} converts zero to \texttt{PC+1}), +or, in the case of unconditional jumps, it returns the expected target +address (zero), which is \textit{also} \texttt{PC+1}! The fact that +the two predictors have the same address prediction rate seems +to be due to the ``FIXME'' in the simulator code. + \section*{Part 2: IPC Benchmarks} In this section, we present the IPC results from the previously listed predictors. Figure \ref{fig:ipc} contains the collected @@ -60,24 +111,23 @@ data, and Figure \ref{fig:ipcgraph} is a bar chart of that data. \begin{figure}[h] -\begin{longtable}[]{@{}llllll@{}} +\begin{tabular}[]{@{}llllll@{}} \toprule Benchkmark & Taken & Not Taken & Bimod & 2 level & -Combined\tabularnewline +Comb \\ \midrule -\endhead -Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487\tabularnewline -GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598\tabularnewline -Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393\tabularnewline +Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487 \\ +Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393 \\ +GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598 \\ \bottomrule -\end{longtable} +\end{tabular} \caption{IPC by benchmark} \label{fig:ipc} \end{figure} \begin{figure}[h] \begin{center} - \includegraphics[width=0.65\linewidth]{ipc.png} + \includegraphics[width=\linewidth]{ipc.png} \end{center} \caption{IPC by benchmark} \label{fig:ipcgraph} @@ -90,47 +140,57 @@ because most of the given programs have loops, in which the conditional branch is taken many times while the loop is iterating, and then once when the loop terminates. Predicting ``not taken'' in this case would lead to many mispredictions. +However, as described above, it seems like \emph{Taken} +and \emph{Not Taken} return the same addresses, so I'm not +completely sure where the IPC difference is coming from. 
Once again, the \emph{Bimodal} predictor performs better than -the \emph{2-Level} predictor, and both are outperform by +the \emph{2-Level} predictor, and both are outperformed by \emph{Combined}, which leverages the two at the same time. \section*{Part 3 - Bimodal Exploration} In this section, the \emph{Bimodal} branch predictor is further analyzed by varying the size of the BTB. BTB sizes range from 256 to 4096. The data collected from this analysis is shown -in figure \ref{fig:ap2}. As usual, the data is shown as -a bar graph in figure \ref{fig:ap2graph}. +in Figure \ref{fig:ap2}. As usual, the data is shown as +a bar graph in Figure \ref{fig:ap2graph}. \begin{figure}[h] -\begin{longtable}[]{@{}llllll@{}} +\begin{tabular}[]{@{}llllll@{}} \toprule -Benchkmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline +Benchkmark & 256 & 512 & 1024 & 2048 & 4096 \\ \midrule -\endhead -Anagram & .9606 & .9609 & .9612 & .9613 & .9613\tabularnewline -GCC & .8158 & .8371 & .8554 & .8661 & .8726\tabularnewline -Go & .7430 & .7610 & .7731 & .7822 & .7885\tabularnewline +Anagram & .9606 & .9609 & .9612 & .9613 & .9613 \\ +Go & .7430 & .7610 & .7731 & .7822 & .7885 \\ +GCC & .8158 & .8371 & .8554 & .8661 & .8726 \\ \bottomrule -\end{longtable} +\end{tabular} \caption{Bimodal address prediction rates by benchmark} \label{fig:ap2} \end{figure} -\pagebreak - \begin{figure}[h] \begin{center} - \includegraphics[width=0.65\linewidth]{ap2.png} + \includegraphics[width=\linewidth]{ap2.png} \end{center} - \caption{IPC by benchmark} + \caption{Bimodal address prediction by benchmark} \label{fig:ap2graph} \end{figure} As expected, increasing the BTB size for the Bimodal -predictor seems to improve its performance. The exception -appears to be anagram, where the changes to performance -are small enough to be unnoticable in the visualization. +predictor seems to improve its performance. 
Since instructions +are assigned slots in the BTB according to their hashes (which can collide), +having a larger BTB means that there is a smaller chance of collisions, +and, therefore, that branch targets are more accurately predicted. + +The exception appears to be the Anagram benchmark, where the changes to performance +are small enough to be unnoticeable in the visualization. This +could be because the Anagram benchmark has only a few important +branches, which means that increasing the BTB size does not +prevent any further collisions. The benchmark also takes less real +time to run on my machine, which is an indicator that it is +less complex than the Go and GCC benchmarks (which further supports +the above theory). \section*{Part 4 - Combined Branch Predictor Explanation} It appears as though the combined branch predictor works @@ -140,13 +200,16 @@ to, the combined predictor uses a third predictor, named \texttt{meta} in the code. The \texttt{meta} predictor appears to be another bimodal predictor, but instead of deciding whether a branch is taken or not taken, it decides whether to use the two-level or the bimodal predictor -to determine the branch outcome. If \texttt{meta} chooses a predictor -that ends up being wrong, while the other predictor ends up right, -\texttt{meta}'s 2-bit counter is updated to favor the correct predictor. +to determine the branch outcome. If the two predictors managed by +\texttt{meta} disagree about the direction, then \texttt{meta}'s +2-bit counter is updated to favor the correct predictor. Otherwise, +if both predictors predict ``taken'' or both predict ``not taken'', +\texttt{meta} is unaffected, since neither predictor did better +than the other. Because \texttt{meta} is implemented as a 2-bit predictor, it can tolerate at most one use of the wrong branch predictor before -switching to the other (if the current predictor is "strongly" +switching to the other (if the current predictor is ``strongly'' predicted). 
\section*{Part 5 - 3-Bit Branch Predictor} @@ -160,23 +223,22 @@ and Figure \ref{fig:ap3graph} contains the visualization of that data. \begin{figure}[h] -\begin{longtable}[]{@{}llllll@{}} +\begin{tabular}[]{@{}llllll@{}} \toprule -Benchkmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline +Benchkmark & 256 & 512 & 1024 & 2048 & 4096 \\ \midrule -\endhead -Anagram & .9610 & .9612 & .9615 & .9616 & .9616\tabularnewline -GCC & .8192 & .8385 & .8554 & .8656 & .8728\tabularnewline -Go & .7507 & .7680 & .7799 & .7897 & .7966\tabularnewline +Anagram & .9610 & .9612 & .9615 & .9616 & .9616 \\ +Go & .7507 & .7680 & .7799 & .7897 & .7966 \\ +GCC & .8192 & .8385 & .8554 & .8656 & .8728 \\ \bottomrule -\end{longtable} +\end{tabular} \caption{3-Bit address prediction rates} \label{fig:ap3} \end{figure} \begin{figure}[h] \begin{center} - \includegraphics[width=0.65\linewidth]{ap3.png} + \includegraphics[width=\linewidth]{ap3.png} \end{center} \caption{3-Bit address prediction rates} \label{fig:ap3graph} @@ -212,7 +274,7 @@ but perhaps it also follows the same pattern. \begin{figure}[h] \begin{center} - \includegraphics[width=0.65\linewidth]{2v3.png} + \includegraphics[width=\linewidth]{2v3.png} \end{center} \caption{Percent improvement of 3-bit predictor over the bimodal predictor.} \label{fig:2v3}