Add 'bug report' to report.

Danila Fedorin 2020-12-06 17:40:01 -08:00
parent bb1564d522
commit 358121de55


@ -1,11 +1,10 @@
\documentclass{article} \documentclass[conference,twocolumn]{IEEEtran}
\usepackage[margin=1in]{geometry}
\usepackage[skip=0.2\baselineskip]{caption} \usepackage[skip=0.2\baselineskip]{caption}
\usepackage{longtable} \usepackage{longtable}
\usepackage{booktabs} \usepackage{booktabs}
\usepackage{graphicx} \usepackage{graphicx}
\title{High Performance Computer Architecture Final Project} \title{ECE 570 Final Project Report}
\author{Danila Fedorin} \author{Danila Fedorin\\fedorind@oregonstate.edu}
\begin{document} \begin{document}
\maketitle \maketitle
@ -19,24 +18,23 @@ Results are grouped by benchmark to make it easier to compare
various branch prediction algorithms. various branch prediction algorithms.
\begin{figure}[h] \begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}} \begin{tabular}[]{@{}llllll@{}}
\toprule \toprule
Benchmark & Taken & Not Taken & Bimod & 2 level & Benchmark & Taken & Not Taken & Bimod & 2 level &
Combined\tabularnewline Comb \\
\midrule \midrule
\endhead Anagram & .3126 & .3126 & .9613 & .8717 & .9742 \\
Anagram & .3126 & .3126 & .9613 & .8717 & .9742\tabularnewline Go & .3782 & .3782 & .7822 & .6768 & .7906 \\
GCC & .4049 & .4049 & .8661 & .7668 & .8793\tabularnewline GCC & .4049 & .4049 & .8661 & .7668 & .8793 \\
Go & .3782 & .3782 & .7822 & .6768 & .7906\tabularnewline
\bottomrule \bottomrule
\end{longtable} \end{tabular}
\caption{Address prediction rates of various predictors} \caption{Address prediction rates of various predictors}
\label{fig:ap1} \label{fig:ap1}
\end{figure} \end{figure}
\begin{figure}[h] \begin{figure}[h]
\begin{center} \begin{center}
\includegraphics[width=0.65\linewidth]{ap1.png} \includegraphics[width=\linewidth]{ap1.png}
\end{center} \end{center}
\caption{Address prediction rates by benchmark} \caption{Address prediction rates by benchmark}
\label{fig:ap1graph} \label{fig:ap1graph}
@ -53,6 +51,59 @@ a combination of the other two stateful predictors, performs
better than its constituents, since it's able to switch better than its constituents, since it's able to switch
to a better-performing predictor as needed. to a better-performing predictor as needed.
I was initially confused as to why the \emph{Taken} and \emph{Not Taken}
predictors had identical address prediction rates. I would
have expected the \emph{Taken} predictor to correctly predict
more addresses, since structures like loops typically
have more ``taken'' branches than ``not taken'' ones. At first,
I thought that this was explained by both stateless predictors
having no BTB: functions like \texttt{bpred\_update}
do not initialize these tables, and they are not used for
prediction. However, this shouldn't entirely account for the identical
numbers of address hits. After all, the \emph{Taken} predictor
should always return the expected target address, while the
\emph{Not Taken} predictor should, in the case of conditional
jumps, return \texttt{PC+1}. This seems consistent with the code.
However, I think I see what is happening. I looked at the following
fragment from \texttt{sim-outorder.c} (which was \textbf{not} added by me):
\begin{verbatim}
bpred_lookup(pred,
/* branch address */fetch_regs_PC,
/* target address */
/* FIXME: not computed */0,
...
\end{verbatim}
It seems as though the target address is always predicted to be zero,
because it is not computed at the time of this function call. The
text ``FIXME'' indicates that this may be a bug or temporary issue.
This prediction, in turn, seems to mean that the \emph{Taken} branch predictor
will return \texttt{0} in all cases. I confirmed that this is the case by adding a call
to \texttt{printf} to the \texttt{BPredTaken} case of \texttt{bpred\_lookup}.
To me, this seems like an issue, because code for other predictors uses
\texttt{0} to represent ``not taken''. Consider, for instance, the following
snippet from later on in the same function:
\begin{verbatim}
return
((*(dir_update_ptr->pdir1) >= pred_cutoff)
? /* taken */ pbtb->target
: /* not taken */ 0);
\end{verbatim}
Zero here is clearly used to denote ``not taken''. So, all in all, it seems
that \emph{Taken} always returns ``not taken''. Amusingly,
the same is the case with \emph{Not Taken}. It returns \texttt{PC+1}
in the case of conditional jumps (which is equivalent to returning zero,
since the code in \texttt{sim-outorder.c} converts zero to \texttt{PC+1}),
or, in the case of unconditional jumps, it returns the expected target
address (zero), which is \textit{also} converted to \texttt{PC+1}! The fact that
the two predictors have the same address prediction rate therefore seems
to be due to the ``FIXME'' in the simulator code.
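
The reasoning above can be sketched as a small model. This is not the simulator's code: the names \texttt{model\_lookup} and \texttt{model\_next\_pc} are hypothetical, and \texttt{PC+1} follows the notation of this report rather than the simulator's actual instruction size. The only facts taken from the source are that the caller passes zero as the target address and that zero is later treated as ``not taken''.

```c
#include <stdint.h>

typedef uint32_t addr_t;

/* Hypothetical stand-ins for the two stateless predictor classes. */
enum pred_class { PRED_TAKEN, PRED_NOT_TAKEN };

/* Simplified model of the lookup for the stateless predictors.
 * btarget is whatever target address the caller supplies. */
static addr_t model_lookup(enum pred_class cls, int is_conditional,
                           addr_t pc, addr_t btarget)
{
    if (cls == PRED_TAKEN)
        return btarget;                       /* always the supplied target */
    /* Not Taken: fall through for conditional jumps, target otherwise. */
    return is_conditional ? pc + 1 : btarget;
}

/* Simplified model of the fetch stage: the target is passed as zero
 * (the "FIXME: not computed" argument), and a zero result is treated
 * as "not taken", i.e. converted to PC+1. */
static addr_t model_next_pc(enum pred_class cls, int is_conditional, addr_t pc)
{
    addr_t predicted = model_lookup(cls, is_conditional, pc, 0);
    return predicted != 0 ? predicted : pc + 1;
}
```

Under these assumptions, calling \texttt{model\_next\_pc} with either predictor class yields \texttt{PC+1} for both conditional and unconditional jumps, which is consistent with the identical address prediction rates.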
\section*{Part 2: IPC Benchmarks} \section*{Part 2: IPC Benchmarks}
In this section, we present the IPC results from the previously listed In this section, we present the IPC results from the previously listed
predictors. Figure \ref{fig:ipc} contains the collected predictors. Figure \ref{fig:ipc} contains the collected
@ -60,24 +111,23 @@ data, and Figure \ref{fig:ipcgraph} is a bar chart of
that data. that data.
\begin{figure}[h] \begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}} \begin{tabular}[]{@{}llllll@{}}
\toprule \toprule
Benchmark & Taken & Not Taken & Bimod & 2 level & Benchmark & Taken & Not Taken & Bimod & 2 level &
Combined\tabularnewline Comb \\
\midrule \midrule
\endhead Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487 \\
Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487\tabularnewline Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393 \\
GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598\tabularnewline GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598 \\
Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393\tabularnewline
\bottomrule \bottomrule
\end{longtable} \end{tabular}
\caption{IPC by benchmark} \caption{IPC by benchmark}
\label{fig:ipc} \label{fig:ipc}
\end{figure} \end{figure}
\begin{figure}[h] \begin{figure}[h]
\begin{center} \begin{center}
\includegraphics[width=0.65\linewidth]{ipc.png} \includegraphics[width=\linewidth]{ipc.png}
\end{center} \end{center}
\caption{IPC by benchmark} \caption{IPC by benchmark}
\label{fig:ipcgraph} \label{fig:ipcgraph}
@ -90,47 +140,57 @@ because most of the given programs have loops, in which
the conditional branch is taken many times while the loop the conditional branch is taken many times while the loop
is iterating, and then once when the loop terminates. Predicting is iterating, and then once when the loop terminates. Predicting
``not taken'' in this case would lead to many mispredictions. ``not taken'' in this case would lead to many mispredictions.
However, as described above, it seems like \emph{Taken}
and \emph{Not Taken} return the same addresses, so I'm not
completely sure where the IPC difference is coming from.
Once again, the \emph{Bimodal} predictor performs better than Once again, the \emph{Bimodal} predictor performs better than
the \emph{2-Level} predictor, and both are outperform by the \emph{2-Level} predictor, and both are outperformed by
\emph{Combined}, which leverages the two at the same time. \emph{Combined}, which leverages the two at the same time.
\section*{Part 3 - Bimodal Exploration} \section*{Part 3 - Bimodal Exploration}
In this section, the \emph{Bimodal} branch predictor is further In this section, the \emph{Bimodal} branch predictor is further
analyzed by varying the size of the BTB. BTB sizes range from analyzed by varying the size of the BTB. BTB sizes range from
256 to 4096. The data collected from this analysis is shown 256 to 4096. The data collected from this analysis is shown
in figure \ref{fig:ap2}. As usual, the data is shown as in Figure \ref{fig:ap2}. As usual, the data is shown as
a bar graph in figure \ref{fig:ap2graph}. a bar graph in Figure \ref{fig:ap2graph}.
\begin{figure}[h] \begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}} \begin{tabular}[]{@{}llllll@{}}
\toprule \toprule
Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline Benchmark & 256 & 512 & 1024 & 2048 & 4096 \\
\midrule \midrule
\endhead Anagram & .9606 & .9609 & .9612 & .9613 & .9613 \\
Anagram & .9606 & .9609 & .9612 & .9613 & .9613\tabularnewline Go & .7430 & .7610 & .7731 & .7822 & .7885 \\
GCC & .8158 & .8371 & .8554 & .8661 & .8726\tabularnewline GCC & .8158 & .8371 & .8554 & .8661 & .8726 \\
Go & .7430 & .7610 & .7731 & .7822 & .7885\tabularnewline
\bottomrule \bottomrule
\end{longtable} \end{tabular}
\caption{Bimodal address prediction rates by benchmark} \caption{Bimodal address prediction rates by benchmark}
\label{fig:ap2} \label{fig:ap2}
\end{figure} \end{figure}
\pagebreak
\begin{figure}[h] \begin{figure}[h]
\begin{center} \begin{center}
\includegraphics[width=0.65\linewidth]{ap2.png} \includegraphics[width=\linewidth]{ap2.png}
\end{center} \end{center}
\caption{IPC by benchmark} \caption{Bimodal address prediction by benchmark}
\label{fig:ap2graph} \label{fig:ap2graph}
\end{figure} \end{figure}
As expected, increasing the BTB size for the Bimodal As expected, increasing the BTB size for the Bimodal
predictor seems to improve its performance. The exception predictor seems to improve its performance. Since instructions
appears to be anagram, where the changes to performance are assigned slots in the BTB according to their hashes (which can collide),
are small enough to be unnoticeable in the visualization. having a larger BTB means that there is a smaller chance of collisions,
and, therefore, that branch targets are more accurately predicted.
The exception appears to be the Anagram benchmark, where the changes to performance
are small enough to be unnoticeable in the visualization. This
could be because the Anagram benchmark has only a few important
branches, which means that increasing the BTB size does not
prevent any further collisions. The benchmark also takes less real
time to run on my machine, which is an indicator that it is
less complex than the Go and GCC benchmarks (which further supports
the above theory).
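
The collision argument can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the index function (drop the low three bits, then mask by the table size), the name \texttt{count\_conflicts}, and the four made-up branch addresses in \texttt{sample\_pcs}. It is not the simulator's hashing code, and the addresses are not taken from any benchmark.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BTB_SETS 4096

/* Made-up branch addresses, chosen so that they alias in small tables. */
static const uint32_t sample_pcs[4] = { 0x0000, 0x0800, 0x1000, 0x2000 };

/* Hypothetical index function: drop the low bits, keep log2(sets) bits.
 * sets must be a power of two. */
static size_t btb_index(uint32_t pc, size_t sets)
{
    return (size_t)(pc >> 3) & (sets - 1);
}

/* Count how many of the n distinct branch addresses land in a slot that an
 * earlier address already occupies. Returns (size_t)-1 on a bad size. */
static size_t count_conflicts(const uint32_t *pcs, size_t n, size_t sets)
{
    unsigned char used[MAX_BTB_SETS];
    size_t conflicts = 0;

    if (sets == 0 || sets > MAX_BTB_SETS || (sets & (sets - 1)) != 0)
        return (size_t)-1;

    memset(used, 0, sizeof used);
    for (size_t i = 0; i < n; i++) {
        size_t slot = btb_index(pcs[i], sets);
        if (used[slot])
            conflicts++;      /* slot is shared with an earlier branch */
        else
            used[slot] = 1;
    }
    return conflicts;
}
```

With these sample addresses, the number of conflicts shrinks as the table grows from 256 to 2048 entries and reaches zero once every branch has its own slot. A program with only a few hot branches would reach that point at a small table size, after which a larger BTB cannot help.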
\section*{Part 4 - Combined Branch Predictor Explanation} \section*{Part 4 - Combined Branch Predictor Explanation}
It appears as though the combined branch predictor works It appears as though the combined branch predictor works
@ -140,13 +200,16 @@ to, the combined predictor uses a third predictor, named \texttt{meta}
in the code. The \texttt{meta} predictor appears to be another bimodal in the code. The \texttt{meta} predictor appears to be another bimodal
predictor, but instead of deciding whether a branch is taken or not predictor, but instead of deciding whether a branch is taken or not
taken, it decides whether to use the two-level or the bimodal predictor taken, it decides whether to use the two-level or the bimodal predictor
to determine the branch outcome. If \texttt{meta} chooses a predictor to determine the branch outcome. If the two predictors managed by
that ends up being wrong, while the other predictor ends up right, \texttt{meta} disagree about the direction, then \texttt{meta}'s
\texttt{meta}'s 2-bit counter is updated to favor the correct predictor. 2-bit counter is updated to favor the correct predictor. Otherwise,
if the two predictors both predict ``taken'' or ``not taken'',
\texttt{meta} is unaffected, since neither predictor did better
than the other.
Because \texttt{meta} is implemented as a 2-bit predictor, it can Because \texttt{meta} is implemented as a 2-bit predictor, it can
tolerate at most one use of the wrong branch predictor before tolerate at most one use of the wrong branch predictor before
switching to the other (if the current predictor is "strongly" switching to the other (if the current predictor is ``strongly''
predicted). predicted).
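
The update rule described above can be sketched as follows. The function names and the convention that counter values 2 and 3 select the two-level predictor are assumptions made for this illustration, not a copy of the simulator's code.

```c
/* 2-bit saturating chooser, values 0..3.
 * Assumed convention: 0-1 select bimodal, 2-3 select two-level. */
static int meta_selects_twolev(unsigned meta)
{
    return meta >= 2;
}

/* Return the updated chooser value after a branch resolves.
 * Each *_taken argument is 1 for "taken" and 0 for "not taken". */
static unsigned meta_update(unsigned meta, int bimod_taken,
                            int twolev_taken, int actual_taken)
{
    /* Both predictors agree: neither did better, so meta is unchanged. */
    if (bimod_taken == twolev_taken)
        return meta;

    /* They disagree, so exactly one matches the outcome. */
    if (twolev_taken == actual_taken)
        return meta < 3 ? meta + 1 : 3;   /* favor two-level */
    return meta > 0 ? meta - 1 : 0;       /* favor bimodal   */
}
```

Starting from the ``strongly two-level'' state (3), one disagreement won by the bimodal predictor moves the counter to 2, which still selects two-level. A second one moves it to 1, at which point the chooser switches.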
\section*{Part 5 - 3-Bit Branch Predictor} \section*{Part 5 - 3-Bit Branch Predictor}
@ -160,23 +223,22 @@ and Figure \ref{fig:ap3graph} contains the visualization
of that data. of that data.
\begin{figure}[h] \begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}} \begin{tabular}[]{@{}llllll@{}}
\toprule \toprule
Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline Benchmark & 256 & 512 & 1024 & 2048 & 4096 \\
\midrule \midrule
\endhead Anagram & .9610 & .9612 & .9615 & .9616 & .9616 \\
Anagram & .9610 & .9612 & .9615 & .9616 & .9616\tabularnewline Go & .7507 & .7680 & .7799 & .7897 & .7966 \\
GCC & .8192 & .8385 & .8554 & .8656 & .8728\tabularnewline GCC & .8192 & .8385 & .8554 & .8656 & .8728 \\
Go & .7507 & .7680 & .7799 & .7897 & .7966\tabularnewline
\bottomrule \bottomrule
\end{longtable} \end{tabular}
\caption{3-Bit address prediction rates} \caption{3-Bit address prediction rates}
\label{fig:ap3} \label{fig:ap3}
\end{figure} \end{figure}
\begin{figure}[h] \begin{figure}[h]
\begin{center} \begin{center}
\includegraphics[width=0.65\linewidth]{ap3.png} \includegraphics[width=\linewidth]{ap3.png}
\end{center} \end{center}
\caption{3-Bit address prediction rates} \caption{3-Bit address prediction rates}
\label{fig:ap3graph} \label{fig:ap3graph}
@ -212,7 +274,7 @@ but perhaps it also follows the same pattern.
\begin{figure}[h] \begin{figure}[h]
\begin{center} \begin{center}
\includegraphics[width=0.65\linewidth]{2v3.png} \includegraphics[width=\linewidth]{2v3.png}
\end{center} \end{center}
\caption{Percent improvement of 3-bit predictor over the bimodal predictor.} \caption{Percent improvement of 3-bit predictor over the bimodal predictor.}
\label{fig:2v3} \label{fig:2v3}