Add 'bug report' to report.
parent
bb1564d522
commit
358121de55
162
report.tex
|
@ -1,11 +1,10 @@
|
||||||
\documentclass{article}
|
\documentclass[conference,twocolumn]{IEEEtran}
|
||||||
\usepackage[margin=1in]{geometry}
|
|
||||||
\usepackage[skip=0.2\baselineskip]{caption}
|
\usepackage[skip=0.2\baselineskip]{caption}
|
||||||
\usepackage{longtable}
|
\usepackage{longtable}
|
||||||
\usepackage{booktabs}
|
\usepackage{booktabs}
|
||||||
\usepackage{graphicx}
|
\usepackage{graphicx}
|
||||||
\title{High Performance Computer Architecture Final Project}
|
\title{ECE 570 Final Project Report}
|
||||||
\author{Danila Fedorin}
|
\author{Danila Fedorin\\fedorind@oregonstate.edu}
|
||||||
|
|
||||||
\begin{document}
|
\begin{document}
|
||||||
\maketitle
|
\maketitle
|
||||||
|
@ -19,24 +18,23 @@ Results are grouped by benchmark to make it easier to compare
|
||||||
various branch prediction algorithms.
|
various branch prediction algorithms.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{longtable}[]{@{}llllll@{}}
|
\begin{tabular}[]{@{}llllll@{}}
|
||||||
\toprule
|
\toprule
|
||||||
Benchmark & Taken & Not Taken & Bimod & 2 level &
|
Benchmark & Taken & Not Taken & Bimod & 2 level &
|
||||||
Combined\tabularnewline
|
Comb \\
|
||||||
\midrule
|
\midrule
|
||||||
\endhead
|
Anagram & .3126 & .3126 & .9613 & .8717 & .9742 \\
|
||||||
Anagram & .3126 & .3126 & .9613 & .8717 & .9742\tabularnewline
|
Go & .3782 & .3782 & .7822 & .6768 & .7906 \\
|
||||||
GCC & .4049 & .4049 & .8661 & .7668 & .8793\tabularnewline
|
GCC & .4049 & .4049 & .8661 & .7668 & .8793 \\
|
||||||
Go & .3782 & .3782 & .7822 & .6768 & .7906\tabularnewline
|
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{longtable}
|
\end{tabular}
|
||||||
\caption{Address prediction rates of various predictors}
|
\caption{Address prediction rates of various predictors}
|
||||||
\label{fig:ap1}
|
\label{fig:ap1}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{center}
|
\begin{center}
|
||||||
\includegraphics[width=0.65\linewidth]{ap1.png}
|
\includegraphics[width=\linewidth]{ap1.png}
|
||||||
\end{center}
|
\end{center}
|
||||||
\caption{Address prediction rates by benchmark}
|
\caption{Address prediction rates by benchmark}
|
||||||
\label{fig:ap1graph}
|
\label{fig:ap1graph}
|
||||||
|
@ -53,6 +51,59 @@ a combination of the other two stateful predictors, performs
|
||||||
better than its constituents, since it's able to switch
|
better than its constituents, since it's able to switch
|
||||||
to a better-performing predictor as needed.
|
to a better-performing predictor as needed.
|
||||||
|
|
||||||
|
I was confused why the \emph{Taken} and \emph{Not Taken}
|
||||||
|
predictors had identical address prediction rates. I would
|
||||||
|
have expected the \emph{Taken} predictor to correctly predict
|
||||||
|
more addresses, since structures like loops will typically
|
||||||
|
have more ``taken'' branches than ``not taken'' ones. At first,
|
||||||
|
I thought that this was explained by both stateless predictors
|
||||||
|
having no BTB - functions like \texttt{bpred\_update}
|
||||||
|
do not initialize these tables, and they are not used for
|
||||||
|
prediction. However, this shouldn't entirely account for the identical
|
||||||
|
numbers of address hits - after all, the \emph{Taken} predictor
|
||||||
|
should always return the expected target address, while the
|
||||||
|
\emph{Not Taken} predictor should, in the case of conditional
|
||||||
|
jumps, return \texttt{PC+1}. This seems consistent with the code.
|
||||||
|
|
||||||
|
However, I think I see what is happening. I looked at the following
|
||||||
|
fragment from \texttt{sim-outorder.c} (which was \textbf{not} added by me):
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
bpred_lookup(pred,
|
||||||
|
/* branch address */fetch_regs_PC,
|
||||||
|
/* target address */
|
||||||
|
/* FIXME: not computed */0,
|
||||||
|
...
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
It seems as though the target address is always predicted to be zero,
|
||||||
|
because it is not computed at the time of this function call. The
|
||||||
|
text ``FIXME'' indicates that this may be a bug or temporary issue.
|
||||||
|
This prediction, in turn, seems to mean that the \emph{Taken} branch predictor
|
||||||
|
will return \texttt{0} in all cases. I confirmed that this is the case by adding a call
|
||||||
|
to \texttt{printf} to the \texttt{BPredTaken} case of \texttt{bpred\_lookup}.
|
||||||
|
|
||||||
|
To me, this seems like an issue, because code for other predictors uses
|
||||||
|
\texttt{0} to represent ``not taken''. Consider, for instance, the following
|
||||||
|
snippet from later on in the same function:
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
return
|
||||||
|
((*(dir_update_ptr->pdir1) >= pred_cutoff)
|
||||||
|
? /* taken */ pbtb->target
|
||||||
|
: /* not taken */ 0);
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
Zero here is clearly used to denote ``not taken''. So, it seems as though
|
||||||
|
all in all, \emph{Taken} always returns ``not taken''. Amusingly,
|
||||||
|
the same will be the case with \emph{Not Taken}. It returns \texttt{PC+1}
|
||||||
|
in the case of conditional jumps (which is equivalent to returning zero,
|
||||||
|
since the code in \texttt{sim-outorder.c} converts zero to \texttt{PC+1}),
|
||||||
|
or, in the case of unconditional jumps, it returns the expected target
|
||||||
|
address (zero), which is \textit{also} converted to \texttt{PC+1}! The fact that
|
||||||
|
the two predictors have the same address prediction rate seems
|
||||||
|
to be due to the ``FIXME'' in the simulator code.
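
The reasoning above can be sketched as a small C model. This is a simplified illustration, not the simulator's code: the three function names are hypothetical, the instruction size is taken to be 1 to match the \texttt{PC+1} notation used here, and the only behavior modeled is what is described above (the caller passes a target of zero, and zero is treated as ``fall through'').

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t addr_t;

/* Hypothetical stand-in for the Taken case: it simply hands back
 * whatever target address the caller supplied. */
addr_t lookup_taken(addr_t pc, addr_t btarget, bool is_cond)
{
    (void)pc;
    (void)is_cond;
    return btarget;
}

/* Hypothetical stand-in for the Not Taken case: conditional branches
 * fall through, everything else returns the supplied target. */
addr_t lookup_not_taken(addr_t pc, addr_t btarget, bool is_cond)
{
    return is_cond ? pc + 1 : btarget;
}

/* Model of the fetch stage: a returned zero means "not taken",
 * so fetch continues at the next instruction. */
addr_t next_fetch_pc(addr_t pc, addr_t predicted)
{
    return predicted != 0 ? predicted : pc + 1;
}
```

With the target hard-coded to zero, both stand-ins lead to the same next fetch address for conditional and unconditional branches alike, which would be consistent with the identical address prediction rates in the table.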
|
||||||
|
|
||||||
\section*{Part 2: IPC Benchmarks}
|
\section*{Part 2: IPC Benchmarks}
|
||||||
In this section, we present the IPC results from the previously listed
|
In this section, we present the IPC results from the previously listed
|
||||||
predictors. Figure \ref{fig:ipc} contains the collected
|
predictors. Figure \ref{fig:ipc} contains the collected
|
||||||
|
@ -60,24 +111,23 @@ data, and Figure \ref{fig:ipcgraph} is a bar chart of
|
||||||
that data.
|
that data.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{longtable}[]{@{}llllll@{}}
|
\begin{tabular}[]{@{}llllll@{}}
|
||||||
\toprule
|
\toprule
|
||||||
Benchmark & Taken & Not Taken & Bimod & 2 level &
|
Benchmark & Taken & Not Taken & Bimod & 2 level &
|
||||||
Combined\tabularnewline
|
Comb \\
|
||||||
\midrule
|
\midrule
|
||||||
\endhead
|
Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487 \\
|
||||||
Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487\tabularnewline
|
Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393 \\
|
||||||
GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598\tabularnewline
|
GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598 \\
|
||||||
Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393\tabularnewline
|
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{longtable}
|
\end{tabular}
|
||||||
\caption{IPC by benchmark}
|
\caption{IPC by benchmark}
|
||||||
\label{fig:ipc}
|
\label{fig:ipc}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{center}
|
\begin{center}
|
||||||
\includegraphics[width=0.65\linewidth]{ipc.png}
|
\includegraphics[width=\linewidth]{ipc.png}
|
||||||
\end{center}
|
\end{center}
|
||||||
\caption{IPC by benchmark}
|
\caption{IPC by benchmark}
|
||||||
\label{fig:ipcgraph}
|
\label{fig:ipcgraph}
|
||||||
|
@ -90,47 +140,57 @@ because most of the given programs have loops, in which
|
||||||
the conditional branch is taken many times while the loop
|
the conditional branch is taken many times while the loop
|
||||||
is iterating, and then once when the loop terminates. Predicting
|
is iterating, and then once when the loop terminates. Predicting
|
||||||
``not taken'' in this case would lead to many mispredictions.
|
``not taken'' in this case would lead to many mispredictions.
|
||||||
|
However, as described above, it seems like \emph{Taken}
|
||||||
|
and \emph{Not Taken} return the same addresses, so I'm not
|
||||||
|
completely sure where the IPC difference is coming from.
|
||||||
|
|
||||||
Once again, the \emph{Bimodal} predictor performs better than
|
Once again, the \emph{Bimodal} predictor performs better than
|
||||||
the \emph{2-Level} predictor, and both are outperform by
|
the \emph{2-Level} predictor, and both are outperformed by
|
||||||
\emph{Combined}, which leverages the two at the same time.
|
\emph{Combined}, which leverages the two at the same time.
|
||||||
|
|
||||||
\section*{Part 3 - Bimodal Exploration}
|
\section*{Part 3 - Bimodal Exploration}
|
||||||
In this section, the \emph{Bimodal} branch predictor is further
|
In this section, the \emph{Bimodal} branch predictor is further
|
||||||
analyzed by varying the size of the BTB. BTB sizes range from
|
analyzed by varying the size of the BTB. BTB sizes range from
|
||||||
256 to 4096. The data collected from this analysis is shown
|
256 to 4096. The data collected from this analysis is shown
|
||||||
in figure \ref{fig:ap2}. As usual, the data is shown as
|
in Figure \ref{fig:ap2}. As usual, the data is shown as
|
||||||
a bar graph in figure \ref{fig:ap2graph}.
|
a bar graph in Figure \ref{fig:ap2graph}.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{longtable}[]{@{}llllll@{}}
|
\begin{tabular}[]{@{}llllll@{}}
|
||||||
\toprule
|
\toprule
|
||||||
Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline
|
Benchmark & 256 & 512 & 1024 & 2048 & 4096 \\
|
||||||
\midrule
|
\midrule
|
||||||
\endhead
|
Anagram & .9606 & .9609 & .9612 & .9613 & .9613 \\
|
||||||
Anagram & .9606 & .9609 & .9612 & .9613 & .9613\tabularnewline
|
Go & .7430 & .7610 & .7731 & .7822 & .7885 \\
|
||||||
GCC & .8158 & .8371 & .8554 & .8661 & .8726\tabularnewline
|
GCC & .8158 & .8371 & .8554 & .8661 & .8726 \\
|
||||||
Go & .7430 & .7610 & .7731 & .7822 & .7885\tabularnewline
|
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{longtable}
|
\end{tabular}
|
||||||
\caption{Bimodal address prediction rates by benchmark}
|
\caption{Bimodal address prediction rates by benchmark}
|
||||||
\label{fig:ap2}
|
\label{fig:ap2}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\pagebreak
|
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{center}
|
\begin{center}
|
||||||
\includegraphics[width=0.65\linewidth]{ap2.png}
|
\includegraphics[width=\linewidth]{ap2.png}
|
||||||
\end{center}
|
\end{center}
|
||||||
\caption{IPC by benchmark}
|
\caption{Bimodal address prediction by benchmark}
|
||||||
\label{fig:ap2graph}
|
\label{fig:ap2graph}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
As expected, increasing the BTB size for the Bimodal
|
As expected, increasing the BTB size for the Bimodal
|
||||||
predictor seems to improve its performance. The exception
|
predictor seems to improve its performance. Since instructions
|
||||||
appears to be anagram, where the changes to performance
|
are assigned slots in the BTB according to their hashes (which can collide),
|
||||||
are small enough to be unnoticable in the visualization.
|
having a larger BTB means that there is a smaller chance of collisions,
|
||||||
|
and, therefore, that branch targets are more accurately predicted.
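
As a rough illustration of why a larger table reduces collisions, here is a minimal sketch of a table index computation. The function name \texttt{btb\_index}, the shift amount of 3, and the two example addresses are assumptions made for this example, not values taken from the simulator.

```c
#include <stdint.h>

/* Hypothetical index function: drop the low bits of the branch address,
 * then keep as many bits as the table has entries for.
 * n_entries is assumed to be a power of two. */
unsigned btb_index(uint32_t branch_addr, unsigned n_entries)
{
    return (branch_addr >> 3) & (n_entries - 1);
}
```

Under these assumptions, branches at \texttt{0x1000} and \texttt{0x1800} share slot 0 in a 256-entry table, but land in slots 0 and 256 of a 512-entry table, so they no longer overwrite each other.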
|
||||||
|
|
||||||
|
The exception appears to be the Anagram benchmark, where the changes to performance
|
||||||
|
are small enough to be unnoticeable in the visualization. This
|
||||||
|
could be because the Anagram benchmark has only a few important
|
||||||
|
branches, which means that increasing the BTB size does not
|
||||||
|
prevent any further collisions. The benchmark also takes less real
|
||||||
|
time to run on my machine, which is an indicator that it is
|
||||||
|
less complex than the Go and GCC benchmarks (which further supports
|
||||||
|
the above theory).
|
||||||
|
|
||||||
\section*{Part 4 - Combined Branch Predictor Explanation}
|
\section*{Part 4 - Combined Branch Predictor Explanation}
|
||||||
It appears as though the combined branch predictor works
|
It appears as though the combined branch predictor works
|
||||||
|
@ -140,13 +200,16 @@ to, the combined predictor uses a third predictor, named \texttt{meta}
|
||||||
in the code. The \texttt{meta} predictor appears to be another bimodal
|
in the code. The \texttt{meta} predictor appears to be another bimodal
|
||||||
predictor, but instead of deciding whether a branch is taken or not
|
predictor, but instead of deciding whether a branch is taken or not
|
||||||
taken, it decides whether to use the two-level or the bimodal predictor
|
taken, it decides whether to use the two-level or the bimodal predictor
|
||||||
to determine the branch outcome. If \texttt{meta} chooses a predictor
|
to determine the branch outcome. If the two predictors managed by
|
||||||
that ends up being wrong, while the other predictor ends up right,
|
\texttt{meta} disagree about the direction, then \texttt{meta}'s
|
||||||
\texttt{meta}'s 2-bit counter is updated to favor the correct predictor.
|
2-bit counter is updated to favor the correct predictor. Otherwise,
|
||||||
|
if the two predictors both predict ``taken'' or ``not taken'',
|
||||||
|
\texttt{meta} is unaffected, since neither predictor did better
|
||||||
|
than the other.
|
||||||
|
|
||||||
Because \texttt{meta} is implemented as a 2-bit predictor, it can
|
Because \texttt{meta} is implemented as a 2-bit predictor, it can
|
||||||
tolerate at most one use of the wrong branch predictor before
|
tolerate at most one use of the wrong branch predictor before
|
||||||
switching to the other (if the current predictor is "strongly"
|
switching to the other (if the current predictor is ``strongly''
|
||||||
predicted).
|
predicted).
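
The update rule described above can be sketched as follows. This is a minimal model under stated assumptions, not the simulator's implementation: the names \texttt{meta\_update} and \texttt{meta\_uses\_twolev} are hypothetical, and the encoding (counter values 2 and 3 select the two-level predictor, 0 and 1 select the bimodal one) is assumed for illustration.

```c
#include <stdbool.h>

/* 2-bit saturating counter, values 0..3.
 * Assumed encoding: 0-1 favor bimodal, 2-3 favor two-level. */
typedef unsigned char counter2_t;

bool meta_uses_twolev(counter2_t meta)
{
    return meta >= 2;
}

counter2_t meta_update(counter2_t meta, bool bimod_taken,
                       bool twolev_taken, bool actual_taken)
{
    /* Both predictors agree: neither did better, leave meta alone. */
    if (bimod_taken == twolev_taken)
        return meta;

    /* They disagree, so exactly one was right: move toward it. */
    if (twolev_taken == actual_taken)
        return (counter2_t)(meta < 3 ? meta + 1 : 3);
    return (counter2_t)(meta > 0 ? meta - 1 : 0);
}
```

Starting from the ``strongly two-level'' state (3), one wrong choice moves the counter to 2, which still selects the two-level predictor; a second wrong choice moves it to 1, which switches to the bimodal predictor.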
|
||||||
|
|
||||||
\section*{Part 5 - 3-Bit Branch Predictor}
|
\section*{Part 5 - 3-Bit Branch Predictor}
|
||||||
|
@ -160,23 +223,22 @@ and Figure \ref{fig:ap3graph} contains the visualization
|
||||||
of that data.
|
of that data.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{longtable}[]{@{}llllll@{}}
|
\begin{tabular}[]{@{}llllll@{}}
|
||||||
\toprule
|
\toprule
|
||||||
Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline
|
Benchmark & 256 & 512 & 1024 & 2048 & 4096 \\
|
||||||
\midrule
|
\midrule
|
||||||
\endhead
|
Anagram & .9610 & .9612 & .9615 & .9616 & .9616 \\
|
||||||
Anagram & .9610 & .9612 & .9615 & .9616 & .9616\tabularnewline
|
Go & .7507 & .7680 & .7799 & .7897 & .7966 \\
|
||||||
GCC & .8192 & .8385 & .8554 & .8656 & .8728\tabularnewline
|
GCC & .8192 & .8385 & .8554 & .8656 & .8728 \\
|
||||||
Go & .7507 & .7680 & .7799 & .7897 & .7966\tabularnewline
|
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{longtable}
|
\end{tabular}
|
||||||
\caption{3-Bit address prediction rates}
|
\caption{3-Bit address prediction rates}
|
||||||
\label{fig:ap3}
|
\label{fig:ap3}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{center}
|
\begin{center}
|
||||||
\includegraphics[width=0.65\linewidth]{ap3.png}
|
\includegraphics[width=\linewidth]{ap3.png}
|
||||||
\end{center}
|
\end{center}
|
||||||
\caption{3-Bit address prediction rates}
|
\caption{3-Bit address prediction rates}
|
||||||
\label{fig:ap3graph}
|
\label{fig:ap3graph}
|
||||||
|
@ -212,7 +274,7 @@ but perhaps it also follows the same pattern.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{center}
|
\begin{center}
|
||||||
\includegraphics[width=0.65\linewidth]{2v3.png}
|
\includegraphics[width=\linewidth]{2v3.png}
|
||||||
\end{center}
|
\end{center}
|
||||||
\caption{Percent improvement of 3-bit predictor over the bimodal predictor.}
|
\caption{Percent improvement of 3-bit predictor over the bimodal predictor.}
|
||||||
\label{fig:2v3}
|
\label{fig:2v3}
|
||||||
|
|