% Add initial draft of report. (commit bb1564d522, parent 39ebd845aa, file report.tex)
\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage[skip=0.2\baselineskip]{caption}
\usepackage{longtable}
\usepackage{booktabs}
\usepackage{graphicx}
\title{High Performance Computer Architecture Final Project}
\author{Danila Fedorin}

\begin{document}
\maketitle
\section*{Part 1: Address Prediction Benchmarks}
In this part, the \emph{Taken}, \emph{Not Taken},
\emph{Bimodal}, \emph{2-Level} and \emph{Combined} branch
predictors were run against three benchmarks. The results
are recorded in Figure \ref{fig:ap1}, and Figure \ref{fig:ap1graph}
provides a bar chart of this data.
Results are grouped by benchmark to make it easier to compare
the branch prediction algorithms.

\begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}}
\toprule
Benchmark & Taken & Not Taken & Bimodal & 2-Level &
Combined\tabularnewline
\midrule
\endhead
Anagram & .3126 & .3126 & .9613 & .8717 & .9742\tabularnewline
GCC & .4049 & .4049 & .8661 & .7668 & .8793\tabularnewline
Go & .3782 & .3782 & .7822 & .6768 & .7906\tabularnewline
\bottomrule
\end{longtable}
\caption{Address prediction rates of various predictors}
\label{fig:ap1}
\end{figure}

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.65\linewidth]{ap1.png}
\end{center}
\caption{Address prediction rates by benchmark}
\label{fig:ap1graph}
\end{figure}

As expected, the two stateless predictors, \emph{Taken}
and \emph{Not Taken}, perform significantly worse than the
others. These predictors do not keep track of the behavior
of individual branches, and thus have limited ability
to predict the direction of a branch. Of the stateful
predictors, the \emph{2-Level} predictor performs the worst
on all three benchmarks.
Unsurprisingly, the \emph{Combined} predictor, which
combines the other two stateful predictors, performs
better than either of its constituents, since it can switch
to the better-performing predictor as needed.

\section*{Part 2: IPC Benchmarks}
In this section, we present the IPC results for the previously listed
predictors. Figure \ref{fig:ipc} contains the collected
data, and Figure \ref{fig:ipcgraph} is a bar chart of
that data.

\begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}}
\toprule
Benchmark & Taken & Not Taken & Bimodal & 2-Level &
Combined\tabularnewline
\midrule
\endhead
Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487\tabularnewline
GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598\tabularnewline
Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393\tabularnewline
\bottomrule
\end{longtable}
\caption{IPC by benchmark}
\label{fig:ipc}
\end{figure}

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.65\linewidth]{ipc.png}
\end{center}
\caption{IPC by benchmark}
\label{fig:ipcgraph}
\end{figure}

Once again, the stateless predictors perform significantly
worse than the stateful predictors. Also, \emph{Taken}
performs better than \emph{Not Taken}. This is likely
because most of the given programs have loops, in which
the conditional branch is taken many times while the loop
is iterating, and then not taken once when the loop terminates.
Predicting ``not taken'' in this case would lead to many mispredictions.
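
As a rough illustration of this argument, assume (hypothetically) a loop
closed by a single conditional branch that executes $n$ times per run of
the loop: it is taken on the first $n - 1$ executions and not taken on the
last. The direction accuracy of the two static predictors on that branch
would then be
\[
  A_{\mathrm{taken}} = \frac{n - 1}{n}, \qquad
  A_{\mathrm{not\ taken}} = \frac{1}{n},
\]
so for $n = 10$ \emph{Taken} would be right 90\% of the time and
\emph{Not Taken} only 10\%. This is only a sketch: real programs mix loop
branches with other conditionals, which is presumably why the measured IPC
gap between the two is much smaller.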

Once again, the \emph{Bimodal} predictor performs better than
the \emph{2-Level} predictor, and both are outperformed by
\emph{Combined}, which leverages the two at the same time.

\section*{Part 3: Bimodal Exploration}
In this section, the \emph{Bimodal} branch predictor is further
analyzed by varying the size of the BTB. BTB sizes range from
256 to 4096. The data collected from this analysis is shown
in Figure \ref{fig:ap2}. As before, the data is shown as
a bar graph in Figure \ref{fig:ap2graph}.

\begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}}
\toprule
Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline
\midrule
\endhead
Anagram & .9606 & .9609 & .9612 & .9613 & .9613\tabularnewline
GCC & .8158 & .8371 & .8554 & .8661 & .8726\tabularnewline
Go & .7430 & .7610 & .7731 & .7822 & .7885\tabularnewline
\bottomrule
\end{longtable}
\caption{Bimodal address prediction rates by benchmark}
\label{fig:ap2}
\end{figure}

\pagebreak

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.65\linewidth]{ap2.png}
\end{center}
\caption{Bimodal address prediction rates by benchmark}
\label{fig:ap2graph}
\end{figure}

As expected, increasing the BTB size for the Bimodal
predictor seems to improve its performance. The exception
appears to be Anagram, where the changes in performance
are small enough to be unnoticeable in the visualization.
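
To put numbers on this, the total change in prediction rate from a BTB
size of 256 to a BTB size of 4096, computed from the values in
Figure \ref{fig:ap2}, is
\[
\begin{array}{lcl}
\Delta_{\mathrm{Anagram}} & = & .9613 - .9606 = .0007 \\
\Delta_{\mathrm{GCC}} & = & .8726 - .8158 = .0568 \\
\Delta_{\mathrm{Go}} & = & .7885 - .7430 = .0455
\end{array}
\]
The Anagram change is more than 60 times smaller than either of the other
two, which is consistent with it being invisible at the scale of the bar
chart.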
\section*{Part 4: Combined Branch Predictor Explanation}
The combined branch predictor appears to work
by consulting both a 2-level and a bimodal
branch predictor. To decide which of the two to follow,
it uses a third predictor, named \texttt{meta}
in the code. The \texttt{meta} predictor appears to be another bimodal
predictor, but instead of deciding whether a branch is taken or not
taken, it decides whether to use the 2-level or the bimodal predictor
to determine the branch outcome. If \texttt{meta} chooses a predictor
that ends up being wrong, while the other predictor ends up being right,
\texttt{meta}'s 2-bit counter is updated to favor the correct predictor.

Because \texttt{meta} is implemented as a 2-bit predictor, it can
tolerate at most one use of the wrong branch predictor before
switching to the other (and only if the current predictor is
``strongly'' preferred).
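
This behavior can be sketched as a saturating counter. The encoding below
is an assumption made for illustration (counter values 0 through 3, with 2
and 3 selecting the 2-level predictor), not a transcription of the
SimpleScalar source. Let $c \in \{0, 1, 2, 3\}$ be the \texttt{meta}
counter for a branch. The combined predictor follows the 2-level predictor
when $c \geq 2$ and the bimodal predictor otherwise, and after the branch
resolves the counter is updated as
\[
c \leftarrow \left\{
\begin{array}{ll}
\min(c + 1, 3) & \mbox{if only the 2-level predictor was correct,} \\
\max(c - 1, 0) & \mbox{if only the bimodal predictor was correct,} \\
c & \mbox{otherwise.}
\end{array}
\right.
\]
Starting from $c = 3$, one branch on which only the bimodal predictor is
correct gives $c = 2$, so the 2-level predictor is still selected; a second
such branch gives $c = 1$, and the selection switches to the bimodal
predictor.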
\section*{Part 5: 3-Bit Branch Predictor}
For this part, I modified the SimpleScalar codebase to add
a 3-bit branch predictor. The code will be included with this
report, but not in this document. After implementing
this predictor, I simulated it with the same BTB sizes
as in the earlier extended simulations of the Bimodal (2-bit)
predictor. Figure \ref{fig:ap3} contains this data,
and Figure \ref{fig:ap3graph} visualizes it.

\begin{figure}[h]
\begin{longtable}[]{@{}llllll@{}}
\toprule
Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline
\midrule
\endhead
Anagram & .9610 & .9612 & .9615 & .9616 & .9616\tabularnewline
GCC & .8192 & .8385 & .8554 & .8656 & .8728\tabularnewline
Go & .7507 & .7680 & .7799 & .7897 & .7966\tabularnewline
\bottomrule
\end{longtable}
\caption{3-Bit address prediction rates}
\label{fig:ap3}
\end{figure}

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.65\linewidth]{ap3.png}
\end{center}
\caption{3-Bit address prediction rates}
\label{fig:ap3graph}
\end{figure}

As with the bimodal branch predictor, the 3-bit predictor
benefits from larger BTB sizes in the Go and GCC benchmarks,
but remains nearly constant in the Anagram benchmark.
The differences between this predictor and the related bimodal
predictor are hard to see in this diagram.

To better compare
the two predictors, I computed the percent improvement in
address prediction rate of the 3-bit branch predictor
relative to the bimodal one. Figure \ref{fig:2v3} displays
this information. From this figure, it appears that
the 3-bit predictor performs better than the bimodal one
in most cases. However, it performs slightly worse
with a 2048-sized BTB in the GCC benchmark.
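
The exact formula is not restated here. Assuming the improvement is the
relative change in prediction rate, with $r_{3}$ and $r_{2}$ (names chosen
for this sketch) denoting the 3-bit and bimodal rates for the same
benchmark and BTB size, it would be
\[
\mathrm{improvement} = \frac{r_{3} - r_{2}}{r_{2}} \times 100\%.
\]
Under that assumption, the values from Figures \ref{fig:ap2} and
\ref{fig:ap3} give, for Go at a BTB size of 256 and GCC at a BTB size of
2048,
\[
\frac{.7507 - .7430}{.7430} \times 100\% \approx 1.04\%, \qquad
\frac{.8656 - .8661}{.8661} \times 100\% \approx -0.06\%.
\]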

The Go benchmark sees the most improvement (around 1\%).
A 3-bit predictor performs better when branches generally
follow the same direction, except for occasional short runs
in the other direction. If the Go benchmark implements
the Chinese game of the same name, it's possible that the
program behaves very much in this manner. For instance,
if the program is scanning the board to find groups
of ``dead'' pieces, starting at a recently placed piece,
it will likely find pieces nearby, but occasionally run
into empty spaces like ``eyes''. If the benchmark implements
a Go AI, I'm not sure how it would behave computationally,
but perhaps it also follows the same pattern.
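
One way to make this argument concrete, assuming both predictors use
saturating counters that predict taken in the upper half of their range
(an assumption about the encoding, not a description of the submitted
code): a $k$-bit counter saturated at $2^{k} - 1$ keeps predicting taken
until it drops below $2^{k-1}$, which takes
\[
(2^{k} - 1) - (2^{k-1} - 1) = 2^{k-1}
\]
consecutive not-taken outcomes. That is 2 for the 2-bit counter and 4 for
the 3-bit counter, so a run of three not-taken outcomes flips the
prediction of a saturated 2-bit counter but not that of a saturated 3-bit
counter.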

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.65\linewidth]{2v3.png}
\end{center}
\caption{Percent improvement of 3-bit predictor over the bimodal predictor.}
\label{fig:2v3}
\end{figure}

\end{document}