Add initial draft of report.

2020-12-06 00:35:06 -08:00 · 2020-12-06 00:35:06 -08:00 · bb1564d522
commit bb1564d522
parent 39ebd845aa
6 changed files with 221 additions and 0 deletions
--- a/2v3.png
+++ b/2v3.png
--- a/ap1.png
+++ b/ap1.png
--- a/ap2.png
+++ b/ap2.png
--- a/ap3.png
+++ b/ap3.png
--- a/ipc.png
+++ b/ipc.png
--- a/report.tex
+++ b/report.tex
@@ -0,0 +1,221 @@
 \documentclass{article}
 \usepackage[margin=1in]{geometry}
 \usepackage[skip=0.2\baselineskip]{caption}
 \usepackage{longtable}
 \usepackage{booktabs}
 \usepackage{graphicx}
 \title{High Performance Computer Architecture Final Project}
 \author{Danila Fedorin}
 \begin{document}
 \maketitle
 \section*{Part 1: Address Prediction Benchmarks}
 In this part, the \emph{Taken}, \emph{Not Taken},
 \emph{Bimodal}, \emph{2-Level} and \emph{Combined} branch
 predictors were run against three benchmarks. The results
 are recorded in Figure \ref{fig:ap1}. Figure \ref{fig:ap1graph}
 provides a bar chart of this data.
 Results are grouped by benchmark to make it easier to compare
 various branch prediction algorithms.
 \begin{figure}[h]
 \begin{longtable}[]{@{}llllll@{}}
 \toprule
 Benchmark & Taken & Not Taken & Bimodal & 2-Level &
 Combined\tabularnewline
 \midrule
 \endhead
 Anagram & .3126 & .3126 & .9613 & .8717 & .9742\tabularnewline
 GCC & .4049 & .4049 & .8661 & .7668 & .8793\tabularnewline
 Go & .3782 & .3782 & .7822 & .6768 & .7906\tabularnewline
 \bottomrule
 \end{longtable}
 \caption{Address prediction rates of various predictors}
 \label{fig:ap1}
 \end{figure}
 \begin{figure}[h]
    \begin{center}
        \includegraphics[width=0.65\linewidth]{ap1.png}
    \end{center}
    \caption{Address prediction rates by benchmark}
    \label{fig:ap1graph}
 \end{figure}
 As expected, the two stateless predictors, \emph{Taken}
 and \emph{Not Taken}, perform significantly worse than the
 others. These predictors do not keep track of the behavior
 of various branches, and thus have limited ability
 to predict the direction of a branch. Among the stateful
 predictors, the \emph{2-Level} predictor performs the worst on all three benchmarks.
 Unsurprisingly, the \emph{Combined} predictor, which is
 a combination of the other two stateful predictors, performs
 better than its constituents, since it's able to switch
 to a better-performing predictor as needed.
 \section*{Part 2: IPC Benchmarks}
 In this section, we present the IPC results from the previously listed
 predictors. Figure \ref{fig:ipc} contains the collected
 data, and Figure \ref{fig:ipcgraph} is a bar chart of
 that data.
 \begin{figure}[h]
 \begin{longtable}[]{@{}llllll@{}}
 \toprule
 Benchmark & Taken & Not Taken & Bimodal & 2-Level &
 Combined\tabularnewline
 \midrule
 \endhead
 Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487\tabularnewline
 GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598\tabularnewline
 Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393\tabularnewline
 \bottomrule
 \end{longtable}
 \caption{IPC by benchmark}
 \label{fig:ipc}
 \end{figure}
 \begin{figure}[h]
    \begin{center}
        \includegraphics[width=0.65\linewidth]{ipc.png}
    \end{center}
    \caption{IPC by benchmark}
    \label{fig:ipcgraph}
 \end{figure}
 Once again, the stateless predictors perform significantly
 worse than the stateful predictors. Also, \emph{Taken}
 performs slightly better than \emph{Not Taken}. This is likely
 because most of the given programs have loops, in which
 the conditional branch is taken many times while the loop
 is iterating, and is not taken only once, when the loop terminates. Predicting
 ``not taken'' in this case would lead to many mispredictions.
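 As a rough illustration of this argument (an idealized model, not a
 measurement from these benchmarks), suppose the closing branch of a loop
 executes $n$ times: it is taken $n - 1$ times and falls through once.
 The direction accuracies of the two stateless predictors on that branch
 would then be
 \[
     A_{\mathrm{taken}} = \frac{n - 1}{n},
     \qquad
     A_{\mathrm{not\ taken}} = \frac{1}{n}.
 \]
 For an assumed $n = 10$, this gives $0.9$ for \emph{Taken} and $0.1$
 for \emph{Not Taken}.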
 As in Part 1, the \emph{Bimodal} predictor performs better than
 the \emph{2-Level} predictor, and both are outperformed by
 \emph{Combined}, which leverages the two at the same time.
 \section*{Part 3: Bimodal Exploration}
 In this section, the \emph{Bimodal} branch predictor is further
 analyzed by varying the size of the BTB. BTB sizes range from
 256 to 4096. The data collected from this analysis is shown
 in Figure \ref{fig:ap2}. As usual, the data is shown as
 a bar graph in Figure \ref{fig:ap2graph}.
 \begin{figure}[h]
 \begin{longtable}[]{@{}llllll@{}}
 \toprule
 Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline
 \midrule
 \endhead
 Anagram & .9606 & .9609 & .9612 & .9613 & .9613\tabularnewline
 GCC & .8158 & .8371 & .8554 & .8661 & .8726\tabularnewline
 Go & .7430 & .7610 & .7731 & .7822 & .7885\tabularnewline
 \bottomrule
 \end{longtable}
 \caption{Bimodal address prediction rates by benchmark}
 \label{fig:ap2}
 \end{figure}
 \pagebreak
 \begin{figure}[h]
    \begin{center}
        \includegraphics[width=0.65\linewidth]{ap2.png}
    \end{center}
    \caption{Bimodal address prediction rates by benchmark}
    \label{fig:ap2graph}
 \end{figure}
 As expected, increasing the BTB size for the Bimodal
 predictor improves its performance. The exception
 is Anagram, where the prediction rate only moves from .9606 to .9613,
 a change too small to be noticeable in the visualization.
 \section*{Part 4: Combined Branch Predictor Explanation}
 It appears as though the combined branch predictor works
 by considering the decisions of both a 2-level and a bimodal 
 branch predictor. To decide which predictor to listen
 to, the combined predictor uses a third predictor, named \texttt{meta}
 in the code. The \texttt{meta} predictor appears to be another bimodal
 predictor, but instead of deciding whether a branch is taken or not
 taken, it decides whether to use the two-level or the bimodal predictor
 to determine the branch outcome. If \texttt{meta} chooses a predictor
 that ends up being wrong, while the other predictor ends up right,
 \texttt{meta}'s 2-bit counter is updated to favor the correct predictor.
 Because \texttt{meta} is implemented as a 2-bit predictor, it can
 tolerate one use of the wrong branch predictor before
 switching to the other, provided the current predictor is ``strongly''
 preferred.
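 One way to write down this selection rule, assuming \texttt{meta} stores
 a counter $c \in \{0, 1, 2, 3\}$ whose upper half favors the 2-level
 predictor (this encoding is an assumption made for illustration; the
 simulator source remains the authority), is
 \[
     \mbox{prediction} =
     \left\{
     \begin{array}{ll}
         \mbox{2-level prediction} & \mbox{if } c \ge 2, \\
         \mbox{bimodal prediction} & \mbox{otherwise,}
     \end{array}
     \right.
 \]
 with the counter updated only when the two predictors disagree:
 \[
     c \leftarrow
     \left\{
     \begin{array}{ll}
         \min(c + 1, 3) & \mbox{if only the 2-level predictor was correct,} \\
         \max(c - 1, 0) & \mbox{if only the bimodal predictor was correct.}
     \end{array}
     \right.
 \]
 Starting from $c = 3$, one branch on which only the bimodal predictor is
 correct gives $c = 2$, which still selects the 2-level predictor; a second
 such branch gives $c = 1$ and switches the choice to the bimodal predictor.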
 \section*{Part 5: 3-Bit Branch Predictor}
 For this part, I modified the SimpleScalar codebase to add
 a 3-bit branch predictor. The code will be included with this
 report, but not in this document. After implementing
 this predictor, I simulated it with the same BTB sizes
 as the previous extended simulations of the Bimodal (2-bit)
 predictor. Figure \ref{fig:ap3} contains this data,
 and Figure \ref{fig:ap3graph} contains the visualization
 of that data.
 \begin{figure}[h]
 \begin{longtable}[]{@{}llllll@{}}
 \toprule
 Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline
 \midrule
 \endhead
 Anagram & .9610 & .9612 & .9615 & .9616 & .9616\tabularnewline
 GCC & .8192 & .8385 & .8554 & .8656 & .8728\tabularnewline
 Go & .7507 & .7680 & .7799 & .7897 & .7966\tabularnewline
 \bottomrule
 \end{longtable}
 \caption{3-Bit address prediction rates}
 \label{fig:ap3}
 \end{figure}
 \begin{figure}[h]
    \begin{center}
        \includegraphics[width=0.65\linewidth]{ap3.png}
    \end{center}
    \caption{3-Bit address prediction rates}
    \label{fig:ap3graph}
 \end{figure}
 As with the bimodal branch predictor, the 3-bit predictor
 benefits from larger BTB sizes in the Go and GCC benchmarks,
 but seems to remain very consistent in the Anagram benchmark.
 The differences between this predictor and the related bimodal
 predictor are hard to see in this diagram.
 To better compare
 the two predictors, I computed the percent improvement to
 address prediction rates of the 3-bit branch predictor
 relative to the bimodal one. Figure \ref{fig:2v3} displays
 this information. From this figure, it appears as though
 the 3-bit predictor performs better than the bimodal one
 in most cases. However, it does perform slightly worse
 with a 2048-sized BTB in the GCC benchmark.
 The Go benchmark sees the most improvement (around 1\%).
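 Assuming the improvement is measured relative to the bimodal rate
 (the symbols $r_{2}$ and $r_{3}$ are introduced here for the bimodal and
 3-bit prediction rates, respectively), the computation is
 \[
     \Delta = \frac{r_{3} - r_{2}}{r_{2}} \times 100\%.
 \]
 For example, for Go with a BTB size of 256,
 \[
     \Delta = \frac{0.7507 - 0.7430}{0.7430} \times 100\% \approx 1.04\%,
 \]
 while for GCC with a BTB size of 2048,
 \[
     \Delta = \frac{0.8656 - 0.8661}{0.8661} \times 100\% \approx -0.06\%.
 \]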
 A 3-bit predictor performs better when branches generally
 follow the same direction, except for occasional groups
 in the other direction. If the Go benchmark implements
 the Chinese game of the same name, it's possible that the
 program behaves very much in this manner. For instance,
 if the program is scanning the board to find groups
 of ``dead'' pieces, starting at a recently placed piece,
 it will likely find pieces nearby, but occasionally run
 into empty spaces like ``eyes''. If the benchmark implements
 a Go AI, I'm not sure how it would behave computationally,
 but perhaps it also follows the same pattern.
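 The intuition about occasional groups of opposite-direction branches can
 be made slightly more concrete under the assumption that a $k$-bit
 predictor is a saturating counter $c \in \{0, \ldots, 2^{k} - 1\}$ that
 predicts ``taken'' when $c \ge 2^{k-1}$ (a modeling assumption; the details
 of the actual implementation are in the accompanying code). Starting from
 a saturated counter, the number of consecutive opposite outcomes needed to
 flip the prediction is
 \[
     m(k) = 2^{k-1}, \qquad m(2) = 2, \qquad m(3) = 4.
 \]
 Under this model, a group of up to three branches in the opposite
 direction leaves the prediction of a saturated 3-bit counter unchanged,
 whereas a saturated 2-bit counter flips after the second one.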
 \begin{figure}[h]
    \begin{center}
        \includegraphics[width=0.65\linewidth]{2v3.png}
    \end{center}
    \caption{Percent improvement of 3-bit predictor over the bimodal predictor.}
    \label{fig:2v3}
 \end{figure}
 \end{document}