diff --git a/2v3.png b/2v3.png new file mode 100644 index 0000000..9bd84ab Binary files /dev/null and b/2v3.png differ diff --git a/ap1.png b/ap1.png new file mode 100644 index 0000000..309f59f Binary files /dev/null and b/ap1.png differ diff --git a/ap2.png b/ap2.png new file mode 100644 index 0000000..4cb4fea Binary files /dev/null and b/ap2.png differ diff --git a/ap3.png b/ap3.png new file mode 100644 index 0000000..0e4792d Binary files /dev/null and b/ap3.png differ diff --git a/ipc.png b/ipc.png new file mode 100644 index 0000000..0cc0839 Binary files /dev/null and b/ipc.png differ diff --git a/report.tex b/report.tex new file mode 100644 index 0000000..d23ecd1 --- /dev/null +++ b/report.tex @@ -0,0 +1,221 @@ +\documentclass{article} +\usepackage[margin=1in]{geometry} +\usepackage[skip=0.2\baselineskip]{caption} +\usepackage{longtable} +\usepackage{booktabs} +\usepackage{graphicx} +\title{High Performance Computer Architecture Final Project} +\author{Danila Fedorin} + +\begin{document} +\maketitle +\section*{Part 1: Address Prediction Benchmarks} +In this part, the \emph{Taken}, \emph{Not Taken}, +\emph{Bimodal}, \emph{2-Level} and \emph{Combined} branch +predictors were run against three benchmarks. The results +are recorded in Figure \ref{fig:ap1}. Figure \ref{fig:ap1graph} +provides a bar chart of this data. +Results are grouped by benchmark to make it easier to compare +various branch prediction algorithms. 
+ +\begin{figure}[h] +\begin{longtable}[]{@{}llllll@{}} +\toprule +Benchmark & Taken & Not Taken & Bimod & 2-Level & +Combined\tabularnewline +\midrule +\endhead +Anagram & .3126 & .3126 & .9613 & .8717 & .9742\tabularnewline +GCC & .4049 & .4049 & .8661 & .7668 & .8793\tabularnewline +Go & .3782 & .3782 & .7822 & .6768 & .7906\tabularnewline +\bottomrule +\end{longtable} +\caption{Address prediction rates of various predictors} +\label{fig:ap1} +\end{figure} + +\begin{figure}[h] + \begin{center} + \includegraphics[width=0.65\linewidth]{ap1.png} + \end{center} + \caption{Address prediction rates by benchmark} + \label{fig:ap1graph} +\end{figure} + +As expected, the two stateless predictors, \emph{Taken} +and \emph{Not Taken}, perform significantly worse than the +others. These predictors do not keep track of the behavior +of individual branches, and thus have limited ability +to predict the direction of a branch. Of the stateful +predictors, the \emph{2-Level} predictor performs the worst, +trailing \emph{Bimodal} by roughly 9 to 11 percentage points +on every benchmark. +Unsurprisingly, the \emph{Combined} predictor, which is +a combination of the other two stateful predictors, performs +better than its constituents, since it is able to switch +to the better-performing predictor as needed. + +\section*{Part 2: IPC Benchmarks} +In this section, we present the IPC results from the previously listed +predictors. Figure \ref{fig:ipc} contains the collected +data, and Figure \ref{fig:ipcgraph} is a bar chart of +that data. 
+ +\begin{figure}[h] +\begin{longtable}[]{@{}llllll@{}} +\toprule +Benchmark & Taken & Not Taken & Bimod & 2-Level & +Combined\tabularnewline +\midrule +\endhead +Anagram & 1.0473 & 1.0396 & 2.1871 & 1.8826 & 2.2487\tabularnewline +GCC & 0.7878 & 0.7722 & 1.2343 & 1.1148 & 1.2598\tabularnewline +Go & 0.9512 & 0.9412 & 1.3212 & 1.2035 & 1.3393\tabularnewline +\bottomrule +\end{longtable} +\caption{IPC by benchmark} +\label{fig:ipc} +\end{figure} + +\begin{figure}[h] + \begin{center} + \includegraphics[width=0.65\linewidth]{ipc.png} + \end{center} + \caption{IPC by benchmark} + \label{fig:ipcgraph} +\end{figure} + +Once again, the stateless predictors perform significantly +worse than the stateful predictors. Also, \emph{Taken} +performs slightly better than \emph{Not Taken}. This is likely +because most of the given programs have loops, in which +the conditional branch is taken many times while the loop +is iterating, and is not taken only once, when the loop terminates. Predicting +``not taken'' in this case would lead to many mispredictions. + +Once again, the \emph{Bimodal} predictor performs better than +the \emph{2-Level} predictor, and both are outperformed by +\emph{Combined}, which leverages the two at the same time. + +\section*{Part 3 - Bimodal Exploration} +In this section, the \emph{Bimodal} branch predictor is further +analyzed by varying the size of the BTB. BTB sizes range from +256 to 4096 entries. The data collected from this analysis is shown +in Figure \ref{fig:ap2}. As usual, the same data is plotted as +a bar graph in Figure \ref{fig:ap2graph}. 
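+
+As background for the size sweep, a bimodal predictor is commonly
+described as a table of 2-bit saturating counters indexed by the
+branch address. The sketch below states that rule. The symbols $N$
+(number of table entries), $i$ (table index), $c_i$ (counter value)
+and the hash $h$ are notation introduced here for illustration and are
+not names taken from the SimpleScalar source; the exact indexing
+function, and the assumption that the size varied in this section
+bounds $N$, are likewise assumptions.
+\[
+  i = h(\mathit{PC}) \bmod N, \qquad
+  \mbox{predict taken} \iff c_i \ge 2
+\]
+\[
+  c_i \leftarrow \left\{
+  \begin{array}{ll}
+    \min(c_i + 1,\, 3) & \mbox{if the branch was taken,} \\
+    \max(c_i - 1,\, 0) & \mbox{otherwise.}
+  \end{array}
+  \right.
+\]
+Under this model, a smaller $N$ makes it more likely that unrelated
+branches share a counter and disturb each other's state, which is one
+plausible reason for prediction rates to rise with table size.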
+ +\begin{figure}[h] +\begin{longtable}[]{@{}llllll@{}} +\toprule +Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline +\midrule +\endhead +Anagram & .9606 & .9609 & .9612 & .9613 & .9613\tabularnewline +GCC & .8158 & .8371 & .8554 & .8661 & .8726\tabularnewline +Go & .7430 & .7610 & .7731 & .7822 & .7885\tabularnewline +\bottomrule +\end{longtable} +\caption{Bimodal address prediction rates by benchmark} +\label{fig:ap2} +\end{figure} + +\pagebreak + +\begin{figure}[h] + \begin{center} + \includegraphics[width=0.65\linewidth]{ap2.png} + \end{center} + \caption{Bimodal address prediction rates by benchmark} + \label{fig:ap2graph} +\end{figure} + +As expected, increasing the BTB size for the Bimodal +predictor improves its prediction rate. The exception +appears to be Anagram, where the changes +(.9606 to .9613 across the whole range) +are small enough to be unnoticeable in the visualization. + +\section*{Part 4 - Combined Branch Predictor Explanation} +It appears as though the combined branch predictor works +by considering the decisions of both a 2-level and a bimodal +branch predictor. To decide which predictor to listen +to, the combined predictor uses a third predictor, named \texttt{meta} +in the code. The \texttt{meta} predictor appears to be another bimodal +predictor, but instead of deciding whether a branch is taken or not +taken, it decides whether to use the 2-level or the bimodal predictor +to determine the branch outcome. If \texttt{meta} chooses a predictor +that ends up being wrong, while the other predictor ends up right, +\texttt{meta}'s 2-bit counter is updated to favor the correct predictor. + +Because \texttt{meta} is implemented as a 2-bit predictor, it can +tolerate one use of the wrong branch predictor before +switching to the other, provided the current predictor is ``strongly'' +preferred: the first wrong choice only weakens the preference, +and a second one flips it. + +\section*{Part 5 - 3-Bit Branch Predictor} +For this part, I modified the SimpleScalar codebase to add +a 3-bit branch predictor. 
The code will be included with this +report, but not in this document. After implementing +this predictor, I simulated it with the same BTB sizes +as the previous extended simulations of the Bimodal (2-bit) +predictor. Figure \ref{fig:ap3} contains this data, +and Figure \ref{fig:ap3graph} contains the visualization +of that data. + +\begin{figure}[h] +\begin{longtable}[]{@{}llllll@{}} +\toprule +Benchmark & 256 & 512 & 1024 & 2048 & 4096\tabularnewline +\midrule +\endhead +Anagram & .9610 & .9612 & .9615 & .9616 & .9616\tabularnewline +GCC & .8192 & .8385 & .8554 & .8656 & .8728\tabularnewline +Go & .7507 & .7680 & .7799 & .7897 & .7966\tabularnewline +\bottomrule +\end{longtable} +\caption{3-Bit address prediction rates} +\label{fig:ap3} +\end{figure} + +\begin{figure}[h] + \begin{center} + \includegraphics[width=0.65\linewidth]{ap3.png} + \end{center} + \caption{3-Bit address prediction rates} + \label{fig:ap3graph} +\end{figure} + +As with the bimodal branch predictor, the 3-bit predictor +benefits from larger BTB sizes in the Go and GCC benchmarks, +but remains nearly constant in the Anagram benchmark. +The differences between this predictor and the related bimodal +predictor are hard to see in this diagram. + +To better compare +the two predictors, I computed the percent improvement in +address prediction rates of the 3-bit branch predictor +relative to the bimodal one. Figure \ref{fig:2v3} displays +this information. From this figure, it appears as though +the 3-bit predictor performs better than the bimodal one +in most cases. However, it ties the bimodal predictor +with a 1024-entry BTB in the GCC benchmark (.8554 for both), +and performs slightly worse +with a 2048-entry BTB in the same benchmark (.8656 versus .8661). + +The Go benchmark sees the most improvement (around 1\%). +A 3-bit predictor performs better when branches generally +follow the same direction, except for occasional groups +in the other direction. If the Go benchmark implements +the Chinese game of the same name, it's possible that the +program behaves very much in this manner. 
For instance, +if the program is scanning the board to find groups +of ``dead'' pieces, starting at a recently placed piece, +it will likely find pieces nearby, but occasionally run +into empty spaces like ``eyes''. If the benchmark implements +a Go AI, I'm not sure how it would behave computationally, +but perhaps it also follows the same pattern. + +\begin{figure}[h] + \begin{center} + \includegraphics[width=0.65\linewidth]{2v3.png} + \end{center} + \caption{Percent improvement of 3-bit predictor over the bimodal predictor.} + \label{fig:2v3} +\end{figure} + +\end{document}