Add first draft of homework solutions.
This commit is contained in:
commit
ee069ad7da
249
hw1.tex
Normal file
249
hw1.tex
Normal file
@ -0,0 +1,249 @@
|
||||
\documentclass{article}
|
||||
\usepackage{amsmath}
|
||||
\usepackage{geometry}[margin=0.5in]
|
||||
\begin{document}
|
||||
\section*{Problem 1}
|
||||
\subsection*{Part a}
|
||||
With one processor, a single processor must computed $N$ serial values, each of which
|
||||
takes $T_c$ time. However, no communication is necessary. Thus, we have:
|
||||
|
||||
\begin{equation*}
|
||||
T_\text{serial} = N T_c
|
||||
\end{equation*}
|
||||
|
||||
With multiple processes, the task is split into many parts. Each processor
|
||||
computes $N/P$ values, each of which takes $T_c$ time. Then, the processors
|
||||
take turns broadcasting values; all $N$ values must be broadcast, and each
|
||||
takes $T_b$ time. Thus:
|
||||
|
||||
\begin{equation*}
|
||||
T_\text{parallel} = \frac{N}{P}T_c + N T_b
|
||||
\end{equation*}
|
||||
|
||||
We can now compute the speedup:
|
||||
|
||||
\begin{equation*}
|
||||
\frac{T_\text{serial}}{T_\text{parallel}} = \frac{T_c}{\frac{T_c}{P} + T_b} = \frac{R}{\frac{R}{P} + 1}
|
||||
\end{equation*}
|
||||
|
||||
\subsection*{Part b}
|
||||
We have
|
||||
|
||||
\begin{equation*}
|
||||
K = \frac{N}{P} \Rightarrow P = \frac{N}{K}
|
||||
\end{equation*}
|
||||
|
||||
Thus, for each $K = 1, 2, ..., 1024$, we have $P = 1024, 512, ..., 1$.
|
||||
To maximize speedup, we want to minimize the denominator of our speedup
|
||||
equation from part a. This is done when P is largest, and thus, when K
|
||||
is smallest:
|
||||
|
||||
\begin{equation*}
|
||||
K = 1 \Leftrightarrow P = 1024 \Leftrightarrow S = \frac{R}{\frac{R}{1024}+1}
|
||||
\end{equation*}
|
||||
|
||||
\subsection*{Part c}
|
||||
\begin{equation*}
|
||||
\begin{aligned}
|
||||
\frac{R}{\frac{R}{P} + 1} &> 1 \\
|
||||
R - 1 &> \frac{R}{P} \\
|
||||
P &> \frac{R}{R-1}
|
||||
\end{aligned}
|
||||
\end{equation*}
|
||||
|
||||
Since the number of processors has to be an integer,
|
||||
|
||||
\begin{equation*}
|
||||
P > \lfloor \frac{R}{R-1} \rfloor
|
||||
\end{equation*}
|
||||
|
||||
\subsection*{Part d}
|
||||
We effectively just need to counteract the speedup;
|
||||
since all times in our equations are proportional to N,
|
||||
the following is sufficient:
|
||||
|
||||
\begin{equation*}
|
||||
N_\text{parallel} = \lceil N_\text{serial} \frac{R}{\frac{R}{P} + 1} \rceil
|
||||
\end{equation*}
|
||||
|
||||
\pagebreak
|
||||
|
||||
\section*{Problem 2}
|
||||
\subsection*{Part a}
|
||||
The average execution times are as follows:
|
||||
\begin{itemize}
|
||||
\item 3670ms (base machine)
|
||||
\item 2334ms (base + FP units)
|
||||
\item 2903 (base + cache)
|
||||
\end{itemize}
|
||||
The following table contains the speedups:
|
||||
\begin{figure}[h]
|
||||
\begin{tabular}{l|c|c|c}
|
||||
Machine & Program 1 Speedup & Program 2 Speedup & Program 3 Speedup \\ \hline
|
||||
Base + FP & 1 & 5 & 1.667 \\
|
||||
Base + Cache & 1.429 & 1.111 & 1.25
|
||||
\end{tabular}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\begin{tabular}{l|c|c|c|c}
|
||||
Machine & $S_1$ & $S_2$ & $S_3$ & $S_4$ \\ \hline
|
||||
Base + FP & 1.572 & 2.556 & 1.667 & 2.027 \\
|
||||
Base + Cache & 1.264 & 1.263 & 1.25 & 1.257
|
||||
\end{tabular}
|
||||
\end{figure}
|
||||
|
||||
For each one of speedups $S_1$ through $S_4$, it seems like the Base + FP machine is
|
||||
the best improvement. In each measure, it had a greater speedup than Base + Cache.
|
||||
|
||||
\subsection*{Part b}
|
||||
The average execution times are as follows:
|
||||
\begin{itemize}
|
||||
\item 1 (base machine)
|
||||
\item .6 (base + FP units)
|
||||
\item .8 (base + cache)
|
||||
\end{itemize}
|
||||
The individual speedups remain the same:
|
||||
\begin{equation*}
|
||||
\frac{A}{B} = \frac{A/A}{B/A}
|
||||
\end{equation*}
|
||||
So, only $S_1$ changes.
|
||||
\begin{figure}[h]
|
||||
\begin{tabular}{l|c|c|c|c}
|
||||
Machine & $S_1$ & $S_2$ & $S_3$ & $S_4$ \\ \hline
|
||||
Base + FP & 1.667 & 2.556 & 1.667 & 2.027 \\
|
||||
Base + Cache & 1.25 & 1.263 & 1.25 & 1.257
|
||||
\end{tabular}
|
||||
\end{figure}
|
||||
|
||||
For each of the speedups $S_1$ through $S_4$, it seems like Base + FP is still
|
||||
the best improvement. Only speedup $S_1$ changed, and it didn't change by that much.
|
||||
In fact, it made Base + FP look even better.
|
||||
|
||||
\pagebreak
|
||||
|
||||
\section*{Problem 3}
|
||||
% if integer arithmetic/logic/store/load instruction in ID, check <pipeline registers>;
|
||||
% if FP load instruction in ID, check <pipeline registers>
|
||||
% if FP arithmetic instruction in ID, check <pipeline registers>
|
||||
% if FP store instruction in ID, check <pipeline registers >
|
||||
\subsection*{Part a}
|
||||
\begin{itemize}
|
||||
\item If integer arithmetic/logic/store/load instruction in ID, check ID/EX (in case of load).
|
||||
\item If FP load instruction in ID, check ID/EX (in case of integer load, assuming S.D doesn't go through FP)
|
||||
\item If FP arithmetic instruction in ID, check ID/EX (float load), ID/FP, FP1/FP2, FP2/FP3, FP3/FP4.
|
||||
No need to check FP4/FP5 and after because forwarding available.
|
||||
\item If FP store instruction in ID, ID/EX (address), ID/FP, FP1/FP2, FP2/FP3, FP3/FP4
|
||||
\end{itemize}
|
||||
|
||||
\subsection*{Part b}
|
||||
\begin{itemize}
|
||||
\item If integer arithmetic/logic/store/load instruction in ID, check nothing?
|
||||
\item If FP load instruction in ID, check ID/FP, FP1/FP2, FP2/FP3
|
||||
\item If FP arithmetic instruction in ID, check nothing?
|
||||
\item If FP store instruction in ID, check nothing?
|
||||
\end{itemize}
|
||||
|
||||
\pagebreak
|
||||
\section*{Problem 4}
|
||||
\subsection*{Part a}
|
||||
\begin{itemize}
|
||||
\item EX/ME $\rightarrow$ EX (for consecutive integer operations)
|
||||
\item ME/FP3 $\rightarrow$ EX (for integer operations separated by one other)
|
||||
\item FP3/FP4 $\rightarrow$ EX (for integer operations separated by two others)
|
||||
\item FP4/FP5 $\rightarrow$ EX (for integer operations separated by three others)
|
||||
\item FP5/WB $\rightarrow$ EX (for integer operations separated by four others)
|
||||
\item ME/FP3 $\rightarrow$ FP1 (for float operations following floating load)
|
||||
\item FP3/FP4 $\rightarrow$ FP1 (for float operations following floating load)
|
||||
\item FP4/FP5 $\rightarrow$ FP1 (same reason)
|
||||
\item FP5/WB $\rightarrow$ FP1 (same reason)
|
||||
\end{itemize}
|
||||
|
||||
\subsection*{Part b}
|
||||
\begin{itemize}
|
||||
\item If integer arithmetic/logic/store/load instruction in ID, check ID/EX (in case of integer load).
|
||||
\item If FP arithmetic instruction in ID, check ID/EX (in case of float load), FP1/FP2, FP2/FP3 FP3/FP4.
|
||||
\item If FP store instruction in ID, check ID/EX (in case of int load), FP1/FP2, ME/FP3, FP2/FP3, FP3/FP4.
|
||||
\end{itemize}
|
||||
|
||||
\pagebreak
|
||||
\section*{Problem 5}
|
||||
\begin{figure}[h]
|
||||
\begin{tabular}{l|c|c|c|c|c|c|l}
|
||||
& D & I & E.S. & E.C. & C & B & Comments \\ \hline
|
||||
L.D F0, 0(R1) & 1 & 2 & 3 & 3 & 4 & 5 & \\ \hline
|
||||
L.D F2, 0(R2) & 2 & 3 & 4 & 4 & 5 & 6 & \\ \hline
|
||||
L.D F4, 0(R3) & 3 & 4 & 5 & 5 & 6 & 7 & \\ \hline
|
||||
MUL.D F6, F2, F0 & 4 & 7 & 8 & 12 & - & 13 & Wait for F2 to hit CDB. \\ \hline
|
||||
ADD.D F8, F6, F4 & 5 & 14 & 15 & 19 & - & 20 & Wait for F6 to hit CDB. \\ \hline
|
||||
ADDI R1, R1, \#8 & 6 & 7 & 8 & 8 & - & 9 & \\ \hline
|
||||
ADDI R2, R2, \#8 & 7 & 8 & 9 & 9 & - & 10 & \\ \hline
|
||||
ADDI R3, R3, \#8 & 8 & 9 & 10 & 10 & - & 11 & \\ \hline
|
||||
S.D-A F8, -8(R2) & 9 & 11 & 12 & 12 & - & - & Wait for R2 to hit CDB. \\ \hline
|
||||
S.D-D F8, -8(R2) & 10 & 21 & 22 & 22 & 23 & - & Wait for F8 to hit CDB. \\ \hline
|
||||
BNE R4, R2, LOOP & 11 & 12 & 13 & 13 & - & 14 & \\ \hline
|
||||
L.D F0, 0(R1) & 15 & 16 & 17 & 17 & 18 & 19 & Wait for BNE to complete. \\ \hline
|
||||
L.D F2, 0(R2) & 16 & 17 & 18 & 18 & 19 & 20 & \\ \hline
|
||||
\end{tabular}
|
||||
\caption{Part a - no speculation.}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\begin{tabular}{l|c|c|c|c|c|c|c|l}
|
||||
& D & I & E.S. & E.C. & C & B & R & Comments \\ \hline
|
||||
L.D F0, 0(R1) & 1 & 2 & 3 & 3 & 4 & 5 & 6 & \\ \hline
|
||||
L.D F2, 0(R2) & 2 & 3 & 4 & 4 & 5 & 6 & 7 & \\ \hline
|
||||
L.D F4, 0(R3) & 3 & 4 & 5 & 5 & 6 & 7 & 8 & \\ \hline
|
||||
MUL.D F6, F2, F0 & 4 & 7 & 8 & 12 & - & 13 & 14 & Wait for F2 to hit CDB. \\ \hline
|
||||
ADD.D F8, F6, F4 & 5 & 14 & 15 & 19 & - & 20 & 21 & Wait for F6 to hit CDB. \\ \hline
|
||||
ADDI R1, R1, \#8 & 6 & 7 & 8 & 8 & - & 9 & 22 & \\ \hline
|
||||
ADDI R2, R2, \#8 & 7 & 8 & 9 & 9 & - & 10 & 23 & \\ \hline
|
||||
ADDI R3, R3, \#8 & 8 & 9 & 10 & 10 & - & 11 & 24 & \\ \hline
|
||||
S.D-A F8, -8(R2) & 9 & 11 & 12 & 12 & - & - & - & Wait for R2 to hit CDB. \\ \hline
|
||||
S.D-D F8, -8(R2) & 10 & 22 & 23 & 23 & - & 24 & 25 & Wait for F8 and time retire (?) \\ \hline
|
||||
BNE R4, R2, LOOP & 11 & 12 & 13 & 13 & - & 14 & 26 & \\ \hline
|
||||
L.D F0, 0(R1) & 12 & 13 & 14 & 14 & 15 & 16 & 27 & Predict taken. \\ \hline
|
||||
L.D F2, 0(R2) & 13 & 14 & 15 & 15 & 16 & 17 & 28 & \\ \hline
|
||||
\end{tabular}
|
||||
\caption{Part b - with speculation.}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\begin{tabular}{l|c|c|c|c|c|c|c|c|l}
|
||||
& D & I & F & E.S. & E.C. & C & B & R & Comments \\ \hline
|
||||
L.D F0, 0(R1) & 1 & 2 & 3 & 4 & 4 & 5 & 6 & 7 & \\ \hline
|
||||
L.D F2, 0(R2) & 2 & 3 & 4 & 5 & 5 & 6 & 7 & 8 & \\ \hline
|
||||
L.D F4, 0(R3) & 3 & 4 & 5 & 6 & 6 & 7 & 8 & 9 & \\ \hline
|
||||
MUL.D F6, F2, F0 & 4 & 8 & 9 & 10 & 14 & - & 15 & 16 & Wait for F2 to hit CDB. \\ \hline
|
||||
ADD.D F8, F6, F4 & 5 & 16 & 17 & 18 & 22 & - & 23 & 24 & Wait for F6 to hit CDB. \\ \hline
|
||||
ADDI R1, R1, \#8 & 6 & 7 & 8 & 9 & 9 & - & 10 & 25 & \\ \hline
|
||||
ADDI R1, R1, \#8 & 7 & 8 & 9 & 10 & 10 & - & 11 & 26 & \\ \hline
|
||||
ADDI R1, R1, \#8 & 8 & 9 & 10 & 11 & 11 & - & 12 & 27 & \\ \hline
|
||||
S.D-A F8, -8(R2) & 9 & 12 & 13 & 14 & 14 & - & - & - & Wait for R2 to hit CDB. \\ \hline
|
||||
S.D-D F8, -8(R2) & 10 & 24 & 25 & 26 & 26 & 27 & - & 28 & Wait for F8 to hid CDB. \\ \hline
|
||||
BNE R4, R2, LOOP & 11 & 13 & 14 & 15 & 15 & - & 16 & 29 & CDB conflict with I4. \\ \hline
|
||||
L.D F0, 0(R1) & 12 & 13 & 14 & 15 & 15 & 16 & 17 & 30 & Predict taken. \\ \hline
|
||||
L.D F2, 0(R2) & 13 & 14 & 15 & 16 & 16 & 17 & 18 & 31 & \\ \hline
|
||||
\end{tabular}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\begin{tabular}{l|c|c|c|c|c|c|c|c|l}
|
||||
& D & I & F & E.S. & E.C. & C & B & R & Comments \\ \hline
|
||||
L.D F0, 0(R1) & 1 & 2 & 3 & 4 & 4 & 5 & 6 & 7 & \\ \hline
|
||||
L.D F2, 0(R2) & 2 & 3 & 4 & 5 & 5 & 6 & 7 & 8 & \\ \hline
|
||||
L.D F4, 0(R3) & 3 & 4 & 5 & 6 & 6 & 7 & 8 & 9 & \\ \hline
|
||||
MUL.D F6, F2, F0 & 4 & 5 & 6 & 7 & 11 & - & 12 & 13 & Speculatively schedule, expecting F2. \\ \hline
|
||||
ADD.D F8, F6, F4 & 5 & 10 & 11 & 12 & 16 & - & 17 & 18 & Speculatively schedule, expecting F6. \\ \hline
|
||||
ADDI R1, R1, \#8 & 6 & 7 & 8 & 9 & 9 & - & 10 & 19 & \\ \hline
|
||||
ADDI R1, R1, \#8 & 7 & 8 & 9 & 10 & 10 & - & 11 & 20 & \\ \hline
|
||||
ADDI R1, R1, \#8 & 8 & 10 & 11 & 12 & 12 & - & 13 & 21 & CDB conflict with I4. \\ \hline
|
||||
S.D-A F8, -8(R2) & 9 & 10 & 11 & 12 & 12 & - & - & - & Speculatively schedule, expecting R2. \\ \hline
|
||||
S.D-D F8, -8(R2) & 10 & 15 & 16 & 17 & 17 & 18 & - & 22 & Do we need to delay so retire is after cache? \\ \hline
|
||||
BNE R4, R2, LOOP & 11 & 12 & 13 & 14 & 14 & - & 15 & 23 & \\ \hline
|
||||
L.D F0, 0(R1) & 12 & 13 & 14 & 15 & 15 & 16 & 17 & 24 & \\ \hline
|
||||
L.D F2, 0(R2) & 13 & 14 & 15 & 16 & 16 & 17 & 18 & 25 & \\ \hline
|
||||
\end{tabular}
|
||||
\end{figure}
|
||||
|
||||
\end{document}
|
Loading…
Reference in New Issue
Block a user