\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx}
\usepackage{amsmath}
\title{Final Project Report}
\author{Danila Fedorin}
\begin{document}
\maketitle
\section*{General Design and Considerations}
The goal of this assignment was to create a 256-byte SRAM memory unit. In order
to minimize wire delays, I chose to split the 256 cells of each bit into \textbf{4 columns of 64 SRAM cells
each}. This was motivated by the following factors:
\begin{itemize}
    \item \emph{Larger} columns were eliminated due to the high cost of interconnect.
        Even large write blocks were not able to charge the ``far ends'' of the wire
        at shorter clock cycles. Increasing wire width did not help; although resistance
        decreased, the capacitance increased, leading to small net gains. Thus, I made
        the decision to shrink the columns as much as possible. However...
    \item \emph{Smaller} columns became a routing challenge. Even with a 4-column split,
        to properly connect each cell of the SRAM column, the SRAM cells themselves need
        to accommodate an additional three \textsc{Wl} lines. Due to the pitch requirements
        on metals three and four, this is the upper limit (for reasonably sized cells).
        Alternatives included splitting the decoder into pieces, but for large numbers
        of columns, this meant that the decoder signal traveled through significant amounts
        of wire, and was thus slower.
\end{itemize}
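
The trade-off in the first point can be sketched with a first-order wire model. The symbols here are assumptions introduced for illustration, not measured values: a wire of length $\ell$ and width $w$, with sheet resistance $R_\square$, area capacitance $c_a$ per unit area, and fringe capacitance $c_f$ per unit length.
\begin{align*}
    R_{\text{wire}} &= R_\square \frac{\ell}{w}, \\
    C_{\text{wire}} &= (c_a w + c_f)\,\ell, \\
    R_{\text{wire}} C_{\text{wire}} &= R_\square\,\ell^2 \left(c_a + \frac{c_f}{w}\right).
\end{align*}
Under this model, widening the wire shrinks only the fringe term $c_f / w$, while the driver has to charge a larger total capacitance $C_{\text{wire}}$. The $\ell^2$ factor is unaffected by width, which is consistent with shortening the columns being the more effective change.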

For each of the four 64-cell columns, I attached separate read and write blocks. However,
my placement of the write block was unorthodox. I observed that, although the write block
is perfectly capable of quickly manipulating the bitlines close to it, the changes
to the wires take too long to propagate through to the end. I addressed this with two separate
changes:

\begin{itemize}
    \item I added \textbf{additional precharge transistors} along the column, a total of 4.
        Each was sized at $5\lambda$, much like the SRAM transistors themselves. When the clock
        was low, these PMOS transistors became transparent, and helped precharge the bitlines faster.
        Doing so helped avoid hysteresis. However, this did not help with writing while the clock was high,
        so...
    \item I also \textbf{placed the write block in the middle of the column}. This increased the distance
        between my furthest SRAM cell and the read block (since the write block now contributed to wire
        length). However, this made it significantly easier to drive the entire length of the wire,
        which was my main bottleneck. This was because the maximum distance from the write
        block to any cell in the column was halved. Since my read circuit continued to work in this
        configuration, I did not place it in the middle of the column, as that would needlessly
        increase the length of the wires. 
\end{itemize}
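
The effect of the middle placement can be illustrated with a first-order Elmore estimate. This is a sketch under assumed symbols, not a simulation result: let the bitline have total length $\ell$, resistance $r$ and capacitance $c$ per unit length, and let the write driver have an effective resistance $R_{\text{drv}}$. The delay to the farthest cell is then roughly
\begin{align*}
    t_{\text{end}} &\approx R_{\text{drv}}\, c\,\ell + \tfrac{1}{2}\, r c\, \ell^2
        && \text{(write block at one end of the column)}, \\
    t_{\text{mid}} &\approx R_{\text{drv}}\, c\,\ell + \tfrac{1}{2}\, r c \left(\tfrac{\ell}{2}\right)^2
        = R_{\text{drv}}\, c\,\ell + \tfrac{1}{8}\, r c\, \ell^2
        && \text{(write block in the middle)}.
\end{align*}
In this approximation the driver still charges the full bitline capacitance $c\,\ell$, but the distributed wire term falls by about a factor of four because the worst-case distance is halved.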

This led to the configuration shown in Figure~\ref{fig:top-design}. To simulate this design, I placed
a memory cell at the very top of my column, which is the furthest spot from both the read and write
circuits. I also split the wire into 4 equally sized fragments, each with resistance $\frac{R}{4}$ and
capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $5\lambda$ precharge
transistors, as well as 16 always-off $5\lambda$ transistors, which simulated the remaining memory cells.
I also placed \textsc{Din}, \textsc{Ad0}, and \textsc{Rwt} behind default-sized flip-flops
attached to the clock to approximate a pipeline stage. My overall design is shown
in Figure~\ref{fig:top-design-sim}.
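
As a rough check on the four-fragment wire model, its Elmore delay can be worked out by hand. The following assumes (for illustration only) that each fragment is a series resistance $\frac{R}{4}$ followed by a capacitance $\frac{C}{4}$ to ground, that the wire is driven from one end by an ideal source, and that the transistor loads are ignored:
\begin{equation*}
    t_{\text{Elmore}} = \sum_{i=1}^{4} \frac{C}{4} \cdot \frac{iR}{4}
        = \frac{RC}{16}\,(1 + 2 + 3 + 4) = \frac{5}{8}\, RC .
\end{equation*}
For comparison, a fully distributed wire gives $\frac{1}{2} RC$ under the same assumptions, so a four-fragment model of this form slightly overestimates the wire delay.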

\pagebreak
\section*{Performance Results}
I made three measurements of my performance.

\begin{itemize}
    \item Without flip-flops on my inputs and outputs, I was able to clock my design with a period
        of around 950\textit{ps}.
    \item With flip-flops on my inputs (but not on my output), I was able to clock my design with a period
        of around 1.24\textit{ns}. However, at this period, the output of the gate came in very close to
        the falling edge of the clock.
    \item With flip-flops on my inputs and my outputs, I was able to clock my design with a period of 2.6\textit{ns}.
        This significant increase was needed to allow enough setup time for the output flip-flop.
\end{itemize}
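
For reference, these clock periods correspond to the following approximate clock frequencies, using $f = 1/T$ (the frequencies are derived here from the periods above and were not measured separately):
\begin{align*}
    T = 950\,\text{ps} &\;\Rightarrow\; f \approx 1.05\,\text{GHz}, \\
    T = 1.24\,\text{ns} &\;\Rightarrow\; f \approx 806\,\text{MHz}, \\
    T = 2.6\,\text{ns} &\;\Rightarrow\; f \approx 385\,\text{MHz}.
\end{align*}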
%
Two factors led to these limits on clock speed.
%
\begin{itemize}
    \item \textit{Write capacitance} makes it increasingly difficult to overwrite the value
        in the cell. Clocking my design any faster than 950\textit{ps} or 1.24\textit{ns}
        (depending on the case) causes my cell to \textit{almost} flip, but not resolve correctly.
        I found no way to work around these limits once my wire was properly sized and my
        write block was placed in the middle of the column.
    \item \textit{Flop, decoder, and read delays} are the major limitation when both the inputs
        and the outputs of the circuit are connected to flip-flops. Even though the output
        of the read block is correct, it does not arrive early enough to be captured at the next clock edge.
        Furthermore, in some cases, the signal to open a memory cell arrives later than the
        \textsc{Trig} signal for the sense amplifier, making it read too early and thus output the incorrect value.
\end{itemize}
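
The second limit can be written as a standard setup-time constraint. The delay symbols below are assumptions introduced for illustration and do not correspond to measured values: $t_{cq}$ is the clock-to-output delay of the input flip-flops, $t_{\text{dec}}$ the decoder delay, $t_{\text{read}}$ the delay of the bitline and read block, and $t_{\text{setup}}$ the setup time of the output flip-flop. The clock period $T$ must then satisfy
\begin{equation*}
    T \;\geq\; t_{cq} + t_{\text{dec}} + t_{\text{read}} + t_{\text{setup}} .
\end{equation*}
Similarly, for the sense amplifier to read the correct value, \textsc{Trig} should fire only after the wordline has opened the cell and the bitlines have had some time $t_{\text{dev}}$ (again an assumed symbol) to develop a difference:
\begin{equation*}
    t_{\textsc{Trig}} \;\geq\; t_{cq} + t_{\text{dec}} + t_{\text{dev}} .
\end{equation*}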

\end{document}