Update TODOs.

2021-03-17 12:16:02 -07:00
parent 9afa839bff
commit b99403a4ff
1 changed files with 128 additions and 21 deletions
--- a/final/report.tex
+++ b/final/report.tex
@@ -4,6 +4,8 @@
 \usepackage{amsmath}
 \usepackage{hyperref}
 \usepackage{xcolor}
 \usepackage{caption}
 \usepackage{subcaption}
 \definecolor{link}{HTML}{006275}
 \hypersetup{
    colorlinks,
@@ -45,7 +47,7 @@ changes:
 \begin{itemize}
    \item I added \textbf{additional precharge transistors} along the column, a total of 4.
-        Each was sized at $5\lambda$, much like the SRAM transistors themselves. When the clock
+        Each was sized at $10\lambda$, much like the SRAM transistors themselves. When the clock
        was low, these PMOS transistors turned on and helped precharge the bitlines faster.
        Doing so helped avoid hysteresis. However, this did not help with writing while the clock was high,
        so...
@@ -57,11 +59,21 @@ changes:
        configuration, I did not place it in the middle of the column, as that would needlessly
        increase the length of the wires. 
 \end{itemize}
-
+%
-This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I placed
+This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I \textbf{tested three configurations}:
-a memory cell at the very top of my column, which is the furthest spot from both the read and write
+\begin{enumerate}
-circuit. I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
+    \item A memory cell at the very top of my column, which is the furthest spot from both the read and write circuits.
+        This is the simulation in the figure.
    \item A memory cell in the middle of my column, in the same place as the write block. Since the write block
        has brief ``false starts'', this test was to ensure that the read block can still pick up data
        despite the write block's misfires.
    \item A memory cell at the very bottom of my column. This area has additional capacitance from the read block;
        it thus takes longer to charge up, and tends to be the first spot where writes fail.
 %
 \end{enumerate}
 I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
 capacitance $\frac{C}{4}$. Between adjacent fragments, I added the aforementioned $10\lambda$ precharge
 transistors, as well as 16 always-off $5\lambda$ transistors, which simulated the remaining memory cells.
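+As a rough sanity check on this model (assuming the bitline is driven from one end, each fragment's
+capacitance is lumped at its far node, and the driver, precharge, and cell capacitances are ignored),
+the Elmore delay to the far end of the 4-fragment ladder is
+\begin{equation*}
+    \tau_4 = \sum_{i=1}^{4} \left(i \, \frac{R}{4}\right) \frac{C}{4}
+    = \frac{RC}{16}\,(1 + 2 + 3 + 4) = \frac{5}{8}RC,
+\end{equation*}
+compared with $\frac{RC}{2}$ for a fully distributed wire. The 4-fragment split therefore overestimates
+the intrinsic wire delay by about 25\%, which errs on the pessimistic side.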
 I also placed \textsc{Din}, \textsc{Ad0}, and \textsc{Rwt} behind the default-sized flip-flops
 attached to the clock to simulate something like a pipeline stage. My overall design is shown
@@ -89,12 +101,7 @@ of length to this number, to a total of roughly $2200\lambda$.
 \pagebreak
 \section{Performance Results}
-I was able to clock my design at 1.38ns. There is a caveat to this clock speed: my \textsc{Bt} and
+I was able to clock my design at $1.9\textit{ns}$. There is a caveat to this clock speed: my \textsc{Bt} and
 \textsc{Bf} lines are not pulled all the way to \textsc{Gnd} when they are written low. This
 doesn't seem to be a problem: the partial swing is sufficient to flip the furthest cell in the design in
 every situation I've tested. However, from what I hear, this was discouraged during one of
 the office hours (which I was unable to attend). With the constraint of pulling the wires
 all the way down, my design can operate at around $2.1\textit{ns}$.
 %
 Two factors lead to these upper limits. 
 %
@@ -116,7 +123,7 @@ Two factors lead to these upper limits.
 \section{Components}
 \subsection{Decoder}
 \subsubsection{In My Own Words}
-The decoder in this design is exact same one as we were given in lecture.
+The decoder in this design is \textit{almost} exactly the one we were given in lecture.
 It computes all combinations of two consecutive bits using a \textsc{Nand} gate; for
 each combination, there are 4 adjacent two-bit combinations,
 leading to 4 \textsc{Nor} gates connected to each \textsc{Nand}. There are now
@@ -127,6 +134,20 @@ results in 256 unique \textsc{Wl} wires. Finally, these need to be attached
 to the clock, so that cells aren't opened at arbitrary times. This is done using an \textsc{And}
 gate (a \textsc{Nand} followed by an inverter).
 I adjusted this design to account for the address signals that need to be fed
 into the write blocks. Which of the read/write columns is triggered
 depends on the upper two bits of the address (since we have 4 columns). I modeled
 this by increasing the fanout on the first \textsc{Nand} gate from 1 to 4.
 This is pessimistic; each 2-bit combination would only feed into one write block,
 whose trigger gate is normally sized.
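+For reference, one standard way to write out this predecode logic is sketched below. It assumes an
+8-bit row address $A_7 \dots A_0$ split into four 2-bit fields $j = 0, \dots, 3$, and it may not match
+my schematic gate-for-gate. Let $P_{j,m}$ be high exactly when $A_{2j+1}A_{2j}$ encodes
+$m \in \{0,1,2,3\}$. Each 2-input \textsc{Nand} then produces the active-low $\overline{P_{j,m}}$, so by
+De Morgan's law a 4-input \textsc{Nor} taking one predecoded line from each field acts as an \textsc{And}
+of the active-high terms. For a row index $k$ with base-4 digits $k_3k_2k_1k_0$,
+\begin{equation*}
+    \textsc{Wl}_k = \textsc{Clk} \cdot \overline{\overline{P_{3,k_3}} + \overline{P_{2,k_2}}
+        + \overline{P_{1,k_1}} + \overline{P_{0,k_0}}}
+    = \textsc{Clk} \cdot P_{3,k_3} P_{2,k_2} P_{1,k_1} P_{0,k_0},
+\end{equation*}
+which uses $4 \times 4 = 16$ \textsc{Nand} gates and $4^4 = 256$ \textsc{Nor} gates, and raises
+exactly one \textsc{Wl} line per address while the clock (written $\textsc{Clk}$ here) is high.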
 \begin{figure}[h]
    \centering
    \includegraphics[width=\linewidth]{decoder.png}
    \caption{Decoder model used in project.}
    \label{fig:decoder}
 \end{figure}
 % TODO: Domino logic
 % TODO: More inverters?
@@ -151,7 +172,7 @@ on the two \textsc{Nand3} gates was easy to understand and build, but was less
 sensitive, and tended to behave strangely under pressure. This led to difficulties
 with debugging (the output would, for instance, flip completely at certain
 wire widths), and these failures seemed random. Instead, I used
-an \textbf{improved latch-based sense amplifier design} from . % TODO: cite
+an \textbf{improved latch-based sense amplifier design} from \cite{210039}. % TODO: cite
 The design I used is shown in Figure \ref{fig:latch-amp}.
 I left it sized at $40\lambda$, since larger amplifiers seem to take longer
 to trigger and exit metastability.
@@ -163,9 +184,32 @@ the initial clock. Thus, if a write occurred during a previous cycle, the write
 activate for a short period of time before the read block does. The memory cell
 will overpower this initial misfire\footnote{According to my additional simulations, this is true even when the memory cell is close to the write block.}, but in this case, both \textsc{Bt} and \textsc{Bf}
 will be below \textsc{Vdd}. The ``improved sense amplifier'' seems to handle this
-case better than the one based on two \textsc{Nand} gates. I think that both Reed and
+case better than the one based on two \textsc{Nand} gates. 
-Graham experienced this occurrence -- they seemed to post very similar waveforms
+
-to the community Discord group chat.
+The latch-induced delay in \textsc{Rwt} also causes a spurious \textsc{Trigger} signal during write operations
 directly following read operations. The trigger signal initially activates, putting the sense
 amplifier into metastability; however, the correct \textsc{Rwt} value arrives before the
 sense amp's outputs are compromised. If this became a problem, I would add an additional
 delayed clock signal \emph{after} the sense amplifier, and use an \textsc{And} gate
 to delay the read block's output.
 \begin{figure}[h]
 \centering
 \begin{subfigure}{.5\textwidth}
  \centering
  \includegraphics[width=.7\linewidth]{amp.png}
  \caption{The latch-based sense amplifier from \cite{210039}.}
  \label{fig:latch-amp}
 \end{subfigure}%
 \begin{subfigure}{.5\textwidth}
  \centering
  \includegraphics[width=.8\linewidth]{read_select.png}
  \caption{The block gathering signals from the four columns.}
  \label{fig:read-collect}
 \end{subfigure}
 \caption{Read block schematics}
 \label{fig:read}
 \end{figure}
 \pagebreak
 \subsection{Write Block}
@@ -173,8 +217,8 @@ to the community Discord group chat.
 The write block converts a ``data in'', or \textsc{Din}, signal
 into a one-hot representation. It does so by pulling one of the bitlines high, and the other
 low. Once the memory cell connects to the bitlines, it takes on the charge provided by the
-write block, and is therefore overwritten. In my design, two PMOS transistor for each bitline
+write block, and is therefore overwritten. In my design, two PMOS transistors for each bitline
-are used to pull down; one of the transistors is triggered by \textsc{Din} signal (which wire
+are used to pull down; one of the transistors is triggered by the \textsc{Din} signal (which wire
 we pull down depends on the signal itself!), and the other by a combination of the clock
 and \textsc{Rwt} (we don't want to touch the wires when reading!).
@@ -194,7 +238,9 @@ time is spent reading the wires, the memory cell in question is able to graduall
 of charge on one of these wires. Since the original, \textsc{Nand}-based sense amplifier required
 all inputs to be high to properly function, this led to it eventually ``flipping'' and producing
 the wrong output. This was only an issue above $5\textit{ns}$, and only with the original sense amplifier
-design, though.
+design, though. I think that both Reed and
 Graham ran into the same issue; they posted what look like very similar waveforms
 to the community Discord group chat.
 One thing to note about the write block is that its \textbf{clock input is deliberately delayed} compared
 to the ``actual'' clock. This is because of an issue with \textsc{Din}. Since this
@@ -202,12 +248,19 @@ input is behind a latch, it takes around $300\textit{ps}$ to arrive after the ri
 edge. If the previous value of \textsc{Din} differs from its current one, the write
 block will start writing the wrong value, which typically prevents it from completing
 the write correctly. The delay on the clock input mitigates this issue by giving more
-time for \textbf{Din} to settle before starting to write. To compensate for this delay, I sized
+time for \textsc{Din} to settle before starting to write. To compensate for this delay, I sized
 the write block's pull-down transistors quite large ($100\lambda$), so that they can pull
 the wire down, even starting $300\textit{ps}$ into the cycle. This is why the ``clock'' input
 in my diagrams is colored black, unlike every other clocked component. The delay is achieved
 by a chain of 6 inverters, two of which are sized $10\times$ larger than the rest.
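+As a rough timing budget, make three assumptions: a 50\% duty cycle, writes that happen while the clock is high,
+and a delayed clock that starts the write roughly $300\textit{ps}$ into the cycle. Under these assumptions, the
+$1.9\textit{ns}$ clock period reported above leaves about
+\begin{equation*}
+    t_{\text{write}} \approx \frac{1.9\textit{ns}}{2} - 0.3\textit{ns} = 0.65\textit{ns}
+\end{equation*}
+for the pull-down transistors to flip the cell. This short window is consistent with sizing them as large as $100\lambda$.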
 \begin{figure}[h]
    \centering
    \includegraphics[width=0.65\linewidth]{write.png}
    \caption{Write block used in this project.}
    \label{fig:write}
 \end{figure}
 \pagebreak
 \subsection{Memory Cell}
 \subsubsection{In My Own Words}
@@ -250,4 +303,58 @@ above - it becomes nigh impossible to wire further \textsc{Wl} lines through eac
 unless the decoder is split into bits, in which case the width of the entire assembly drastically increases,
 slowing down all signals.
 \begin{figure}[h]
    \centering
    \includegraphics[width=0.5\linewidth]{layout_single.png}
    \caption{Electric layout for a single cell.}
    \label{fig:layout-cell}
 \end{figure}
 \pagebreak
 My basic cell is shown in Figure \ref{fig:layout-cell}. The arrayed version (in Figure \ref{fig:layout-arrayed})
 merits additional explanation. In my earlier description of the overall design, I mentioned
 that I have precharge PMOS transistors. I have integrated these into my layout to accurately model
 my design. I also made them $10\lambda$ wide, since this is, at the time of writing,
 the size of my 4 precharge transistors. In the bird's eye view (Figure \ref{fig:layout-arrayed-far}),
 three things can be observed:
 \begin{itemize}
    \item \textit{Additional vertical line:} This line represents the clock signal,
        which must be fed to the precharge transistors. In the full design, there would
        be 5 clock lines (3 shared between adjacent columns, and 1 on each outer edge).
    \item \textit{``Empty'' space between nodes:} I left this space because I was not sure
        how wide I would end up making my \textsc{Bt} and \textsc{Bf} wires. I have measured
        the distance to ensure that the design will remain DRC-clean with up to \textbf{$8\lambda$-wide bitlines}.
        This appears to be a sweet spot for my design, anyway.
    \item \textit{Moved well contacts:} I have moved my well contacts to the region between
        two columns. By extending the N- and P-wells to this area, I was able to
        share a single contact between two cells, leaving room for precharge transistors
        on both sides of the cell. This was partially inspired by Reed's compact cell design,
        which also shares one well contact between two cells\footnote{I am operating based on your
        comment that well contacts for every cell are significantly overkill.}.
 \end{itemize}
 Figure \ref{fig:layout-arrayed-close} shows a closer view of the design. Due to the additional
 space incurred, an entire column is approximately $100\lambda$ wide.
 \begin{figure}[h]
 \centering
 \begin{subfigure}{.5\textwidth}
  \centering
  \includegraphics[width=.7\linewidth]{layout_arrayed.png}
  \caption{Bird's eye view of the arrayed SRAM cells.}
  \label{fig:layout-arrayed-far}
 \end{subfigure}%
 \begin{subfigure}{.5\textwidth}
  \centering
  \includegraphics[width=.8\linewidth]{layout_arrayed_closeup.png}
  \caption{Close-up of the arrayed SRAM cells.}
  \label{fig:layout-arrayed-close}
 \end{subfigure}
 \caption{Layout of the arrayed SRAM cells}
 \label{fig:layout-arrayed}
 \end{figure}
 \pagebreak
 \bibliographystyle{unsrt}
 \bibliography{bibliography}
 \end{document}