Update TODOs.

Danila Fedorin 2021-03-17 12:16:02 -07:00
parent 9afa839bff
commit b99403a4ff


@ -4,6 +4,8 @@
\usepackage{amsmath}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subcaption}
\definecolor{link}{HTML}{006275}
\hypersetup{
colorlinks,
@ -45,7 +47,7 @@ changes:
\begin{itemize}
\item I added \textbf{additional precharge transistors} along the column, a total of 4.
Each was sized at $5\lambda$, much like the SRAM transistors themselves. When the clock
Each was sized at $10\lambda$, much like the SRAM transistors themselves. When the clock
was low, these PMOS transistors became transparent, and helped precharge the bitlines faster.
Doing so helped avoid hysteresis. However, this did not help with writing during high clock,
so...
@ -57,11 +59,21 @@ changes:
configuration, I did not place it in the middle of the column, as that would needlessly
increase the length of the wires.
\end{itemize}
This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I placed
a memory cell at the very top of my column, which is the furthest spot from both the read and write
circuit. I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $5\lambda$ precharge
%
This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I \textbf{tested three configurations}:
\begin{enumerate}
\item A memory cell at the very top of my column, which is the furthest spot from both the read and write circuits.
This is the simulation in the figure.
\item A memory cell in the middle of my column, in the same place as the write block. Since the write block
has brief ``false starts'', this test was to ensure that the read block can still pick up data
despite the write block's misfires.
\item A memory cell at the very bottom of my column. This area has additional capacitance from the read block;
it thus takes longer to charge up, and tends to be the first spot where writes fail.
%
\end{enumerate}
I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $10\lambda$ precharge
transistors, as well as 16 always-off $5\lambda$ transistors, which simulated the remaining memory cells.
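As a rough first-order estimate (not a simulation result), suppose each fragment's capacitance $\frac{C}{4}$ is lumped at its far end and the transistor loads are ignored. The Elmore delay to the far end of the wire is then
\[
t_{\text{Elmore}} = \sum_{i=1}^{4} \left(i \cdot \frac{R}{4}\right) \frac{C}{4} = \frac{RC}{16}(1 + 2 + 3 + 4) = \frac{5}{8}RC,
\]
which lies between the single-lump value $RC$ and the distributed-wire limit $\frac{1}{2}RC$.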
I also placed \textsc{Din}, \textsc{Ad0}, and \textsc{Rwt} behind the default-sized flip-flops
attached to the clock to simulate something like a pipeline stage. My overall design is shown
@ -89,12 +101,7 @@ of length to this number, to a total of roughly $2200\lambda$.
\pagebreak
\section{Performance Results}
I was able to clock my design at 1.38ns. There is a caveat to this clock speed: my \textsc{Bt} and
\textsc{Bf} lines are not pulled all the way to \textsc{Gnd} when they are written low. This
doesn't seem to be a problem - it's sufficient to flip the furthest cell in the design in
every situation I've tested. However, from what I hear, this was discouraged during one of
the office hours (which I was unable to attend). With the constraint of pulling the wires
all the way down, my design can operate at around 2.1ns.
I was able to clock my design at $1.9\textit{ns}$.
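For reference, if $1.9\textit{ns}$ is read as the clock period $T$, the corresponding clock frequency is
\[
f = \frac{1}{T} = \frac{1}{1.9\,\text{ns}} \approx 526\,\text{MHz}.
\]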
%
Two factors lead to these upper limits.
%
@ -116,7 +123,7 @@ Two factors lead to these upper limits.
\section{Components}
\subsection{Decoder}
\subsubsection{In My Own Words}
The decoder in this design is exact same one as we were given in lecture.
The decoder in this design is \textit{almost} the exact same one as we were given in lecture.
It computes all combinations of two consecutive bits using a \textsc{Nand} gate; for
each combination, there are 4 adjacent two-bit combinations,
leading to 4 \textsc{Nor} gates connected to each \textsc{Nand}. There are now
@ -127,6 +134,20 @@ results in 256 unique \textsc{Wl} wires. Finally, these need to be attached
to the clock, so that cells aren't open randomly. This is done using an \textsc{And}
gate (a \textsc{Nand} followed by an inverter).
I adjusted this design to account for the address signals that need to be fed
into the write blocks. Which of the read/write columns is triggered
depends on the upper two bits of the address (since we have 4 columns). I modeled
this by increasing the fanout on the first \textsc{Nand} gate from 1 to 4.
This is pessimistic; each 2-bit combination would only feed into one write block,
whose trigger gate is normally sized.
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{decoder.png}
\caption{Decoder model used in project.}
\label{fig:decoder}
\end{figure}
% TODO: Domino logic
% TODO: More inverters?
@ -151,7 +172,7 @@ on the two \textsc{Nand3} gates was easy to understand and build, but was less
sensitive, and tended to behave strangely under pressure. This led to difficulties
with debugging (the output would, for instance, flip completely at certain
wire widths), and was seemingly random. Instead, I used
an \textbf{improved latch-based sense amplifier design} from . % TODO: cite
an \textbf{improved latch-based sense amplifier design} from \cite{210039}. % TODO: cite
The design I used is shown in Figure \ref{fig:latch-amp}.
I left it sized at $40\lambda$, since larger amplifiers seem to take longer
to trigger and exit metastability.
@ -163,9 +184,32 @@ the initial clock. Thus, if a write occurred during a previous cycle, the write
activate for a short period of time before the read block does. The memory cell
will overpower this initial misfire\footnote{According to my additional simulations, this is true even when the memory cell is close to the write block.}, but in this case, both \textsc{Bt} and \textsc{Bf}
will be below \textsc{Vdd}. The ``improved sense amplifier'' seems to handle this
case better than the one based on two \textsc{Nand} gates. I think that both Reed and
Graham experienced this occurrence -- they seemed to post very similar waveforms
to the community Discord group chat.
case better than the one based on two \textsc{Nand} gates.
The latch-induced delay in \textsc{Rwt} also causes a strange \textsc{Trigger} signal during write operations
directly following read operations. The trigger signal initially activates, putting the sense
amplifier into metastability; however, the correct \textsc{Rwt} value arrives before the
sense amp's outputs are compromised. If this became a problem, I would add an additional,
delayed clock signal \emph{after} the sense amplifier, and use an \textsc{And} gate
to delay the read block's output.
\begin{figure}[h]
\centering
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.7\linewidth]{amp.png}
\caption{The latch-based sense amplifier from \cite{210039}.}
\label{fig:latch-amp}
\end{subfigure}%
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.8\linewidth]{read_select.png}
\caption{The block gathering signals from the four columns.}
\label{fig:read-collect}
\end{subfigure}
\caption{Read block schematics}
\label{fig:read}
\end{figure}
\pagebreak
\subsection{Write Block}
@ -173,8 +217,8 @@ to the community Discord group chat.
The write block converts a ``data in'', or \textsc{Din}, signal
into a one-hot representation. It does so by pulling one of the bitlines high, and the other
low. Once the memory cell connects to the bitlines, it takes on the charge provided by the
write block, and is therefore overwritten. In my design, two PMOS transistor for each bitline
are used to pull down; one of the transistors is triggered by \textsc{Din} signal (which wire
write block, and is therefore overwritten. In my design, two PMOS transistors for each bitline
are used to pull down; one of the transistors is triggered by the \textsc{Din} signal (which wire
we pull down depends on the signal itself!), and the other by a combination of the clock
and \textsc{Rwt} (we don't want to touch the wires when reading!).
@ -194,7 +238,9 @@ time is spent reading the wires, the memory cell in question is able to graduall
of charge on one of these wires. Since the original, \textsc{Nand}-based sense amplifier required
all inputs to be high to properly function, this led to it eventually ``flipping'' and producing
the wrong output. This was only an issue above $5\textit{ns}$, and only with the original sense amplifier
design, though.
design, though. I think that both Reed and
Graham experienced this occurrence -- they seemed to post very similar waveforms
to the community Discord group chat.
One thing to note about the write block is that its \textbf{clock input is deliberately delayed} compared
to the ``actual'' clock. This is because of an issue with \textsc{Din}. Since this
@ -202,12 +248,19 @@ input is behind a latch, it takes around $300\textit{ps}$ to arrive after the ri
edge. If the previous value of \textsc{Din} was different than its current one, the write
block will start writing the wrong value. This will typically mean that the block cannot properly
perform the write. The delay on the clock input serves to mitigate this issue, by giving more
time for \textbf{Din} to settle before starting to write. To compensate for this delay, I sized
time for \textsc{Din} to settle before starting to write. To compensate for this delay, I sized
the write block's pull down transistors quite large ($100\lambda$), so that they can pull
the wire down, even starting $300\textit{ps}$ into the cycle. This is why the ``clock'' input
in my diagrams is colored black, unlike every other clocked component. The delay is achieved
by 6 sequenced inverters, two of which are sized 10x larger than the rest.
\begin{figure}[h]
\centering
\includegraphics[width=0.65\linewidth]{write.png}
\caption{Write block used in this project.}
\label{fig:write}
\end{figure}
\pagebreak
\subsection{Memory Cell}
\subsubsection{In My Own Words}
@ -250,4 +303,58 @@ above - it becomes nigh impossible to wire further \textsc{Wl} lines through eac
unless the decoder is split into bits, in which case the width of the entire assembly drastically increases,
slowing down all signals.
\begin{figure}[h]
\centering
\includegraphics[width=0.5\linewidth]{layout_single.png}
\caption{Electric layout for a single cell.}
\label{fig:layout-cell}
\end{figure}
\pagebreak
My basic cell is shown in Figure \ref{fig:layout-cell}. The arrayed version (in Figure \ref{fig:layout-arrayed})
merits additional explanation. In my earlier description of the overall design, I mentioned
that I have precharge PMOS transistors. I have integrated these into my layout to accurately model
my design. I also made them $10\lambda$ wide, since this is, at the time of writing,
the size of my 4 precharge transistors. In the bird's eye view (Figure \ref{fig:layout-arrayed-far}),
three things can be observed:
\begin{itemize}
\item \textit{Additional vertical line:} This line represents the clock signal,
which must be fed to the precharge transistors. In the full design, there would
be 5 clock lines (3 shared, and 2 on either side).
\item \textit{``Empty'' space between nodes:} I left this space because I was not sure
how wide I would end up making my \textsc{Bt} and \textsc{Bf} wires. I have measured
the distance to ensure that the design will remain DRC clean with up to \textbf{$8\lambda$-wide bitlines}.
This appears to be a sweet spot for my design, anyway.
\item \textit{Moved well contacts:} I have moved my well contacts to the region between
two columns. By extending the N- and P-wells to this area, I was able to
share a single contact between two cells, leaving room for precharge transistors
on both sides of the cell. This was partially inspired by Reed's compact cell design,
which shared a single contact between two cells\footnote{I am operating based on your
comment that well contacts for every cell are significantly overkill.}.
\end{itemize}
Figure \ref{fig:layout-arrayed-close} shows a closer view of the design. Due to the additional
space incurred, an entire column is approximately $100\lambda$ wide.
\begin{figure}[h]
\centering
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.7\linewidth]{layout_arrayed.png}
\caption{Bird's eye view of the arrayed SRAM cells.}
\label{fig:layout-arrayed-far}
\end{subfigure}%
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.8\linewidth]{layout_arrayed_closeup.png}
\caption{Close-up of the arrayed SRAM cells.}
\label{fig:layout-arrayed-close}
\end{subfigure}
\caption{Arrayed SRAM cell layout}
\label{fig:layout-arrayed}
\end{figure}
\pagebreak
\bibliographystyle{unsrt}
\bibliography{bibliography}
\end{document}