Update report.

1.9ns everywhere.
Add final SRAM design.
2021-03-17 12:21:35 -07:00 · 2021-03-17 12:19:33 -07:00 · 2021-03-17 12:19:17 -07:00 · 2021-03-17 12:19:02 -07:00 · 2021-03-17 12:16:02 -07:00 · 2021-03-17 11:51:50 -07:00
8 changed files with 1354 additions and 565 deletions
--- a/final/SRAM.jelib
+++ b/final/SRAM.jelib
--- a/final/SRAM_bits.cir
+++ b/final/SRAM_bits.cir
@@ -268,7 +268,7 @@ Xh2 nn1 dot inv size='60'
 .subckt decModel choose din clk size='20'
 Xi1 nn1 din inv size='size'
 * Here: stopped using i1 and just used din
-Xnal ww1 gnd din nnd2 size='size'
+Xnal ww1 gnd din nnd2 size='size*4'
 Xnar nn2 vdd din nnd2 size='size'
 Xnrl ww2 nn2 vdd nor2 size='size*3'
 Xnrr nn3 nn2 gnd nor2 size='size'
--- a/final/layout_arrayed.png
+++ b/final/layout_arrayed.png
--- a/final/layout_arrayed_closeup.png
+++ b/final/layout_arrayed_closeup.png
--- a/final/layout_single.png
+++ b/final/layout_single.png
--- a/final/report.tex
+++ b/final/report.tex
@@ -4,6 +4,8 @@
 \usepackage{amsmath}
 \usepackage{hyperref}
 \usepackage{xcolor}
+\usepackage{caption}
+\usepackage{subcaption}
 \definecolor{link}{HTML}{006275}
 \hypersetup{
    colorlinks,
@@ -45,7 +47,7 @@ changes:

 \begin{itemize}
    \item I added \textbf{additional precharge transistors} along the column, a total of 4.
-        Each was sized at $5\lambda$, much like the SRAM transistors themselves. When the clock
+        Each was sized at $10\lambda$, much like the SRAM transistors themselves. When the clock
        was low, these PMOS transistors became transparent, and helped precharge the bitlines faster.
         Doing so helped avoid hysteresis. However, this did not help with writing during high clock,
        so...
@@ -57,11 +59,21 @@ changes:
        configuration, I did not place it in the middle of the column, as that would needlessly
        increase the length of the wires. 
 \end{itemize}
-
-This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I placed
-a memory cell at the very top of my column, which is the furthest spot from both the read and write
-circuit. I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
-capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $5\lambda$ precharge
+%
+This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I \textbf{tested three configurations}:
+\begin{enumerate}
+    \item A memory cell at the very top of my column, which is the furthest spot from both the read
+        and write circuits. This is the simulation in the figure.
+    \item A memory cell in the middle of my column, in the same place as the write block. Since the write block
+        has brief ``false starts'', this test was to ensure that the read block can still pick up data
+        despite the write block's misfires.
+    \item A memory cell at the very bottom of my column. This area has additional capacitance from the
+        read block; it thus takes longer to charge up, and tends to be the first spot where writes
+        fail.
+%
+\end{enumerate}
+I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
+capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $10\lambda$ precharge
 transistors, as well as 16 always-off $5\lambda$ transistors, which simulated the remaining memory cells.
 I also placed \textsc{Din}, \textsc{Ad0}, and \textsc{Rwt} behind the default-sized flip-flops
 attached to the clock to simulate something like a pipeline stage. My overall design is shown
@@ -89,12 +101,7 @@ of length to this number, to a total of roughly $2200\lambda$.

 \pagebreak
 \section{Performance Results}
-I was able to clock my design at 1.38ns. There is a caveat to this clock speed: my \textsc{Bt} and
-\textsc{Bf} lines are not pulled all the way to \textsc{Gnd} when they are written low. This
-doesn't seem to be a problem - it's sufficient to flip the furthest cell in the design in
-every situation I've tested. However, from what I hear, this was discouraged during one of
-the office hours (which I was unable to attend). With the constraint of pulling the wires
-all the way down, my design can operate at around 2.1ns.
+I was able to clock my design at $1.9\textit{ns}$.
 %
 Two factors lead to these upper limits. 
 %
@@ -116,7 +123,7 @@ Two factors lead to these upper limits.
 \section{Components}
 \subsection{Decoder}
 \subsubsection{In My Own Words}
-The decoder in this design is exact same one as we were given in lecture.
+The decoder in this design is \textit{almost} the exact same one as we were given in lecture.
 It computes all combinations of two consecutive bits using a \textsc{Nand} gate; for
 each combination, there are 4 adjacent two-bit combinations,
 leading to 4 \textsc{Nor} gates connected to each \textsc{Nand}. There are now
@@ -127,6 +134,20 @@ results in 256 unique \textsc{Wl} wires. Finally, these need to be attached
 to the clock, so that cells aren't open randomly. This is done using an \textsc{And}
 gate (a \textsc{Nand} followed by an inverter).

+I adjusted this design to account for the address signals that need to be fed
+into the write blocks. Which of the read/write columns is triggered
+depends on the upper two bits of the address (since we have 4 columns). I modeled
+this by increasing the fanout on the first \textsc{Nand} gate from 1 to 4.
+This is pessimistic; each 2-bit combination would only feed into one write block,
+whose trigger gate is normally sized.
+
+\begin{figure}[h]
+    \centering
+    \includegraphics[width=\linewidth]{decoder.png}
+    \caption{Decoder model used in project.}
+    \label{fig:decoder}
+\end{figure}
+
 % TODO: Domino logic
 % TODO: More inverters?

@@ -151,7 +172,7 @@ on the two \textsc{Nand3} gates was easy to understand and build, but was less
 sensitive, and tended to behave strangely under pressure. This led to difficulties
 with debugging (the output would, for instance, flip completely at certain
 wire widths), and was seemingly random. Instead, I used
-an \textbf{improved latch-based sense amplifier design} from . % TODO: cite
+an \textbf{improved latch-based sense amplifier design} from \cite{210039}. % TODO: cite
 The design I used is shown in Figure \ref{fig:latch-amp}.
 I left it sized at $40\lambda$, since larger amplifiers seem to take longer
 to trigger and exit metastability.
@@ -163,9 +184,32 @@ the initial clock. Thus, if a write occurred during a previous cycle, the write
 activate for a short period of time before the read block does. The memory cell
 will overpower this initial misfire\footnote{According to my additional simulations, this is true even when the memory cell is close to the write block.}, but in this case, both \textsc{Bt} and \textsc{Bf}
 will be below \textsc{Vdd}. The ``improved sense amplifier'' seems to handle this
-case better than the one based on two \textsc{Nand} gates. I think that both Reed and
-Graham experienced this occurrence -- they seemed to post very similar waveforms
-to the community Discord group chat.
+case better than the one based on two \textsc{Nand} gates. 
+
+The latch-induced delay in \textsc{Rwt} also causes a strange \textsc{Trigger} signal during write operations
+directly following read operations. The trigger signal initially activates, putting the sense
+amplifier into metastability; however, the correct \textsc{Rwt} value arrives before the
+sense amp's outputs are compromised. If this became a problem, I would add an additional,
+delayed clock signal \emph{after} the sense amplifier, and use an \textsc{And} gate
+to delay the read block's output.
+
+\begin{figure}[h]
+\centering
+\begin{subfigure}{.5\textwidth}
+  \centering
+  \includegraphics[width=.7\linewidth]{amp.png}
+  \caption{The latch-based sense amplifier from \cite{210039}.}
+  \label{fig:latch-amp}
+\end{subfigure}%
+\begin{subfigure}{.5\textwidth}
+  \centering
+  \includegraphics[width=.8\linewidth]{read_select.png}
+  \caption{The block gathering signals from the four columns.}
+  \label{fig:read-collect}
+\end{subfigure}
+\caption{Read block schematics}
+\label{fig:read}
+\end{figure}

 \pagebreak
 \subsection{Write Block}
@@ -173,8 +217,8 @@ to the community Discord group chat.
 The write block converts a ``data in'', or \textsc{Din}, signal
 into a one-hot representation. It does so by pulling one of the bitlines high, and the other
 low. Once the memory cell connects to the bitlines, it takes on the charge provided by the
-write block, and is therefore overwritten. In my design, two PMOS transistor for each bitline
-are used to pull down; one of the transistors is triggered by \textsc{Din} signal (which wire
+write block, and is therefore overwritten. In my design, two PMOS transistors for each bitline
+are used to pull down; one of the transistors is triggered by the \textsc{Din} signal (which wire
 we pull down depends on the signal itself!), and the other by a combination of the clock
 and \textsc{Rwt} (we don't want to touch the wires when reading!).

@@ -194,7 +238,9 @@ time is spent reading the wires, the memory cell in question is able to graduall
 of charge on one of these wires. Since the original, \textsc{Nand}-based sense amplifier required
 all inputs to be high to properly function, this led to it eventually ``flipping'' and producing
 the wrong output. This was only an issue above $5\textit{ns}$, and only with the original sense amplifier
-design, though.
+design, though. I think that both Reed and
+Graham experienced this occurrence -- they seemed to post very similar waveforms
+to the community Discord group chat.

 One thing to note about the write block is that its \textbf{clock input is deliberately delayed} compared
 to the ``actual'' clock. This is because of an issue with \textsc{Din}. Since this
@@ -202,12 +248,19 @@ input is behind a latch, it takes around $300\textit{ps}$ to arrive after the ri
 edge. If the previous value of \textsc{Din} was different than its current one, the write
 block will start writing the wrong value. This will typically mean that the block cannot properly
 perform the write. The delay on the clock input serves to mitigate this issue, by giving more
-time for \textbf{Din} to settle before starting to write. To compensate for this delay, I sized
+time for \textsc{Din} to settle before starting to write. To compensate for this delay, I sized
 the write block's pull down transistors quite large ($100\lambda$), so that they can pull
 the wire down, even starting $300\textit{ps}$ into the cycle. This is why the ``clock'' input
 in my diagrams is colored black, unlike every other clocked component. The delay is achieved
 by 6 sequenced inverters, two of which are sized 10x larger than the rest.

+\begin{figure}[h]
+    \centering
+    \includegraphics[width=0.65\linewidth]{write.png}
+    \caption{Write block used in this project.}
+    \label{fig:write}
+\end{figure}
+
 \pagebreak
 \subsection{Memory Cell}
 \subsubsection{In My Own Words}
@@ -229,7 +282,7 @@ the vertical wires, \textsc{Bt} and \textsc{Bf}. This allowed me to use metal fo
 \textsc{Wl} (access) signal. Since this was the only use of metal four, I had enough free
 room to route three additional \textsc{Wl} signals to the remaining three columns. 

-My general principle for designing the layout was that, in an 8-bit, 4-column design, \textbf{a single
+My general principle for designing the layout was that, in a 12-bit, 4-column design, \textbf{a single
 unit of height costs as much as 64 units of width}. Thus, I was fairly liberal with my layout's
 width, but made sure to minimize the height of the design. The most significant bottleneck
 was the gate oxide ``poking out'' of the ends of the design. In total, I was able to achieve
@@ -250,4 +303,58 @@ above - it becomes nigh impossible to wire further \textsc{Wl} lines through eac
 unless the decoder is split into bits, in which case the width of the entire assembly drastically increases,
 slowing down all signals.

+\begin{figure}[h]
+    \centering
+    \includegraphics[width=0.5\linewidth]{layout_single.png}
+    \caption{Electric layout for a single cell.}
+    \label{fig:layout-cell}
+\end{figure}
+
+\pagebreak
+My basic cell is shown in Figure \ref{fig:layout-cell}. The arrayed version (in Figure \ref{fig:layout-arrayed})
+merits additional explanation. In my earlier description of the overall design, I mentioned
+that I have precharge PMOS transistors. I have integrated these into my layout to accurately model
+my design. I also made them $10\lambda$ wide, since this is, at the time of writing,
+the size of my 4 precharge transistors. In the bird's eye view (Figure \ref{fig:layout-arrayed-far}),
+three things can be observed:
+\begin{itemize}
+    \item \textit{Additional vertical line:} This line represents the clock signal,
+        which must be fed to the precharge transistors. In the full design, there would
+        be 5 clock lines (3 shared, and 2 on either side).
+    \item \textit{``Empty'' space between nodes:} I left this space because I was not sure
+        how wide I would end up making my \textsc{Bt} and \textsc{Bf} wires. I have measured
+        the distance to ensure that the design will remain DRC clean with up to \textbf{$8\lambda$-wide bitlines}.
+        This appears to be a sweet spot for my design, anyway.
+    \item \textit{Moved well contacts:} I have moved my well contacts to the region between
+        two columns. By extending the N- and P-wells to this area, I was able to
+        share a single contact between two cells, leaving room for precharge transistors
+        on both sides of the cell. This was partially inspired by Reed's compact cell design,
+        which shared a single contact between two cells\footnote{I am operating based on your
+        comment that well contacts for every cell are significantly overkill.}.
+\end{itemize}
+Figure \ref{fig:layout-arrayed-close} shows a closer view of the design. Due to the additional
+space incurred, an entire column is approximately $100\lambda$ wide.
+
+\begin{figure}[h]
+\centering
+\begin{subfigure}{.5\textwidth}
+  \centering
+  \includegraphics[width=.7\linewidth]{layout_arrayed.png}
+  \caption{Bird's eye view of the arrayed SRAM cells.}
+  \label{fig:layout-arrayed-far}
+\end{subfigure}%
+\begin{subfigure}{.5\textwidth}
+  \centering
+  \includegraphics[width=.8\linewidth]{layout_arrayed_closeup.png}
+  \caption{Close-up of the arrayed SRAM cells.}
+  \label{fig:layout-arrayed-close}
+\end{subfigure}
+\caption{Arrayed SRAM cell layout}
+\label{fig:layout-arrayed}
+\end{figure}
+
+\pagebreak
+\bibliographystyle{unsrt}
+\bibliography{bibliography}
+
 \end{document}
--- a/final/testBuffer.cir
+++ b/final/testBuffer.cir
@@ -21,17 +21,17 @@ Xnf fff gnd dead nn ww='number*5'


 *********begin: topLevel*****
-.param per = 1.33ns
+.param per = 1.9ns
 .param dataLead=per*0.1
-.param lw=2000
-.param wirew=14
+.param lw=2200
+.param wirew=12

 vdd vdd 0 'supply'

 Xclok clk               dat1 period='per' start='per+dataLead' total=1 duty=0.5 sz=300
 Xad ad               dat1 period='per' start='per' total=1 duty=0.5 sz=300
-Xrdwr rdw               dat1 period='3*per' start='2*per'        total=2 duty=1 sz=300
-Xdii din                dat1 period='3*per' start='per'          total=4 duty=2 sz=300
+Xrdwr rdw               dat1 period='per' start='2*per'        total=2 duty=1 sz=300
+Xdii din                dat1 period='per' start='per'          total=4 duty=2 sz=300

 Xinv1 clkb1 clk inv
 Xinv2 clkb2 clkb1 inv
@@ -54,8 +54,10 @@ Xmd2 bt3 bf3 memLoad number=16
 Xw3 bt3 bt4 bf3 bf4  clk   wire_precharge len='lw/4' wid='wirew'
 Xmd3 bt4 bf4 memLoad number=16
 Xw4 bt4 btt bf4 bff  clk   wire_precharge len='lw/4' wid='wirew'
-Xmd4 btt bff             memLoad number =16
-Xla bt1 bf1 choose         mem1
+Xmd4 bt3 bf3             memLoad number =16
+* Xla bt1 bf1 choose         mem1
+* Xla bt3 bf3 choose         mem1
+Xla btt bff choose         mem1
 Xrd btt bff set rst rdwf clk choose iReadSub
 Xrc dot set rst vdd vdd vdd vdd vdd vdd readCollect

--- a/final/todo.md
+++ b/final/todo.md
@@ -6,8 +6,8 @@
 * [x] Figure out what to do with flopped write block.
 * [x] Test data close to write block (it pulls up past clock low!)
 * [ ] Drive wires to zero?
-* [ ] Add missing well connection in layout
-* [ ] Make sure width isn't too horrible
+* [x] Add missing well connection in layout
+* [x] Make sure width isn't too horrible
 * [ ] Model additional delay for read read/write block select?
-* [ ] Model worst case of decoder
-* [ ] Cite [this](https://ieeexplore.ieee.org/document/210039)
+* [x] Model worst case of decoder
+* [x] Cite [this](https://ieeexplore.ieee.org/document/210039)
Author	SHA1	Message	Date
Danila Fedorin	39ec744562	Update report.	2021-03-17 12:21:35 -07:00
Danila Fedorin	6530e7ef8c	1.9ns everywhere.	2021-03-17 12:19:33 -07:00
Danila Fedorin	71195df7c9	Add final SRAM design.	2021-03-17 12:19:17 -07:00
Danila Fedorin	0c1d8611b1	Add missing images.	2021-03-17 12:19:02 -07:00
Danila Fedorin	b99403a4ff	Update TODOs.	2021-03-17 12:16:02 -07:00
Danila Fedorin	9afa839bff	Add TODO.	2021-03-17 11:51:50 -07:00
Danila Fedorin	6f99879b8f	Update electric files.	2021-03-17 09:00:29 -07:00
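
Note on the bitline model in report.tex: the report splits the bitline into 4 equal fragments, each with resistance R/4 and capacitance C/4, with extra load hanging off each fragment. A rough way to reason about how that segmentation affects delay is a first-order Elmore estimate. The sketch below is only an illustration, not the simulation used in the report: the function name `elmore_delay`, the `load_c` parameter, and the numeric values of R and C are all assumptions chosen for the example.

```python
def elmore_delay(r_total, c_total, segments=4, load_c=0.0):
    """First-order Elmore delay to the far end of an RC ladder.

    The wire is split into `segments` equal pieces, each a series
    resistance r_total/segments followed by a shunt capacitance
    c_total/segments. `load_c` is an optional extra capacitance at
    every node (hypothetical stand-in for precharge transistors and
    always-off cells).
    """
    r_seg = r_total / segments
    c_seg = c_total / segments
    delay = 0.0
    for k in range(1, segments + 1):
        # Node k sees k segments of resistance between it and the driver.
        upstream_r = k * r_seg
        delay += upstream_r * (c_seg + load_c)
    return delay


if __name__ == "__main__":
    # Assumed example values, not taken from the report.
    R = 1.0e3      # total wire resistance in ohms
    C = 200e-15    # total wire capacitance in farads
    print("4-segment estimate (s):", elmore_delay(R, C, segments=4))
    print("64-segment estimate (s):", elmore_delay(R, C, segments=64))
```

With no extra load, the n-segment estimate works out to R*C*(n+1)/(2n), so 4 segments give 0.625*R*C, which is somewhat more pessimistic than the 0.5*R*C limit of a finely distributed wire.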