Compare commits
7 Commits
fe52f689f9...39ec744562
| Author | SHA1 | Date |
|---|---|---|
| | 39ec744562 | |
| | 6530e7ef8c | |
| | 71195df7c9 | |
| | 0c1d8611b1 | |
| | b99403a4ff | |
| | 9afa839bff | |
| | 6f99879b8f | |
1742
final/SRAM.jelib
File diff suppressed because it is too large
@@ -268,7 +268,7 @@ Xh2 nn1 dot inv size='60'
.subckt decModel choose din clk size='20'
Xi1 nn1 din inv size='size'
* Here: stopped using i1 and just used din
Xnal ww1 gnd din nnd2 size='size'
Xnal ww1 gnd din nnd2 size='size*4'
Xnar nn2 vdd din nnd2 size='size'
Xnrl ww2 nn2 vdd nor2 size='size*3'
Xnrr nn3 nn2 gnd nor2 size='size'
BIN
final/layout_arrayed.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 394 KiB |
BIN
final/layout_arrayed_closeup.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 100 KiB |
BIN
final/layout_single.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 210 KiB |
151
final/report.tex
@@ -4,6 +4,8 @@
\usepackage{amsmath}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subcaption}
\definecolor{link}{HTML}{006275}
\hypersetup{
colorlinks,
@@ -45,7 +47,7 @@ changes:

\begin{itemize}
\item I added \textbf{additional precharge transistors} along the column, a total of 4.
Each was sized at $5\lambda$, much like the SRAM transistors themselves. When the clock
Each was sized at $10\lambda$, much like the SRAM transistors themselves. When the clock
was low, these PMOS transistors became transparent, and helped precharge the bitlines faster.
Doing so helped avoid hysteresis. However, this did not help with writing during high clock,
so...
@@ -57,11 +59,21 @@ changes:
configuration, I did not place it in the middle of the column, as that would needlessly
increase the length of the wires.
\end{itemize}

This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I placed
a memory cell at the very top of my column, which is the furthest spot from both the read and write
circuit. I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $5\lambda$ precharge
%
This led to the configuration shown in Figure \ref{fig:top-design}. To simulate this design, I \textbf{tested three configurations}:
\begin{enumerate}
\item A memory cell at the very top of my column, which is the furthest spot from both the read and write.
This is the simulation in the figure.
\item A memory cell in the middle of my column, in the same place as the write block. Since the write block
has brief ``false starts'', this test was to ensure that the read block can still pick up data
despite the write block's misfires.
\item A memory cell at the very bottom of my column. This area has additional capacitance from the read block;
it thus takes longer to charge up, and tends to be the first spot where writes fail.
circuit.
%
\end{enumerate}
I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and
capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $10\lambda$ precharge
transistors, as well as 16 always-off $5\lambda$ transistors, which simulated the remaining memory cells.
I also placed \textsc{Din}, \textsc{Ad0}, and \textsc{Rwt} behind the default-sized flip-flops
attached to the clock to simulate something like a pipeline stage. My overall design is shown
@@ -89,12 +101,7 @@ of length to this number, to a total of roughly $2200\lambda$.

\pagebreak
\section{Performance Results}
I was able to clock my design at 1.38ns. There is a caveat to this clock speed: my \textsc{Bt} and
\textsc{Bf} lines are not pulled all the way to \textsc{Gnd} when they are written low. This
doesn't seem to be a problem - it's sufficient to flip the furthest cell in the design in
every situation I've tested. However, from what I hear, this was discouraged during one of
the office hours (which I was unable to attend). With the constraint of pulling the wires
all the way down, my design can operate at around 2.1ns.
I was able to clock my design at $1.9\textit{ns}$.
%
Two factors lead to these upper limits.
%
@@ -116,7 +123,7 @@ Two factors lead to these upper limits.
\section{Components}
\subsection{Decoder}
\subsubsection{In My Own Words}
The decoder in this design is exact same one as we were given in lecture.
The decoder in this design is \textit{almost} the exact same one as we were given in lecture.
It computes all combinations of two consecutive bits using a \textsc{Nand} gate; for
each combination, there are 4 adjacent two-bit combinations,
leading to 4 \textsc{Nor} gates connected to each \textsc{Nand}. There are now
@@ -127,6 +134,20 @@ results in 256 unique \textsc{Wl} wires. Finally, these need to be attached
to the clock, so that cells aren't open randomly. This is done using an \textsc{And}
gate (a \textsc{Nand} followed by an inverter).

I adjusted this design to account for the address signals that need to be fed
into the write blocks. Which of the read/write columns is triggered
depends on the upper two bits of the address (since we have 4 columns). I modeled
this by increasing the fanout on the first \textsc{Nand} gate from 1 to 4.
This is pessimistic; each 2-bit combination would only feed into one write block,
whose trigger gate is normally sized.

\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{decoder.png}
\caption{Decoder model used in project.}
\label{fig:decoder}
\end{figure}

% TODO: Domino logic
% TODO: More inverters?
@@ -151,7 +172,7 @@ on the two \textsc{Nand3} gates was easy to understand and build, but was less
sensitive, and tended to behave strangely under pressure. This led to difficulties
with debugging (the output would, for instance, flip completely at certain
wire widths), and was seemingly random. Instead, I used
an \textbf{improved latch-based sense amplifier design} from . % TODO: cite
an \textbf{improved latch-based sense amplifier design} from \cite{210039}. % TODO: cite
The design I used is shown in Figure \ref{fig:latch-amp}.
I left it sized at $40\lambda$, since larger amplifiers seem to take longer
to trigger and exit metastability.
@@ -163,9 +184,32 @@ the initial clock. Thus, if a write occurred during a previous cycle, the write
activate for a short period of time before the read block does. The memory cell
will overpower this initial misfire\footnote{According to my additional simulations, this is true even when the memory cell is close to the write block.}, but in this case, both \textsc{Bt} and \textsc{Bf}
will be below \textsc{Vdd}. The ``improved sense amplifier'' seems to handle this
case better than the one based on two \textsc{Nand} gates. I think that both Reed and
Graham experienced this occurrence -- they seemed to post very similar waveforms
to the community Discord group chat.
case better than the one based on two \textsc{Nand} gates.

The latch-induced delay in \textsc{Rwt} also causes a strange \textsc{Trigger} signal during write operations
directly following read operations. The trigger signal initially activates, putting the sense
amplifier into metastability; however, the correct \textsc{Rwt} value arrives before the
sense amp's outputs are compromised. If this became a problem, I would add an additional,
delayed clock signal \emph{after} the sense amplifier, and use an \textsc{And} gate
to delay the read block's output.

\begin{figure}[h]
\centering
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.7\linewidth]{amp.png}
\caption{The latch-based sense amplifier from \cite{210039}.}
\label{fig:latch-amp}
\end{subfigure}%
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.8\linewidth]{read_select.png}
\caption{The block gathering signals from the four columns.}
\label{fig:read-collect}
\end{subfigure}
\caption{Read block schematics}
\label{fig:read}
\end{figure}

\pagebreak
\subsection{Write Block}
@@ -173,8 +217,8 @@ to the community Discord group chat.
The write block converts a ``data in'', or \textsc{Din}, signal
into a one-hot representation. It does so by pulling one of the bitlines high, and the other
low. Once the memory cell connects to the bitlines, it takes on the charge provided by the
write block, and is therefore overwritten. In my design, two PMOS transistor for each bitline
are used to pull down; one of the transistors is triggered by \textsc{Din} signal (which wire
write block, and is therefore overwritten. In my design, two PMOS transistors for each bitline
are used to pull down; one of the transistors is triggered by the \textsc{Din} signal (which wire
we pull down depends on the signal itself!), and the other by a combination of the clock
and \textsc{Rwt} (we don't want to touch the wires when reading!).
@@ -194,7 +238,9 @@ time is spent reading the wires, the memory cell in question is able to graduall
of charge on one of these wires. Since the original, \textsc{Nand}-based sense amplifier required
all inputs to be high to properly function, this led to it eventually ``flipping'' and producing
the wrong output. This was only an issue above $5\textit{ns}$, and only with the original sense amplifier
design, though.
design, though. I think that both Reed and
Graham experienced this occurrence -- they seemed to post very similar waveforms
to the community Discord group chat.

One thing to note about the write block is that its \textbf{clock input is deliberately delayed} compared
to the ``actual'' clock. This is because of an issue with \textsc{Din}. Since this
@@ -202,12 +248,19 @@ input is behind a latch, it takes around $300\textit{ps}$ to arrive after the ri
edge. If the previous value of \textsc{Din} was different than its current one, the write
block will start writing the wrong value. This will typically mean that the block cannot properly
perform the write. The delay on the clock input serves to mitigate this issue, by giving more
time for \textbf{Din} to settle before starting to write. To compensate for this delay, I sized
time for \textsc{Din} to settle before starting to write. To compensate for this delay, I sized
the write block's pull down transistors quite large ($100\lambda$), so that they can pull
the wire down, even starting $300\textit{ps}$ into the cycle. This is why the ``clock'' input
in my diagrams is colored black, unlike every other clocked component. The delay is achieved
by 6 sequenced inverters, two of which are sized 10x larger than the rest.
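
As a rough sanity check on this timing (a back-of-the-envelope sketch, not a simulated result), suppose that writes must complete while the undelayed clock is high, that the clock has a 50\% duty cycle, and that the inverter chain delays the write block by about as long as \textsc{Din} takes to arrive. Under those assumptions, at a period of $T = 1.9\textit{ns}$ the pull-down transistors have approximately
\begin{equation*}
t_{\textit{write}} \approx \frac{T}{2} - t_{\textit{delay}} = \frac{1.9\textit{ns}}{2} - 0.3\textit{ns} = 0.65\textit{ns}
\end{equation*}
to discharge the bitline. The symbols $t_{\textit{write}}$ and $t_{\textit{delay}}$ are introduced here only for illustration; the actual window depends on when the word line closes and when precharging resumes.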

\begin{figure}[h]
\centering
\includegraphics[width=0.65\linewidth]{write.png}
\caption{Write block used in this project.}
\label{fig:write}
\end{figure}

\pagebreak
\subsection{Memory Cell}
\subsubsection{In My Own Words}
@@ -229,7 +282,7 @@ the vertical wires, \textsc{Bt} and \textsc{Bf}. This allowed me to use metal fo
\textsc{Wl} (access) signal. Since this was the only use of metal four, I had enough free
room to route three additional \textsc{Wl} signals to the remaining three columns.

My general principle for designing the layout was that, in an 8-bit, 4-column design, \textbf{a single
My general principle for designing the layout was that, in a 12-bit, 4-column design, \textbf{a single
unit of height costs as much as 64 units of width}. Thus, I was fairly liberal with my layout's
width, but made sure to minimize the height of the design. The most significant bottleneck
was the gate oxide ``poking out'' of the ends of the design. In total, I was able to achieve
@@ -250,4 +303,58 @@ above - it becomes nigh impossible to wire further \textsc{Wl} lines through eac
unless the decoder is split into bits, in which case the width of the entire assembly drastically increases,
slowing down all signals.

\begin{figure}[h]
\centering
\includegraphics[width=0.5\linewidth]{layout_single.png}
\caption{Electric layout for a single cell.}
\label{fig:layout-cell}
\end{figure}

\pagebreak
My basic cell is shown in Figure \ref{fig:layout-cell}. The arrayed version (in Figure \ref{fig:layout-arrayed})
merits additional explanation. In my earlier description of the overall design, I mentioned
that I have precharge PMOS transistors. I have integrated these into my layout to accurately model
my design. I also made them $10\lambda$ wide, since this is, at the time of writing,
the size of my 4 precharge transistors. In the bird's eye view (Figure \ref{fig:layout-arrayed-far}),
three things can be observed:
\begin{itemize}
\item \textit{Additional vertical line:} This line represents the clock signal,
which must be fed to the precharge transistors. In the full design, there would
be 5 clock lines (3 shared, and 2 on either side).
\item \textit{``Empty'' space between nodes:} I left this space because I was not sure
how wide I would end up making my \textsc{Bt} and \textsc{Bf} wires. I have measured
the distance to ensure that the design will remain DRC clean with up to \textbf{$8\lambda$-wide bitlines}.
This appears to be a sweet spot for my design, anyway.
\item \textit{Moved well contacts:} I have moved my well contacts to the region between
two columns. By extending the N- and P-wells to this area, I was able to
share a single contact between two cells, leaving room for precharge transistors
on both sides of the cell. This was partially inspired by Reed's compact cell design,
which shared a single contact between two cells\footnote{I am operating based on your
comment that well contacts for every cell are significantly overkill.}.
\end{itemize}
Figure \ref{fig:layout-arrayed-close} shows a closer view of the design. Due to the additional
space incurred, an entire column is approximately $100\lambda$ wide.

\begin{figure}[h]
\centering
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.7\linewidth]{layout_arrayed.png}
\caption{Bird's eye view of the arrayed SRAM cells.}
\label{fig:layout-arrayed-far}
\end{subfigure}%
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[width=.8\linewidth]{layout_arrayed_closeup.png}
\caption{Close-up of the arrayed SRAM cells.}
\label{fig:layout-arrayed-close}
\end{subfigure}
\caption{Arrayed SRAM cell layout}
\label{fig:layout-arrayed}
\end{figure}

\pagebreak
\bibliographystyle{unsrt}
\bibliography{bibliography}

\end{document}
@@ -21,17 +21,17 @@ Xnf fff gnd dead nn ww='number*5'


*********begin: topLevel*****
.param per = 1.33ns
.param per = 1.9ns
.param dataLead=per*0.1
.param lw=2000
.param wirew=14
.param lw=2200
.param wirew=12

vdd vdd 0 'supply'

Xclok clk dat1 period='per' start='per+dataLead' total=1 duty=0.5 sz=300
Xad ad dat1 period='per' start='per' total=1 duty=0.5 sz=300
Xrdwr rdw dat1 period='3*per' start='2*per' total=2 duty=1 sz=300
Xdii din dat1 period='3*per' start='per' total=4 duty=2 sz=300
Xrdwr rdw dat1 period='per' start='2*per' total=2 duty=1 sz=300
Xdii din dat1 period='per' start='per' total=4 duty=2 sz=300

Xinv1 clkb1 clk inv
Xinv2 clkb2 clkb1 inv
@@ -54,8 +54,10 @@ Xmd2 bt3 bf3 memLoad number=16
Xw3 bt3 bt4 bf3 bf4 clk wire_precharge len='lw/4' wid='wirew'
Xmd3 bt4 bf4 memLoad number=16
Xw4 bt4 btt bf4 bff clk wire_precharge len='lw/4' wid='wirew'
Xmd4 btt bff memLoad number =16
Xla bt1 bf1 choose mem1
Xmd4 bt3 bf3 memLoad number =16
* Xla bt1 bf1 choose mem1
* Xla bt3 bf3 choose mem1
Xla btt bff choose mem1
Xrd btt bff set rst rdwf clk choose iReadSub
Xrc dot set rst vdd vdd vdd vdd vdd vdd readCollect
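The netlist above chains `wire_precharge` segments of length `lw/4`, matching the report's description of a bitline split into four fragments with resistance R/4 and capacitance C/4 each. The sketch below is not part of these commits. It shows how such a split changes a first-order Elmore delay estimate compared with a single lumped RC. It assumes each fragment is a series resistor followed by a capacitor to ground, and it ignores the precharge transistors and the `memLoad` cells. The function name `elmore_delay` and the component values are placeholders, not values from the project.

```python
def elmore_delay(r_total, c_total, segments):
    """Elmore delay to the far end of a uniform RC line cut into equal segments.

    Assumed model: each segment is a series resistor (r_total / segments)
    followed by a capacitor to ground (c_total / segments). The real
    wire_precharge subcircuit may be modeled differently.
    """
    r_seg = r_total / segments
    c_seg = c_total / segments
    # The k-th capacitor (1-indexed) charges through k upstream resistors.
    return sum(k * r_seg * c_seg for k in range(1, segments + 1))


if __name__ == "__main__":
    R = 1e3    # placeholder total wire resistance, ohms
    C = 1e-12  # placeholder total wire capacitance, farads
    for n in (1, 4):
        print(f"{n} segment(s): {elmore_delay(R, C, n):.3e} s")
```

Under these assumptions the four-segment ladder gives 0.625·R·C, against R·C for a single lump, which is one reason a distributed model is less pessimistic than lumping the whole wire.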
@@ -6,8 +6,8 @@
* [x] Figure out what to do with flopped write block.
* [x] Test data close to write block (it pulls up past clock low!)
* [ ] Drive wires to zero?
* [ ] Add missing well connection in layout
* [ ] Make sure width isn't too horrible
* [x] Add missing well connection in layout
* [x] Make sure width isn't too horrible
* [ ] Model additional delay for read read/write block select?
* [ ] Model worst case of decoder
* [ ] Cite [this](https://ieeexplore.ieee.org/document/210039)
* [x] Model worst case of decoder
* [x] Cite [this](https://ieeexplore.ieee.org/document/210039)