\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx}
\usepackage{amsmath}

\title{Final Project Report}
\author{Danila Fedorin}

\begin{document}
\maketitle

\section*{General Design and Considerations}
The goal of this assignment was to create a 256-byte SRAM memory unit. In order to minimize wire delays, I chose to split each bit into \textbf{4 columns of 64 SRAM cells each}. This was motivated by the following factors:
\begin{itemize}
\item \emph{Larger} columns were eliminated due to the high cost of interconnect. Even large write blocks were not able to charge the ``far ends'' of the wire at shorter clock cycles. Increasing wire width did not help; although resistance decreased, the capacitance increased, leading to only small net gains. Thus, I made the decision to shrink the columns as much as possible. However...
\item \emph{Smaller} columns became a routing challenge. Even with a 4-column split, to properly connect each cell of the SRAM column, the SRAM cells themselves need to accommodate an additional three \textsc{Wl} lines. Due to the pitch requirements on metals three and four, this is the upper limit (for reasonably sized cells). Alternatives included splitting the decoder into pieces, but for large numbers of columns, this meant that the decoder signal traveled through significant amounts of wire, and was thus slower.
\end{itemize}

For each of the four 64-cell columns, I attached separate read and write blocks. However, my placement of the write block was unorthodox. I observed that, although the write block is perfectly capable of quickly manipulating the bitlines close to it, the changes take too long to propagate to the far end of the wire. I addressed this with two separate changes:
\begin{itemize}
\item I added \textbf{additional precharge transistors} along the column, a total of 4. Each was sized at $5\lambda$, much like the SRAM transistors themselves.
When the clock was low, these PMOS transistors became transparent, and helped precharge the bitlines faster. Doing so helped avoid hysteresis. However, this did not help with writing during high clock, so...
\item I also \textbf{placed the write block in the middle of the column}. This increased the distance between my furthest SRAM cell and the read block (since the write block now contributed to wire length). However, this made it significantly easier to drive the entire length of the wire, which was my main bottleneck. This was because the maximum distance from the write block to any cell in the column was halved. Since my read circuit continued to work in this configuration, I did not place it in the middle of the column, as that would needlessly increase the length of the wires.
\end{itemize}

This led to the configuration shown in Figure~\ref{fig:top-design}.

To simulate this design, I placed a memory cell at the very top of my column, which is the furthest spot from both the read and write circuits. I also split the wire into 4 equally-sized fragments, each with resistance $\frac{R}{4}$ and capacitance $\frac{C}{4}$. Between each fragment, I added the aforementioned $5\lambda$ precharge transistors, as well as 16 always-off $5\lambda$ transistors, which simulated the remaining memory cells. I also placed \textsc{Din}, \textsc{Ad0}, and \textsc{Rwt} behind the default-sized flip-flops attached to the clock to simulate something like a pipeline stage. My overall design is shown in Figure~\ref{fig:top-design-sim}.

\pagebreak
\section*{Performance Results}
I made three measurements of my performance.
\begin{itemize}
\item Without flip-flopping my inputs and outputs, I was able to clock my design around 950\textit{ps}.
\item With flip-flops on my inputs (but not on my output), I was able to clock my design around 1.24\textit{ns}. However, at this delay, the output of the gate came in very close to the falling edge of the clock.
\item With flip-flops on my inputs and my outputs, I was able to clock my design at 2.6\textit{ns}. This significant delay was to allow enough setup time for the flip-flop.
\end{itemize}
%
Two factors lead to these upper limits.
%
\begin{itemize}
\item \textit{Write capacitance} makes it increasingly difficult to overwrite the value in the cell. Clocking my design any faster than 950\textit{ps} or 1.24\textit{ns} (depending on the case) causes my cell to \textit{almost} flip, but not resolve correctly. I found no way to work around these limits once my wire was properly sized and my write block was placed in the middle of the column.
\item \textit{Flop, decoder, and read delays} are the major limitation when both the inputs and the outputs of the circuit are connected to flip-flops. Even though the output of the read block is correct, it doesn't arrive fast enough to be captured by the next cycle. Furthermore, in some cases, the signal to open a memory cell arrives later than the \textsc{Trig} signal for the sense amplifier, making it read too early and thus output the incorrect value.
\end{itemize}
\end{document}