Kamalpreet Kaur, Nishant Madan, Syed Shakir Iqbal
Freescale Semiconductor India Pvt. Ltd.
With prior knowledge of delay characterization for combinational standard cells, where the delay values depend on the input slew and the output load, only the propagation delay needs to be taken into account. The scenario gets more complex when it comes to sequential elements. Modeling setup time, hold time, C-Q delay and various other factors adds complexity to the characterization of sequential elements. This paper discusses the models and methodology commonly used for characterizing the timing parameters of various sequential logic cells, which are key elements of synchronous design.
Timing Parameters in a Sequential Cell
While the timing parameters of a combinational cell are limited to min and max delays, with the input transition, the output load and the timing model of the cell as the set of variables, sequential cells come with their own set of complex timing checks. These new requirements use the basics of combinational delay characterization along with an event driven constraint (a clock signal level change or transition) to model the primary timing checks and parameters required to completely specify a sequential cell in terms of timing. Some of the key parameters that we will discuss in this paper include:
- Setup Time
- Hold Time
- Clock to Out (C-Q) Delay
- Min Pulse Width
Methodology for Finding the Sequential Delays of a Standard Cell
While a combinational element is concerned only with the propagation delay of the cell, sequential elements are a bit more complex. With different timing arcs involved, it is necessary to model the setup time, hold time and C-Q delay of a flop while modeling it into the library. The number of arcs that must be modeled can vary between sequential elements. In this paper we discuss the methodology to find the setup time, hold time and C-Q delay of flip-flops and latches. The min pulse width requirements, as discussed in the previous section, are a derivative of setup and hold time and hence will be covered implicitly.
a) Setup Time:
Setup time is a common timing parameter associated with sequential devices. The setup time is used to meet the minimum pulse width requirement for the first (master) latch that makes up a flip flop. More simply, the setup time is the amount of time that an input signal (to the device) must be stable (unchanging) before the clock ticks in order to guarantee minimum pulse width and thus avoid possible meta-stability within the latching loop. The difficulty lies in finding the setup time of a given flip flop. It can be found through SPICE simulation using the steps below:
Setup time for Flip Flop:
One question that can arise is why a factor of 10% is taken when calculating the setup time. As the data transition is moved closer to the active edge of the clock, at the instant when setup time is violated the flip flop goes into a meta-stable state and the output takes longer to settle down, which is the reason for the increased C-Q delay.
Figure 1: Setup timing measurement for a positive edge triggered flip-flop.
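The measurement steps themselves are not reproduced above, so the following is only a minimal sketch of one common way to carry out such a search. It assumes that setup time is taken as the data-to-clock offset at which the C-Q delay rises 10% above its nominal value. The function `cq_delay` is a hypothetical toy model standing in for a SPICE run, and all of its constants are illustrative assumptions.

```python
import math

NOMINAL_CQ = 100e-12  # assumed C-Q delay with data arriving very early (s)

def cq_delay(offset):
    """Toy stand-in for a SPICE run: C-Q delay vs. data-to-clock offset.

    The delay grows as the data edge approaches the clock edge; the
    exponential shape and the constants are illustrative assumptions.
    """
    return NOMINAL_CQ * (1.0 + math.exp(-offset / 10e-12))

def find_setup_time(simulate, lo=0.0, hi=1e-9, pushout=0.10, tol=1e-14):
    """Bisect for the offset where C-Q delay is (1 + pushout) x nominal."""
    nominal = simulate(hi)            # data far ahead of the clock edge
    limit = nominal * (1.0 + pushout)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if simulate(mid) > limit:     # too close to the clock: back off
            lo = mid
        else:                         # still within the limit: move closer
            hi = mid
    return hi

setup = find_setup_time(cq_delay)
print(f"setup time ~ {setup * 1e12:.2f} ps")
```

In a real flow, `simulate` would launch a SPICE deck for each offset and return the measured C-Q delay; the bisection itself would stay the same.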
b) Hold Time:
Hold time is also a timing parameter associated with all sequential devices. The hold time is used to further satisfy the minimum pulse width requirement for the first (master) latch that makes up a flip flop. The input must not change until enough time has passed after the clock tick to guarantee that the master latch is fully disabled. More simply, hold time is the amount of time that an input signal (to a sequential device) must be stable (unchanging) after the clock tick in order to guarantee minimum pulse width and thus avoid possible meta-stability. It can be found through SPICE simulation using the steps below:
Hold Time for Flip Flop:
Figure 2: Hold timing measurement for a positive edge triggered flip-flop.
c) Clock to Out (C-Q) Delay:
Clock to out delay is generally measured at infinite setup time, that is, with the data settled long before the clock edge. With data making a transition 10 ns before the active clock edge, one can probe the clock and output signals, both at 50% of their voltage levels. The difference between their transition times gives the clock to out or C-Q delay for the flip flop.
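As a rough illustration of the 50% probing described above, the sketch below finds the 50% crossing of two sampled waveforms by linear interpolation and subtracts them. The waveforms are synthetic ramps rather than simulator output, and the helper names `crossing_time` and `ramp` are hypothetical.

```python
def crossing_time(times, volts, threshold):
    """Return the first rising crossing of threshold, by linear interpolation."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < threshold <= v1:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the threshold")

def ramp(times, start, slew, vdd):
    """Synthetic rising edge: 0 V before start, linear ramp to vdd over slew."""
    return [min(vdd, max(0.0, (t - start) / slew * vdd)) for t in times]

VDD = 1.0                                   # assumed supply voltage (V)
times = [i * 1e-12 for i in range(400)]     # 0..399 ps in 1 ps steps
clk = ramp(times, start=100e-12, slew=40e-12, vdd=VDD)
q = ramp(times, start=190e-12, slew=60e-12, vdd=VDD)

t_clk = crossing_time(times, clk, 0.5 * VDD)
t_q = crossing_time(times, q, 0.5 * VDD)
cq = t_q - t_clk
print(f"C-Q delay = {cq * 1e12:.1f} ps")
```

For a falling output, the comparison inside `crossing_time` would need to be reversed; only the rising case is shown here.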
d) Min Pulse Width:
Min pulse width is defined as the minimum permissible pulse width for both high and low levels, below which a given sequential element such as a flip-flop, latch or SRAM cell will fail to work. It signifies the minimum time these cells need to function and provide the correct output while being operated. For simulation purposes it is simply another form of representing setup and hold time corresponding to the concerned clock edge, and hence it can be characterized for different output loads, input slews and triggering events in the same way as setup and hold time. Figure 3 shows how high and low min pulse width requirements can be modeled based on flop/latch setup and hold time.
Figure 3: Min Pulse Width and its relationship with setup and hold time.
One key point to note here is that all the SPICE deck analysis involved must be performed for both rise-to-rise and fall-to-fall data transitions for the flip flop as the RC delay parameters and the signal paths vary with respect to the value of data itself.
So far, all the analysis discussed above has used a flip-flop, an edge triggered element, as the reference. While the definitions of setup/hold time and C-Q delay remain the same for latches, latches are level triggered rather than edge triggered, so extracting their timing parameters is a bit trickier. In the case of a latch, we need to understand the opening and closing windows for data sampling instead of just a simple edge. Let us consider the timing of a negative level triggered latch as shown in figure 4. The latch remains transparent while the clock signal is low, and the state is latched otherwise. Thus, there is an opening window for data sampling when the clock goes low and a closing window when the clock goes high. Now let us add the considerations for setup and hold time. Since the circuit elements take a finite time to sample the incoming data, the hold timing requirement is tied to the closing of the latch window (shown in blue in figure 4). This requirement is the same as in the case of a positive edge triggered flip-flop, and hence the same setup can be used to measure the hold time. For setup timing, there are two scenarios: setup timing at the opening window and setup timing at the closing window. While the former check is a circuit driven requirement and can be measured by the same method as in the case of positive edge triggered flip-flops, the latter check is rather a design driven check used to model time borrowing and is effectively a virtual setup check.
Figure 4: Setup and hold timing for a positive edge triggered flop and negative level triggered latch.
The reason for this check is that, unlike a flop, the latch output is not a fixed value during a static clock level. For example, when the clock is low, the flop output holds a constant value from the previously captured data, while in the case of a negative latch the output is the same as the input data at that instant. Hence, we have a borrow margin which can be given to the data path connected at the output of the latch, provided we have ensured correct setup timing near the opening edge, with the same setup time as at the closing window, as shown in figure 3. Hence, the characterization methodology for a negative level triggered latch is the same as that for a positive edge triggered flop, and a similar scenario exists for a negative edge triggered flop and a positive level triggered latch.
This paper elucidates the methodologies followed for setup analysis, hold analysis, and setup dependent hold and hold dependent setup analysis. In discussing C-Q delays, it also covers the range from setup time to min pulse width checks for sequential elements, both flops and latches. This will help people across the industry understand sequential cell timing characterization using SPICE and learn how to use it to get the correct delay estimation.
Low voltage digital design, especially near/sub-threshold design, is becoming more popular in application domains where performance is not the primary concern. More and more systems with low performance requirements are operated from a near/sub-threshold supply voltage in order to save power [3,4,5,6,7]. However, due to the fact that the gate voltage drive of the transistors operating in the sub-threshold domain is small, standard logic cells become more sensitive to process variations. Commercial cell libraries are designed and characterized for super-threshold voltage operation. Without any optimization, most cells of such conventional libraries will not have a robust operation in the presence of process variability at a low operating voltage. Therefore, careful sizing of standard cells working at low voltage is needed. In , the optimization procedures to size standard cells are explained. In , the standard cell libraries optimized for sub-threshold operation are presented. This paper extends the work of [1,2]. Here, the sizing methodology and sizing methods are explained using a CMOS 40 nm low power process as an example. Benchmarking of the libraries is carried out using both a CMOS 90 nm and a CMOS 40 nm low power process. ITC benchmark circuit synthesis results are presented as well.
Unlike conventional “super-threshold” cell sizing methods [8,9], the proposed balancing-based sizing method focuses on the statistical distribution of the drain-source current, rather than the current itself. In the proposed approach, the variation of the current is taken into consideration when sizing the standard cells by balancing the mean current of the equivalent N and P networks. The way of finding the equivalent N and P networks is based on timing arcs. The transition paths within the standard cells are different for distinct input patterns. The longest path, which has the worst delay, is defined as the worst-case transition path; the shortest path, which has the best delay, is defined as the best-case transition path. The transistors of the worst-case and the best-case transition paths are balanced in two possible ways: (i) transistor width and length tuning; and (ii) transistor width tuning only. In one case both the channel length and width of the transistor are optimized to have a better performance at low voltages, since in the sub-threshold regime, increasing the channel length has a positive impact on timing and timing variation . Therefore, by increasing the transistor’s length and by tuning the width  we are able to size the cells in the sub-threshold regime with two degrees of freedom. The second optimization approach, width tuning only, targets better timing and variation from the sub-threshold to the super-threshold regions.
Taking into account transistor sizing effects in sub-threshold , the balancing-based cell sizing methodology is presented in Section 2. Section 2 also explains the standard cell optimization methods and how they can be applied to complex cells. A library of 163 standard cells was designed and characterized using the proposed sizing methods in two technology nodes; the results are shown in Section 3. The evaluation of these libraries is presented in Section 4. Furthermore, to benchmark the libraries in the 40 nm technology node, ITC benchmark circuits are used to test the performance and variability of the different libraries. The results are shown in Section 5. Section 6 concludes the paper.
2. Sub-Threshold Cell Sizing Methodology
Several relevant research results have been presented about sub-threshold sizing. In [3,4], the authors calculate the optimum supply voltage to minimize energy consumption. It is also claimed that, theoretically, minimum sized cells are optimal for energy reduction. In this paper it is shown that under speed constraints, and when process variability is taken into account, this is not the case. In , the authors explain the benefit of technology choices, power supply scaling, and body bias adaptability for circuits working in the sub-threshold regime. It is implied that standard cell timing could be improved using the mentioned design techniques. The concept of sub-threshold logical effort for complex gate sizing is presented in . Particularly interesting is a closed form current equation derived for stacked transistors in relation to other transistors in the same stack. Compared to [3,4,9], our sizing approach focuses on narrowing the current/delay distribution spread and on increasing the performance through a new balancing theory that slows down fast transistors and vice versa. In , the transistor reverse short channel effect (RSCE) is used for device sizing optimization, where the channel length is increased to have an optimal threshold voltage which makes the transistors have a higher current, be less sensitive to random variations, and to have a smaller area. With a higher current and a lower gate capacitance, the delay and power are both reduced. Furthermore, in , the channel lengths of the NMOS and PMOS are increased to achieve the maximum currents for both NMOS and PMOS transistors. Unlike , our sizing optimization does not always lead to the maximum active current for both the NMOS and PMOS transistors. Only the transistors on slower timing arcs are allowed to be upsized, the ones on faster timing arcs are down sized to save area. 
In , a standard cell library in 65 nm is presented, where by upsizing the channel length of all transistors in a given cell, the energy per operation value is reduced by about 15%. In this paper, the standard cells are tuned individually, with various length and width selections to have balanced transition currents. Reference  presents a searching algorithm based on multiple objectives through a free space search to optimize one cell. The approach is exhaustive and suitable for single cells, but the searching effort is very large for a complete library. Unlike , our optimization targets balancing the mean P and N currents and takes into account the impact of process spread. In , a 45 nm standard cell library optimized for 0.35 V is proposed. The proposed PMOS-to-NMOS transistor ratio optimization is based on the optimal energy-delay product, not on balanced rise and fall times. In our work, the rise and fall times are balanced taking into account the effect of process variations.
Overall, in this section, a new statistical formulation  to size standard cells is introduced. The differences of the proposed work from other sizing methods are that in our work, the threshold voltage variation is treated as one of the statistical parameters in the current/delay equation, and the cells are optimized to have balanced current/delay distributions. The proposed sizing approach is derived from the observation that the transistor’s current distribution in the sub-threshold regime follows a Log-Normal spreading, whereas conventional sizing treats the transistor’s current as a Normal distribution. Considering the above-mentioned fact and the observation that process variability can be mapped onto threshold voltage variability with a first order approximation, a balancing based sizing methodology is developed for robust standard cell design.
2.1. Sub-Threshold Current Distribution Model
The sub-threshold region is often called the weak inversion region , partly because in the sub-threshold region, the transistor is neither completely turned on nor turned off. In digital circuits, the sub-threshold current is the parasitic leakage, ideally zero. By reducing the voltage supply to sub-threshold, and by letting the transistor operate in weak inversion, the power consumption can be reduced quadratically . Transistors operating in the sub-threshold regime obey an exponential dependence on the gate drive voltage :

$$I_{sub} = \mu C_{ox} \frac{W}{L} (n-1) U_T^2 \exp\left(\frac{V_{GS} - V_T}{n U_T}\right) \left(1 - \exp\left(-\frac{V_{DS}}{U_T}\right)\right) \qquad (1)$$

where $\mu$ is the mobility; $C_{ox}$ is the oxide capacitance; $n$ is the sub-threshold slope factor; and $U_T$ is the thermal voltage. $V_{GS}$ is the gate to source voltage; $V_{DS}$ is the drain to source voltage; $V_T$ is the threshold voltage, which consists of the zero biasing voltage, terminal voltages and device size effects . From Equation (1), one can see that the current has an exponential relationship with the gate-to-source voltage and the threshold voltage of the transistor.
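To make the exponential sensitivity concrete, the sketch below evaluates the usual textbook weak-inversion expression, I = mu*Cox*(W/L)*(n-1)*U_T^2*exp((V_GS - V_T)/(n*U_T))*(1 - exp(-V_DS/U_T)), for two threshold voltages. The form is assumed to match Equation (1), and every parameter value (`mu_cox`, `n`, the threshold voltages, W/L) is an illustrative assumption rather than data from the 40 nm process.

```python
import math

def subthreshold_current(vgs, vds, vth, w_over_l,
                         mu_cox=300e-6, n=1.5, u_t=0.02585):
    """Weak-inversion drain current (A), textbook form.

    mu_cox (A/V^2), n and u_t (V, about 300 K) are illustrative assumptions.
    """
    prefactor = mu_cox * w_over_l * (n - 1.0) * u_t ** 2
    gate_term = math.exp((vgs - vth) / (n * u_t))
    drain_term = 1.0 - math.exp(-vds / u_t)
    return prefactor * gate_term * drain_term

# Same bias and geometry, threshold voltage shifted by 30 mV.
base = subthreshold_current(vgs=0.30, vds=0.30, vth=0.45, w_over_l=10)
shifted = subthreshold_current(vgs=0.30, vds=0.30, vth=0.48, w_over_l=10)
print(f"I(Vth = 0.45 V) = {base:.3e} A")
print(f"a 30 mV higher Vth cuts the current by {base / shifted:.2f}x")
```

With these assumed values, a threshold shift of a few tens of millivolts changes the current by a factor of about two, which is why threshold voltage variation dominates the delay spread at low supply voltages.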
In sub-threshold, the probability distribution function (PDF) of the current obeys a Log-Normal distribution. If the supply voltage is reduced to the sub-threshold level, the widely distributed current will lead to a wide transistor delay spread. Therefore, an optimization based on a super-threshold current distribution will not guarantee a robust behavior in the sub-threshold regime. We consider $V_T$ as a Normal distribution, $V_T \sim N(\mu_{V_T}, \sigma_{V_T}^2)$, and model the distribution of the transistor current using [18,19], where $\mu_{V_T}$ stands for the mean value and $\sigma_{V_T}$ stands for the standard deviation. In this model $\mu_{V_T}$ and $\sigma_{V_T}$ are regarded as technology parameters for a given W and L set. With the width and length tuning, $\mu_{V_T}$ and $\sigma_{V_T}$ also change accordingly due to RSCE. Therefore, depending on the range of W and L, different distributions of $V_T$ are used in the sizing model.
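The link between a Normal threshold voltage and a Log-Normal current can be checked numerically. The sketch below is a small Monte Carlo experiment using only the Python standard library; the threshold voltage mean and sigma and the value of n*U_T are illustrative assumptions, and the current is normalised so that only the threshold-dependent exponential term varies.

```python
import math
import random
import statistics

N_UT = 1.5 * 0.02585            # n * U_T in volts (illustrative assumption)
MU_VT, SIGMA_VT = 0.45, 0.03    # assumed V_T mean and sigma in volts

random.seed(0)
vth = [random.gauss(MU_VT, SIGMA_VT) for _ in range(100_000)]
# Current normalised to its value at V_T = MU_VT; only the V_T term varies.
current = [math.exp(-(v - MU_VT) / N_UT) for v in vth]

log_i = [math.log(i) for i in current]
s = SIGMA_VT / N_UT                      # sigma of ln(I)
analytic_mean = math.exp(0.5 * s * s)    # Log-Normal mean when mean of ln(I) is 0

sampled_mean = statistics.fmean(current)
sampled_median = statistics.median(current)
print(f"sigma of ln(I): sampled {statistics.pstdev(log_i):.3f}, expected {s:.3f}")
print(f"mean current:   sampled {sampled_mean:.3f}, analytic {analytic_mean:.3f}")
print(f"median current: {sampled_median:.3f}")
```

The mean sits well above the median because the distribution is right-skewed, so sizing on a symmetric (Normal) current assumption would misjudge both the typical and the worst-case current.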
2.2. Sub-Threshold Cell Balancing Method
In traditional CMOS design, the transistor geometry ratio (W/L) of the pull-up PMOS network to the pull-down NMOS network is carefully tuned to compensate for the difference between the mobility of electrons and holes. This ratio is derived from balancing the rise/fall-time delays and minimizing the propagation delay.
In sub-threshold, it is more about equalizing the strength of the pull-up and the pull-down networks, which directly affects the functional correctness and the minimum supply voltage. In the proposed sizing methodology, the ratio of the pull-up to pull-down transistors is determined by the balance between the current distributions of the PMOS and NMOS transistors. The difference with regard to the conventional sizing approach is that the current spread caused by the threshold voltage variation is taken into account.
The proposed sizing methodology includes a transition-based approach in which the worst rise and fall times are improved by compromising the best rise and fall times. In this way, there is more room to improve the worst-case performance of the cells without area penalty.
Basically, the mean currents of the PMOS and NMOS networks are made equal, i.e., $\mathrm{E}[I_P] = \mathrm{E}[I_N]$. From this, one can derive Equation (3), which contains a technology parameter defined by the mobility and oxide capacitance of the NMOS and PMOS transistors. This parameter is also used as the conventional sizing factor. Given the mean and variance values, Equation (3) serves as the current balancing equation. The NMOS and PMOS current distributions can be closely matched based on Equation (3).
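To show how a balancing relation of this kind can follow from equal mean currents, here is a short derivation sketch. It is written under stated assumptions and is not necessarily the exact form of Equation (3): the threshold voltage magnitudes are taken as Normal, both devices share the same $n$ and $U_T$ and see the same bias, and the hypothetical constants $K_N$ and $K_P$ lump the mobility, oxide capacitance and bias terms.

```latex
% Assumptions: V_T ~ N(\mu_{V_T}, \sigma_{V_T}^2) (magnitudes for PMOS),
% same n and U_T for both devices, identical bias conditions;
% K_N and K_P are hypothetical lumped constants (mobility, C_ox, bias terms).
\begin{align}
I = K \frac{W}{L} \exp\!\left(-\frac{V_T}{n U_T}\right)
  \;&\Rightarrow\;
  \mathrm{E}[I] = K \frac{W}{L}
  \exp\!\left(-\frac{\mu_{V_T}}{n U_T}
  + \frac{\sigma_{V_T}^2}{2 n^2 U_T^2}\right) \\
\mathrm{E}[I_P] = \mathrm{E}[I_N]
  \;&\Rightarrow\;
  \frac{(W/L)_P}{(W/L)_N} = \frac{K_N}{K_P}
  \exp\!\left(\frac{\mu_{V_T,P} - \mu_{V_T,N}}{n U_T}
  + \frac{\sigma_{V_T,N}^2 - \sigma_{V_T,P}^2}{2 n^2 U_T^2}\right)
\end{align}
```

The first line uses the mean of a Log-Normal variable. In the second line, the ratio $K_N/K_P$ plays the role of a conventional mobility-based sizing factor, while the exponential term corrects it for the difference in threshold voltage mean and spread between the two device types.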
Figure 1 displays results of Monte Carlo simulations (CMOS 40 nm, 0.3 V power supply) of the normalized active current distributions of the NMOS and PMOS transistors of an inverter of strength 2 (INVD2). In the remainder of the paper the same commercial CMOS 40 nm technology is used as a reference. The current distributions of the NMOS and PMOS transistors can be closely matched, following Equation (3). Before balancing, the widths of the NMOS/PMOS are 0.62 μm/0.82 μm with fixed length of 0.041 μm. After balancing, the widths are 0.31 μm/0.60 μm and the lengths are 0.1 μm/0.044 μm, respectively. Note that the current distribution of the PMOS transistor is improved whereas the current of the NMOS transistor is weakened. In this case, the worst-case current distribution of the INVD2 is improved by reducing the best-case current. After the current balancing, the area of the INVD2 stays the same as before the balancing method is applied.
Figure 1. Normalized transistor current distributions in CMOS 40 nm. (a) Current distribution before balancing; (b) current distribution after balancing.
This balancing equation allows us to balance the rise and fall current distribution of the inverters without area penalty.
2.3. Stack Sizing Model
The magnitude of the current flowing through a transistor stack depends on the number of transistors and the size of each transistor. Without loss of generality, consider a transistor stack as depicted in Figure 2.
Figure 2. PMOS stack schematic.
Let us enumerate this stack of PMOS transistors in descending order as a function of their proximity to the power supply VDD. Similarly, consider a stack of NMOS transistors enumerated as a function of their proximity to Ground. Simulation results show that the upper PMOS transistors [lower NMOS transistors] have a similar impact on the current behavior of the stack. Therefore, let these transistors have equal sizes. Using the results of [9,20] to calculate the equivalent transistor width of the stack, , the mean current of transistors in a stack is calculated as follows where