

#### UNIVERSITA' DELLA CALABRIA

Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica (DIMES)

Doctoral School (Scuola di Dottorato) "Archimede"

Curriculum (Indirizzo): Scienze, Comunicazione e Tecnologie

Cycle (Ciclo): XXVIII

#### THESIS TITLE

Low Voltage Digital Design exploiting dynamic body biasing techniques

Scientific Disciplinary Sector (Settore Scientifico Disciplinare): ING-INF/01

Director: Prof. Pietro Pantano (signature)

Supervisor: Prof. Marco Lanuzza (signature)

PhD Candidate: Ramard Taco (signature)



## Contents

| Chapter | Title |
|---------|-------|
| | Abstract |
| | Sommario |
| 1 | Introduction & Background |
| | 1.1 Toward High-Speed and Energy-Efficient Designs |
| | 1.2 Ultra-low Voltage (ULV) digital design: Characteristics and Issues |
| | 1.3 Minimum Energy per Operation |
| | 1.4 Body Biasing as an efficient knob in ULV operation |
| | 1.5 Logic family exploiting Dynamic Body Biasing in ULV |
| | 1.6 Purpose of this work |
| 2 | Analytical Modeling for Dynamic Gate-Level Body Biased Logic Circuits |
| | 2.1 GLBB Operating Principle |
| | 2.2 Analytical Model |
| | 2.3 Design Criteria and Analysis Validations |
| | 2.4 Logic Gates with Stacked Transistors |
| | 2.5 Final Remarks on Design Criteria |
| 3 | Dynamic Gate-Level Body Biasing Technique in Bulk Technologies |
| | 3.1 Physical constraints for gate-level body biasing technique |
| | 3.2 Final Remarks of Gate-Level Body Biasing in bulk technology |
| 4 | Dynamic Gate-Level Body Biasing Technique in UTBB FD-SOI Technologies |
| | 4.1 UTBB FD-SOI Technology Overview |
| | 4.2 Design Optimization for Gate-level Body Biasing |
| | 4.3 Basic Gates: Design and Operating Characteristics |
| | 4.4 Final Remarks of Gate-level Body Biasing in UTBB FD-SOI |
| 5 | Case Studies: Application of the GLBB Technique to Arithmetic Circuits |
| | 5.1 Mirror Full Adder |
| | 5.2 Ripple Carry Adder |
| | 5.3 Baugh Wooley Multiplier |
| | 5.4 Final Remarks of Gate-level Body Biasing Technique to Arithmetic Circuits |
| 6 | Conclusions |
| | Bibliography |
| | Acknowledgments |
| | List of Publications |

## Abstract

Scaling the power supply voltage (VDD) below the transistor threshold voltage (VTH) is one of the most effective approaches to achieving low energy consumption, at the expense of a large performance degradation and a much higher sensitivity to process variations and temperature fluctuations. While acceptable for niche markets, the delay and robustness issues of conventional subthreshold CMOS circuits can be very limiting for a broader set of applications. To increase speed and robustness against process and temperature variations while maintaining high levels of energy efficiency, the forward body biasing (FBB) technique can be adopted.

The FBB technique can be applied (also dynamically) at different levels of granularity, ranging from the macro-block level to the transistor level. The key rationale for applying forward body biasing at the macro-block level is to avoid the area and body-control signal routing overhead of a finer-grained implementation. However, there is a cost to pay: when the threshold voltage is reduced at the block level to compensate for variations and/or to provide a temporary speed boost, leakage power increases for all the gates in the block, whereas a speed-up would be needed only for the timing-critical gates. Better energy-delay trade-offs can be obtained by using a finer body-bias control granularity.

A feasible way to control body biasing at the transistor level is provided by dynamic threshold voltage MOSFET (DTMOS) logic. DTMOS logic uses transistors whose gates are tied to their substrates. As the substrate voltage follows the gate voltage, the threshold voltage of the device changes dynamically. In the ON state, the device threshold voltage drops, thus providing a much higher ON current than a standard MOSFET configuration. On the other hand, the behavior of a DTMOS transistor in the OFF state is similar to that of a regular MOSFET. However, the DTMOS configuration implies input capacitances significantly larger than those of a standard static CMOS gate. Additionally, DTMOS logic incurs higher energy consumption due to the unnecessary charging/discharging of the substrate for input transitions that do not change the output voltage of the gate.

This thesis develops a gate-level dynamic body biasing technique that overcomes the energy limits of DTMOS logic gates while also improving the gate switching speed. Logic gates designed with this dynamic body biasing technique exhibit input capacitances equal to those of the standard CMOS configuration. Moreover, when input signals switch without changing the logic gate state, the body capacitances are not charged/discharged as occurs in DTMOS logic gates, thus enabling considerable energy savings.

First, the gate-level body biasing technique was modeled and analytically justified. The inverter was adopted as the reference circuit to develop the main design guidelines for the body bias generator and the logic section of the gate. As an extension, logic gates with stacked transistors (e.g., NAND2 and NOR2) were also considered, obtaining good agreement between the predicted and the simulated results.

Next, a preliminary analysis performed on basic gates demonstrated that the speed boost provided by gate-level body biasing (GLBB) enables performance levels that are unattainable for both conventional CMOS and DTMOS configurations. Subsequently, the parasitic effects of body biasing were taken into account through post-layout simulations of a GLBB mirror full adder, which was compared against its conventional CMOS and DTMOS counterparts. The physical design of the compared circuits followed the design rules of the ST 45 nm bulk CMOS triple-well technology. Despite a larger area occupancy than conventional CMOS, comparative post-layout results have shown that the GLBB design style, at the same leakage power consumption, obtains significantly higher performance with a lower total energy per operation.

The ultra-thin body and buried oxide (UTBB) fully-depleted silicon-on-insulator (FD-SOI) technology is emerging as a valid platform to cope with ULV design bottlenecks in the more scaled technology nodes. The fully depleted channel of devices in this technology avoids the issue of random dopant fluctuation, thus reducing the impact of process variability. Moreover, the ultra-thin (<30 nm) buried oxide (BOX) guarantees good electrostatic control of the channel and allows body biasing to be applied more effectively than in bulk CMOS. The latter is a key feature of the UTBB FD-SOI technology, which enhances the benefits of the FBB technique for ULV designs in advanced technologies.

To reduce the area occupancy of the GLBB technique, several benchmarks were implemented with the GLBB technique in the 28 nm STM UTBB FD-SOI technology for ULV logic design. The unique capability offered by this technology to integrate PMOS and NMOS devices in a common-well configuration was exploited to achieve improvements in terms of both performance and area.

The efficiency of the GLBB technique for ULV design in UTBB FD-SOI is evaluated by considering three arithmetic benchmarks in ascending order of complexity. As a first benchmark, the GLBB mirror full adder (FA) was considered. As a second benchmark, n-bit ripple carry adders (RCAs) were designed with the evaluated techniques and analyzed under a wide range of process and temperature (PT) conditions. Under the TT/27°C condition, the DTMOS technique shows higher energy consumption, mainly due to the larger input capacitances of DTMOS gates. On the contrary, GLBB and CMOS designs exhibit very similar energy per worst-case operation (E<sub>W.C.O.</sub>) values, even for long chains of FAs. The GLBB designs always demonstrate better performance than their competitors. For example, at VDD = 0.4 V, advantages of 33% and 46% are achieved in terms of speed and energy when compared to CMOS and DTMOS designs, respectively.

As a third benchmark, a 4 x 4-bit Baugh Wooley multiplier was evaluated. At VDD = 0.3 V, the proposed approach leads to a delay reduction of about 30% with respect to a conventional static CMOS design. These results were obtained while maintaining similar energy consumption, at the sole expense of about 13% larger area. Compared with the DTMOS design, significantly better energy (about 39% savings) and area (34% reduction) are obtained. The above delay and energy benefits are maintained over a wide range of PVT variations.

## Sommario

Scaling the supply voltage (VDD) below the transistor threshold voltage (VTH) is one of the most effective approaches to obtaining low energy consumption, at the cost of a large performance reduction and a much higher sensitivity to process and temperature variations. Although acceptable for a niche market, the high delay and reduced robustness of conventional subthreshold CMOS circuits can be very limiting for a broader range of applications. To increase performance and robustness against process and temperature variations while maintaining high levels of energy efficiency, forward biasing of the transistor bulk/body (forward body biasing, FBB) can be adopted.

The FBB technique can be applied (also dynamically) at different levels of granularity, ranging from the macro-block level to the single-transistor level. Applying FBB at the macro-block level reduces the number of signals dedicated to body-voltage control, thus reducing routing complexity. On the other hand, it reduces the flexibility in controlling the threshold voltage of individual transistors, with a negative impact on energy consumption. Conversely, implementing FBB at the single-transistor level allows the body voltage of the MOSFETs to be managed with a finer granularity. In this way, it is possible to boost the performance of only those transistors that determine the critical path of the circuit.

An example of FBB applied at the single-transistor level is dynamic threshold voltage (DTMOS) logic. This logic uses transistors whose gate terminals are connected to the substrate. Consequently, the device threshold voltage changes dynamically as a function of the gate voltage, and hence of the substrate voltage. Therefore, in the ON state the threshold voltage decreases, providing a higher ON current than the standard CMOS configuration. On the other hand, the OFF-state behavior of DTMOS transistors is similar to that of the standard CMOS configuration. However, the DTMOS configuration causes a significant increase in input capacitance compared with a static CMOS gate. Moreover, DTMOS logic incurs higher energy consumption due to unnecessary charging/discharging of the substrate for input signals that do not change the output voltage of the gate.

This thesis proposes a dynamic substrate biasing technique applied at the logic-gate level (gate-level body biasing, GLBB) to reduce the energy consumption of DTMOS logic gates while guaranteeing a higher switching frequency. Implementing this technique yields input capacitances identical to those of standard CMOS logic gates. Moreover, when input transitions do not change the state of the logic gate, the substrate capacitances are not charged/discharged as happens in DTMOS logic, thus enabling considerable energy savings.

First, an analytical model was developed to validate the proposed technique. In this first phase, the inverter was adopted as the reference circuit to derive the main design guidelines for the body bias generator and the logic section of the gate. In addition, some logic gates with series-connected transistors (e.g., NAND2 and NOR2) were also analyzed, obtaining good agreement between the results predicted by the analytical model and those obtained from simulations.

Next, a preliminary analysis was carried out on basic logic gates to demonstrate that body biasing applied at the logic-gate level achieves higher performance than the standard CMOS and DTMOS configurations. Then, post-layout simulations of a mirror full adder designed with the GLBB technique were performed to include the parasitic effects of body biasing. The results of these simulations were compared with those obtained for the same circuit designed with the standard CMOS and DTMOS techniques. The compared circuits were designed in the ST 45-nm bulk CMOS triple-well technology. The comparative results showed that the GLBB design technique, at the same leakage power consumption, achieves a significant performance improvement with reduced energy consumption, at the cost of a larger area than conventional CMOS logic.

The ultra-thin body and buried oxide (UTBB) fully-depleted silicon-on-insulator (FD-SOI) technology is emerging as a valid solution for designing circuits operating at ultra-low voltage (ULV) in increasingly scaled technology nodes. The fully depleted channel of devices built in this technology eliminates the problem of random dopant fluctuation and thus reduces the impact of process variability. Furthermore, the reduced thickness of the buried oxide (<30 nm) ensures good electrostatic control of the channel and hence makes body biasing more effective than in conventional bulk CMOS technology. The latter is the key feature of the UTBB FD-SOI technology, which increases the benefits of the FBB technique in ULV circuits implemented in advanced technology nodes.

Several test circuits were implemented in the 28-nm STM UTBB FD-SOI technology to reduce the area occupancy due to the GLBB technique. Indeed, thanks to the unique ability of this technology to integrate PMOS and NMOS transistors in a common-well configuration, considerable improvements were obtained in terms of both performance and area occupancy.

The efficiency of the GLBB technique for ULV designs in UTBB FD-SOI technology was evaluated by considering three arithmetic test circuits in ascending order of complexity. The first test circuit was a mirror full adder. The second was an *n*-bit ripple carry adder (RCA), analyzed to study the impact of the different design techniques over a wide range of process and temperature conditions. Under TT/27°C conditions, the DTMOS technique showed high energy consumption, mainly due to the high input capacitances of DTMOS logic gates. On the contrary, the circuits designed with the GLBB and standard CMOS techniques exhibited similar energy consumption under worst-case operation, even for long chains of full adders. Moreover, the GLBB circuit always showed the best performance. For example, at VDD = 0.4 V, the GLBB circuit achieves advantages of 33% and 46% in terms of speed and energy over the standard CMOS and DTMOS circuits.

Finally, a 4 x 4-bit Baugh Wooley multiplier was analyzed as the third test circuit. At VDD = 0.3 V, the proposed approach led to a delay reduction of about 30% with respect to the standard CMOS circuit. These results were obtained while keeping the energy consumption unchanged, at the sole cost of a 13% area increase. Compared with DTMOS logic, energy savings of about 39% and an area reduction of 34% were obtained. The above delay and energy benefits are maintained over a wide range of PVT variations.


## 1 Introduction & Background

### 1.1 Toward High-Speed and Energy-Efficient Designs

The ever-increasing demand for portable devices offering enhanced productivity, a better user experience and higher multimedia quality drives innovation in digital circuits and systems. In the late 1990s, the first commercial phone weighed 16 ounces and offered half an hour of talk time. This GSM phone was equipped with a simple RISC processor running at 26 MHz and supported a primitive user interface [1]. After a steady increase in clock frequency to roughly 300 MHz in the early 2000s, there was a sudden spurt towards 1 GHz and beyond [2]. More than 10 years later, portable devices offer a wide set of computing capabilities (such as video processing, augmented reality, etc.). Figure 1.1 illustrates the major trends from 2004 to 2014 in smartphones and tablets relevant to digital circuits.



Figure 1.1: Application processor trends in smart phones from 2004 to 2014 [2]

Such impressive achievements have been obtained thanks to the miniaturization of integrated digital circuits. In 1965, Gordon Moore predicted that technology scaling would allow the number of components per chip at minimum cost to double every 12-24 months [3]. This trend has been followed by the semiconductor industry, as shown in Figure 1.2, increasing the transistor count per chip to millions.

Figure 1.2: Real scaling trends over the years 1975-2010 following Moore's law [3].

In general, when a CMOS technology is scaled to the next node, it improves (1) transistor and interconnect speed, (2) transistor density, and (3) switching energy [4]. Although portable devices have taken advantage of technology scaling to offer higher performance, in recent years, as technology nodes entered the deep-submicron era, leakage power consumption has emerged as a major issue.

To be more specific, traditional FET scaling consists of reducing both the supply voltage ( $V_{DD}$ ) and the threshold voltage ( $V_{TH}$ ) to meet performance and power requirements, but the growing impact of leakage currents and the increasing number of transistors on a single chip have placed limits on this scaling strategy.

At the circuit level, voltage scaling (with a fixed threshold voltage) has been demonstrated to be the most efficient solution for power-constrained applications [5]–[9]. In 1991, some digital signal processors already operated in the lower 3 V range in the 0.8 µm node: their designers realized that the 5 V specification could be relaxed for their performance requirements, so they operated at a lower voltage to dissipate less power [1].

Voltage scaling is certainly a very effective lever for reducing energy and power consumption at the expense of performance. As Equation (1.1) shows, leakage power depends linearly on  $V_{DD}$ :

$$P_{leak} = V_{DD}I_{leak}$$
(1.1)

Thus, decreasing  $V_{DD}$  to a level below  $V_{TH}$  reduces leakage power linearly, by a factor of 2.5x to 9x depending on the technology (for a constant  $I_{leak}$ ), as shown in Table 1.1.

| Subthreshold V<sub>DD</sub> (V) | 65 nm (nominal V<sub>DD</sub> = 1 V) | 0.13 μm (nominal V<sub>DD</sub> = 1.2 V) | 0.18 μm (nominal V<sub>DD</sub> = 1.8 V) |
|---------------------------------|--------------------------------------|------------------------------------------|------------------------------------------|
| 0.4                             | 2.5x                                 | 3x                                       | 4.5x                                     |
| 0.3                             | 3.3x                                 | 4x                                       | 6x                                       |
| 0.2                             | 5x                                   | 6x                                       | 9x                                       |

Table 1.1: Leakage power saving from  $V_{DD}$  reduction [10]

Additionally, the dynamic energy consumed by a digital circuit is given by the well-known Equation (1.2):

$$E_{active} = C_{eff} V_{DD}^2$$
(1.2)

Since the effective switched capacitance  $C_{eff}$  is unchanged, reducing  $V_{DD}$  yields quadratic energy savings. Scaling  $V_{DD}$  below  $V_{TH}$  reduces dynamic energy considerably, by a factor of about 6x to 81x, as shown in Table 1.2.

Table 1.2: Dynamic energy saving from  $V_{DD}$  reduction [10]

| Subthreshold V<sub>DD</sub> (V) | 65 nm (nominal V<sub>DD</sub> = 1 V) | 0.13 μm (nominal V<sub>DD</sub> = 1.2 V) | 0.18 µm (nominal V<sub>DD</sub> = 1.8 V) |
|---------------------------------|--------------------------------------|------------------------------------------|------------------------------------------|
| 0.4                             | 6.25x                                | 9x                                       | 20.25x                                   |
| 0.3                             | 11.1x                                | 16x                                      | 36x                                      |
| 0.2                             | 25x                                  | 36x                                      | 81x                                      |
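The saving factors in Tables 1.1 and 1.2 follow directly from Equations (1.1) and (1.2). The minimal Python sketch below reproduces them. It assumes, as the tables implicitly do, that $I_{leak}$ and $C_{eff}$ stay constant when $V_{DD}$ is scaled. In real devices $I_{leak}$ also depends on $V_{DD}$ (e.g., through DIBL), so these are first-order estimates only.

```python
# Nominal supply voltages (V) of the technology nodes in Tables 1.1 and 1.2
NOMINAL_VDD = {"65 nm": 1.0, "0.13 um": 1.2, "0.18 um": 1.8}
# Scaled (subthreshold) supply voltages considered in the tables (V)
SCALED_VDD = [0.4, 0.3, 0.2]


def leakage_saving(vdd_nominal: float, vdd_scaled: float) -> float:
    """Eq. (1.1): P_leak = V_DD * I_leak, assuming I_leak does not change."""
    return vdd_nominal / vdd_scaled


def dynamic_saving(vdd_nominal: float, vdd_scaled: float) -> float:
    """Eq. (1.2): E_active = C_eff * V_DD^2, assuming C_eff does not change."""
    return (vdd_nominal / vdd_scaled) ** 2


for vdd in SCALED_VDD:
    leak = [round(leakage_saving(v, vdd), 2) for v in NOMINAL_VDD.values()]
    dyn = [round(dynamic_saving(v, vdd), 2) for v in NOMINAL_VDD.values()]
    print(f"V_DD = {vdd} V -> leakage saving {leak}, dynamic saving {dyn}")
```

For $V_{DD} = 0.2$ V the dynamic savings come out as 25x, 36x and 81x, matching the last row of Table 1.2.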

Although subthreshold operation offers significant power and energy savings compared to strong-inversion operation, it suffers from a significant performance loss and a higher sensitivity to variations [11]–[24]. These limitations have confined subthreshold digital design to a narrow set of applications, while a broader market demands designs that are both high-performance and energy-efficient.

### 1.2 Ultra-low Voltage (ULV) digital design: Characteristics and Issues

As  $V_{DD}$  is reduced to minimize energy per operation, FETs make the transition from strong inversion with a large gate overdrive to subthreshold operation in weak inversion. The current of an NMOS transistor operating in the subthreshold regime has three main contributions, illustrated in Figure 1.3: (a) the subthreshold current  $I_{ST}$ , due to the diffusion of minority carriers between drain and source [25]; (b) the gate current  $I_G$ , due to tunneling through the gate dielectric; and (c) the junction current  $I_J$ , mainly due to band-to-band tunneling across the thin depletion regions [26]. Because of their stronger dependence on voltage,  $I_G$  and  $I_J$  tend to be much lower than  $I_{ST}$  at low voltages. Hence, the NMOS current in the subthreshold region is dominated by  $I_{ST}$ , as expressed in (1.3):

$$I_{D:subthreshold} \approx I_{ST} = I_0 \exp\left(\frac{V_{GS} - V_{TH}}{nV_T}\right)$$

(1.3)





Subthreshold operation differs from strong-inversion operation primarily because  $I_{ST}$  depends exponentially on the threshold voltage ( $V_{TH}$ ) and the gate-source voltage ( $V_{GS}$ ), whereas the on-current in strong inversion ( $I_{on-super}$ ) depends roughly linearly on  $V_{TH}$  and  $V_{GS}$ .
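To make the exponential dependence in (1.3) concrete, the Python sketch below computes how much $I_{ST}$ changes for a given threshold-voltage shift, together with the subthreshold swing (the voltage change that produces a 10x current change). The slope factor n = 1.5, the temperature of 300 K and the example voltages are illustrative assumptions, not values taken from this thesis.

```python
import math

BOLTZMANN = 1.380649e-23   # Boltzmann constant (J/K)
CHARGE = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k: float = 300.0) -> float:
    """Thermal voltage v_T = kT/q in volts."""
    return BOLTZMANN * temp_k / CHARGE


def subthreshold_current(i0, vgs, vth, n=1.5, vt=None):
    """Eq. (1.3): I_ST = I_0 * exp((V_GS - V_TH) / (n * v_T))."""
    vt = thermal_voltage() if vt is None else vt
    return i0 * math.exp((vgs - vth) / (n * vt))


def subthreshold_swing(n=1.5, vt=None):
    """Voltage change (V) that changes I_ST by one decade: n * v_T * ln(10)."""
    vt = thermal_voltage() if vt is None else vt
    return n * vt * math.log(10)


# A threshold-voltage shift of only -30 mV (e.g., caused by variations)
# roughly doubles the subthreshold current.
i_nom = subthreshold_current(1e-7, vgs=0.3, vth=0.40)
i_low = subthreshold_current(1e-7, vgs=0.3, vth=0.37)
print(f"Current ratio for a -30 mV V_TH shift: {i_low / i_nom:.2f}")
print(f"Subthreshold swing: {1e3 * subthreshold_swing():.1f} mV/decade")
```

The same exponential sensitivity is what makes subthreshold circuits so vulnerable to $V_{TH}$ variations, as discussed below.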

The exponential characteristic of  $I_{ST}$  drastically affects circuit behavior in several respects: exponential performance degradation, a large pMOS/nMOS imbalance, and a higher sensitivity to variations.

A first issue of circuits operating in the subthreshold region is that the delay increases exponentially, due to the dependence of the current on  $V_{TH}$  and  $V_{DD}$ . As observed in Figure 1.4, the normalized speed of the basic inverter shows two clear regions as  $V_{DD}$  is reduced: (1) in the strong-inversion region, the speed decreases only slightly with voltage; (2) in the near/subthreshold region, the speed decreases exponentially. This large performance decrease in the subthreshold regime severely limits the range of applications to those requiring medium or low speed.
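A first-order way to see why speed collapses is to model the gate delay as $t_d \approx K C V_{DD} / I_{on}$, with $I_{on}$ given by (1.3) at $V_{GS} = V_{DD}$. The Python sketch below uses this assumed delay model with a hypothetical $V_{TH} = 0.45$ V and n = 1.5 to compare two subthreshold supply voltages. It is only meaningful for $V_{DD}$ below $V_{TH}$.

```python
import math

N_SLOPE = 1.5    # subthreshold slope factor (assumed)
V_T = 0.02585    # thermal voltage at about 300 K (V)
V_TH = 0.45      # threshold voltage (V), hypothetical


def relative_delay(vdd: float, vth: float = V_TH) -> float:
    """Gate delay up to a constant factor: t_d ~ C * V_DD / I_on, with
    I_on = I_0 * exp((V_DD - V_TH) / (n * v_T)) from Eq. (1.3).
    C, K and I_0 cancel out when two supply voltages are compared."""
    i_on = math.exp((vdd - vth) / (N_SLOPE * V_T))
    return vdd / i_on


slowdown = relative_delay(0.3) / relative_delay(0.4)
print(f"Lowering V_DD from 0.4 V to 0.3 V slows the gate by about {slowdown:.1f}x")
```

Under these assumptions, every 100 mV of supply reduction costs roughly an order of magnitude in speed. This is the exponential region visible in Figure 1.4.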



Figure 1.4: Relative inverter speed across the whole power supply range

A second important issue in ULV digital design is the large imbalance between nMOS and pMOS strength. Symmetrical pMOS and nMOS currents ensure adequate noise margins and reasonably symmetric rise/fall transitions, and reduce the minimum operating voltage [27]. In subthreshold operation, the nMOS/pMOS imbalance is typically much larger than in above-threshold operation [28]–[30]. For example, in the specific case of a 65 nm technology, compensating the nMOS/pMOS imbalance requires increasing the pMOS strength by a factor of 7, at the expense of larger capacitances [31].

The large nMOS/pMOS imbalance in ULV operation has important consequences on the DC behavior. Figure 1.5 shows the equivalent model of an inverter for low and high input voltages. For a low input, the off transistor Mn is equivalent to a current source, whereas the on transistor Mp is equivalent to a resistance. From the resulting equivalent model, the output voltage  $V_{OH}$  suffers a voltage drop  $\Delta V_{OH}$  across Mp [1], [27]:

$$V_{OH} = V_{DD} - \Delta V_{OH}, \qquad \Delta V_{OH} = R_P I_n = v_t \frac{\beta_n}{\beta_p} e^{-V_{DD}/(n_n v_t)}$$
(1.4)

Similarly, for a high input voltage:

$$\Delta V_{OL} = v_t \frac{\beta_p}{\beta_n} e^{-V_{DD}/(n_n v_t)}$$
(1.5)

where  $\beta_n$  and  $\beta_p$  are the nMOS and pMOS strengths, respectively. From (1.4) and (1.5), the output levels degrade exponentially as  $V_{DD}$  is reduced, and their values depend on the nMOS/pMOS strength ratio. In other words, CMOS logic operating at ULV suffers a degradation of the output logic levels, and hence of the voltage swing [27].
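Equations (1.4) and (1.5) can be evaluated directly to see how quickly the output levels degrade. In the Python sketch below, the strength ratio $\beta_n/\beta_p = 7$ echoes the 65 nm example discussed above, while $n_n = 1.5$ and $v_t = 25.85$ mV are assumptions. The numbers are illustrative only.

```python
import math


def delta_v_oh(vdd, beta_n_over_p, n_n=1.5, vt=0.02585):
    """Eq. (1.4): drop of the high level, v_t*(b_n/b_p)*exp(-V_DD/(n_n*v_t))."""
    return vt * beta_n_over_p * math.exp(-vdd / (n_n * vt))


def delta_v_ol(vdd, beta_n_over_p, n_n=1.5, vt=0.02585):
    """Eq. (1.5): rise of the low level, v_t*(b_p/b_n)*exp(-V_DD/(n_n*v_t))."""
    return vt / beta_n_over_p * math.exp(-vdd / (n_n * vt))


RATIO = 7.0  # beta_n / beta_p: pMOS assumed about 7x weaker than the nMOS
for vdd in (0.3, 0.2, 0.15, 0.1):
    dvoh = delta_v_oh(vdd, RATIO)
    dvol = delta_v_ol(vdd, RATIO)
    swing = (vdd - dvoh) - dvol  # V_OH - V_OL
    print(f"V_DD={vdd:.2f} V: dV_OH={1e3 * dvoh:.3f} mV, "
          f"dV_OL={1e3 * dvol:.3f} mV, swing={swing:.4f} V")
```

The degradation grows exponentially as $V_{DD}$ decreases. The side driven by the weaker device (here $\Delta V_{OH}$, set by the pMOS) is $(\beta_n/\beta_p)^2 = 49$ times worse than the other.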

A side effect of voltage swing degradation is an increase in the leakage power consumption of the subsequent logic gate. To be more specific, a degradation  $\Delta V_{OL}$  of the output voltage causes an equal increase in the gate-source voltage of the off nMOS transistor in the next logic gate. According to (1.3), this increases its subthreshold leakage by a factor  $\exp(\Delta V_{OL}/(nV_T))$ . Thus, nMOS/pMOS balance should be achieved by increasing the strength of the weaker transistor (i.e., by upsizing it and/or reducing its  $V_{TH}$ ).



*Figure 1.5: a) Schematic of the inverter gate and equivalent representation for b) high input and c) low input voltages* [33]

Third, as shown by (1.3), the exponential dependence of  $I_{ST}$  on  $V_{TH}$  means that variations due to random dopant fluctuations have a much larger impact than in above-threshold operation. Thus, the previously discussed issues of subthreshold digital design become more critical when process (P) variations are taken into account [32]. For example, the nMOS/pMOS imbalance discussed above increases by about 2x when process variations are considered. The data obtained from Monte Carlo simulations in the 65 nm technology are plotted in Figure 1.6: the PDF has a mean value of  $\mu = 13.9$  and a standard deviation of  $\sigma = 25.6$ . Hence, the pMOS strength increase by a factor of 7 derived in the previous analysis underestimates the imbalance under process variations. This large underestimation is due to the large variability,  $\sigma/\mu = 185\%$  [33].



Figure 1.6: Probability density function of the nMOS/pMOS imbalance from 10,000 Monte Carlo simulations under intra-die and inter-die variations [33]

### 1.3 Minimum Energy per Operation

In ULV systems, a short task is performed repetitively with a given wake-up period  $T_{wkup}$ . Thus, as shown in Figure 1.7, power consumption can be drastically reduced through duty cycling. Duty-cycled systems contain two blocks: (1) a very simple always-on block that stores information and periodically triggers (2) a more complex block, which works in active mode for about 0.1%-1% of the period and in sleep mode for most of the time, depending on the application [33].



Figure 1.7: Operation of duty-cycled blocks in Ultra Low Power systems [33].

In duty-cycled ULV systems, the average power consumption is:

$$P_{avg} = P_{\text{always-on}} + P_{sleep} + \frac{E_{active}}{T_{wkup}}$$
(1.6)

where  $P_{\text{always-on}}$  is the average power consumed by the always-on blocks, while  $P_{sleep}$  and  $E_{active}/T_{wkup}$  are the average power consumed by the duty-cycled block in sleep and active mode, respectively. Since the duty-cycled block is far more complex than the always-on circuitry,  $P_{sleep}$  easily exceeds  $P_{\text{always-on}}$ . Moreover, in applications with a wake-up period of a fraction of a second (or less), the energy consumed by the duty-cycled blocks in active mode dominates, so that  $P_{avg} \approx E_{active}/T_{wkup}$ . This justifies the extensive research of the last decade on digital circuit design for minimum energy per operation. To be more specific, ULV designs should be optimized for low power consumption in the always-on block and for minimum energy per operation in the more complex duty-cycled block during active mode.
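The dominance of the active-mode term for short wake-up periods can be checked numerically with Equation (1.6). The Python sketch below uses a purely hypothetical power budget (10 nW always-on, 50 nW sleep, 100 nJ per active task). These values are not from this thesis and only serve to show the trend.

```python
def average_power(p_always_on, p_sleep, e_active, t_wkup):
    """Eq. (1.6): P_avg = P_always-on + P_sleep + E_active / T_wkup (SI units)."""
    return p_always_on + p_sleep + e_active / t_wkup


def active_share(p_always_on, p_sleep, e_active, t_wkup):
    """Fraction of P_avg due to the duty-cycled block in active mode."""
    active = e_active / t_wkup
    return active / average_power(p_always_on, p_sleep, e_active, t_wkup)


# Hypothetical budget: 10 nW always-on, 50 nW sleep, 100 nJ per active task
P_ON, P_SLEEP, E_ACT = 10e-9, 50e-9, 100e-9
for t_wkup in (0.01, 0.1, 1.0, 10.0, 100.0):
    p_avg = average_power(P_ON, P_SLEEP, E_ACT, t_wkup)
    share = active_share(P_ON, P_SLEEP, E_ACT, t_wkup)
    print(f"T_wkup={t_wkup:7.2f} s: P_avg={p_avg:.3e} W, active share={share:.1%}")
```

With these assumed numbers, the active-mode term accounts for over 90% of $P_{avg}$ at $T_{wkup} = 0.1$ s. Only for very long wake-up periods do the sleep and always-on terms take over.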

For large classes of circuits, minimum energy consumption occurs when the supply voltage is scaled below the device threshold voltage ( $V_{DD} < V_{TH}$ ) [1], [34]. In this region, energy consumption can be reduced by 20x compared to standard superthreshold ( $V_{DD} > V_{TH}$ ) operation, at the cost of circuit performance [35].

Figure 1.8 illustrates that, as  $V_{DD}$  is reduced, a minimum energy point is reached at a specific supply voltage  $V_{DD}$  for a given threshold voltage  $V_{TH}$ , due to the trade-off between static and dynamic energy.



Figure 1.8: Minimum Energy Operation Point for a fixed threshold voltage [35].

As shown in Figure 1.8, minimum energy consumption relies on a compromise between the dynamic ( $E_{DYN}$ ) and leakage ( $E_{LEAK}$ ) energies, expressed in (1.7) and (1.8) [36], [37], assuming a rail-to-rail swing ( $V_{GS} = V_{DD}$ ) and denoting by  $T_D$  the critical-path delay, i.e., the duration of one operation:

$$E_{DYN} = C_{eff} V_{DD}^2$$
(1.7)

$$E_{LEAK} = I_{leak} V_{DD} T_D$$
(1.8)

Thus, taking  $I_{ST}$  into account, the total energy as presented in [1] is:

$$E_{TOT} = E_{DYN} + E_{LEAK}$$
$$E_{TOT} = V_{DD}^2 \left( C_{eff} + W_{eff} K C_g L_{DP} \exp\left(\frac{-V_{DD}}{nv_t}\right) \right)$$
(1.9)
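The step from (1.7)-(1.8) to (1.9) can be made explicit. The following short derivation rests on three assumptions. First, the operation time is $T_D = L_{DP}\,t_{inv}$, with characteristic inverter delay $t_{inv} = K C_g V_{DD}/I_{on}$. Second, $I_{on}$ follows (1.3) with $V_{GS} = V_{DD}$. Third, $I_{leak} = W_{eff} I_0 e^{-V_{TH}/(n v_t)}$, i.e., (1.3) with $V_{GS} = 0$ and DIBL neglected. The derivation is:

$$
\begin{aligned}
E_{LEAK} &= I_{leak} V_{DD} T_D
 = W_{eff} I_0 e^{-V_{TH}/(n v_t)} \, V_{DD} \, L_{DP} \,
   \frac{K C_g V_{DD}}{I_0 e^{(V_{DD} - V_{TH})/(n v_t)}} \\
 &= V_{DD}^2 \, W_{eff} K C_g L_{DP} \, e^{-V_{DD}/(n v_t)}
\end{aligned}
$$

Adding $E_{DYN} = C_{eff} V_{DD}^2$ gives (1.9). Under these assumptions $V_{TH}$ cancels out of $E_{LEAK}$, so the minimum energy point is set by the balance between $C_{eff}$ and the lumped leakage term $W_{eff} K C_g L_{DP}$.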

where  $L_{DP}$  is the logic depth of the critical path, expressed in characteristic inverter delays;  $C_{eff}$  is the average effective switched capacitance of the entire circuit, including the average activity factor, short-circuit current, glitching effects, etc.;  $W_{eff}$  estimates the total width, normalized to the characteristic inverter, that consumes leakage current; and  $K$  and  $C_g$  are, respectively, a delay-fitting parameter and the output capacitance of the characteristic inverter.
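The trade-off captured by (1.9) can be explored with a brute-force search for the minimum energy point. In the Python sketch below, the lumped parameter `leak_coeff` stands for $W_{eff} K C_g L_{DP}$. Its value, together with $C_{eff}$, n = 1.5 and $v_t$ = 25.9 mV, is hypothetical and chosen only so that the optimum falls near 0.3 V. The search is bounded to a realistic $V_{DD}$ range because (1.9) loses physical meaning at very low voltages.

```python
import math

V_T = 0.0259  # thermal voltage at about 300 K (V)


def total_energy(vdd, c_eff, leak_coeff, n=1.5, vt=V_T):
    """Eq. (1.9): E_TOT = V_DD^2 * (C_eff + leak_coeff * exp(-V_DD / (n * v_t))).

    leak_coeff lumps the product W_eff * K * C_g * L_DP into one value (F).
    """
    return vdd ** 2 * (c_eff + leak_coeff * math.exp(-vdd / (n * vt)))


def minimum_energy_point(c_eff, leak_coeff, v_min=0.1, v_max=1.0, steps=901):
    """Grid search (1 mV steps by default) for the V_DD minimizing Eq. (1.9)."""
    grid = [v_min + i * (v_max - v_min) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: total_energy(v, c_eff, leak_coeff))


# Hypothetical, illustrative parameters (not taken from the thesis)
C_EFF = 10e-15          # effective switched capacitance per operation (F)
LEAK = 800 * C_EFF      # lumped W_eff*K*C_g*L_DP term (F)

v_opt = minimum_energy_point(C_EFF, LEAK)
print(f"Minimum energy point at V_DD ~ {v_opt:.3f} V")

# Higher activity (larger C_eff, i.e., more E_DYN) moves the optimum down;
# a larger leakage term (more E_LEAK) moves it up.
print(minimum_energy_point(2 * C_EFF, LEAK) < v_opt)
print(minimum_energy_point(C_EFF, 2 * LEAK) > v_opt)
```

Both comparison lines print `True`. This reproduces numerically the trend that the following paragraph and Figures 1.9 and 1.10 describe.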

Equation (1.9) indicates that the location of the minimum energy operation point depends not only on the circuit design but also on the operating scenario (workload and duty cycle), the environment and the temperature of the circuit. More specifically, any decrease in $E_{LEAK}$ or increase in $E_{DYN}$ pushes the optimum $V_{DD}$ to lower values; conversely, any increase in $E_{LEAK}$ or decrease in $E_{DYN}$ pushes the optimum $V_{DD}$ to higher values. Such changes can occur for a given circuit without altering its intrinsic attributes.

Figure 1.9 shows the impact of varying the activity factor on the energy characteristics. $E_{DYN}$ increases in proportion to the activity factor, because more capacitance is switched per operation; thus, the optimum supply voltage shifts toward lower values. Conversely, Figure 1.10 shows the impact of changing the duty cycle on the energy characteristics: a longer idle time spent for each operation increases the leakage energy consumption, and the minimum energy point moves to higher voltages.
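The two trends can be reproduced directly from (1.9). The minimal sketch below evaluates the normalized total energy on a 1 mV grid of $V_{DD}$ and locates its minimum. All numbers are hypothetical: `leak_coeff` is a lumped stand-in for $W_{eff} K C_g L_{DP}$, $n = 1.5$ and $T = 300$ K are assumed, a higher activity factor is modeled by doubling $C_{eff}$, and a longer idle time per operation is modeled (as an assumption, i.e. idle time scaling with the operation period) by doubling the leakage term.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge [V/K]


def total_energy(vdd, c_eff, leak_coeff, n=1.5, temp_k=300.0):
    """Normalized E_TOT of (1.9): V_DD^2 * (C_eff + leak_coeff * exp(-V_DD / (n * v_t))).

    leak_coeff is a hypothetical lumped stand-in for W_eff * K * C_g * L_DP.
    """
    v_t = K_B_OVER_Q * temp_k
    return vdd ** 2 * (c_eff + leak_coeff * math.exp(-vdd / (n * v_t)))


def min_energy_vdd(c_eff, leak_coeff, v_lo=0.15, v_hi=0.8, step=1e-3):
    """Grid search for the V_DD minimizing (1.9).

    The grid starts at 150 mV: near 2*n*v_t the leakage term V^2*exp(-V/(n*v_t))
    has a spurious maximum, and (1.9) is only meaningful well above a few v_t.
    """
    n_pts = int(round((v_hi - v_lo) / step)) + 1
    grid = [v_lo + i * step for i in range(n_pts)]
    return min(grid, key=lambda v: total_energy(v, c_eff, leak_coeff))


base = min_energy_vdd(c_eff=1.0, leak_coeff=800.0)
busier = min_energy_vdd(c_eff=2.0, leak_coeff=800.0)   # higher activity factor -> larger C_eff
idler = min_energy_vdd(c_eff=1.0, leak_coeff=1600.0)   # longer idle time -> larger leakage term (assumption)
print(f"optimum V_DD: base {base:.3f} V, busier {busier:.3f} V, idler {idler:.3f} V")
```

With these placeholder values the optimum moves down when $C_{eff}$ grows and up when the leakage term grows, matching the qualitative behavior of Figures 1.9 and 1.10.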



Figure 1.9: Energy versus $V_{DD}$ for varying workload in an 8-bit 8-tap FIR filter [1]



Figure 1.10: Energy versus $V_{DD}$ for varying duty cycle in an 8-bit 8-tap FIR filter [1]

#### 1.4 Body Biasing as an efficient knob in ULV operation

Even though energy has become increasingly important, attention has mostly focused on the high-performance tail of the energy-delay Pareto curve: designers have tried to minimize energy consumption while meeting high-performance frequency constraints. As demonstrated in [33], [38], [39], [40], body biasing is an effective knob for tuning transistor strength and for alleviating the issues raised by subthreshold operation, such as large performance degradation and sensitivity to variations.

For this analysis, an extended form of (1.3) that accounts for the transistor width and length is used, usually written as [25], [41]:

$$I \approx I_{ST} = I_0 \frac{W}{L} e^{\left(\frac{V_{GS} - V_{TH}}{nv_t}\right)} (1 - e^{(-V_{DS}/v_t)})$$
(1.10)

Where  $I_0$  is the technology-dependent subthreshold current extrapolated for  $V_{GS} = V_{TH}$ . In (1.10) the threshold voltage  $V_{TH}$  also depends on the drain-source voltage  $V_{DS}$  and bulk-source voltage  $V_{BS}$  through DIBL and body effect, respectively.

$$V_{TH} = V_{T0} - \lambda_{DS} V_{DS} - \lambda_{BS} V_{BS}$$
(1.11)

where $\lambda_{DS} > 0$ is the DIBL coefficient and $\lambda_{BS} > 0$ is the body coefficient [4]. For better insight, it is convenient to rewrite (1.10) and (1.11) according to [33]:

$$I = \beta e^{\left(\frac{V_{GS}}{nv_t}\right)} \left[ e^{\lambda_{DS} V_{DS}/nv_t} \left( 1 - e^{-V_{DS}/v_t} \right) \right]$$
(1.12)

$$\beta = I_0 \frac{W}{L} e^{-(V_{T0} - \lambda_{BS} V_{BS})/nv_t}$$
(1.13)

Where the exponential dependence on  $V_{GS}$  and  $V_{DS}$  is highlighted and all other terms related to the transistor strength are grouped in the parameter  $\beta$ .

From (1.13), the transistor strength can be tuned through three parameters: the aspect ratio W/L, the zero-bias threshold voltage $V_{T0}$, and the body-source voltage $V_{BS}$, which can be set statically or dynamically.

It is easy to observe that W/L has a linear impact on $\beta$ and an implicit impact on $V_{TH}$, which becomes significant for narrow or short channels. More specifically, an increase in W leads to a threshold voltage increase due to the reverse narrow channel effect (RNCE) [25]. Additionally, as shown by (1.9), larger transistors increase $C_{eff}$, raising the energy consumption. Theoretically, minimum energy circuits should therefore use minimum-sized transistors. As a result, W is not an effective knob to increase transistor strength.

Much more effective knobs to tune transistor strength are $V_{T0}$ and $V_{BS}$, thanks to the exponential dependence of $\beta$ on them. As an example, comparing two $V_{T0}$ flavors offered in a 65-nm technology, $low - V_{TH}$ transistors are about 18 times stronger than $std - V_{TH}$ transistors with the same sizing.

Similarly, body biasing has a significant impact on $\beta$: applying a forward body bias (FBB) of $V_{BS} = 300mV$ increases transistor strength by 2.3X compared to zero body bias ( $V_{BS} = 0 mV$ ) [33].
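Because $\beta$ in (1.13) depends exponentially on $V_{T0}$ and $V_{BS}$, the quoted strength ratios can be translated back into voltages. The sketch below does this under assumed values $n = 1.5$ and $T = 300$ K; the $\Delta V_{T0}$ and $\lambda_{BS}$ it computes are illustrative back-calculations, not values reported in [33].

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge [V/K]


def strength_ratio(delta_vt0=0.0, delta_vbs=0.0, lambda_bs=0.0, n=1.5, temp_k=300.0):
    """beta_new / beta_ref from (1.13).

    beta ∝ exp(-(V_T0 - lambda_BS * V_BS) / (n v_t)), so lowering V_T0 by delta_vt0
    and raising V_BS by delta_vbs multiplies beta by
    exp((delta_vt0 + lambda_bs * delta_vbs) / (n v_t)).
    """
    n_vt = n * K_B_OVER_Q * temp_k
    return math.exp((delta_vt0 + lambda_bs * delta_vbs) / n_vt)


def implied_delta_vt0(ratio, n=1.5, temp_k=300.0):
    """V_T0 difference (in V) producing a given strength ratio at equal W/L."""
    return n * K_B_OVER_Q * temp_k * math.log(ratio)


def implied_lambda_bs(ratio, delta_vbs, n=1.5, temp_k=300.0):
    """Body coefficient producing a given strength ratio for a V_BS step (in V)."""
    return n * K_B_OVER_Q * temp_k * math.log(ratio) / delta_vbs


dvt0 = implied_delta_vt0(18.0)       # low-V_TH vs std-V_TH ratio quoted in the text
lam = implied_lambda_bs(2.3, 0.3)    # 300 mV FBB ratio quoted from [33]
print(f"implied delta V_T0 = {dvt0 * 1e3:.0f} mV, implied lambda_BS = {lam:.3f}")
```

Under these assumptions an 18x strength gap corresponds to a threshold difference of roughly 110 mV, which shows why $V_{T0}$ and $V_{BS}$ are far stronger knobs than W.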

It should be mentioned that FBB can be applied dynamically to satisfy on-demand application requirements. Owing to its strong impact on transistor strength and to the freedom to manage $V_{BS}$ at different levels of abstraction (e.g., gate level, circuit level), $V_{BS}$ is an efficient knob to achieve minimum energy consumption over a wide range of operating scenarios. This principle is illustrated in Figure 1.11. Since the location of the minimum energy operation point is set by the trade-off between $E_{LEAK}$ and $E_{DYN}$, $V_{DD}$ and $V_{BS}$ can be used to tune performance at run time in the most energy-efficient way. At point A ( $E_{LEAK} \gg E_{DYN}$, low activity), performance should be increased by raising $V_{DD}$, since this yields an exponential performance increase together with an energy decrease. At point B ( $E_{DYN} \gg E_{LEAK}$ ), performance should instead be increased through body biasing, since this yields an exponential performance increase at the cost of a fairly small energy increase.

In conclusion, the most powerful knobs to tune performance at run time are the supply voltage $V_{DD}$ and the body bias voltage $V_{BS}$. The guidelines discussed above indicate that dynamic body biasing should be used for circuits operating under timing constraints.



Figure 1.11: Energy per cycle under timing constraints shows  $V_{BS}$  and  $V_{DD}$  as powerful knobs to tune the performance and energy at run time

#### 1.5 Logic family exploiting Dynamic Body Biasing in ULV

Body biasing can be applied (also dynamically) at different levels of granularity, ranging from the macro-block level to the transistor level. The key rationale for applying such a technique at the macro-block level is to amortize the silicon area and the body control signal routing complexity of a finer-grained implementation. As a drawback, when $V_{TH}$ is reduced at the block level to compensate for variations and/or to provide a temporary speed boost, leakage power increases for all the gates in the block, while the speed-up would be needed only on timing-critical gates. Better energy-delay trade-offs can be obtained by reducing the body-bias control granularity, at the expense of larger silicon area occupancy [42]–[44].

Dynamic Threshold MOSFET (DTMOS) is a logic family introduced in [19] that exploits dynamic body biasing at the gate level without additional control circuitry. In this logic family the body of each transistor is tied to its gate, so the substrate voltage follows the gate voltage and the $V_{TH}$ of the device changes dynamically. When the device is turned ON, its threshold voltage drops, allowing a much higher ON current than in a standard MOSFET. In the OFF state ( $V_{in} = 0$ for the nMOS, $V_{in} = V_{DD}$ for the pMOS), the characteristics are exactly the same as those of a regular MOS transistor. In the ON state, instead, $V_{BS}$ provides FBB and thus reduces the $V_{TH}$ of the DTMOS transistor. The subthreshold slope of DTMOS improves and approaches the ideal 60 mV/decade, which makes DTMOS more efficient in subthreshold logic circuits in obtaining higher gain [19]. The resulting higher on-current, compared to conventional CMOS, allows driving more transistors or achieving faster transitions and better robustness.
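The slope improvement follows from (1.12)-(1.13): tying the body to the gate ( $V_{BS} = V_{GS}$ ) makes $\beta$ itself grow with $V_{GS}$, so the current rises as $e^{(1+\lambda_{BS})V_{GS}/nv_t}$ and the swing shrinks by $1 + \lambda_{BS}$. The sketch below assumes the long-channel relation $\lambda_{BS} \approx n - 1$ between body coefficient and slope factor, together with $n = 1.5$ and $T = 300$ K; under that assumption the DTMOS swing collapses to $v_t \ln 10$.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge [V/K]


def subthreshold_swing(n, lambda_bs=0.0, dtmos=False, temp_k=300.0):
    """Subthreshold swing in V/decade from (1.12)-(1.13).

    Conventional device: V_BS fixed, I ∝ exp(V_GS / (n v_t)) -> S = n v_t ln(10).
    DTMOS: body tied to gate (V_BS = V_GS), I ∝ exp((1 + lambda_bs) V_GS / (n v_t))
    -> S = n v_t ln(10) / (1 + lambda_bs).
    """
    v_t = K_B_OVER_Q * temp_k
    gain = 1.0 + lambda_bs if dtmos else 1.0
    return n * v_t * math.log(10.0) / gain


n = 1.5
s_std = subthreshold_swing(n)
# Assumption: long-channel body coefficient lambda_BS ≈ n - 1.
s_dt = subthreshold_swing(n, lambda_bs=n - 1.0, dtmos=True)
print(f"standard: {s_std * 1e3:.1f} mV/dec, DTMOS: {s_dt * 1e3:.1f} mV/dec")
```

The DTMOS value lands at about 59.5 mV/decade at 300 K, i.e. the ideal limit quoted in the text, while the conventional device stays at $n$ times that value.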

DC characteristics of DTMOS and conventional CMOS are presented in Figure 1.13. Both show very good noise margins; however, owing to its higher drive current, DTMOS logic can sustain a higher fan-out, and therefore larger and more complex gates can be implemented without sacrificing performance, as presented in Figure 1.14.



Figure 1.13: Voltage Transfer Curve of conventional CMOS and DTMOS logic families





Figure 1.14: Delay versus fan-out comparison for conventional CMOS and DTMOS logic families [19]

A drawback of DTMOS is that the forward body bias must be kept below 0.6 V to avoid forward biasing the parasitic PN junction diode; therefore, operation at higher supply voltages (strong inversion) is possible only by adding limiter transistors, which increase energy consumption. As an additional drawback, the large body capacitance and resistance [45] of the devices introduce an additional RC delay in charging the substrate and the input nodes of DTMOS logic gates [46]. Moreover, the substrate bias voltage of DTMOS logic gates also changes when input transitions do not cause output switching. This charges and discharges the large body capacitances, thus wasting precious dynamic energy [47]. All the above effects can erode the expected advantages of DTMOS circuits.

#### 1.6 Purpose of this work

This thesis focuses on the design of energy-efficient circuits exploiting the Gate-Level Body Biasing (GLBB) technique, which has been proposed as an effective solution to increase speed at the expense of a very small energy increase. The first chapter introduces basic concepts and design issues related to circuits operating in the ultra-low voltage regime. The second chapter presents an accurate model of the technique, together with important design guidelines validated through Cadence Spectre simulations in a 45-nm bulk CMOS triple-well technology [48]. The third chapter addresses more complex designs, taking into account the physical limitations of the technique implemented in bulk CMOS triple-well technology [49], [50]. In the fourth chapter, after a brief introduction to the UTBB FD-SOI technology and a discussion of its superior body-biasing efficiency compared to bulk CMOS technologies, we propose a single-well configuration, allowed by the technology, to significantly reduce the area penalty of low-granularity body-biasing voltage control [51]. Finally, the fifth chapter demonstrates the improved performance and energy characteristics of the GLBB technique by comparing several benchmarks (from basic gates to a Baugh-Wooley multiplier) with their conventional CMOS and DTMOS counterparts over a wide range of PVT variations [52], [53].


## 2 Analytical Modeling for Dynamic Gate – Level Body Biased Logic Circuits

In this chapter, the gate-level body biasing (GLBB) design technique, which overcomes the limitations of the DTMOS logic family, is explained and theoretically justified through accurate closed-form analytical modeling. An inverter is first adopted as the reference circuit, and its main static and dynamic behaviors are modeled to provide important guidelines for designing efficient digital circuits operating at very low voltage. The modeling and design criteria derived for the inverter are then extended to more complex logic gates with transistor stacks. The theoretical analysis and the design considerations have been validated by comparing the model predictions with Cadence Spectre simulations performed on different process corners and at different temperatures in a commercial 45-nm CMOS technology. The good agreement between predicted and simulated results makes the proposed model a valuable support during the circuit design phase.


In the first chapter, FBB was shown to be an energy-efficient knob to speed up a circuit and reduce the impact of variations. Furthermore, DTMOS was analyzed as a logic family that exploits the body-biasing knob at the gate level. The main drawbacks of the DTMOS configuration are: (1) reduced intrinsic speed advantages, due to input capacitances larger than those of standard CMOS, and (2) unnecessary charging and discharging of the large body capacitances whenever the inputs switch, which wastes precious dynamic energy [47].

In this chapter, an energy-efficient dynamic gate-level body biasing (GLBB) technique [47] that overcomes the speed and energy limits of DTMOS logic gates is presented. The proposed GLBB technique exhibits input capacitances equal to those of the standard CMOS configuration. Moreover, when input signals switch without changing the logic gate status, the body capacitances are not charged/discharged as occurs in DTMOS logic gates, thus saving considerable energy with respect to a DTMOS design.

#### 2.1 GLBB Operating Principle

As shown in Figure 2.1 (a), the generic logic gate designed according to the suggested approach consists of two sub-circuits: the logic sub-circuit, which is responsible for the logical functionality of the gate, and the body biasing generator (BBG), which manages the body voltage ( $V_B$ ) of both the pull-up and the pull-down networks. The BBG is a simple push-pull amplifier, which acts as a voltage follower for the output voltage $V_{OUT}$ while decoupling the large body capacitances from the output node. In Figure 2.1 (b-c), the transient behavior of the input voltage ( $V_{IN}$ ), the output voltage ( $V_{OUT}$ ) and the body voltage ( $V_B$ ) is reported for the falling and rising output transitions, respectively.

When $V_{OUT}$ is equal to $V_{DD}$ (0V), the BBG transfers a high (low) voltage onto the $V_B$ net, thus preparing the pull-down (pull-up) network for faster switching. Since the MOSFETs of the switching network (either pull-up or pull-down) are already forward body biased before the gate inputs arrive, the output transition is strongly favored by a switching current significantly higher than in the conventional body biasing scheme.

Speed and energy advantages also exist with respect to a DTMOS configuration [47]. In fact, the transition of the input signals is not slowed down by body capacitive effects as occurs in DTMOS gates, while the high capacitive load seen by the BBG does not constitute a speed bottleneck, since the $V_B$ voltage is always established well before the inputs switch. On the contrary, as the BBG waveforms in Figure 2.1 (b-c) show, the logic sub-circuit of the proposed scheme benefits from the large body capacitances: they slow down the body-voltage transition, so the forward bias is retained while the output switches, which makes the output transition faster. Additionally, when input signals switch without changing the gate output voltage, the BBG does not waste energy charging/discharging the body capacitances.

Due to the FBB effects and the additional BBG circuitry, logic gates designed as proposed here exhibit higher leakage current than their conventional static CMOS counterparts.




Figure 2.1: Logic gate with gate level dynamic body biasing (a) and transient behavior for falling (b) and rising (c) output voltage

Figure 2.2 depicts the leakage current $(I_{leak})$ versus delay curves of NAND2 (a) and NOR2 (b) logic gates for the conventional CMOS, DTMOS and GLBB implementations. At parity of W, the GLBB technique shows a higher leakage current than the other competitors. This means that, among the evaluated choices, the GLBB style is the least suitable when the minimization of static power is the main goal. On the contrary, if speed is the main design aim, the GLBB style becomes the most reasonable choice: the boosting action of the BBG allows the delay target to be reached with smaller transistors and, hence, higher performance at parity of leakage power consumption. Moreover, the GLBB technique allows performance ranges that are unaffordable for both CMOS and DTMOS configurations.



Figure 2.2: Leakage current-delay plots for NAND2 (a) and NOR2 (b) logic gates

#### 2.2 Analytical Model

In the following, analytical models for leakage current and delay of the inverter gate, designed according to the suggested technique, are derived. The developed models are then validated by comparing the predicted results with Spectre simulations performed for the 45-nm ST CMOS Low Power technology. Moreover, the theoretical analysis is also exploited to define proper design guidelines (both for the BBG and the logic subcircuits) with the main aim to obtain fast and power efficient subthreshold logic circuits. In particular, since reducing power consumption is a main concern in subthreshold design, the obtained design guidelines were extracted by comparing the proposed approach with the conventional static CMOS style, which represents a good solution in terms of leakage power.

##### 2.2.1 Leakage Current Analysis

Figure 2.3 illustrates the DC transfer function of the BBG sub-circuit. Note that, in the steady state, the BBG output voltage differs from $V_{DD}$ and 0V, because an nMOS (pMOS) device is used to charge (discharge) the BBG output node (see Figure 2.1(a)). In the following, $V_H$ and $V_L$ indicate the voltages transferred by the BBG when $V_{IN,BBG} = V_{OUT} = V_{DD}$ and $V_{IN,BBG} = V_{OUT} = 0$, respectively.



Figure 2.3: DC transfer function of the BBG

To estimate the leakage current, an analytical expression of the BBG output voltage is initially derived. The starting point is the following pair of subthreshold drain current equations for nMOS and pMOS transistors, respectively [25]:



(2.2)

where $\beta_N$ ( $\beta_P$ ) is the subthreshold current factor of the nMOS (pMOS) transistor, $\mu_N$ ( $\mu_P$ ) is the electron (hole) mobility, $C_{OX}$ is the oxide capacitance per unit area, W is the channel width, L is the channel length, $V_T = kT/q$ (with k the Boltzmann constant, T the absolute temperature and q the elementary charge) is the thermal voltage, $n_N$ ( $n_P$ ) is the subthreshold slope factor of the nMOS (pMOS) transistor, $V_{GS,N}$ ( $V_{SG,P}$ ) is the gate-to-source (source-to-gate) voltage of the nMOS (pMOS) transistor, $V_{DS,N}$ ( $V_{SD,P}$ ) is the drain-to-source (source-to-drain) voltage of the nMOS (pMOS) transistor, and $V_{TH,N}$ ( $V_{TH,P}$ ) is the threshold voltage of the nMOS (pMOS) transistor.

In (2.1), $V_{TH,N}$ depends on $V_{DS,N}$ through the drain induced barrier lowering (DIBL) effect and on the body-to-source voltage ( $V_{BS,N}$ ) through the body effect. Similarly, $|V_{TH,P}|$ in (2.2) depends on the source-to-drain voltage $V_{SD,P}$ and on the source-to-body voltage $V_{SB,P}$. This is expressed by [1]:

$$V_{TH,N} = V_{TH,N}^{0} - \lambda_{D}^{N} V_{DS,N} - \lambda_{B}^{N} V_{BS,N}$$
(2.3)

$$\left|V_{TH,P}\right| = \left|V_{TH,P}^{0}\right| - \lambda_{D}^{P} V_{SD,P} - \lambda_{B}^{P} V_{SB,P}$$
(2.4)

where $V_{TH,N}^{0}$ ( $V_{TH,P}^{0}$ ) is the zero-bias threshold voltage of the nMOS (pMOS) transistor, and $\lambda_{D}^{N}$ ( $\lambda_{D}^{P}$ ) and $\lambda_{B}^{N}$ ( $\lambda_{B}^{P}$ ) are the DIBL and body effect coefficients of the nMOS (pMOS) device, respectively. The subthreshold drain current of the nMOS transistor in the BBG circuit is obtained by substituting $V_{GS,N} = V_{OUT} - V_B$, $V_{DS,N} = V_{DD} - V_B$ and $V_{BS,N} = -V_B$ in (2.1) and (2.3). Similarly, the drain current of the pMOS transistor is obtained by substituting $V_{SG,P} = V_B - V_{OUT}$, $V_{SD,P} = V_B$ and $V_{SB,P} = V_B - V_{DD}$ in (2.2) and (2.4). By neglecting the contribution of the term in square brackets in (2.1) and (2.2) and equating the nMOS and pMOS currents, the following expression holds for the BBG output voltage:

$$V_B \approx \frac{V_{OUT}\left(1 + \frac{n_N}{n_P}\right) - V_{TH,N}^0 + V_{DD}\left(\lambda_D^N + \lambda_B^P\right) + \left|V_{TH,P}^0\right| - n_N V_T \ln\left(\frac{\beta_P}{\beta_N}\right)}{1 + \frac{n_N}{n_P} + \frac{n_N}{n_P} \lambda_D^P + \lambda_D^N + \lambda_B^N + \lambda_B^P}$$

(2.5)
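Equation (2.5) can be cross-checked numerically. The minimal sketch below evaluates it for $V_H$ ( $V_{OUT} = V_{DD}$ ) and $V_L$ ( $V_{OUT} = 0$ ), and independently solves the nMOS/pMOS current balance by bisection, neglecting the bracketed terms as in the text. All parameter values are hypothetical placeholders, not the 45-nm design-kit values; the check uses $n_N = n_P$, for which (2.5) coincides exactly with the root of the current balance.

```python
import math


def v_b_closed_form(v_out, p):
    """BBG output voltage from the closed-form expression (2.5)."""
    r = p["n_N"] / p["n_P"]
    num = (v_out * (1 + r) - p["V0_N"] + p["V_DD"] * (p["lamD_N"] + p["lamB_P"])
           + p["V0_P"] - p["n_N"] * p["V_T"] * math.log(p["beta_P"] / p["beta_N"]))
    den = 1 + r + r * p["lamD_P"] + p["lamD_N"] + p["lamB_N"] + p["lamB_P"]
    return num / den


def log_current_mismatch(v_b, v_out, p):
    """ln(I_N) - ln(I_P) for the BBG devices (bracketed V_DS terms neglected).

    nMOS: V_GS = V_OUT - V_B, V_DS = V_DD - V_B, V_BS = -V_B.
    pMOS: V_SG = V_B - V_OUT, V_SD = V_B, V_SB = V_B - V_DD.
    """
    vdd, vt = p["V_DD"], p["V_T"]
    vth_n = p["V0_N"] - p["lamD_N"] * (vdd - v_b) - p["lamB_N"] * (-v_b)
    vth_p = p["V0_P"] - p["lamD_P"] * v_b - p["lamB_P"] * (v_b - vdd)  # |V_TH,P|
    ln_in = math.log(p["beta_N"]) + (v_out - v_b - vth_n) / (p["n_N"] * vt)
    ln_ip = math.log(p["beta_P"]) + (v_b - v_out - vth_p) / (p["n_P"] * vt)
    return ln_in - ln_ip


def v_b_bisection(v_out, p, lo=-0.5, hi=1.0, iters=100):
    """Solve I_N = I_P numerically; the mismatch decreases monotonically with V_B."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_current_mismatch(mid, v_out, p) > 0:
            lo = mid  # nMOS still stronger: the node settles higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters; V0_P stands for |V_TH,P^0|, beta values are normalized.
params = dict(V_DD=0.3, V_T=0.02585, n_N=1.4, n_P=1.4,
              V0_N=0.45, V0_P=0.45, lamD_N=0.1, lamD_P=0.1,
              lamB_N=0.15, lamB_P=0.15, beta_N=1.0, beta_P=0.5)
v_h = v_b_closed_form(params["V_DD"], params)
v_l = v_b_closed_form(0.0, params)
print(f"V_H = {v_h * 1e3:.1f} mV, V_L = {v_l * 1e3:.1f} mV")
```

With unequal slope factors the two results differ slightly, which is worth keeping in mind when (2.5) is applied to devices with markedly different $n_N$ and $n_P$.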

From (2.5), both $V_H$ and $V_L$ can be easily evaluated by substituting $V_{OUT} = V_{DD}$ and $V_{OUT} = 0V$, respectively. Figure 2.4 shows the equivalent circuits used to evaluate the leakage current of the whole gate (logic sub-circuit + BBG) when $V_{IN}$ is low (Figure 2.4(a)) and high (Figure 2.4(b)), respectively. Taking into account that for $V_{IN} = 0V$ ( $V_{OUT} = V_{IN,BBG} = V_{DD}$ ) $V_B$ is equal to $V_H$, while for $V_{IN} = V_{DD}$ ( $V_{OUT} = V_{IN,BBG} = 0V$ ) $V_B$ is equal to $V_L$, the leakage current of the proposed inverter gate can be expressed as (N(BBG) and P(BBG) indicate the nMOS and pMOS in the BBG, respectively):

$$I_{leak}^{INV+BBG}(V_{IN}=0) = \beta_{N}e^{-\frac{V_{TH,N}^{0}-\lambda_{D}^{N}V_{DD}-\lambda_{B}^{N}V_{H}}{n_{N}V_{T}}} + \beta_{P(BBG)}e^{\frac{V_{H}-V_{DD}-|V_{TH,P(BBG)}^{0}|+\lambda_{D}^{P(BBG)}V_{H}+\lambda_{B}^{P(BBG)}(V_{H}-V_{DD})}{n_{P}V_{T}}},$$
(2.6)

$$I_{leak}^{INV+BBG}(V_{IN}=V_{DD}) = \beta_{P}e^{-\frac{|V_{TH,P}^{0}|-\lambda_{D}^{P}V_{DD}-\lambda_{B}^{P}(V_{DD}-V_{L})}{n_{P}V_{T}}} + \beta_{N(BBG)}e^{-\frac{V_{L}+V_{TH,N(BBG)}^{0}-\lambda_{D}^{N(BBG)}(V_{DD}-V_{L})+\lambda_{B}^{N(BBG)}V_{L}}{n_{N}V_{T}}}.$$
(2.7)

It is easy to verify that (2.6) and (2.7) are obtained from the subthreshold drain current expressions (2.1)-(2.4), where the terms $1 - e^{-V_{DS,N}/n_NV_T}$ and $1 - e^{-V_{SD,P}/n_PV_T}$ are neglected since both $V_{DS,N}$ and $V_{SD,P}$ are greater than $4V_T$ [33]. The first term in (2.6) and (2.7) is related to the leakage current in the logic section of the gate, whereas the second additive term captures the static current flowing through the BBG. From (2.6) and (2.7), it is clear that the proposed technique incurs leakage penalties with respect to the conventional CMOS approach, not only because of the FBB of the OFF transistors in the logic section but also because of the additional static current flowing through the BBG. In the following, this leakage current penalty is quantified.




Figure 2.4: Equivalent circuits for leakage analysis when  $V_{IN}$  is low (a) and high (b), respectively.

For a conventional static CMOS inverter, the leakage current in the steady state can be expressed as [3, 4]:

$$I_{leak}^{INV}(V_{IN} = 0) = \beta_{N}e^{-\frac{V_{TH,N}^{0} - \lambda_{D}^{N}V_{DD}}{n_{N}V_{T}}},$$
(2.8)

$$I_{leak}^{INV}(V_{IN} = V_{DD}) = \beta_{P}e^{-\frac{|V_{TH,P}^{0}| - \lambda_{D}^{P}V_{DD}}{n_{P}V_{T}}}$$
(2.9)

It is worth noting that (2.8) and (2.9) can also be obtained from (2.6) and (2.7) by simply removing the impact of the BBG on the leakage current (second term in (2.6) and (2.7)) and imposing $V_H = 0V$ and $V_L = V_{DD}$ ( $V_{BS,N} = V_{SB,P} = 0$ ). Hence, when the nMOS is the leaky transistor, the leakage penalty is obtained as the ratio between (2.6) and (2.8), while, when the pMOS is the leaky transistor, it is obtained as the ratio between (2.7) and (2.9):



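$$\frac{I_{leak}^{INV+BBG}(V_{IN}=0)}{I_{leak}^{INV}(V_{IN}=0)} = e^{\frac{\lambda_{B}^{N}V_{H}}{n_{N}V_{T}}} + \frac{\beta_{P(BBG)}}{\beta_{N}}\, e^{\frac{V_{H}-V_{DD}-|V_{TH,P(BBG)}^{0}|+\lambda_{D}^{P(BBG)}V_{H}+\lambda_{B}^{P(BBG)}(V_{H}-V_{DD})}{n_{P}V_{T}} + \frac{V_{TH,N}^{0}-\lambda_{D}^{N}V_{DD}}{n_{N}V_{T}}}$$
(2.10)

$$\frac{I_{leak}^{INV+BBG}(V_{IN}=V_{DD})}{I_{leak}^{INV}(V_{IN}=V_{DD})} = e^{\frac{\lambda_{B}^{P}(V_{DD}-V_{L})}{n_{P}V_{T}}} + \frac{\beta_{N(BBG)}}{\beta_{P}}\, e^{-\frac{V_{L}+V_{TH,N(BBG)}^{0}-\lambda_{D}^{N(BBG)}(V_{DD}-V_{L})+\lambda_{B}^{N(BBG)}V_{L}}{n_{N}V_{T}} + \frac{|V_{TH,P}^{0}|-\lambda_{D}^{P}V_{DD}}{n_{P}V_{T}}}$$
(2.11)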

From (2.10) and (2.11), the leakage penalty consists of the sum of two different contributions. The first term isolates the effect of the FBB provided by BBG, which causes the threshold voltage reduction of leaky devices belonging to the logical section of the gate. The second term provides the increment of the leakage current due to the additional push-pull amplifier, which is also dependent on the body voltage of devices belonging to the logical section of the gate. Moreover, such a term depends linearly on the ratio between the current factors of the leaky transistor in the logic section and the dual device in the BBG. According to (2.10) and (2.11) the impact of the proposed technique on leakage current can be reduced: 1) by lowering  $V_H$  and increasing  $V_L$ ; 2) by choosing an aspect ratio for the BBG transistors much lower than the aspect ratio of the transistors used in the logic sub-circuit; 3) by using higher threshold voltage transistors for the MOSFETs in the BBG. All the conclusions here obtained can be easily extended to a generic logic gate.
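The three guidelines can be checked on the $V_{IN} = 0$ penalty, i.e. the ratio between (2.6) and (2.8), split into its FBB and BBG contributions. This is a sketch with hypothetical parameter values: it only confirms the direction of each guideline, not the magnitudes obtained in the 45-nm technology.

```python
import math


def leakage_penalty_vin0(p):
    """Ratio between (2.6) and (2.8): GLBB-inverter over static-CMOS leakage for V_IN = 0.

    Returns (fbb_term, bbg_term); the leakage penalty is their sum.
    """
    nvt_n = p["n_N"] * p["V_T"]
    nvt_p = p["n_P"] * p["V_T"]
    vdd, vh = p["V_DD"], p["V_H"]
    # FBB of the OFF nMOS in the logic section: only lambda_B^N * V_H survives the ratio.
    fbb_term = math.exp(p["lamB_N"] * vh / nvt_n)
    # Static current of the BBG pMOS, normalized to the static CMOS nMOS leakage (2.8).
    bbg_exponent = ((vh - vdd - p["V0_P_BBG"] + p["lamD_P_BBG"] * vh
                     + p["lamB_P_BBG"] * (vh - vdd)) / nvt_p
                    + (p["V0_N"] - p["lamD_N"] * vdd) / nvt_n)
    bbg_term = p["beta_ratio"] * math.exp(bbg_exponent)  # beta_ratio = beta_P(BBG) / beta_N
    return fbb_term, bbg_term


def penalty(p):
    return sum(leakage_penalty_vin0(p))


ref_params = dict(V_DD=0.3, V_T=0.02585, n_N=1.4, n_P=1.4, V_H=0.26,
                  V0_N=0.45, lamD_N=0.1, lamB_N=0.15,             # logic nMOS (hypothetical)
                  V0_P_BBG=0.55, lamD_P_BBG=0.1, lamB_P_BBG=0.15,  # BBG pMOS |V_TH0| (hypothetical)
                  beta_ratio=0.2)

ref = penalty(ref_params)
higher_vt_bbg = penalty(dict(ref_params, V0_P_BBG=0.65))  # guideline 3: higher-V_TH BBG devices
smaller_bbg = penalty(dict(ref_params, beta_ratio=0.05))  # guideline 2: smaller BBG aspect ratio
lower_vh = penalty(dict(ref_params, V_H=0.20))            # guideline 1: lower V_H
print(f"penalty: ref {ref:.3f}, HVT BBG {higher_vt_bbg:.3f}, "
      f"small BBG {smaller_bbg:.3f}, lower V_H {lower_vh:.3f}")
```

Note that lowering $V_H$ reduces both contributions, whereas the BBG sizing and threshold choices act only on the second one, which is why they can be used without sacrificing the FBB speed boost.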

##### 2.2.2 Delay analysis

Without loss of generality, we initially consider that a  $0 \rightarrow 1$  step input transition occurs. During the  $1\rightarrow 0$  output switching, the nMOS transistor is turned on and the inverter gate can be schematized through the equivalent circuit shown in Figure 2.5 (a). Here,  $R_N$  represents the effective resistance of the nMOS, forward body biased by the BBG sub-circuit, and  $C_{TOT}$  is the overall capacitance on the output node which is given by the sum of the load capacitance  $C_{LOAD}$ , the input capacitance  $C_{IN,BBG}$  of the additional BBG circuit and the internal subthreshold capacitance [54]  $C_{INT}$  of the logic sub-circuit. Modeling the output discharging as shown in Figure 2.5 (a), the H to L delay can be expressed as [55]:

$$\tau_{HL} = \ln(2)C_{TOT}R_N,$$

(2.12)

where  $R_N$  is evaluated considering the  $V_{OUT}$  transition from  $V_{DD}$  to  $V_{DD}/2$  [55]:

$$R_N = \frac{2}{V_{DD}} \int_{V_{DD}/2}^{V_{DD}} \frac{V_{OUT}}{I_N} dV_{OUT}$$

(2.13)

where $V_{OUT} = V_{DS,N}$ and $I_N$ are the drain-to-source voltage and the drain current of transistor $M_N$, respectively. The drain current can be evaluated by using (2.1) and (2.3) and imposing $V_{GS,N} = V_{DD}$, $V_{DS,N} = V_{OUT}$ and $V_{BS,N} = V_B$. For supply voltages higher than about 200 mV, the contribution of the term $e^{-V_{DS,N}/n_N V_T}$ can be neglected without loss of accuracy, because, as $V_{OUT}$ falls from $V_{DD}$ to $V_{DD}/2$, $V_{DS,N}$ is always higher than $4V_T$ (about 103 mV at 27°C) [33].
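As a quick numerical check of the $4V_T$ criterion, assuming the neglected factor has the form $1 - e^{-V_{DS}/V_T}$ used in (1.10) and $T \approx 300$ K:

$$4V_T = 4\frac{kT}{q} \approx 4 \times 25.9\ \text{mV} \approx 103\ \text{mV}, \qquad V_{DS,N} \geq 4V_T \;\Rightarrow\; e^{-V_{DS,N}/V_T} \leq e^{-4} \approx 0.018$$

so neglecting the factor changes the drain current by less than about 2%.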



Figure 2.5: Equivalent circuits to model inverter switching in the case of an L to H (a) and H to L (b) input transition

In order to obtain an easier expression for  $V_B$ , the gain of the push-pull amplifier is here approximated by its DC value. Thus, according to the previous considerations, the body voltage  $V_B$  during the H to L and L to H transition can be expressed as:

$$V_B \approx \begin{cases} V_H - A(V_{DD} - V_{OUT}) & H \to L \\ V_L + AV_{OUT} & L \to H \end{cases},$$

(2.14)

where *A* is the DC gain of the push-pull amplifier (i.e. the BBG). Using (2.14) to compute  $I_N$  and solving the integral in (2.13), the H to L delay given by (2.12) becomes equal to:


$$\tau_{HL} = \ln(2)C_{TOT} \frac{2}{V_{DD}} \frac{1}{k_{HL}k_N^2} \times \left[ e^{-\frac{V_{DD}}{2}k_N} \left( \frac{V_{DD}}{2} k_N + 1 \right) - e^{-V_{DD}k_N} \left( V_{DD} k_N + 1 \right) \right]$$
(2.15)

where $k_{HL} = \beta_N e^{\frac{V_{DD} - V_{TH,N}^0 + \lambda_B^N (V_H - A V_{DD})}{n_N V_T}}$ and $k_N = \left(\lambda_D^N + A\lambda_B^N\right) / n_N V_T$.

The L to H delay can be evaluated in a similar way. The charging phase of the capacitance $C_{TOT}$ is modeled as shown in Figure 2.5 (b), where the current flowing through transistor $M_P$ is obtained from (2.2) and (2.4) with the conditions $V_{SG,P} = V_{DD}$, $V_{SD,P} = V_{DD} - V_{OUT}$ and $V_{SB,P} = V_{DD} - V_B$. Also in this case, the contribution of the term $e^{-V_{SD,P}/n_PV_T}$ can be neglected. Finally, the L to H delay is:

$$\tau_{LH} = \ln(2)C_{TOT} \frac{2}{V_{DD}} \frac{1}{k_{LH}k_P^2} \times \left[ e^{\frac{V_{DD}}{2}k_P} \left( \frac{V_{DD}}{2}k_P + 1 \right) - \left( V_{DD}k_P + 1 \right) \right],$$
(2.16)

where
$$k_{LH} = \beta_P e^{\frac{V_{DD} - |V_{TH,P}^0| + \lambda_D^P V_{DD} + \lambda_B^P (V_{DD} - V_L)}{n_P V_T}}$$
and $k_P = (\lambda_D^P + A \lambda_B^P) / n_P V_T$.

As previously done for the leakage current, we evaluate the speed improvement offered by the proposed scheme in comparison to the conventional static CMOS solution. In the case of a static CMOS inverter, expressions for H to L and L to H delays can be obtained from (2.15) and (2.16), respectively by canceling the effect of the FBB on the switching device (i.e.  $V_H = 0, V_L = V_{DD}, A = 0$ ). Thus, we find:

$$\tau_{HL}^{CONV} = \ln(2)C \frac{2}{V_{DD}} \frac{1}{k_{HL,C} k_{N,C}^2} \times \left[ e^{-\frac{V_{DD}}{2} k_{N,C}} \left( \frac{V_{DD}}{2} k_{N,C} + 1 \right) - e^{-V_{DD} k_{N,C}} \left( V_{DD} k_{N,C} + 1 \right) \right],$$
(2.17)

$$\tau_{LH}^{CONV} = \ln(2)C \frac{2}{V_{DD}} \frac{1}{k_{LH,C} k_{P,C}^2} \times \left[ e^{\frac{V_{DD}}{2} k_{P,C}} \left( \frac{V_{DD}}{2} k_{P,C} + 1 \right) - \left( V_{DD} k_{P,C} + 1 \right) \right],$$
(2.18)

where $k_{HL,C} = \beta_N e^{\frac{V_{DD} - V_{TH,N}^0}{n_N V_T}}$, $k_{N,C} = \lambda_D^N / n_N V_T$, $k_{LH,C} = \beta_P e^{\frac{V_{DD} - |V_{TH,P}^0| + \lambda_D^P V_{DD}}{n_P V_T}}$ and $k_{P,C} = \lambda_D^P / n_P V_T$. It is worth noting that, in both (2.17) and (2.18), the output capacitance of the static CMOS inverter is indicated as $C = C_{LOAD} + C_{INT}$ instead of $C_{TOT} = C_{LOAD} + C_{IN,BBG} + C_{INT}$, to account for the reduced output load (i.e., the BBG input capacitance is not included). From the ratio between (2.15) and (2.17), the delay reduction during the H to L transition is given by:

$$\frac{\tau_{HL}^{CONV}}{\tau_{HL}} = \frac{C}{C_{TOT}} \left( 1 + A \frac{\lambda_B^N}{\lambda_D^N} \right)^2 e^{\frac{\lambda_B^N (V_H - A V_{DD})}{n_N V_T}} \times \frac{\left[ e^{-\frac{V_{DD}}{2} k_{N,C}} \left( \frac{V_{DD}}{2} k_{N,C} + 1 \right) - e^{-V_{DD} k_{N,C}} \left( V_{DD} k_{N,C} + 1 \right) \right]}{\left[ e^{-\frac{V_{DD}}{2} k_N} \left( \frac{V_{DD}}{2} k_N + 1 \right) - e^{-V_{DD} k_N} \left( V_{DD} k_N + 1 \right) \right]},$$
(2.19)

Similarly, the delay reduction during the L to H transition is obtained from the ratio between (2.16) and (2.18):


$$\frac{\tau_{LH}^{CONV}}{\tau_{LH}} = \frac{C}{C_{TOT}} e^{\frac{\lambda_B^P(V_{DD} - V_L)}{n_P V_T}} \left(1 + A \frac{\lambda_B^P}{\lambda_D^P}\right)^2 \times \frac{\left[e^{\frac{V_{DD}}{2}k_{P,C}} \left(\frac{V_{DD}}{2}k_{P,C} + 1\right) - \left(V_{DD}k_{P,C} + 1\right)\right]}{\left[e^{\frac{V_{DD}}{2}k_P} \left(\frac{V_{DD}}{2}k_P + 1\right) - \left(V_{DD}k_P + 1\right)\right]}.$$
(2.20)

From (2.19) and (2.20), the delay reduction during H to L and L to H output transitions is enhanced by raising $V_H$ and lowering $V_L$, respectively. In addition, since $C_{TOT} = C + C_{IN,BBG}$, the speed advantage of the proposed technique increases as the input capacitance of the push-pull amplifier is minimized.
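The closed-form H to L delay can be cross-checked against a direct numerical evaluation of (2.12)-(2.13) with the body voltage of (2.14). In this minimal sketch, substituting (2.14) into the nMOS drain current gives $I_N = k_{HL} e^{k_N V_{OUT}}$ with prefactor $\beta_N e^{(V_{DD} - V_{TH,N}^0 + \lambda_B^N (V_H - A V_{DD}))/n_N V_T}$, which reduces to the static CMOS case for $V_H = 0$ and $A = 0$. All parameter and capacitance values are hypothetical; the printed ratio illustrates the H to L speed-up, not a measured result.

```python
import math


def i_n_hl(v_out, p):
    """nMOS current during the H->L output transition: (2.1)+(2.3) with V_GS = V_DD,
    V_DS = V_OUT and V_BS = V_B from (2.14); the bracketed V_DS term is neglected."""
    v_b = p["V_H"] - p["A"] * (p["V_DD"] - v_out)
    v_th = p["V0_N"] - p["lamD_N"] * v_out - p["lamB_N"] * v_b
    return p["beta_N"] * math.exp((p["V_DD"] - v_th) / (p["n_N"] * p["V_T"]))


def tau_hl_closed(p, c_tot):
    """Closed-form H->L delay with the structure of (2.15)."""
    nvt, vdd = p["n_N"] * p["V_T"], p["V_DD"]
    k_n = (p["lamD_N"] + p["A"] * p["lamB_N"]) / nvt
    k_hl = p["beta_N"] * math.exp(
        (vdd - p["V0_N"] + p["lamB_N"] * (p["V_H"] - p["A"] * vdd)) / nvt)
    bracket = (math.exp(-0.5 * vdd * k_n) * (0.5 * vdd * k_n + 1)
               - math.exp(-vdd * k_n) * (vdd * k_n + 1))
    return math.log(2) * c_tot * (2 / vdd) * bracket / (k_hl * k_n ** 2)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def tau_hl_numeric(p, c_tot):
    """(2.12) with R_N from (2.13), integrated numerically."""
    vdd = p["V_DD"]
    r_n = (2 / vdd) * simpson(lambda v: v / i_n_hl(v, p), vdd / 2, vdd)
    return math.log(2) * c_tot * r_n


glbb = dict(V_DD=0.3, V_T=0.02585, n_N=1.4, beta_N=1e-7,
            V0_N=0.4, lamD_N=0.1, lamB_N=0.15, V_H=0.26, A=0.8)
conv = dict(glbb, V_H=0.0, A=0.0)                    # static CMOS: no FBB on the switching nMOS
c_load, c_int, c_in_bbg = 1e-15, 0.2e-15, 0.1e-15    # farads, hypothetical
c = c_load + c_int
c_tot = c + c_in_bbg

speedup = tau_hl_closed(conv, c) / tau_hl_closed(glbb, c_tot)
print(f"tau_HL^CONV / tau_HL = {speedup:.2f}")
```

Raising `V_H` or reducing `c_in_bbg` in this sketch increases the ratio, in line with the conclusions drawn from (2.19).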

#### 2.3 Design Criteria and Analysis Validations

The analytical model discussed above is here exploited to define proper design criteria for both the logic sub-circuit and the BBG. From (2.10) and (2.11), one way to limit the static current flowing in the BBG is to increase the threshold voltage of the devices employed in the push-pull amplifier. For this reason, both $M_{N(BBG)}$ and $M_{P(BBG)}$ are chosen as high threshold voltage (HVT) transistors. Note that this choice reinforces the inherent limitation of the static current in the BBG due to the reverse body biasing of its transistors ( $V_{BS,N}$ and $V_{SB,P}$ are always less than zero for $M_{N(BBG)}$ and $M_{P(BBG)}$, respectively). Furthermore, since HVT transistors in the adopted CMOS technology have the same oxide thickness (i.e., the same $C_{OX}$) as standard threshold voltage (SVT) devices but a different doping profile, this choice has no additional impact on the gate loading capacitance. On the other hand, to guarantee higher gate speed, SVT transistors are employed in the logic section of the gate. These choices are practical, since devices with different threshold voltages and the triple-well option are widely available in modern foundry technologies [55].

From (2.19) and (2.20), the input capacitance of the BBG has to be minimized to further improve the gate speed. To this aim, the minimum sizing (W=120 nm and L=40 nm) allowed by the chosen design kit is used for both  $M_{N(BBG)}$  and  $M_{P(BBG)}$ . This has also a beneficial impact on the leakage current (because of the reduced W). It is worth noting that the option of lowering leakage current of the logic sub-circuit by decreasing  $V_H$  and increasing  $V_L$  was avoided since it has an adverse impact in terms of speed.

Sizing the BBG for minimum static current and minimum input capacitance inevitably leads to $V_L \neq V_{DD} - V_H$. This effect, in conjunction with the differences in mobility, DIBL and body coefficients between the nMOS and pMOS transistors employed in the logic sub-circuit, leads to an implicit asymmetry between the H to L and L to H gate responses. However, as for conventional CMOS logic gates, such asymmetry can be easily compensated by properly sizing the transistors in the logic sub-circuit.

In order to validate our theoretical analysis, we compared the predicted results with simulation data obtained with Cadence Spectre. To ensure subthreshold operation, all comparisons discussed in this chapter are performed with a power supply voltage ($V_{DD}$) of 300 mV. Figure 2.6 shows the predicted (i.e. evaluated by (2.5)) and the simulated DC transfer function of the BBG. In particular, the predicted value of $V_H$ ($V_L$) is 261.2 mV (53.2 mV), against a simulated value of 261.9 mV (56.6 mV). This small difference is due to the fact that the contributions of the terms $1 - e^{-V_{DS,N}/V_T}$ and $1 - e^{-V_{SD,P}/V_T}$ were neglected in (2.5). In fact, since $V_{DS,N}$ and $V_{SD,P}$ are both lower than $4V_T$, these terms are slightly lower than 1. As shown in Figure 2.6, the proposed sizing for the BBG leads the high logic state to be transferred slightly better than the low logic state.
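To make the "slightly lower than 1" argument concrete, the short sketch below evaluates the subthreshold drain-bias factor $1 - e^{-V_{DS}/V_T}$ for drain-source voltages expressed as multiples of $V_T$. It only assumes $V_T = kT/q$ at 27 °C (about 25.9 mV, so $4V_T \approx 103$ mV); the function names are illustrative and not part of the original analysis.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
Q_E = 1.602176634e-19   # elementary charge [C]


def thermal_voltage(temp_celsius: float) -> float:
    """Thermal voltage V_T = kT/q, in volts."""
    return K_B * (temp_celsius + 273.15) / Q_E


def drain_factor(v_ds: float, v_t: float) -> float:
    """Subthreshold drain-bias factor 1 - exp(-V_DS / V_T)."""
    return 1.0 - math.exp(-v_ds / v_t)


vt = thermal_voltage(27.0)
for k in (1, 2, 3, 4, 5):
    v_ds = k * vt
    print(f"V_DS = {k} V_T = {1e3 * v_ds:5.1f} mV -> factor = {drain_factor(v_ds, vt):.3f}")
```

The factor is about 0.98 at $4V_T$ and falls off quickly below that, so dropping it from (2.5) is a mild approximation only when the drain-source voltage stays close to $4V_T$.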



Figure 2.6: Predicted versus simulated BBG DC transfer function

As explained before, the DC analysis of the BBG provides the background needed for the leakage current evaluation. Figure 2.7 shows a very good agreement between the predicted and the simulated leakage currents due to the pull-down (Figure 2.7(a)) and pull-up (Figure 2.7(b)) transistors. As predicted by (2.6) and (2.7), the BBG accounts for only about 1.1% of the overall leakage current, thus confirming the effectiveness of the design criteria used to limit its static current.

In order to validate the developed delay analysis, the H to L and L to H delays were evaluated for different transistor widths by using (2.16) and (2.19), respectively. In these expressions, the technological parameters (e.g. current factor $\beta$, DIBL and body coefficients) were extracted using the Cadence Spectre simulator. Figure 2.8 compares the predicted and the simulated delay data; percentage errors are also reported. All the results were obtained considering a channel width of 120 nm for the OFF transistors, a load capacitance of 2.5 fF, the typical-typical (TT) process corner and an operating temperature of 27 °C. As Figure 2.8 shows, the predicted delay values track the simulation results well, with mean errors of 11.5% and 7.1% for the L to H and H to L delays, respectively. It is worth noting that our delay model also provides useful guidelines for sizing the transistors in the logic sub-circuit. By exploiting (2.15) and (2.16), the optimum sizing ratio $W_P/W_N$ that ensures nearly equal delays in the two output transitions for minimum-sized transistors ($W_N = 0.24\ \mu$m) is predicted to be 1.28, which is very close to the simulated value of 1.33. This corresponds to a difference of only 3.8% between the simulated and the predicted sizing of the transistors in the pull-up network.
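The 3.8% figure is the gap between the two optimal ratios measured relative to the simulated one. A one-line check in Python (the helper name is illustrative):

```python
def relative_difference_pct(predicted: float, simulated: float) -> float:
    """Absolute difference between prediction and simulation, as % of the simulated value."""
    return abs(simulated - predicted) / simulated * 100.0


# Optimal W_P/W_N ratios quoted in the text.
PREDICTED_RATIO = 1.28   # from the analytical delay model
SIMULATED_RATIO = 1.33   # from Cadence Spectre sweeps

print(f"{relative_difference_pct(PREDICTED_RATIO, SIMULATED_RATIO):.1f} %")  # 3.8 %
```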




Figure 2.7: Predicted versus simulated leakage currents for low (a) and high (b) input logic value



Figure 2.8: Simulated vs predicted L to H (a) and H to L (b) delay results (channel width of the OFF transistor is equal to 120 nm)

Figure 2.9 reports the predicted and the simulated delay values for the L to H and the H to L transitions for different values of the load capacitance. To assess the quality of a design sized with the proposed model rather than with Cadence Spectre simulations, the predicted results in Figure 2.9 are obtained for $W_P/W_N=1.28$ (predicted optimum) and compared with the simulated delay values obtained for the simulated optimum of 1.33. For the L to H transition, the mean error is 9%, 2.5% and 2.4% for $C_{load}$ equal to 1.2 fF, 2.5 fF and 5 fF, respectively. For the H to L transition, the mean error is 9.1% @ 1.2 fF, 2.4% @ 2.5 fF and 0.8% @ 5 fF.

In Figure 2.10, the proposed model is further validated for different process corners and temperatures. In particular, delay results are extracted for the typical-typical (TT) process corner at the standard temperature of 27 °C, the fast-fast (FF) process corner at 100 °C and the slow-slow (SS) process corner at -25 °C. All the results are obtained considering $C_{load} = 2.5$ fF and $W_P/W_N=1.28$. For the L to H transition, the observed mean absolute error is 4.1%, 4.6% and 5.1% for the (TT, 27°C), (FF, 100°C) and (SS, -25°C) conditions, respectively. Similar percentage errors are obtained for the H to L transition.

Table 2.1 and Table 2.2 compare predicted (based on analytical optimization, i.e. $W_P/W_N=1.28$) and simulated (based on simulation-only optimization, i.e. $W_P/W_N=1.33$) delay results for different operating conditions and a load capacitance of 2.5 fF. Again, the good agreement between the simulated and predicted results confirms the validity of the proposed approach for sizing the logic sub-circuit.




Figure 2.9: Comparison between the predicted (WP/WN = 1.28) and simulated (WP/WN = 1.33) L to H (a) and H to L delay (b) for different values of the load capacitance



Figure 2.10: Corner analysis for the L to H (a) and H to L (b) output transitions (WP/WN=1.28 and Cload=2.5 fF)

Table 2.1: H to L inverter delay comparison for different process corners and temperatures (predicted: $W_P/W_N = 1.28$; simulated: $W_P/W_N = 1.33$).

| $W_N$ (µm) | TT @ 27°C Pred. [ns] | TT @ 27°C Sim. [ns] | FF @ 100°C Pred. [ns] | FF @ 100°C Sim. [ns] | SS @ -25°C Pred. [ns] | SS @ -25°C Sim. [ns] |
|------------|------|------|-----|-----|------|------|
| 0.24       | 12.5 | 12.5 | 1.2 | 1.2 | 262  | 263  |
| 0.5        | 5.7  | 5.7  | 0.5 | 0.5 | 130  | 130  |
| 0.7        | 4.1  | 4.2  | 0.4 | 0.4 | 97   | 97.5 |
| 1          | 3.1  | 3.1  | 0.3 | 0.3 | 73.5 | 73.9 |

Table 2.2: L to H inverter delay comparison for different process corners and temperatures (predicted: $W_P/W_N = 1.28$; simulated: $W_P/W_N = 1.33$).

| $W_N$ (µm) | TT @ 27°C Pred. [ns] | TT @ 27°C Sim. [ns] | FF @ 100°C Pred. [ns] | FF @ 100°C Sim. [ns] | SS @ -25°C Pred. [ns] | SS @ -25°C Sim. [ns] |
|------------|------|------|-----|-----|-----|-----|
| 0.24       | 12.6 | 12.2 | 1.2 | 1.1 | 296 | 285 |
| 0.5        | 6.7  | 6.4  | 0.6 | 0.6 | 158 | 153 |
| 0.7        | 5.1  | 4.9  | 0.5 | 0.5 | 121 | 117 |
| 1          | 3.9  | 3.8  | 0.4 | 0.4 | 93  | 90  |
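The agreement in the tables can be quantified with a mean absolute percentage error taken relative to the simulated delays. The sketch below applies it to the TT and SS columns of Table 2.2; note that it uses only the four tabulated rows, so the figures it prints need not coincide with the mean errors quoted for the full sweeps of Figures 2.8 to 2.10.

```python
def mean_abs_pct_error(predicted, simulated):
    """Mean absolute percentage error of predictions w.r.t. simulation, in %."""
    if len(predicted) != len(simulated) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - s) / s for p, s in zip(predicted, simulated)]
    return 100.0 * sum(errors) / len(errors)


# L to H delays [ns] from Table 2.2 (rows W_N = 0.24, 0.5, 0.7, 1 um).
tt_pred = [12.6, 6.7, 5.1, 3.9]   # TT @ 27 C, W_P/W_N = 1.28 (analytical optimum)
tt_sim = [12.2, 6.4, 4.9, 3.8]    # TT @ 27 C, W_P/W_N = 1.33 (simulated optimum)
ss_pred = [296, 158, 121, 93]     # SS @ -25 C, predicted
ss_sim = [285, 153, 117, 90]      # SS @ -25 C, simulated

for label, pred, sim in (("TT", tt_pred, tt_sim), ("SS", ss_pred, ss_sim)):
    print(f"{label}: {mean_abs_pct_error(pred, sim):.2f} %")
```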

### 2.4 Logic Gates with Stacked Transistors

In this section, the developed delay analysis is extended to logic gates with transistor stacks, such as NAND2 and NOR2.

#### NAND2

Figure 2.11 shows the schematic of the proposed NAND2 gate. When $V_{OUT}$ undergoes a L to H output transition, the worst-case delay occurs when the total output capacitance is charged by just one of the pMOS transistors in the pull-up network. Consequently, the L to H delay model reduces to the one already described for the inverter gate when the pMOS is switched on. On the contrary, the delay model for the H to L output transition cannot be easily traced back to the switching of a single nMOS with equivalent width. During the H to L output switching, the voltage $V_X$ at the intermediate node of the pull-down network is greater than 0 V [56], thus reducing the overdrive voltage of the upper transistor in the stack. This effect, which is particularly severe in the subthreshold regime due to the exponential relationship between the drain current and the gate-to-source voltage, is properly taken into account in the following analysis. The starting point to find an analytical expression for $V_X$ is obtained by equating the currents flowing in the two stacked transistors:

$$I_{N,1} = \beta_{N,1} e^{\frac{V_{DD} - V_X - V_{TH,N(1)}}{n_N V_T}} \left[ 1 - e^{-\frac{V_{OUT} - V_X}{V_T}} \right] = \beta_{N,2} e^{\frac{V_{DD} - V_{TH,N(2)}}{n_N V_T}} \left[ 1 - e^{-\frac{V_X}{V_T}} \right] = I_{N,2}.$$

(2.21)

Typically, $V_X \approx 0.1 V_{DD}$ [56]; thus, the term $e^{-(V_{OUT}-V_X)/V_T}$ can be neglected without any significant loss of model accuracy. Using this simplification, and noting that $\beta_{N,1}/\beta_{N,2} = W_{N,1}/W_{N,2}$ for transistors with equal channel length, (2.21) can be rewritten as:

$$\frac{W_{N,1}}{W_{N,2}} e^{\frac{V_{TH,N(2)} - V_{TH,N(1)}}{n_N V_T}} \approx e^{\frac{V_X}{n_N V_T}} \left[ 1 - e^{-\frac{V_X}{V_T}} \right].$$
(2.22)

Since $M_{N,1}$ and $M_{N,2}$ have the same zero-bias threshold voltage, the difference $V_{TH,N(2)} - V_{TH,N(1)}$ only depends on the difference between their drain-source and body-source biases. Substituting into (2.3) the conditions $V_{DS}=V_{OUT}-V_X$, $V_{BS}=V_B-V_X$ for $M_{N,1}$ and $V_{DS}=V_X$, $V_{BS}=V_B$ for $M_{N,2}$, the difference between the two threshold voltages is given by:

$$V_{TH,N(2)} - V_{TH,N(1)} = \lambda_D^N V_{OUT} - V_X \left( 2\lambda_D^N + \lambda_B^N \right)$$
(2.23)

For a fixed temperature, the right-hand side of (2.22) can be approximated as:

$$e^{\frac{V_{X}}{n_{N}V_{T}}} \left[ 1 - e^{-\frac{V_{X}}{V_{T}}} \right] \approx a_{x} e^{b_{x}V_{X}},$$

(2.24)

where $a_x$ and $b_x$ are two fitting parameters that depend on the considered voltage range for $V_X$. For example, assuming $V_X$ in the range from 25 mV to 50 mV, $a_x$ and $b_x$ are equal to 0.487 and 35 V$^{-1}$, respectively. Substituting (2.23) and (2.24) into (2.22), the following expression for $V_X$ is obtained:

$$V_X \approx \frac{n_N V_T \left[ \ln \left( \frac{W_{N,1}}{W_{N,2}} \right) - \ln(a_x) \right] + \lambda_D^N V_{OUT}}{n_N V_T b_x + 2\lambda_D^N + \lambda_B^N},$$
(2.25)

where $W_{N,1}$ and $W_{N,2}$ are the channel widths of transistors $M_{N,1}$ and $M_{N,2}$, respectively. As demonstrated in [56], skewed sizing of the transistors in the stack brings only negligible improvements in current drive. For this reason, the condition $W_{N,1}=W_{N,2}$ is assumed for the rest of this analysis; this choice also reduces the design complexity [56]. With reference to the upper transistor $M_{N,1}$, and again neglecting the term $e^{-(V_{OUT}-V_X)/V_T}$, the driving current flowing in the stack during the H to L output transition can be expressed as:

$$I_{stack} \approx \beta_{N,1} e^{\frac{V_{DD} - V_X - V_{TH,N(1)}}{n_N V_T}}.$$

(2.26)

From (2.26), the stacking factor (SF) that ensures equal H to L delays for the NAND2 gate and the inverter is:

$$SF = \frac{W_{stack}}{W_{single}} \approx \frac{C_{NAND}}{C_{INV}} e^{\frac{V_X \left(1 + \lambda_D^N + \lambda_B^N\right)}{n_N V_T}}.$$

(2.27)

In (2.27), $C_{NAND}$ is the total output capacitance of the NAND2 gate, while $C_{INV}$ represents the total output capacitance of the inverter gate. According to (2.27), two stacked transistors should be sized up by SF with respect to a single device to obtain similar current drivability. Moreover, (2.27) shows that *SF* depends exponentially on $V_X$ which, according to (2.25), mainly depends on $V_{OUT}$, the DIBL coefficient, the body coefficient and the subthreshold slope factor. This means that it is quite difficult to ensure equal driving currents in the stacked and non-stacked configurations during the whole transition, so a proper sizing strategy for the stacked transistors is needed. Sizing for $V_{OUT}=V_{DD}$ ensures similar driving currents only at the beginning of the falling output transition, while in the rest of the transition a higher current is observed in the stacked nMOS pair (note that a higher current leads to a lower H to L delay but also to a higher leakage). Conversely, sizing for $V_{OUT}=V_{DD}/2$ leads to a higher delay, since the current in the two stacked nMOS transistors is lower than the current in the single nMOS. To ensure a good trade-off between leakage and delay, *SF* is here evaluated for $V_{OUT}=3/4\,V_{DD}$ (i.e. in the middle of the portion of the H to L transition used for the delay calculation). From (2.27), the current factor of the pull-down network in the NAND2 gate is $\beta_{NAND}=SF\cdot\beta_N$, where $\beta_N$ represents the current factor of the pull-down transistor in the inverter gate. Rewriting $V_X$ as $V_X=A_N+B_NV_{OUT}$, the H to L delay of the NAND2 gate becomes:

$$\tau_{HL,NAND} = \frac{2}{V_{DD}} \frac{\ln(2)C_{TOT}}{I_{0,NAND}} \int_{V_{DD}/2}^{V_{DD}} \frac{V_{OUT} - V_X}{\exp(\lambda_{NAND}V_{OUT})} dV_{OUT} = \frac{2}{V_{DD}} \frac{\ln(2)C_{TOT}}{I_{0,NAND}\lambda_{NAND}^2} \left\{ e^{-\lambda_{NAND}V_{DD}} \left(\lambda_{NAND}A_N + \lambda_{NAND} \left(B_N - 1\right)V_{DD} + \left(B_N - 1\right)\right) - e^{-\lambda_{NAND}\frac{V_{DD}}{2}} \left(\lambda_{NAND}A_N + \lambda_{NAND} \left(B_N - 1\right)\frac{V_{DD}}{2} + \left(B_N - 1\right)\right) \right\},$$

(2.28)

where:

$$I_{0,NAND} = \beta_{NAND} e^{\frac{V_{DD} - A_N - V_{TH,N}^0 - \lambda_D A_N + \lambda_B (V_H - AV_{DD} - A_N)}{n_N V_T}}, \lambda_{NAND} = \frac{-B_N + \lambda_D - \lambda_D B_N - \lambda_B A - \lambda_B B}{n_N V_T}$$
(2.29)
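The NAND2 sizing steps above (the exponential fit (2.24), $V_X = A_N + B_N V_{OUT}$ from (2.25), and $SF$ from (2.27) at $V_{OUT} = 3/4\,V_{DD}$) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: $n_N$, $\lambda_D^N$, $\lambda_B^N$ and the ratio $C_{NAND}/C_{INV}$ are hypothetical placeholders rather than values extracted from the ST 45-nm kit, and the fit is a plain least-squares fit in the log domain, so the resulting $a_x$, $b_x$ and $SF$ are not expected to reproduce the values quoted in this chapter (0.487, 35 and 3).

```python
import math

K_B, Q_E = 1.380649e-23, 1.602176634e-19
V_T = K_B * 300.15 / Q_E   # thermal voltage at 27 C [V]

# Hypothetical placeholder parameters (NOT extracted from the ST 45-nm design kit).
N_N = 1.5        # subthreshold slope factor n_N
LAMBDA_D = 0.1   # DIBL coefficient lambda_D^N
LAMBDA_B = 0.2   # body coefficient lambda_B^N
V_DD = 0.3       # supply voltage [V]


def rhs_2_22(v_x: float) -> float:
    """Right-hand side of (2.22): exp(V_X/(n_N V_T)) * (1 - exp(-V_X/V_T))."""
    return math.exp(v_x / (N_N * V_T)) * (1.0 - math.exp(-v_x / V_T))


def fit_exponential(f, v_lo: float, v_hi: float, points: int = 26):
    """Least-squares fit of ln f(v) = ln(a) + b*v on [v_lo, v_hi]; returns (a, b)."""
    xs = [v_lo + (v_hi - v_lo) * i / (points - 1) for i in range(points)]
    ys = [math.log(f(x)) for x in xs]
    x_mean, y_mean = sum(xs) / points, sum(ys) / points
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(y_mean - b * x_mean), b


def vx_coefficients(a_x: float, b_x: float, width_ratio: float = 1.0):
    """(A_N, B_N) such that V_X = A_N + B_N * V_OUT, following (2.25)."""
    den = N_N * V_T * b_x + 2 * LAMBDA_D + LAMBDA_B
    a_n = N_N * V_T * (math.log(width_ratio) - math.log(a_x)) / den
    return a_n, LAMBDA_D / den


def stacking_factor(v_x: float, cap_ratio: float = 1.0) -> float:
    """SF of (2.27); cap_ratio = C_NAND / C_INV is a hypothetical input."""
    return cap_ratio * math.exp(v_x * (1 + LAMBDA_D + LAMBDA_B) / (N_N * V_T))


a_x, b_x = fit_exponential(rhs_2_22, 0.025, 0.050)   # same V_X window as in the text
A_N, B_N = vx_coefficients(a_x, b_x)                  # W_N,1 = W_N,2
v_x = A_N + B_N * 0.75 * V_DD                         # sizing point V_OUT = 3/4 V_DD
print(f"a_x = {a_x:.3f}, b_x = {b_x:.1f} 1/V")
print(f"V_X = {1e3 * v_x:.1f} mV, SF = {stacking_factor(v_x):.2f}")
```

A useful sanity check with real parameters is that the resulting $V_X$ falls inside the window used for the fit; otherwise $a_x$ and $b_x$ should be refitted over a more appropriate range.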

To validate the above analysis, the predicted and the simulated H to L delay values are compared in Figure 2.12 for different channel widths and load capacitances. Predicted delay values are obtained with the predicted optimal value of *SF* (=3), while the simulated delays are extracted for the simulated optimal *SF* (=3.1). Again, a good agreement between the predicted and simulated delay values is observed. In particular, the recorded mean error is 6.5% @ $C_{load}$=1.2 fF, 4.1% @ $C_{load}$=2.5 fF and 2.7% @ $C_{load}$=5 fF.

Figure 2.13 compares simulated and predicted H to L delay values for different process corners and temperatures. All the results are obtained considering $C_{load}$=2.5 fF and *SF*=3. The mean error is 3.6%, 5.5% and 3.5% for the (TT, 27°C), (FF, 100°C) and (SS, -25°C) conditions, respectively.

The validity of the proposed criterion for sizing devices belonging to the pull-down network is further confirmed by Table 2.3, which compares simulation results obtained for SF = 3.1 with the delay data predicted by our model (SF = 3).



Figure 2.12: Comparative NAND2 delay results for different values of the load capacitance



Figure 2.13: Corner analysis for the H to L NAND2 output transition (SF=3 and Cload=2.5 fF)

Table 2.3: NAND2 delay comparison for different process corners and temperatures (predicted: SF = 3; simulated: SF = 3.1).

| $W_N$ (µm), Pred. | $W_N$ (µm), Sim. | TT @ 27°C Pred. [ns] | TT @ 27°C Sim. [ns] | FF @ 100°C Pred. [ns] | FF @ 100°C Sim. [ns] | SS @ -25°C Pred. [ns] | SS @ -25°C Sim. [ns] |
|------|------|------|------|-----|-----|-----|-----|
| 0.35 | 0.36 | 23.7 | 24.6 | 2   | 2   | 570 | 590 |
| 0.48 | 0.5  | 17.8 | 17.3 | 1.5 | 1.4 | 437 | 443 |
| 0.67 | 0.7  | 12.6 | 12.6 | 1   | 1   | 320 | 331 |
| 1.15 | 1.2  | 0.8  | 0.8  | 0.7 | 0.7 | 214 | 221 |

#### NOR2

In analogy with the NAND2 gate, in this section the proposed model is extended to the NOR2 gate (Figure 2.14). In this case, the H to L delay model reduces to the one already described for the inverter gate when the nMOS is switched on, while a set of new equations is necessary to describe the L to H output transition. From Figure 2.14, the worst-case delay occurs as a consequence of the falling transitions of the input signals *A* and *B*. In this case, the currents flowing in $M_{P,1}$ and $M_{P,2}$ are:

$$I_{P,1} = \beta_{P,1} \exp\left(\frac{V_{DD} - |V_{TH,1}|}{n_P V_T}\right) \left[1 - \exp\left(-\frac{V_{DD} - V_X}{V_T}\right)\right],$$

$$I_{P,2} = \beta_{P,2} \exp\left(\frac{V_X - |V_{TH,2}|}{n_P V_T}\right) \left[1 - \exp\left(-\frac{V_X - V_{OUT}}{V_T}\right)\right]$$
(2.30)

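Imposing equal current factors for the two stacked pMOS transistors ($\beta_{P,1}=\beta_{P,2}$), and neglecting the term $\exp\left(-(V_X - V_{OUT})/V_T\right)$ since $V_X - V_{OUT}$ is much larger than $V_T$ during the L to H transition, the following condition holds: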

$$I_{P,1} = I_{P,2} \rightarrow \exp\left(-\frac{V_X}{n_P V_T}\right) \left[1 - \exp\left(-\frac{V_{DD} - V_X}{V_T}\right)\right] = \exp\left(\frac{-V_{DD} + \left|V_{TH,1}\right| - \left|V_{TH,2}\right|}{n_P V_T}\right).$$
(2.31)

As in the case of the NAND2 gate, for a fixed temperature, the left-hand side of (2.31) can be approximated as $a_x e^{b_x V_X}$, with $a_x$ and $b_x$ fitted over the relevant range of $V_X$. As a consequence, $V_X$ can be expressed as:

$$V_X = \frac{V_{DD} (1 + \lambda_D + \lambda_B) + n_P V_T \ln(a_x) + \lambda_D V_{OUT}}{2\lambda_D + \lambda_B - n_P V_T b_x} = C_X + D_X V_{OUT},$$

(2.32)

which, following the same reasoning used for the NAND2 gate, gives the following stacking factor for the NOR2 gate:

$$SF = \frac{W_{NOR}}{W_{SINGLE}} \approx \frac{C_{NOR}}{C_{INV}} \exp\left(\frac{\left(V_{DD} - V_X\right)\left(1 + \lambda_D + \lambda_B\right)}{n_P V_T}\right)$$
(2.33)

where $C_{NOR}$ is the total output capacitance of the NOR2 gate, while $C_{INV}$ is the total output capacitance of the inverter gate. In analogy with the NAND2 gate, SF is here evaluated for $V_{OUT}=V_{DD}/4$. From (2.33), the current factor of the pull-up network in the NOR2 gate is $\beta_{NOR}=SF\cdot\beta_P$, where $\beta_P$ represents the current factor of the pull-up transistor in the inverter gate. Thus, considering the expression of the current flowing in transistor $M_{P,2}$, the delay for the L to H transition becomes:

$$\tau_{LH,NOR} = \frac{2}{V_{DD}} \frac{\ln(2)C_{TOT}}{I_{0,NOR}} \int_{0}^{V_{DD}/2} \frac{V_X - V_{OUT}}{\exp(\lambda_{NOR}V_{OUT})} dV_{OUT} = \frac{2}{V_{DD}} \frac{\ln(2)C_{TOT}}{I_{0,NOR}\lambda_{NOR}^2} \left\{ C_X \lambda_{NOR} + (D_X - 1) - e^{-\frac{\lambda_{NOR}V_{DD}}{2}} \left( C_X \lambda_{NOR} + (D_X - 1) \left( \frac{\lambda_{NOR}V_{DD}}{2} + 1 \right) \right) \right\}$$

(2.34)

where

$$I_{0,NOR} = \beta_{NOR} \exp\left(\frac{C_X \left(1 + \lambda_D + \lambda_B\right) - \left|V_{TH,0}\right| - \lambda_B V_L}{n_P V_T}\right), \lambda_{NOR} = \frac{D_X \left(1 + \lambda_D + \lambda_B\right) - \lambda_D - \lambda_B A}{n_P V_T}.$$
(2.35)
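The closed-form terms in braces in (2.28) and (2.34) can be cross-checked against direct numerical integration. The sketch below compares, for arbitrary hypothetical values of the linear coefficients and of $\lambda$ (both signs), each braced term divided by $\lambda^2$ with a composite Simpson integration of $(V_{OUT}-V_X)e^{-\lambda V_{OUT}}$ over $[V_{DD}/2, V_{DD}]$ for the NAND2 gate and of $(V_X-V_{OUT})e^{-\lambda V_{OUT}}$ over $[0, V_{DD}/2]$ for the NOR2 gate; each braced term divided by $\lambda^2$ is exactly the definite integral of these integrands. The common prefactors $\frac{2}{V_{DD}}\frac{\ln(2)C_{TOT}}{I_0}$ are left out.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of sub-intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def nand_closed_form(a_n, b_n, lam, v_dd):
    """Braced term of (2.28) divided by lambda^2 (prefactors omitted)."""
    def g(v):
        return math.exp(-lam * v) * (lam * a_n + lam * (b_n - 1) * v + (b_n - 1))
    return (g(v_dd) - g(v_dd / 2)) / lam ** 2


def nor_closed_form(c_x, d_x, lam, v_dd):
    """Braced term of (2.34) divided by lambda^2 (prefactors omitted)."""
    tail = math.exp(-lam * v_dd / 2) * (c_x * lam + (d_x - 1) * (lam * v_dd / 2 + 1))
    return (c_x * lam + (d_x - 1) - tail) / lam ** 2


def compare(lam, v_dd=0.3, a_n=0.016, b_n=0.05, c_x=0.27, d_x=0.05):
    """Return ((numeric, closed) for NAND2, (numeric, closed) for NOR2); inputs are hypothetical."""
    nand_num = simpson(lambda v: (v - (a_n + b_n * v)) * math.exp(-lam * v), v_dd / 2, v_dd)
    nor_num = simpson(lambda v: ((c_x + d_x * v) - v) * math.exp(-lam * v), 0.0, v_dd / 2)
    return ((nand_num, nand_closed_form(a_n, b_n, lam, v_dd)),
            (nor_num, nor_closed_form(c_x, d_x, lam, v_dd)))


for lam in (12.0, -8.0):   # hypothetical lambda values [1/V]
    (nand_num, nand_cf), (nor_num, nor_cf) = compare(lam)
    print(f"lambda = {lam:5.1f}: NAND2 {nand_num:.6e} vs {nand_cf:.6e}, "
          f"NOR2 {nor_num:.6e} vs {nor_cf:.6e}")
```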

Figure 2.15 reports the comparison between the predicted and the simulated delay values for different load capacitances, for $V_{DD}$=0.3 V at the nominal temperature of 27 °C and the TT process corner. The predicted optimal stacking factor, evaluated according to (2.33), is 3.2, slightly lower than the simulated optimal value of 3.4. For the minimum load capacitance of 1.2 fF, the mean error is 5.7%, while mean errors of 3.3% and 2.9% are observed for $C_{load}$ equal to 2.5 fF and 5 fF, respectively. Finally, Figure 2.16 and Table 2.4 compare the simulated and the predicted delay values for different process corners and temperatures. The recorded mean error is again quite low in all the evaluated conditions. After sizing the pull-up network of the NOR2 to obtain a delay similar to that of the equivalent inverter gate, the pull-down network can be designed according to the guidelines previously suggested for the pull-down of the inverter gate.
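In the same spirit as for the NAND2 gate, the NOR2 stacking factor can be estimated numerically: fit $e^{-V_X/(n_P V_T)}\left[1 - e^{-(V_{DD}-V_X)/V_T}\right]$ with $a_x e^{b_x V_X}$, evaluate $V_X = C_X + D_X V_{OUT}$ from (2.32) at $V_{OUT} = V_{DD}/4$, and apply (2.33). Here $n_P$, $\lambda_D$, $\lambda_B$, the fitting window ($V_{DD}-V_X$ between 25 mV and 50 mV) and $C_{NOR}/C_{INV}$ are all hypothetical placeholders, so the printed $SF$ is only illustrative and is not meant to reproduce the predicted value of 3.2.

```python
import math

K_B, Q_E = 1.380649e-23, 1.602176634e-19
V_T = K_B * 300.15 / Q_E   # thermal voltage at 27 C [V]

# Hypothetical placeholder parameters (NOT extracted from the ST 45-nm design kit).
N_P = 1.5        # subthreshold slope factor n_P
LAMBDA_D = 0.1   # DIBL coefficient
LAMBDA_B = 0.2   # body coefficient
V_DD = 0.3       # supply voltage [V]


def balance_lhs(v_x: float) -> float:
    """exp(-V_X/(n_P V_T)) * (1 - exp(-(V_DD - V_X)/V_T)), the function fitted for (2.32)."""
    return math.exp(-v_x / (N_P * V_T)) * (1.0 - math.exp(-(V_DD - v_x) / V_T))


def fit_exponential(f, v_lo, v_hi, points=26):
    """Least-squares fit of ln f(v) = ln(a) + b*v on [v_lo, v_hi]; returns (a, b)."""
    xs = [v_lo + (v_hi - v_lo) * i / (points - 1) for i in range(points)]
    ys = [math.log(f(x)) for x in xs]
    x_mean, y_mean = sum(xs) / points, sum(ys) / points
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(y_mean - b * x_mean), b


def nor_vx_coefficients(a_x, b_x):
    """(C_X, D_X) such that V_X = C_X + D_X * V_OUT, following (2.32)."""
    den = 2 * LAMBDA_D + LAMBDA_B - N_P * V_T * b_x
    c_x = (V_DD * (1 + LAMBDA_D + LAMBDA_B) + N_P * V_T * math.log(a_x)) / den
    return c_x, LAMBDA_D / den


def nor_stacking_factor(v_x, cap_ratio=1.0):
    """SF of (2.33); cap_ratio = C_NOR / C_INV is a hypothetical input."""
    return cap_ratio * math.exp((V_DD - v_x) * (1 + LAMBDA_D + LAMBDA_B) / (N_P * V_T))


a_x, b_x = fit_exponential(balance_lhs, V_DD - 0.050, V_DD - 0.025)
C_X, D_X = nor_vx_coefficients(a_x, b_x)
v_x = C_X + D_X * V_DD / 4           # sizing point V_OUT = V_DD/4
print(f"a_x = {a_x:.2f}, b_x = {b_x:.1f} 1/V, V_DD - V_X = {1e3 * (V_DD - v_x):.1f} mV")
print(f"SF = {nor_stacking_factor(v_x):.2f}")
```

Here $b_x$ comes out negative, because the fitted function decreases with $V_X$; this keeps the denominator of (2.32) positive.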



Figure 2.14: Proposed NOR2 gate




Figure 2.15: Comparative NOR2 delay results for different values of the load capacitance



Figure 2.16: Corner analysis for the L to H NOR2 output transition (SF=3.2 and Cload=2.5 fF)

Table 2.4: NOR2 delay comparison for different process corners and temperatures

| $W_P$ (µm), Pred. (SF=3.4) | $W_P$ (µm), Sim. (SF=3.2) | TT @ 27°C Pred. [ns] | TT @ 27°C Sim. [ns] | FF @ 100°C Pred. [ns] | FF @ 100°C Sim. [ns] | SS @ -25°C Pred. [ns] | SS @ -25°C Sim. [ns] |
|------|------|------|------|-----|-----|-----|-----|
| 0.53 | 0.51 | 21.5 | 22.3 | 2   | 2   | 502 | 52  |
| 0.79 | 0.76 | 15.3 | 15.8 | 1.4 | 1.5 | 360 | 369 |
| 1.06 | 1.02 | 12.2 | 12.6 | 1.1 | 1.2 | 284 | 294 |
| 1.23 | 1.18 | 10.8 | 11.2 | 1   | 1   | 253 | 261 |

### 2.5 Final Remarks on Design Criteria

In this chapter, the gate-level body biasing technique, recently proposed for designing high-speed subthreshold logic gates, was analytically justified. The analytical model was exploited to define design criteria for both the logic sub-circuit and the BBG: for the logic sub-circuit, sizing criteria were derived from the delay model; for the BBG, the static current and the input capacitance should be limited by choosing the minimum-sized HVT transistors offered by the technology. As an extension of the proposed model to logic gates with stacked transistors, the NAND2 and NOR2 gates were also considered. The proposed analysis has been validated by comparing the predicted values with Cadence Spectre simulation results obtained with the ST 45-nm CMOS technology for different process corners and temperatures. The good agreement between the predicted and the simulated results confirms that the proposed analysis is a useful aid to the design of high-speed subthreshold logic gates.


# 3 Dynamic Gate – Level Body Biasing Technique in Bulk technologies

This chapter analyzes a mirror full adder (FA) [50] implemented according to the GLBB technique. To validate the proposal, the mirror FA is compared with its equivalent CMOS and DTMOS counterparts. All the FA designs were laid out exploiting the ST 45-nm CMOS triple-well technology.

The key rationale for applying body biasing at the macro-block level is to amortize the silicon area and the body-bias routing complexity that a finer-grained implementation would require. As a drawback, when $V_{TH}$ is reduced at the block level to compensate for variations and/or to provide a temporary speed boost, leakage power increases for all the gates in the block, while the speed-up would be needed only on timing-critical gates. Better energy-delay trade-offs can be obtained with a finer body-bias control granularity, at the expense of a larger silicon area [42].

In this chapter, the delay and energy benefits of the GLBB technique for basic gates are validated through the design of a mirror full adder, taking into account its physical implementation in a bulk CMOS triple-well technology.

At the physical level, the major limitation of gate-level body biasing techniques is that a large distance has to be maintained between transistors controlled by different gate signals to ensure correct isolation between differently body-biased devices [57], [58]. This causes not only a larger silicon area but also longer interconnections, which in turn degrade speed and energy performance. It is worth noting that post-layout analysis is strictly required when adaptive body biasing techniques are used in nanometer technologies, because the physical distances needed to provide correct body isolation between differently body-biased devices have a very large impact on the delay and energy characteristics of the circuits.

### 3.1 Physical constraints for gate-level body biasing technique

The mirror full adder (FA), a basic block commonly used to perform many operations in arithmetic logic units (ALUs), has been implemented according to the GLBB technique. A comparative analysis against equivalent CMOS and DTMOS designs, exploiting the ST 45-nm CMOS triple-well technology, has been performed for different operating conditions in the ultra-low voltage regime. A mirror FA designed according to the GLBB technique, shown in Figure 3.1, requires four BBGs to speed up the switching of the logic sub-circuits. This translates into eight additional devices compared with the CMOS and DTMOS circuits.

Devices belonging to the logic sub-sections of the compared circuits were sized with minimum channel length (i.e. $L_{min}$=40 nm), whereas the pull-up/pull-down channel width ratio was chosen to obtain comparable strength for $V_{DD}$=0.3 V and T=27°C, imposing equal widths for series-connected transistors.



Figure 3.1: Low voltage mirror FA designed according to the GLBB technique.

In Table 3.1, the width ratio between pull-up and pull-down networks is explicitly reported for the compared designs and for the different stacking configurations. The sizing factor W was chosen by iterative simulations, imposing similar leakage current at nominal conditions (i.e. TT process corner,  $V_{DD}$ =0.3V and T=27°C) for all the compared designs.

Table 3.1: Pull-up/pull-down width ratio

| Stack Config. | ZBB          | DTMOS       | GLBB        |
|---------------|--------------|-------------|-------------|
| 1             | 1.5W / W     | W / W       | 1.1W / W    |
| 2             | 4.5W / 2.5W  | 2.5W / 2.5W | 3.2W / 2.5W |
| 3             | 8.25W / 4.5W | 4.5W / 4W   | 5.5W / 4.5W |

In order to correctly take into account the impact of layout parasitics on performance, the physical design of the compared circuits was carried out (see Figure 3.2) considering the design rules imposed by the ST 45-nm bulk CMOS triple-well technology. For the DTMOS and GLBB designs, the deep Nwell layer was used to shield N-channel devices from the P-type substrate, thus obtaining Pwell regions isolated from the underlying substrate. Each of these regions is surrounded by an Nwell region that also provides lateral isolation [1, 2]. Due to the distances needed to provide correct body isolation between differently body-biased devices, implementations exploiting unconventional body biasing (i.e. DTMOS and GLBB) occupy a significantly larger silicon area than the ZBB CMOS circuit. The DTMOS implementation requires one isolated Pwell region for each distinct transistor gate signal, i.e. 5 different isolated Pwell islands. On the contrary, in the proposed approach the number of isolated p-type islands is reduced to 4 (i.e. one for each BBG). This, along with the reduced size of its transistors, allows the proposed implementation to reduce the silicon area by more than 50% with respect to the DTMOS design.



Figure 3.2: Layouts of FA for DTMOS (a), CMOS (b) and GLBB (c), respectively.

Table 3.2: Comparison between ZBB, DTMOS and GLBB schemes at nominal conditions (TT process corner, $V_{DD}$=0.3 V and T=27°C)

|                                                              | ZBB  | DTMOS | GLBB |
|--------------------------------------------------------------|------|-------|------|
| Silicon Area [µm²]                                           | 20.7 | 123.2 | 60.5 |
| Delay [ns]                                                   | 0.70 | 0.78  | 0.59 |
| Leakage current [nA]                                         | 0.20 | 0.24  | 0.21 |
| Energy per Operation [fJ] ($T_{clk}$=80 FO4, $\alpha$=0.2)   | 0.75 | 2.27  | 0.57 |
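Relative comparisons such as the area reduction of more than 50% with respect to DTMOS can be read directly from Table 3.2. The short sketch below computes the percentage reduction of the GLBB design with respect to each reference for every row (a negative value means the GLBB figure is larger); the dictionary layout and function name are illustrative.

```python
# Nominal post-layout figures from Table 3.2 (TT corner, V_DD = 0.3 V, T = 27 C).
TABLE_3_2 = {
    "area_um2":   {"ZBB": 20.7, "DTMOS": 123.2, "GLBB": 60.5},
    "delay_ns":   {"ZBB": 0.70, "DTMOS": 0.78,  "GLBB": 0.59},
    "leakage_nA": {"ZBB": 0.20, "DTMOS": 0.24,  "GLBB": 0.21},
    "eop_fJ":     {"ZBB": 0.75, "DTMOS": 2.27,  "GLBB": 0.57},
}


def reduction_pct(metric: str, reference: str, design: str = "GLBB") -> float:
    """Reduction of `design` relative to `reference`, in percent (negative = increase)."""
    row = TABLE_3_2[metric]
    return 100.0 * (row[reference] - row[design]) / row[reference]


for metric in TABLE_3_2:
    print(f"{metric:>10}: GLBB vs ZBB {reduction_pct(metric, 'ZBB'):7.1f} %, "
          f"vs DTMOS {reduction_pct(metric, 'DTMOS'):6.1f} %")
```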

Table 3.2 reports post-layout comparison results under nominal simulation conditions. Comparative post-layout delay results, evaluated for $V_{DD}$ ranging from 0.2 V to 0.5 V with a step of 0.05 V, are shown in Figure 3.3. The results are normalized with respect to the delay of the ZBB CMOS design. For $V_{DD}$=0.5 V, the suggested approach reduces the delay by 34% and 24% with respect to the standard CMOS and DTMOS implementations, respectively. As $V_{DD}$ decreases below 0.45 V, the impact of FBB in boosting performance is reduced, but at a different rate for the GLBB and DTMOS techniques. As a result, the speed benefit of the suggested approach over the conventional CMOS circuit drops to 6% at the minimum considered supply voltage (i.e. $V_{DD}$=0.2 V). On the contrary, the speed advantage with respect to the DTMOS implementation becomes more pronounced, reaching 60% at $V_{DD}$=0.2 V (in DTMOS, the speed boost due to FBB is overcome by the negative impact of the body-induced *RC* delay).



Figure 3.3: Delay versus V<sub>DD</sub>



Figure 3.4: Leakage current (log scale) versus V<sub>DD</sub>.

Figure 3.4 reports $I_{leak}$ versus $V_{DD}$ for the three compared circuit topologies. Here, $I_{leak}$ is normalized to the value of the CMOS design at $V_{DD}$=0.3 V. Due to the adopted sizing criterion, all the circuits have similar $I_{leak}$ at $V_{DD}$=0.3 V (see Table 3.2). However, this property is not maintained at other supply voltages. As $V_{DD}$ drops below 0.3 V, the proposed approach, which benefits from reduced transistor sizes, leads to the lowest $I_{leak}$. On the contrary, the standard CMOS FA exhibits the lowest $I_{leak}$ for $V_{DD}$>0.3 V. Note that for $V_{DD}$ higher than 0.45 V, the parasitic p-n junctions of the DTMOS devices start to conduct a non-negligible current, which dramatically increases the leakage power consumption of the DTMOS-based FA.

Figure 3.5 and Figure 3.6 depict the energy per operation ($E_{OP}$) versus $V_{DD}$ for the three compared circuit implementations, evaluated under different running conditions. Results are normalized to the energy of the conventional CMOS circuit evaluated at $V_{DD}$=0.3 V, with an activity factor ($\alpha$) of 0.2 and a clock cycle time ($T_{clk}$) of 80 FO4 (FO4 being the delay of a CMOS inverter driving four identical inverters), which is typical of low-power VLSI circuits [33]. More precisely, Figure 3.5 plots $E_{OP}$ for $T_{clk}$=80 FO4 and $\alpha$ = 0.1, 0.2 and 0.3.



Figure 3.5: Energy per operation (log scale) for T<sub>clk</sub> = 80 FO4 and different activity factors.

Considering the lowest activity factor ($\alpha$ = 0.1), the GLBB solution reduces E<sub>OP</sub> by 15%-27% and 47%-77% with respect to the CMOS and DTMOS designs, respectively. This is mainly due to the reduced transistor sizes of the GLBB circuit (see Table 3.1), which decrease the total physical capacitance on the internal nodes of the circuit, even when all the layout parasitics are taken into account. Additionally, the proposed body biasing technique allows faster gate transitions, which in turn reduce the short-circuit component of the dynamic energy. These advantages are further emphasized for larger activity factors (i.e. when the dynamic contribution to the total E<sub>OP</sub> increases). Due to the previously discussed input capacitance drawbacks, the larger devices and the longer interconnections, the DTMOS implementation turns out to be very energy hungry. Additionally, the body bias voltage of DTMOS devices can change even when input transitions do not cause switching of the internal circuit nodes. This further increases the dynamic energy consumption due to unnecessary charging/discharging of the large body capacitances.

Figure 3.6 shows $E_{OP}$ versus $V_{DD}$ for $\alpha$ = 0.2 and $T_{clk}$ = 50 FO4, 80 FO4 and 100 FO4. Note that, as the leakage energy contribution increases (i.e. as $T_{clk}$ increases), the suggested solution still maintains significant advantages in terms of total energy, even for $V_{DD}$ higher than 0.3 V.



*Figure 3.6: Energy per operation (log scale) for $\alpha$ = 0.2 and different clock cycle times.*

Figure 3.7 further emphasizes the PDP and delay advantages of the proposed FA when employed in a 16-bit ripple carry adder (RCA). The power of the FA under test is evaluated at the maximum frequency of the whole adder (to correctly account for the leakage contribution), whereas the delay refers to the device under test in the FA chain. In this scenario, the GLBB FA lowers the minimum PDP by 22% and 68% in comparison to the CMOS and DTMOS circuits, respectively. This is achieved with a speed boost of 17%/66% when compared to the CMOS/DTMOS implementations. Speed and PDP advantages are recorded over the whole power supply range.
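Why the power is referred to the whole adder can be illustrated with a small first-order model. The Python sketch below is hypothetical: it assumes the RCA clock period equals `n_bits` times the FA delay (carry rippling through every stage), a per-cycle switching probability `alpha`, and made-up energy, leakage and delay values; none of these numbers come from the thesis simulations.

```python
def fa_pdp_in_rca(e_dyn_j, i_leak_a, vdd_v, d_fa_s, n_bits, alpha=1.0):
    """PDP of one FA whose power is evaluated at the maximum frequency of an n-bit RCA.

    Hypothetical first-order model: the RCA clock period is n_bits * d_fa_s,
    the FA switches with probability alpha per cycle, and leakage flows for
    the whole period. Returns (pdp_j, leakage_share_of_power).
    """
    t_clk = n_bits * d_fa_s                    # clock period set by the whole carry chain
    p_dyn = alpha * e_dyn_j / t_clk            # average switching power of the FA under test
    p_leak = i_leak_a * vdd_v                  # static power of the FA under test
    p_total = p_dyn + p_leak
    return p_total * d_fa_s, p_leak / p_total  # PDP uses the FA's own delay


# Illustrative (made-up) values: 1 fJ per switching event, 1 nA leakage, 0.3 V, 20 ns FA delay
pdp_1, share_1 = fa_pdp_in_rca(1e-15, 1e-9, 0.3, 20e-9, n_bits=1)
pdp_16, share_16 = fa_pdp_in_rca(1e-15, 1e-9, 0.3, 20e-9, n_bits=16)
print(f"FA alone     : PDP = {pdp_1:.2e} J, leakage share = {share_1:.1%}")
print(f"FA in 16-bit : PDP = {pdp_16:.2e} J, leakage share = {share_16:.1%}")
```

With these made-up numbers, evaluating the FA alone at its own maximum frequency makes leakage almost invisible (well under 1% of the power), whereas inside the 16-bit chain it becomes a visible fraction of the total.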



Figure 3.7: PDP (log scale) versus delay (log scale) for different V<sub>DD</sub>.

Figure 3.8 describes the behavior of the compared circuits as the temperature varies from -25 °C to 100 °C for $V_{DD}$ = 0.3 V. As shown in Figure 3.8 (a), all the circuits exhibit similar leakage currents at low operating temperatures (< 25 °C). However, as the temperature increases, the leakage current of the DTMOS circuit grows faster than that of its counterparts, becoming approximately 1.6 times higher at T = 100 °C. Figure 3.8 (b) demonstrates that the GLBB FA maintains its speed advantage over the whole considered temperature range.



Figure 3.8: Temperature variation results ($V_{DD}$ = 300 mV): a) leakage current versus temperature; b) delay versus temperature.

The impact of process variability was investigated by performing Monte Carlo (MC) simulations on 1000 samples for $V_{DD}$ = 0.3 V and T = 27 °C. In this analysis, both inter-die and intra-die fluctuations were considered. MC leakage and delay results are given in Figure 3.9 and Figure 3.10, respectively. When compared to its counterparts, the ZBB CMOS circuit exhibits the lowest mean leakage current (-19% and -9% in comparison to the DTMOS and GLBB designs, respectively) with a slightly higher leakage variability ($\sigma/\mu$ = 11% for the CMOS design against $\sigma/\mu$ = 8% and 10.4% for the DTMOS and GLBB solutions). On the other hand, the suggested approach is more robust in terms of delay. In fact, the MC delay results reported in Figure 3.10 show that the mirror FA designed according to the proposed style reaches a mean delay of only 0.5 µs, which is about 20% and 28% lower than that of the standard CMOS (0.63 µs) and DTMOS (0.7 µs) implementations, respectively, while maintaining a delay standard deviation of about 0.21 µs.
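The percentages quoted above follow directly from the reported Monte Carlo means. The short Python check below recomputes them (about 20.6% and 28.6%) together with the relative delay spread $\sigma/\mu$ of the GLBB design implied by the quoted standard deviation; the variable names are illustrative only.

```python
# Mean MC delays and GLBB delay standard deviation as quoted in the text (µs)
mean_delay_us = {"GLBB": 0.5, "CMOS": 0.63, "DTMOS": 0.7}
sigma_glbb_us = 0.21


def relative_reduction(value, reference):
    """Fractional reduction of `value` with respect to `reference`."""
    return 1.0 - value / reference


for ref in ("CMOS", "DTMOS"):
    red = relative_reduction(mean_delay_us["GLBB"], mean_delay_us[ref])
    print(f"GLBB mean delay vs {ref}: {red:.1%} lower")
print(f"GLBB delay sigma/mu: {sigma_glbb_us / mean_delay_us['GLBB']:.0%}")
```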



*Figure 3.9: Monte Carlo leakage results ($V_{DD}$ = 0.3 V, TT process corner and T = 27 °C)*



*Figure 3.10: Monte Carlo delay results ($V_{DD}$ = 0.3 V, TT process corner and T = 27 °C)*

#### 3.2 Final Remarks of Gate-Level Body Biasing in Bulk Technology

In this chapter the advantages of the ULV gate-level body biasing scheme were investigated. A preliminary analysis performed on simple logic gates demonstrates that the speed boost provided by the suggested approach allows ULV GLBB circuits to reach performance levels that are unattainable for both conventional CMOS and DTMOS configurations.

To take into account all the parasitic effects of gate-level body polarization in more complex circuits, a GLBB mirror full adder was laid out and compared against its conventional CMOS and DTMOS counterparts. Post-layout simulation results have shown that the GLBB design style, at equal leakage power consumption, achieves significantly higher performance with lower total energy per operation than conventional CMOS and DTMOS implementations.

Depending on the supply voltage level, the GLBB FA reduces delay in the ranges 6%-34% and 24%-40% in comparison to the ZBB CMOS and DTMOS circuits, respectively. This is achieved while also saving energy per operation. As an example, for an 80 FO4 clock cycle period and an activity factor of 10%, the GLBB circuit reduces energy per operation in the range 15%-27% and 47%-77% with respect to the ZBB CMOS and DTMOS FAs. These energy and speed advantages are obtained at the expense of increased silicon area in comparison to a conventional ZBB CMOS design, whereas the area is roughly halved with respect to the equivalent DTMOS implementation. Finally, the performed temperature analysis and Monte Carlo simulations show that the GLBB solution maintains a high level of robustness against temperature fluctuations and process variations.



## 4 Dynamic Gate-Level Body Biasing Technique in UTBB FD-SOI Technologies

The GLBB technique is implemented and evaluated in 28 nm ultra-thin body and buried oxide (UTBB) fully-depleted silicon-on-insulator (FD-SOI) technology for ULV logic design. The inherent benefits of the low-granularity body-bias control provided by the GLBB approach are emphasized by the efficiency of forward body bias (FBB) in FD-SOI technology. In addition, the possibility of integrating PMOS and NMOS devices into a single common well configuration allows a significant area reduction compared to an equivalent triple well bulk implementation.

As demonstrated in the previous chapters, the GLBB technique clearly overcomes the speed and energy limits of ULV DTMOS logic gates in triple-well bulk CMOS designs. However, despite these benefits, the silicon area of GLBB circuits remains larger than that of conventional CMOS solutions, albeit substantially reduced compared to equivalent DTMOS implementations. In this chapter, the GLBB technique is therefore ported to the 28 nm STM UTBB FD-SOI technology, exploiting its unique ability to integrate PMOS and NMOS devices into a common well configuration [59]–[62] to improve area efficiency. To demonstrate the potential of the suggested approach, the GLBB technique is compared to standard CMOS and DTMOS solutions.

#### 4.1 UTBB FD-SOI Technology Overview

As discussed in the first chapter, continuous downscaling of bulk CMOS technology has been the main strategy to increase computational speed and integration density. However, leakage current has increased with each technology node due to short channel effects [63]–[65]. Additionally, at the nanometer scale it is very challenging to provide adequate robustness against process variability. In this context, ultra-thin body and buried oxide (UTBB) fully depleted silicon-on-insulator (FD-SOI) technology has been identified as a promising candidate for future downscaled transistors [66]–[70]. UTBB FD-SOI overcomes the limits of conventional bulk technology by controlling drain-induced barrier lowering (DIBL) and gate-induced drain leakage (GIDL) [71], [72]. Moreover, it offers a wider power-performance range of operation through adjustment of $V_{TH}$ at the architectural and/or device level [60].




Figure 4.1: Device structure of the UTBB FD-SOI technology [60]

In UTBB FD-SOI technology, a thin-film transistor is implemented over a buried oxide (BOX) using silicon-on-insulator wafers (see Figure 4.1). At the 28 nm node, the BOX thickness enables: (1) excellent electrostatic control; (2) reduced variability, since random dopant fluctuation is avoided [73]; and (3) a wide $V_{TH}$ tuning knob, through heavily doped back planes and a wide voltage range for back biasing (BB) [74], [75].

The BOX electrically isolates the back plane from the source and drain of the transistors, thus allowing a wide back-biasing voltage range. As shown in Figure 4.2, regular-$V_{TH}$ (RVT) nMOS/pMOS transistors are implemented using the conventional well (CW) configuration, whereas the use of flipped wells (FWs) results in low-$V_{TH}$ (LVT) transistors. The voltage that can be applied to the back plane in this technology is mainly restricted by the p/n-well junction underneath the BOX. This means that a strong forward back-biasing (FBB) can be applied to RVT transistors.




Additionally, in this technology the wells under the BOX (either n- or p-type) can also be shared by the pMOS and nMOS transistors to form either single n-well (SNW) or single p-well (SPW) logic gates [50], [60], [76], [77]. In our work, the single well feature (see Figure 4.3) is exploited to optimize the area occupancy of unconventionally body-biased circuits (i.e. the DTMOS and GLBB designs).



Figure 4.3: Single n-well (SNW) and single p-well (SPW) configurations offered by 28 nm UTBB FD-SOI technology

#### 4.2 Design Optimization for Gate-level Body Biasing

As clearly shown in Figure 4.4, the layout of a GLBB circuit implemented in the conventional triple well process leads to a large silicon area. A deep n-well layer is needed to shield the n-channel devices from the p-type substrate, thus obtaining p-well regions isolated from the underlying substrate. Each of these regions is surrounded by an n-well region, which also provides lateral isolation. Note that significant spacing constraints must be satisfied to ensure correct electrical isolation between differently body-biased devices. The situation is even worse for a DTMOS implementation, which increases the number of isolated p/n-well regions (one for each logic gate input).



Figure 4.4: Layout strategy of GLBB technique for conventional triple well option in UTBB FD-SOI technology

The unique property of FD-SOI technology of integrating both NMOS and PMOS devices into a common well configuration (either n-well or p-well [59]–[61], [77]) allows the silicon area overhead to be significantly reduced. As illustrated in Figure 4.5, this distinctive feature was exploited in this work to design area-efficient GLBB circuits. To allow abutment of different GLBB gates, the logic sub-circuits, including regular-$V_{TH}$ (RVT) PMOS and low-$V_{TH}$ (LVT) NMOS devices, are implemented in single n-well (SNW) areas. Conversely, the BBG sub-circuits, each formed by an LVT PMOS and an RVT NMOS, exploit the single p-well (SPW) option, with the p-well and the p-substrate jointly biased at ground. In this context, the deep n-well layer is no longer needed to electrically isolate the p-well from the p-substrate. As a main consequence, the spacing constraints of the manufacturing process are relaxed, with significant area saving in comparison to a triple well layout.

It is worth emphasizing that the proposed layout strategy achieves better area utilization than the opposite configuration (i.e. with the logic sub-circuit implemented in the SPW and the BBG embedded in an SNW region). This is mainly because the distance needed to isolate different SNW areas is considerably smaller (by more than a factor of three) than that required to isolate different SPW regions.



Figure 4.5: Layout strategy of GLBB technique for single well option in UTBB FD-SOI technology.


In our physical design, all the parasitic diodes (i.e. the vertical p-sub/n-well and horizontal p-well/n-well junctions) are always kept reverse biased during circuit operation. This can be easily observed in Figure 4.6, which shows the cross section (Figure 4.6a) and the adopted physical design strategy (Figure 4.6b) for a GLBB inverter. Since the BBG output voltage always lies between ground and the supply (i.e. GND < V<sub>B</sub> < V<sub>DD</sub>), no significant current can flow into the substrate.



Figure 4.6: GLBB inverter architecture in UTBB FD-SOI (a) cross section and (b) design strategy.

For a fair comparison, all the DTMOS circuits discussed in this thesis (except where explicitly stated otherwise) were laid out exploiting the SNW approach (i.e. a single SNW region for commonly body-biased devices).

In this way, we avoided the less area-efficient solution that uses a deep n-well layer to isolate the differently biased p-well areas from the p-type substrate, commonly biased at ground. To facilitate N/P-MOS balancing in SNW regions for both DTMOS and GLBB designs, the gate length of the LVT NMOS devices was extended by 10 nm (i.e. a poly biasing (PB) [59] of 10 nm was used). The minimum channel length (i.e. L<sub>min</sub> = 30 nm) was used for all the other devices in the compared circuits. To minimize the capacitive load on the output node of a given logic gate, the transistors belonging to the BBGs should be close to minimum size. Accordingly, the minimum size allowed by the chosen design kit (W = 80 nm and L = 30 nm) was used for the nMOS devices of the BBGs, whereas the BBG pMOS transistors were sized with W = 100 nm and L = 30 nm to ensure symmetrical (with respect to V<sub>DD</sub>/2) low and high BBG output voltage transitions.

#### 4.3 Basic Gates: Design and Operating Characteristics

When equally sized, GLBB logic gates show somewhat higher leakage current than their CMOS and DTMOS counterparts [47]. This is mainly because the output voltage transition of the BBG is not rail to rail (a PMOS device is used to transfer the low voltage onto the $V_B$ net, whereas an NMOS transistor is used to transfer the high voltage). During the idle state, this reduces the threshold voltage of the leaky devices belonging to the nominally OFF network (either pull-down or pull-up) of the logic sub-circuit. On the other hand, the static current flowing in the BBG sub-circuit is strongly limited by the negative gate-to-source voltage of the OFF device, and becomes negligible if reduced-size transistors are used for its implementation (Figure 4.7 and Figure 4.8).
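The leakage penalty described above can be estimated with a textbook subthreshold model: when the $V_B$ net does not fully reach the rail, the nominally OFF network sees a small residual forward body voltage $\Delta V_B$, which lowers its threshold roughly in proportion to $\Delta V_B$, and the subthreshold current grows exponentially with that shift. The Python sketch below assumes a linear back-bias coefficient of 85 mV/V, a subthreshold slope factor of 1.3 and T = 300 K; these are illustrative values, not parameters extracted from the 28 nm design kit, and the residual body voltages in the loop are likewise hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # k/q, so k*T/q comes out directly in volts


def leakage_ratio(delta_vb_v, body_factor=0.085, n_slope=1.3, temp_k=300.0):
    """Subthreshold leakage multiplier caused by a residual forward body voltage.

    Hypothetical first-order model (all parameter values are assumptions):
      dVth  = body_factor * delta_vb_v       (linear back-bias coefficient, V/V)
      I_sub ~ exp(-Vth / (n_slope * kT/q))   (exponential subthreshold law)
    """
    v_thermal = BOLTZMANN_EV_PER_K * temp_k
    delta_vth = body_factor * delta_vb_v      # threshold lowering of the OFF device
    return math.exp(delta_vth / (n_slope * v_thermal))


for dvb in (0.0, 0.1, 0.2):
    print(f"residual body voltage {dvb:.1f} V -> leakage x{leakage_ratio(dvb):.2f}")
```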



Figure 4.7: Transient behavior for BBG output high to low transition.



Figure 4.8: Transient behavior for BBG output low to high transition.

Three basic logic gates (i.e. NAND2, NOR2 and XOR2) were initially considered and laid out for the conventional, triple well and single well options of the 28 nm STM UTBB FD-SOI technology. For a fair comparison, all the logic cells were sized for similar leakage current. To correctly account for the impact of input and output capacitances on speed and energy, each simulated logic gate is driven and loaded by cells identical to itself. Moreover, the RVT CMOS and single well logic gates were physically designed as 12-track standard cells (height = 1.2 µm), whereas a larger cell height is needed to accommodate the specific technology rules for the triple well DTMOS and GLBB logic gates.

Table 4.1 summarizes the comparative simulation results. Triple well implementations clearly lead to large silicon area occupancy. However, triple well GLBB gates reduce area occupancy by 45% (NAND2) to 63% (XOR2) in comparison to the equivalent DTMOS solutions, while also being faster and less energy hungry. As expected, the single well implementation significantly reduces the area of both DTMOS and GLBB solutions (by about 77% on average), and also yields better delay and energy results than the triple well counterparts. For all the considered cells, the single well GLBB gates always show the best delay, with competitive silicon area occupancy and an energy consumption that is the lowest for NOR2 and XOR2 and within about 14% of the CMOS value for NAND2. More precisely, delay reductions ranging from 17% to 28% and from 5% to 20% are obtained in comparison to the CMOS and single well DTMOS circuits, respectively.

The GLBB technique optimized for 28 nm UTBB FD-SOI effectively exploits the single well flavor offered by the technology to significantly reduce area occupancy. In addition, the inherent performance-energy characteristics of the GLBB approach are emphasized by the higher efficiency of FBB in this technology.

#### 4.4 Final Remarks of Gate-level Body Biasing in UTBB FD-SOI

In this chapter the GLBB technique was evaluated in the ULV regime exploiting an advanced UTBB FD-SOI technology. The single well flavors offered by the technology significantly reduce the area penalty of low-granularity body-bias voltage control. Additionally, the higher efficiency of FBB in UTBB FD-SOI technologies emphasizes the inherent performance and energy characteristics of the GLBB approach. As a result, the GLBB technique was shown to be superior for ULV designs in advanced UTBB FD-SOI technology nodes. This was demonstrated by comparing GLBB logic gates with their conventional CMOS and DTMOS counterparts. The comparison performed on commonly used logic gates demonstrates that GLBB solutions achieve up to 33% delay reduction for similar energy in comparison to conventional CMOS. Moreover, the GLBB approach reduces energy consumption by up to 46% compared to DTMOS, while maintaining a higher operating speed.

### Table 4.1: Comparison results for basic logic gates

| Gate  | Design style        | Delay [ns] | Avg. energy for 1 MHz input signals [fJ] | Avg. leakage current [nA] | Height × Width [µm] |
|-------|---------------------|------------|------------------------------------------|---------------------------|---------------------|
| NAND2 | CMOS (RVT)          | 5.91       | 0.43                                     | 0.10                      | 1.20 × 1.10         |
|       | DTMOS (triple well) | 9.63       | 0.91                                     | 0.11                      | 4.70 × 2.55         |
|       | GLBB (triple well)  | 8.11       | 0.52                                     | 0.11                      | 4.70 × 1.40         |
|       | DTMOS (SNW)         | 5.90       | 0.51                                     | 0.11                      | 1.20 × 1.92         |
|       | GLBB (SNW+SPW)      | 4.92       | 0.49                                     | 0.11                      | 1.20 × 1.40         |
| NOR2  | CMOS (RVT)          | 9.59       | 0.69                                     | 0.10                      | 1.20 × 1.70         |
|       | DTMOS (triple well) | 9.52       | 1.19                                     | 0.10                      | 4.70 × 3.62         |
|       | GLBB (triple well)  | 9.37       | 0.72                                     | 0.10                      | 4.70 × 1.77         |
|       | DTMOS (SNW)         | 8.27       | 0.81                                     | 0.09                      | 1.20 × 2.98         |
|       | GLBB (SNW+SPW)      | 7.85       | 0.61                                     | 0.10                      | 1.20 × 1.77         |
| XOR2  | CMOS (RVT)          | 12.69      | 0.73                                     | 0.24                      | 1.20 × 2.85         |
|       | DTMOS (triple well) | 15.92      | 1.17                                     | 0.25                      | 4.70 × 7.63         |
|       | GLBB (triple well)  | 12.35      | 0.70                                     | 0.25                      | 4.70 × 2.83         |
|       | DTMOS (SNW)         | 11.45      | 0.94                                     | 0.25                      | 1.20 × 6.27         |
|       | GLBB (SNW+SPW)      | 9.17       | 0.68                                     | 0.25                      | 1.20 × 2.83         |

(TT process corner,  $V_{DD}$ =0.3V and T=27°C)
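The aggregate figures quoted in the discussion of Table 4.1 can be recomputed from its delay and cell-size columns. The Python sketch below transcribes those columns and derives the delay reductions of the single well GLBB gates (17%-28% versus CMOS, 5%-20% versus single well DTMOS) and the average single well area saving (about 77%); cell area is taken as height times width, and the dictionary keys are illustrative names.

```python
# (delay [ns], cell height [um], cell width [um]) transcribed from Table 4.1
TABLE_4_1 = {
    "NAND2": {"CMOS": (5.91, 1.20, 1.10), "DTMOS_TW": (9.63, 4.70, 2.55),
              "GLBB_TW": (8.11, 4.70, 1.40), "DTMOS_SW": (5.90, 1.20, 1.92),
              "GLBB_SW": (4.92, 1.20, 1.40)},
    "NOR2": {"CMOS": (9.59, 1.20, 1.70), "DTMOS_TW": (9.52, 4.70, 3.62),
             "GLBB_TW": (9.37, 4.70, 1.77), "DTMOS_SW": (8.27, 1.20, 2.98),
             "GLBB_SW": (7.85, 1.20, 1.77)},
    "XOR2": {"CMOS": (12.69, 1.20, 2.85), "DTMOS_TW": (15.92, 4.70, 7.63),
             "GLBB_TW": (12.35, 4.70, 2.83), "DTMOS_SW": (11.45, 1.20, 6.27),
             "GLBB_SW": (9.17, 1.20, 2.83)},
}


def area(entry):
    """Cell area [um^2] as height times width."""
    _, height, width = entry
    return height * width


def delay(entry):
    return entry[0]


delay_vs_cmos, delay_vs_dtmos_sw, sw_area_savings = [], [], []
for cells in TABLE_4_1.values():
    delay_vs_cmos.append(1 - delay(cells["GLBB_SW"]) / delay(cells["CMOS"]))
    delay_vs_dtmos_sw.append(1 - delay(cells["GLBB_SW"]) / delay(cells["DTMOS_SW"]))
    for style in ("DTMOS", "GLBB"):
        # area saved by moving the same gate from the triple well to the single well layout
        sw_area_savings.append(1 - area(cells[style + "_SW"]) / area(cells[style + "_TW"]))

print(f"GLBB (SW) delay reduction vs CMOS:     {min(delay_vs_cmos):.0%} to {max(delay_vs_cmos):.0%}")
print(f"GLBB (SW) delay reduction vs DTMOS SW: {min(delay_vs_dtmos_sw):.0%} to {max(delay_vs_dtmos_sw):.0%}")
print(f"Average single well area saving:       {sum(sw_area_savings) / len(sw_area_savings):.0%}")
```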


## 5 Case studies: Application of the GLBB Technique to Arithmetic circuits

In this chapter we thoroughly evaluate the efficiency of the GLBB technique for ULV design in UTBB FD-SOI by considering three arithmetic benchmarks in ascending order of complexity. The circuits synergistically benefit from low-granularity back-bias control, which improves performance, and from the integration of both NMOS and PMOS devices into a common well configuration, which allows highly efficient area utilization. The designs were compared against standard CMOS and DTMOS solutions.

#### 5.1 Mirror Full Adder

As a first benchmark, the GLBB mirror full adder (FA) presented in [50] was designed and characterized post-layout in comparison to the corresponding CMOS and DTMOS implementations.

The compared FA circuits were sized to obtain similar leakage current at V<sub>DD</sub> = 0.3 V and T = 27 °C. For this purpose, the pull-up/pull-down channel-width ratio of the analyzed solutions was chosen to obtain comparable strength, while imposing equal widths for series-connected transistors. Table 5.1 presents the width ratio between pull-up and pull-down networks for different stacking configurations. The sizing factor W was chosen by iterative simulations to achieve the above-mentioned optimization goal.

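| Tr. stack | CMOS (W = 240 nm): W<sub>p</sub> / W<sub>n</sub> | DTMOS (W = 380 nm): W<sub>p</sub> / W<sub>n</sub> | GLBB (W = 220 nm): W<sub>p</sub> / W<sub>n</sub> |
|-----------|--------------------------------------------------|---------------------------------------------------|--------------------------------------------------|
| 1         | 4 W / W                                          | 4 W / W                                           | 4.3 W / W                                        |
| 2         | 11 W / 2.5 W                                     | 9 W / 2 W                                         | 10 W / 2 W                                       |
| 3         | 16.5 W / 4 W                                     | 13.5 W / 3.3 W                                    | 15 W / 3.3 W                                     |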

Table 5.1: Pull-up / pull-down width ratio of transistor stacks
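Because each style uses a different unit width W, the ratios in Table 5.1 are easier to compare once converted to absolute device widths. The Python sketch below does this conversion from the table entries only; it shows that the GLBB devices are the narrowest of the three styles in every stack configuration, consistent with the reduced transistor sizes invoked in the energy discussion.

```python
UNIT_W_NM = {"CMOS": 240, "DTMOS": 380, "GLBB": 220}

# (Wp, Wn) expressed as multiples of the unit width W, transcribed from Table 5.1
RATIOS = {
    "CMOS":  {1: (4.0, 1.0), 2: (11.0, 2.5), 3: (16.5, 4.0)},
    "DTMOS": {1: (4.0, 1.0), 2: (9.0, 2.0),  3: (13.5, 3.3)},
    "GLBB":  {1: (4.3, 1.0), 2: (10.0, 2.0), 3: (15.0, 3.3)},
}


def widths_nm(style, stack):
    """Absolute pull-up (Wp) and pull-down (Wn) device widths in nm."""
    wp_mult, wn_mult = RATIOS[style][stack]
    unit = UNIT_W_NM[style]
    return wp_mult * unit, wn_mult * unit


for stack in (1, 2, 3):
    parts = []
    for style in RATIOS:
        wp, wn = widths_nm(style, stack)
        parts.append(f"{style} {wp:.0f}/{wn:.0f}")
    print(f"stack {stack} (Wp/Wn in nm): " + ", ".join(parts))
```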

Layouts of the FA circuits are shown in Figure 5.1 (a)-(c) for the CMOS, DTMOS and GLBB implementations, respectively. It is worth noting that the sizing strategy and the adopted layout technique allow the area occupancy of the GLBB circuit to be only slightly increased (+11%) with respect to that of the conventional CMOS design. On the other hand, the area saving in comparison to the DTMOS FA is about 35%.



Figure 5.1: FA layouts for CMOS (a), DTMOS (b) and GLBB (c), respectively.

Figure 5.2 shows the leakage current ($I_{leak}$) versus $V_{DD}$ for the compared circuit topologies. Due to the adopted sizing criterion, all the circuits have similar $I_{leak}$ at $V_{DD}$ = 0.3 V (about 0.7 nA). As expected, the DTMOS and conventional CMOS designs show very similar leakage trends as $V_{DD}$ varies, whereas the GLBB circuit appears to be more sensitive to changes in $V_{DD}$. As $V_{DD}$ drops below 0.3 V, the reduced sizes of the GLBB circuit begin to show their benefits in terms of $I_{leak}$.



Figure 5.2: Leakage comparison for FA designs.

Figure 5.3 illustrates comparative post-layout delay results, obtained with the simulation setup discussed in [50]. The inset of Figure 5.3 shows that the GLBB technique reaches its maximum performance advantage at $V_{DD}$ = 0.35 V. However, speed improvements are obtained over the whole considered V<sub>DD</sub> range.



Figure 5.3: Delay comparison for FA designs.

Figure 5.4 (a)-(b) compares the average energy per operation ($E_{OP}$) under two different running scenarios. In the first operating condition (Figure 5.4a), an activity factor ($\alpha$) of 0.4 and a clock cycle time ($T_{clk}$) of 40 FO4 (FO4 being the delay of a CMOS inverter driving four identical inverters) were considered. In this scenario, which is typical of ultra-energy-efficient microprocessor cores [78], the GLBB FA always shows the lowest energy consumption, mainly due to its reduced transistor sizes (see Table 5.1). Figure 5.4b illustrates a second scenario with $\alpha$ = 0.1 and $T_{clk}$ = 100 FO4, which is more typical of low power VLSI circuits [33]. Under such running conditions the impact of leakage energy increases considerably, as is evident from the minimum energy point (MEP), which moves towards higher $V_{DD}$ values [33]. However, even in this unfavorable scenario the GLBB circuit closely tracks the energy consumption of a conventional CMOS design (and even improves on it at lower V<sub>DD</sub> values).
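The MEP shift described above can be reproduced qualitatively with a deliberately simplified subthreshold model: exponential I-V characteristics, no DIBL, no body biasing, no short-circuit energy, and all parameters normalized and hypothetical. Under these assumptions the leakage-to-switching energy ratio scales as $(N_{FO4}/\alpha)\exp(-V_{DD}/(n\,kT/q))$, so a lower $\alpha$ and a longer $T_{clk}$ push the minimum towards higher $V_{DD}$. The Python sketch below sweeps $V_{DD}$ for the two scenarios of Figure 5.4; it is not the simulation setup used in this chapter.

```python
import math

N_VT = 1.3 * 0.0259  # subthreshold slope factor times kT/q at ~300 K [V] (assumed)
VTH = 0.35           # threshold voltage [V] (assumed; it cancels out in this toy model)


def energy_per_op(vdd, alpha, n_fo4, c_eff=1.0, k_delay=1.0, i0=1.0):
    """Normalized E_OP of a toy subthreshold circuit (no DIBL, no body bias)."""
    i_on = i0 * math.exp((vdd - VTH) / N_VT)   # subthreshold ON current
    i_leak = i0 * math.exp(-VTH / N_VT)        # OFF current at V_GS = 0
    t_fo4 = k_delay * c_eff * vdd / i_on       # gate delay ~ C * V / I
    t_clk = n_fo4 * t_fo4                      # clock period expressed in FO4 units
    e_dyn = alpha * c_eff * vdd ** 2           # switching energy
    e_leak = i_leak * vdd * t_clk              # leakage energy over one clock period
    return e_dyn + e_leak


def minimum_energy_point(alpha, n_fo4, v_lo=0.10, v_hi=0.60, step=0.005):
    """V_DD that minimizes energy_per_op on a uniform grid."""
    n_steps = int(round((v_hi - v_lo) / step))
    grid = [v_lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda v: energy_per_op(v, alpha, n_fo4))


for alpha, n_fo4 in ((0.4, 40), (0.1, 100)):
    mep = minimum_energy_point(alpha, n_fo4)
    print(f"alpha = {alpha}, Tclk = {n_fo4} FO4 -> MEP at about {mep:.3f} V")
```

With these made-up parameters the minimum moves from about 0.17 V to about 0.27 V when going from the first to the second scenario; only the direction of the shift is meaningful, not the absolute values.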



Figure 5.4: Energy comparison for FA designs.

#### 5.2 Ripple Carry Adder

The analysis of the GLBB technique in UTBB FD-SOI has been extended to 8/16/32-bit ripple carry adders (RCAs). Table 5.2 shows the comparison results for the evaluated techniques. The reported data refer to the TT process corner, $T = 27^{\circ}C$ and $V_{DD} = 0.3$ V. As before, due to the chosen optimization, the different implementations show similar static power consumption. The GLBB circuits reduce delay by about 23% with respect to the corresponding CMOS designs, while maintaining similar energy consumption for the worst case operation and increasing the occupied area by only 11%. Energy and area occupancy are significantly improved in comparison to the equivalent DTMOS designs.

Figure 5.5 (a-c) compares energy-performance results for n-bit RCAs designed according to the evaluated techniques under a wide range of process and temperature (PT) conditions. For the TT/27 °C condition, the DTMOS technique shows higher energy consumption, mainly due to the larger input capacitances of DTMOS gates. Conversely, GLBB and CMOS designs exhibit very similar $E_{W.C.O.}$ values, even for long chains of FAs. The GLBB designs always demonstrate better performance than their competitors. For example, at $V_{DD}$ = 0.4 V, advantages of 33% in speed and 46% in energy are achieved when compared to the CMOS and DTMOS designs, respectively. As shown in Figure 5.5b, the speed advantages of GLBB RCAs are maintained for the slower corner (i.e. SS and T = 0 °C). For the 32-bit RCA evaluated at $V_{DD}$ = 0.5 V, the GLBB approach is 38% faster than CMOS with similar energy levels, whereas it consumes 51% less energy than the DTMOS solution. Furthermore, the FF/100 °C PT corner was also analyzed to emphasize high leakage conditions; the results are shown in Figure 5.5c. Again, the 32-bit GLBB RCA is 13% faster than the equivalent CMOS solution and 24% less energy hungry than the DTMOS implementation. Finally, the impact of process variability was investigated by performing Monte Carlo (MC) simulations on 1000 samples for V<sub>DD</sub> = 0.3 V, TT process corner and T = 27 °C. As shown in Figure 5.6, in all the experiments the GLBB designs consume less energy than DTMOS. The 32-bit GLBB design is 24% faster than CMOS and achieves similar variance. In addition, the GLBB RCA is 32% less energy hungry than the equivalent DTMOS implementation.

| Bit length | Design style | Delay [ns] | Energy (w.c.o.) [fJ] | Leakage power [nW] | Area [µm²] |
|------------|--------------|------------|----------------------|--------------------|------------|
| 8          | CMOS         | 307        | 0.29                 | 1.87               | 94.08      |
|            | DTMOS        | 247        | 0.4                  | 1.78               | 161.28     |
|            | GLBB         | 237        | 0.3                  | 1.87               | 104.64     |
| 16         | CMOS         | 584        | 76.1                 | 3.74               | 188.16     |
|            | DTMOS        | 465        | 124.3                | 3.55               | 322.56     |
|            | GLBB         | 448        | 74.9                 | 3.74               | 209.28     |
| 32         | CMOS         | 1177       | 207.8                | 7.49               | 376.32     |
|            | DTMOS        | 933        | 307.2                | 7.10               | 645.12     |
|            | GLBB         | 898        | 207.4                | 7.49               | 418.56     |

Table 5.2: Ripple Carry Adder (@ TT, 0.3 V, 27 °C)
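The delay and area claims for the RCAs can be checked directly against the delay and area columns of Table 5.2. The Python sketch below transcribes those two columns and derives, for each bit length, the GLBB delay reduction and area overhead with respect to CMOS (about 23% and 11%), the area saving with respect to DTMOS (about 35%), and the GLBB delay per bit, which stays roughly constant as expected for a ripple carry chain; the function and key names are illustrative.

```python
# (delay [ns], area [um^2]) transcribed from Table 5.2 (TT, 0.3 V, 27 °C)
TABLE_5_2 = {
    8:  {"CMOS": (307, 94.08),   "DTMOS": (247, 161.28), "GLBB": (237, 104.64)},
    16: {"CMOS": (584, 188.16),  "DTMOS": (465, 322.56), "GLBB": (448, 209.28)},
    32: {"CMOS": (1177, 376.32), "DTMOS": (933, 645.12), "GLBB": (898, 418.56)},
}


def glbb_metrics(bits):
    """Relative delay/area figures of the GLBB RCA for a given bit length."""
    d_cmos, a_cmos = TABLE_5_2[bits]["CMOS"]
    d_dtmos, a_dtmos = TABLE_5_2[bits]["DTMOS"]
    d_glbb, a_glbb = TABLE_5_2[bits]["GLBB"]
    return {
        "delay_red_vs_cmos": 1 - d_glbb / d_cmos,
        "delay_red_vs_dtmos": 1 - d_glbb / d_dtmos,
        "area_overhead_vs_cmos": a_glbb / a_cmos - 1,
        "area_saving_vs_dtmos": 1 - a_glbb / a_dtmos,
        "delay_per_bit_ns": d_glbb / bits,
    }


for bits in TABLE_5_2:
    m = glbb_metrics(bits)
    print(f"{bits:2d}-bit: delay -{m['delay_red_vs_cmos']:.1%} vs CMOS, "
          f"-{m['delay_red_vs_dtmos']:.1%} vs DTMOS; "
          f"area +{m['area_overhead_vs_cmos']:.1%} vs CMOS, "
          f"-{m['area_saving_vs_dtmos']:.1%} vs DTMOS; "
          f"{m['delay_per_bit_ns']:.1f} ns/bit")
```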



Figure 5.5: Energy-delay comparison for 8-, 16- and 32-bit RCAs: a) TT, 27 °C; b) SS, 0 °C; c) FF, 100 °C.



Figure 5.6: MC results @ 0.3 V, 27° C for a) 32-, b) 16- and c) 8-bit RCAs.

#### 5.3 Baugh Wooley Multiplier

As a third, more complex benchmark, the 4x4-bit Baugh Wooley multiplier shown in Figure 5.7 was laid out and evaluated in comparison with equivalent CMOS and DTMOS designs.



Figure 5.7: 4x4-bit Baugh Wooley multiplier
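For readers unfamiliar with this benchmark: the Baugh Wooley scheme multiplies two's complement operands using AND partial products, NAND partial products for the terms that pair exactly one sign bit with a magnitude bit, and two constant ones, so no sign extension is needed and the array stays regular. The Python sketch below is a bit-level functional model of the textbook modified Baugh-Wooley formulation, checked exhaustively for 4-bit operands; it says nothing about the circuit of Figure 5.7, whose cell arrangement may differ.

```python
def to_bits(x, n):
    """Two's complement bits of x (LSB first), assuming -2**(n-1) <= x < 2**(n-1)."""
    return [(x >> i) & 1 for i in range(n)]


def baugh_wooley(a, b, n=4):
    """Signed n x n product built from modified Baugh-Wooley partial products."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    acc = 0
    for i in range(n):
        for j in range(n):
            pp = a_bits[i] & b_bits[j]          # AND partial product
            if (i == n - 1) != (j == n - 1):    # exactly one sign bit involved -> NAND
                pp ^= 1
            acc += pp << (i + j)
    acc += (1 << n) + (1 << (2 * n - 1))        # the two constant correction ones
    acc &= (1 << (2 * n)) - 1                   # keep the 2n-bit result, drop the carry-out
    # reinterpret the 2n-bit word as a two's complement number
    return acc - (1 << (2 * n)) if acc >> (2 * n - 1) else acc


if __name__ == "__main__":
    # exhaustive check against Python's signed multiplication for 4-bit operands
    assert all(baugh_wooley(a, b) == a * b for a in range(-8, 8) for b in range(-8, 8))
    print(baugh_wooley(-8, 7))  # -56
```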

The main characteristics of the compared multiplier circuits are provided in Table 5.3 for V<sub>DD</sub> = 0.3 V. The GLBB multiplier reduces delay by about 29% and 7% with respect to its CMOS and DTMOS counterparts, respectively. Moreover, the GLBB implementation exhibits an energy consumption for the worst case delay operation (E<sub>w.c.o.</sub>) very similar to that of the conventional CMOS solution and about 40% lower than that of the DTMOS multiplier. These results were achieved at the expense of only 13% more area with respect to the CMOS design, whereas about 34% area is saved in comparison to the DTMOS circuit.

| Design | Delay [ns] | Energy (worst case operation) [fJ] | Area [µm²] |
|--------|------------|------------------------------------|------------|
| CMOS   | 403        | 15.1                               | 408.6      |
| DTMOS  | 306        | 25.2                               | 703.9      |
| GLBB   | 285        | 15.3                               | 461.7      |

Table 5.3: 4x4-bit Baugh Wooley multiplier characteristics

Figure 5.8 (a)-(c) plots E<sub>w.c.o.</sub> and maximum frequency versus V<sub>DD</sub>, considering three different process-temperature (PT) corners. The typical PT corner involves typical N/PMOS transistors and an operating temperature of 27 °C. To cover a wide range of possible operating conditions, the second and third PT corners involve slow N/PMOS at T = 0 °C and fast N/PMOS at T = 100 °C, respectively. In all the simulated conditions, the GLBB approach ensures the highest operating frequency while showing very competitive E<sub>w.c.o.</sub> results.



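Figure 5.8: E<sub>w.c.o.</sub> and maximum frequency comparison results for the considered PT corners, including FF, 100 °C (c)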

ULV circuits are usually very sensitive to random process variability [33]. For this reason, the tolerance to intra-die variations was analyzed for different PT corners. Figure 5.9 (a)–(c) illustrates the E<sub>w.c.o.</sub> versus delay spreads obtained from a 1000-point Monte Carlo simulation performed at V<sub>DD</sub> = 0.3 V. Mean ($\mu$) and standard deviation ($\sigma$) values are also reported in Figure 5.9. As expected, the GLBB approach leads to the best mean delay values for all the evaluated PT corners, while it exhibits larger delay variability (evaluated in terms of $\sigma/\mu$), mainly due to the use of smaller devices. The mean energy values of the GLBB multiplier are very close to those of the equivalent CMOS design, with reduced energy variability in almost all the cases.

The impact of process variability on FD-SOI CMOS circuits at low voltage can be effectively mitigated by exploiting back biasing. Usually, this approach requires additional circuitry to adjust the body bias voltages of the pMOS and nMOS devices to their optimum values [59]. To limit the overall design complexity, the proposed technique avoids the use of a centralized body bias generator. As a drawback, the options available to counteract variations are restricted to the generation of positive body voltages only (i.e. FBB), whose values could be modulated by varying the supply voltage of the BBG sub-circuits.



Figure 5.9: Energy-Delay Monte Carlo Comparison Results for $V_{DD}$ = 0.3 V: SS, 0°C (a); TT, 27°C (b) and FF, 100°C (c)

#### 5.4 Final Remarks on the Gate-Level Body Biasing Technique Applied to Arithmetic Circuits

In this chapter, several benchmarks, namely a mirror FA, an 8-bit RCA, and a 4×4 Baugh-Wooley multiplier with embedded dynamic forward back-biasing capability, were evaluated in depth in the 28 nm STM UTBB FD-SOI technology. The proposed designs benefit from the energy and delay characteristics offered by the GLBB technique and exploit the single-well flavors available in the 28 nm STM UTBB FD-SOI technology. As a consequence, the area penalty for body isolation at gate level is significantly reduced with respect to a conventional triple-well implementation. Additionally, post-layout simulations demonstrate that the GLBB technique reduces energy consumption and boosts performance compared to DTMOS and CMOS. For example, the RCA designed as proposed here reduces energy consumption by up to 57% compared to an equivalent DTMOS design and boosts performance by more than 30% compared to the standard CMOS solution, while maintaining similar energy consumption. Furthermore, Monte Carlo simulations show better delay variability than the conventional CMOS design.
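The percentage figures quoted above are relative changes with respect to a reference design. The exact formulas are not restated here, so the Python sketch below assumes that energy savings are computed as $(E_{ref}-E)/E_{ref}$ and performance gains as the relative increase in maximum frequency; the helper names and input values are hypothetical and purely illustrative. It also shows why the chosen metric matters: since frequency is the reciprocal of delay, a delay reduction and a frequency increase of the same percentage are not equivalent.

```python
def relative_reduction(reference, value):
    """Fractional reduction of `value` with respect to `reference` (e.g. energy)."""
    return (reference - value) / reference


def relative_increase(reference, value):
    """Fractional increase of `value` with respect to `reference` (e.g. f_max)."""
    return (value - reference) / reference


def freq_gain_from_delay_cut(delay_cut):
    """Frequency increase implied by a fractional delay reduction (f = 1/delay)."""
    return 1.0 / (1.0 - delay_cut) - 1.0


# Illustrative numbers only (arbitrary units), not simulation results.
print(f"energy saving: {relative_reduction(100.0, 43.0):.0%}")  # 57%
print(f"frequency gain: {relative_increase(1.0, 1.3):.0%}")  # 30%
print(f"33% delay cut -> {freq_gain_from_delay_cut(0.33):.0%} frequency gain")  # 49%
```

Under these assumptions, a 33% delay reduction corresponds to roughly a 49% increase in maximum frequency.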

# Six

#### 6 Conclusions

This PhD thesis has presented a detailed analysis of the gate-level body biasing (GLBB) technique for designing high-speed subthreshold logic circuits. First, an analytical model of the technique has been developed, which serves as a basis for the main design guidelines, taking into account several logic gates with stacked transistors. The proposed analysis has been validated by comparing the predicted values with Cadence Spectre simulation results. The good agreement between the predicted and simulated results confirms that the proposed analysis is a very useful aid for designing high-speed subthreshold logic gates.

Furthermore, the parasitic effects of gate-level body polarization have been taken into account for more complex circuits, such as a GLBB mirror full adder. Post-layout simulation results have shown that, at equal leakage power consumption, the GLBB design technique achieves significantly higher performance with reduced total energy per operation compared to conventional CMOS and DTMOS implementations.

Additionally, the GLBB FA maintains a high level of robustness against temperature and process variations. The silicon area required by the GLBB full adder is half that of the equivalent DTMOS implementation, but higher than that of the conventional CMOS design.

To reduce the area penalty of unconventionally body-biased circuits, the GLBB technique has been applied in an advanced UTBB FD-SOI technology. The single-well flavors offered by this technology significantly reduce the area penalty of low-granularity body-biasing voltage control. Additionally, the higher efficiency of FBB techniques in UTBB FD-SOI technologies emphasizes the inherent performance and energy characteristics of the GLBB approach.

As a result, the GLBB technique was shown to be superior for ULV designs in advanced UTBB FD-SOI technology nodes. This was demonstrated by comparing GLBB logic gates with their conventional CMOS and DTMOS counterparts. The comparative analysis performed on commonly used logic gates demonstrates that GLBB solutions achieve up to 33% delay reduction at similar energy compared to conventional CMOS. Moreover, the GLBB approach reduces energy consumption by up to 46% compared to DTMOS, while maintaining a higher operating speed.

Several common arithmetic designs have been analyzed as case studies: (1) a mirror FA, (2) 8/16/32-bit RCAs, and (3) a 4×4 Baugh-Wooley multiplier with embedded dynamic forward back-biasing capability in the 28 nm STM UTBB FD-SOI technology. Post-layout simulations demonstrated that the GLBB technique reduces energy consumption and boosts performance compared to DTMOS and CMOS. In general, the GLBB designs proposed here reduce energy consumption by up to 57% compared to equivalent DTMOS designs and boost performance by more than 30% compared to standard CMOS solutions, while maintaining similar energy consumption. Furthermore, Monte Carlo simulations show better delay variability than the conventional CMOS design.

### Bibliography

- [1] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, *Sub-threshold Design for Ultra Low-Power Systems*. New York: Springer, 2006.
- [2] ISSCC, "ISSCC (International Solid State Circuit Conference) 2014 Trends," *IEEE ISSCC*, pp. 89–128, 2013.
- [3] "www.intel.com." [Online]. Available: www.intel.com.
- [4] S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, "Full-chip subthreshold leakage power prediction and reduction techniques for sub-0.18-µm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 3, pp. 501–510, 2004.
- [5] R. Allmon, B. Benschneider, M. Callander, L. Chao, D. Dever, J. Farrell, N. Fitzgerald, J. Grodstein, S. Hassoun, L. Hudepohl, D. Kravitz, J. Lundberg, R. Marcello, S. Marino, J. Pickholtz, R. Preston, M. Richesson, S. Samudrala, and D. Sanders, "System, process, and design implications of a reduced supply voltage microprocessor," in *1990 37th IEEE International Conference on Solid-State Circuits*, 1990, pp. 48–49.
- [6] A. Chandrakasan, A. Burstein, and R. Brodersen, "A low power chipset for portable multimedia applications," *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Pap.*, vol. 37, pp. 82–83, 1994.
- [7] J. B. Burr and J. Shott, "A 200 mV self-testing encoder/decoder using Stanford ultra-low-power CMOS," in *Proceedings of IEEE International Solid-State Circuits Conference - ISSCC '94*, pp. 84–85.
- [8] S. Mutoh, S. Shigematsu, Y. Matsuya, H. Fukuda, and J. Yamada, "A 1 V multi-threshold voltage CMOS DSP with an efficient power management technique for mobile phone application," in 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC, pp. 168–169.

- [9] Wai Lee, P. E. Landman, B. Barton, S. Abiko, H. Takahashi, H. Mizuno, S. Muramatsu, K. Tashiro, M. Fusumada, Luat Pham, F. Boutaud, E. Ego, G. Gallo, Hiep Tran, C. Lemonds, A. Shih, M. Nandakumar, R. H. Eklund, and Ih-Chin Chen, "A 1-V programmable DSP for wireless communications [CMOS]," *IEEE J. Solid-State Circuits*, vol. 32, no. 11, pp. 1766–1776, 1997.
- [10] B. H. Calhoun and A. P. Chandrakasan, "Standby power reduction using dynamic voltage scaling and canary flip-flop structures," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1504–1511, 2004.
- [11] A. Kaizerman, S. Fisher, and A. Fish, "Subthreshold dual mode logic," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 21, no. 5, pp. 979–983, 2013.
- [12] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw, "Exploring variability and performance in a sub-200-mV processor," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 881–890, 2008.
- [13] A. Wang and A. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 310–319, Jan. 2005.
- [14] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, D. Blaauw, and T. Austin, "A 2.60pJ/Inst Subthreshold Sensor Processor for Optimal Energy Efficiency," in 2006 Symposium on VLSI Circuits, 2006. Digest of Technical Papers., pp. 154–155.

- [15] S. Hanson, M. Seok, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "A Low-Voltage Processor for Sensing Applications With Picowatt Standby Mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1145–1155, Apr. 2009.
- [16] K.-S. Chong, B.-H. Gwee, and J. S. Chang, "A Low Energy FFT/IFFT Processor for Hearing Aids," in 2007 IEEE International Symposium on Circuits and Systems, 2007, pp. 1169–1172.
- [17] D. Albano, F. Crupi, F. Cucchi, and G. Iannaccone, "A picopower temperature-compensated, subthreshold CMOS voltage reference," *Int. J. Circuit Theory Appl.*, vol. 42, no. 12, pp. 1306–1318, Dec. 2014.
- [18] L. Magnelli, F. A. Amoroso, F. Crupi, G. Cappuccino, and G. Iannaccone, "Design of a 75-nW, 0.5-V subthreshold complementary metal-oxide-semiconductor operational amplifier," *Int. J. Circuit Theory Appl.*, vol. 42, no. 9, pp. 967–977, Sep. 2014.
- [19] H. Soeleman, K. Roy, and B. C. Paul, "Robust subthreshold logic for ultra-low power operation," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 9, no. 1, pp. 90–99, 2001.
- [20] M. Lanuzza, P. Corsonello, and S. Perri, "Low-Power Level Shifter for Multi-Supply Voltage Designs," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 59, no. 12, pp. 922–926, Dec. 2012.
- [21] P. Corsonello, F. Frustaci, M. Lanuzza, and S. Perri, "Over/Undershooting Effects in Accurate Buffer Delay Model for Sub-Threshold Domain," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 61, no. 5, pp. 1456–1464, May 2014.
- [22] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," *Proc. IEEE*, vol. 98, no. 2, pp. 253–266, Feb. 2010.

- [23] J. B. Burr, "Cryogenic ultra low power CMOS," in 1995 IEEE Symposium on Low Power Electronics. Digest of Technical Papers, pp. 82–83.
- [24] A. Bryant, J. Brown, P. Cottrell, M. Ketchen, J. Ellis-Monaghan, and E. J. Nowak, "Low-power CMOS at Vdd=4kT/q," in *Device Research Conference. Conference Digest (Cat. No.01TH8561)*, pp. 22–23.
- [25] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd ed. New York: McGraw-Hill, 1999.
- [26] S. G. Narendra and A. P. Chandrakasan, *Leakage in Nanometer CMOS Technologies*, 1st ed. New York: Springer US, 2006.
- [27] M. Alioto, "Understanding DC behavior of subthreshold CMOS logic through closed-form analysis," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 57, no. 7, pp. 1597–1607, 2010.
- [28] J. Kwong and A. P. Chandrakasan, "Variation-Driven Device Sizing for Minimum Energy Sub-threshold Circuits," *ISLPED'06 Proc. 2006 Int. Symp. Low Power Electron. Des.*, no. 2, pp. 8–13, 2006.
- [29] T. Gemmeke, M. Ashouei, B. Liu, M. Meixner, T. G. Noll, and H. De Groot, "Cell libraries for robust low-voltage operation in nanometer technologies," *Solid. State. Electron.*, vol. 84, pp. 132–141, 2013.
- [30] A. C. Smith, "A Sub-threshold Cell Library and Methodology," *Electr. Eng.*, 2006.
- [31] G. de Streel and D. Bol, "Study of Back Biasing Schemes for ULV Logic from the Gate Level to the IP Level," *J. Low Power Electron. Appl.*, vol. 4, no. 3, pp. 168–187, 2014.
- [32] M. Orshansky, S. R. Nassif, and D. Boning, *Design for Manufacturability and Statistical Design*. Boston, MA: Springer US, 2008.

- [33] M. Alioto, "Ultra-low power VLSI circuit design demystified and explained: A tutorial," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 59, no. 1, pp. 3–29, 2012.
- [34] J. Burr and A. Peterson, "Ultra Low Power CMOS Technology," in *3rd NASA Symposium on VLSI Design 1991*, 1991.
- [35] S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, and K. K. Das, "Ultralow-voltage, minimum-energy CMOS," *IBM J. Res. Dev.*, vol. 50, no. 4, pp. 469–490, 2006.
- [36] B. H. Calhoun and A. Chandrakasan, "Characterizing and Modeling Minimum Energy Operation for Subthreshold Circuits," pp. 1–6, 2004.
- [37] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 9, pp. 1778–1785, 2005.
- [38] M. Meijer and J. Pineda de Gyvez, "Body-Bias-Driven Design Strategy for Area- and Performance-Efficient CMOS Circuits," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 20, no. 1, pp. 42–51, Jan. 2012.
- [39] R. Faraji, H. R. Naji, M. Rahimi-Nezhad, and M. Arabnejhad, "New SRAM design using body bias technique for low-power and high-speed applications," *Int. J. Circuit Theory Appl.*, vol. 42, no. 11, pp. 1189–1202, Nov. 2014.
- [40] G. De Streel and D. Bol, "Impact of back gate biasing schemes on energy and robustness of ULV logic in 28nm UTBB FDSOI technology," *Proc. Int. Symp. Low Power Electron. Des.*, pp. 255–260, 2013.

- [41] Y. Cheng and C. Hu, *MOSFET Modeling & BSIM3 User's Guide*. Boston: Kluwer Academic Publishers, 2002.
- [42] M. R. Kakoee and L. Benini, "Fine-Grained Power and Body-Bias Control for Near-Threshold Deep Sub-Micron CMOS Circuits," *IEEE Trans. Emerg. Sel. Top. Circuits Syst.*, vol. 1, no. 2, pp. 131–140, 2011.
- [43] A. Hokazono, S. Balasubramanian, K. Ishimaru, H. Ishiuchi, Tsu-Jae King Liu, and Chenming Hu, "MOSFET design for forward body biasing scheme," *IEEE Electron Device Lett.*, vol. 27, no. 5, pp. 387–389, May 2006.
- [44] W. Zhao, Y. Ha, and M. Alioto, "Novel Self-Body-Biasing and Statistical Design for Near-Threshold Circuits With Ultra Energy-Efficient AES as Case Study," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 23, no. 8, pp. 1390–1401, Aug. 2015.
- [45] C. Wann, F. Assaderaghi, R. Dennard, C. H. C. Hu, G. Shahidi, and Y. Taur, "Channel profile optimization and device design for low-power high-performance dynamic-threshold MOSFET," *Int. Electron Devices Meet. Tech. Dig.*, vol. 0, no. 2, pp. 113–116, 1996.
- [46] G. O. Workman and J. G. Fossum, "A comparative analysis of the dynamic behavior of BTG-SOI MOSFETs," *IEEE Trans. Electron Devices*, vol. 45, no. 10, pp. 2138–2145, 1998.
- [47] P. Corsonello, M. Lanuzza, and S. Perri, "Gate-level body biasing technique for high-speed sub-threshold CMOS logic gates," *Int. J. Circuit Theory Appl.*, vol. 42, no. 1, pp. 65–70, Jan. 2014.
- [48] D. Albano, M. Lanuzza, R. Taco, and F. Crupi, "Gate-level body biasing for subthreshold logic circuits: analytical modeling and design guidelines," *Int. J. Circuit Theory Appl.*, vol. 43, no. 11, pp. 1523–1540, Nov. 2015.

- [49] R. Taco, M. Lanuzza, and D. Albano, "Ultra-Low-Voltage Self-Body Biasing Scheme and Its Application to Basic Arithmetic Circuits," *VLSI Des.*, vol. 2015, pp. 1–10, 2015.
- [50] M. Lanuzza, R. Taco, and D. Albano, "Dynamic gate-level body biasing for subthreshold digital design," in 2014 IEEE 5th Latin American Symposium on Circuits and Systems, 2014, pp. 1–4.
- [51] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage logic circuits exploiting gate level dynamic body biasing in 28nm UTBB FD-SOI," *Solid. State. Electron.*, vol. 117, pp. 185–192, Mar. 2016.
- [52] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage ripple carry adder with low-granularity dynamic forward back-biasing in 28 nm UTBB FD-SOI," in 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2015, pp. 1–2.
- [53] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Extended exploration of low granularity back biasing control in 28nm UTBB FD-SOI technology," in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2016, vol. 2016–July, pp. 41–44.
- [54] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat, "Interests and Limitations of Technology Scaling for Subthreshold Logic," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 17, no. 10, pp. 1508–1519, Oct. 2009.
- [55] J. Rabaey, A. P. Chandrakasan, and B. Nikolić, *Digital Integrated Circuits*, 2nd ed. Prentice Hall, 2003.
- [56] J. Keane, H. Eom, T. H. Kim, S. Sapatnekar, and C. Kim, "Stack sizing for optimal current drivability in subthreshold circuits," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 16, no. 5, pp. 598–602, 2008.

- [57] M.-E. Hwang and K. Roy, "A 135mV 0.13μW process tolerant 6T subthreshold DTMOS SRAM in 90nm technology," in 2008 IEEE Custom Integrated Circuits Conference, 2008, vol. 2, no. CICC, pp. 419–422.
- [58] H. Mostafa, M. Anis, and M. Elmasry, "A Novel Low Area Overhead Direct Adaptive Body Bias (D-ABB) Circuit for Die-to-Die and Within-Die Variations Compensation," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 19, no. 10, pp. 1848–1860, Oct. 2011.
- [59] D. Jacquet, F. Hasbani, P. Flatresse, R. Wilson, F. Arnaud, G. Cesana, T. Di Gilio, C. Lecocq, T. Roy, A. Chhabra, C. Grover, O. Minez, J. Uginet, G. Durieu, C. Adobati, D. Casalotto, F. Nyer, P. Menut, A. Cathelin, I. Vongsavady, and P. Magarshack, "A 3 GHz dual core processor ARM cortex TM -A9 in 28 nm UTBB FD-SOI CMOS with ultra-wide voltage range and energy efficiency optimization," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 812–826, 2014.
- [60] J. P. Noel, O. Thomas, M. A. Jaud, O. Weber, T. Poiroux, C. Fenouillet-Beranger, P. Rivallin, P. Scheiblin, F. Andrieu, M. Vinet, O. Rozeau, F. Boeuf, O. Faynot, and A. Amara, "Multi-VT UTBB FDSOI device architectures for low-power CMOS circuit," *IEEE Trans. Electron Devices*, vol. 58, no. 8, pp. 2473–2482, 2011.
- [61] R. Taco, I. Levi, A. Fish, and M. Lanuzza, "Exploring back biasing opportunities in 28nm UTBB FD-SOI technology for subthreshold digital design," in 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI), 2014, pp. 1–4.
- [62] P. Corsonello, F. Frustaci, and S. Perri, "Low-leakage SRAM wordline drivers for the 28-nm UTBB FDSOI technology," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 23, no. 12, pp. 3133–3137, 2015.

- [63] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," *Proc. IEEE*, vol. 91, no. 2, pp. 305–327, Feb. 2003.
- [64] D. Bol, D. Flandre, and J.-D. Legat, "Technology flavor selection and adaptive techniques for timing-constrained 45nm subthreshold circuits," in *Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design - ISLPED '09*, 2009, p. 21.
- [65] D. Bol, D. Kamel, D. Flandre, and J.-D. Legat, "Nanometer MOSFET effects on the minimum-energy point of 45nm subthreshold logic," *Proc. 14th ACM/IEEE Int. Symp. Low power Electron. Des. - ISLPED '09*, p. 3, 2009.
- [66] K. A. Bowman, A. R. Alameldeen, S. T. Srinivasan, and C. B. Wilkerson, "Impact of die-to-die and within-die parameter variations on the throughput distribution of multi-core processors," in *Proceedings of the 2007 international symposium on Low power electronics and design - ISLPED '07*, 2007, pp. 50–55.
- [67] G. De Streel and D. Bol, "Scaling perspectives of ULV microcontroller cores to 28nm UTBB FDSOI CMOS," 2013 IEEE SOI-3D-Subthreshold Microelectron. Technol. Unified Conf. S3S 2013, 2013.
- [68] F. Abouzeid, S. Clerc, B. Pelloux-Prayer, F. Argoud, and P. Roche,
  "28nm CMOS, energy efficient and variability tolerant, 350mV-to-1.0V, 10MHz/700MHz, 252bits frame error-decoder," in 2012 Proceedings of the ESSCIRC (ESSCIRC), 2012, pp. 153–156.
- [69] H. Reyserhove, N. Reynders, and W. Dehaene, "Ultra-low voltage datapath blocks in 28nm UTBB FD-SOI," in 2014 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2014, pp. 49–52.

- [70] B. Pelloux-Prayer, M. Blagojević, O. Thomas, A. Amara, A. Vladimirescu, B. Nikolić, G. Cesana, and P. Flatresse, "Planar fully depleted SOI technology: The convergence of high performance and low power towards multimedia mobile applications," *Faibl. Tens. Faibl. Consomm. (FTFC), 2012 IEEE*, pp. 1–4, 2012.
- [71] P. Magarshack, P. Flatresse, and G. Cesana, "UTBB FD-SOI: A Process/Design Symbiosis for Breakthrough Energy-efficiency," *Des. Autom. Test Eur. Conf. Exhib. (DATE)*, 2013, pp. 952–957, 2013.
- [72] N. Planes, O. Weber, V. Barral, S. Haendler, D. Noblet, D. Croain, M. Bocat, P. O. Sassoulas, X. Federspiel, A. Cros, A. Bajolet, E. Richard, B. Dumont, P. Perreau, D. Petit, D. Golanski, C. Fenouillet-Béranger, N. Guillot, M. Rafik, V. Huard, S. Puget, X. Montagner, M. A. Jaud, O. Rozeau, O. Saxod, F. Wacquant, F. Monsieur, D. Barge, L. Pinzelli, M. Mellier, F. Boeuf, F. Arnaud, and M. Haond, "28nm FDSOI technology platform for high-speed low-voltage digital applications," *Dig. Tech. Pap. - Symp. VLSI Technol.*, vol. 33, no. 4, pp. 133–134, 2012.
- [73] R. Rao, N. Dasgupta, and A. Dasgupta, "Study of random dopant fluctuation effects in FD-SOI MOSFET using analytical threshold voltage model," *IEEE Trans. Device Mater. Reliab.*, vol. 10, no. 2, pp. 247–253, 2010.
- [74] K. R. A. Sasaki, M. B. Manini, J. A. Martino, M. Aoulaiche, E. Simoen, L. Witters, and C. Claeys, "Ground plane influence on enhanced dynamic threshold UTBB SOI nMOSFETs," 2014 Int. Caribb. Conf. Devices, Circuits Syst. ICCDCS 2014 Conf. Proc., pp. 1–4, 2015.

- [75] G. Moritz, B. Giraud, J. P. Noel, D. Turgis, and A. Grover, "Optimization of a voltage sense amplifier operating in ultra wide voltage range with back bias design techniques in 28nm UTBB FD-SOI technology," *ICICDT 2013 - Int. Conf. IC Des. Technol. Proc.*, pp. 53–56, 2013.
- [76] J. Le Coz, B. Pelloux-Prayer, B. Giraud, F. Giner, and P. Flatresse,
  "DTMOS power switch in 28 nm UTBB FD-SOI technology," in
  2013 IEEE SOI-3D-Subthreshold Microelectronics Technology
  Unified Conference (S3S), 2013, pp. 1–2.
- [77] B. Pelloux-Prayer, A. Valentian, B. Giraud, Y. Thonnart, J. P. Noel,
  P. Flatresse, and E. Beigne, "Fine grain multi-VT co-integration methodology in UTBB FD-SOI technology," *IEEE/IFIP Int. Conf. VLSI Syst. VLSI-SoC*, pp. 168–173, 2013.
- [78] D. Jeon, M. Seok, C. Chakrabarti, D. Blaauw, and D. Sylvester, "A Super-Pipelined Energy Efficient Subthreshold 240 MS/s FFT Core in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 23–34, Jan. 2012.

# Acknowledgments

First of all, I would like to thank my advisor, Prof. Marco Lanuzza, for his constant support and patience. Through his vision and encouragement, he has kept me on track and helped me throughout this research.

I am grateful to SENESCYT for its support and assistance; I am very proud to be part of this ambitious program for my country.

I would also like to thank the friendly and talented EnICS research group at Bar-Ilan University for their time and support, especially Prof. Alex Fish and Itamar Levi for their efforts and contributions during my Ph.D. program.

I would like to express my gratitude to several UNICAL members, especially Prof. Felice Crupi for his wisdom and valuable support. I am indebted to my colleagues at the University: Antonio Cordopatri, Domenico Albano, Eliana Acurio, Lorena Guachi, Luis Miguel Procel, Luis Villamagua, Marco Guevara, Noemi Guerra, Paul Procel, Paul Romero, Raffaele de Rose and Sebastiano Strangio for many inspiring discussions, usually over a cup of espresso or tea.

Many thanks to my Ecuadorian and Italian friends, and to my roommates in Italy and Israel for the great company, meaningful conversations, tasty meals and for making sure I got out of the lab enough.

Finally, many thanks to my parents Guillermo and Gloria, and to my siblings Wladimir, Margarita and Sebastián, for their constant support and love despite the time difference and the distance. I love you very much.

# List of Publications

- [P1] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Extended exploration of low granularity back biasing control in 28nm UTBB FD-SOI technology," in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2016, vol. 2016–July, pp. 41–44.
- [P2] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage logic circuits exploiting gate level dynamic body biasing in 28nm UTBB FD-SOI," *Solid. State. Electron.*, vol. 117, pp. 185–192, Mar. 2016.
- [P3] R. Taco, M. Lanuzza, and D. Albano, "Ultra-Low-Voltage Self-Body Biasing Scheme and Its Application to Basic Arithmetic Circuits," *VLSI Des.*, vol. 2015, pp. 1–10, 2015.
- [P4] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage ripple carry adder with low-granularity dynamic forward back-biasing in 28 nm UTBB FD-SOI," in 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2015, pp. 1–2.
- [P5] D. Albano, M. Lanuzza, R. Taco, and F. Crupi, "Gate-level body biasing for subthreshold logic circuits: analytical modeling and design guidelines," *Int. J. Circuit Theory Appl.*, vol. 43, no. 11, pp. 1523–1540, Nov. 2015.
- [P6] M. Lanuzza and R. Taco, "Improving speed and power characteristics of pulse-triggered flip-flops," in 2014 IEEE 5th Latin American Symposium on Circuits and Systems, 2014, pp. 1–4.
- [P7] M. Lanuzza, R. Taco, and D. Albano, "Dynamic gate-level body biasing for subthreshold digital design," in 2014 IEEE 5th Latin American Symposium on Circuits and Systems, 2014, pp. 1–4.
- [P8] R. Taco, I. Levi, A. Fish, and M. Lanuzza, "Exploring back biasing opportunities in 28nm UTBB FD-SOI technology for subthreshold digital design," in 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI), 2014, pp. 1–4.