# Reducing Read Latency of Phase Change Memory via Early Read and Turbo Read

Feb 9<sup>th</sup> 2015 HPCA-21 San Francisco, USA

Prashant Nair (Georgia Tech), Chiachen Chou (Georgia Tech), Bipin Rajendran (IIT Bombay), Moinuddin Qureshi (Georgia Tech)

Georgia Institute of Technology



- Phase Change Memory (PCM) promises higher density and better scalability
- Key Challenges:
  - Limited Endurance (10-100M writes/cell)
    - Wear Leveling, Error correction, Graceful degradation
  - High Write Latency (4X-8X higher than PCM read)
    - PreSET, Write Cancellation, Write Pausing
  - High Read Latency (2X of DRAM)
    - Hybrid Memory, combining PCM and DRAM




#### Goal → Reduce the high read latency of PCM

### OUTLINE

- Background
- Early Read
- Turbo Read
- Early+Turbo Read
- Results
- Summary

- Low (SET) and High (RESET) resistance states
- Cell states are compared to reference resistance
- The states correspond to binary values of 0 and 1

PCM stores binary values by varying resistance of cells
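As a minimal sketch of the comparison described above, the snippet below classifies a cell by comparing its resistance against a reference. The function name, the example resistances, and the mapping of SET to 1 and RESET to 0 are assumptions made for illustration; the slides only state that the two states correspond to binary 0 and 1. The 10 kΩ reference mirrors the R<sub>sense</sub> value quoted later in the talk.

```python
# Reference resistance in ohms (10 kOhm, the R_sense value quoted later in the talk).
R_REF_OHMS = 10_000


def read_cell(resistance_ohms, r_ref_ohms=R_REF_OHMS):
    """Return the bit stored in a cell by comparing its resistance to the reference.

    Assumed mapping (not specified in the slides):
    SET (low resistance) -> 1, RESET (high resistance) -> 0.
    """
    return 1 if resistance_ohms < r_ref_ohms else 0


set_bit = read_cell(5_000)        # hypothetical SET cell, below the reference
reset_bit = read_cell(1_000_000)  # hypothetical RESET cell, above the reference
```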

#### **READ PROCESS IN PCM**




![](_page_12_Figure_2.jpeg)

![](_page_13_Figure_2.jpeg)

![](_page_14_Figure_2.jpeg)

#### Three step process to read a PCM cell

![](_page_15_Figure_2.jpeg)

The discharging time determines the sensing time

![](_page_16_Figure_1.jpeg)

![](_page_17_Figure_1.jpeg)

![](_page_18_Figure_1.jpeg)

![](_page_19_Figure_1.jpeg)

![](_page_20_Figure_1.jpeg)

![](_page_21_Figure_1.jpeg)

![](_page_22_Figure_1.jpeg)

- Capacitive Discharge and compare against V<sub>ref</sub>
- Variation in SET and RESET distributions

Sensing time is determined by worst case cells
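The dependence of sensing time on the slowest cell can be illustrated with a simple RC discharge model, where the time to fall from a starting voltage to V<sub>ref</sub> is R·C·ln(V<sub>start</sub>/V<sub>ref</sub>). This is a sketch only: the capacitance, voltages, and the spread of cell resistances below are assumed values, not figures from the talk.

```python
import math


def discharge_time_s(resistance_ohms, capacitance_f, v_start, v_ref):
    """Time for an RC circuit to discharge from v_start down to v_ref."""
    return resistance_ohms * capacitance_f * math.log(v_start / v_ref)


# Assumed, illustrative circuit values.
C_BITLINE_F = 1e-12
V_START, V_REF = 1.0, 0.5

# Hypothetical spread of low-resistance (SET) cells, in ohms.
cell_resistances_ohms = [3_000, 5_000, 8_000]

times_s = [
    discharge_time_s(r, C_BITLINE_F, V_START, V_REF)
    for r in cell_resistances_ohms
]

# The sensing time must cover the slowest (highest-resistance) of these cells.
worst_case_time_s = max(times_s)
```

In this model the discharge time is proportional to resistance, so the highest-resistance cell that must still discharge sets the provisioned sensing time.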

### **REDUCE READ LATENCY: SENSE EARLIER**

- Sense data earlier than the provisioned time
- Lower Resistance → Lower RC time to discharge

![](_page_23_Figure_2.jpeg)

![](_page_24_Figure_3.jpeg)

![](_page_25_Figure_3.jpeg)

Reduce time to sense by lowering the RC time

![](_page_26_Figure_1.jpeg)

![](_page_27_Figure_1.jpeg)

![](_page_28_Figure_1.jpeg)

![](_page_29_Figure_1.jpeg)

Sensing earlier causes errors while reading higher resistances

- Increase bitline voltage above the provisioned value
- Higher Voltage → Higher Current → Lower Read Latency

![](_page_30_Picture_2.jpeg)

![](_page_31_Picture_2.jpeg)

![](_page_32_Picture_3.jpeg)

![](_page_33_Picture_3.jpeg)

Increase bitline voltage and reduce sensing time

#### EFFECT OF HIGH BITLINE VOLTAGE

![](_page_34_Figure_1.jpeg)

![](_page_35_Figure_1.jpeg)



Increasing bitline voltage causes errors

# GOAL

Reduce read latency by
1. Exploiting variability in PCM cells → Early Read
2. Higher voltage to read PCM cells → Turbo Read

#### OUTLINE

- Background
- Early Read ←
- Turbo Read
- Early+Turbo Read
- Results
- Summary



































Sensing early causes errors in the sense amplifiers; the cells in the PCM substrate themselves have no errors







#### **Sense Amplifiers**








Lower sensing time → more errors → stronger ECC

Strong ECC → Huge area overheads









1. Sense Data Early
2. Read Line
3. Correct errors
4. Detect errors
5. Retry with normal latency on error detection

[Figure labels: Memory Controller, Sense Amplifiers]

Early Read → detect and retry to read correctly at lower latency
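The detect-and-retry flow can be sketched as follows. This is an illustration under stated assumptions, not the controller's actual design: `sense_early`, `sense_normal`, and `has_error` are hypothetical callables standing in for the shortened sensing, the provisioned sensing, and the error-detection check, and the error-correction step is omitted for brevity.

```python
def early_read(sense_early, sense_normal, has_error):
    """Return (data, retried) for one line read.

    sense_early  -- hypothetical callable: sense the line with shortened time
    sense_normal -- hypothetical callable: sense the line with provisioned time
    has_error    -- hypothetical callable: True if the detection check fails
    """
    data = sense_early()
    if not has_error(data):
        return data, False       # common case: early sensing was correct
    return sense_normal(), True  # rare case: retry with normal latency


# Usage with stand-in callables: the early sense misreads one bit.
stored = [1, 0, 1, 1]
misread = [1, 0, 0, 1]
data, retried = early_read(
    lambda: misread,
    lambda: stored,
    lambda line: line != stored,
)
```

The retry is safe because the sensing errors occur in the sense amplifiers rather than in the stored cells, so a second read at normal latency sees the correct data.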









Sensing errors → Unidirectional → SET classified as RESET

# UNIDIRECTIONAL ERROR DETECTION

- All unidirectional errors can be detected using Berger Code
- For a 512-bit cache line, only 10 check bits are needed



Berger Code detects unidirectional errors with low cost

#### **BERGER CODES: HOW AND WHY**

#### Sum the number of 1's in data, invert and store

[Figure: Data and its Berger Code]

Berger code provides guaranteed detection of all unidirectional errors
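A small sketch of the "sum the 1's, invert and store" rule is shown below. The function names and the 8-bit example word are illustrative assumptions; the check width is the number of bits needed to hold the largest possible count, which gives 10 bits for a 512-bit line.

```python
def berger_check_width(num_data_bits):
    """Bits needed to store a count between 0 and num_data_bits."""
    return num_data_bits.bit_length()


def berger_encode(data_bits):
    """Count the 1s in the data and return the bitwise-inverted count."""
    width = berger_check_width(len(data_bits))
    ones = sum(data_bits)
    return ones ^ ((1 << width) - 1)


def berger_ok(data_bits, stored_check):
    """True if the stored check matches the check recomputed from the data."""
    return berger_encode(data_bits) == stored_check


data = [1, 0, 1, 1, 0, 0, 1, 0]    # four 1s, 4-bit check field
check = berger_encode(data)        # 4 inverted in 4 bits -> 11
sensed = [1, 0, 0, 1, 0, 0, 1, 0]  # one bit misread as 0
detected = not berger_ok(sensed, check)
```

When every error flips bits in the same direction, the count recomputed from the data and the stored inverted count move in opposite directions, so they can never agree again. That is why detection is guaranteed rather than probabilistic.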












- Early Read reduces $R_{sense}$ from 10 kΩ to 7 kΩ
- The BER increases from 10<sup>-16</sup> to 10<sup>-5</sup>
- Errors are detected using Berger Code, and 0.5% of reads are retried

25% reduction in read latency using Early Read
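The 0.5% retry rate is consistent with a back-of-the-envelope estimate. The sketch below assumes independent bit errors and a line of 512 data bits plus 10 Berger check bits; both are modeling assumptions rather than statements from the slides.

```python
DATA_BITS = 512
BERGER_BITS = 10
BER_EARLY_READ = 1e-5  # bit error rate under Early Read, from the slide

line_bits = DATA_BITS + BERGER_BITS

# Probability that at least one bit in the line is misread, which triggers a retry.
p_line_retry = 1 - (1 - BER_EARLY_READ) ** line_bits
```

Under these assumptions `p_line_retry` comes out slightly above 0.005, in line with the quoted retry rate.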

# OUTLINE

- Introduction and Background
- Early Read
- Turbo Read ←
- Early+Turbo Read
- Results
- Summary

- PCM writes data by passing current through cell
- PCM reads data by passing current through cell
  - Read current << Write current
- Higher read current can reduce read latency
- Read Disturb → Causes PCM cells to accidentally flip



Higher bitline voltage causes Read Disturb















Reading with higher voltage → Read Disturb → causes errors in PCM cells

- Incorrect value may be read
- Read disturb errors can be corrected with Error Correcting Codes (ECC)





1. Read with higher bitline voltage
2. If read disturb errors occur
3. Use ECC to correct errors

Turbo Read → Read with higher bitline voltage and use ECC to correct read disturb errors

[Figure labels: PCM Cells, Sense Amplifiers, ECC, Memory Controller]

## **TURBO READ: DESIGN**

- Systems are typically designed for failure rate < 10<sup>-16</sup>
- Fix with a small storage budget → DECTED (double error correction, triple error detection)

| Read Disturb BER | Probability Line has 3 Errors | Latency |
|------------------|-------------------------------|---------|
| 10<sup>-9</sup>  | < 10<sup>-19</sup>            | 57ns    |

- Probabilistic Scrub (PRS) to mitigate latent faults

ECC can mitigate read disturb errors in Turbo Read
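The table entry can be checked with a binomial estimate. This sketch assumes independent bit errors and a line of 512 data bits plus the 20 ECC bits listed later as the Turbo Read storage overhead; the function name is illustrative.

```python
import math

LINE_BITS = 512 + 20     # data bits plus DECTED bits (assumed line layout)
BER_READ_DISTURB = 1e-9  # read disturb bit error rate, from the table


def prob_exactly_k_errors(n_bits, ber, k):
    """Binomial probability that exactly k of n_bits are in error."""
    return math.comb(n_bits, k) * ber**k * (1 - ber) ** (n_bits - k)


# Three errors in one line is the first case DECTED cannot correct.
p_three_errors = prob_exactly_k_errors(LINE_BITS, BER_READ_DISTURB, 3)
```

Under these assumptions `p_three_errors` is roughly 2.5 × 10<sup>-20</sup>, below the 10<sup>-19</sup> bound in the table and well under the 10<sup>-16</sup> design target.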

## OUTLINE

- Background
- Early Read
- Turbo Read
- Early+Turbo Read ←
- Results
- Summary

#### Early read → Error → Retry

- Bimodal Read Latency













Combine Early and Turbo Reads → Get benefits of both without bimodal latency

#### **CHALLENGES IN EARLY+TURBO READ**






1. Read with higher bitline voltage + Sense early
2. If read disturb errors + sensing errors occur
3. Use ECC to correct errors

Early+Turbo Read → Read with higher bitline voltage and sense early → Use ECC to correct errors

[Figure labels: PCM Cells, Sense Amplifiers, ECC, Memory Controller]

## **EARLY+TURBO READ: DESIGN**

|                  | Early Read             | Turbo Read      | Early+Turbo Read  |
|------------------|------------------------|-----------------|-------------------|
| BER              | 10<sup>-5</sup>        | 10<sup>-9</sup> | 2×10<sup>-9</sup> |
| Sensing Latency  | 48ns or 69ns (Bimodal) | 57ns (Fixed)    | 45ns (Fixed)      |
| Storage Overhead | 10 bits/line           | 20 bits/line    | 20 bits/line      |

- 2×10<sup>-9</sup> BER → DECTED → System Failure Rate < 10<sup>-19</sup>
- Sensing Latency Fixed → 45ns

Early+Turbo Read reduces read latency by 30%
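One way to reproduce the 30% figure is to measure the saving in sensing time against the total read latency. The sketch below assumes the 80 ns read latency and 69 ns baseline sensing time from the system configuration slide; treating the reduction as a fraction of the 80 ns total is an interpretation, not something the slides state explicitly.

```python
BASELINE_READ_NS = 80   # total read latency (system configuration slide)
BASELINE_SENSE_NS = 69  # baseline sensing time (system configuration slide)


def read_latency_reduction(new_sense_ns):
    """Fraction of total read latency saved by a shorter sensing time."""
    saved_ns = BASELINE_SENSE_NS - new_sense_ns
    return saved_ns / BASELINE_READ_NS


early = read_latency_reduction(48)        # Early Read when no retry is needed
turbo = read_latency_reduction(57)        # Turbo Read
early_turbo = read_latency_reduction(45)  # Early+Turbo Read
```

With these inputs Early+Turbo Read saves 24 ns of 80 ns, or 30%, and the no-retry Early Read case saves 21 ns, about 26%, close to the 25% quoted earlier.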

## OUTLINE

- Background
- Early Read
- Turbo Read
- Early+Turbo Read
- Results ←
- Summary

#### SYSTEM CONFIGURATION

| Parameter      | Configuration                 |
|----------------|-------------------------------|
| Cores          | 8 cores @ 3GHz                |
| L1-L2-L3 Cache | 32KB-256KB-1MB (Private)      |
| L4 Cache       | 128MB (Shared) @ 15ns latency |
| **PCM System** |                               |
| Channels       | 4 Channels @ 8GB/Channel      |
| Read Latency   | 80ns (69ns sensing time*)     |
| Write Latency  | 250ns*                        |

SPEC benchmarks with read MPKI from DRAM Cache > 1















#### Our proposals improve performance by up to 21%














# OUTLINE

- Background
- Early Read
- Turbo Read
- Early+Turbo Read
- Results
- Summary ←

- Goal → Reduce the read latency of PCM
- Two low-cost solutions
  - Early Read: Better-than-worst-case sensing using Berger Codes to detect errors and retry
  - Turbo Read: Read with higher current and fix read disturb errors with ECC
- Proposed solutions reduce read latency by 30%
  → Performance improves by 21%, EDP by 28%

# **Thank You**



# BACKUP

# SENSITIVITY TO TARGET ERROR RATES






Our proposals become even more effective at higher target design error rates

### **SENSITIVITY TO DRIFT**






Our proposals become even more effective when drift margins are taken into account

### **MLC PCM LATENCY**






Latency determined by highest resistance states

- PCM stores values by varying resistance
- Higher resistance causes more read latency
- With Technology Scaling:

Resistance = (Resistivity × Length) / Area
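The effect of scaling on this formula can be illustrated with a short sketch. It assumes uniform scaling, where every linear dimension shrinks by the same factor and resistivity stays constant; the slide's own scaling figure is not recoverable, so this is one plausible reading. The function names and the unitless example values are illustrative.

```python
def resistance(resistivity, length, area):
    """Resistance = (Resistivity x Length) / Area."""
    return resistivity * length / area


def scaled_resistance(resistivity, length, area, s):
    """Resistance after every linear dimension is scaled by factor s.

    Length scales by s and cross-sectional area by s squared.
    """
    return resistance(resistivity, length * s, area * s * s)


base = resistance(1.0, 2.0, 4.0)                # arbitrary units
half = scaled_resistance(1.0, 2.0, 4.0, s=0.5)  # dimensions halved
```

Under this assumption the resistance scales as 1/s, so halving the dimensions doubles the resistance and, with it, the read latency pressure.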



- Read requests tend to halt execution
- Write requests can be buffered/paused/cancelled




#### LATENT FAULTS FROM READ DISTURB

- Adversarial read sequences can cause latent faults

[Figure: a PCM Row over time. Labels: Line A, Read Line A, Latent Faults, Line B, Read Line B, 4 Errors! System Failure]

Need a low cost solution to mitigate latent faults

• Scrub the entire row with low probability (say 1%)






Probabilistic Scrub improves reliability by 10<sup>5</sup> times with negligible impact on performance
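A minimal sketch of the probabilistic scrub idea follows. The function name and the `scrub_row` callback are hypothetical; the callback stands in for whatever the controller does to read, correct, and rewrite every line in the row. The 1% probability is the example value from the slide.

```python
import random

SCRUB_PROBABILITY = 0.01  # "say 1%", from the slide


def read_with_probabilistic_scrub(row, line_index, scrub_row, rng=random):
    """Read one line and, with low probability, scrub the entire row.

    scrub_row is a hypothetical callback that repairs latent faults
    in every line of the row.
    """
    data = row[line_index]
    if rng.random() < SCRUB_PROBABILITY:
        scrub_row(row)
    return data


# Usage: count how often a scrub is triggered over many reads of one line.
rng = random.Random(0)
row = ["line A", "line B"]
scrubs = []
for _ in range(10_000):
    read_with_probabilistic_scrub(row, 0, lambda r: scrubs.append(1), rng)
```

Because the scrub decision is random per read, a sequence that hammers one line still triggers a row scrub about once every 100 reads on average, without keeping per-row counters.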

# **END OF BACKUP**