Fault Models for Embedded-DRAM Macros

Mango C.-T. Chao, Hao-Yu Yang, Chin-Yu Chin
National Chiao-Tung University, Hsinchu, Taiwan
Rei-Fu Huang
MediaTek Inc., Hsinchu, Taiwan
Shin-Chin Lin
UMC Inc., Hsinchu, Taiwan
Outline




Introduction
Fault Models for eDRAM
Defect-level Estimation of Wear-out Defects under ECC
Conclusion
From Commodity DRAM to
Embedded DRAM (eDRAM)


DRAM has been the mainstream commodity memory
since its invention.
Researchers attempt to bring commodity DRAM's
advantages into an SoC.
– Reduce the process adders that eDRAM requires on top of the CMOS process
– Deep-trench capacitor with bottle etch, planar capacitor,
shallow-trench capacitor, metal-insulator-metal capacitor
Applications:
Networking
Gaming consoles
Multimedia handhelds
High-definition TV
MP3/PDA, etc.
DRAM's advantages:
* High density
* Structure simplicity
* Low power
* Low cost
[Figure: embedded SoC integrating digital, analog, memory, RF, and other blocks]
UMC eDRAM Architecture
Size: 16Mb
(64 x 64 x (16 x 2) x 128 banks)
Technology: 65nm low-leakage process
Area: 4 mm²
Supply voltage: 1.2 V
Operating frequency: 100 MHz
Retention time: 16 ms
Bandwidth: 3.125 Gb/s
Required cycles for one refresh: 64 x 128 cycles (0.08192 ms)
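The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check: the bandwidth line assumes that one 32-bit (16 x 2) word is transferred per clock cycle and that 1 Gb is counted as 1024 Mb, which is one reading that reproduces the quoted 3.125 Gb/s and is not stated in the slides.

```python
# Figures quoted on the slide.
FACTORS = (64, 64, 16 * 2, 128)   # 64 x 64 x (16 x 2) x 128 banks
CLOCK_HZ = 100e6                  # operating frequency
RETENTION_MS = 16.0               # retention time

# Capacity: product of the four factors, expressed in Mb (2**20 bits).
size_bits = 1
for f in FACTORS:
    size_bits *= f
size_mb = size_bits / 2**20

# One full refresh takes 64 x 128 cycles at the operating frequency.
refresh_cycles = 64 * 128
refresh_ms = refresh_cycles / CLOCK_HZ * 1e3

# Share of each retention period spent refreshing.
refresh_overhead = refresh_ms / RETENTION_MS

# Assumption: one 32-bit word per cycle, with 1 Gb = 1024 Mb.
word_bits = 16 * 2
bandwidth_gb_s = word_bits * CLOCK_HZ / 1e6 / 1024

print(size_mb, refresh_ms, refresh_overhead, bandwidth_gb_s)
```

Under these assumptions, one full refresh occupies roughly half a percent of the 16 ms retention period.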
Difference between eDRAM and
commodity DRAM
                   stand-alone DRAM     embedded DRAM
metal layers       2~3                  5~6
Cs                 30 fF ~ 45 fF        7 fF ~ 10 fF
Cbl                fixed ratio to Cs    fixed ratio to Cs
refresh period     > 64 ms              4 ms ~ 16 ms
data size          512 Mb ~ 2 Gb        2 Mb ~ 64 Mb
operating modes    multiple             single
ESD                Yes                  No
interface test     timing check + IO    setup/hold time
ECC                Mostly no            Yes
Fault Models for eDRAM

Since eDRAM uses the SRAM interface, we start from
a standard SRAM test algorithm, March C-, and then
discuss the faults that are not covered by the SRAM test
but may occur in eDRAM.
1. Retention faults
2. Word-line-coupling faults
3. Bit-line-coupling faults
4. Stuck-open faults
5. Bank faults
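For reference, March C- is the six-element sequence {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}. The sketch below runs it on a bit-oriented memory model. `StuckAtMemory` is a hypothetical fault-injection helper written for this illustration, not something from the slides, and the stuck-at fault is used only because March C- is known to cover it.

```python
def march_c_minus(mem):
    """Run March C- on `mem`; return the set of addresses with a read mismatch."""
    n = len(mem)
    up, down = range(n), range(n - 1, -1, -1)
    # (address order, operations); the first and last elements may use either order.
    elements = [
        (up,   [("w", 0)]),
        (up,   [("r", 0), ("w", 1)]),
        (up,   [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up,   [("r", 0)]),
    ]
    failures = set()
    for order, ops in elements:
        for addr in order:
            for op, value in ops:
                if op == "w":
                    mem[addr] = value
                elif mem[addr] != value:   # read and compare with expected value
                    failures.add(addr)
    return failures


class StuckAtMemory(list):
    """Hypothetical memory model in which one cell is stuck at a fixed value."""

    def __init__(self, size, stuck_addr, stuck_value):
        super().__init__([0] * size)
        self.stuck_addr, self.stuck_value = stuck_addr, stuck_value
        super().__setitem__(stuck_addr, stuck_value)

    def __setitem__(self, addr, value):
        if addr == self.stuck_addr:
            value = self.stuck_value       # writes cannot change the stuck cell
        super().__setitem__(addr, value)


print(march_c_minus([0] * 16))                  # fault-free memory
print(march_c_minus(StuckAtMemory(16, 5, 1)))   # cell 5 stuck at 1
```

None of the elements contains a delay or repeated accesses to one word-line, which is why the DRAM-specific faults listed above need additional test patterns.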
Retention Faults

Definition
– When the charge in the storage capacitor leaks away through
leakage current, the storage capacitor loses its
stored value before the next refresh.

Detection
– {Wa}, Delay, {Ra}

Different mechanism from retention faults in SRAM
[Figure: DRAM cell holding a "1" with its access transistor off]
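The {Wa}, Delay, {Ra} sequence can be illustrated with a toy single-cell model. Everything about the model is an assumption made for this sketch: a stored "1" decays exponentially with time constant `tau_ms`, and the cell reads as "1" only while its normalized voltage stays above `threshold`. Only the a = 1 case is modeled.

```python
import math


def retention_test(tau_ms, delay_ms, threshold=0.5):
    """Apply {W1}, Delay, {R1} to one cell; return True if the cell passes.

    Hypothetical model: the stored '1' decays as exp(-t / tau_ms) and is
    read as '1' only while the normalized cell voltage exceeds `threshold`.
    """
    v_after_write = 1.0                                           # {W1}
    v_after_delay = v_after_write * math.exp(-delay_ms / tau_ms)  # Delay
    return v_after_delay > threshold                              # {R1}


# A healthy cell and a leaky cell, both tested with a 16 ms delay.
print(retention_test(tau_ms=100.0, delay_ms=16.0))
print(retention_test(tau_ms=10.0, delay_ms=16.0))
```

In this model the delay has to be at least as long as the specified retention time; a shorter delay would let the leaky cell pass.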
Word-line-Coupling Faults

The word-line-coupling faults can be classified into
two types:
1. Switching word-line-coupling fault
2. Hammering word-line-coupling fault
Switching Word-line-Coupling
Fault

Definition
– When the coupling capacitance between two word-lines is
too large, both word-lines may turn on at the
same time.
[Figure: checkerboard background on bit-line pairs blb0/bl0 to blb7/bl7, with column selects cs<0> to cs<3>]
WL0: 1 0 1 0 1 0 1 0
WL2: 0 1 0 1 0 1 0 1

Detection
– Y-direction with checkerboard background
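A small model shows why the checkerboard background matters here. In the sketch below, `coupled` is a hypothetical fault description that maps a word-line to a second word-line turning on with it, and a column where the two active cells disagree is treated as corrupted. "Y-direction" is taken to mean stepping from word-line to word-line, which is an assumption about the slide's terminology.

```python
def checkerboard(rows, cols):
    """Background in which neighbouring cells hold complementary values."""
    return [[(r + c) % 2 for c in range(cols)] for r in range(rows)]


def solid(rows, cols, value=0):
    """Background in which every cell holds the same value."""
    return [[value] * cols for _ in range(rows)]


def read_row(array, row, coupled):
    """Read one word-line; columns with conflicting active cells return None."""
    data = list(array[row])
    partner = coupled.get(row)
    if partner is not None:
        for col, bit in enumerate(array[partner]):
            if bit != data[col]:
                data[col] = None      # two cells fight over the same bit-line
    return data


def detects_fault(background, coupled):
    """Read every word-line in turn and report whether any read mismatches."""
    return any(
        read_row(background, r, coupled) != background[r]
        for r in range(len(background))
    )


fault = {0: 1}   # hypothetical: activating word-line 0 also turns on word-line 1
print(detects_fault(checkerboard(4, 8), fault))
print(detects_fault(solid(4, 8), fault))
```

With a solid background both active cells hold the same value, so the double activation goes unnoticed in this model; the checkerboard guarantees that they differ.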
Hammering Word-line-Coupling
Fault

Definition
– When the current word-line (aggressor) turns on, it
induces a noise signal on the adjacent word-line (victim);
the victim word-line then turns on slightly and induces
extra leakage in the storage capacitor.

Detection
– {R wlb}^n, {R wla}
Notation
– wlb: current word-line
– wla: adjacent word-line
– { }^n: repeat n times
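The hammering sequence can be sketched with the same kind of toy cell model. The leakage model is an assumption made for illustration: each activation of the aggressor word-line removes a fixed fraction `leak_per_read` of the charge left in a victim cell that stores "1".

```python
def hammer_test(n_reads, leak_per_read, threshold=0.5):
    """Apply {R wlb}^n, {R wla} to one victim cell storing '1'.

    Hypothetical model: every read on the aggressor word-line (wlb) removes
    the fraction `leak_per_read` of the charge remaining in the victim cell
    on the adjacent word-line (wla). Returns True if the victim still reads '1'.
    """
    victim_v = 1.0
    for _ in range(n_reads):              # {R wlb}^n: hammer the aggressor
        victim_v *= 1.0 - leak_per_read
    return victim_v > threshold           # {R wla}: read the victim


# Same number of hammering reads, weak versus strong coupling.
print(hammer_test(n_reads=1000, leak_per_read=1e-5))
print(hammer_test(n_reads=1000, leak_per_read=1e-3))
```

The repeat count n bounds how small a per-access leakage the test can expose, so it trades test time against sensitivity.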
Bit-line-Toggling Faults

Definition
– When the coupling capacitance between two bit-lines is
too large, it slows down the charge-sharing
mechanism, and the sense amplifier then senses
the wrong data.
[Figure: solid-0 background on bit-line pairs blb0/bl0 to blb7/bl7; bit-line levels marked 0, Vdd/2, 0]
WL0: 0 0 0 0 0 0 0 0
WL2: 0 0 0 0 0 0 0 0

Detection
– X-direction with solid background
Stuck-Open Faults

Stuck-open faults in DRAMs can be classified into
two categories:
1. Transistor-open faults
2. Resistive-open faults

[Figure: DRAM cell on bit-line BL and word-line WL, with resistances labeled R at several locations]

Detection
– for transistor-open fault: {Ra, Wb, Rb}
– for resistive-open fault: {R0, W1, R1}
Bank Faults

Definition
– Two banks turn on at the same time due to a large
parasitic RC.
[Figure: master/slave latches clocked by Ck, with parasitic RC1 on the path to bank0, bank1, bank2, ……; signals A0, A1, B]

Detection
– consecutive read operations from the farthest bank to
the nearest one
Fault Occurrences between eDRAM
and commodity DRAM
                             eDRAM    commodity DRAM
retention faults             high     low
word-line-coupling faults    high     low
bit-line-toggling faults     low      high
transistor-open faults       low      high
resistive-open faults        high     low
bank faults                  -        -
Defect-level Estimation of Wear-out Defects under ECC




Reliability testing, such as the THB, HAST, and
HTOL tests, is applied to measure the reliability or
lifetime of manufactured chips.
Because of its cost and application time, reliability
testing, which accelerates wear-out failures, can only
be applied to a small portion of the products.
The most straightforward method to estimate this
defect level is to run the reliability testing with
the ECC function enabled and count the failed parts.
The number of sampled parts for reliability
testing is usually around a few hundred, while the
generally acceptable defect level is under 100 DPPM,
so this sample size is too small to support such a
fine resolution of the defect level.
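A quick calculation makes the resolution problem concrete. The sketch reads the quoted defect level as 100 defective parts per million and takes 300 parts as a stand-in for "a few hundred"; both values are assumptions chosen for illustration.

```python
def prob_zero_failures(sample_size, defect_level_ppm):
    """Probability that a random sample contains no failing part at all."""
    p_fail = defect_level_ppm * 1e-6
    return (1.0 - p_fail) ** sample_size


sample_size = 300          # assumption: stand-in for "a few hundred" parts
defect_level_ppm = 100     # assumption: defect level right at the limit

expected_failures = sample_size * defect_level_ppm * 1e-6
p_none = prob_zero_failures(sample_size, defect_level_ppm)

print(f"expected failing parts: {expected_failures:.3f}")
print(f"P(no failing part in the sample): {p_none:.4f}")
```

With an expected count of about 0.03 failing parts, a sample of this size almost always shows zero failures whether the true defect level is 10 or 100 parts per million, so counting failed parts cannot separate the two.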
Defect-level Estimation of Wear-out Defects under ECC



Instead of counting the failed parts, we directly count
the number of defective eDRAM cells in each part.
Because the ECC circuitry may mask some defective
cells, we need to turn off the ECC function.
The probability distribution of defective cells can be
modeled by the Poisson distribution.
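[Figure: test flow. Parts with DBP defects enter the production test and pass with probability P(Pass_PT) or fail; passing parts gain DDR additional defects (+DDR) and enter the reliability test, where they pass with probability P(Pass_RT | Pass_PT) or fail.]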
Defect-level Estimation of Wear-out Defects under ECC
$$DL = 1 - P(\text{Pass\_RT} \mid \text{Pass\_PT}) = 1 - \frac{P(\text{Pass\_RT} \cap \text{Pass\_PT})}{P(\text{Pass\_PT})} = 1 - \frac{P(\text{Pass\_RT})}{P(\text{Pass\_PT})} \qquad (1)$$

$$P(\text{Pass\_PT}) = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}}\, P(\mathit{DBP} = x) = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}} \cdot \frac{e^{-\lambda_1} (\lambda_1)^{x}}{x!} \qquad (2)$$

$$P(\text{Pass\_RT}) = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}}\, P(\mathit{DBP} + \mathit{DDR} = x) = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}} \cdot \frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^{x}}{x!} \qquad (3)$$

Notations:
n: the number of bits per word.
w: the number of words in one memory chip.
s: the number of bits in one memory chip (s = w × n).
P(E): the probability that event E occurs.
DBP: the random variable denoting the number of single
defects existing before applying the production testing.
DDR: the random variable denoting the number of added
single defects during the reliability testing.
λ1: the mean of the random variable DBP.
λ2: the mean of the random variable DDR.
Pass_PT: the event that a part containing random single
defects passes the production testing with the use of ECC.
Pass_RT: the event that a part containing random single
defects passes the reliability testing with the use of ECC.
DL: the defect level caused by the wear-out defects.
With Equations (1), (2), and (3), the defect level DL with the use of ECC can
be obtained.
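The three equations translate directly into a short numerical routine. The sketch below is one possible implementation, not the authors' code. It reads the weight C(w, x) · n^x / C(s, x) as the fraction of x-defect placements that put at most one defect in each word. The early loop exit and the example values for w, n, λ1, and λ2 are assumptions chosen for illustration.

```python
from math import comb, exp, lgamma, log


def pass_probability(w, n, lam):
    """P(part passes with ECC) when the defect count is Poisson(lam).

    Weight for x defects: C(w, x) * n**x / C(w * n, x), i.e. the fraction of
    placements with at most one defective bit per word.
    """
    s = w * n
    total = 0.0
    for x in range(w + 1):
        if lam > 0:
            poisson = exp(-lam + x * log(lam) - lgamma(x + 1))
        else:
            poisson = 1.0 if x == 0 else 0.0
        if x > lam and poisson < 1e-18:
            break                          # remaining terms are negligible
        total += comb(w, x) * n**x / comb(s, x) * poisson
    return total


def defect_level(w, n, lam1, lam2):
    """Equation (1): DL = 1 - P(Pass_RT) / P(Pass_PT)."""
    return 1.0 - pass_probability(w, n, lam1 + lam2) / pass_probability(w, n, lam1)


# Illustrative values only: a 16 Mb array of 32-bit words, a mean of 1 defect
# before production test and 0.5 added defects during reliability test.
w, n = 16 * 2**20 // 32, 32
dl = defect_level(w, n, lam1=1.0, lam2=0.5)
print(f"DL = {dl * 1e6:.3f} DPPM")
```

The inputs λ1 and λ2 would come from the per-part defective-cell counts measured with ECC turned off, which is what lets a sample of a few hundred parts yield an estimate at the parts-per-million scale.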
Conclusion



We introduced an exemplary eDRAM design and
discussed the key issues that should be
emphasized in eDRAM testing by comparing it with
commodity-DRAM testing.
We started from a short SRAM algorithm and
discussed the fault models that are not covered by
SRAM testing but should be considered in DRAM
testing.
We proposed a mathematical model to estimate the
defect level caused by wear-out defects under ECC.
