02-Altera Stratix Family - Computer Science and Engineering

Commercial FPGAs:
Altera Stratix Family
Dr. Philip Brisk
Department of Computer Science and Engineering
University of California, Riverside
CS 223
Notes on These Slides
• Altera has disclosed the details of their devices both in online documentation and academic papers
• The academic papers evaluate different design decisions and tradeoffs; the experiments are a bit too specialized for this course.
– Please do not overly emphasize the experimentation in your studies
The Stratix™ Routing and Logic Architecture
D.M. Lewis, et al.,
International Symposium on FPGAs, 2003
Online documentation
Altera Stratix FPGA
Stratix Logic Element (LE)
Register Feedback Mode
Register Cascade (Shift Regs.)
Logic Array Block (LAB)
Directionally Biased Routing
• Long vertical wires require power drivers
– Fewer vertical wires
• More rows than columns
– More demand for horizontal wires
The Stratix II Logic and Routing Architecture
D.M. Lewis, et al.,
International Symposium on FPGAs, 2005
Online documentation
Logic Array Block (LAB)
Adaptive Logic Module (ALM)
Four ALM Operating Modes
• Normal Mode
• Extended LUT Mode
• Arithmetic Mode
• Shared Arithmetic Mode
Normal Mode
LUT Input Utilization
Extended LUT Mode
• Some 7-input logic functions
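Only some 7-input functions fit because the ALM does not hold a full 128-entry truth table. The sketch below models one plausible structure: a 2-to-1 multiplexer fed by two 5-input LUTs that share four inputs. That structure is an assumption taken from general descriptions of the Stratix II ALM, not from these slides, and the names `lut5` and `extended_lut7` are hypothetical.

```python
def lut5(table: int, bits) -> int:
    """Look up one bit in a 32-entry truth table packed into an integer.

    bits is a sequence of five 0/1 values; the first is the most
    significant bit of the table index.
    """
    index = 0
    for b in bits:
        index = (index << 1) | (b & 1)
    return (table >> index) & 1


def extended_lut7(table0: int, table1: int, shared4, e0: int, e1: int, sel: int) -> int:
    """Assumed extended-LUT structure: 4 shared inputs + e0 + e1 + sel = 7 inputs."""
    f0 = lut5(table0, (*shared4, e0))  # first 5-input function
    f1 = lut5(table1, (*shared4, e1))  # second 5-input function
    return f1 if sel else f0           # 2-to-1 multiplexer on the 7th input


# Illustrative truth tables: 5-input AND and 5-input OR.
AND5 = 1 << 31          # only index 0b11111 is 1
OR5 = 0xFFFFFFFE        # every index except 0b00000 is 1

print(extended_lut7(AND5, OR5, (1, 1, 1, 1), 1, 0, 0))
```

A function that needs two unrelated 6-input halves would not fit this shape, which is consistent with the slide's wording "some".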
Arithmetic Mode
Arithmetic Mode Example
R = (X < Y) ? Y : X
Configure the LUTs to pass X through unmodified, and ignore the carry chain outputs
(X < Y):
• Compute X-Y using the carry chain
• Only look at the carry output
• Use the carry output to select either X or Y accordingly
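The selection above can be sketched in software as a behavioral model. It assumes unsigned operands and computes X-Y as X + ~Y + 1, so a carry out of 1 means "no borrow" (X >= Y). The function name and the `width` parameter are hypothetical; this is not a description of the actual ALM wiring.

```python
def max_via_carry(x: int, y: int, width: int = 8) -> int:
    """Return max(x, y) for unsigned width-bit values using only the
    carry out of the subtraction x - y."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    # Two's-complement subtraction: x - y == x + (~y) + 1 (mod 2**width).
    total = x + ((~y) & mask) + 1
    carry_out = (total >> width) & 1   # 1 when x >= y, 0 when x < y
    # The sum bits of `total` are ignored; only the carry is used.
    return x if carry_out else y       # R = (X < Y) ? Y : X


print(max_via_carry(3, 5), max_via_carry(200, 17))
```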
Shared Arithmetic Mode (3-input Add)
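A 3-input add is commonly built by first compressing three operands into a sum word and a carry word (carry-save, or 3:2 compression) and then doing one ordinary 2-input add. The sketch below shows that arithmetic only; mapping the compression to LUTs and the final add to the dedicated adders is an assumption about the hardware, and `three_input_add` is a hypothetical name.

```python
def three_input_add(a: int, b: int, c: int, width: int = 8) -> int:
    """Add three unsigned width-bit values via 3:2 compression."""
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask
    # Per-bit sum of three inputs (what a LUT could compute).
    sum_bits = a ^ b ^ c
    # Per-bit carry (majority function), weighted one position higher.
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    # A single 2-input addition finishes the job.
    return sum_bits + carry_bits


print(three_input_add(100, 27, 5))
```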
Register Chain (Shift Registers)
Separates logic and shift register functions
• Cycle 1
– Combinational logic
• Cycles 2..k+1
– Shift by k
…
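As a rough behavioral model of the register chain, the sketch below treats k chained registers as a k-stage shift register: a value clocked in appears at the chain output k clock edges later. The class name and the reset-to-zero initial state are assumptions made for illustration.

```python
from collections import deque


class RegisterChain:
    """k registers connected output-to-input, all on the same clock."""

    def __init__(self, k: int):
        # Registers start at 0 (assumed reset state).
        self.regs = deque([0] * k, maxlen=k)

    def clock(self, d: int) -> int:
        """Apply one clock edge with input d; return the last register's value."""
        # appendleft on a full deque drops the oldest (rightmost) value,
        # which mimics every register taking its neighbour's value.
        self.regs.appendleft(d)
        return self.regs[-1]


chain = RegisterChain(3)
print([chain.clock(v) for v in [1, 2, 3, 4, 5]])
```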
ALM Benefits
• Reduced LAB area by 2.6% compared to Stratix
• 15% performance improvement
• When shrinking from a 0.13 µm (Stratix) to a 90 nm (Stratix II) technology node
– 51% performance improvement
– 50% area decrease
TriMatrix Embedded Memories
M512 RAM Block
Functions
• 1-port RAM
• 2-port RAM
• FIFO
• ROM
• Shift Register
576 RAM bits (32 x 18), includes parity bits
M4K RAM Block
Functions
• 1-port RAM
• 2-port RAM
• True 2-port RAM
• FIFO
• ROM
• Shift Register
4,608 RAM bits (128 x 36), includes parity bits
M-RAM Block
Functions
• 1-port RAM
• 2-port RAM
• True 2-port RAM
• FIFO
589,824 RAM bits (4K x 144), includes parity bits
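The bit counts quoted for the three TriMatrix blocks follow directly from depth × width. A short check, using only the figures from the slides (the dictionary and function names are illustrative):

```python
# (depth, width) in words x bits, as quoted on the slides; widths include parity.
TRIMATRIX_BLOCKS = {
    "M512": (32, 18),
    "M4K": (128, 36),
    "M-RAM": (4096, 144),   # "4K" taken as 4096 words
}


def total_bits(depth: int, width: int) -> int:
    """Capacity of a RAM block in bits."""
    return depth * width


for name, (depth, width) in TRIMATRIX_BLOCKS.items():
    print(f"{name}: {depth} x {width} = {total_bits(depth, width):,} bits")
```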
MRAM LAB Interface
DSP Blocks
• Eight 9x9 multipliers
• Four 18x18 multipliers
• One 36x36 multiplier
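The 4:1 ratio between 18x18 and 36x36 multipliers matches the usual way a wide product is assembled from four half-width partial products. The sketch below shows that arithmetic for unsigned operands; it illustrates the decomposition only and does not claim to reproduce the DSP block's internal datapath. `mul36_from_18` is a hypothetical name.

```python
def mul36_from_18(a: int, b: int) -> int:
    """Multiply two unsigned 36-bit values using four 18x18 products."""
    mask18 = (1 << 18) - 1
    a_lo, a_hi = a & mask18, (a >> 18) & mask18
    b_lo, b_hi = b & mask18, (b >> 18) & mask18
    # Four 18x18 partial products.
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi
    # Shift each partial product to its weight and sum.
    return (p_hh << 36) + ((p_lh + p_hl) << 18) + p_ll


x, y = 0x123456789, 0xFEDCBA987
print(mul36_from_18(x, y) == x * y)
```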
DSP Block Internals
Add/Sub/Accum
Functions
• Multiplier
• Multiply-Accum
• AB + CD
• AB + CD + EF + GH
DSP Block Interconnect Interface
Architectural Enhancements in Stratix-III™ and Stratix-IV™
D.M. Lewis, et al.,
International Symposium on FPGAs, 2009
Online documentation (Stratix III)
Online documentation (Stratix IV)
New Features
• Programmable power management
• LUT-RAM
• LUT-Register Mode
• Enhanced DSP Block
Programmable Body Bias Control
Large regions
• Less body bias control circuitry
Small regions
• Fine-grained power mgmt
Power Efficiency
LUT-RAM
[Figure: 2-input LUT; inputs x and y select one of four SRAM bits]
Idea
• Use the SRAM bits as memory
• Granularity is LAB-wide
What is needed?
• Write capability
• Signals for address and data for the write path
LUT-RAM Architecture
Supports one read + one write in a single cycle
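The idea of reusing LUT configuration bits as a small memory with one read and one write per cycle can be modelled as below. The depth and width are arbitrary illustrative values, and returning the old data when the read and write addresses collide is an assumption; the slides do not specify read-during-write behavior.

```python
class LutRam:
    """Toy model of a LUT-based RAM with one read port and one write port."""

    def __init__(self, depth: int, width: int):
        self.mask = (1 << width) - 1
        self.cells = [0] * depth          # the SRAM bits, reused as data storage

    def cycle(self, read_addr: int, write_addr=None, write_data: int = 0) -> int:
        """One clock cycle: perform a read and, optionally, a write."""
        read_data = self.cells[read_addr]  # read sees the pre-write contents (assumed)
        if write_addr is not None:
            self.cells[write_addr] = write_data & self.mask
        return read_data


ram = LutRam(depth=16, width=8)            # sizes chosen for illustration only
ram.cycle(read_addr=0, write_addr=3, write_data=0xAB)
print(hex(ram.cycle(read_addr=3)))
```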
MLAB vs. LAB
ALM LUT-Register Mode
https://upload.wikimedia.org/wikipedia/commons/c/c6/R-S_mk2.gif
DSP Block Capabilities
• High-performance, power-optimized, fully registered and pipelined multiplication operations
• Natively supported 9-bit, 12-bit, 18-bit, and 36-bit wordlengths
• Natively supported 18-bit complex multiplications
• Efficiently supported floating-point arithmetic formats (24-bit for single precision and 53-bit for double precision)
• Signed and unsigned input support
• Built-in addition, subtraction, and accumulation units to combine multiplication results efficiently
• Cascading 18-bit input bus to form tap-delay line for filtering applications
• Cascading 44-bit output bus to propagate output results from one block to the next block without external logic support
• Rich and flexible arithmetic rounding and saturation units
• Efficient barrel shifter support
• Loopback capability to support adaptive filtering
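The tap-delay line and the built-in adders mentioned above are the ingredients of a direct-form FIR filter: each new sample shifts down the delay line, every tap is multiplied by its coefficient, and the products are summed. A minimal software sketch follows; the function name is hypothetical, and bit widths, rounding, and saturation are not modelled.

```python
def fir_tap_delay(samples, coeffs):
    """Direct-form FIR filter using an explicit tap-delay line."""
    taps = [0] * len(coeffs)              # delay line, newest sample first
    outputs = []
    for x in samples:
        taps = [x] + taps[:-1]            # shift the new sample into the line
        # One multiply per tap, then a sum of the products.
        outputs.append(sum(c * t for c, t in zip(coeffs, taps)))
    return outputs


# An impulse input reproduces the coefficients at the output.
print(fir_tap_delay([1, 0, 0, 0], [1, 2, 3]))
```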
DSP Block Overview
Multiply-Add
4-Multiply Add w/Accumulation
Cascading Output for FIR Filters
Full DSP Block
Half-DSP Block Architecture
Four 9-bit Independent Half-DSP Multiplier Mode
Three 12-bit Independent Half-DSP Multiplier Mode
Two 18-bit Independent Half-DSP Multiplier Mode
36-bit Half-DSP Multiplier Mode
54x54-bit Multiplier Mode
Used for double-precision floating-point
Architectural Enhancements in Stratix-V™
D.M. Lewis, et al.,
International Symposium on FPGAs, 2013
Online documentation
Larger MLAB/LUT-RAM
4 Flip-Flops per ALM
Embedded Memories with Error Correction Codes (ECC)
