C28x Core - Hostindiaevents

Embedded Processing Portfolio
Software, Tools, Kits & Boards

Microcontroller (MCU) portfolio at-a-glance
• 16-bit Ultra-Low Power & Value Line MCUs: MSP™ MCU
  – MSP430 MCU
  – Applications: measurement, sensing, general purpose, consumer, medical
• 32-bit Real-Time MCUs: C2000™ MCU
  – Delfino, Piccolo single-core MCU
  – Concerto C28x + ARM Cortex™-M
  – Applications: motor control, digital power, lighting, renewable energy, smart grid

ARM® portfolio at-a-glance
• 32-bit MCUs: Stellaris® ARM MCU & Hercules™ Safety ARM MCU
  – ARM Cortex™-M
  – ARM Cortex™-R
• 32-bit Microprocessors: Sitara™ ARM MPU
  – ARM Cortex-A8
  – ARM9™

DSP & ARM® MPU: Digital Signal Processor (DSP) portfolio at-a-glance
• 16/32-bit Single-core DSPs: C6000™ & C5000™ single-core DSP
  – C6000 high-performance fixed/floating-point DSP
  – C5000 ultra-low-power fixed-point DSP
• 32-bit Multicore DSPs: C6000™-based multicore DSP
  – Fixed/floating-point: DSP + ARM, C66x multicore DSP, DaVinci video processors
  – Applications: high-performance real-time computing, video security and analytics, video communications, multimedia infrastructure

Applications listed for the ARM MCU, ARM MPU and single-core DSP columns: industrial automation, motion control, connected audio/voice, point-of-service, human machine interface, video, fingerprint biometrics, portable medical, sensors, portable navigation, smart grid, safety, transportation, industrial & medical
C67x Architecture and Features
C6x VLIW CPU Core
[Block diagram: ’C62x fixed-point CPU core. Blocks shown: program fetch, instruction dispatch, instruction decode, Data Path 1 (A register file; units L1, S1, M1, D1), Data Path 2 (B register file; units D2, M2, S2, L2), control registers, control logic, test, emulation, interrupts]
• DSP architecture challenge:
  – DSP algorithms have a high degree of parallelism
  – Cost-effective control of parallelism is difficult
• VLIW architecture solution:
  – Provides simple, cost-effective control of parallelism
    • Fetches 8 instructions/cycle
    • Executes 1-8 instructions/cycle, reducing code size, program fetches and power consumption
  – Can support high-performance compilers
    • 3x improvement in efficiency based on DSP benchmark suite
  – Can scale to support architectural enhancements
C67x Floating-Point Core
• Performance (Comm/Ind)
  – IEEE floating-point format
    • Double precision
    • Single precision
  – 668 MFLOPS single-precision multiplies and accumulates
    • 2 multipliers (334 MFLOPS)
    • 2 ALUs (334 MFLOPS)
  – 420 MFLOPS, double precision
  – 250 MFLOPS double-precision multiplies and accumulates
    • 1 result/4 cycles (83.5 MFLOPS)
    • 1 result/2 cycles (167 MFLOPS)
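The MFLOPS figures on this slide are consistent with simple arithmetic: clock rate times the number of units, divided by the cycles each unit needs per result. A minimal sketch of that arithmetic follows, assuming a 167 MHz clock (the C6701 rate quoted elsewhere in this deck) and two units per group. The function name `peak_mflops` is illustrative, not a TI API.

```c
#include <assert.h>

/* Peak MFLOPS for a group of identical functional units.
 * clock_mhz         : CPU clock in MHz (167 is assumed for the C6701)
 * units             : number of units working in parallel
 * cycles_per_result : cycles each unit needs per floating-point result
 */
double peak_mflops(double clock_mhz, int units, int cycles_per_result)
{
    assert(units > 0 && cycles_per_result > 0);
    return clock_mhz * units / cycles_per_result;
}
```

With these assumptions, `peak_mflops(167.0, 2, 1)` gives 334, matching the multiplier and ALU figures. The double-precision cases at one result per 4 cycles and one result per 2 cycles give 83.5 and 167.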
VelociTI™: Speed with efficiency
• Execute: CPU executes 1 to 8 instructions/cycle
• As a result, fetch packets can contain multiple execute packets
• Parallelism is determined at compile/assembly time and can be:
  – Fully parallel
  – Fully serial
  – Serial/parallel
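One way to picture how a fetch packet breaks into execute packets is to walk its eight instruction words and cut wherever an instruction is not chained to the next one. The sketch below assumes the C6000 convention that bit 0 of each 32-bit instruction word is the parallel bit (p-bit). The slide itself does not describe the encoding, so treat that detail and the name `split_execute_packets` as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define FETCH_PACKET_WORDS 8  /* the CPU fetches 8 instructions per cycle */

/* Split one fetch packet into execute packets.
 * Assumption: bit 0 of each instruction word is the p-bit; p = 1 means
 * the next instruction executes in the same cycle as this one.
 * sizes[i] receives the instruction count of execute packet i.
 * Returns the number of execute packets
 * (1 = fully parallel, 8 = fully serial).
 */
int split_execute_packets(const uint32_t fp[FETCH_PACKET_WORDS],
                          int sizes[FETCH_PACKET_WORDS])
{
    int count = 0;
    int current = 0;

    assert(fp != 0 && sizes != 0);
    for (int i = 0; i < FETCH_PACKET_WORDS; i++) {
        current++;
        /* The last word always closes a packet in this sketch, i.e.
         * execute packets are not allowed to cross fetch packets. */
        if ((fp[i] & 1u) == 0 || i == FETCH_PACKET_WORDS - 1) {
            sizes[count++] = current;
            current = 0;
        }
    }
    return count;
}
```

A packet whose first seven p-bits are set comes back as one execute packet of eight instructions (fully parallel). All p-bits clear gives eight packets of one (fully serial). Anything in between is the serial/parallel case.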
Floating Point DSP Comparison
Devices compared: C6701B (167 MHz), C6713B (200 MHz), C6727 (250 MHz)
• MIPS: C6701B 167 x 8 = 1336; C6713B 1600; C6727 2000
• MFLOPS: C6701B 1000; C6713B 1200; C6727 1500
• Architecture: C6701B C67x; C6713B C67x; C6727 C67x+
• Memory: C6701B 64KB data memory, 64KB program memory; C6713B 4KB L1-P, 4KB L1-D, 256KB L2 cache/SRAM; C6727 32KB L1-P, 256KB L2 SRAM, 384KB ROM
• HPI: C6701B HPI-16; C6713B 1 32/16-bit; C6727 1 UHPI 32/16-bit
• EMIF: 100 MHz 32-bit (SDRAM) on all three devices
• DMA: C6701B 4-ch DMA; C6713B 16-ch EDMA; C6727 16-ch dMAX
• McBSP: C6701B 2; C6713B 2; C6727 0
• McASP: C6701B 0; C6713B 2; C6727 3
• I2C: C6701B 0; C6713B 2; C6727 3
• SPI: C6701B 0; C6713B 0; C6727 2 (10 MHz)
• Package: C6701B 429-pin ceramic BGA (27 mm, 1.27 mm) and 352-pin plastic BGA (35.2 mm, 1.27 mm); C6713B 272-pin PBGA (27x27 mm, 1.27 mm); C6727 256-pin PBGA (16x16 mm, 1.0 mm)
• Software compatible
• (Ceramic package TBD)
TMS320C672x Device Overview
TMS320C672x Floating-Point DSP
[Block diagram. Blocks shown: C67x+™ DSP core, 256K bytes SRAM, 768K bytes ROM, 32K bytes instruction cache, memory controller, config switch, EMIF, HPI, dMAX (Max, Max, Control), McASP 0, McASP 1, McASP 2, SPI 0, SPI 1, IIC 0, IIC 1, RTI timer]
• 300 MHz DSP core
  – 300 MHz 67x+™ core
  – 64 registers + additional FP instructions
  – Code compatible with 6713 devices
• Large on-chip memory
  – 384KB on-chip ROM
  – 256KB on-chip RAM
  – 32KB instruction cache (internal memory + EMIF)
  – EMIF for expansion
• Enhanced audio I/O
  – 16 serial data pins
  – Up to 6 different clock rates
• dMAX
  – Support for DMA, circular and multi-tap memory delay (for reverb)
• HPI supports mux A/D and nonmux A/D
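The "circular and multi-tap memory delay (for reverb)" that the dMAX is said to support can be pictured with a small software delay line. The sketch below is a plain C illustration of the idea only; it is not the dMAX programming interface, and `DELAY_LEN`, `delay_line`, `delay_write`, `delay_read` and `multi_tap` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_LEN 8  /* hypothetical delay-line length in samples */

typedef struct {
    float  buf[DELAY_LEN];
    size_t head;          /* index of the next write position */
} delay_line;

/* Store one sample, overwriting the oldest entry (circular addressing). */
void delay_write(delay_line *d, float x)
{
    d->buf[d->head] = x;
    d->head = (d->head + 1) % DELAY_LEN;
}

/* Read the sample written `tap` writes ago (tap = 1 is the newest). */
float delay_read(const delay_line *d, size_t tap)
{
    assert(tap >= 1 && tap <= DELAY_LEN);
    return d->buf[(d->head + DELAY_LEN - tap) % DELAY_LEN];
}

/* Sum several delayed taps with per-tap gains, as a simple reverb might. */
float multi_tap(const delay_line *d, const size_t *taps,
                const float *gains, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += gains[i] * delay_read(d, taps[i]);
    return acc;
}
```

Offloading this kind of address wrapping and tap gathering to a DMA engine leaves the CPU free for the arithmetic, which is presumably why the slide lists it as a dMAX feature.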
Memory Architecture
• New memory architecture
  – Improved instruction cache
    • Size increased from 4KB to 32KB
    • Cache miss penalty to internal memory reduced by 40%
    • Supports internal RAM/ROM and EMIF
  – Direct single-level flat memory for data, single-cycle access (ROM and RAM)
  – All RAM and ROM is accessible as program or data (like C6713)
Enhancements – DP, Code Density
• Changes in 67x+
  – All changes are backwards compatible with the 67x CPU (C6713)
  – General-purpose registers increased from 32 to 64
  – New MPYSPDP instruction – SP x DP into DP
  – New MPYSP2DP instruction – SP x SP into DP
  – Additional ADDSP, ADDDP, SUBSP, SUBDP in S unit
    • Now have 4 floating-point adds or subtracts in parallel
  – Execution packets can span fetch packets (64x feature)
    • Code size reduction (5 to 10%) since no padding with NOPs
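The value of an "SP x SP into DP" multiply is easiest to see numerically: the exact product of two single-precision values needs up to 48 significand bits, which fits in a double but not in a float. The sketch below shows the semantics in portable C. It does not use any TI intrinsic, and the function names are illustrative.

```c
/* Plain single-precision multiply: the product is rounded to float. */
float mpy_sp(float a, float b)
{
    volatile float r = a * b;  /* volatile forces rounding to float */
    return r;
}

/* SP x SP into DP, the operation the slide attributes to MPYSP2DP:
 * both inputs are single precision, the product is kept in double. */
double mpy_sp2dp(float a, float b)
{
    return (double)a * (double)b;
}
```

For an input such as `1.0f + 1.0f / 8388608.0f` squared, the double result keeps the low-order term that the float result rounds away. That matters when products are accumulated over long filters.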
Benchmark Performance
Performance: The BDTImark™
Real block FIR filter
Complex block FIR filter
Single-sample LMS-adaptive FIR filter
Single-sample real FIR filter
Single-sample IIR filter
Vector dot product
Vector add
Vector maximum
IS-54 convolutional encoder
Finite state machine
256-point FFT
Berkeley Design Technology, Inc. - Berkeley, CA
’C67x: Floating-point performance*
BDTImark™: A DSP Speed Metric
[Bar chart of BDTImark scores. Processors shown: Intel Pentium (200 MHz), ADI ADSP-2106x (60 MIPS), TI TMS320C3X (30 MIPS, 80 MFLOPS), TI TMS320C4X (25 MIPS, 60 MFLOPS), TI TMS320C67x (1 GFLOPS). Bar values in source order: 23, 17, 9, 7, 65]
Source: www.BDTI.com. ©1999 BDTI
*Commercial Temp
’C67x: Benchmark performance*
Floating-Point Performance
[Bar chart: execution time (in Sec) for complex radix-4 FFT, block FIR, convolution and matrix-vector multiply, comparing the TI TMS320C6701 (1 GFLOPS) with a typical floating-point DSP (60 MFLOPS). Values in source order: 108.33, 1,672, 13.296, 149, 0.828, 16.6, 0.420, 1.25]
*Commercial Temp
C28x Digital Signal Controller
TMS320F2812
[Block diagram. CPU: 150 MIPS C28x™ 32-bit DSP with 32x32-bit multiplier, RMW atomic ALU, 32-bit register file, 32-bit timers (3), real-time JTAG, interrupt management. Memory: 128Kw flash + 1Kw OTP, 18Kw RAM, 4Kw boot ROM. Other blocks: Event Mgr A, Event Mgr B, XINTF, 12-bit ADC, watchdog, GPIO, McBSP, CAN 2.0B, SCI/UART-A, SCI/UART-B, SPI. Buses: memory bus, peripheral bus]
TMS320F2812 Features and Benefits
• Feature: 150-MHz C28x 32-bit DSP core
  Benefit: The C28x 32-bit DSP core enables high-speed execution of control algorithms. Faster control code execution gives headroom for advanced control techniques, enabling great efficiency and cutting-edge features.
• Feature: Unique control peripherals
  Benefit: The 12-bit high-speed dual-sample-hold ADC allows simultaneous sampling of power system currents and voltages; Event Manager modules provide a hardware interface for sensored or sensorless three-phase inverter control.
• Feature: On-chip communication peripherals
  Benefit: CAN, I2C, SPI, UART, and an external memory interface allow for a full system implementation.
C28x CPU
• 32-bit fixed-point DSP
• RISC instruction set
• 8-stage protected pipeline
• 32x32 bit fixed-point MAC for single-cycle 32-bit multiply
• Dual 16x16 bit fixed-point MACs
• Single-cycle instruction execution
Modified Harvard Bus Architecture
• Separate data and instruction buses
• Two data buses – one for read, one for write
• Enables fetch, read, and write in a single cycle
• Essential to maximizing single-cycle MAC
Emulation Logic
• Real-time emulation allows interrupt servicing even when the main program is halted
• Debug host has direct access to registers and memory
• Multiple hardware debug events and breakpoints
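To make the two MAC styles on this slide concrete, the sketch below models a 32x32 multiply-accumulate and a dot product that forms two 16x16 products per loop pass. It is ordinary C for illustration only. It does not model the C28x accumulator width, saturation, or instruction timing, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 32x32 multiply-accumulate step using a 64-bit accumulator. */
int64_t mac32(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}

/* Dot product of 16-bit vectors, two 16x16 products per loop pass,
 * mirroring the "dual 16x16 MAC" idea. n must be even and small
 * enough that the 32-bit partial sums do not overflow. */
int32_t dot16_dual(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc0 = 0, acc1 = 0;

    assert(n >= 0 && n % 2 == 0);
    for (int i = 0; i < n; i += 2) {
        acc0 += (int32_t)x[i]     * y[i];      /* first MAC  */
        acc1 += (int32_t)x[i + 1] * y[i + 1];  /* second MAC */
    }
    return acc0 + acc1;
}
```

Keeping two independent partial sums is what lets a dual-MAC datapath retire two products per cycle. The sums are combined only once, after the loop.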
C28x Core: Bus Structure
[Block diagram. Buses: Program Address Bus (22), Program Data Bus (32), Data Address Bus (32), Data Data Bus (32), Data Write Bus (32), Program Write Bus (32), Register Bus. Memory spaces: Program (4M x 16), Data (4G x 16). Registers: ARAU, SP, DP @X, XAR0 to XAR7. Execution: MPY32x32, XT, P, ACC, ALU, R-M-W atomic ALU. Debug: real-time JTAG emulation and test engine. Connected blocks: memory, standard peripherals, external interfaces]
The C28x multiple-bus architecture makes better use of processor cycles: instruction fetch, decode and execute can happen in the same clock cycle.
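The memory sizes on this slide follow directly from the address-bus widths: a 22-bit program address reaches 4M words and a 32-bit data address reaches 4G words, each word being 16 bits. A one-function sketch of that arithmetic, with an illustrative name:

```c
#include <assert.h>
#include <stdint.h>

/* Number of addressable words for a given address-bus width in bits. */
uint64_t addressable_words(unsigned address_bits)
{
    assert(address_bits < 64);
    return (uint64_t)1 << address_bits;
}
```

`addressable_words(22)` is 4,194,304 (4M) and `addressable_words(32)` is 4,294,967,296 (4G).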
C28x Core: Protected Pipeline
[Pipeline diagram: instructions A to G each pass through the stages F1, F2, D1, D2, R1, R2, X, W, offset by one cycle per instruction. Annotations: E and G access the same address; writes are "free"]
8-stage pipeline
  F1: Instruction address
  F2: Instruction content
  D1: Decode instruction
  D2: Resolve operand address
  R1: Operand address
  R2: Get operand
  X: CPU doing "real" work
  W: Store content to memory
Protected pipeline
  – Order of results is as written in source code
  – Programmer need not worry about the pipeline
Many MCUs
  – Shared bus for program and data address and content
  – Typically results in only one instruction in 4 cycles
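The contrast between an 8-stage pipeline and a machine that completes "only one instruction in 4 cycles" can be put in numbers. The sketch below assumes an ideal pipeline with no stalls, which real code will not always achieve (for example when two instructions touch the same address). The function names are illustrative.

```c
#include <assert.h>

/* Cycles to complete n instructions on an ideal pipeline: the first
 * instruction takes `stages` cycles, each later one adds one cycle. */
unsigned long pipelined_cycles(unsigned long n, unsigned stages)
{
    assert(stages > 0);
    return n == 0 ? 0 : n + stages - 1;
}

/* Cycles when every instruction occupies the machine for a fixed count. */
unsigned long serial_cycles(unsigned long n, unsigned cycles_per_instr)
{
    return n * cycles_per_instr;
}
```

For 100 instructions this gives 107 cycles on the 8-stage pipeline against 400 cycles at four cycles per instruction.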
C28x Core: Instruction set for Control
Read/Modify/Write and Atomic Operation
Offers sufficient hardware resources to efficiently handle control algorithms
[Diagram labels: LOAD, STORE, READ, WRITE, Atomic, Registers, CPU ALU / MPY, Memory]
Atomic Instructions Benefits:
  – Simpler programming
  – Smaller, faster code
  – Non-interruptible operations
RISC Read/Modify/Write (6 words / 5 cycles)
    SETC  INTM
    MOV   AL,*XAR2
    AND   AL,#1234h
    MOV   *XAR2,AL
    CLRC  INTM
DSP Read/Modify/Write (5 words / 4 cycles)
    SETC  INTM
    AND   AL,*XAR2,#1234h
    MOV   *XAR2,AL
    CLRC  INTM
C28x Atomic Operation (2 words / 1 cycle)
    AND   *XAR2,#1234h
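The same distinction can be shown in C. The first function below is the three-step load/modify/store that needs interrupts masked around it. The second uses C11 `atomic_fetch_and` as a portable analogue of a single atomic AND on memory. This is a host-side sketch only: it is not C28x compiler output, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Non-atomic version: load, modify, store as three separate steps.
 * An interrupt between the load and the store could lose an update,
 * which is why the RISC sequence brackets it with SETC/CLRC INTM. */
void and_mask_rmw(volatile uint16_t *reg, uint16_t mask)
{
    uint16_t tmp = *reg;   /* MOV AL,*XAR2  */
    tmp &= mask;           /* AND AL,#mask  */
    *reg = tmp;            /* MOV *XAR2,AL  */
}

/* Atomic version: one indivisible read-modify-write, the portable
 * C11 analogue of AND *XAR2,#1234h. Returns the previous value. */
uint16_t and_mask_atomic(_Atomic uint16_t *reg, uint16_t mask)
{
    return atomic_fetch_and(reg, mask);
}
```

Both produce the same final value in the absence of interrupts. The difference is that the atomic form needs no interrupt masking to stay correct.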
C28x Core: Fast Interrupt Response
PIE: Peripheral Interrupt Expansion
[Diagram: interrupt sources routed through the PIE (Peripheral Interrupt Expansion) block to the C28x core]
• Internal sources: TINT2, TINT1, TINT0; EV and non-EV peripherals (EV, ADC, SPI, SCI, McBSP, CAN)
• External sources: XINT1, XINT2, PDPINTx, RS, XNMI_XINT13
• C28x core interrupt inputs: RS, NMI, INT1, INT2, INT3, ..., INT12, INT13, INT14
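A simple way to think about an interrupt expansion block is as a set of per-group flag and enable registers that are scanned for the most urgent pending source. The model below is a sketch under stated assumptions: twelve groups (one per core line INT1 to INT12) of eight sources each, with the lowest group and lowest bit treated as highest priority. None of these details are given on the slide, and `pie_model` and `pie_next` are hypothetical names, not TI register definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PIE_GROUPS 12  /* assumption: one group per core line INT1..INT12 */
#define PIE_LINES   8  /* assumption: eight peripheral sources per group  */

typedef struct {
    uint8_t ifr[PIE_GROUPS];  /* pending flags, bit 0 = first source */
    uint8_t ier[PIE_GROUPS];  /* enable bits, same layout            */
} pie_model;

/* Find the highest-priority source that is both pending and enabled.
 * On success stores the 1-based group and line and returns 1;
 * returns 0 when nothing is pending. */
int pie_next(const pie_model *p, int *group, int *line)
{
    assert(p != 0 && group != 0 && line != 0);
    for (int g = 0; g < PIE_GROUPS; g++) {
        unsigned active = (unsigned)(p->ifr[g] & p->ier[g]);
        for (int b = 0; b < PIE_LINES; b++) {
            if (active & (1u << b)) {
                *group = g + 1;
                *line  = b + 1;
                return 1;
            }
        }
    }
    return 0;
}
```

In hardware this selection happens in parallel rather than in a loop, which is part of what keeps the interrupt response fast.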
C28x Core: Fast Interrupt Response
Latency is Minimized
[Timing diagram: an external signal (INTx) or internal signal is synchronized and sets IFR (PIE hardware: sync, set PIEIFR); the interrupt is jammed into the pipeline, followed by vector fetch and auto context save, then decode of the first ISR instruction. Cycle counts shown: 2, 1, 8, 1]
• Latency: time from when an interrupt occurs to decoding (D2) the first ISR instruction
• Minimum latency:
  – Internal peripherals: 10-14 cycles (100 ns @ 100 MHz)
  – External signals: 11 cycles (110 ns @ 100 MHz)
• Maximum latency: depends on wait states, ready, INTM, etc.
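The nanosecond figures quoted for minimum latency are a direct conversion from cycles at the stated 100 MHz clock. A small helper, with an illustrative name, makes the conversion explicit:

```c
#include <assert.h>

/* Convert a cycle count to nanoseconds at a given clock rate in MHz. */
double cycles_to_ns(unsigned cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles * 1000.0 / clock_mhz;
}
```

At 100 MHz, 10 cycles is 100 ns and 11 cycles is 110 ns, matching the slide. The upper end of the internal range, 14 cycles, works out to 140 ns.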
C2000™ real-time controllers software
Software Highlights
• ControlSuite™ Software
  – Software infrastructure and tools for every stage of development and evaluation
  – Allows customers to focus on differentiation, not basics
  – Key functional areas:
    • Device Support (Bit fields, API Drivers, Examples)
    • Library Repository (Math Library, DSP Library, Application Library, Utilities)
    • Development Kits (Hardware Package, Software Examples, Complete System Framework, Graphical User Interfaces)
    • Debug and Software Tools (IDE, RTOS, Emulation
• Integrated Development Environment (IDE)
  – Eclipse-based Code Composer Studio™ IDE supports all
• Application Specific Software:
  – Motor Control Software Library
    • Supports multiple motor types and control techniques (ex: FOC (sensored and sensorless) for ACI, PMSM
  – Digital Power Software Library
    • Library for both C28x Core and CLA
Getting Started: ControlSuite, Application Notes, Users Guide
Tools/Reference Designs: ControlSticks, ControlCards, Evaluation Kits, Development Tools
Tools
• Code Composer is an Integrated Development Environment (IDE) similar to MS Visual C++ and built specifically for DSP
• DSP/BIOS is a library of scheduling, instrumentation, and communications functions that provides real-time analysis and RTDX™ (Real-Time Data Exchange)
• Hardware emulation and evaluation tools allow code debug on actual silicon and low-cost analysis of performance in early stages of the development cycle
• Code Composer Studio provides an extensible tool plug-in and seamless integration between the host and target DSP tools
CCSv4/v5
• Perspectives contain separate window arrangements depending on what you are doing
• Customize toolbars & menus
• Tab data displays together to save space
• Tabbed editor windows
• Fast view windows don’t display until you click on them
10/19/11
Code Composer Studio v5
CCSv5 is split into two phases
• 5.0
  – Not a replacement for CCSv4
  – Targeted at users who are using devices running Linux & multi-core C6000
  – Addresses a need (Linux debug) that is not supported by CCSv4
  – Available today
• 5.1
  – Replacement for CCSv4, targeted at all users
  – Available fall 2011
• Supports both Windows & Linux
  – Note that not all emulators will be supported on Linux
    • SD DSK/EVM onboard emulators, XDS560 PCI are not supported
  – Most USB/LAN emulators will be supported
    • XDS100, SD 510USB/USB+, 560v2, BH 560m/bp/lan
  – http://processors.wiki.ti.com/index.php/Linux_Host_Support
Code Composer Studio v4
• Easy-to-use, Eclipse-based IDE: compiler, linker, more
• Supports all MSP430 MCUs
• Enhancements since CCE v3:
  – Speed
  – Code size improvements
  – Auto-updating
• $495 for CCS v4 MCU Edition
• Free for apps <16KB
• Same look and feel as Code Composer Essentials
http://wiki.msp430.com/wiki/index.php?title=Category:Code_Composer_Studio_v4
Analyze: Visualize data
Graphical Signal Analysis:
  – View signals in native format
  – Change variables on the fly and see their effects
  – Numerous application-specific graphical plots
    • FFT waterfall
    • Eye diagram
    • Constellation plot
    • Image displays & more
  – Requires no additional code
BACKUP
C6701 DSP Block Diagram
C672x DSP Block Diagram
THANK YOU