Mid Semester Presentation - High Speed Digital Systems Lab

Yaron Doweck
Yael Einziger
Supervisor: Mike Sumszyk
Spring 2011
Semester Project
1.Project Goals
2.Development Tools
3.Learning Steps
4.What’s next
* Learn to use the new TI C66 platform and to
exploit its abilities and advantages.
* Implement a Real-Time computer vision
algorithm using multi-core programming.
2.Development Tools
TMS320C6678 Multicore Fixed and
Floating-Point Digital Signal Processor
Code Composer Studio v5 with
* 8 C66x CorePac DSPs
* Based on TI’s KeyStone Multicore Architecture
* 320 GMAC/160 GFLOP @ 1.25GHz
* 32KB L1P, 32KB L1D, 512KB L2 Per Core
* 4MB Shared L2
* 64-Bit DDR3 Interface (DDR3-1600)
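The peak figures above can be reproduced with simple arithmetic. The sketch below assumes the per-core, per-cycle rates obtained by dividing the slide's totals by 8 cores and 1.25 GHz (32 MACs and 16 floating-point operations per cycle); those rates and the function name `peak_giga_ops` are illustrative assumptions, not values stated on the slide.

```c
#include <assert.h>

/* Peak throughput in giga-operations per second.
 * ops_per_cycle is an assumed per-core rate, back-derived from the
 * slide's totals: 320 / (8 * 1.25) = 32 and 160 / (8 * 1.25) = 16. */
double peak_giga_ops(unsigned cores, double clock_ghz, unsigned ops_per_cycle)
{
    assert(clock_ghz > 0.0);
    return cores * clock_ghz * ops_per_cycle;
}
```

With these assumptions, `peak_giga_ops(8, 1.25, 32)` gives the 320 GMAC figure and `peak_giga_ops(8, 1.25, 16)` gives the 160 GFLOP figure.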
3.Learning Steps
1. CCS Simulator and Profiler
2. Cache configuration
3. DMA data transfer
4. Interrupts
5. Fixed and Floating point libraries
(DSPLib, IMGLib, VLib,…)
6. SYS/BIOS
7. Multi-core programming
*The CCS V5 can simulate the C6678
processor and some peripherals.
*The profiler analyzes execution time
and statistics for functions and code
*Graph viewer – plots data from memory
in the time or frequency domain.
*Image Analyzer – displays an image
stored in memory or in a file.
Supports grayscale, RGB and YUV images.
*32 KB L1P cache. L1P is read-allocate and
direct mapped.
*32 KB L1D cache. L1D is read-allocate, writeback and 2-way set associative.
*Each can be configured as 0, 4, 8, 16 or 32
KB cache.
*512KB L2 cache. L2 is read and write allocate
and 4-way set associative.
*L2 can be configured as 0, 32, 64, 128, 256
or 512 KB cache.
*All configurations can be changed at run time.
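To make the direct-mapped and set-associative terms concrete, here is a minimal sketch of how an address maps to a cache set. The line sizes used in the usage note (32 bytes for L1P, 64 bytes for L1D) are assumptions not given on the slide, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a cache configuration. */
typedef struct {
    uint32_t size_bytes;  /* configured cache size, e.g. 32 * 1024 */
    uint32_t line_bytes;  /* cache line size (assumed, not from the slide) */
    uint32_t ways;        /* 1 = direct mapped, 2 = 2-way set associative */
} CacheGeometry;

/* Number of sets = total size / (line size * associativity). */
uint32_t cache_num_sets(const CacheGeometry *g)
{
    assert(g->line_bytes > 0 && g->ways > 0);
    return g->size_bytes / (g->line_bytes * g->ways);
}

/* Set that a byte address falls into. Addresses with the same set index
 * compete for the same 'ways' lines. */
uint32_t cache_set_index(const CacheGeometry *g, uint32_t addr)
{
    uint32_t sets = cache_num_sets(g);
    assert(sets > 0);
    return (addr / g->line_bytes) % sets;
}
```

Under these assumptions a 32 KB direct-mapped L1P has 1024 sets, so two code addresses exactly 32 KB apart evict each other. A 32 KB 2-way L1D has 256 sets, and two buffers 16 KB apart share a set but can both stay resident.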
*Configuring different L1 and L2 cache
sizes during or before run time.
*Using L1 and L2 as SRAM memory (entirely
SRAM, or part SRAM and part cache).
*Controlling variable locations (L1, L2 or
DDR3 memories).
*C66x processors have 3 EDMA3 controllers,
each with 64 DMA channels + 8 QDMA channels.
*EDMA3 supports data transfer to/from
cache, shared memory or external memory.
*EDMA3 supports the use of hardware
*In addition, each core has a faster IDMA
controller for internal transfers.
*Using IDMA to transfer data inside a
core (L2↔L1).
*Using EDMA3 to transfer data to/from L1,
L2 and DDR3.
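The kind of block transfer a DMA engine performs can be modeled on a host in plain C. The sketch below is a software model only, not the TI EDMA3 driver API: the struct and function names are hypothetical, and the fields merely mirror the EDMA3 parameter names ACNT, BCNT, SRCBIDX and DSTBIDX to show how a strided 2-D copy walks memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of a 2-D transfer descriptor. */
typedef struct {
    const uint8_t *src;
    uint8_t *dst;
    uint16_t acnt;      /* bytes in one contiguous array */
    uint16_t bcnt;      /* number of arrays in the frame */
    int16_t  src_bidx;  /* byte stride between source arrays */
    int16_t  dst_bidx;  /* byte stride between destination arrays */
} Transfer2D;

/* Copy bcnt arrays of acnt bytes each, stepping by the B indexes. */
void transfer2d_run(const Transfer2D *t)
{
    assert(t->src != NULL && t->dst != NULL);
    for (uint16_t b = 0; b < t->bcnt; ++b) {
        memcpy(t->dst + (ptrdiff_t)b * t->dst_bidx,
               t->src + (ptrdiff_t)b * t->src_bidx,
               t->acnt);
    }
}

/* Extract a tile_w x tile_h tile at (x, y) from an 8-bit image that is
 * 'width' bytes wide, packing the tile contiguously in 'tile'. */
void copy_tile(const uint8_t *image, int width, int x, int y,
               int tile_w, int tile_h, uint8_t *tile)
{
    Transfer2D t;
    t.src = image + (ptrdiff_t)y * width + x;
    t.dst = tile;
    t.acnt = (uint16_t)tile_w;
    t.bcnt = (uint16_t)tile_h;
    t.src_bidx = (int16_t)width;   /* jump one full image row */
    t.dst_bidx = (int16_t)tile_w;  /* tile rows are packed */
    transfer2d_run(&t);
}
```

On the real device the descriptor would be written to the DMA controller and the copy would proceed without CPU involvement. The model only illustrates the addressing pattern, for example moving an image tile from DDR3 into a packed buffer in L2.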
The interrupt controller
supports up to 128
system events. They
consist of both
internal events (generated
within the C66x CorePac)
and chip-level events.
The interrupt controller
maps the system events to
15 output signals to the core:
*One maskable hardware exception
*12 maskable hardware interrupts
*One non-maskable signal
*One reset signal
*Configuring manually triggered events.
*Configuring EDMA transfer completion routine using
EDMA system event.
*DSPLib – an optimized DSP function
library that includes general-purpose
signal-processing routines for real-time
applications.
*IMGLib – an optimized image/video
processing function library that includes
general-purpose image/video processing
routines for real-time applications.
Some more libraries
*VLib – a collection of computer vision
algorithms that are optimized for TI DSPs.
*IQMath – a collection of highly optimized fixed
point arithmetic, trigonometric and
mathematical functions, typically used in
real-time applications.
*fastMath – optimized floating point arithmetic
and trigonometric functions.
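As a reminder of what fixed-point arithmetic involves, here is a minimal Q15 sketch in portable C. It is a generic illustration and not the IQMath API; all names are hypothetical, and the right shift assumes the compiler implements an arithmetic shift for negative values, as common compilers do.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: 16-bit signed value representing a fraction in [-1, 1). */
typedef int16_t q15_t;

/* Clamp a 32-bit intermediate to the Q15 range. */
q15_t q15_saturate(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (q15_t)v;
}

/* Convert a float in [-1, 1] to Q15 with rounding to nearest. */
q15_t q15_from_float(float x)
{
    assert(x >= -1.0f && x <= 1.0f);
    float scaled = x * 32768.0f;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    return q15_saturate((int32_t)scaled);
}

float q15_to_float(q15_t q)
{
    return (float)q / 32768.0f;
}

/* Multiply two Q15 values: the 32-bit product is in Q30, so add half an
 * LSB for rounding, shift back by 15 and saturate. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p += 1 << 14;
    return q15_saturate(p >> 15);  /* assumes arithmetic right shift */
}
```

For example, 0.5 times 0.5 is computed as `q15_mul(16384, 16384)`, which yields 8192, the Q15 encoding of 0.25. The one case that overflows, -1 times -1, saturates to 32767.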
* Using DSPLib for a simple signal-processing
application with floating point arrays.
* Using IMGLib for a simple image-processing
application.
Still left:
* Studying the VLib, IQMath and fastMath libraries.
* Comparing actual running times to the running times
specified in the User Guide.
*SYS/BIOS is a real-time operating system
designed to be used by applications that
require real-time scheduling and
synchronization.
*SYS/BIOS provides preemptive multi-threading,
hardware abstraction, real-time analysis, and
configuration tools.
*SYS/BIOS is designed to minimize memory and
CPU requirements on the target.
*Using SYS/BIOS modules to configure
DSP’s memory (cache sizes, memory
sections, heap and stack size).
*Running a multi-threaded program with
protected shared variables.
Still left:
*Using SYS/BIOS modules to configure
DSP peripherals (LAN, SRIO, PCIe).
1. CCS Simulator and Profiler – done
2. Cache configuration – done
3. DMA data transfer – done
4. Interrupts – done
5. Fixed and Floating point libraries
(DSPLib, IMGLib, VLib,…) – in progress
6. SYS/BIOS – in progress
7. Multi-core programming
4.What’s next
1. Implementation of a bidirectional data flow
between DDR3 and L1, possibly through L2. (3
weeks)
2. Performance analysis (throughput, latency and
accuracy) when using floating point versus fixed
point libraries. (2 weeks)
3. Usage of hardware semaphores for parallel data
access and Multicore Navigator for message
communication between different
cores. (4 weeks)
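The idea behind guarding shared data with a semaphore can be sketched in portable C11. This is a host-side model built on `atomic_flag`, not the device's hardware semaphore interface; the table size and all names are hypothetical. The try-acquire call either wins the semaphore immediately or reports that another owner holds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_SEMAPHORES 32u  /* hypothetical count for this model */

static atomic_flag sem_table[NUM_SEMAPHORES];
static long shared_counter;  /* data protected by a semaphore */

/* Put every semaphore in the free state. */
void sem_init_all(void)
{
    for (unsigned i = 0; i < NUM_SEMAPHORES; ++i) {
        atomic_flag_clear(&sem_table[i]);
    }
    shared_counter = 0;
}

/* Returns true if the caller now owns the semaphore, false if it was
 * already taken. The test-and-set is atomic, so only one caller wins. */
bool sem_try_acquire(unsigned id)
{
    assert(id < NUM_SEMAPHORES);
    return !atomic_flag_test_and_set(&sem_table[id]);
}

void sem_release(unsigned id)
{
    assert(id < NUM_SEMAPHORES);
    atomic_flag_clear(&sem_table[id]);
}

/* Update the shared counter only while holding the semaphore. */
bool shared_counter_add(unsigned sem_id, long delta)
{
    if (!sem_try_acquire(sem_id)) {
        return false;  /* busy: caller may retry later */
    }
    shared_counter += delta;
    sem_release(sem_id);
    return true;
}

long shared_counter_get(void)
{
    return shared_counter;
}
```

A core that fails to acquire the semaphore can retry or do other work. On real hardware the same pattern would apply, with the atomic flag replaced by the semaphore peripheral.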