Architecture and Instruction Set of the C6x Processor
Module 1

Reference
• R. Chassaing, DSP Applications Using C and the TMS320C6x DSK, Wiley, 2002

• DSP
• TMS320 Introduction
• Architecture
• Functional Unit
• Fetch & Execute Packet
• Pipelining
• Registers
• Addressing Modes

DSP
• Digital signal processing: the application of mathematical operations to digitally represented signals.
• Signals are represented digitally as sequences of samples.
• Digital signal processor: an electronic system that processes digital signals.

DSP System

DSP tasks
• Most DSP tasks require
– Repetitive numeric computation
– Real-time processing
– High memory
– System flexibility
• A DSP must perform these tasks efficiently while minimizing
– Cost
– Power
– Memory use
– Development time

TMS DSP IC
• TMS 320 C6X
– TMX – experimental device
– TMP – prototype
– TMS – qualified device
– 320 – TI DSP family
– C – CMOS with ROM
– E – CMOS with EPROM
– 6 – generation
– X – version number

TMS320 Introduction
• Texas Instruments introduced the first-generation TMS32010 digital signal processor in 1982, the TMS320C25 in 1986, and the TMS320C50 in 1991.
• These 16-bit processors are all fixed-point processors and are code-compatible.

Von Neumann vs. Harvard
• The fixed-point processors C1x, C2x, and C5x are based on a modified Harvard architecture with separate memory spaces for data and instructions that allow concurrent accesses.
• Quantization error or round-off noise from an ADC is a concern with a fixed-point processor.
• The TMS320C30 floating-point processor was introduced in the late 1980s.
• The TMS320C6201 (C62x) was announced in 1997.
• The C62x is based on a very-long-instruction-word (VLIW) architecture, still using separate memory spaces for instructions and data as with the Harvard architecture.
• The C62x is not code-compatible with the previous generation of fixed-point processors.

TMS320C6x ARCHITECTURE
• The TMS320C6711 is a floating-point processor based on the VLIW architecture.
• Internal memory includes a two-level cache architecture with 4 kB of level 1 program cache (L1P), 4 kB of level 1 data cache (L1D), and 64 kB of RAM or level 2 cache for data/program allocation (L2).
• It has a direct interface to both synchronous and asynchronous memories.
• On-chip peripherals include two multichannel buffered serial ports (McBSPs), two timers, a 16-bit host port interface (HPI), and a 32-bit external memory interface (EMIF).
• It requires 3.3 V for I/O and 1.8 V for the core (internal).
• Internal buses
– one 32-bit program address bus
– one 256-bit program data bus (eight 32-bit instructions)
– two 32-bit data address buses
– two 64-bit data buses
– two 64-bit store data buses
• With a 32-bit address bus, the total memory space is 2^32 bytes = 4 GB, including four external memory spaces: CE0, CE1, CE2, and CE3.

Three access levels of the memory map
1. L1 memory
 - Cache-based architecture
 - Program cache and data cache
 - Size: program cache 4 kB, data cache 4 kB
2. L2 memory
 - Size: 64 kB
 - Program and data
3. L3 memory
 - External memory

Internal Memory
• Independent memory banks on the C6x allow two memory accesses within one instruction cycle.
• Two independent memory banks can be accessed using two independent buses.
• Two load or two store instructions can be performed in parallel.
• No conflict results if the data accessed are in different memory banks.
• Separate buses for program, data, and direct memory access (DMA) allow the C6x to perform concurrent program fetches, data reads and writes, and DMA operations.
• The C6x has a byte-addressable memory space.
• Internal memory is organized as separate program and data memory spaces, with two 32-bit internal ports (two 64-bit ports with the C64x) to access internal memory.
• With a clock of 150 MHz onboard the DSK, one can ideally achieve two multiplies and accumulates per cycle, for a total of 300 million multiplies and accumulates (MACs) per second.
• With six of the eight functional units capable of handling floating-point operations, it is possible to perform 900 million floating-point operations per second (MFLOPS).
• With all eight functional units issuing an instruction every cycle, the peak rate is 1200 million instructions per second (MIPS).

FUNCTIONAL UNITS
• The CPU consists of eight independent functional units divided into two data paths.
• Each path has a unit for
– multiply operations (.M),
– logical and arithmetic operations (.L),
– branch, bit manipulation, and arithmetic operations (.S),
– loading/storing and arithmetic operations (.D).
• The .S and .L units are for arithmetic, logical, and branch instructions.
• All data transfers make use of the .D units.
• Arithmetic operations such as subtract or add (SUB or ADD) can be performed by all the units except the .M units.
• The eight functional units consist of four floating/fixed-point ALUs (two .L and two .S), two fixed-point ALUs (.D units), and two floating/fixed-point multipliers (.M units).
• Each path includes a set of sixteen 32-bit registers, A0 through A15 and B0 through B15.
• Two cross-paths (1x and 2x) allow functional units from one data path to access a 32-bit operand from the register file on the opposite side.
• There are 32 general-purpose registers, but some of them are reserved for specific addressing or are used for conditional instructions.
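The peak-rate figures quoted for the 150 MHz DSK follow directly from the unit counts. The short sketch below reproduces that arithmetic; the constant and function names are illustrative only, and the results are ideal upper bounds that assume every counted unit completes one operation per cycle.

```python
# Peak-rate arithmetic for the C6711 on the DSK, using the figures from the text.
CLOCK_HZ = 150_000_000   # DSK clock: 150 MHz
FUNCTIONAL_UNITS = 8     # .L, .S, .M, .D on each of the two data paths
FP_CAPABLE_UNITS = 6     # two .L, two .S, two .M
MACS_PER_CYCLE = 2       # two multiplies and accumulates per cycle


def peak_millions_per_second(ops_per_cycle, clock_hz=CLOCK_HZ):
    """Ideal rate in millions of operations per second (no stalls assumed)."""
    return ops_per_cycle * clock_hz // 1_000_000


mips = peak_millions_per_second(FUNCTIONAL_UNITS)
mflops = peak_millions_per_second(FP_CAPABLE_UNITS)
mmacs = peak_millions_per_second(MACS_PER_CYCLE)
print(mips, mflops, mmacs)  # 1200 900 300
```

Real code rarely keeps all eight units busy in every cycle, so these numbers are ceilings rather than expected throughput.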
VelociTI™
• The VLIW modification done by TI is called VelociTI. It
– reduces code size
– increases performance when instructions reside off-chip
• The C6x architecture is based on the high-performance advanced VelociTI very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI).
• It is an excellent choice for multichannel and multifunction applications (several instructions are captured and processed simultaneously).

FETCH AND EXECUTE PACKETS
• The VelociTI architecture, introduced by TI, is derived from the VLIW architecture.
• A fetch packet (FP) consists of eight 32-bit instructions, fetched together over the 256-bit program data bus.
• An execute packet (EP) consists of a group of instructions that can be executed in parallel within the same cycle time.
• The number of EPs within an FP can vary from one to eight.
• The VLIW architecture was modified to allow more than one EP to be included within an FP.
• Bit 0 (the LSB) of every 32-bit instruction is the "p" bit. It signals whether the next instruction belongs to the same EP (p = 1) or starts the next EP (p = 0).
• Example: for an FP holding instructions A through H, EP1 contains the two parallel instructions A and B; EP2 contains the three parallel instructions C, D, and E; and EP3 contains the three parallel instructions F, G, and H.
• The "p" bit of instruction B is zero, denoting that it is not within the same EP as the subsequent instruction C.
• Similarly, instruction E is not within the same EP as instruction F.
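The p-bit rule can be checked with a small sketch. It takes a fetch packet as a list of (name, p bit) pairs and splits it into execute packets. The function name and the data layout are assumptions made for illustration; real hardware reads the p bit from bit 0 of each 32-bit opcode.

```python
def split_execute_packets(fetch_packet):
    """Group instructions into execute packets using each instruction's p bit.

    fetch_packet: list of (name, p_bit) tuples in program order.
    p_bit == 1 means the next instruction runs in parallel with this one;
    p_bit == 0 closes the current execute packet.
    """
    packets, current = [], []
    for name, p_bit in fetch_packet:
        current.append(name)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:  # chain left open by a trailing p = 1 (not expected at FP end)
        packets.append(current)
    return packets


# The example from the text: A || B, then C || D || E, then F || G || H.
fp = [("A", 1), ("B", 0), ("C", 1), ("D", 1), ("E", 0),
      ("F", 1), ("G", 1), ("H", 0)]
print(split_execute_packets(fp))
# [['A', 'B'], ['C', 'D', 'E'], ['F', 'G', 'H']]
```

With all p bits set to 0 the same fetch packet yields eight execute packets of one instruction each, and with the first seven set to 1 it yields a single execute packet of eight, which are the two extremes mentioned above.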
Pipelining
• Pipelining is a key feature in a digital signal processor to get parallel instructions working properly.
• There are three stages of pipelining: program fetch, decode, and execute.
• Non-pipelined scalar architecture: a processor that executes every instruction one after the other. It may use processor resources inefficiently, potentially leading to poor performance.
• Pipelining: executing different sub-steps of sequential instructions simultaneously.
• Superscalar architectures: executing multiple instructions entirely simultaneously.
• Pipelining does not decrease the time for individual instruction execution. Instead, it increases instruction throughput.
• The throughput of the instruction pipeline is determined by how often an instruction exits the pipeline.
• If the stages are perfectly balanced, then the time per instruction on the pipelined machine is equal to
  (time per instruction on the nonpipelined machine) / (number of pipe stages)

Program Fetch
• The program fetch stage is composed of four phases:
 (a) PG: program address generate (in the CPU), to generate the fetch address
 (b) PS: program address send (to memory), to send the address
 (c) PW: program access ready wait (memory read), to wait for data
 (d) PR: program fetch packet receive (at the CPU), to read the opcode from memory

Decode Stage
• The decode stage is composed of two phases:
 (a) DP: instruction dispatch, to dispatch all the instructions within an FP to the appropriate functional units
 (b) DC: instruction decode

Execute Stage
• The execute stage is composed of from six phases (with fixed point) to 10 phases (with floating point), due to the delays (latencies) associated with the following instructions:
 (a) Multiply instruction, which consists of two phases due to one delay
 (b) Load instruction, which consists of five phases due to four delays
 (c) Branch instruction, which consists of six phases due to five delays

Pipeline phases
 Stage          Phases
 Program fetch  PG PS PW PR
 Decode         DP DC
 Execute        E1-E6 (E1-E10 for double precision)
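The phase names above can be laid out cycle by cycle with a few lines of code. This is a minimal sketch under idealized assumptions: one fetch packet enters PG per cycle, there are no stalls, and each fetch packet is treated as a single unit. The function name `phase_at` is hypothetical.

```python
FETCH = ["PG", "PS", "PW", "PR"]   # program fetch phases
DECODE = ["DP", "DC"]              # dispatch and decode phases


def phase_at(fp_number, cycle, execute_phases=1):
    """Return the pipeline phase of fetch packet fp_number at a given cycle.

    Both arguments are 1-based; fetch packet k enters PG in cycle k.
    Returns None if the packet has not started or has already finished.
    """
    phases = FETCH + DECODE + [f"E{i}" for i in range(1, execute_phases + 1)]
    index = cycle - fp_number
    if 0 <= index < len(phases):
        return phases[index]
    return None


# Snapshot of cycle 7 for the first seven fetch packets.
print([phase_at(k, 7) for k in range(1, 8)])
# ['E1', 'DC', 'DP', 'PR', 'PW', 'PS', 'PG']
```

At cycle 7 every phase from PG to E1 is occupied by a different fetch packet, which is what a full pipeline means.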
Pipelining effects

 Clock cycle   1   2   3   4   5   6   7   8   9   10
 FP1           PG  PS  PW  PR  DP  DC  E1  E2  E3  E4
 FP2               PG  PS  PW  PR  DP  DC  E1  E2  E3
 FP3                   PG  PS  PW  PR  DP  DC  E1  E2
 FP4                       PG  PS  PW  PR  DP  DC  E1
 FP5                           PG  PS  PW  PR  DP  DC
 FP6                               PG  PS  PW  PR  DP
 FP7                                   PG  PS  PW  PR

• Each row represents an FP.
• PG of the first FP starts in cycle 1, PG of the second FP starts in cycle 2, and so on.
• Each FP has 4 phases for fetch and 2 phases for decode; the execute stage takes from 1 to 10 phases.
• At cycle 7, the instructions in the first FP are in the first execution phase E1, the instructions in the second FP are in the decode phase (DC), the instructions in the third FP are in the dispatch phase (DP), and so on. All the instructions are proceeding through the various phases, so the pipeline is full.
• Most instructions have 1 execute phase.
• Multiply (MPY) has 2 execute phases, load (LDH/LDW) has 5, and branch (B) has 6.
• Additional execute phases are associated with floating-point and double-precision instructions (up to 10 phases). For example, MPYDP has 9 delay slots and a total of 10 phases.

Functional unit latency
• The number of cycles that an instruction ties up a functional unit.
• It is 1 for all instructions except double-precision instructions.
• While an instruction ties up a functional unit, no other instruction can use that unit.
• It is different from the delay slot count. For example, MPYDP has a functional unit latency of 4 but 9 delay slots.

Delay slot
• Some instructions that are physically after an instruction are executed as if they were located before it. Classic examples are branch and call instructions, where the instructions that follow are often executed before the branch or call takes effect.

Registers
– The two register files each contain sixteen 32-bit registers, for a total of 32 general-purpose registers (A0-A15, B0-B15).
– Interaction with the CPU must be done through these registers.
– The four functional units on each side of the CPU can freely share the 16 registers belonging to that side.
– Two cross-paths, 1x and 2x, connect the functional units on one side to the register file on the opposite side, so that they can access data from it.
– If register access is by functional units on the same side of the CPU, the register file can service all the units in a single clock cycle.
• Registers A0, A1, B0, and B1 are used as conditional registers.
• Registers A4 through A7 and B4 through B7 are used for circular addressing.
• Registers A0 through A9 and B0 through B9 (except B3) are temporary registers.
• Any of the registers A10 through A15 and B10 through B15 that are used are saved and later restored before returning from a subroutine.
• A 40-bit data value can be contained across a register pair.
• The 32 least significant bits (LSBs) are stored in the even register (e.g., A2) and the remaining 8 bits are stored in the 8 LSBs of the next-upper (odd) register (A3).
• A similar scheme is used to hold a 64-bit double-precision value within a pair of registers (even and odd).

Addressing modes
• The addressing mode determines how memory is accessed.
• Addressing refers to the means of specifying the location of operands for instructions.
 - The types of addressing are called addressing modes.
 - Operands may be input operands for the operation as well as results of the operation.
• Addressing modes supported by the TMS320C67x include register-indirect, indexed register-indirect, and modulo addressing (circular addressing). Immediate data is also supported.
• The TMS320C67x does not support modulo addressing for 64-bit data.
 Mode        Description                                                 Example
 Immediate   The operand is part of the instruction                      ADD .L1 -13,A1,A6
 Register    The operand is specified in a register                      ADD .L1 A7,A6,A7
 Direct      The address of the operand is part of the instruction       not supported
             (added to imply memory page)
 Indirect    The address of the operand is stored in a register          LDW .D1 *A5++,A1

Register-Indirect Addressing
• The operand is located at the memory address stored in a register.
• A special group of registers can be used to store addresses (address registers).
• It is the most important addressing mode in DSPs.
• It is efficient from the instruction set point of view: few bits are needed to indicate the address of the operand.
• The 32 registers (A0-A15, B0-B15) are used as pointers.
• Indirect addressing uses '*' in conjunction with one of the 32 registers:
1. *R
 - Register R contains the address of a memory location where a data value is stored.
2. *R++(d)
 - Register R contains the memory address.
 - After the memory address is used, R is postincremented by the displacement d, so the new address is R+d (R+1 if d = 1, the default).
 - A double minus (--) instead postdecrements the address to R-d.
3. *++R(d)
 - The address is preincremented (offset) by d before it is used, so the current address is R+d, and R is updated.
 - A double minus (--) predecrements instead, so the current address is R-d.
4. *+R(d)
 - The address is offset by d, so the current address is R+d.
 - Unlike the previous case, R itself is not updated or modified.

Figure: delay line implemented by shifting samples
Figure: delay line with pointer manipulation using circular addressing

Circular addressing
• Circular addressing is used to create a circular buffer.
• The buffer is created in hardware and is very useful for applications like digital filtering.
• Used with a circular buffer, this addressing mode updates samples without the overhead of physically shifting the data.
• When the pointer reaches the bottom location and is then incremented, it automatically wraps around to the top location.
• Two independent buffers are available using BK0 and BK1 within the AMR register.
• Registers A4-A7 and B4-B7, in conjunction with the .D units, can be used as pointers.
• MVC (move between the control file and the register file) is the only instruction that can access AMR and the other control registers.

Circular Buffer
At the beginning of each sample period, a new sample is read into the circular buffer, overwriting the oldest sample. The newest sample x(n) is stored at the memory location pointed at by auxiliary register AR(i).
• The need to process digital signals in real time motivates the concept of circular buffering.
• Circular buffers are used to store the most recent values of a continually updated signal.
• Circular buffering allows processors to access a block of data sequentially and then automatically wrap around to the beginning address, which is exactly the pattern used to access coefficients in an FIR filter.
• Circular buffering is also very helpful in implementing first-in, first-out buffers, commonly used for I/O and for FIR delay lines.
• Most DSPs implement circular addressing in hardware in order to conserve memory and minimize software overhead.

Addressing Mode Register (AMR)
• For each of the eight registers (A4-A7, B4-B7) that can perform linear or circular addressing, the addressing mode register (AMR) specifies the addressing mode.
• A 2-bit field for each register selects the address modification mode: linear (the default) or circular.
• With circular addressing, the field also specifies which BK (block size) field to use for a circular buffer.
• In addition, the buffer must be aligned on a byte boundary equal to the block size.

AMR mode and description
 Mode  Description
 00    linear addressing
 01    circular addressing using BK0
 10    circular addressing using BK1
 11    reserved
• Block size = 2^(N+1) bytes

Example:
 MVK   .S2 0x0004,B2 ; lower 16 bits to B2
 MVKLH .S2 0x0005,B2 ; upper 16 bits to B2
Writing the value 0x0004 = (0100)b into the 16 LSBs of AMR sets bit 2 (the third bit) to 1 and all other bits to zero.
This sets the mode to 01 and selects register A5 as the pointer to a circular buffer using BK0. Writing the value 0x0005 = (0101)b into the 16 MSBs of AMR sets bits 16 and 18 to 1. This corresponds to N = 5, which selects a buffer size of 2^(N+1) = 64 bytes using BK0.

Instruction set
• The instructions are designed to make maximum use of the processor's resources and at the same time minimize the memory space required to store them.
• Minimizing the storage space helps keep the overall system cost-effective.
• To make maximum use of the DSP hardware, the instructions are designed to perform several parallel operations in a single instruction, typically including fetching data in parallel with the main arithmetic operation.

Assembly Format
• Label || [ ] Instruction Unit Operands ;comments
• A label, if present, represents a specific address or memory location that contains an instruction or data.
• The parallel bars (||) are present if the instruction is executed in parallel with the previous instruction.
• The next field, in square brackets, is optional and makes the associated instruction conditional. For example, [A2] specifies that the associated instruction executes if A2 is not zero.
• On the other hand, with [!A2], the associated instruction executes if A2 is zero.
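Returning to the AMR example above (A5 in mode 01 with BK0 and N = 5), the sketch below rebuilds the two 16-bit halves loaded by MVK and MVKLH and shows how a pointer wraps inside the 64-byte block. The text confirms only that the A5 mode field starts at bit 2 and that BK0 starts at bit 16; the positions assumed here for the other seven mode fields and for BK1 (bits 21-25) should be checked against TI's CPU reference guide. All function names are hypothetical.

```python
MODE_LINEAR, MODE_BK0, MODE_BK1 = 0b00, 0b01, 0b10

# Assumed layout: one 2-bit mode field per pointer, A4 at bits 1:0 up to B7 at 15:14.
POINTERS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]
BK0_SHIFT, BK1_SHIFT = 16, 21  # BK1 position is an assumption, not from the text


def amr_value(pointer, mode, n):
    """Build an AMR value selecting circular mode for one pointer register."""
    mode_shift = POINTERS.index(pointer) * 2
    bk_shift = BK0_SHIFT if mode == MODE_BK0 else BK1_SHIFT
    return (mode << mode_shift) | (n << bk_shift)


def block_size_bytes(n):
    """Block size = 2^(N+1) bytes."""
    return 2 ** (n + 1)


def circular_next(address, step, base, n):
    """Advance a pointer by step bytes, wrapping inside the aligned block."""
    return base + (address - base + step) % block_size_bytes(n)


amr = amr_value("A5", MODE_BK0, 5)
print(hex(amr & 0xFFFF), hex(amr >> 16), block_size_bytes(5))  # 0x4 0x5 64

base = 0x1000                                      # aligned to 64 bytes
print(hex(circular_next(base + 60, 4, base, 5)))   # 0x1000 (wraps to the top)
```

The wrap in `circular_next` models what the .D unit does in hardware for a pointer in circular mode; in assembly no explicit modulo operation is written.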
'C6x Instruction Set (by category)
• Arithmetic: ABS, ADD, ADDA, ADDK, ADD2, MPY, MPYH, NEG, SMPY, SMPYH, SADD, SAT, SSUB, SUB, SUBA, SUBC, SUB2, ZERO
• Logical: AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR
• Bit management: CLR, EXT, LMBD, NORM, SET
• Data management: LDB/H/W, MV, MVC, MVK, MVKL, MVKH, MVKLH, STB/H/W
• Program control: B, IDLE, NOP

'C6x Instruction Set (by unit)
• .S unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, MVKLH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
• .L unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
• .M unit: MPY, MPYH, SMPY, SMPYH
• .D unit: ADD, ADDA, LDB/H/W, MV, NEG, STB/H/W, SUB, SUBA, ZERO
• Other: NOP, IDLE

'C67x Additional Instructions (by unit)
• .S unit: ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP
• .L unit: ADDDP, ADDSP, DPINT, DPSP, INTDP, INTDPU, INTSP, INTSPU, SPINT, SPTRUNC, SUBSP, SUBDP
• .M unit: MPYSP, MPYDP, MPYI, MPYID
• .D unit: ADDAD, LDDW

Add/Subtract/Multiply
• ADD .L1 A3,A7,A7 ;add A3 + A7 = A7 (accum in A7)
 - Adds the values in registers A3 and A7 and places the result in register A7.
 - The unit .L1 is optional. If the destination or result is in B7, the unit would be .L2.
• SUB .S1 A1,1,A1 ;subtract 1 from A1
• MPY .M2 A7,B7,B6 ;multiply 16 LSBs of A7,B7 => B6
 || MPYH .M1 A7,B7,A6 ;multiply 16 MSBs of A7,B7 => A6
 - Multiplies the lower or least significant 16 bits (LSBs) of both A7 and B7 and places the product in B6, in parallel with a second instruction that multiplies the higher or most significant 16 bits (MSBs) of A7 and B7 and places the result in A6.

Load/Store
• LDH .D2 *B2++,B7 ;load (B2) => B7, increment B2
 || LDH .D1 *A2++,A7 ;load (A2) => A7, increment A2
 - Loads two 16-bit half-words in parallel, one on each data path, and postincrements both pointers.
• The instruction LDW loads a 32-bit word. The two paths using .D1 and .D2 allow data to be loaded from memory into registers on sides A and B using the instruction LDW.
• The double-word load floating-point instruction LDDW on the C6711 can simultaneously load two 32-bit registers on side A and two 32-bit registers on side B.
Store
• STW .D1 A1,*+A4[20] ;store A1 => (A4) offset by 20
• The address in register A4 is preincremented with the offset, but A4 itself is not modified (two plus signs are used if A4 is to be modified).

Branch

Assembler Directive
• An assembler directive is a message for the assembler and is not an instruction.
• It is resolved during the assembling process and does not occupy memory space as an instruction does.
• It does not produce executable code.
• Commonly used directives:
 1) .short: to initialize a 16-bit integer.
 2) .int: to initialize a 32-bit integer (also .word or .long).
 3) .float: to initialize a 32-bit IEEE single-precision constant.
 4) .double: to initialize a 64-bit IEEE double-precision constant.

ASM STATEMENT WITHIN C
• Assembly instructions and directives can be incorporated within a C program using the asm statement.
• The syntax is asm("assembly code");
• The assembly line of code within the set of quotes has the same format as a valid assembly statement.
• If the instruction has a label, the first character of the label must start immediately after the first quote so that it is in column 1.
• The assembly statement should be valid, since the compiler does not check it for syntax errors but copies it directly into the compiled output file.
• If the assembly statement has a syntax error, the assembler detects it.

C-CALLABLE ASSEMBLY FUNCTION
• Register B3 is preserved and is used to contain the return address of the calling function.
• An external declaration, using extern, of an assembly function called within a C program is optional.
• For example, extern int func(); is optional, with the assembly function func returning an integer value.

Timer
• Two 32-bit timers can be used to time and count events or to interrupt the CPU.
• A timer can direct an external ADC to start conversion or the DMA controller to start a data transfer.
• Registers
– Timer period register: specifies the timer's frequency
– Timer counter register: contains the value of the incrementing counter
– Timer control register: monitors the timer's status

Timer
• The 'C67x has two 32-bit general-purpose timers that can be used to:
– Time events
– Count events
– Generate pulses
– Interrupt the CPU
– Send synchronization events to the DMA controller
• The timer works in one of two signaling modes, depending on whether it is clocked by an internal or an external source.
• The timer has an input pin (TINP) and an output pin (TOUT).
• The TINP pin can be used as a general-purpose input, and the TOUT pin can be used as a general-purpose output.
• When an internal clock is provided, the timer generates timing sequences to trigger peripherals or external devices, such as the DMA controller or an A/D converter respectively.
• When an external clock is provided, the timer can count external events and interrupt the CPU after a specified number of events.

Interrupts
The C6711 device supports 16 prioritized interrupts.
Types of interrupts:
• Reset
• Maskable
• Nonmaskable

Interrupt process
• An interrupt can be issued internally or externally.
• An interrupt stops the current CPU process so that the CPU can perform a required task initiated by the interrupt.
• The program flow is redirected to an interrupt service routine (ISR).
• The conditions of the current process must be saved so that they can be restored after the interrupt task is performed.
• On interrupt, registers are saved and processing continues in an ISR. Then the registers are restored.
• Reset (RESET)
 - Reset is the highest-priority interrupt and is used to halt the CPU and return it to a known state. The reset interrupt is unique in a number of ways:
 - RESET is an active-low signal. All other interrupts are active-high signals.
 - RESET must be held low for 10 clock cycles before it goes high again to reinitialize the CPU properly.
 - The instruction execution in progress is aborted and all registers are returned to their default states.
• Nonmaskable Interrupt (NMI)
 - NMI is the second-highest priority interrupt.
 - It is generally used to alert the CPU to a serious hardware problem such as imminent power failure.
 - For NMI processing to occur, the nonmaskable interrupt enable (NMIE) bit in the interrupt enable register must be set to 1.

Maskable interrupt process
• A maskable interrupt is processed when all of the following conditions hold:
 1. The GIE bit is set to 1.
 2. The NMIE bit is set to 1.
 3. The appropriate IE bit is set to 1.
 4. The corresponding IFR bit is set to 1.
• Maskable Interrupts (INT4-INT15)
 - These have lower priority than the NMI and reset interrupts.
 - These interrupts can be associated with external devices, on-chip peripherals, software control, etc.
• The interrupt source for interrupts 4-15 can be programmed by modifying the selector value (binary value) in the corresponding fields of the Interrupt

Interrupt control registers
• CSR (control status register): contains the global interrupt enable (GIE) bit and other control/status bits
• IER (interrupt enable register): enables/disables individual interrupts
• IFR (interrupt flag register): displays the status of interrupts
• ISR (interrupt set register): sets pending interrupts
• ICR (interrupt clear register): clears pending interrupts
• ISTP (interrupt service table pointer): locates an ISR
• IRP (interrupt return pointer)
• NRP (nonmaskable interrupt return pointer)

Interrupt Acknowledgment
• The signals IACK and INUMx (INUM0 through INUM3) are pins on the C6x that acknowledge that an interrupt has occurred and is being processed.
• The four INUMx signals indicate the number of the interrupt being processed.
• For example, INUM3 = 1 (MSB), INUM2 = 0, INUM1 = 1, INUM0 = 1 (LSB) corresponds to (1011)b = 11, indicating that INT11 is being processed.
• The IE11 bit is set to 1 to enable INT11.
• The interrupt flag register (IFR) can be read to verify that bit IF11 is set to 1.
• Writing a 1 to a bit in the interrupt set register (ISR) causes the corresponding interrupt flag to be set in IFR, whereas writing a 1 to a bit in the interrupt clear register (ICR) causes the corresponding interrupt flag to be cleared.

Multichannel Buffered Serial Port (McBSP)
• The standard serial port interface provides:
– Full-duplex communication
– Double-buffered data registers, which allow a continuous data stream
– Independent framing and clocking for reception and transmission
– Direct interface to industry-standard codecs, analog interface chips (AICs), and other serially connected A/D and D/A devices
– Multichannel transmission and reception of up to 128 channels
– Element sizes of 8, 12, 16, 20, 24, or 32 bits
– 8-bit data transfers with LSB or MSB first
• The McBSP consists of a data path and a control path that connect to external devices.
• Separate pins for transmission and reception communicate data to these external devices.
• Four other pins communicate control information (clocking and frame synchronization).
• The device communicates with the McBSP using 32-bit-wide control and data registers accessible via the internal peripheral bus.

 Pin   Description
 CLKR  Receive clock
 CLKX  Transmit clock
 CLKS  External clock
 DR    Received serial data
 DX    Transmitted serial data
 FSR   Receive frame synchronization
 FSX   Transmit frame synchronization

• The CPU or the DMA controller writes the data to be transmitted to the data transmit register (DXR); the data is then shifted out to DX via the transmit shift register (XSR).
• Similarly, receive data on the DR pin is shifted into the receive shift register (RSR) and copied into the receive buffer register (RBR).
• RBR is then copied to the data receive register (DRR), which can be read by the CPU or the DMA controller.
• This allows internal data movement and external data communications to proceed simultaneously.
• The following control registers are used in multichannel operation:
– The multichannel control register (MCR)
– The transmit channel enable register (XCER)
– The receive channel enable register (RCER)
• Other registers for clock generation, frame synchronization, and control are:
– Serial port control register (SPCR)
– Receive control register (RCR)
– Transmit control register (XCR)
– Pin control register (PCR)
– Sample rate generator register (SRGR)

DMA
• Direct memory access (DMA) transfers data to or from the processor's memory without the involvement of the processor itself.
• DMA is commonly used to provide improved performance with input/output devices.
• Rather than have the processor read data from an I/O device and copy the data into memory or vice versa, a separate DMA controller can handle such transfers in parallel.
• The processor loads the DMA controller with control information, including the starting address for the transfer, the number of words to be transferred, the source, and the destination.
• The DMA controller uses the bus request pin to notify the DSP core that it is ready to make a transfer to or from external memory.
• The DSP core completes its current instruction, releases control of external memory, and signals the DMA controller via the bus grant pin that the DMA transfer can proceed.
• The DMA controller then transfers the specified number of data words and optionally signals completion through an interrupt.
• Some processors also have multiple DMA channels that manage DMA transfers in parallel.

Data Allocation
• Blocks of code and data can be allocated in memory within sections specified in the linker command file. These sections can be either initialized or uninitialized.
• Initialized or uninitialized sections, except .text, cannot be allocated into internal program memory.
• The initialized sections are:
 1. .cinit: for global and static variables
 2. .const: for global and static constant variables
 3. .switch: contains jump tables for large switch statements
 4. .text: for executable code and constants
• The uninitialized sections are:
 1. .bss: for global and static variables
 2. .far: for global and static variables declared far
 3. .stack: allocates memory for the system stack
 4. .sysmem: reserves space for dynamic memory allocation used by the malloc, calloc, and realloc functions

Data Alignment
• The C6x always accesses aligned data, which allows it to address bytes, half-words, and words (32 bits).

Control Register File
Addressing mode register (AMR)
 - Specifies the addressing mode.
Control status register (CSR)
 - Contains control and status bits.
Interrupt clear register (ICR)
 - Allows you to manually clear the maskable interrupts (INT15-INT4) in the interrupt flag register (IFR).
 - Writing a 1 to any of the bits in ICR causes the corresponding interrupt flag (IFn) to be cleared in IFR.
 - Writing a 0 to any bit in ICR has no effect.
 - You cannot set any bit in ICR to affect NMI or reset.
Interrupt enable register (IER)
 - Enables and disables individual interrupts.
Interrupt flag register (IFR)
 - Contains the status of INT4-INT15 and the NMI interrupt.
 - Each corresponding bit in the IFR is set to 1 when that interrupt occurs; otherwise, the bits are cleared to 0.
 - To check the status of interrupts, use the MVC instruction to read the IFR.
Interrupt return pointer register (IRP)
 - Contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt.
 - A branch using the address in IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.
Interrupt set register (ISR)
 - Allows you to manually set the maskable interrupts (INT15-INT4) in the interrupt flag register (IFR).
 - Writing a 1 to any of the bits in ISR causes the corresponding interrupt flag (IFn) to be set in IFR.
 - Writing a 0 to any bit in ISR has no effect.
 - You cannot set any bit in ISR to affect NMI or reset.
Interrupt service table pointer register (ISTP)
 - Is used to locate the interrupt service routine (ISR).
NMI return pointer register (NRP)
 - Contains the return pointer that directs the CPU to the proper location to continue program execution after NMI processing.
 - A branch using the address in NRP (B NRP) in your interrupt service routine returns to the program flow when NMI servicing is complete.
E1 phase program counter (PCE1)
 - Contains the 32-bit address of the fetch packet in the E1 pipeline phase.
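The interplay between ISR, ICR, and IFR described in the control register file, together with the four conditions for a maskable interrupt, can be captured in a small behavioural model. This is a sketch only: the class and function names are hypothetical, bit n of each register is assumed to correspond to INTn, and priority among several pending interrupts is not modelled.

```python
# Bits 4..15 correspond to the maskable interrupts INT4-INT15.
MASKABLE_MASK = sum(1 << n for n in range(4, 16))  # 0xFFF0


class InterruptFlagModel:
    """Behavioural model of IFR as seen through writes to ISR and ICR."""

    def __init__(self):
        self.ifr = 0

    def write_isr(self, value):
        # A 1 sets the matching flag; a 0 has no effect; NMI/reset bits are ignored.
        self.ifr |= value & MASKABLE_MASK

    def write_icr(self, value):
        # A 1 clears the matching flag; a 0 has no effect; NMI/reset bits are ignored.
        self.ifr &= ~(value & MASKABLE_MASK)

    def flag(self, n):
        return (self.ifr >> n) & 1


def maskable_interrupt_taken(gie, nmie, ier, ifr, n):
    """All four conditions: GIE = 1, NMIE = 1, IEn = 1, and IFn = 1."""
    return bool(gie and nmie and (ier >> n) & 1 and (ifr >> n) & 1)


flags = InterruptFlagModel()
flags.write_isr(1 << 11)   # manually raise INT11
flags.write_isr(1 << 1)    # attempt to touch the NMI bit: ignored
print(hex(flags.ifr))      # 0x800
print(maskable_interrupt_taken(1, 1, 1 << 11, flags.ifr, 11))  # True
flags.write_icr(1 << 11)   # clear INT11 again
print(hex(flags.ifr))      # 0x0
```

On the device itself these registers are reached only through MVC; the model replaces those moves with ordinary method calls.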