Secured Memory Bus
Fighting probing attacks to ensure confidentiality and integrity of software and data
K. Khalfallah, ENST/LabSoC
Aug 04 – Jun 05
17/07/2015 GET - ENST - LabSoC

Introduction example (1/2)
What is a probing attack?
  Foo®: new efficient video decoder on mobile phones
  Bar®: wants to implement the algorithm on its own terminals
  Bar: one Foo terminal + one video play + one bus probing + one disassembly pass + a few days = a stolen secret
[Figure: CPU and external memory linked by the memory bus; a logic analyzer on the bus captures a binary frame]

Introduction example (2/2)
Algorithm reverse engineering: from the binary frame to assembly language, then to a high-level description
  Assembly language:
    .section _start
    0x1004 add r0, 1
    0x1008 jmp start
    0x100c mul r1, r2
  High-level description:
    if (i == VAL) { result++; } else { switch (i) ... }

Different kinds of malicious probing attacks
Ensure confidentiality of software IPs
  Prevent any hacker from reverse-engineering binary code
  Status: done
Ensure confidentiality of sensitive data
  Prevent anyone from sampling sensitive data on the fly when they leave the CPU for memory
  More difficult for data than for software: instructions are only fetched, whereas data also move back from the CPU to external memory
Ensure integrity of both software and data: make external memory reliable to the CPU
  Status: to do
  Forbid spatial permutation (one word from external memory substituted for another)
  Forbid temporal permutation, or replay (an obsolete value given instead of the new one)

Chosen platform: Leon SoC (SPARC)
[Figure: Leon SoC block diagram showing the on-chip / off-chip boundary]

Digression: what's an FPGA?
A specific circuit, the specificity of which is to be programmable
ASICs are not programmable, FPGAs are.
[Figure: a quarter of a tenth of an Altera Stratix S40 FPGA]
[Figure: a gate with inputs A and B and output A.B, in 2 technologies]
  In an FPGA: tens of transistors
  In an ASIC: 4 transistors
Drawback compared with an ASIC: speed and area overhead

Leon SoC on FPGA Stratix S40 (2/2)
  ≈20% of logic cells
  4% of on-chip memory
  22% of pins
[Figure labels: SRAM, SDRAM]

Software confidentiality (1/6)
[Figure: CPU with I and D ports, connected to external memory]
Instructions are only fetched from memory to the CPU (whereas data transfers go in both directions)
Therefore static ciphering is sufficient (no need for dynamic ciphering)

Software confidentiality (2/6)
Main assumptions
  All on-chip hardware is safe
  All off-chip hardware is unreliable
  All cryptographic algorithms are public: the secret is held only by private keys
  Mandatory: no self-modifying code
We use the MDC2 hash function to encrypt the memory bus
  At 50 MHz, MDC2 can be processed in 8 cycles with a silicon area < 50% of the CPU
  Chip integration allows for supplementary on-chip caches and hash logic (reminder: basic Leon CPU = 20% of logic cells in the S40)

Software confidentiality (3/6)
Three implementation choices
  I. SDRAMs are slower than SRAMs, but also cheaper: we use their latency to process deciphering in parallel with instruction fetching
  II. We take advantage of burst cache-feeding at instruction fetch time: a 1-word query returns 4 instructions
[Figure labels: I-cache, MMU, SDRAM, 1 word query, 4 instructions]

Software confidentiality (4/6)
Three implementation choices (continued)
  III. We use the virtual address together with the private key as a seed to begin an MDC2 hash computation before the instructions have returned (performance++)
    Instructions are saved in the I-cache, so they are deciphered only once
    We do not bypass the MMU's work, since we use virtual addresses
[Figure labels: 28-bit virtual address, 100-bit key, 128-bit seed, MDC2 hash, actual deciphering (subpart of AES), 4 ciphered instructions, 4 plain instructions, I-cache, MMU, SDRAM]

Software confidentiality (5/6)
[Figure: Leon SoC block diagram with the secured instruction path. Blocks: integer unit, FPU, CP, I-cache, D-cache, local RAMs, MMU, MDC2 (hash function) fed by the virtual address, deciphering block turning ciphered instructions into plain instructions, AHB bus, AHB arbiter/controller, memory controller, PCI, Ethernet, AHB/APB bridge, APB bus, UARTs, timers, IrqCtrl, I/O ports, PROM, I/O, SRAM, SDRAM. Boundary: on-chip (reliable) / off-chip (not reliable)]

Software confidentiality (6/6)
[Timing diagram. Signals: IU input clock, hash, ciphered instructions, plain instructions, I-cache input. Durations shown: 8 cycles and 5 cycles. Annotations: "Instruction was ready here", "We sampled it there"]
Time loss: 3 cycles

Leon SoC with Secured Memory Bus on FPGA Stratix S40 (1/2)
  ≈35% (+15%) of logic cells (but one mistake remaining ☹); optimizable
  +0% memory
  +0% pins

Data confidentiality (1/3)
[Figure: CPU with I and D ports, connected to external memory]
Data are transferred in both directions
Therefore static deciphering is insufficient
We also need dynamic ciphering

Data confidentiality (2/3)
Example: the user wants to listen to <file.mp3>
  The CPU or DMA downloads the ciphered file from disk and uploads it into memory
  The CPU does not have enough resources to decompress <file.mp3> entirely on-line into <file.sound> (even streaming cannot)
  Therefore plain data would have to pass on the bus: a security flaw

Data confidentiality (3/3)
[Figure: Leon SoC block diagram with the secured data path. Blocks: integer unit, FPU, CP, I-cache, D-cache, local RAMs, MMU, virtual address, ciphering and deciphering blocks between plain data and ciphered data, AHB bus, AHB arbiter/controller, memory controller, PCI, Ethernet, AHB/APB bridge, APB bus, UARTs, timers, IrqCtrl, I/O ports, PROM, I/O, SRAM, SDRAM. Boundary: on-chip (reliable) / off-chip (not reliable)]
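
The address-seeded deciphering described in the software confidentiality slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hardware design: SHA-256 truncated to 128 bits stands in for MDC2, a plain XOR stands in for the "subpart of AES" step, the 28 address bits are assumed to be the 16-byte burst-line number, and all function names are hypothetical.

```python
import hashlib

KEY_BITS = 100    # key width given on the slides
ADDR_BITS = 28    # virtual address width given on the slides
BURST_BYTES = 16  # 4 instructions of 4 bytes each


def make_seed(virtual_address: int, key: int) -> bytes:
    """Pack 28 address bits and a 100-bit key into a 128-bit seed.

    Assumption: the 28 bits are the burst-line number, i.e. the
    address without its 4 low bits.
    """
    if key < 0 or key >> KEY_BITS:
        raise ValueError("key must fit in 100 bits")
    line = (virtual_address >> 4) & ((1 << ADDR_BITS) - 1)
    return ((line << KEY_BITS) | key).to_bytes(16, "big")


def pad_for(virtual_address: int, key: int) -> bytes:
    """Derive a 128-bit pad from the seed.

    Depends only on the address and the key, so it can be computed
    while the burst is still in flight. SHA-256 is a stand-in for MDC2.
    """
    return hashlib.sha256(make_seed(virtual_address, key)).digest()[:BURST_BYTES]


def cipher_burst(virtual_address: int, key: int, burst: bytes) -> bytes:
    """XOR a 4-instruction burst with its pad (stand-in for the AES subpart)."""
    if len(burst) != BURST_BYTES:
        raise ValueError("a burst is 16 bytes")
    pad = pad_for(virtual_address, key)
    return bytes(b ^ p for b, p in zip(burst, pad))


# XOR is its own inverse, so deciphering is the same operation.
decipher_burst = cipher_burst
```

Because the pad depends on the address, the same instructions stored at two different lines produce different ciphertext, which is what makes a substituted word decipher to garbage.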

Memory Integrity (1/5)
Example (reminder):
    for (i=0;i<j;i++) { *p++; }
If one achieves a man-in-the-middle attack that
  either masks the writing of new values of i into memory
  or always answers 0 to a read of i
then the loop never ends: memory would become entirely readable (!)
The reason: even if memory content has been made confidential, one can replay old data
We must ensure memory content integrity
But we cannot hash all of memory for each access made by the CPU!
We use hierarchical Merkle trees [Keryell, 2003] [Blum et al., 1989]

Memory Integrity (2/5)
[Figure legend: k is the output of a one-way hash function applied to m0 and m1]
[Figure: hash tree]
  CPU, safe (on-chip):                  k2,0
  Overhead information in memory:       k1,0 k1,1
                                        k0,0 k0,1 k0,2 k0,3
  Words to ensure integrity of:         m0 m1 m2 m3 m4 m5 m6 m7
  Their addresses in memory:            a0 a1 a2 a3 a4 a5 a6 a7

Memory Integrity (3/5)
[Figure: the same hash tree, with m3 replaced by m'3]
  CPU:                                  k2,0
  Overhead information in memory:       k1,0 k1,1
                                        k0,0 k0,1 k0,2 k0,3
  Words to ensure integrity of:         m0 m1 m2 m'3 m4 m5 m6 m7
  Their addresses in memory:            a0 a1 a2 a3 a4 a5 a6 a7
  m'3: old value of a3 (has the same signature as m3)

Memory Integrity (4/5)
This is equivalent to hashing the complete memory, with the advantage of a hierarchical structure: there is no need to read all memory words to check integrity, only a few of them, O(log(n))
We shall cache a number of tree stages inside the chip (hierarchical tree: if a node is correct, all data under it are correct too)
We shall use quaternary trees to reduce the number of stages

Memory Integrity (5/5)
[Figure: hash tree with its upper stages cached. Labels: CPU, safe (cached on-chip), overhead information in memory, words to ensure integrity of]

Conclusion: several countermeasures for several kinds of malicious attacks
Attack types
  Passive attack (just listen to the CPU)
  Active attack (lie to the CPU)
Steal IP (force confidentiality)
  Steal software: static ciphering and deciphering
  Steal data: both dynamic ciphering and dynamic deciphering
Machine misuse (force integrity)
  Ensure memory integrity (revoke third-party injected data)
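
The hash-tree check from the Memory Integrity slides can be sketched as follows. This is a minimal Python illustration under assumptions, not the planned hardware: SHA-256 stands in for the one-way hash function, the tree is binary as in the slide diagram (the slides plan quaternary trees), no tree stage is cached, and the function names are hypothetical. Only the root is treated as trusted on-chip state; words and intermediate hashes live in untrusted memory.

```python
import hashlib


def h(left: bytes, right: bytes) -> bytes:
    """One-way hash of two children (SHA-256 as a stand-in)."""
    return hashlib.sha256(left + right).digest()


def build_tree(words):
    """Return the hash levels [k0, k1, ..., [root]] over the words.

    The number of words must be a power of two, at least 2.
    """
    n = len(words)
    if n < 2 or n & (n - 1):
        raise ValueError("need a power-of-two number of words")
    levels = []
    current = list(words)
    while len(current) > 1:
        current = [h(current[i], current[i + 1])
                   for i in range(0, len(current), 2)]
        levels.append(current)
    return levels


def verify_read(words, levels, index, trusted_root):
    """Check words[index] against the on-chip root.

    Reads only the sibling word and one sibling hash per level,
    so the cost grows as O(log n).
    """
    node = h(words[index & ~1], words[index | 1])
    pos = index >> 1
    for level in levels[:-1]:
        sibling = level[pos ^ 1]
        node = h(node, sibling) if pos % 2 == 0 else h(sibling, node)
        pos >>= 1
    return node == trusted_root


def write_word(words, levels, index, value):
    """Store a word, refresh the hashes on its path, return the new root.

    Simplification: a real design would verify the path before updating it.
    """
    words[index] = value
    node = h(words[index & ~1], words[index | 1])
    pos = index >> 1
    for level in levels:
        level[pos] = node
        if len(level) > 1:
            node = h(level[pos & ~1], level[pos | 1])
        pos >>= 1
    return levels[-1][0]
```

A replay shows up as a mismatch: after `write_word` updates m3 and the on-chip root, an attacker who puts the old m3 back (even together with the old hashes stored in memory) makes `verify_read` return `False`, because the recomputed root no longer equals the trusted one.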