Connection Machine
Greg Faust, Mike Gibson, Sal Valente
CS-6354 Computer Architecture
Fall 2009
Historic Timeline
1981: MIT AI-Lab Technical Memo on CM
1982: Thinking Machines Corporation (TMC) founded
1985: Danny Hillis wins ACM Doctoral Dissertation Award
1986: CM-1 Ships
1987: CM-2 Ships
1991: CM-5 Announced
1991: CM-5 Ships
1994: TMC files Chapter 11 – Sun and Oracle pick over the bones
Heavily DARPA funded/backed
$16M+ in direct contracts, plus subsidized CM sales
Involved Notables
Danny Hillis – CM inventor and TMC founder
Charles Leiserson – Fat tree inventor
Richard Feynman – Nobel Prize-winning physicist
Marvin Minsky – MIT AI Lab “Visionary”
Guy Steele – Common Lisp, Grace Hopper Award
Stephen Wolfram – Mathematica inventor
Doug Lenat – AI researcher, creator of Eurisko and Cyc
Greg Papadopoulos – MIT Media lab, Sun CTO
various others
CM-1 and CM-2 Architecture
Original design goal: support neuron-like simulations
Up to 64K single-bit processors (actually 3 bits in and 2 out)
16 processors/chip, 32 chips/PCB, 16 PCBs/cube, 8 cubes/hypercube (16 × 32 × 16 × 8 = 64K)
Hypercube architecture – Each 16-Proc chip a hyper-node
Each proc has 4K bits of bit addressable RAM
– Distributed Physical Memory
– Global Memory Addresses
Up to 4 front-end computers talk to sequencers via 4x4 crossbar
“Sequencers” issue SIMD instructions over a Broadcast Network
Bit procs communicate via 2D local HW grid connections (“NEWS”)
Bit procs communicate via the hypercube network using message passing (routing sketch below)
Lots of Twinkling Lights!!
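Why a hypercube suits the message-passing network: the bit positions in which two node addresses differ are exactly the dimensions a message must cross, one hop per differing bit. A minimal C sketch of that idea (the node addresses are made up, and this only illustrates dimension-order routing, not the CM router's actual implementation):

#include <stdio.h>

/* Illustrative only: hypercube routing by resolving address bits.
 * The CM-1 has 4096 hyper-nodes (64K procs / 16 per chip), i.e. a
 * 12-dimensional hypercube, so node addresses are 12 bits wide. */
int main(void)
{
    unsigned src = 0x0A3, dst = 0x5A0;          /* two made-up node addresses */
    unsigned diff = src ^ dst;                  /* dimensions left to traverse */
    unsigned cur = src;
    while (diff) {
        unsigned dim = diff & (0u - diff);      /* lowest differing dimension */
        cur ^= dim;                             /* one hop along that dimension */
        diff ^= dim;
        printf("hop to node %03x\n", cur);      /* 3 hops for this pair */
    }
    return 0;
}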
CM-1 / CM-2 Architecture (diagram)
CM-1 and CM-2 Programming
• ISA supports:
– Bit-oriented operations
– Arbitrary-precision multi-bit scalar ops via bit-serial implementation on the bit procs (see the sketch after this list)
– Full Multi-Dimensional Vector Ops
• “Virtual Processor” idea similar to CUDA threads, but VPs are statically allocated to physical processors
• OS and Programming Tools run on front-ends
• *Lisp as the initial programming language
• Later C* and CM-Fortran
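To make "arbitrary-precision scalar ops on 1-bit processors" concrete, here is a minimal C sketch of a bit-serial add. The memory layout, field addresses, and names are invented for illustration, not the real Paris instruction set; on the machine, the sequencer broadcasts the same per-bit steps to every processor, so an N-bit add costs on the order of N bit-cycles but runs on up to 64K data elements at once.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical sketch: one CM processor's view of a bit-serial N-bit add.
 * mem[] models its bit-addressable memory; a, b, and dst are bit addresses
 * of N-bit little-endian fields. */
static void bit_serial_add(uint8_t *mem, int a, int b, int dst, int nbits)
{
    uint8_t carry = 0;
    for (int i = 0; i < nbits; i++) {                    /* one bit per "cycle" */
        uint8_t x = mem[a + i] & 1;
        uint8_t y = mem[b + i] & 1;
        mem[dst + i] = x ^ y ^ carry;                    /* sum bit */
        carry = (uint8_t)((x & y) | (carry & (x ^ y)));  /* carry out */
    }
}

int main(void)
{
    uint8_t mem[24] = {0};
    mem[0] = 1; mem[1] = 1;          /* field a at bits 0..7  = 3 */
    mem[8] = 1; mem[10] = 1;         /* field b at bits 8..15 = 5 */
    bit_serial_add(mem, 0, 8, 16, 8);
    int result = 0;
    for (int i = 7; i >= 0; i--) result = (result << 1) | mem[16 + i];
    printf("3 + 5 = %d\n", result);  /* prints 8 */
    return 0;
}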
CM-2 Improvements
1 Weitek IEEE FP coprocessor per 32 1-bit procs
Up to 256K bits of memory per processor
Added ECC to Memory
Implemented the I/O subsystem
– Up to 80 GByte RAID array called "Data Vault": 39 striped disks with ECC, plus spare disks on standby
– High-speed graphics output
• En-route message combining in the H-Cube router
• New multi-dimensional NEWS implemented on top of the H-Cube via a special addressing mode (Gray-code sketch below)
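One standard way to build a multi-dimensional NEWS grid on top of a hypercube, and the kind of trick such a special addressing mode relies on, is a binary-reflected Gray code: consecutive grid coordinates map to hypercube addresses that differ in exactly one bit, so every NEWS neighbor is one hypercube hop away. A small C illustration (not necessarily TMC's exact encoding):

#include <stdio.h>

/* Binary-reflected Gray code: consecutive integers map to codes that
 * differ in exactly one bit. */
static unsigned gray(unsigned x) { return x ^ (x >> 1); }

/* Map a 2-D (row, col) NEWS coordinate onto a hypercube node address by
 * concatenating the Gray codes of each axis (col axis is col_bits wide). */
static unsigned news_to_hypercube(unsigned row, unsigned col, unsigned col_bits)
{
    return (gray(row) << col_bits) | gray(col);
}

int main(void)
{
    /* East-west neighbors (3,5) and (3,6) on a 16x16 grid land on
     * hypercube addresses that differ in a single bit (0x27 vs 0x25). */
    printf("%x %x\n", news_to_hypercube(3, 5, 4), news_to_hypercube(3, 6, 4));
    return 0;
}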
CM-1 Photo
CM-5 vs CM-1 and CM-2
Significant departure from CM-1 and CM-2
Targeted more at scientific and business applications
More Commercial Off-The-Shelf components (“COTS”)
Large Array of SPARC Processing Nodes
– 1-bit processors are abandoned
• Abandoned “NEWS” Grid and Hyper-Cube Networks
• Delivered a 1024-node machine, with claims that 16K nodes were possible
• Even More Twinkling Lights!
CM-5 Photo – Watch it Blink
CM-5 Overall Architecture
• "Coordinated Homogeneous Array of RISC Processors" or "CHARM"
• Asymmetric CoProcessors Model
– Large Array of Processor Nodes
– Small Collection of Control Nodes
• 2 Separate scalable networks
– One for data
– One for control and synchronization
• Still uses striped RAID for high disk bandwidth
Division of Labor
• Processor Nodes can be assigned to a “Partition”
• One Control Node per Partition
• Control Node runs scalar code, then broadcasts parallel work to Processor Nodes (see the sketch after this list)
• Processor Nodes receive a program, not an instruction stream, and have their own Program Counter
• Processor Nodes can access other nodes' memory by reading or writing a global memory address
• Processor Nodes also communicate via message passing
• Processor Nodes cannot issue system calls
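A rough single-process C sketch of this division of labor. All names and types here are invented for illustration (the real interface was TMC's run-time libraries), and the "broadcast" is simulated by handing the same work descriptor to each node function in turn:

#include <stdio.h>

#define NODES 4
#define N     16

typedef struct { int n; double scale; } work_item;   /* hypothetical descriptor */

/* Processor node: has its own program counter and local memory; it receives
 * a description of the work, not an instruction stream. */
static void processor_node(int rank, const work_item *w, double *local)
{
    for (int i = 0; i < w->n / NODES; i++)
        local[i] *= w->scale;                /* this node's slice of the work */
    (void)rank;
}

int main(void)                               /* plays the control node */
{
    double mem[NODES][N / NODES];            /* each node's local memory */
    for (int r = 0; r < NODES; r++)
        for (int i = 0; i < N / NODES; i++)
            mem[r][i] = r * (N / NODES) + i;

    work_item w = { N, 2.0 };                /* scalar code builds the request */
    for (int r = 0; r < NODES; r++)          /* "broadcast" to the partition */
        processor_node(r, &w, mem[r]);

    printf("node 0, element 1: %g\n", mem[0][1]);   /* 1 scaled to 2 */
    return 0;
}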
Control Nodes
Full Sun Workstations
Running UNIX
Connected to the “Outside World”
Handles Partition Time Sharing
Connected to both data and control networks
Performs System Diagnostics
Processor Nodes
• Each node is a 5-chip microcomputer
– Off the Shelf SPARC processor @ 40 MHz
– 32MBytes local node memory
– Multi-port memory controller for added BW
– “Caching techniques do not perform as well on large parallel machines”
– Proprietary 4-FPU Vector coprocessor
– Proprietary network controller
CM-5 Processor Node Diagram
Data Network Architecture
• Point to Point Inter-node communication and I/O
• Implemented as a Fat Tree
– Fat trees invented by TMC's Charles Leiserson
Claim: bandwidth can be expanded on-site
Delivers 5 GB/s bisection bandwidth on a 1024-node machine (back-of-envelope check below)
Data router chip is an 8×8 crossbar switch
Faulty nodes are mapped out of the network
– Programs cannot assume a network topology
• Network can be flushed when time-share swaps occur
• The network, not the processors, guarantees end-to-end delivery
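A back-of-envelope check of the bisection figure, assuming the commonly quoted value of roughly 5 MB/s of data-network bandwidth per node to distant nodes (an assumption, not a number from these slides):

#include <stdio.h>

/* 1024 nodes x ~5 MB/s each across the bisection ~= 5 GB/s total. */
int main(void)
{
    double per_node_MB_per_s = 5.0;   /* assumed far-node bandwidth per node */
    int nodes = 1024;
    printf("bisection ~= %.1f GB/s\n", nodes * per_node_MB_per_s / 1024.0);
    return 0;
}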
Fat Tree Structure
Separate Control Network
Synchronization & control network
Complete Binary Tree organization
Provides broadcast capability
Implements barrier operations
Implements interrupts for timesharing
Performs reduction operations (Sum, Max, AND, OR, Count, etc.; see the sketch below)
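A software sketch of the dataflow behind those reductions: each node contributes one value, and partial results are combined pairwise up a complete binary tree (sum shown; Max, AND, OR, and Count combine the same way). The CM-5 does this in control-network hardware; the C loop below only illustrates the pattern:

#include <stdio.h>

/* Pairwise combining up a complete binary tree; n assumed a power of two. */
static int tree_sum(int *v, int n)
{
    for (int stride = 1; stride < n; stride *= 2)        /* one tree level per pass */
        for (int i = 0; i + stride < n; i += 2 * stride)
            v[i] += v[i + stride];                       /* parent combines children */
    return v[0];                                         /* result appears at the root */
}

int main(void)
{
    int contrib[8] = {1, 2, 3, 4, 5, 6, 7, 8};           /* one value per node */
    printf("sum = %d\n", tree_sum(contrib, 8));          /* prints 36 */
    return 0;
}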
CM-5 Programming
• Supports multiple parallel high-level languages and programming styles
– Including Data Parallel Model from CM-1 and CM-2
• Goal: Hide many decisions from programmers
– CM-1, CM-2 vs CM-5 ISA changes
– Use of Processor Node CPU vs Vector CoProcessors
– Partition-wide synchronizations generated by the compiler
– “Globally Synchronized MIMD”
Sample CM Apps
• Machine Learning
– Neural Nets, concept clustering, genetic algorithms
VLSI Design
Geophysics (Oil Exploration), Plate Tectonics
Particle Simulation
Fluid Flow Simulation
Computer Vision
Computer Graphics, Animation
Protein Sequence Matching
Global Climate Model Simulation
References
Danny Hillis' PhD thesis: The Connection Machine
Inc. Magazine: The Rise and Fall of Thinking Machines
Wikipedia: Connection Machine
ACM: The CM-5 Connection Machine: A Scalable Supercomputer
ACM: The Network Architecture of the Connection Machine CM-5
IEEE: Architecture and Applications of the Connection Machine
IEEE: Fat-trees: Universal Networks for Hardware-Efficient Supercomputing
Encyclopedia of Computer Science and Technology
