### Routing - ECE Users Pages - Georgia Institute of Technology

```Routing Algorithms- I
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated)
• Sections 4.1-4.5
• Alternative: relevant papers.
 References provided
ECE 8813a (2)
Overview
• Separate the routing decisions from
implementation
 Choice of path vs.
 Performance
o
E.g., buffering, route computation implementation
• Routing Algorithm fixed by
 Routing function
 Selection function
A single function for deterministic protocols
• Primary correctness concern
 Deadlock and livelock freedom
• Focus in key principles
 Not coverage of all algorithms designed to date
ECE 8813a (3)
Virtual Channel Router
S
Note: Separation of physical and
virtual paths, e.g., shortest path
may traverse multiple virtual
networks
D
ECE 8813a (4)
Taxonomy
Routing Algorithms
# Destinations
Routing Decisions
Implementation
multicast
Unicast
Centralized
Source
Distributed
Multi-phase
Table Lookup
Finite State Machine
Deterministic
Progressiveness
Progressive
Backtracking
Minimality
Profitable
Misrouting
Complete
Partial
Number of Paths
ECE 8813a (5)
Classes of Routing Algorithms
• Deterministic and Oblivious Routing
 Deterministic protocols are oblivious
o
Converse may not be true
 Variety of implementations
o
Source Routing, finite state machines, interval routing,
table-lookup
• Fully, maximal, and true fully adaptive
 Fully: maximize alternative physical paths
 Maximal: maximize all routing options
 True: no constraints on VC usage
• Non-minimal Routing
• Hybrid or multi-phase protocols
ECE 8813a (6)
Deterministic Routing Algorithms
Mesh
Binary Hypercube
• Strictly increasing or decreasing order of
dimension
• Routing function always returns the same
output channel for each destination
Deterministic routing
ECE 8813a (7)
Tori
c0
n0
n1
c1
c3
n3
n2
c2
c10
c10
n0
c13
c03
c11
c01
c03
n1
c00
• Dimension order routing
• Create an acyclic channel
dependency graph with the
following routing function - c0i
when j<i, c1i when j >i
c01
c11
c02
c12
n3
n2
c02
Deterministic routing
c12
ECE 8813a (8)
Physical Router
Distributed Routing
Finite state
machine or Tablelook up
Deterministic routing
ECE 8813a (9)
Virtual Channel Router
Distributed Routing
Finite state
machine or
Table-look up
Performance
constraints
Remember, we can have virtual lanes
Deterministic routing
ECE 8813a (10)
Deterministic Routing Algorithms:
Implementation Issues
• Relatively, the most inexpensive to implement
 Each node implements a routing function
o
Shared or private logic across channels
 Absolute vs. relative addressing
 Necessary for relative addressing
 Necessary to maintain uniformity of implementation
• Implementation
 Table look-up vs. finite state machine
Deterministic routing
ECE 8813a (11)
Source Routing
W N E
index
output port
• All routes are pre-computed at the source
 Stored as a sequence of port addresses at
intermediate routers
• Each intermediate router uses the header flit to
identify the output port
 Can be extended to use virtual channels
Oblivious routing
ECE 8813a (12)
Characteristics
• Simple fast, route computation at intermediate
routers
• Oblivious routing
 The source-destination pair does not need to identify
a unique path  not deterministic
• Complexity of computing paths
 Ensuring deadlock freedom?
• Extensions to virtual channels
Oblivious routing
We will come
back to this
ECE 8813a (13)
Filling in the Routing Tables
index at next
node
index
N
E
4
1
S
S
0
1
• How do you find deadlock free paths?
 Non-trivial problem
• Short-cuts
 Obtain a channel dependency graph for a specific
algorithm
 Find paths in channel dependency graphs!
 End with virtual channel assignment
Oblivious routing
From M. Kinsy, et.al., ”Application Aware, Deadlock Free Oblivious
ECE 8813a (14)
Routing,” Proceedings of ISCA 2009
Application-Specific Oblivious Routing
Channel Dependence Graph
Two different Channel Dependency Graphs
Flow Graph
Oblivious routing
From M. Kinsy, et.al., ”Application Aware, Deadlock Free Oblivious
ECE 8813a (15)
Routing,” Proceedings of ISCA 2009
Interval Routing
0
1
•
5
8
9
12
13
2
6
10
14
3
7
11
15
Channel
Node
Interval
X+
6
8-15
X-
6
0-3
Y+
6
4-5
Y-
6
7
Each output link corresponds to an interval of nodes


•
4
Union of intervals at a node is the set of all destination nodes
Must be able to distinguish invalid intervals (at the edges of a
mesh)
Use overlapping intervals for fault tolerance
Oblivious routing
J.V. Leeuwen and R. B. Tan, “Interval Routing,” The Computer Journal,
ECE 8813a (16)
vol. 30, no.4, 1987
Partially Adaptive Routing Algorithms
• Trade-off between hardware resources and
 Maximize adaptivity for given resources
 Minimize resources required for a given level of
• Typically exploit regular topologies
Routing
A. A. Chien and J. H. Kim, “Planar Adaptive Networks: Low Cost
Adaptive Networks for Multiprocessors, ISCA 1992
ECE 8813a (17)
A0
A0
A1
A1
A2
• Packets are routed adaptively in a series of two
dimensional planes
 Order of planes (dimensions) is arbitrary
• Routing in two dimension uses two virtual
networks
 Increasing and decreasing networks
ECE 8813a (18)
Adaptive Routing in Two Dimensions
0
1
2
3
0
1
2
3
4
5
6
7
4
5
6
7
8
9
10
11
8
9
10
11
12
13
14
15
12
13
14
15
Increasing Network
Decreasing Network
Di+1
Di
ECE 8813a (19)
PAR in Multidimensional Networks
Dimension i+2
A0
Dimension i
Dimension i+1
•
•
Routing is fully
adaptive in a plane
When can you skip a
plane?
A1
ECE 8813a (20)
PAR Properties
• Each plane is comprised of the following
channels
Ai = di,2 + di+1,0 + di+1,1
• Three virtual channels/link in meshes and six
virtual channels/link in Tori
ECE 8813a (21)
The Turn Model
abstract cycles
XY routing
west-first routing
• What is a turn?
 From one dimension to another : 90 degree turn
 To another virtual channel in the same direction: 0
degree turn
 To the reverse direction: 180 degree turn
• Turns combine to form cycles
• Goal: prohibit the least number of turns to
break all possible cycles
Routing
C. J. Glass and L. Ni, “The Turn Model for Adaptive Routing,” ISCA 1992
ECE 8813a (22)
Turn Constraints
• Choice of prohibited turns is not arbitrary
equivalent
• Alternative designs
 Three combinations unique (within symmetry)
o
Three algorithms: west-first, north-last, negative-first
ECE 8813a (23)
West First Routing
deterministic region
• Fully adaptive to the east, deterministic to the
west (for non-minimal routing)
• Non-minimal is partially adaptive to the west
ECE 8813a (24)
Generalization of the Turn Model
Application to Binary hypercubes
•Base is e-cube routing
•Identify up channels and down
channels
•Adaptively route though one class
and then the other
•Turns prohibited from one class
to the other
• Identify channel classes and prohibit turns
between them
• Cycles are infeasible within a channel class
• Transitions between channel classes are acyclic
• Partially adaptive: number of shortest paths
are reduced
ECE 8813a (25)
P-Cube Routing
1111
0010
down channel
up channel
1101
0000
• For a given source destination pair compute
the up channel set and down channel set
• Restrict turns from one set to the other
• Adaptively route within a set
• The escape path corresponds to some fixed
order which remains a subset of channels
ECE 8813a (26)
• Using all available paths
 Minimal vs. non-minimal
• Distinguish between buffer-based schemes vs.
channel-based schemes
 Typically the former can be translated to the latter
ECE 8813a (27)
Structured Buffer Pools
buffer D-1
buffer 1
buffer 0
• Positive hop algorithm




D+1 buffers at each node (D = diameter)
Nodes request/use buffers in strictly increasing order
Minimal path algorithm, valid for any topology
Large buffer requirements: O(Diameter)
Routing
I. S. Gopal, “Prevention of Store and Forward Deadlock in Computer
Networks,” IEEE Transactions on Communication, December 1985. ECE 8813a (28)
Structured Buffer Pools: Extension
set 0
set 1
subset 0
subset 1
subset S-1
• Negative hop algorithm
 Partition nodes into non-adjacent subsets
 Order subsets
 Down transitions request higher numbered buffer,
else same numbered buffer
 Number of buffers required in each node is given by
 D( S  1)  + 1
 S 


ECE 8813a (29)
Extensions to Wormhole Switching
Acyclic central buffer
dependencies are
transformed into
acyclic virtual channel
dependencies
•
•
replace central buffers with
equivalent number of
virtual channels across
each physical channel
Basic version produces unbalanced use of virtual channels

distance rarely equal to diameter


Number is equal to unused hops: diameter - #req
Use this number to increase the number of choices of virtual
channels at any node
Not the same as adaptivity
Extension: bonus cards.

ECE 8813a (30)
Virtual Networks
• Establish multiple virtual networks
 Each network works for a specific destination set
 Routing functions in a network are acyclic, but are
typically not connected
• Establish routing constraints between virtual
networks
• Simple customized protocols
 Expensive in terms of virtual channels
ECE 8813a (31)
Virtual Networks in a Mesh
Y
0
1
2
3
0
1
2
3
4
5
6
7
4
5
6
7
8
9
10
11
8
9
10
11
12
13
14
15
12
13
14
15
X
X+Y+
•
Each virtual network is constructed to have acyclic
channel dependencies

•
•
•
X-Y+
Routing function in a virtual network is not connected
Packets are injected into the appropriate virtual network
Fully adaptive, no transitions between networks
2n virtual networks with (n.2n) virtual channels/node
ECE 8813a (32)
References
• C.R. Jesshope, P.R. Miller, and J.T. Yantchev,
“High performance communications in
processor networks,'‘ Proceedings of the 16th
International Symposium on Computer
Architecture, pp.150-157, May-June 1989
• D.H. Linder and J.C. Harden, ``An adaptive
and fault tolerant wormhole routing strategy
for k-ary n-cubes,'‘ IEEE Transactions on
Computers, vol. C-40, no. 1, pp.2-12, January
1991
ECE 8813a (33)
Virtual Networks in a Mesh
Y
0
1
2
3
0
1
2
3
4
5
6
7
4
5
6
7
8
9
10
11
8
9
10
11
12
13
14
15
12
13
14
15
X
X+Y+
•
•
•
X-Y+
Extensions to 3D
Extensions to tori and irregular topologies
• Couple with source routing?
Extensions to sub-topologies
• Limit adaptivity to a subset of nodes
ECE 8813a (34)
Optimize
0
1
2
3
0
1
2
3
4
5
6
7
4
5
6
7
8
9
10
11
8
9
10
11
12
13
14
15
12
13
14
15
Virtual network 0
•
•
Virtual network 1
Reduce the number of networks by 50% with one
additional channel in dimension 0
Total number of channels/node

Number of channels in each network + extra channel in
dimensions 0 (in 2n1 networks
n  2n1  2n1  (n  1)2n1
ECE 8813a (35)
Optimize
0
1
2
3
0
1
2
3
4
5
6
7
4
5
6
7
8
9
10
11
8
9
10
11
12
13
14
15
12
13
14
15
Virtual network 0
•
Virtual network 1
Extensions to k-ary n-cubes by introducing levels in each
virtual network



One level for each wrap-around channel
At most n levels are traversed (in addition to the original
level)
(n+1).2(n-1) virtual channels/physical channel for
dimensions >0
ECE 8813a (36)
Extensions to Tori
• Addition of layers
(virtual networks)
for each wraparound
connection
• Number of layers
increases by
#dimensions
ECE 8813a (37)
Using Dynamic Message Dependencies
• Two virtual channel classes across each
 Adaptive channels & deterministic channels
• Fully adaptive use of adaptive channels
 Keep track of #dimension reversals for each message
o
Moving from a dimension p to a lower dimension q
 Label each channel with the DR# of the message
 Messages cannot block on a channel with lower DR#
o
If no channel available, permanent transition to the
deterministic channel
 Dependencies between messages are acyclic
W.J. Dally and H. Aoki, “Deadlock-free adaptive routing in multicomputer networks using
virtual channels,” IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 4, pp.
ECE 8813a (38)
466-475, April 1993.
Summary of Design Techniques
• Ordered use of topological features
 Dimensions and paths
• Ordered use of resources
 Buffers/channels, and networks
• Order the message population
 Each message is uniquely identified by some
attribute, e.g., number of wrap-around channel
crossings
 Order blocking based on message population
membership
Design
ECE 8813a (39)
Design Methodology
•
Start with a network, set of channels C1, and routing
function R1

•
R1 is connected and deadlock free and may be
Split each physical channel into a set of additional virtual
channels and define the new routing function
R( x, y)  R1( x, y)  (Cxy  (C  C1))


•
Set of channels from
node x (local node) to
node y)
Set of channels includes escape channels and adaptive
channels
Selection function can be defined in many ways
For wormhole switching, verify that the extended channel
dependency graph is acyclic: likely if R is restricted to
minimal paths
Design
ECE 8813a (40)
Example Binary Hypercube
dimension order ecube algorithm
channels for
2
3
6
4
0
Design
7
5
1
ECE 8813a (41)
• Establish a relationship between routing
freedom and resources
 Maximize adaptivity for fix resources
 Minimize resources for target adaptivity
• Relationship between adaptivity and
performance
 Not obvious
 Unbalanced use of physical or virtual channel
resources
ECE 8813a (42)
References
• C.J. Glass and L.M. Ni, “Maximally fully
adaptive routing in 2D meshes,” Proceedings of
the International Conference on Parallel
Processing, August 1992.
• R. Cypher and L. Gravano, “Storage-efficient,
deadlock-free packet routing algorithms for
torus networks,” IEEE Transactions on
Computers, vol. 43, no. 12, pp.1376-1385,
December 1994
• L. Schwiebert and D.N. Jayasimha, “Optimal
routing for meshes,”, pp.782-791, November
1993
ECE 8813a (43)
In 2D Meshes: Double Y
0
1
2
3
Y2
5
6
7
8
9
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Y1
Y2
4
0
6
Y1
increasing network
Y1+ X+
X-
Y1-
decreasing network
X-
Y1+
Y1-
X+
Y2+ X+
X-
Y2-
X-
Y2+
Y2-
X+
ECE 8813a (44)
In 2D Meshes: Mad-Y
permitted
X-
Y1+ X+
Y1+
Y2+ X+
X-
Y2+
Y2
6
X-
Y1-
Y1-
X+
X-
Y2-
Y2-
X+
Y1
No coupling
between Y2 and Y1
• Permit turns
 From the Y1 channels to the X+ channels
 From X- channels to Y2 channels
• Remove unnecessary turn restrictions
• Still overly restrictive!
ECE 8813a (45)
In 2D Meshes: Opt Y
Y1+ X+
X-
Y1-
X- Y1+
Y1-
X+
Y2+ X+
X-
Y2-
X-
Y2+
Y2+ Y1+
Y2-
X+
Y1+ Y2+
Y1Y2-
Y2Y1-
• Further reduce the number of restrictions
 Only restrict turns from Y1 to X Turns from X- to Y1 and 0-degree turns in Y only
when X offset is 0 or positive
• Extensions to multidimensional meshes
• Basic idea: fully adaptive routing in one set of
channels, and dimension order in the other set
until specified lower dimension traversals are
complete
ECE 8813a (46)
Routing with Minimum Buffer
Requirements
• Key Idea:
 Organize packet traffic into disjoint groups that use
separate buffers in each node
 Place acyclic routing restrictions in buffer usage
• Based on node orderings
ECE 8813a (47)
Node Labeling for 2D Torus
0
1
2
3
15
14
13
12
4
5
6
7
11
10
9
8
8
9
10
11
7
6
5
4
12
13
14
15
3
2
1
0
right increasing node ordering
left increasing node ordering
0
1
3
2
15
14
12
13
4
5
7
6
11
10
8
9
12
13
15
14
3
2
0
1
8
9
11
10
7
6
4
5
inside increasing node ordering
outside increasing node ordering
ECE 8813a (48)
Algorithm 1
• Algorithm
 Packet moves from the injection queue to the A
queue
 Stay in the A queues as long as we can move to right
along at least one dimension along a minimal path
 Transition to the B queues under same rule for left
traversals
 Transition to the C queue and remain there until
packet is delivered
• Note the de-coupling of node labeling from
buffer labeling
ECE 8813a (49)
Application
• Orderings analogous
to virtual planes
0
 Note the orderings are
acyclic
1
2
3
• Extensions to edge
buffers
4
5
 Check Algorithm 2
6
7
0
1
2
3
4
5
6
7
ECE 8813a (50)
True Fully Adaptive Routing
• Adaptivity extends across physical and virtual
channels