Introduction

Report
Deadline-constrained
workflow scheduling algorithms
for Infrastructure as a Service Clouds
Future Generation Computer Systems(FGCS.J)
journal homepage: www.elsevier.com/locate/fgcs
Saeid Abrishami a,∗, Mahmoud Naghibzadeha, Dick H.J. Epemab
Tai, Yu-Chang
4/29/2013
*Outline
* Introduction
* Scheduling system model
* IaaS cloud partial critical paths algorithms
* An illustrative example
*Time complexity
* Performance evaluation
* Conclusions
*Introduction
* Clouds are different from utility Grids
- on-demand resource provisioning
- homogeneous networks
- the pay-as-you-go pricing model
* consider the benefits of using Cloud computing for
executing scientific workflows
-there exist several commercial Clouds, such as Amazon
*Introduction
Infrastructure as a Service (IaaS) Clouds, has some potential benefits for
executing scientific workflows
1. users can dynamically obtain and release resources on demand,
and charged on a pay-as-you-go basis
2.resource provisioning
3. illusion of unlimited resources
important parameter : economic cost
-faster resources are more expensive than slower ones
-time-cost tradeoff in selecting appropriate services
-belongs to the multi-criteria optimization problems
minimize the execution cost of the workflow, while completing the workflow
before the user specified deadline
IaaS Cloud Partial Critical Paths (IC-PCP)
IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2)
*Scheduling system model
* An application is modeled by a directed acyclic graph G(T ,
E)
* T is a set of n tasks {t1, t2, . . . , tn}
* E is a set of dependencies ei,j=(ti,tj)
* two dummy tasks tentry and texit to the beginning and the end
of the workflow (zero execution time and they are connected with
zero-weight dependencies to the actual entry and exit tasks)
*Scheduling system model
* services S = {s1,s2,…,sm}
with different QoS parameters such as CPU type
and memory size, and different prices
* The pricing model is based on a pay-as-you-go basis similar to the current
commercial Clouds, i.e., the users are charged based on the number of
time intervals that they have used the resource, even if they have not
completely used the last time interval
c1=5
c2=2
c3=1
* ET(ti, sj) :execution time of task ti on computation service sj
* average bandwidth between the computation services is roughly equal
*
*
TT(ei,j) : data transfer time of a dependency ei,j
MET(ti) : Minimum Execution Time of a task ti
-execution time of task ti on a service sj ∈ S which has the
minimum ET(ti, sj) between all available services
p
p
ti
ti
c
c
*SS (ti) = sj,k : Selected Service for each scheduled task ti
*sj,k
: kth instance of service sj.
*AST (ti)
: Actual Start Time of ti
*assigned node :has already been assigned to (scheduled on)
a service
*Critical Parent : of a node ti is the unassigned parent of ti
that has the latest data arrival time at ti, that is, it is the
parent tp of ti for which EFT(tp) + TT(ep,i), is maximal
*PCP: The Partial Critical Path of a node ti is:
- empty if ti does not have any unassigned parents
-consists of the Critical Parent tp of ti and the Partial
Critical Path of tp if has any unassigned parents
*Algorithm1
IC-PCP
*example
0
10
20
30
1
2
3
0~2__19
2
5
8
0~5__16
5
12
16
0~3__16
3
5
9
3~7__24
4
6
10
7~10__23
8~13__30
5
8
11
14~17__30
3
8
11
3
6
8
7~11__22
14~19__30
4
8
11
5
8
14
D=30
*example
10
0
20
30
1
2
3
S2,1
Path{t2,t6,t9}
28
2
0~2__19
2
5
8
0~12__14
0~5__16
5
12
16
0~3__12
0~3__16
3
5
9
3~7__24
8~13__30
4
6
10
5
8
11
14~17__23
7~10__23
21~24__23
14~17__30
3
8
11
12~20__23
7~11__22
4
8
11
3
6
8
20~28__30
14~19__30
5
8
14
D=30
10
0
20
30
1
2
3
S2,1
S3,1
9
28
2
Path{t3}
1
0~2__19
2
5
8
3~7__24
4
6
10
8~13__30
5
8
11
0~12__14
14~17__23
21~24__30
5
12
16
3
8
11
3
6
8
0~9__12
0~3__12
3
5
9
12~20__22
20~28__30
4
8
11
5
8
14
D=30
10
0
20
30
1
2
S2,1
28
2
S2,2
14
Path{t5,t8}
6
0~2__18
3
S3,1
9
1
0~2__19
2
5
8
0~12__14
5
12
16
0~9__12
3
5
9
3~7__23
3~7__24
4
6
10
14~22__24
14~17__23
3
8
11
8~13__30
5
8
11
22~28__30
21~24__30
3
6
8
12~20__22
20~28__30
4
8
11
5
8
14
D=30
10
0
20
30
1
2
S2,1
28
Path{t1,t4}
2
S2,2
14
6
0~8__13
3
S3,1
S3,2
9
1
0~2__18
18
2
2
5
8
8~18__23
3~7__23
4
6
10
19~24__30
8~13__30
5
8
11
0~12__14
14~22__24
22~28__30
5
12
16
3
8
11
3
6
8
0~9__12
3
5
9
12~20__22
20~28__30
4
8
11
5
8
14
D=30
10
0
20
30
5 1
2 2
S2,1
28
2
S2,2
14
6
COST=2*5+1*4=14
Path{t7}
18~29__30
1 3
S3,1
S3,2
9
1
0~8__13
18
2
1
S3,3
11
2
5
8
8~18__23
19~24__30
4
6
10
5
8
11
0~12__14
14~22__24
22~28__30
5
12
16
3
8
11
3
6
8
0~9__12
3
5
9
12~20__22
20~28__30
4
8
11
5
8
14
D=30
* Applicable
* applicable instance for a path if it satisfies two conditions:
- The path can be scheduled on the instance such that each
task of the path is finished before its latest finish time
- The new schedule uses (a part of) the extra time of the
instance,which is the remaining time of the last time
interval of thatinstance.
P
C
Cost=zero
P
C
IC-PCPD2
*Algorithm2
Call PLANNING(G(T,E))
Assign subdeadline on PCP node
(assigned node)
0
10
20
30
S1,1
S2,1 t1
tentety t1 t2 t3 t4
S3,1
0~5__6
0~2__6
2
5
8
0~5__7
sb=0
5
12
16
0~3__16
3
5
9
6~10__24
3~7__24
4
6
10
7~10__13
11~16__30
8~13__30
5
8
11
14~17__30
3
8
11
3
6
8
7~11__17
14~19__30
4
8
11
5
8
14
D=30
0
10
20
30
S1,1 t2
S2,1 t1
tentety t1 t2 t3 t4 t5
S3,1
0~5__6
2
5
8
0~5__7
sb=0
5
12
16
0~3__16
3
5
9
6~10__24
4
6
10
7~10__13
11~16__30
5
8
11
14~17__30
3
8
11
3
6
8
7~11__17
14~19__30
4
8
11
5
8
14
D=30
0
10
20
30
S1,1 t2
S2,1 t1
tentety t1 t2 t3 t4 t5 t6
S3,1
0~5__6
2
5
8
0~5__7
sb=0
5
12
16
0~9__16
0~3__16
3
5
9
t3
6~10__24
4
6
10
7~10__13
3
8
11
11~15__17
7~11__17
4
8
11
11~16__30
5
8
11
14~17__30
3
6
8
18~23__30
14~19__30
5
8
14
D=30
0
10
20
30
S1,1 t2
S2,1 t1
tentety t1 t2 t3 t4 t5 t6 t7
S3,1
t3
S3,2
0~5__6
2
5
8
0~5__7
sb=0
5
12
16
0~9__16
3
5
9
t4
6~16__24
6~10__24
4
6
10
7~10__13
3
8
11
11~15__17
4
8
11
17~22__30
11~16__30
5
8
11
17~20__30
14~17__30
3
6
8
18~23__30
5
8
14
D=30
0
10
S1,1 t2
20
30
t5
S2,1 t1
tentety t1 t2 t3 t4 t5 t6 t7 t8
S3,1
t3
S3,2
0~5__6
2
5
8
0~5__7
sb=0
5
12
16
0~9__16
3
5
9
t4
6~16__24
4
6
10
7~10__13
17~22__30
5
8
11
17~20__30
3
8
11
3
6
8
11~15__17
18~23__30
4
8
11
5
8
14
D=30
0
10
S1,1 t2
20
30
t5
S1,2 t6
S2,1 t1
tentety t1 t2 t3 t4 t5 t6 t7 t8 t9
S3,1
t3
S3,2
0~5__6
2
5
8
0~5__7
sb=0
5
12
16
0~9__16
3
5
9
t4
6~16__24
4
6
10
7~10__13
17~22__30
5
8
11
17~20__30
3
8
11
3
6
8
11~15__17
18~23__30
4
8
11
5
8
14
D=30
0
10
S1,1 t2
20
30
t5
S1,2 t6
S2,1 t1
tentety t1 t2 t3 t4 t5 t6 t7 t8 t9
S3,1
t3
t7
S3,2
t4
16~28__30
17~29__30
0~5__6
2
5
8
0~5__7
sb=0
5
12
16
0~9__16
3
5
9
6~16__24
4
6
10
7~10__13
17~22__30
5
8
11
17~20__30
3
8
11
3
6
8
11~15__17
18~23__30
4
8
11
5
8
14
D=30
0
10
S1,1 t2
20
30
t5
S1,2 t6
S2,1 t1
tentety t1 t2 t3 t4 t5 t6 t7 t8 t9
S3,1
t3
t7
S3,2
t4
S3,3
0~5__6
2
5
8
6~16__24
4
6
10
t8
16~28__30
5
8
11
17~25__30
0~5__7
sb=0
5
12
16
0~9__16
3
5
9
7~10__13
17~20__30
3
8
11
3
6
8
11~15__17
18~23__30
4
8
11
5
8
14
D=30
0
10
5 S1,1 t2
20
30
t5
S1,2 t6
2 S2,1 t1
S2,2
tentety t1 t2 t3 t4 t5 t6 t7 t8 t9
1 S3,1
t3
t7
S3,2
2
5
8
0~5__7
sb=0
5
12
16
COST=5*2+2*2+1*4=18
t4
S3,3
0~5__6
t9
6~16__24
4
6
10
7~10__13
3
8
11
t8
16~28__30
5
8
11
17~25__30
3
6
8
18~26__30
0~9__16
3
5
9
11~15__17
4
8
11
18~23__30
5
8
14
D=30
*Time complexity
O(n+e)~O(n^2)
IC-PCP=O(n^2)
O(n)
O(n-1)
O(m*n)=O(n^2)
O(n^2)
*Time complexity
O(n+e)~O(n^2)
Call PLANNING(G(T,E))
O(n^2)
O(n^2)
IC-PCPD2=O(n^2)
Assign subdeadline on PCP node O(n)
*evaluation
Algo1
IC-PCP
Algo2
Algo3
IC-PCPD2 IC-LOSS
Fastest schedule : scheduling each workflow task on a distinct instance of the
fastest computation service, while all data transmission
times are considered to be zero
MF =makespan of the Fastest schedule
deadline factor α
set the deadline = α・MF
-Since the problem has no solution for α = 1, we let α ranges from
1.5 to 5 in our experiments, with a step length equal to 0.5
Cheapest schedule : scheduling all workflow tasks on a single instance of the
cheapest computation service
normalize the total cost of each workflow execution
*evaluation
Algo1 Algo2
Algo3
IC-PCP IC-PCPD2 IC-LOSS
1>2>3
1>2>3
1≈2>3
1>2>3
2>1>3
1>2>3
1>2>3
1≈2>3
1>2>3
2>1>3
*Conclusions
* The new algorithms consider the main features of the current
commercial Clouds such as on-demand resource provisioning,
homogeneous networks, and the pay-as-you-go pricing model
* The time complexity of both algorithms is O(n2), The polynomial time
complexity makes them suitable options for the large workflows
* IC-PCP outperforms both, IC-PCPD2 and IC-Loss in most cases
* experiments show that the computation times of the algorithms are
very low, less than 500 ms for the large workflows
* intend to improve our algorithms for the real Cloud environments
*
*

similar documents