talk-yarsi-grid-cloud-110427-ver2

Report
Heru Suhartanto
Faculty of Computer Science,
Universitas Indonesia
E-mail: [email protected]
Presented at YARSI University
(General Course) on 27 April 2011
A revised version of the presentation at ICACSIS 2010,
http://icacsis2010.cs.ui.ac.id/
Soon the presentation will be available at http://hsuhartanto.wordpress.com
1
•Resource-hungry problems that need supercomputing resources
(examples and types)
•Why Grid and Cloud computing (definition, structure, ….)
•Some past and current works:
  - The development of the first Indonesian Grid infrastructure;
  - Parallel molecular dynamics processing for drug design based on typical
    Indonesian plants in a cluster environment;
  - IndoEdu-Grid design for Indonesian e-learning resources based
    on Grid computing.
•Prospects for the future and some proposals to overcome the
challenges, including cloud computing
•Upcoming works
2
Resource Hungry Applications
[Ref Hai Jin and Raj Buyya]
• Solving grand challenge applications using
computer modeling, simulation and analysis
Aerospace
Internet &
Ecommerce
Life Sciences
CAD/CAM
Digital Biology
Military Applications
3
•Information simulation - compute-dominated
•Information repository - storage-dominated
•Information access - communication-dominated
•Information integration - system of systems
•These applications cannot be solved
using ordinary computing resources
4

There are 3 ways to improve performance:
• Work harder
• Work smarter
• Get help

Computer analogy:
• Work harder: use faster hardware
• Work smarter: use optimized algorithms and techniques to
solve computational tasks
• Get help: use multiple computers to solve a particular task
5

• Improve the operating speed of processors & other
components
  - constrained by the speed of light, thermodynamic laws, &
    the high financial costs of processor fabrication
• Connect multiple processors together & coordinate their
computational efforts
  - parallel computers
  - allow the sharing of a computational task among multiple
    processors
Ref: Buyya
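The idea of sharing one computational task among several workers can be sketched as follows. This is a minimal illustration, not part of the original talk: the function names are hypothetical, and Python threads stand in for the "multiple processors" (in CPython they show the decomposition of the task rather than a real speed-up for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Sum of squares over the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into chunks, compute each chunk in a worker, combine."""
    if n <= 0:
        return 0
    step = -(-n // workers)  # ceiling division: chunk size per worker
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one chunk; the partial results are added up.
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))
```

The same split/compute/combine pattern applies whether the workers are cores in one machine, nodes in a cluster, or resources in a grid; only the cost of moving data between them changes.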
6
Supercomputer ?
Cluster Computing ?
Grid Computing ?
Cloud Computing?
7
We need to ‘collect’ these resources
and share them among the people
who need them.
This leads to the Grid computing concept.
8
•http://www.pragma-grid.net/
•The Pacific Rim Application and Grid Middleware Assembly
(PRAGMA) was formed in 2002 to establish sustained
collaborations and advance the use of grid technologies in
applications among a community of investigators working with
leading institutions around the Pacific Rim.
•Four working groups focus our activities in the areas of:
  * Resources and Data
  * Biosciences
  * Telescience
  * Global Earth Observatory (GEO)
9
PRAGMA members have been doing a combination of the following:
•Joining their resources with the PRAGMA grid:
http://goc.pragma-grid.net/pragma-doc/userguide/join.html
http://goc.pragma-grid.net/pragma-doc/computegrid.html
•Running grid applications in the PRAGMA grid:
http://goc.pragma-grid.net/pragmadoc/userguide/pragma_user_guide.html
http://goc.pragma-grid.net/wiki/index.php/Applications
•Developing, integrating, enhancing, implementing and sharing software in the PRAGMA grid:
http://goc.pragma-grid.net/wiki/index.php/Main_Page#Middleware
•Our recent focus is virtualization. Some sites have been actively working
together on VM technology:
http://goc.pragma-grid.net/wiki/index.php/Virtualization
10
•Pipeline damage detection: inspecting 100 km of pipe with a 50-inch diameter
produces 280 Terabytes (2.8 x 10^{14} bytes) of data, with a transfer rate of
2.8 Gb. This can only be processed with Grid computing resources [ref:
Inspektionmolch: http://www.hpe.fzk.de/projekt/molch/, accessed 27 Sep 2008]
•Analysis of brain activity data collected from MEG
(magnetoencephalography) instruments is a very important research topic because
it supports doctors in identifying symptoms of disease. This is a collaboration between
the Grid Lab at the University of Melbourne, the Nimrod-G Project at Monash University, and the MEG
project at Osaka University [ref: http://www.gridbus.org/neurogrid/, accessed 27 Sep
2008]
•The Novartis Institute for Biomedical Research needs 6 years of processing time with
a supercomputer, but with a PC Grid of 3700 desktop PCs it
needs only 12 hours. This saves about 200 million dollars over
three years, and the computing power achieved is more than 5 teraflops [Ian Foster,
www.globus.org]
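As a quick check of the figures quoted above, the arithmetic can be sketched as below. The only inputs are the numbers on the slide; the assumptions are a 365-day year and decimal terabytes (10^12 bytes), and the variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24      # assumption: non-leap year
BYTES_PER_TERABYTE = 10 ** 12  # assumption: decimal terabytes

# Novartis example: 6 years on a supercomputer vs 12 hours on a PC grid.
supercomputer_hours = 6 * HOURS_PER_YEAR
pc_grid_hours = 12
time_reduction = supercomputer_hours / pc_grid_hours

# Pipeline example: 280 Terabytes expressed in bytes.
pipeline_bytes = 280 * BYTES_PER_TERABYTE

if __name__ == "__main__":
    print(f"Processing time reduced by a factor of {time_reduction:.0f}")
    print(f"Pipeline data volume: {pipeline_bytes:.1e} bytes")
```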
11
•Grid computing is the combination of computer resources from multiple administrative
domains to reach a common goal. The Grid can be thought of as a
distributed system with non-interactive workloads that involve a large
number of files.
•A computing infrastructure that provides large-scale access
to computing resources that are geographically distributed
yet interconnected into a single facility. These
resources include supercomputers, storage systems,
data sources, and instruments.
12
Grid computing physical structure [Ian Foster]
13
Grid Architecture [GridBus]
14

Thailand – ThaiGrid
• Started in 2002
• Funding: $6M (3 years)
• 10 univ., Weather Forecast Services, NECTEC
• 158 CPUs
Singapore – NGP (National Grid Project)
• Started September 2002
• 3 univ., 5 ministries (MOE, MOH, MITA, MINDEF, MTI)
Malaysia
• Proposal “National Technology Roadmap for Grid Computing”
submitted to MOSTI (initiator: MIMOS Berhad, 2005)
Regional forums:
• SEA Grid Forum (3 countries)
• ApGrid (14 countries)
15
If we ask others to provide these resources, and users
consume them as services, then Grid
computing functions as Cloud
computing.
16
Services in the Cloud
•Software as a Service (SaaS)
•Platform as a Service (PaaS)
•Infrastructure as a Service (IaaS)
17
•SaaS – can take the form of applications such as CRM
(customer relationship management), email, ...
•PaaS – platform, including programming languages,
APIs, development environments, ...
•IaaS
•Virtualization: provisioning, virtualization, billing, ...
•Hardware: memory, computation, storage
•Colocation: the data center owner rents out floor
space and provides power and cooling as well as a
network connection
18
Some cloud vendors: Amazon
•aws.amazon.com: Amazon Web Services (AWS) offers
a large number of cloud services. The focus here is on the Elastic
Compute Cloud (EC2) and its supplementary storage
services,
•EC2 offers the user a choice of virtual machine
templates that can be instantiated in a shared and
virtualized environment,
•Each virtual machine template is called an Amazon Machine Image (AMI).
The customer can use pre-packaged AMIs from
Amazon and 3rd parties, or they can build their own.
19
Appian - www.appian.com
•Offers management software to design and deploy
business processes. The tool is available as a web
portal for both business process designers and users,
•The design is facilitated by a graphical user interface
that maps processes to web forms,
•End users are then able to access the functionality
through a dashboard of forms,
•Executives and managers can access the same web
site for bottleneck analysis, real-time visibility and
aggregated high-level analysis
20
Google:
apps.google.com , appengine.google.com
•Google App Engine is a platform service. It provides a basic run-time
environment and eliminates many of the system administration and
development challenges involved in building applications that scale to millions of
users,
•Another infrastructure service, used primarily by Google applications
themselves, is Google Bigtable. It is a fast and extremely large-scale
DBMS designed to scale into the petabyte range across “hundreds or
thousands of machines”,
•On the SaaS side, Google offers some free and competitively priced services
including Gmail, Google Calendar, Talk, Docs, and Sites.
21
Cloud computing services by
Indonesians?
Free: Esfindo (SaaS), InGrid (IaaS), ……
Paid: telkomcloud, web hosting, colocation, ….
22



Over 20 definitions:
• http://cloudcomputing.sys-con.com/read/612375_p.htm
Buyya’s definition:
• “A Cloud is a type of parallel and distributed system consisting
of a collection of inter-connected and virtualised computers
that are dynamically provisioned and presented as one or
more unified computing resources based on service-level
agreements established through negotiation between the
service provider and consumers.”
Keywords: Virtualisation (VMs), Dynamic Provisioning
(negotiation and SLAs), and Web 2.0 access interface
All data management needs on the Internet, served with resources
prepared by a provider. [H. Suhartanto, 2011]
23
Public/Internet Clouds
• 3rd party, multi-tenant Cloud infrastructure & services
• Available on a subscription basis (pay as you go)
Private/Enterprise Clouds
• Cloud computing model run within a company’s own data center /
infrastructure for internal and/or partners’ use
Hybrid/Mixed Clouds
• Mixed usage of private and public Clouds: leasing public
cloud services when private cloud capacity is insufficient
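The hybrid idea (use the private cloud first and lease public capacity only when the private cloud is full) can be sketched as a simple placement rule. This is an illustrative sketch only: the function name, the notion of a job "size" and the capacity units are assumptions, not something defined in the talk.

```python
def place_jobs(job_sizes, private_capacity):
    """Assign each job to 'private' while capacity lasts, otherwise 'public'.

    job_sizes: resource units needed by each job (hypothetical unit).
    private_capacity: total units available in the private cloud.
    """
    placements = []
    free = private_capacity
    for size in job_sizes:
        if size <= free:
            placements.append("private")
            free -= size  # the job consumes private capacity
        else:
            placements.append("public")  # burst to a leased public cloud
    return placements


if __name__ == "__main__":
    print(place_jobs([4, 4, 4], private_capacity=10))
```

A real scheduler would also release capacity when jobs finish and weigh the public cloud's pay-as-you-go cost; the sketch only shows the overflow decision.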
24







• No upfront infrastructure investment
  - No procuring hardware, setup, hosting, power, etc.
• On-demand access
  - Lease what you need, when you need it
• Efficient resource allocation
  - Globally shared infrastructure can always be kept busy by serving
    users from different time zones/regions
• Nice pricing
  - Based on usage, QoS, supply and demand, loyalty, …
• Application acceleration
  - Parallelism for large-scale data analysis, what-if scenario
    studies, …
• High availability, scalability, and energy efficiency
• Supports creation of 3rd-party services & seamless offering
  - Builds on infrastructure and follows a similar business model as the
    Cloud
25
Some previous research works are
available
•The development of Internet
infrastructure among universities;
•Some related courses are offered
in universities
26

National network infrastructure provided by
telecommunication industries
Combining terrestrial and satellite connections
• Terrestrial: optical fiber, copper, digital microwave
(wireless and wired)
Internet users: 40 million
Cellular phone subscribers: 105 million
Nizam, APTIKOM presentation, 2011
[Figure: “INHERENT” topology, 2010: university zone configuration. Map of numbered university nodes in cities from Banda Aceh to Jayapura, with JarDikNas and link capacities of 1, 2, 4, 8, 16 and 155 Mbps shown in the legend.]
Notes:
Total terrestrial links: 41
VSAT links: 12
Total links: 53
Nizam, 2011 at APTIKOM meeting

Number of connections
• 82 PTN (public universities; 32 as Local Nodes)
• 224 PTS (private universities)
• 12 Kopertis
• SEAMEO-SEAMOLEC
Bandwidth capacity
• Advance: 155 Mbps
• Medium: 8 Mbps
• Basic: 2 Mbps
• Self-funding: (leased line 512 – 1 M; wireless 11-55 M)
Network configuration: scale-free network
Future goal: a Higher Education super corridor on dark fiber,
so that connections between universities are at least 1 Gbps and the national
backbone is 10 Gbps (in Thailand, inter-university links are already 1-10 Gbps)
Nizam, 2011 at APTIKOM meeting
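To give a feel for what the bandwidth tiers above mean in practice, the following rough sketch estimates the ideal transfer time of a dataset on each tier. The 1 GB dataset size is a hypothetical value chosen for illustration, and protocol overhead and link sharing are ignored, so real transfers would be slower.

```python
def transfer_seconds(size_bytes, link_mbps):
    """Ideal transfer time: bits to send divided by link rate in bits/second."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


# Link tiers taken from the slide (Mbps).
TIERS_MBPS = {"Basic": 2, "Medium": 8, "Advance": 155}

# Hypothetical dataset size: 1 GB (10^9 bytes).
DATASET_BYTES = 1_000_000_000

if __name__ == "__main__":
    for tier, mbps in TIERS_MBPS.items():
        minutes = transfer_seconds(DATASET_BYTES, mbps) / 60
        print(f"{tier:8s} {mbps:4d} Mbps: {minutes:6.1f} minutes")
```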
[Figure: inGRID architecture. Labels: users, inGRID portal, custom portal, Globus head nodes, UI, U*, I*, INHERENT, and Windows/x86, Linux/x86, Solaris/x86 and Linux/Sparc clusters.]
30

inGRID Portal
• SUN Fire X2100, AMD Opteron Processor (2.4 GHz, dual core),
2 GB Memory, 80 GB Disk, 2 10/100/1000 Mbps NICs, DVD-ROM Drive
Globus Head Node
• SUN Fire X2100, AMD Opteron Processor (2.2 GHz, dual core),
1 GB Memory, 80 GB Disk, 2 10/100/1000 Mbps NICs, DVD-ROM Drive
Linux Cluster (16 nodes)
• SUN Fire X2100, AMD Opteron Processor (2.2 GHz, dual core),
1 GB Memory, 80 GB Disk, 2 10/100/1000 Mbps NICs
Storage Server
• Dual Xeon Processor (3.0 GHz), 2 GB Memory, 1 TB Disk
31

User Interface:
• UCLA Grid Portal
Middleware:
• Globus Toolkit
Job Scheduler:
• Sun Grid Engine (SGE)
Programming:
• C, Java
• Parallel: MPICH
Applications:
• Chemistry: GROMACS
• Biology: BLAST
• Computer graphics: POV-Ray
• Utilities: matrix multiplication, sort, Octave (MATLAB-like)
32
33
•Ari Wibisono, Heru Suhartanto, Arry Yanuar, Performance Analysis of
Curcumin Molecular Dynamics Simulation using GROMACS on Cluster
Computing Environment, this conference.
•Muhammad Hilman, Heru Suhartanto, Arry Yanuar, Performance
Analysis of Embarrassingly Parallel Application on Cluster Computer
Environment : A Case Study of Virtual Screening with Autodock Vina
1.1 on Hastinapura Cluster, this conference.
34




• Molecular dynamics (MD) simulation is used to study the solvation of proteins, the interaction
of DNA-protein complexes and lipid systems, and
the ligand binding and folding of proteins.
• The simulation produces a trajectory of molecules over a finite time
period, where each molecule in the simulation
has positional parameters and momentum.
• MD can be used to assist drug discovery. Computers
offer an in-silico method as a complement
to the in-vitro and in-vivo methods that are commonly
used in the process of drug discovery. The term in-silico, by analogy with in-vitro and in-vivo, refers to the use
of computers in drug discovery studies.
• GROMACS is used in the simulation.
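What "producing a trajectory of positions and momenta over a finite time period" means can be illustrated with a toy integrator. The sketch below uses the velocity Verlet scheme on a single particle in one dimension with a harmonic force; the force law, the units and all names are assumptions made for illustration, and this is not a description of how GROMACS is implemented or configured.

```python
def velocity_verlet(x, p, mass, force, dt, steps):
    """Return the trajectory [(x0, p0), (x1, p1), ...] of one particle."""
    trajectory = [(x, p)]
    f = force(x)
    for _ in range(steps):
        p_half = p + 0.5 * dt * f   # half-step momentum update
        x = x + dt * p_half / mass  # full-step position update
        f = force(x)                # force at the new position
        p = p_half + 0.5 * dt * f   # second half-step momentum update
        trajectory.append((x, p))
    return trajectory


def harmonic_force(x, k=1.0):
    """Toy force law (assumption): a spring pulling the particle to x = 0."""
    return -k * x


def total_energy(x, p, mass=1.0, k=1.0):
    """Kinetic plus potential energy for the toy harmonic system."""
    return p * p / (2 * mass) + 0.5 * k * x * x


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, p=0.0, mass=1.0,
                           force=harmonic_force, dt=0.01, steps=1000)
    x_end, p_end = traj[-1]
    print(len(traj), total_energy(x_end, p_end))
```

A production MD code applies the same kind of update to every atom with far more elaborate force fields, which is why the work parallelizes well across cluster nodes.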
35




Molecular docking is a computational
procedure that attempts to predict the non-covalent
binding of macromolecules.
The goal is to predict the bound conformations
and the binding affinity.
The prediction process is based on information
embedded in the chemical bonds of the
substances.
AutoDock Vina is used in the simulation.
36
                     Number of processors
No  Time step      2      3      4      5
1   200 ps       1.85   2.64   3.07   3.74
2   400 ps       1.84   2.46   3.13   3.73
3   600 ps       1.83   2.42   3.04   3.69
4   800 ps       2.03   2.47   3.09   3.76
5   1000 ps      1.87   2.51   3.14   3.82
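The slide does not label the values in the table above. Assuming they are speedups relative to a single processor (an assumption, suggested by the values growing with the processor count), the parallel efficiency, speedup divided by number of processors, can be computed as in this sketch.

```python
PROCESSORS = [2, 3, 4, 5]

# Values copied from the table; ASSUMED to be speedups vs. one processor.
SPEEDUPS = {
    "200 ps": [1.85, 2.64, 3.07, 3.74],
    "400 ps": [1.84, 2.46, 3.13, 3.73],
    "600 ps": [1.83, 2.42, 3.04, 3.69],
    "800 ps": [2.03, 2.47, 3.09, 3.76],
    "1000 ps": [1.87, 2.51, 3.14, 3.82],
}


def efficiency(speedup, processors):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / processors


if __name__ == "__main__":
    for time_step, row in SPEEDUPS.items():
        cells = "  ".join(f"{efficiency(s, n):.2f}"
                          for s, n in zip(row, PROCESSORS))
        print(f"{time_step:8s} {cells}")
```

Under that assumption, an efficiency close to 1 means the extra processors are almost fully used, while lower values indicate communication or serial overhead.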
37
38




This work discusses the design and simulation of an e-learning computer
network topology, based on Grid computing technology, for
Indonesian schools, called the Indonesian Education Grid
(abbreviated as IndoEdu-Grid).
Establishing such a network without Grid computing
capabilities would leave redundant resources idle.
We propose scenarios that have different network topologies
based on their router and link configurations. Each scenario is
run in the simulator using two packet scheduling algorithms:
the FIFO (First In First Out) scheduler and the
SCFQ (Self-Clocked Fair Queuing) scheduler.
The processing time of the jobs’ packets is evaluated to
determine the most effective network topology for IndoEdu-Grid.
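The difference between the two packet schedulers can be sketched as follows. This is a simplified illustration, not GridSim's implementation: the class names, the flow names "A" and "B" and the weights are hypothetical, and the reset of virtual time when the queue goes idle is omitted. In SCFQ each packet gets a finish tag equal to max(last tag of its flow, tag of the packet in service) plus length divided by the flow's weight, and packets are served in increasing tag order.

```python
import heapq
from collections import deque


class FifoScheduler:
    """Serve packets strictly in arrival order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, flow, length):
        self._queue.append((flow, length))

    def dequeue(self):
        return self._queue.popleft()


class ScfqScheduler:
    """Self-Clocked Fair Queuing sketch: serve the smallest finish tag first."""

    def __init__(self, weights):
        self._weights = weights                      # flow -> positive weight
        self._last_tag = {flow: 0.0 for flow in weights}
        self._virtual_time = 0.0                     # tag of packet in service
        self._heap = []
        self._seq = 0                                # tie-breaker: arrival order

    def enqueue(self, flow, length):
        start = max(self._last_tag[flow], self._virtual_time)
        tag = start + length / self._weights[flow]
        self._last_tag[flow] = tag
        heapq.heappush(self._heap, (tag, self._seq, flow, length))
        self._seq += 1

    def dequeue(self):
        tag, _, flow, length = heapq.heappop(self._heap)
        self._virtual_time = tag                     # the "self-clocking" step
        return flow, length


def service_order(scheduler, arrivals):
    """Enqueue all arrivals, then drain and return the flow of each packet."""
    for flow, length in arrivals:
        scheduler.enqueue(flow, length)
    return [scheduler.dequeue()[0] for _ in arrivals]


if __name__ == "__main__":
    arrivals = [("A", 100), ("B", 100)] * 3          # equal-size packets
    print(service_order(FifoScheduler(), arrivals))
    print(service_order(ScfqScheduler({"A": 2.0, "B": 1.0}), arrivals))
```

With equal packet sizes, FIFO alternates between the two flows, while SCFQ lets the flow with the larger (hypothetical) weight get its packets through earlier.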
39




The entities of our design are resources, users, and jobs (Gridlets).
Resource entities are responsible for performing computation on job
entities, in the form of Gridlets sent by one or more users, and for sending the results
back to the users. Our work uses one resource for each province;
each resource consists of one Machine, and each Machine consists
of 4 PEs (processing elements).
Users are entities responsible for submitting jobs, in the form of Gridlet
objects, to the resources. The users are programmed to send jobs to
a particular resource at the same time, so that we can learn
more about the performance of the Grid system at its peak
load, when all the users access the resource at the same
time.
Jobs in GridSim are represented as objects of the class Gridlet
provided by GridSim. In our work, each user creates three
Gridlets of different lengths: 5000 MI (million instructions),
3000 MI, and 1000 MI. This is intended to simulate the real
situation where a user does not send just one job, but may
send several jobs with different sizes and
computational needs.
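The three kinds of entities described above can be pictured with a small data model. The sketch is written in Python for consistency with the other examples here; GridSim itself is a Java toolkit, and the class and field names below are illustrative assumptions rather than GridSim's API. The province and user names are placeholders.

```python
from dataclasses import dataclass, field

# Gridlet lengths from the slide, in MI (million instructions).
GRIDLET_LENGTHS_MI = (5000, 3000, 1000)


@dataclass
class Gridlet:
    gridlet_id: int
    owner: str
    length_mi: int


@dataclass
class Machine:
    pes: int = 4  # processing elements per machine, as on the slide


@dataclass
class Resource:
    province: str
    machines: list = field(default_factory=lambda: [Machine()])

    def total_pes(self):
        return sum(machine.pes for machine in self.machines)


@dataclass
class User:
    name: str

    def create_gridlets(self):
        """Each user creates three Gridlets of different lengths."""
        return [Gridlet(i, self.name, length)
                for i, length in enumerate(GRIDLET_LENGTHS_MI)]


if __name__ == "__main__":
    resource = Resource(province="Province-1")       # placeholder name
    users = [User(f"user{i}") for i in range(2)]     # placeholder users
    jobs = [g for user in users for g in user.create_gridlets()]
    print(resource.total_pes(), len(jobs))
```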
40

The first scenario divides the whole territory of Indonesia into three main sections: the
western, central, and eastern parts of Indonesia. Each of these three sections is subdivided
into smaller units: islands and/or archipelagos.
41
The second scenario divides
the whole territory of Indonesia directly into island and/or
archipelago units. These islands and/or archipelagos are then
divided into province units.
42











Hardware
• Intel® Core™ 2 Duo T5800 processor with 2.0 GHz clock speed, 800 MHz FSB
(Front Side Bus), and 2 MB L2 cache.
• 2048 MB RAM (Random Access Memory), shared dynamically with the Mobile
Intel® Graphics Media Accelerator 4500MHD.
• 320 GB Fujitsu MHZ2320BH G2 SATA hard disk with 5400 rpm rotation speed.
Software
• 32-bit Microsoft Windows Vista™ Business operating system.
• JDK (Java Development Kit) version 1.6.0_05 with Java™ Runtime
Environment 1.6.0_05-b13.
• GridSim version 5.0 beta.
• The simulation was run 10 times for each scenario to increase the validity of the
simulation results, and the results were then averaged.
• With the SCFQ scheduling algorithm, even-numbered users are set to have weight 1,
indicating that they have a higher priority, while odd-numbered users are set to
have weight 0, indicating that they have normal priority. This weighting
determines the type of service (ToS) of the packets sent
by the users.
• With the FIFO scheduling algorithm, all users are set by default to weight 0, so all
sent packets have the same ToS.
43
Average Simulation Results Data for the Entire Provinces per Gridlet Using
FIFO and SCFQ Scheduling Algorithm
Processing time (in simulation seconds)
Scheduling algorithm   Scenario     Gridlet#0    Gridlet#1    Gridlet#2
FIFO                   Scenario 1   239.76471    184.89620    124.45739
FIFO                   Scenario 2   240.23045    185.26774    124.11812
SCFQ                   Scenario 1   235.50311    180.73233    124.67395
SCFQ                   Scenario 2   235.78695    181.59782    124.05540
•Job = Gridlet, which simulates the job packets that contain information about the length of
the jobs in units of MI (million instructions), the length of input and output files in units of bytes,
the starting and finishing execution times, and the owner of the jobs.
•The three Gridlets #0, #1 and #2 have different lengths: 5000 MI (million instructions), 3000 MI,
and 1000 MI, respectively.
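To compare the two schedulers numerically, the relative change in average processing time can be computed from the reported values. The sketch below only re-uses the numbers in the results table; the function and dictionary names are illustrative.

```python
# Average processing times (simulation seconds), copied from the table.
RESULTS = {
    ("FIFO", "Scenario 1"): (239.76471, 184.89620, 124.45739),
    ("FIFO", "Scenario 2"): (240.23045, 185.26774, 124.11812),
    ("SCFQ", "Scenario 1"): (235.50311, 180.73233, 124.67395),
    ("SCFQ", "Scenario 2"): (235.78695, 181.59782, 124.05540),
}


def scfq_reduction_percent(scenario, gridlet):
    """Percentage by which SCFQ lowers the time relative to FIFO.

    A negative result means SCFQ was slower for that Gridlet.
    """
    fifo = RESULTS[("FIFO", scenario)][gridlet]
    scfq = RESULTS[("SCFQ", scenario)][gridlet]
    return (fifo - scfq) / fifo * 100


if __name__ == "__main__":
    for scenario in ("Scenario 1", "Scenario 2"):
        for gridlet in range(3):
            change = scfq_reduction_percent(scenario, gridlet)
            print(f"{scenario}, Gridlet#{gridlet}: {change:+.2f}%")
```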
44





• More people are becoming interested in shared
computing facilities,
• Many free-of-charge grid development tools are
available,
• A strong unit capable of building the Grid
infrastructure can be developed, but it needs commitment and dedication
from at least the university level and the government, or
• INHERENT can be improved, which will open up more
collaboration among universities,
• Nusantara Super Highway to be completed in 2015:
"The Nusantara Super Highway, based on an optical network, is the
continuation of Telkom's aspiration to unite Indonesia
through the Nusantara 21 vision, which began in 2001 with
satellite-based technology,"
http://www.detikinet.com/read/2011/04/19/143116/1620709/328/nusantarasuper-highway-rampung-di-2015?i991101105
45



• Unreliable electricity supplies
• No coordination at the national level for ICT research
and development programs involving
government and private organizations
• Reliance on grant funding, which leads to other negative
effects such as:
  - Most Indonesian funding sources do not allow hardware
    (computer) investment (only spare parts are allowed)
  - Lack of permanent human resources to manage the Grid
  - Difficulty maintaining the grid so that it keeps up with current technology
    developments
• Many organizations are “very protective” of their
computing resources; only a few are willing to share
them.
46
Challenges - cont
Only a few (maybe one or two) faculties teach
cluster, cloud and grid computing, so only
a few people master and understand them.
Perhaps Cloud computing is an alternative
solution in one way; however, the
cloud itself has some challenges
47
• Scalability
• Reliability
• Billing
• Utility & Risk Management
• Programming Env. & Application Dev.
• Software Eng. Complexity
“Uhm, I am not quite clear… Yet another complex IT paradigm?”
48
•More bioinformatics, medical informatics,
image analysis, and finance applications in a GPU computing
environment,
•Indonesian e-Government Grid services
•Indonesian Archeology and Culture-Grid
services
•Indonesian Health-Grid services
49
References
•ABCGrid, http://abcgrid.cbi.pku.edu.cn, accessed 3 October 2008; see also Ying Sun, Shuqi
Zhao, Huashan Yu, Ge Gao and Jingchu Luo (2007), ABCGrid: Application for Bioinformatics
Computing Grid, Bioinformatics.
•Rajkumar Buyya, www.gridbus.org/megha; www.buyya.com; www.manjrasoft.com
•GCIC, http://www.gridcomputing.com/, accessed 25 Sep 2008.
•Globus, http://www.globus.org, accessed 25 Sep 2008.
•Gridbus Application, http://www.gridbus.org/applications.html, accessed 25 Sep 2008.
•Gridbus Middleware, http://www.gridbus.org/middleware/, accessed 25 Sep 2008.
•GridGain, http://www.gridgain.com, accessed 15 Sep 2008.
•Ivo Bahar, Heru Suhartanto, Design and Simulation of Indonesian Education Grid Topology
using GridSim Toolkit, to appear in Asian Journal of Information Technology, 2010.
•H. Suhartanto, Kajian Perangkatbantu Komputasi tersebar berbasis Message Passing,
Makara Teknologi, Vol 10, No 2, 2006, pp. 72-81.
•H. Suhartanto, Peluang dan tantangan Aplikasi Grid Computing di Indonesia,
professorial inauguration speech, 2008.
•InGrid, https://grid.ui.ac.id/gridsphere/gridsphere, accessed 28 Sep 2008.
•Jardiknas, http://jardiknas.diknas.go.id/, accessed 28 Sep 2008.
•John Rhoton, Cloud Computing Explained, 2nd ed., Recursive Press, 2010.
•Molecular Docking, http://grid.apac.edu.au/OurUsers/MolecularDocking, accessed 27
Sep 2008.
•Molecular Docking Definition, http://en.wikipedia.org/wiki/Docking_(molecular), accessed
3 October 2008.
•MultimediaGrid, http://www.gridbus.org/papers/MultimediaGrid-MJCS2007.pdf, accessed
27 Sep 2008.
•NeuroGrid, http://www.gridbus.org/neurogrid/, accessed 27 Sep 2008.
•Paul Coddington, Distributed and High Performance Computing course, University of
Adelaide, 2002; UK national HPC service,
http://www.csar.cfs.ac.uk/user_information/grid/grid-middleware.shtml
•Pipeline – Inspektionmolch: http://www.hpe.fzk.de/projekt/molch/, accessed 27 Sep 2008.
•Top500, http://www.top500.org, accessed 14 September 2008.
•Wahid Chrabakh, Computational Grid Computing: Application Viewpoint, Computer
Science, Major Exams, UCSB, ppt file.
•Zlatev, Z. and Berkowicz, R. (1988), Numerical treatment of large-scale air pollutant
models, Comput. Math. Applic., 16, 93-109.