IT Service Management
2011 Ministry of Education-IBM Premium Course
School of Software Engineering, Tongji University, Yan Haizhou (严海洲)
[email protected]
Chapter 5
Service Operation
Tivoli Software
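Service Operation
• Service Operation gives guidance on achieving effectiveness and efficiency in
service delivery and service support, so that value is realized for both the
customer and the service provider.
• The Service Operation volume covers the following topics and processes: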
[Figure: ITIL service lifecycle: Service Strategy, Service Design, Service Transition, Service Operation]
• Service Operation Principles
• Service Operation Processes
Event Management
Incident Management
Request Fulfillment
Problem Management
Access Management
• Common Service Operation Activities
IT Operations (Console, Job Scheduling, etc.)
Mainframe Support
Server Mgmt and Support
Desktop Support, Middleware Mgmt, Internet/Web Mgmt
Application Mgmt Activities
• Organizing for Service Operation
Service Desk
Technical Management
IT Operations Management
Application Management
5.1 Introduction
Service Operation (SO)
• Coordinate and carry out day-to-day activities and
processes to deliver and manage services at agreed
levels
• Ongoing management of the technology that is used
to deliver and support services
• Where the plans, designs and optimizations are
executed and measured
Service Operation Goals
• Coordinate and Execute: all ongoing activities required
to deliver and support services
--Execute the Services
--Coordinate Service Management processes
--Management of the technology infrastructure used
to deliver services
--Coordinate the people who manage the
technology, processes, and services
Scope of SO
• Ongoing management of:
– The services themselves
– The Service Management processes
– Technology
– People
Value to business of SO
• Where the actual value of strategy, design and transition
is realized by customers and users
• Where business dependency usually commences
ACHIEVING BALANCE IN SERVICE OPERATION
• Service Operation: More than repetitive execution
--Services delivered in a changing environment
--Conflict between status quo and adaptation
--Balance between conflicting sets of priorities
• Balance Areas of Conflict:
--Internal IT View vs. External Business View
--Stability vs. Responsiveness
--Quality of Service vs. Cost of Service
--Reactive vs. Proactive
ACHIEVING BALANCE IN SERVICE OPERATION
• Internal IT View vs. External Business View
--Internal: IT components and systems
--External: Users and customer experiences
• Stability vs. Responsiveness
--Stability: Stable platform and consistent
--Responsiveness: Quick response and flexible
• Quality of Service vs. Cost of Service
--Quality: Consistent delivery of service
--Cost: Costs and resource utilization optimal
• Reactive vs. Proactive
--Reactive: Does not act until prompted
--Proactive: Always looking to improve
Internal IT View vs. External Business View [figure slides]
Stability vs. Responsiveness [figure slides]
Quality of Service vs. Cost of Service [figure slides]
Reactive vs. Proactive [figure slides]
Operational Health
• What is operational health?
• Who should pay attention to operational health?
• Think about your own health:
Heart?
Brain?
Or others?
Communication
• Good communication is needed between all ITSM
personnel and with users/customers/partners
• Issues can often be mitigated or avoided through good
communication
• All communication should have:
– Intended purpose and/or resultant action
– Clear audience, who should be involved in deciding the
need/format
5.2 Service Operation Processes
Event Management
• Objectives
• Basic concepts
• Roles
Event Management — Objectives
• Detect events, make sense of them, and determine the
appropriate control action
• Event Management is the basis for Operational
Monitoring and Control
Event Management — Basic concepts
• Event
An alert or notification created by any IT Service,
Configuration Item or monitoring tool. For example a
batch job has completed. Events typically require IT
Operations personnel to take actions, and often lead to
Incidents being logged.
• Event Management
The Process responsible for managing Events
throughout their Lifecycle.
• Alert
Event Management — Logging and Filtering
[Figure: Filter, with event types Exception, Warning, Information]
Event Management — Managing Exceptions
[Figure: an Exception leads to Incident/Problem/Change: Incident to Incident Management, Problem to Problem Management, RFC to Change Management]
Event Management — Information and Warnings
[Figure: for Warning and Information events, do any one or a combination of: Log, Auto Response, Alert, Human Intervention, Incident/Problem/Change (Incident, Problem, RFC)]
Event Management — Roles
• Event management roles
are filled by people in the
following functions
– Service Desk
– Technical Management
– Application Management
– IT Operations Management
Metrics of Event Management
Designing for event management
1. Instrumentation
2. Error Messaging
3. Event Detection and Alert Mechanisms
4. Identification of thresholds
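The design points above (instrumentation, alert mechanisms, thresholds) can be illustrated with a minimal sketch. It is not taken from the course material: the metric name, the threshold values and the `classify_event` helper are hypothetical, and the three result categories simply reuse the Information / Warning / Exception event types named on the logging and filtering slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical warning and exception levels for one monitored metric."""
    warning: float
    exception: float


# Assumed example values (percent of disk used); real thresholds would be
# agreed during service design, not hard-coded like this.
THRESHOLDS = {"disk_used_pct": Threshold(warning=80.0, exception=95.0)}


def classify_event(metric: str, value: float) -> str:
    """Map a measured value to 'Information', 'Warning' or 'Exception'."""
    limits = THRESHOLDS[metric]
    if value >= limits.exception:
        return "Exception"
    if value >= limits.warning:
        return "Warning"
    return "Information"


if __name__ == "__main__":
    for reading in (42.0, 85.5, 97.0):
        print(reading, classify_event("disk_used_pct", reading))
```

Keeping the thresholds in data rather than in the classification logic means they can be tuned without touching the detection code.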
Incident Management
• Objectives
• Scope
• Business value
• Basic concepts
• Activities
• Interfaces
• Key metrics
• Roles
• Challenges
Incident Management — Objective
• To restore normal service operation as quickly as
possible and minimize adverse impact on the business
Incident Management — Scope
• Managing any disruption or potential disruption to live
IT services
• Incidents are identified
– Directly by users through the Service Desk
– Through an interface from Event Management to Incident
Management tools
• Incidents can also be reported and/or logged by technical staff
Incident Management — Business value
• Quicker incident resolution
• Improved quality
• Reduced support costs
Why Incident Management
Ensure the best use of resources to support the business
Develop and maintain meaningful records relating to incidents
Devise and apply a consistent approach to all incidents reported
Incident Definition
An incident is an event which is not part of the
standard operation of a service and which causes,
or may cause an interruption to, or a reduction in
the quality of that service
Incident Management — Basic concepts
• An Incident
– An unplanned interruption or reduction in the quality of
an IT Service
– Any event which could affect an IT Service in the future is
also an Incident
• Timescales
• Incident Models
• Major Incidents
Incident Management — Activities
Impact, Urgency & Priority
IMPACT
- The likely effect the incident will have on the
business (e.g. numbers affected, magnitude)
URGENCY
- Assessment of the speed with which an incident
or problem requires resolution (i.e. how much
delay will the resolution bear)
PRIORITY
- the relative sequence in which an incident or
problem needs to be resolved, based on impact
and urgency
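As a minimal sketch of how priority can be derived from impact and urgency, the lookup below uses a 3x3 matrix. The three levels, the five priority codes (1 is resolved first) and the `priority` function are assumptions made for illustration; an organization would agree its own matrix.

```python
# Assumed levels for both impact and urgency.
LEVELS = ("high", "medium", "low")

# Hypothetical matrix keyed by (impact, urgency); 1 is the highest priority.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("high", "low"): 3,
    ("medium", "high"): 2,
    ("medium", "medium"): 3,
    ("medium", "low"): 4,
    ("low", "high"): 3,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def priority(impact: str, urgency: str) -> int:
    """Return the priority code for an incident or problem."""
    impact, urgency = impact.lower(), urgency.lower()
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError("impact and urgency must be one of %s" % (LEVELS,))
    return PRIORITY_MATRIX[(impact, urgency)]


if __name__ == "__main__":
    print(priority("high", "medium"))
```

A table lookup keeps the rule visible and easy to change, which matters because the matrix is a business agreement rather than a technical fact.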
Example
Incident Management — Interfaces
• Problem Management
• Service Asset and Configuration Management (SACM)
• Change Management
• Capacity Management
• Availability Management
• Service Level Management
Incident Management — Key metrics
• Total number of incidents (as a control measure)
• Breakdown of incidents at each stage (for example, logged,
WIP, closed, etc.)
• Size of incident backlog
• Mean elapsed time to resolution
• % resolved by the Service Desk (first-line fix)
• % handled within agreed response time
• % resolved within agreed Service Level Agreement target
• No. and % of Major Incidents
• No. and % of incidents correctly assigned
• Average cost of incident handling
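Several of the metrics listed above can be computed directly from incident records. The sketch below is illustrative only: the `Incident` fields, the sample data and the choice of denominators (closed incidents for resolution percentages, all incidents for the Major Incident percentage) are assumptions, not definitions from the course.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Incident:
    """Hypothetical incident record holding only what these metrics need."""
    opened: datetime
    resolved: Optional[datetime]  # None while the incident is still open
    resolved_by_first_line: bool
    sla_target: timedelta
    major: bool = False


def _pct(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def incident_metrics(incidents: List[Incident]) -> Dict[str, Optional[float]]:
    closed = [i for i in incidents if i.resolved is not None]
    seconds = [(i.resolved - i.opened).total_seconds() for i in closed]
    within_sla = sum(1 for i in closed if i.resolved - i.opened <= i.sla_target)
    first_line = sum(1 for i in closed if i.resolved_by_first_line)
    return {
        "total": len(incidents),
        "backlog": len(incidents) - len(closed),
        "mean_hours_to_resolution":
            sum(seconds) / len(closed) / 3600 if closed else None,
        "pct_first_line_fix": _pct(first_line, len(closed)),
        "pct_within_sla": _pct(within_sla, len(closed)),
        "pct_major": _pct(sum(1 for i in incidents if i.major), len(incidents)),
    }


# Made-up sample data for demonstration.
T0 = datetime(2011, 1, 3, 9, 0)
SAMPLE = [
    Incident(T0, T0 + timedelta(hours=2), True, timedelta(hours=4)),
    Incident(T0, T0 + timedelta(hours=6), False, timedelta(hours=4), major=True),
    Incident(T0, None, False, timedelta(hours=8)),
    Incident(T0, T0 + timedelta(hours=1), True, timedelta(hours=4)),
]

if __name__ == "__main__":
    for name, value in incident_metrics(SAMPLE).items():
        print(name, value)
```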
Incident Management — Roles
• Incident Manager
– May be performed by Service Desk Supervisor
• Super Users
• First-Line Support
– Usually Service Desk Analysts
• Second-Line Support
• Third-Line Support (Technical Management, IT
Operations, Applications Management, Third-party
suppliers)
Benefits
• Reduced business impact of Incidents by timely resolution
• Improved monitoring of performance against targets
• Elimination of lost Incidents and Service Requests
• More accurate CMDB information
• Improved User satisfaction
• Less disruption to both IT support staff and Users
Possible Problems
• Lack of Management commitment
• Lack of agreed Customer service levels
• Lack of knowledge or resources for resolving incidents
• Poorly integrated processes
• Unsuitable software tools
• Users and IT staff bypassing the process
Incident Management — Challenges
• Ability to detect incidents as quickly as possible
(dependency on Event Management)
• Ensuring all incidents are logged
• Ensuring previous history is available (Incidents,
Problems, Known Errors, Changes)
• Integration with Configuration Management System,
Service Level Management, and Known Error Database
(CMS, SLM, KEDB)
Request Fulfillment
• Objectives
• Basic concepts
• Roles
Request Fulfillment — Objectives
• To provide a channel for users to request and receive
standard services for which a predefined approval and
qualification process exists
• To provide information to users and customers about
the availability of services and the procedure for
obtaining them
• To source and deliver the components of requested
standard services (for example licenses and software
media)
• To assist with general information, complaints or
comments
Request Fulfillment — Basic concepts
• Service Request
– A request from a User for information or advice, or for a
Standard Change. For example
• To reset a password, or to provide standard IT Services
for a new User
• Request Model
Request Fulfillment — Roles
• Not usually dedicated staff
• Service Desk staff
• Incident Management staff
• Service Operations teams
Problem Management
• Objectives
• Basic concepts
• Roles
Problem Management — Objectives
• To prevent problems and resulting Incidents from
happening
• To eliminate recurring incidents
• To minimize the impact of incidents that cannot be
prevented
Problem Management — Basic concepts (1 of 2)
• Problem
– The unknown cause of one or more incidents
• Problem Models
• Workaround
• Known Error
• Known Error Database
Problem Management — Basic concepts (2 of 2)
• Reactive Problem Management
– Resolution of underlying cause(s)
– Covered in Service Operation
• Proactive Problem Management
– Prevention of future problems
– Generally undertaken as part of CSI
Proactive Activities
Trend Analysis
- Post-Change occurrence of particular Problems
- Recurring Problems per type or per component
- Training, documentation issues
Preventative Action
- Raising RFC to prevent occurrence/recurrence
- Initiate education and training
- Ensure adherence to procedures
- Initiate process improvement
- Provide feedback to testing, training and
documentation
Problem Management — Roles
• Problem Manager
• Supported by technical groups
– Technical Management
– IT Operations
– Applications Management
– Third-party suppliers
Problem Management
— Problem Investigation and Diagnosis
• Objectives
• Basic concepts
• Roles
Chronological Analysis
Pain Value Analysis
Kepner and Tregoe
Brainstorming
It can often be valuable to gather together
the relevant people, either physically or by
electronic means, and to ‘brainstorm’ the
problem – with people throwing in ideas on what
the potential cause may be and potential actions
to resolve the problem. Brainstorming sessions
can be very constructive and innovative but it is
equally important that someone, perhaps the
Problem Manager, documents the outcome and
any agreed actions and keeps a degree of control
in the session(s).
Ishikawa Diagram
Pareto Analysis
Pareto Analysis: Example
Network Failures
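A Pareto analysis ranks causes by how often they occur and tracks the cumulative share, so that effort goes to the few causes behind most failures. The sketch below shows the arithmetic only. The cause names and counts are invented for illustration and are not the figures from the slide, and the 80% cutoff is a conventional assumption.

```python
from typing import Dict, List, Tuple

# (cause, count, percent of total, cumulative percent)
Row = Tuple[str, int, float, float]


def pareto_table(counts: Dict[str, int]) -> List[Row]:
    """Rank causes by frequency and compute the cumulative percentage."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows: List[Row] = []
    running = 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((cause, n, 100.0 * n / total, 100.0 * running / total))
    return rows


def vital_few(counts: Dict[str, int], cutoff_pct: float = 80.0) -> List[str]:
    """Return the top causes that together reach the cutoff percentage."""
    selected: List[str] = []
    for cause, _n, _pct, cumulative in pareto_table(counts):
        selected.append(cause)
        if cumulative >= cutoff_pct:
            break
    return selected


# Hypothetical network failure causes and counts.
NETWORK_FAILURES = {
    "Router": 50,
    "Cabling": 30,
    "Switch configuration": 10,
    "DNS": 6,
    "Other": 4,
}

if __name__ == "__main__":
    for row in pareto_table(NETWORK_FAILURES):
        print("%-22s %3d %6.1f%% %6.1f%%" % row)
    print("Focus on:", vital_few(NETWORK_FAILURES))
```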
Access Management
• Objectives
• Basic concepts
• Roles
Access Management — Objectives
• Granting authorized users the right to use a service
• Preventing access by non-authorized users
Access Management — Basic concepts
• Access
• Identity
• Rights
• Service or Service Groups
• Directory Services
Access Management — Roles
• Not usually dedicated staff
• Access management is an execution of Availability
Management and Information Security Management
• Service Desk staff
• Technical Management staff
• Application Management staff
• IT Operations staff
5.3 Common Service Operation activities
MONITORING AND CONTROL
• Monitoring refers to the activity of observing a situation to
detect changes that happen over time.
• Control refers to the process of managing the utilization or
behaviour of a device, system or service. It is important to note,
though, that simply manipulating a device is not the same as
controlling it. Control requires three conditions:
1. The action must ensure that behaviour conforms to a defined
standard or norm
2. The conditions prompting the action must be defined,
understood and confirmed
3. The action must be defined, approved and appropriate for
these conditions.
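The three conditions for control can be sketched as a single monitor, compare, control step. Everything named here is hypothetical: the `ControlRule` type, the CPU utilization norm of 75 and the `throttle_batch_jobs` action are stand-ins chosen only to make the loop concrete.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ControlRule:
    name: str
    norm_max: float                   # the defined standard or norm
    action: Callable[[float], float]  # the defined, approved action


def control_step(measured: float, rule: ControlRule) -> Tuple[float, Optional[str]]:
    """Compare a monitored value with the norm and act only if it is breached.

    Returns the resulting value and the name of the action taken, if any.
    """
    if measured <= rule.norm_max:
        # The condition that prompts the action is not met: observe only.
        return measured, None
    return rule.action(measured), rule.name


def throttle_batch_jobs(utilization: float) -> float:
    """Simulated action: assume throttling frees 20 points of CPU utilization."""
    return max(utilization - 20.0, 0.0)


CPU_RULE = ControlRule("throttle batch jobs", 75.0, throttle_batch_jobs)

if __name__ == "__main__":
    for sample in (60.0, 90.0):
        print(sample, control_step(sample, CPU_RULE))
```

The point of the sketch is that the action is tied to an explicit norm and an explicit trigger condition; manipulating the device without both would not count as control in the sense described above.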
The Monitor Control Loop
MAINFRAME MANAGEMENT
Activities likely to be undertaken:
• Mainframe operating system maintenance and support
• Third-level support for any mainframe-related incidents/problems
• Writing job scripts
• System programming
• Interfacing to hardware (H/W) support; arranging maintenance,
agreeing slots, identifying H/W failure, liaison with H/W engineering.
• Provision of information and assistance to Capacity Management
to help achieve optimum throughput, utilization and performance
from the mainframe.
SERVER MANAGEMENT AND SUPPORT
• Operating system support
• Licence management
• Third-level support
• Procurement advice
• System security
• Definition and management of virtual servers
• Capacity and Performance
• Ongoing maintenance
• Decommissioning and disposal of old server equipment
FACILITIES AND DATA CENTRE MANAGEMENT
• Building Management
• Equipment Hosting
• Power Management
• Environmental Conditioning and Alert Systems
• Safety
• Physical Access Control
• Shipping and Receiving
• Involvement in Contract Management
• Maintenance
5.4 Organizing for Service Operation
Service Operation functions
• Service Desk
• Technical Management
• IT Operations Management
– Operations Control
– Facilities Management
• Application Management
Service Desk
• Primary point of contact
• Deals with all user issues (incidents, requests,
standard changes)
• Coordinates actions across the IT organization to
meet user requirements
• Different options (Local, Centralized, Virtual, Follow-the-Sun, specialized groups)
Local Service Desk
Centralized Service Desk
Virtual Service Desk
Service Desk objectives
• Logging and categorizing Incidents, Service Requests
and some categories of change
• First line investigation and diagnosis
• Escalation
• Communication with Users and IT Staff
• Closing calls
• Customer satisfaction
• Update the CMS if so agreed
Service Desk staffing
• Correct number and qualifications at any given time,
considering:
– Customer expectations and business requirements
– Number of users to support, their language and skills
– Coverage period, out-of-hours, time zones/locations,
travel time
– Processes and procedures in place
• Minimum qualifications
– Interpersonal skills
– Business understanding
– IT understanding
– Skill sets
• Customer and Technical emphasis, Expert
Service Desk metrics
• Periodic evaluations of health, maturity, efficiency,
effectiveness and any opportunity to improve
• Realistic and carefully chosen: the total number of calls is
not in itself good or bad
• Some examples:
– First-line resolution rate
– Average time to resolve and/or escalate an incident
– Total costs for the period divided by total call duration
minutes
– The number of calls broken down by time of day and day
of week, combined with the average call-time
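The example metrics above reduce to simple ratios over call records. The following is a rough sketch under stated assumptions: the `Call` fields, the sample calls and the cost figure are invented, and "first-line resolution rate" is taken here as the share of all calls resolved at first line.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Call:
    """Hypothetical Service Desk call record."""
    weekday: str
    hour: int
    duration_min: float
    resolved_first_line: bool


def first_line_resolution_rate(calls: List[Call]) -> float:
    """Percentage of calls resolved by first-line support."""
    if not calls:
        raise ValueError("no calls in the period")
    fixed = sum(1 for c in calls if c.resolved_first_line)
    return 100.0 * fixed / len(calls)


def cost_per_call_minute(total_cost: float, calls: List[Call]) -> float:
    """Total costs for the period divided by total call duration minutes."""
    minutes = sum(c.duration_min for c in calls)
    if minutes <= 0:
        raise ValueError("total call duration must be positive")
    return total_cost / minutes


def calls_by_slot(calls: List[Call]) -> Dict[Tuple[str, int], Tuple[int, float]]:
    """Number of calls and average call time per (weekday, hour) slot."""
    durations = defaultdict(list)
    for c in calls:
        durations[(c.weekday, c.hour)].append(c.duration_min)
    return {slot: (len(d), sum(d) / len(d)) for slot, d in durations.items()}


# Made-up sample data for demonstration.
CALLS = [
    Call("Mon", 9, 6.0, True),
    Call("Mon", 9, 10.0, False),
    Call("Tue", 14, 4.0, True),
    Call("Mon", 10, 5.0, True),
]

if __name__ == "__main__":
    print(first_line_resolution_rate(CALLS))
    print(cost_per_call_minute(500.0, CALLS))
    print(calls_by_slot(CALLS))
```

The per-slot breakdown is the input a staffing plan would need: it shows when calls arrive and how long they take, not just how many there are.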
Technical Management
• The groups, departments or teams that provide
technical expertise and overall management of the IT
Infrastructure
– Custodians of technical knowledge and expertise related
to managing the IT Infrastructure
– Provide the actual resources to support the IT Service
Management Lifecycle
– Perform many of the common activities already outlined
– Execute most ITSM processes
Technical Management organization
• Technical teams are usually aligned to the technology
they manage
• Can include operational activities
• Examples
– Mainframe management
– Server Management
– Internet / Web Management
– Network Management
– Database Administration
Technical Management — Objectives
• Design of resilient, cost-effective infrastructure
configuration
• Maintenance of the infrastructure
• Support during technical failures
Technical Management — Roles
• Technical Managers
• Team Leaders
• Technical Analysts / Architects
• Technical Operator
IT Operations Management
• The department, group or team of people responsible
for performing the organization’s day-to-day
operational activities, such as:
– Console Management
– Job Scheduling
– Backup and Restore
– Print and Output management
– Performance of maintenance activities
– Facilities Management
– Operations Bridge
– Network Operations Center
– Monitoring the infrastructure, applications and services
IT Operations Management — Objectives
• Maintaining the “status quo” to achieve
infrastructure stability
• Identify opportunities to improve operational
performance and save costs
• Initial diagnosis and resolution of operational Incidents
IT Operations Management — Roles
• IT Operations Manager
• Shift Leaders
• IT Operations Analysts
• IT Operators
Applications Management
• Manages Applications throughout their Lifecycle
• Performed by any department, group or team
managing and supporting operational Applications
• Role in the design, testing and improvement of
Applications that form part of IT Services
• Involved in development projects, but not usually the
same as the Application Development teams
• Custodian of expertise for Applications
• Provides resources throughout the lifecycle
• Guidance to IT Operations Management
Applications Management — Objectives
• Well designed, resilient, cost effective applications
• Ensuring availability of functionality
• Maintain operational applications
• Support during application failures
Applications Management — Roles
• Application Manager / Team leaders
• Applications Analyst / Architect
Note: Application Management teams are usually
aligned to the applications they manage
SERVICE OPERATION ROLES
• Service Desk roles
• Technical Management roles
• IT Operations Management roles
• Application Management roles
• Event Management roles
• Incident Management roles
• Request Fulfilment roles
• Problem Management roles
• Access Management roles
SERVICE OPERATION ORGANIZATION STRUCTURES
Organization by technical specialization
Organization by activity
Organizing to manage processes
Organizing IT Operations by Geography
Hybrid organization structures
Organization by technical specialization
Organization by activity
Organizing to manage processes
It is not a good idea to structure the whole organization
according to processes. Processes are used to overcome
the ‘silo effect’ of departments, not to create silos.
However, there are a number of processes that will need a
dedicated organization structure to support and manage
them. For example, it will be very difficult for Financial
Management to be successful without a dedicated Finance
department – even if that department consists of a small
number of staff.
In process-based organizations people are organized into
groups or departments that perform or manage a specific
process. This is similar to the activity-based structure,
except that its departments focus on end-to-end sets of
activities rather than on one individual type of activity.
Organizing IT Operations by Geography
Hybrid organization structures
It is unlikely that IT Operations Management
will be structured using only one type of
organization structure. Most organizations use a
technical specialization, with some additional
activity- or process-based departments.
The type of structure used and the exact
combination of technical specialization, activity-based
and process-based departments will
depend on a number of organizational variables.
Centralized IT Operations, Technical and Application
Management structure