iWork: Analytics in Human Resource management

Invited Talk at the Forum for Information Retrieval
Evaluation (FIRE 2012), Indian Statistical Institute,
Kolkata, India on 19-Dec-2012
iWork: Analytics for Human
Resources Management
Girish Keshav Palshikar
Tata Consultancy Services Limited
54B Hadapsar Industrial Estate, Pune 411013, India.
[email protected]
Human Resources Management
• HR Management is a crucial function within any organization
– More so in services, IT and BPO industries
– HR is a cost center; everyone interacts with HR!
• HR function is characterized by a large number of
– Business processes
– IT systems that automate these business processes
– Huge databases of employee activities
– Many employee contact initiatives
– Close interactions with L&D
– Associated metrics and KPI
– Monitoring and regulatory compliance
Phases in HR Management
• Talent Acquisition
– Requirement gathering, recruitment planning
– Campus recruitment, EP interviews and recruitments
• Talent Management and Utilization
– Allocation (e.g., to project), team formations, monitoring
– Roles and tasks, Utilization tracking (billing, timesheets)
– Transfer, deputation, Travel
– Training, Knowledge management
– Performance appraisal, Promotion, Salary
– Administration (leave, medicals, …)
• Talent Retention
– Feedback, complaints, grievances, …
– Resignation handling, retention, knowledge transfer, succession
Business Goals of HR Management
• HR is responsible for maintaining a high-quality workforce
– Well-aligned and competitive for the business of the organization
– Effective in performing the business tasks and services, delivering
required value and meeting client expectations;
– Well-trained in the required and emerging skills
– Highly responsive to emerging business requirements
– Stable (low attrition, low impact of attrition; successful succession
and knowledge transfer)
– Cost-effective (low salary and overhead costs)
– Able to evolve into leadership
– Agile and Mobile (quickly form effective and distributed teams)
– Motivated, high on initiative and ownership, highly proactive
– Happy with their environment, work, roles, salaries, career paths
– Follows professional ethics, codes of conduct, etc.
– Well-integrated and diverse; highly communicative
What Makes HR Management Challenging?
The human factors!
Large and varied backgrounds of the workforce
Globalization, diversified and distributed workforce
New business demands (services, products, …)
New business models
New customers across the globe
Mergers and acquisitions across the globe
Changing skills requirements
Innovation (disruptive / incremental, technical / domain)
Risks (ethical violations, data privacy, client confidentiality)
iWork: Analytics for HR
HR Domain Knowledge
• Make effective use of historical databases and document repositories for
solving HR domain-specific problems
• Use analytics-driven decision making to meet HR business goals
• Combine data and text mining to build innovative HR domain-specific
solutions for significant enterprise
• Deliver analytics-derived insights to the right users at the right point in the HR
business processes
Technology Areas
Data Mining, Machine Learning, Pattern Recognition, Statistical Analysis, Natural
Language processing, Computational Linguistics, Text Mining; Optimization
iWork: Opportunities
• Reduce effects of attrition: understanding of root causes, accurate
prediction, targeted retention strategies, backup/replacement plans, …
• Improve project team formation: optimal mix of experience and
expertise for all projects, optimal cost team, maximize match with
associates’ interests
• Improve associate satisfaction: better understanding of drivers, identify
concrete action items, cost-effective improvement plan, what-if analysis
• Talent acquisition: reduce delays and costs, maximize match with
requirements, school evaluation, identify patterns of long-term stay, …
• Team profiles: in terms of backgrounds, skills, roles, domains, …
• Effective RFI Responses/proposals: locate experts, experience, tools
• Available HR Datasets: resumes (internal, external), timesheets, project
details, in-house tool/document repositories, allocations, trainings, surveys, …
iWork Overview
• QUEST: employee/customer survey analytics
• iRetain: Attrition analytics; retention analytics
• Resume Center: extract structured information from resumes;
match job requirements; find experts; team skill profiles; …
• ExBOS: optimal project team formation
• iTAG: analytics for talent acquisition
• Analytics for improving effectiveness of training programs
• …
iWork Vision: Mine HR data to Drive Improvements to
Workforce Management
Talent Retention
Talent Utilization and Management
Talent Acquisition
HR Datasets
iWork: Strategy and Offerings
Build a set of integrated offerings that identify specific improvement
opportunities for Workforce Management
Transcend silos in HR systems and data
Integrate the offerings with existing HR systems to deliver the actionable
recommendations to the users when they need them
Talent Acquisition
• TA analytics: improve cost,
quality, timeliness of TA
• ILP analytics
Talent Retention
• As-is state dashboard for attrition
• Discovering high-attrition groups
• Predictive models for attrition
• Identifying root-causes of attrition
• Plan for reducing attrition impact
• Optimal retention plan
Talent management
• Quest: employee survey analytics
• EXBOS: optimal team formation
• Visa analytics
• Resume Center: competency extraction; find right
people for positions; find experts; find misinformation;
enrich RFP response (past projects, tools); talent pool
profiling; find customer intelligence;
• Bench analytics
ITIS workforce management: team sizing + shift
planning; optimal team skill profile; service level
rationalization; expert finding; training plans; DC
transformation planning
Automation of survey responses tagging Nielsen
Practical Applications Need a Lot More than IR
Document retrievals; ranking
Fine-grained retrieval
Goal-directed retrieval
Information extraction
Cross-linking and information fusion
(e.g., with FB, LinkedIn)
Analytics (problem-specific)
Learning to rank
Various databases
and text repositories
Effective use of resumes in all HR functions
Using Resumes in HR Functions
• Resumes: a valuable source of information for people’s work
– ~250,000 TCS employees’ resumes
– ~2 million candidates’ (applicants’) resumes
Business goals
– Use information extraction: extract personal, job history, project
details, education, training, awards etc. from given resumes
o Validate information in resumes
o Create gazettes (colleges, degrees, certifications, tools, companies)
– Update employee experience profile and skills/competencies
– Identify top K best matches for a given job requirement / position
o Learning to rank (poster in FIRE 2012)
o improve team formation; shorten recruitment cycle
– Perform mining of data extracted from resumes to derive novel,
actionable insights about the available talent pool
o reduce bench; reduce attrition; improve utilization
o identify training opportunities; help in career planning
Resume Center: Information Extraction
On-the-fly extraction
using an IR engine
Resume Center: PowerMiner
• Given a set of resumes, provide facilities to help in filing
RFI/RFP responses, form project teams etc.
• locate relevant projects for a given project description
• locate relevant tools for a given project description
• identify expert persons for a given technical area
• assign domain(s) to each resume (e.g., insurance, railways,
banking, telecom etc.)
• Identify "unusually high quality" resumes in terms of a set of
pre-defined quality criteria
– Special tools, niche skills, extra qualifications (e.g., domain-related),
top-quality academic performance, awards, publications
Resume Center: Team Profiler
• Given a resume repository, help HR executives in building an
“understanding” of their teams:
– What are the strengths and weaknesses of my team in terms of
technical skills, domain knowledge, roles etc.?
– What should I do to improve the quality of my teams?
• Create a summary profile of a team, in terms of technology
skills, domains, experience etc.;
• Group the given resumes into clusters (from different
perspectives), with specific interpretation for each cluster
– Similar to customer segmentation?
• Document repository visualization and exploratory facilities
R. Srivastava, G. K. Palshikar, RINX: Information Extraction, Search and Insights
from Resumes, Proc. TCS Technical Architects' Conf., (TACTiCS 2011),
Thiruvananthapuram, India, Apr. 2011.
S. Pawar, R. Srivastava, G.K. Palshikar, Automatic Gazette Creation for Named
Entity Recognition and Application to Resume Processing, Proc. ACM COMPUTE
2012 Conference, Pune, India, 24-Jan-2012.
G.K. Palshikar, R. Srivastava, S. Pawar, Delivering Value from Resume
Repositories, TCS White Paper published on www.tcs.com, Feb. 2012. (c) Tata
Consultancy Services Limited.
Survey response analytics
QUEST: Overview
• Advanced analytics tool to mine survey response data and derive
novel, actionable insights for improving workforce management
• Surveys are a direct and effective mechanism to gauge concerns and
issues that affect satisfaction of employees or customers
• Motivation: TCS conducts an annual in-house employee survey
– 250,000 employees, ~100 questions (structured, free-form)
– 250,000 textual responses to each of ~20 questions
– Challenges: volumes; dependencies; mixed structured/text responses
Business goals: improve satisfaction levels among employees
Benefits: deeper insights, objective results, reduced time/efforts
Impact: Satisfaction levels affect projects quality, client satisfaction
Status: Currently deployed in-house
– QUEST should be an integral part of all HR contact and feedback
programs throughout TCS (ISU, geographies, clients etc.)
– Deploy for customer / product satisfaction surveys
QUEST: Approach
• Dashboards and standard reports
• Drill-down exploratory analysis
• Summarize responses to specific questions/categories
• Identify specific issues, concerns and suggestions
• Characterize low-satisfaction groups (discover common
characteristics of employees with high/low satisfaction)
• Identify factors (root causes) that affect satisfaction
• Design optimal plans to improve satisfaction levels
• Use survey results in team planning and other workforce
management tasks
G.K. Palshikar, S. Deshpande, S. Bhat, QUEST: Discovering Insights from Survey Responses, Proc. 8th
Australasian Data Mining Conf. (AusDM09), Dec. 1-4, 2009, Melbourne, Australia, P.J. Kennedy, K.-L. Ong,
P. Christen (Ed.s), CRPIT, vol. 101, published by Australian Computer Society, pp. 83 - 92, 2009.
QUEST: Results
QUEST: Results…
Things you don’t like about TCS
QUEST: Results…
PULSE 2008-09 Responses for TCS Mumbai
Groups having unusually low ASI
EXPERIENCE_RANGE = ‘4-7’ (60.4; global avg. = 73.8)
Root causes for low ASI
Canteen, Transportation, RMG
Interesting subset discovery: finding bumps in a large-dimensional distribution
M. Natu, G.K. Palshikar, Interesting Subset Discovery and its Application on Service
Processes, Proc. Workshop on Data Mining for Services (DMS 2010) held as part of the Int.
Conference on Data Mining (ICDM 2010), Australia, 2010, pp. 1061-1068.
QUEST: Results…
Actionable suggestions made by associates
TCS can have tie ups with best Schools in the near
by locations for their employee kids
… the moment you step out there is only garbage
and randomly parked autos around
TCS can engage with lease agreement … with TATA
Housing itself and provide economical
I don`t have any leg space...n my knees are hurting
S. Deshpande, G.K. Palshikar, G. Athiappan, An Unsupervised Approach to Sentence
Classification, Proc. Int. Conf. on Management of Data (COMAD 2010), Nagpur, 2010,
Allied Publishers Pvt. Ltd., pp. 88 - 99.
Sentence Classification
• Sentence class labels are usually domain-dependent
• Unsupervised classification of sentences: specific / general
Sentence Classification…
• A SPECIFIC sentence is more “on the ground”
• A GENERAL sentence is more “in the air”
• Example:
– My table is cramped and hurts my knees.
– The work environment needs improvement.
– Travel vouchers should be cleared within 2
working days.
– Accounts department is very inefficient.
Sentence Classification…
• Compute a specificity score for each sentence:
– Unsupervised (knowledge-based), without the need for
any labeled training examples.
– Define a set of features and compute their values for
each sentence.
– The features are lexical / semantic.
– The features are context-free: their values are
computed exclusively using the words in the sentence
and do not depend on any other (e.g., previous) sentence.
– Then combine the feature values for a particular
sentence into its specificity score.
• Rank the sentences in terms of their specificity score.
Sentence Classification…
• Sentence features
– Average semantic depth (ASD)
– Average semantic height (ASH)
– Total occurrence count (TOC)
– Count of Named Entities (CNE)
– Count of Proper Nouns (CPN)
– Sentence Length (LEN)
Sentence Classification…
• Semantic depth (SD) SD_T(w) of a word w is the
distance (number of edges) from the root of
ontology T to word w in T
– We use T = WordNet ISA ontology
– More semantic depth → more specific word
Sentence Classification…
• Semantic depth of a word changes with its POS
tag and with its sense;
– SD(bank) = 7 for financial institution
– SD(bank) = 10 for flight maneuver sense.
• Solution:
– Apply word sense disambiguation (WSD) during
pre-processing; or
– Take average of the semantic depths of the
word for top k of its senses
• Average semantic depth S.ASD for a sentence
S = <w1 w2 . . . wn> containing n content-carrying
words = the average of the semantic depths of
the individual words
• My table hurts the knees.
– (8 + 2 + 6)/3 = 5.3
• The work environment needs improvement.
– (6 + 6 + 1 + 7)/4 = 5.
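The averaging step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: TOY_PARENT is a made-up miniature ISA taxonomy (not WordNet), and the per-word depths in SLIDE_DEPTHS are read off the two worked examples above, assuming the terms are listed in word order.

```python
from typing import Dict, List, Optional

# Hypothetical toy ISA taxonomy (child -> parent); it is NOT WordNet.
TOY_PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "object": "entity",
    "food": "object",
    "fruit": "food",
    "apple": "fruit",
}


def semantic_depth(word: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges from the root of the taxonomy down to `word`."""
    depth = 0
    node = parent[word]
    while node is not None:
        depth += 1
        node = parent[node]
    return depth


def average_semantic_depth(content_words: List[str],
                           depth_of: Dict[str, int]) -> float:
    """S.ASD: mean semantic depth over the content-carrying words."""
    if not content_words:
        return 0.0
    return sum(depth_of[w] for w in content_words) / len(content_words)


# Depths taken from the worked examples (assumed to be in word order).
SLIDE_DEPTHS = {
    "table": 8, "hurts": 2, "knees": 6,
    "work": 6, "environment": 6, "needs": 1, "improvement": 7,
}

specific = average_semantic_depth(["table", "hurts", "knees"], SLIDE_DEPTHS)
general = average_semantic_depth(
    ["work", "environment", "needs", "improvement"], SLIDE_DEPTHS)
print(round(specific, 1), round(general, 1))
```

In a real setting the depth lookup would come from the ontology itself (after WSD, or averaged over the top k senses); the toy parent map only shows how an edge count from the root is obtained.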
• Semantic height (SH) SH_T(w) of a word w is the length of
the longest path in T from word w to a leaf node
– We use T = WordNet hyponym ontology
– Lower semantic height → more specific word
• Average semantic height S.ASH for a sentence
S = <w1 w2 . . . wn> containing n content-carrying
words (non stop-words) = the average of the
semantic heights of the individual words
• Semantic height of a word changes with its POS
tag and with its sense;
• Solution: use WSD or take average of the
semantic heights of the word for top k of its senses
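A minimal sketch of semantic height, assuming a small hand-made hyponym tree: TOY_HYPONYMS and its entries are hypothetical and stand in for the WordNet hyponym ontology. The height of a word is the longest downward path to a leaf, so leaves have height 0.

```python
from typing import Dict, List

# Hypothetical toy hyponym tree (word -> more specific words); NOT WordNet.
TOY_HYPONYMS: Dict[str, List[str]] = {
    "food": ["fruit", "bread"],
    "fruit": ["apple", "pear"],
    "apple": ["crab apple"],
    "pear": [],
    "bread": [],
    "crab apple": [],
}


def semantic_height(word: str, hyponyms: Dict[str, List[str]]) -> int:
    """Length of the longest path from `word` down to a leaf node."""
    children = hyponyms.get(word, [])
    if not children:
        return 0
    return 1 + max(semantic_height(child, hyponyms) for child in children)


def average_semantic_height(content_words: List[str],
                            hyponyms: Dict[str, List[str]]) -> float:
    """S.ASH: mean semantic height over the content-carrying words."""
    if not content_words:
        return 0.0
    total = sum(semantic_height(w, hyponyms) for w in content_words)
    return total / len(content_words)


# Lower average height suggests more specific wording.
print(average_semantic_height(["apple", "pear"], TOY_HYPONYMS))
print(average_semantic_height(["food", "fruit"], TOY_HYPONYMS))
```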
• Intuition: more specific sentences tend to include words
which occur rarely in some reference corpus
– apple (2), fruit (14), food (34)
• The more rare words a sentence contains, the more
specific it is likely to be.
• OC(w) = occurrence count of word w in WordNet;
– if w has multiple senses, then OC(w) = average of the occurrence
counts for top k senses of w
• Total occurrence count S.TOC for a sentence S = <w1 w2
... wn> containing n content words is the sum of the lowest
m occurrence counts of the individual words, where m is a
fixed value (e.g., m = 3).
• OC of a word changes with its POS tag and with its sense;
• Solution: use WSD or take average of the OC of the word
for top k of its senses
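The definition of S.TOC can be sketched as follows. The occurrence counts are the three quoted above (apple, fruit, food); the function name and the dictionary-based lookup are assumptions made for illustration, and sense handling (WSD or top-k averaging) is left out.

```python
from typing import Dict, List

# Occurrence counts quoted in the text above.
OCCURRENCE_COUNT: Dict[str, int] = {"apple": 2, "fruit": 14, "food": 34}


def total_occurrence_count(content_words: List[str],
                           occurrence_count: Dict[str, int],
                           m: int = 3) -> int:
    """S.TOC: sum of the m lowest occurrence counts among the content words.

    If the sentence has fewer than m content words, all of them are used.
    """
    counts = sorted(occurrence_count[w] for w in content_words)
    return sum(counts[:m])


words = ["food", "apple", "fruit"]
print(total_occurrence_count(words, OCCURRENCE_COUNT, m=2))  # two rarest words
print(total_occurrence_count(words, OCCURRENCE_COUNT, m=3))  # all three words
```

Because only the m rarest words contribute, a long sentence padded with common words does not look less specific than a short one that uses the same rare words.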
• Named entities (NE) are commonly occurring
groups of words which indicate specific semantic
– Person name (e.g., Bill Gates)
– Organization name (e.g., Microsoft Inc.),
– location (e.g., New York),
– date, time, amount, email addresses etc.
• Since each NE refers to a particular object, an NE
is a good indicator that the sentence contains
specific information.
• Another feature S.CNE for a sentence S is the
count of NE occurring in S
• Proper Nouns (PN) are commonly occurring
groups of words which indicate specific semantic
– Abbreviation (IBM or kg), domain terms
(oxidoreductases), words like (Apple iPhone),
numbers etc.
• Since each PN may refer to a particular object, a
PN is a good indicator that the sentence contains
specific information.
• Another feature S.CPN for a sentence S is the
count of PN occurring in S
• Sentence length, denoted S.Len, is a weak indicator of
its specificity in the sense that more specific sentences
tend to be somewhat longer than more general sentences.
• Length refers to the number of content carrying words (not
stopwords) in the sentence, including numbers, proper
nouns, adjectives and adverbs
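As a rough sketch, the length feature can be computed by tokenizing and dropping stopwords. The tiny STOPWORDS set and the regex tokenizer below are assumptions for illustration only; a real system would use a proper stopword list and tokenizer. The two sentences are taken from the specific / general example earlier in the talk.

```python
import re
from typing import Set

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS: Set[str] = {
    "a", "an", "the", "is", "are", "and", "my", "in", "of", "to",
    "should", "be", "within",
}


def sentence_length(sentence: str, stopwords: Set[str] = STOPWORDS) -> int:
    """S.Len: number of content-carrying tokens (numbers are kept)."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return sum(1 for token in tokens if token not in stopwords)


specific_len = sentence_length(
    "Travel vouchers should be cleared within 2 working days.")
general_len = sentence_length("Accounts department is very inefficient.")
print(specific_len, general_len)
```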
• Features have contradictory polarity.
– We want higher values → more specificity.
– Not true for features ASH and TOC
– Lower values → higher specificity for these
• Scales of values for various features are not the same,
because of which some features may unduly influence the
overall combined score.
– E.g., ASD is usually  10, whereas TOC is a larger
• Uniform scaling: map x ∈ [a, b] to y ∈ [c, d]
Scaling + reversal of polarity
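One way to realize uniform scaling with optional polarity reversal is a linear map, sketched below. The linear form, the function name rescale and the example ranges are assumptions; the slides do not state the exact formula used.

```python
def rescale(x: float, a: float, b: float,
            c: float = 0.0, d: float = 1.0,
            reverse: bool = False) -> float:
    """Linearly map x in [a, b] to y in [c, d].

    With reverse=True the polarity is flipped (a maps to d, b maps to c),
    which suits features such as ASH and TOC where lower raw values
    mean higher specificity.
    """
    if b == a:
        raise ValueError("source interval [a, b] must have non-zero width")
    t = (x - a) / (b - a)      # position of x within [a, b], from 0 to 1
    if reverse:
        t = 1.0 - t            # flip polarity
    return c + t * (d - c)


# Hypothetical raw values and ranges, for illustration only.
asd_scaled = rescale(5.0, a=0.0, b=10.0)                 # higher is better
toc_scaled = rescale(10.0, a=0.0, b=40.0, reverse=True)  # lower is better
print(asd_scaled, toc_scaled)
```

After this step every feature lies on the same scale and points in the same direction, so no single feature dominates when the values are combined into a specificity score.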
Sentences 6, 7, 9 as top 3 in terms of specificity score
Some specific sentences identified by our
algorithm from 110,000 responses in
an employee satisfaction survey
Some specific sentences identified by our algorithm from 220 sentences from 32
reviews of a hiking backpack product by Kelty.
 Domain-driven IR = IR + text-mining of retrieved
• Enterprise document repositories offer good
scope for Domain-driven IR to deliver solutions
and insights relevant for real-life business
problems and decisions

similar documents