Powering up Analytics with Big data

POWERING UP ANALYTICS WITH BIG DATA THE SAS WAY!
-PRIYA SARATHY, PH.D
ANALYTIC SALES CONSULTANT, SAS
Copyright © 2012, SAS Institute Inc. All rights reserved.
SALUTE TO THE WORLD RUN BY STATISTICIANS
AGENDA
• High Performance Analytics (HPA)
  • Meeting Challenges
  • The What?
  • Understanding the Analytic Paradigm Shift
  • High Performance Analytics – the SAS way
  • What is the business value add?
MEETING CHALLENGES
HIGH PERFORMANCE ANALYTICS: WHAT IS HPA DELIVERING
• What is HPA about?
  • Evolving business needs
• Why does business need it?
  • Leveraging information to compete in the market
  • Raise revenue/profits
  • Reduce costs and inefficiencies
HIGH PERFORMANCE ANALYTICS GREW FROM THE NEED FOR BIG DATA ANALYTICS!
[Figure: quadrant chart. Vertical axis: Analytic Capabilities, from Reactive to Proactive. Horizontal axis: Data Size, from Large to Big. Quadrants: BI, Big Data BI, Big Analytics, Big Data Analytics.]
HIGH PERFORMANCE ANALYTICS: HPA IS IMPACTING BUSINESS PERFORMANCE IN MANY AREAS
Probability of Default on Mortgage
• Data analysis, variable selection, modeling; millions of customers scored in batch
• Reduce the time to complete all these tasks from 167 hours to 84 seconds!
Stress Testing Portfolio
• Market risk solution that simulates market states to derive the value at risk
• Understand exposures by counterparty/instrument, rapidly respond to crisis and adjust your positions accordingly
• Recalculate entire risk portfolio in 12 minutes, down from 18 hours!
Next Best Offer
• Multiple offers, millions of customers, regional, response history, business rule constraints
• Optimization across cross-sell and upsell offers can run several hours
• Speed up computation from 5.5 hours to 2 minutes
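To put the three before/after run times on a common scale, the short sketch below converts each pair to seconds and computes the implied speedup factor. The dictionary keys and the `speedup` helper are illustrative names, not part of the original deck; the run times themselves are the ones quoted on the slide.

```python
# Speedup factors implied by the before/after run times quoted on the slide.
HOUR = 3600    # seconds per hour
MINUTE = 60    # seconds per minute

# (run time before, run time after), both in seconds
cases = {
    "Probability of default on mortgage": (167 * HOUR, 84),
    "Stress testing portfolio": (18 * HOUR, 12 * MINUTE),
    "Next best offer": (5.5 * HOUR, 2 * MINUTE),
}


def speedup(before_s, after_s):
    """Return how many times faster the new run is than the old one."""
    return before_s / after_s


speedups = {name: speedup(b, a) for name, (b, a) in cases.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:,.0f}x faster")
```

The three cases differ by roughly two orders of magnitude, so the headline figures are easier to compare as ratios than as raw times.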
HIGH PERFORMANCE ANALYTICS: WHAT CONVERSATIONS ARE YOU INVOLVED IN?
Fraud Detection
• More data analyzed for fraud, more quickly and accurately than ever, across all departments from inside a single enterprise data warehouse
• Trade monitoring (unauthorized trades); commercial fraud (ACH, wire, warranty); customer fraud (payroll, claims fraud)
Forecasting Inventory Management
• Multi-level relationships, segments, global markets
• Accuracy in demand forecasting, daily to weekly forecast updates across several models
• Promote inventory flow from 24 months by 85%
Retail Marketing
• Household targeting, retail bank campaigns, customer acquisition model
• Data analysis, variable selection, modeling
Real Time Relationship Marketing
• Real-time offers: coupons, cross-sell offers
• Sports retailer: location-based analytics and CLV modeling with real-time updates, pattern and behavioral analysis => 60% increase in response rates
• Airline operations: 8-10 hours of modeling, lagged data creating suboptimal decisions; faster insights, greater accuracy from multiple iterations, reduced operation cost
WHAT DO OTHERS THINK? DATA MEASUREMENT IS THE MODERN EQUIVALENT OF THE MICROSCOPE*
A 28-year-old assistant professor at Stanford combined math with political science in his undergraduate and graduate studies, seeing “an opportunity because the discipline is becoming increasingly data-intensive.” His research involves the computer-automated analysis of blog postings, Congressional speeches and press releases, and news articles, looking for insights into how political ideas spread.
It’s not just more streams of data, but entirely new ones: countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air.
At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.
* Quote from Professor Brynjolfsson
Source: The Age of Big Data, by Steve Lohr, NYT
THE WHAT?
THE NEW NORMAL – WHAT IS HPA DOING TO ANALYTICS?
The Things you can Think!
• Analyze 100% of data
• More/new variables
• More model iterations
• Manage complex models
• More models (per domain area)
• More questions/ideas/scenarios to evaluate
• Multiple deployment options: batch, real-time
• Continuously monitor model effectiveness and retrain
HIGH PERFORMANCE ANALYTICS: HPA COMBINES THE THREE PILLARS TO DELIVER RESULTS
• Data: Leveraging technology to collect, access and manage data
• Analytics: Adapting to new technology: in-memory, grid, in-database
• Platform: Positioning analytics within industry leaders' technology solutions
HIGH PERFORMANCE ANALYTICS: ADVANCED ANALYTICS AND FAST COMPUTING CAPABILITIES ARE BROUGHT TOGETHER WITH SAS HPA
• In a recent National Post interview with Jim Goodnight, the SAS CEO explains it like this:
“There's a lot of business processes that will be changing because of the speed at which we can do analytics; using a thousand processes in parallel to do these computations can make it possible to do huge problems that we would never have been able to do before because it would take too long on a single processor.”
• A big part of how HPA gets its speed: it breaks larger problems down into smaller pieces.
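The idea of breaking a larger problem into smaller pieces can be illustrated with a minimal sketch. This is not SAS code and does not reflect how HPA is implemented; it only shows the general pattern, assuming a statistic (here mean and variance) that can be assembled from per-slice partial sums. Threads stand in for separate processors, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Per-piece work: count, sum, and sum of squares for one slice."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def split(data, n_pieces):
    """Cut the data into contiguous slices of near-equal size."""
    size = -(-len(data) // n_pieces)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_mean_variance(data, n_pieces=4):
    """Combine per-slice partial sums into the overall mean and variance."""
    # Threads stand in for worker processors in this illustration.
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        parts = list(pool.map(partial_stats, split(data, n_pieces)))
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return mean, variance


mean, variance = parallel_mean_variance(list(range(1, 101)))
print(mean, variance)
```

Each slice is processed independently and only three numbers per slice travel back to the caller, which is why this kind of computation scales with the number of pieces.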
HIGH PERFORMANCE ANALYTICS: HPA HELPS REMOVE LIMITATIONS
• From sampling to population analysis
• From 50 attributes to 500+ attributes
• Reduce run times from 18 hours to 30 minutes
• Build more complex models
• From 3-month lagged modeling to real-time updates
• From structured data alone to combining unstructured data
• Shortening the model lifecycle
• More frequent updates and model iterations
• Real-time scoring impacting the business bottom line
You will have more time to think!
UNDERSTANDING THE ANALYTIC PARADIGM SHIFT
MODEL LIFECYCLE: HOW MUCH TIME DO YOU SPEND ON YOUR MODELS?
• Where would you like to spend more time?
[Figure: share of time by lifecycle stage]
• Data Analysis: 45%
• Model Build: 30%
• Validation & Implementation: 10%
• Monitoring & Results Reporting: 15%
RESPONSIBILITIES OF AN ANALYST: STATISTICAL MODEL BUILDING PARADIGM SHIFT
• Extract, Transform, Load data
• Data massaging/mining
• Aggregating, normalizing data
• Identifying analytic approach
• Building samples
• Building models
• Creating scoring code
• Validation reports/model documentation
• Implementation for production
• Results monitoring
• Update, refresh, or rebuild model
• IT – shifting responsibilities to
  • EDW/DW
  • Data quality
  • Data integration
  • ODS
  • Production implementation
• Analyst – building models
  • Access to more and better data
  • Need for documentation and transparency
  • Greater number of business solutions
  • Changing market and data dynamics impacting frequency of build and update
MODEL LIFECYCLE: CHANGING ROLES AND RESPONSIBILITIES
• New technology, new tools
• New business processes
• New competitive demands
[Figure: share of time by lifecycle stage]
• Data Analysis: 25%
• Model Build: 60%
• Validation & Implementation: 10%
• Monitoring & Results Reporting: 5%
THE FARFALLE MODEL: THE BASIC STRUCTURE OF ANALYTIC FUNCTION
Source: IDC, 2012
• 70% of the effort in analytics is typically on the information management side of the model.
• Analytical teams in the middle are small but crucial for translating the data assets into actionable insights.
• The organization change side highlights the attributes of behavior changes needed by business users.
Working with a Tsunami of data
[Figure: DATA SIZE growing from TODAY to THE FUTURE, labeled VOLUME, VARIETY, VELOCITY, VALUE.]
HIGH PERFORMANCE ANALYTICS – THE SAS WAY
SAS® HIGH-PERFORMANCE ANALYTICS: EMBRACING NEW TECHNOLOGY, BUILDING NEW STRENGTHS
[Figure label: Visual Analytics]
PHYSICAL LAYOUT: SCALABLE ANALYTIC CAPABILITY
[Diagram labels: CLIENT FRAME; MID-TIER; COMPUTING FRAME; DATA FRAME; SAS Metadata Servers; Controller Node (n cores); Node 1, Node 2 ... Node n; SAS Analytic & Scoring Accelerators; RDBMS; Hadoop; Shared/Clustered File.]
HIGH PERFORMANCE ANALYTICS: CHANGING THE WAY ANALYTICS IS DONE, BOTTOM-UP
Data Preparation
• DS2
• SORT
Data Exploration
• SUMMARY/MEANS
• FREQ
• RANK
Analytics
• HPLOGISTIC
• HPREG
• HPLMIXED
• HPFOREST
• HPNEURAL
• HPREDUCE
• HPNLIN
SAS® HIGH-PERFORMANCE ANALYTICS SERVER: AREAS OF MODEL DEVELOPMENT THAT BENEFIT
Predictive Analytics & Data Mining
• Binary target & continuous number predictions
• Linear & non-linear modeling
• Complex relationships
• Tree-based classification
Text Mining
• Parsing large-scale text collections
• Extract entities
• Automatic stemming & synonym detection
• Topic discovery
Optimization*
• Local search optimization
• Large-scale linear & mixed integer problems
Econometrics Time Series
• Probability of an event(s)
• Severity of random event(s)
*Currently only available for Teradata and EMC Greenplum
IN-MEMORY HIGH PERFORMANCE ANALYTICS
HPA
VA
FINANCIAL SERVICES CUSTOMER ACQUISITION USE CASE
[Figure: lifecycle of DATA EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT]
Current Process
• One algorithm (Neural Network)
• 1 model per day
• 5 hours to process model
• Model lift of 1.6%
High-Performance Process
• Multiple algorithms (e.g. Forest, Logistic Reg., etc.)
• 1 model per 30 minutes
• 3 minutes to process model
• Model lift of 2.5%
84 SECONDS
• Think left and think right and think low and think high. Oh, the thinks you can think up if only you try!
Oh the things you can find, if you don't stay behind! Dr. Seuss (On Beyond Zebra!, 1955)
SAS® HIGH-PERFORMANCE ANALYTICS SERVER: KEY DIFFERENTIATORS
• Only in-memory offering in the market delivering high-end analytics, including text mining and optimization
• Addresses the entire model development and deployment lifecycle
• 36 years of proven technology... faster. Opens up a vast array of possibilities to get value from big data
ADDITIONAL CASE STUDIES
TOP FIVE WAYS HIGH-PERFORMANCE ANALYTICS WILL TRANSFORM MARKETING
• Faster, more sophisticated, effective segmentation
  • ... segmentation tests can be run against the entire populations in order to determine the best campaign interaction methods
• Real-time, relevant next-best customer actions or offers
  • This results in a more relevant offer or customer interaction surfacing at the “point of need” in real time
• 1:1 real-time experiences to bolster brand connections
  • The outcome is more precise, real-time interactions with consumers at the “point of need.”
• Instant deployment and management of marketing models that give you a sustainable advantage
  • ... companies to quickly and efficiently update their numerous models without submitting a slow overnight batch update process.
• Optimized marketing for broader business impact
  • Now businesses can not only determine the customer and financial impacts of their campaigns faster but also adapt instantaneously to market, competitive and customer changes.
UNITED HEALTHCARE GROUP
BUSINESS ISSUE
• Electronic medical records (EMRs) driving a data explosion
• Utilize all of the unstructured text (records, case notes, emails, transcripts, etc.)
• How to improve quality and cost of care? “Create Healthier Lives”
SOLUTION
• SAS® High-Performance Analytics Server including HP Text Mining
• Greenplum Data Computing Appliance
RESULTS
• Reduce model processing time from four hours to 10 seconds
• Reduce misclassification rates from 30% to 10%
• Historical models improved with more than 10% lift
• I can now tell that a prescription will harm a patient before you write it…
• I can tell that a customer is dissatisfied before you lose him or her…
• I can now determine that a claim is fraudulent before you pay it…
HEALTHCARE PAYER
“SAS is helping make our member services the best in the industry. In less than one hour, we can load a huge table (169 million row dataset), find the best variables, compare different models and pick the best model. I would not attempt to model a dataset this large without SAS HPA Server.”
Mark Pitts
Director of Data Science, Solutions and Strategy
SAS HIGH-PERFORMANCE: LEVERAGING DATABASE APPLIANCE FOR HPA
[Diagram: Root Node (Teradata Managed Server) and Worker Node 1, Worker Node 2 ... Worker Node N. Request is sent to the root node inside the appliance.]
SAS HIGH-PERFORMANCE: ANALYTICAL COMPUTATION AND DATA REQUEST SENT TO THE WORKER NODES
SAS HIGH-PERFORMANCE: DATA REQUEST SENT TO THE DATABASE. DATA SLICE MOVED INTO MEMORY
SAS HIGH-PERFORMANCE: ANALYTIC PROCESSING WITH INTERNODE COMMUNICATION
SAS HIGH-PERFORMANCE: WORKER NODE RESULTS RETURNED TO THE ROOT NODE. JOB IS COMPLETE.
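The root node / worker node flow described on these slides can be sketched as a small single-process simulation. This is an illustration only, not SAS or Teradata code: the `RootNode` and `WorkerNode` classes, their methods, and the choice of a simple least-squares fit are all assumptions made for the example, and internode communication between workers is not modeled.

```python
class WorkerNode:
    """Holds one slice of the table and computes on it locally."""

    def __init__(self, rows):
        self._stored_rows = rows   # stands in for this node's database slice
        self._in_memory = None

    def load_slice(self):
        """Move the local data slice into memory."""
        self._in_memory = list(self._stored_rows)

    def compute(self):
        """Local analytic work: partial sums for a least-squares line."""
        n = sx = sy = sxy = sxx = 0.0
        for x, y in self._in_memory:
            n += 1
            sx += x
            sy += y
            sxy += x * y
            sxx += x * x
        return n, sx, sy, sxy, sxx


class RootNode:
    """Receives the request, fans it out, and combines worker results."""

    def __init__(self, workers):
        self.workers = workers

    def run_regression(self):
        # Send the computation and data request to every worker.
        for worker in self.workers:
            worker.load_slice()
        partials = [worker.compute() for worker in self.workers]
        # Worker results come back to the root, which finishes the job.
        n, sx, sy, sxy, sxx = (sum(col) for col in zip(*partials))
        slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
        intercept = (sy - slope * sx) / n
        return slope, intercept


# Hypothetical data: points on the line y = 3x + 2, spread over 3 workers.
rows = [(x, 3.0 * x + 2.0) for x in range(12)]
workers = [WorkerNode(rows[i::3]) for i in range(3)]
slope, intercept = RootNode(workers).run_regression()
print(slope, intercept)
```

Only five numbers per worker return to the root, so the full table never has to leave the nodes where it is stored. Iterative algorithms would repeat the fan-out and combine steps once per iteration.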
