Unsupervised Modelling , Detection
and Localization of Anomalies in
Surveillance Videos
Project Advisor : Prof. Amitabha Mukerjee
Deepak Pathak (10222)
Abhijit Sharang (10007)
What is an “Anomaly” ?
• Anomaly refers to the unusual (or rare event) occurring
in the video
• Definition is ambiguous and depends on context
Idea :
• Learn the “usual” events in the video and use the
information to tag the rare events.
• Unsupervised
• Anomalous
• SpatioTemporal
Step 1 : Unsupervised Modelling
• Model the “usual” behaviour of scene using parametric bayesian
• Topic Models : Leveraged from Natural Language Processing
• Given: Document and Vocabulary
• Document is histogram over vocabulary
• Goal: Identify topics in a given set of Documents
[Topics are latent variables]
Alternate view :
• Clustering in topic space
• Dimensionality reduction
NLP to Vision : Notations
Text Analysis
Video Analysis
Vocabulary of words
Vocabulary of visual words
Text documents
Video clips
Video Clips (or Documents)
45 minute video footage of traffic available
25 frames per second
4 kinds of anomaly
Divided into clips of fixed size of 4 seconds (obtained
empirically last semester)
Feature Extraction
• Three components of visual word :
• Location
• Spatio-Temporal Gradient and Flow Information
• Object size
• Features are extracted only from foreground pixels for
increasing the efficiency
Foreground Extraction
• Extracted using ViBe foreground algorithm and smoothened
afterwards using morphological filters
Visual Word
• Location :
• Each frame of dimension m x n is divided into blocks of 20 x 20
• HOG - HOF descriptor :
• For each block, a foreground pixel was selected at random and spatio-temporal
descriptor was computed around it.
• From the descriptors obtained from the training set, 200,000 descriptors were
randomly selected. 20 cluster centres were obtained from these descriptors by kmeans clustering.
• Each descriptor was assigned to one of these centres.
• Size :
• In each block , we compute the connected components of the foreground pixels
• The size of the connected components is quantised to two values: large and small
pLSA : Topic Model
• Fixed number of topics : 1 , 2 …  .
Each word in the vocabulary is
attached with a single topic.
• Topics are hidden variables. Used for
modelling the probability distribution
• Computation
• Marginalise over hidden variables
• Conditional independence
assumption: p(w|z) and p(d|z) are
independent of each other
Step 2 : Detection
• We propose “Projection Model Algorithm” with the following
key idea –
Project the information learnt in training onto the test document
word space, and analyze each word individually to tag it as usual or
• Robust to the quantity of anomaly present in video clip.
• Bhattacharyya Distance between documents :
• For documents  and  represented by the probability distributions in topic
space   and   respectively, the distance is defined by d = − log

• Cumulative histogram of m documents:
• A histogram obtained by stacking the word count histogram of the m
• Spatial neighbourhood of a word :
• For a word at location ,  , all words at locations  ± 1,  ± 1 ,  ± 1,  and
,  ± 1 with the same flow and size quantisation
• Significant distribution of neighbourhood word :
The distribution of a word is significant if its frequency in the cumulative
histogram is greater than a threshold ℎ
Test document
m nearest training
histogram of
More than 
have significant
Word occurs
more than
ℎ times
Word is “Usual”
Eight Spatial
neighbours of
Detection :
• Now each visual word has been labelled as “anomalous” or
• Depending on the amount of anomalous words, call the
complete test document as anomalous or usual.
Step 3 : Localization
• Spatial Localization :
Since every word has location information in it, w can directly localize the
anomalous words in test document to their spatial locality.
• Temporal Localization :
This requires some book-keeping while creating term-frequency matrix of
documents. We could maintain a list of frame numbers corresponding to
document-word pair.
Results Demo
• Anomaly detection
• Anomaly localization
Results : Precision-Recall Curve
Results : ROC Curve
Main Contributions
• Richer word feature space by incorporating local spatiotemporal gradient-flow information.
• Proposed “projection model algorithm” which is agnostic to
quantity of anomaly present.
• Anomaly Localization in spatio-temporal domain.
• Other Benefit :
Extraction of common actions corresponding to most
probable topics.
Varadarajan, Jagannadan, and J-M. Odobez. "Topic models for scene analysis and abnormality
detection." Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on.
IEEE, 2009.
Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action
categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008): 299-318.
Olivier Barnich and Marc Van Droogenbroeck. “Vibe: A universal background subtraction algorithm for
video sequences”. Image Processing, IEEE Transactions on, 20(6):1709-1724, 2011.
Mahadevan, Vijay, et al. "Anomaly detection in crowded scenes." Computer Vision and Pattern
Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
Roshtkhari, Mehrsan Javan, and Martin D. Levine. "Online Dominant and Anomalous Behavior Detection
in Videos.“
Ivan Laptev, Marcin Marszalek, Cordelia Schmid, and Benjamin Rozenfeld. “Learning realistic human
actions from movies”. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference
on, pages 1-8. IEEE, 2008.
Hofmann, Thomas. "Probabilistic latent semantic indexing." Proceedings of the 22nd annual international
ACM SIGIR conference on Research and development in information retrieval. ACM, 1999.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent dirichlet allocation." the Journal of machine
Learning research 3 (2003): 993-1022.
Summary (Last Semester)
• Related Work
• Image Processing
– Foreground Extraction
– Dense Optical Flow
– Blob extraction
• Implementing adapted pLSA
• Empirical estimation of certain parameters
• Tangible Actions/Topics Extraction
Extra Slides
• About
Background subtraction
pLSA and its EM
Previous results
Background subtraction
• Extraction of foreground from image
• Frame difference
• D(t+1) = | I(x,y,t+1) – I(x,y,t) |
• Thresholding on the value to get a binary output
• Simplistic approach(can do with extra data but cannot miss
any essential element)
• Foreground smoothened using median filter
Optical flow example
(a) Translation perpendicular to a surface. (b) Rotation about axis
perpendicular to image plane. (c) Translation parallel to a surface at a
constant distance. (d) Translation parallel to an obstacle in front of a more distant background.
Slides from Apratim Sharma’s presentation on optical
Optical flow mathematics
• Gradient based optical flow
• Basic assumption:
• I(x+Δx,y+Δy,t+Δt) = I(x,y,t)
• Expanded to get IxVx+IyVy+It = 0
Sparse flow or dense flow
Dense flow constraint:
Smoothness : motion vectors are spatially smooth
Minimise a global energy function
pLSA : Topic Model
• Fixed number of topics : 1 , 2 …  .
Each word in the vocabulary is
attached with a single topic.
• Topics are hidden variables. Used for
modelling the probability distribution
• Computation
• Marginalise over hidden variables
• Conditional independence
assumption: p(w|z) and p(d|z) are
independent of each other
EM Algorithm: Intuition
• E-Step
• Expectation step where expectation of the likelihood function is
calculated with the current parameter values
• M-Step
• Update the parameters with the calculated posterior probabilities
• Find the parameters that maximizes the likelihood function
EM: Formalism
EM in pLSA: E Step
• It is the probability that a word w occurring in a document d,
is explained by aspect z
(based on some calculations)
EM in pLSA: M Step
• All these equations use p(z|d,w) calculated in E Step
• Converges to local maximum of the likelihood function
Results (ROC Plot)
Results (PR Curve)

similar documents