Report

Presented by Yehuda Dar
Advanced Topics in Computer Vision (048921)
Winter 2011-2012

Video Compression Basics
Fundamental tradeoff among:
• Bit-rate
• Distortion
• Computational complexity

Video Compression Basics
Utilized redundancies:
• Spatial
• Temporal
• Psycho-visual
• Statistical

H.264 Overview

H.264 Redundancy Utilization
Redundancy | Utilization | Means
Spatial | High | Transform coding; intra coding (spatial prediction)
Temporal | High | Motion estimation & compensation
Psycho-visual | Medium | YCbCr color space; 4:2:0 sampling; DC \ AC coefficient quantization
Statistical | High | Entropy coding

Compression using Computer Vision
Motivation:
• Better utilization of the psycho-visual redundancy
• Application-specific compression methods
• Exploring new approaches

A Review of: A Scheme for Attentional Video Compression
R. Gupta and S. Chaudhury, PAMI 2011

Method Outline
• Salient region detection
• Foveated video coding
• Integration into H.264

Foveated Image Coding Demonstration
[Figure from Guo & Zhang, Trans. Image Process., 2010]

Saliency Map Step 1: Creating a 3D Feature Map
Feature type | Calculation method | Based on
Global | Color spatial variance | Liu et al, CVPR 2007
Local | Center-surround multi-scale ratio of dissimilarity | Huang et al, ICPR 2010
Rarity | Pulse-DCT | Yu et al, ICDL 2009

Relevance Vector Machine (RVM)
• Used here as a binary classifier
• Advantages over the support vector machine (SVM):
  • Provides posterior probabilities
  • Better generalization ability
  • Faster decisions

Saliency Map Step 2: Unify Features using RVM
Training procedure for macroblocks (MBs):
• For each MB, average the global, local and rarity feature values into a sample vector $\mathbf{x} = (\mathrm{avg}_{\mathrm{global}}, \mathrm{avg}_{\mathrm{local}}, \mathrm{avg}_{\mathrm{rarity}})^T$
• Count the MB's ground-truth salient pixels to assign a 'salient' \ 'non salient' label
• Train the RVM on the (sample, label) pairs

Saliency Map Step 2: Unify Features using RVM (cont.)
Trained RVM usage:
• New input: the MB's feature vector $\mathbf{x}$ => RVM
• Output: a binary 'salient' \ 'non salient' label together with its probability
• The probability serves as the MB's relative saliency

Saliency Map: Result Comparison
[Figure panels: input; global; local [Huang et al, ICPR 2010]; rarity [Yu et al, ICDL 2009]; proposed; [Harel et al, NIPS 2006]; [Bruce & Tsotsos, NIPS 2006]. Figures from Gupta & Chaudhury, PAMI 2011]

Saliency Map: ROC Curve
[Figure: ROC curves of the proposed method and [Harel et al, NIPS 2006]. Figure from Gupta & Chaudhury, PAMI 2011]

Integration Into H.264: Calculation of Saliency Values
• The saliency map is recalculated only when it changes significantly
• Mutual information between successive frames indicates changes in saliency
[Figures from Gupta & Chaudhury, PAMI 2011]

Integration Into H.264: Propagation of Saliency Values
• For inter-coded MBs, the saliency value is a weighted average of the saliency values of the MBs pointed to by the motion vector
[Figures from Gupta & Chaudhury, PAMI 2011]

Integration Into H.264: Saliency-Adaptive Quantization
• Non-uniform bit allocation
• Smaller saliency value => coarser quantization

Integration Into H.264
[Figure from Gupta & Chaudhury, PAMI 2011]

Paper Evaluation
Novelty: methods for
• The saliency map
• Saliency value propagation
Assumption: all the MBs in P-frames are inter-coded (problematic)
Writing level:
• Good
• Partially self-contained

Paper Evaluation (cont.)
Feasibility:
• Higher complexity than H.264 encoders
• Not suitable for real-time encoders
• Useful at low bit-rates
• Objects entering the scene may be considered unimportant
Experimental evaluation:
• Saliency: visual comparison is good; ROC curve comparison is partial
• Compression: none (left by the authors as a future direction)

Future Directions
• Reduce encoding complexity
  • A less complex saliency method
• Better treatment of objects entering the scene
  • Using the mutual information of frame areas
• Treat intra-coded MBs in P-frames

A Review of: 3D Models Coding and Morphing for Efficient Video Compression
F. Galpin, R. Balter, L. Morin, K. Deguchi, CVPR 2004

Method Outline
• 3D model extraction
• 3D model-based video coding
• Reconstruction using adaptive geometric morphing

3D Models Stream Generation
[Figure from Galpin et al, CVPR 2004]

Stream Compression
Three data types to compress:
• 3D model
• Texture images
• Camera parameters

Texture Image Compression
Reconstruction process:
[Figure from Galpin et al, CVPR 2004]

3D Model Compression
• The 3D model originates in a decimated depth map
• Compressed by:
  • Wavelet transform
  • Depth-adaptive quantization
[Figures from Galpin et al, CVPR 2004]

Video Reconstruction: Texture Fading
[Figure from Galpin et al, CVPR 2004]
[Figures: without texture fading vs. with texture fading. From Galpin et al, CVPR 2004]

Video Reconstruction: Geometric Morphing
• Improves 3D model interpolation
[Figure from Galpin et al, CVPR 2004]
[Figures: regular interpolation vs. interpolation with geometric morphing. From Galpin et al, CVPR 2004]

Result Comparison with H.264

Paper Evaluation
Novelty: compression using an unknown 3D model
Assumptions:
• Static scene
• Moving monocular camera
• Camera rotation is neglected
• Intrinsic parameters are fixed within a GOP
Writing level:
• Good
• Not self-contained

Paper Evaluation (cont.)
Feasibility:
• Only for static-scene video
• High encoder \ decoder complexity
• Unsuitable for real time
• Useful at very low bit-rates
Experimental evaluation:
• Sufficient visual comparison with H.264
• No run-time information

Future Directions
• Treat moving objects
• Reduce complexity, at least enough for real-time decoding

Approach Comparison
Criterion | Attention | 3D model
Video type | Any | Static scene
Useful at bit-rates | Low | Very low
Encoder complexity | High | High
Decoder complexity | Regular | High
Integration in H.264 | Possible | Unsuitable
Overall evaluation | Promising | Inferior
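The attentional-compression review recomputes the saliency map only when the mutual information between successive frames indicates a change. The following is a minimal Python sketch of such a test. It assumes 8-bit grayscale frames given as lists of rows and a 16-bin joint histogram. The decision rule (normalized MI below a threshold of 0.5) and all function names are hypothetical, because the slides do not give the paper's MI estimator or threshold.

```python
import math
from collections import Counter

def _bin_values(frame, bins):
    # Quantize 8-bit intensities (0..255) into `bins` equal-width bins
    step = 256 // bins
    return [v // step for row in frame for v in row]

def entropy(frame, bins=16):
    # Shannon entropy (bits) of the binned intensity histogram
    vals = _bin_values(frame, bins)
    n = len(vals)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vals).values())

def mutual_information(frame_a, frame_b, bins=16):
    # MI (bits) estimated from the joint histogram of co-located pixels
    a = _bin_values(frame_a, bins)
    b = _bin_values(frame_b, bins)
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        mi += p_ij * math.log2(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi

def needs_saliency_update(prev_frame, cur_frame, threshold=0.5, bins=16):
    # Hypothetical rule: recompute the saliency map when the normalized MI
    # 2 * I(A;B) / (H(A) + H(B)) drops below `threshold`
    h_sum = entropy(prev_frame, bins) + entropy(cur_frame, bins)
    if h_sum == 0:
        return False  # both frames are uniform: nothing new to attend to
    nmi = 2 * mutual_information(prev_frame, cur_frame, bins) / h_sum
    return nmi < threshold
```

Normalizing by the frame entropies makes the threshold independent of how textured the scene is. An identical frame pair gives a normalized MI of 1, and statistically unrelated frames give values near 0.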
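In the attentional-compression review, an inter-coded MB inherits its saliency as a weighted average of the reference-frame MBs that its motion vector points to. The slides do not say how the weights are chosen. This Python sketch assumes they are proportional to the pixel overlap between the motion-compensated 16x16 block and each reference MB. It also assumes integer-pixel motion vectors and clamping at the frame borders. The function name and parameters are hypothetical.

```python
MB = 16  # H.264 luma macroblock size in pixels

def propagate_saliency(ref_saliency, mb_row, mb_col, mv_x, mv_y):
    """Saliency of an inter-coded MB as the overlap-area-weighted average of the
    reference-frame MBs covered by its motion-compensated block.

    ref_saliency: 2D list of per-MB saliency values of the reference frame.
    (mb_row, mb_col): index of the current MB; (mv_x, mv_y): integer-pixel MV.
    """
    rows, cols = len(ref_saliency), len(ref_saliency[0])
    # Top-left pixel of the referenced block, clamped to stay inside the frame
    x0 = min(max(mb_col * MB + mv_x, 0), cols * MB - MB)
    y0 = min(max(mb_row * MB + mv_y, 0), rows * MB - MB)
    total, area_sum = 0.0, 0
    # The referenced 16x16 block overlaps at most 2x2 reference MBs
    for r in range(y0 // MB, (y0 + MB - 1) // MB + 1):
        for c in range(x0 // MB, (x0 + MB - 1) // MB + 1):
            ov_w = min(x0 + MB, (c + 1) * MB) - max(x0, c * MB)
            ov_h = min(y0 + MB, (r + 1) * MB) - max(y0, r * MB)
            total += ov_w * ov_h * ref_saliency[r][c]
            area_sum += ov_w * ov_h
    return total / area_sum  # area_sum is always MB * MB here
```

For example, a motion vector of (8, 0) half-overlaps two horizontally adjacent reference MBs, so each contributes a weight of 0.5. H.264 luma motion vectors have quarter-sample precision, so an actual encoder would first have to round or scale them before applying this integer-pixel sketch.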
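The saliency-adaptive quantization slide of the attentional-compression review states only the principle: a smaller saliency value leads to coarser quantization. This Python sketch turns a per-MB saliency value in [0, 1] into an H.264 quantization parameter (QP). The linear mapping and the defaults `base_qp=28` and `max_offset=10` are hypothetical choices, not the paper's. The clamp to 0..51 reflects H.264's valid QP range.

```python
def saliency_to_qp(saliency, base_qp=28, max_offset=10):
    # Hypothetical linear mapping: saliency 1.0 keeps base_qp, saliency 0.0 adds
    # max_offset (coarser quantization); clamp to H.264's QP range 0..51
    s = min(max(saliency, 0.0), 1.0)
    qp = base_qp + round(max_offset * (1.0 - s))
    return min(max(qp, 0), 51)

def qp_map(saliency_map, base_qp=28, max_offset=10):
    # Apply the mapping to every MB of a per-MB saliency map (2D list)
    return [[saliency_to_qp(s, base_qp, max_offset) for s in row]
            for row in saliency_map]
```

In H.264 the quantizer step size roughly doubles for every increase of 6 in QP. With `max_offset=10`, the least salient MBs are therefore quantized with a step about three times larger than the most salient ones. That is where the non-uniform bit allocation comes from.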