Building text features for object image classification Group 1: Eddie Sun, Youngbum Kim, Yulong Wang Which object is presented？ Why we need text features? Main idea & Insights Main idea ◦ Determine which objects are present in an image based on the text that surrounds similar images. Insights ◦ First, it is often easier to determine the image content using surrounding text than with currently available image features. ◦ Given a large enough dataset, we are bound to find very similar images to an input image, even when matching with simple image features. Illustration for building text features Internet Images with text Text Features Framework of the approach Texts of These Similar Images Training Process K Most Similar Images Visual Features: SIFT, Gist, Color, Gradient and Unified of all previous one Experiment Dataset ◦ The PASCAL Visual Object Classes Challenge Experiment Features ◦ SIFT ◦ Gist an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.) ◦ Color Color Features in the RGB space ◦ Gradient ◦ Unified a concatenation of the above four features Experiment Experiment Experiment Experiment Experiment Summary How it works Results How it works? Extract visual features Input Image 1. Training images 2. Test images • • • • • SIFT Gist Color Gradient Unified Get similar images based on visual features Return most similar images with their labels Internet images dataset with text Visual features Puppy Visual Classifie r Dog cool dogs, boxer Construct text features from labels Dog Learn parameters on training images Cute, puppy, canine Dog, pet, animal Text features Text Classifier Merge Fusion Classifie r • Final Output Dog • Notes Unified Feature – weighted average of the above 4 features Text features – normalized histogram of tags counts Results Text features are built from visual features. Better visual features -> better text features Combining visual and text classifiers Visual and text classifiers correct each other Number of training images Small number of training images -> text classifiers outperform visual classifiers Combine -> always better Number of Internet images in dataset 200,000 -> 600,000 : Big improvement 600,000 -> 1 million : very small improvement Questions? Thank you!