Picture Tags and World Knowledge

This project analyzes how people describe photos. A full description can be found in the following paper, and an overview is provided below.

"Picture Tags and World Knowledge: Learning Tag Relations from Visual Semantic Sources"
Lexing Xie, Xuming He, ACM Multimedia 2013, Oct 2013, Barcelona (full paper) [pdf] [slides]

Motivation

We study the use of everyday words to describe images. The common saying has it that a picture is worth a thousand words; here we ask, which thousand? The proliferation of tagged social multimedia data presents a challenge to understanding collective tag use at large scale. One can ask whether patterns in photo tags help us understand tag-tag relations, and how those patterns can be leveraged to improve visual search and recognition.
Methodology

There are three main parts of this work:
  • A new method to jointly analyze three distinct visual knowledge resources: Flickr, ImageNet/WordNet, and ConceptNet. This allows us to quantify the visual relevance of tags and learn their relationships.
  • A novel network estimation algorithm, Inverse Concept Rank, to infer incomplete tag relationships.
  • An algorithm for image annotation that takes into account both image and tag features, by posing image tagging as a recommendation problem.
Results

We analyze over 5 million photos with over 20,000 visual tags. The statistics from this collection lead to good results for image tagging, relationship estimation, and generalizing to unseen tags. This is a first step in analyzing picture tags and everyday semantic knowledge. Other potential applications include generating natural language descriptions of pictures, as well as validating and supplementing knowledge databases.

Demo: Explore the visual informativeness of tags

This link leads to an interactive plot for exploring the visual informativeness of photo tags.
  • Each dot in the red scatter plot is a tag. Mouse-over to see which tag it is.
  • Select an area to zoom in or out. The plot displays all tag names when the zoomed-in view contains 200 tags or fewer.
  • Click on a tag to display the top 50 WordNet synsets associated with this tag (yellow bars). Mouse-over each bar to display synset information.
Data

We release a data file that contains the list of associated Flickr images for each synset in ImageNet/WordNet.
  data file (.tar.gz, 124M)
Each line in each file contains three fields: the synset id, the Flickr image id, and the URL of the image.
        >> head n03437741.txt
        n03437741 2517428224 http://farm4.static.flickr.com/3160/2517428224_f0ac83532f.jpg
        n03437741 3106498561 http://farm4.static.flickr.com/3150/3106498561_f3e8c5c580.jpg
        n03437741 1290786822 http://farm2.static.flickr.com/1051/1290786822_9a9e69e00d.jpg
        n03437741 3421957804 http://farm4.static.flickr.com/3629/3421957804_7990e8f7b2.jpg
        ... ...
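
For readers who want to load these files programmatically, the following is a minimal Python sketch of a line parser. It assumes the three fields are separated by whitespace, as they appear in the sample above; the names FlickrRecord and parse_synset_lines are hypothetical and are not part of the released data or any accompanying code.

```python
from typing import Iterable, Iterator, NamedTuple


class FlickrRecord(NamedTuple):
    """One line of a per-synset file (hypothetical helper type)."""
    synset_id: str   # e.g. "n03437741"
    image_id: str    # Flickr image id, kept as a string
    url: str         # URL of the image


def parse_synset_lines(lines: Iterable[str]) -> Iterator[FlickrRecord]:
    """Yield one record per well-formed line.

    Assumption: fields are whitespace-separated, as in the sample above.
    Blank lines or lines without exactly three fields are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        yield FlickrRecord(*fields)


# Two lines copied from the sample above.
sample = [
    "n03437741 2517428224 http://farm4.static.flickr.com/3160/2517428224_f0ac83532f.jpg",
    "n03437741 3106498561 http://farm4.static.flickr.com/3150/3106498561_f3e8c5c580.jpg",
]
records = list(parse_synset_lines(sample))
```

To read a whole file, an open file handle can be passed in place of the list, for example parse_synset_lines(open("n03437741.txt")).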

    September 2013
    Contact: Lexing Xie