Google Researchers Teach Computers How To Recognize Images Of Famous Landmarks
by Erick Schonfeld on June 22, 2009

Image recognition is still one of those things that humans find easy to do but computers keep stumbling over. Some Google researchers published a paper describing progress they are making in teaching computers how to identify famous landmarks, which may eventually be applied more broadly to image search in general.

In a blog post, Jay Yagnik, the head of Computer Vision Research at Google, writes:

While we’ve gone a long way towards unlocking the information stored in text on the web, there’s still much work to be done unlocking the information stored in pixels.

In the experiment, the researchers fed the system “an unnamed, untagged picture of a landmark” found on the Internet, and it would spit back the name and location of the landmark, such as the Acropolis in Greece. Each untagged photo was compared to 40 million GPS-tagged images on Picasa and Panoramio (both owned by Google), as well as related photos found through Google Image Search. Using clustering and new image indexing techniques, the Google researchers were able to identify untagged photos of the same landmarks from different angles and under various lighting conditions.

The researchers report that their system can identify 50,000 landmarks with 80 percent accuracy. I’m not sure that’s quite good enough to roll out even in a beta product, but if Google can get it to 90 percent or 95 percent, that would start to be consumer-friendly. Over the past few years, there’s been a lot of progress in image recognition, especially with facial recognition. For instance, Face.com does a particularly good job with Facebook photos. But buildings and objects may pose a different set of image-recognition challenges.

Comments

  • I’m curious. What’s harder: landmark recognition or facial recognition? I ask because Apple recently released facial recognition.

  • Why can’t you easily commercialize it at 80%? Let humans clean up the rest by applying “suggested tags” with buttons to vote them applicable or not, and then have that as feedback for the system to continue to learn.

    A more likely impediment to commercialization is the huge number of photos, the limited number of landmarks and computational expense of adding more.

  • Another case of GOOG imitating MSFT? Seems an awful lot like Photosynth to me.

    • I think that Microsoft has been doing R&D on this topic of web image retrieval over the last few years.

      Text-Search Tricks Speak Volumes in Image Search

      A lot of research has been pouring out of Microsoft Research’s Asia division over the last few years and appearing in the computing literature. Google and others may be closed or secretive about their R&D, although they do occasionally publish their research, but one can always find that Microsoft is ahead in cutting-edge research. Google doesn’t publish as many research articles as Microsoft, so when a product comes out of Google, you think: hmm, Microsoft’s R&D people were already working in that area before Google jumped in.

      • For readers here who are not familiar with R&D or with reading scientific papers, check out this site for Web Search & Mining at Microsoft Research, Asia Division, where this army of PhDs makes its research papers available for free download (i.e., very genuine, considering that the copyrights are now owned by the publishers and not Microsoft). You can look up each individual researcher and follow the link to their own site, where you can download any published paper if it is available, or request a copy if it is not.

        You won’t find new algorithms from Google being made freely and openly available the way Microsoft’s are. I frequently request papers from Google researchers, but only because I stumbled upon their work in the online literature (where only the abstracts are available for viewing, not the full publication).

        I regularly check the publications of Microsoft Research’s various groups to see whether any new algorithm would be of interest to me. I have implemented various algorithms published in Microsoft papers (mostly from Web Search & Mining), and the authors were eager to help me out by giving more details about implementation methods (speed versus accuracy), checking my code for correctness, memory issues, etc.

  • Sounds a little like Microsoft’s Photosynth, which recognizes relationships between images and cobbles them together (http://photosyn...et/Default.aspx)

    • Extremely different services. Stitching photos together based on overlaps and tagging an unknown picture are so different I am shocked anyone would say they sound the same. Probably not your fault though, you clearly didn’t read the entire article or just spew nonsense regularly.

      • they’re closer than you think; Photosynth (the lab technology, not the released beta) does have the capability to do the same matching. I’ve not seen metrics on accuracy though, since the focus up to now has been on the stitching of identified “related” shots into 3D models. But hey, good for Google, this is an interesting & rich area for research on all sides.

  • 80 percent accuracy isn’t too bad for a beta product, but agreed, if Google can achieve a more consistent experience, that would be impressive. Until machines learn how to time travel, then I fear for the safety of John Connor!

  • 80% accuracy is sure better than the typos in this article

  • “While we’ve gone a long way towards unlocking the information stored in text on the web”

    I find this assessment a little bit self-sufficient… is matching a few keywords “unlocking the information stored in text” ???

  • this feels like a lot of recent stuff out of google – cool technology, but when you get right down to it, not that useful.

    i mean, sure, you could probably name half a dozen narrow use cases, but when you add it up i doubt any of them rise to the level of being something people would pay for.

    i imagine the most important use for it will be in image search. but photos of monuments are already pretty easy to come by. and honestly, once you’ve seen 2 or 3 pictures of a monument, you’ve seen them all.

    • How about if you have a picture and you don’t know what it is of? Maybe the plan is to have an option to upload a picture, then from there google will compare it with what it has and tell you what it is.

  • They’ve been investing heavily in image recognition for a while now

  • “we present a new technology that enables computers to quickly and efficiently identify images of more than 50,000 landmarks from all over the world with 80% accuracy”

    When you dig into the paper, this is not quite the case. The system actually only uses 5312 landmarks (off by a factor of 10), and they only evaluate with 963 of these. They test 728 landmark images from 124 landmarks, and it correctly identifies 337 (46%) of the landmark images. It also correctly guessed that 417 are landmarks of some sort (e.g. that the Eiffel Tower is “a landmark”). The 80% number comes from the fact that, of the 417 items that were classified as “landmarks”, it correctly identified which landmark it was for 337 (337/417).

    That’s a very interesting interpretation of accuracy: if it misses something completely, no problem, doesn’t count :) It’s kind of like a student taking an exam and saying “I didn’t answer questions 6-10, so you can’t count them!”

    I believe the more accurate statement would be:

    “…efficiently identify images of 124 landmarks (out of 963) from all over the world with 46% accuracy”

    But, hey, that’s a tougher sell.
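
The arithmetic behind the two competing figures in the comment above can be checked with a minimal sketch. The counts are taken from that comment (the commenter's reading of the paper, not verified here), and the names `precision` and `recall` are labels chosen for this sketch rather than terms the comment itself uses.

```python
# Counts as quoted in the comment (assumed accurate, not checked against the paper).
test_images = 728   # landmark images in the test set
flagged = 417       # images the system classified as some landmark
correct = 337       # images assigned the right landmark

# Correct answers as a share of the images the system chose to label
# (the basis the comment gives for the "80%" figure).
precision = correct / flagged

# Correct answers as a share of every landmark image tested
# (the basis the comment gives for the "46%" figure).
recall = correct / test_images

print(f"correct / flagged     = {precision:.1%}")  # 80.8%
print(f"correct / test_images = {recall:.1%}")     # 46.3%
```

Both numbers come from the same 337 correct answers; only the denominator changes, which is the commenter's point about what the headline figure leaves out.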
