January 4, 2008

Google Lodges Patent For Reading Text In Images And Video

Duncan Riley

70 comments »

A patent application lodged by Google in July 2007, but only recently made public, seeks to patent a method whereby robots (computers) can read and understand text in images and video.

The upshot of the application is that images and video indexed by Google would become searchable by the text located within the image or video itself, a big step forward in indexing that has not previously been available.

Information Week suggests that privacy issues raised by Google Maps Street View will get more complicated as eventually YouTube videos will be indexable via the text that appears within them.

A full copy of the patent application “Recognizing Text In Images” can be viewed here.

Some choice lines from the patent:

“Digital images can include a wide variety of content…For example, digital images can illustrate landscapes, people, urban scenes, and other objects. Digital images often include text. Digital images can be captured, for example, using cameras or digital video recorders. Image text (i.e., text in an image) typically includes text of varying size, orientation, and typeface. Text in a digital image derived, for example, from an urban scene (e.g., a city street scene) often provides information about the displayed scene or location. A typical street scene includes, for example, text as part of street signs, building names, address numbers, and window signs.”

I may be stating the blatantly obvious when I say that if Google has found a way to index text in static images and video this is a great leap forward in the progression of search technology. This will make every book in the Google Books database really searchable, with the next step being YouTube, Flickr (or Picasa Web) and more. The search capabilities of the future just became seriously advanced.


Comments

 

I think we’re on two 6.0 and I have trademarked it so that nob from web guild can’t steal it OK mr nob? good.

 

You know I don’t disagree with either comments. This is one time I will honestly say “wow” and more than once :-)

 

to*
Daya Baran*

 

Well, so much for e-mail images being an effective anti-spam technique.

 

wait, this is like any OCR algo? what’s the difference?

 

Is Plaxo in violation of this filing already? Isn’t that what Scoble got screwed for?

From Jeremiah Owyang’s blog yesterday, covering a Q & A with Plaxo’s McCrea and Smarr:

Why didn’t you use the FaceBook API?
FB doesn’t allow email addresses to be exported, which is critical to build a social graph. The script would grab the email addresses (which were an image), and profile data.

 

goflip
as far as I’m aware OCR reads text in *text* documents, not images. What this means is that Google can index text in anything, and the last time I looked OCR doesn’t read anything.

 

I wonder how much processing time they would need for indexing text in a minute of video. Probably as much as for a million websites, so that’s rather nothing for the near future.

 

There goes captcha validation :)

 

I only looked at the claims of this patent but it seems that the main claims end with “and performing optical character recognition on the enhanced image”. So, there seems to be no new OCR algorithm here… In spite of its apparent complexity, the patent reminds me of the one-click patent.

 

Peter
I’m not aware of any OCR platform being able to do images on the level that is suggested here, indeed I’d suggest this is why Google thinks it can patent the technique.

 

What’s more complicated than just OCR in this will probably be to find the text in an image first.
Probably not completely trivial for a machine to find the text in such an image and read it:
http://data.lustich.de/bilder/l/353-der-wunsch.jpg

 
 

Razvan
Picasa web albums, sort of Google’s version of Flickr that not many ppl have heard of :-)

 

The crazy guy on the corner of every major metropolitan city holding a “THE END IS NIGH” sign is about to get mad internet famous via Street View.

 

For a background on OCR:

http://en.wikipedia.org/wiki/O.....ecognition

Has anyone here used Microsoft Document Imaging?

According to MS, MDI does a host of things, including allowing the user to “Search for text within scanned documents.”

http://en.wikipedia.org/wiki/M.....nt_Imaging

 

Oh yea…we all know Picasa…but Picaca? :P

 

I’ve been using the Riya API to do this exact same thing: parsing text in uploaded images using the OCR feature in Riya, then making the image data searchable. Albeit, I’d much rather Google patent this than some sue-happy startup or SPAC, but only because I don’t expect Google to sue me for continuing to offer search on images with text in them.

 

@9 (Thomas Promny)
“I wonder how much processing time they would need for indexing text in a minute of video. Probably as much as for a million websites, so that’s rather nothing for the near future.”

How much would it cost Google to put the necessary infrastructure in place needed to run this service?

On a side note, are Google’s data centers “green” and energy efficient? I am intrigued how Google fits into the call for a global reduction in our impact on the planet.

 

Whether Google can make it work is really not the issue. Most of what Google announces turns out to be “much about nothing” anyway. The key here is Google is turning more and more into a patent house. With all the patents they have filed in the past 24 months alone, seems like they’re just waiting for someone outside their complex to bring many of these ideas into operation, and be more beholden to Google when they are ready for the masses. All of this in a move to sustain their ridiculous share price.

 

Why use algorithms if they have people doing it for free?

http://video.google.com/videop.....cture+game

 

This process is a simple hack like embedding an image in an image.

http://lifehacker.com/software.....282119.php

Not genius, but google gets a hat tip for wanting the patent more than anyone else out there.

 

Remember Riya? http://www.riya.com/

They were doing this all the way back in 2005, receiving quite a bit of publicity at the time.

It was rumored that Google was in acquisition talks at one point:
http://www.searchenginelowdown.....earch.html

 

@2 Dawson
> I think we’re on two 6.0

what is two 6.0?

Re Web 4.0 –

Web 4.0 will come later. It is about connecting intelligences in a ubiquitous Web where both people and things reason and communicate together.
(one view, per http://www.project10x.com/misc/SW2008.pdf)

 

I don’t really understand this patent, since there has been software for ages which can read the text on an image or scan,
and if software can do this, then a robot can also do this

 

i wonder what happened to that technology they created where they improved the photo quality a hundredfold so that a camera could take a picture from a few miles away very clearly.

can’t think of it but it was in the news less than a year ago.

it’s good that we are utilizing our brains for the advancement of the human race.

What’s next from pioneers like Google? i would hope that it would find a way to build quick software to translate languages (voice and text) on the fly!

(language translators sorely need a technology breakthrough!!!)
what do you say ?

Ujw
exoticbuddha.com

 

BTW the police use it with video: they film the passing cars, the software reads the number plate and says immediately whether that given car is wanted or not

 

Excellent news!

This will make it possible to leverage the added value of CycloMedia Technology for business and society even more!

 

You’ve missed the huge advances in Microsoft’s OneNote 2007, which already can search and index text within images. You should give it a try, it really is amazing technology.

As to whether or not Google has a legitimate patent claim, it might be a timing thing, and/or depend on the specific technique involved.

http://xidey.wordpress.com

 

@David Litsky:

“How much would it cost Google to put the necessary infrastructure in place needed to run this service?”

Increasing their infrastructure by factor 10 wouldn’t really be an easy thing to do for Google.

 

For those who are interested in seeing what OneNote already can do in this area, I’ve put a little screenshot demo up at http://xidey.wordpress.com/200.....microsoft/.

 

OCR software has been able to recognize text from images since it was introduced. Recognizing text in videos is the same thing as breaking the video down into individual frames and treating each frame as an image for OCR software to recognize. I don’t see how any part of this patent is legitimate.
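
The frame-by-frame idea in the comment above can be sketched roughly as follows. This is a minimal illustration only, not the method in Google's application: `index_video_text`, `search`, and the `recognize` callable are hypothetical names, and the recognizer here is a stand-in where a real OCR engine would go.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, TypeVar

Frame = TypeVar("Frame")  # whatever object represents one still image


def index_video_text(
    frames: Iterable[Frame],
    recognize: Callable[[Frame], str],  # hypothetical OCR hook, supplied by the caller
    fps: float,
    step: int = 1,
) -> Dict[str, List[float]]:
    """Map each recognized word to the timestamps (in seconds) where it was seen."""
    index: Dict[str, List[float]] = defaultdict(list)
    for i, frame in enumerate(frames):
        if i % step:  # only sample every `step`-th frame
            continue
        timestamp = i / fps
        for word in recognize(frame).lower().split():
            # Record a timestamp once per word, even if the word repeats in a frame.
            if not index[word] or index[word][-1] != timestamp:
                index[word].append(timestamp)
    return dict(index)


def search(index: Dict[str, List[float]], query: str) -> List[float]:
    """Return the timestamps at which a single query word appeared."""
    return index.get(query.lower(), [])


# Stand-in data: each "frame" is just the text a real OCR engine might return for it.
fake_frames = ["", "Main St", "Main St", "Cafe"]
index = index_video_text(fake_frames, recognize=lambda f: f, fps=2.0)
```

With this stand-in data, `search(index, "cafe")` returns the time at which the word was visible. The `step` parameter hints at the cost question raised earlier in the thread: sampling fewer frames trades recall for processing time.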

 

Abolish software patents now! This doesn’t even come close to the non-obvious test. Google is about as evil as they come.

 

I think this is the same technology that http://catalogs.google.com uses. Do a search like “xbox 360” and it will find the page of a catalog that has the text, very cool.

 

….I may be stating the blatantly obvious when I say that if Google has found a way to index text in static images and video this is a great leap forward in the progression of search technology.

I think Blinkx is already indexing video by transcribing voice to text.

 

This will also affect SEO…

Web designers may no longer have to just use TEXT or Alt tags to rank high.

Their Photoshop and Flash logos could appear as text in the SERPs
by the next decade

 

This is a good start for GOOG - but certainly nothing revolutionary. People have been doing this form of basic OCR for years.

The companies that are going to “win” in this space will be the ones that are identifying true objects within images/video. Text searching/identifying is not reliable enough on its own - companies like Eyealike, Polar Rose and a few others have been ahead of this curve for a while now. Those are the types of companies that the GOOGs of the world will be looking at to expand into this area.

 

Duncan, perhaps I’m getting a little nerdy here but for OCR to work the “text document” must be turned into a digital form known in the trade as a “DIGITAL IMAGE”. Video is just a series of these DIGITAL IMAGES. These DIGITAL IMAGES are then processed by what was once known as an “ANALYTICAL ENGINE” but what is now commonly referred to as a “COMPUTER”. OCR or OPTICAL CHARACTER RECOGNITION has been around for quite some time. The only thing unique here is that Google has access to a googles worth of DIGITAL IMAGES and can apply OCR in a batch process. The fact that you can do this batch process and from images found on the web seems rather obvious to people experienced in the art but I haven’t seen the patent so what do I know.

 

There is public domain prior art on this, i think

4 years ago, in SF, I talked with a vision researcher at the vision instit there (trying to remember the name ….) who had been working on a device that the partially sighted could carry around and which would speak road-signs to them. the method was basically as described - take video, recognise where the text is, ocr it and then, in this case, speech synthesise it

this was published, academic material

 

What they are doing is actually quite clever. First take a video of your target text, from, say, a passing car. Next determine the text area in each frame of that video, then use the slight variations between frames to enhance that area, and use standard OCR to read it.

You wouldn’t think that this is anything new, but in the AI world it is. Much of vision research has been stuck in a 2d world of flat, static images. Only recently has the power and importance of motion/time become apparent, and only very recently has it been used effectively. This is how your eyes work, you attend to a specific portion of a moving image and it is the moving object that your brain recognizes.

I’d say they are two thirds of the way there, using motion to get more information out of an inevitably lossy camera, but will probably still shove the enhanced image off to a standard OCR engine. The final step will be a new type of OCR which really relies on motion. My bet is on Numenta for that.
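
As a rough illustration of why several frames can beat one, here is a small sketch under heavy assumptions: the frames are already aligned on the same text region, the noise is independent per frame, and plain averaging stands in for whatever enhancement the application actually claims. All function names are made up for the example.

```python
import random


def make_glyph(size):
    """Synthetic stand-in for a text region: a bright cross on a dark background."""
    mid = size // 2
    return [[1.0 if (r == mid or c == mid) else 0.0 for c in range(size)]
            for r in range(size)]


def noisy_frame(clean, sigma, rng):
    """Simulate one lossy camera frame by adding Gaussian noise to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in clean]


def average_frames(frames):
    """Pixel-wise mean over frames that are assumed to be already aligned."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def binarize(image, threshold=0.5):
    """Turn a grayscale image into the black/white form an OCR step would consume."""
    return [[1.0 if p > threshold else 0.0 for p in row] for row in image]


def count_errors(image, clean):
    """Number of pixels that differ from the clean glyph."""
    return sum(a != b for ra, rb in zip(image, clean) for a, b in zip(ra, rb))


rng = random.Random(0)
clean = make_glyph(21)
frames = [noisy_frame(clean, 0.6, rng) for _ in range(15)]

single_errors = count_errors(binarize(frames[0]), clean)
stacked_errors = count_errors(binarize(average_frames(frames)), clean)
print(single_errors, stacked_errors)
```

Averaging N frames with independent noise shrinks the noise by roughly a factor of the square root of N, so the stacked image should binarize far more cleanly than any single frame. Footage from a passing car would need the text region located and registered across frames first, which is the hard part the comment alludes to.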

 

The first thing that came to my mind is that I can finally copy and paste the emails from my friends’ Facebook accounts. The image to text thing might work.

 

I wrote a program to do OCR in the mid 80s and even then it was a far from new idea, with the post office having such systems to scan postcodes before then. So, I’m not sure what’s new here?

 

Paul Ford’s Aug 2009 prediction is looking interesting:

http://feedblog.org/2008/01/04.....eal-world/

I guess I’m going to have to implement this in Spinn3r now :-P

 

If the US Patent Office lets this through then they have lost the plot.

Last time I checked a “moving image” was nothing more than a sequence of “still images” displayed in sequence?? Digital Video, TV … all are simply displaying X Frames per second … where each Frame is a still image. Even if Google were taking the time-axis into account … they are simply stacking a series of still images for analysis - blah.

The UK Police, Australian Police, and Toll Road operators around the world use OCR on Number Plates daily (still images taken from Video).

Oh - and don’t our “eyes” do this for us constantly?

There is nothing novel or new about this attempt by a corporate giant (who cares if they “do-no-evil”) to patent the bleeding obvious!

HamishE

 

This one will be tough for Google to enforce. There is so much technology around text recognition.

 

Evernote - the note-taking/web-clipping app has been doing this for ages, it is pretty neat the first time you see it.

 

The search capabilities of the future may be going to index sound too. :)

 

Honestly, you’ve made something really nice. What can I say, well done on your effort. You can be sure I’ll recommend it to all my friends.

 

Hi Duncan,

There are actually three Google patent applications which detail this method, and not just one.

Recognizing text in images (20080002893)
Enhancing text in images (20080002914)
Using extracted image text (20080002916)

The character recognition processes for Google Book Search are described in patent applications from Google that came out in 2006. Here’s one of them:

Presentation of search results based on document structure (20060074907)

This is already being done by Google for book search.

 

Vaporware; Hypeware. We’re at least some 10 years away from that.

 

Someone already pointed you to EverNote ( http://www.evernote.com/produc.....php#images ): it can recognize text in images (photos, scans, snapshots, even photos from crappy phone cameras, printed or handwritten text) pretty well.

 

There’s a publication from the year 2000 about how to do text recognition in video by segmenting regions and then using OCR on those regions: http://portal.acm.org/citation.cfm?id=329351. It seems to me that much of what Google is patenting is covered by research publications from that area. Of course, a proper analysis by a patent attorney would be necessary to evaluate the real value of the patent.

 

I’m wondering if SEO with Flash content will be impacted by that - I suppose so.

 

Now this indeed is serious development.

 

We have already developed a product that does this. In addition to finding text it is also able to find brand logos etc. in user generated video.

Findit LLC

 
 
