
As the Web becomes more sophisticated and audio and video become a more important part of our online lives, we need something that will help us sift through the junk and find what we’re looking for in all that content. To address that, Google announced Wednesday that it has launched audio indexing in Google Labs.
Dubbed GAudi, the new service is designed to work with YouTube and will catalog every word spoken in an audio or video clip. The resulting transcript is then added to a searchable database that you can query in much the same way you search for text-based websites.
But before you start wondering whether Google has indexed your latest vlog, don’t get too excited. According to the company, GAudi is only in beta at this point, and during the beta it will index only audio and video from political sources.
Besides letting you search for terms mentioned in a video, GAudi shows a row of play buttons that take you to the exact points in the video where the keyword is mentioned. For example, if you’re looking for discussions the candidates have had about health, simply enter “health” into the search field, and the best video result will appear in the right pane. You can either watch the entire video or click the links below it to jump to the exact moments when the political figure says “health.”
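The mechanism described above (word-level transcripts stored in a searchable database, with links to the moment each word is spoken) can be sketched as a timestamped inverted index. What follows is a minimal Python sketch under stated assumptions: the transcript format, the hypothetical `build_index` and `search` helpers, and the “most mentions first” ranking are illustrations only, not Google’s implementation, which the announcement does not describe.

```python
from collections import defaultdict

# Hypothetical transcript format: (word, start time in seconds) pairs, as a
# speech recognizer might emit them. GAudi's real data format is not public.
Transcript = list[tuple[str, float]]


def build_index(transcripts: dict[str, Transcript]) -> dict[str, dict[str, list[float]]]:
    """Map each lowercased word to {video_id: [times at which it is spoken]}."""
    index: defaultdict = defaultdict(lambda: defaultdict(list))
    for video_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()][video_id].append(start)
    # Convert to plain dicts so lookups of unknown words don't insert keys.
    return {word: dict(videos) for word, videos in index.items()}


def search(index: dict[str, dict[str, list[float]]], term: str) -> list[tuple[str, list[float]]]:
    """Return (video_id, timestamps) pairs; videos with more mentions rank first.

    The ranking rule is an assumption made for illustration.
    """
    hits = index.get(term.lower(), {})
    return sorted(hits.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    transcripts = {
        "speech_a": [("we", 0.0), ("need", 0.4), ("health", 0.9), ("care", 1.3), ("health", 12.5)],
        "speech_b": [("health", 3.2), ("and", 3.7), ("jobs", 4.0)],
    }
    index = build_index(transcripts)
    print(search(index, "Health"))
    # [('speech_a', [0.9, 12.5]), ('speech_b', [3.2])]
```

Each timestamp plays the role of one of the play buttons: a player would seek to that offset. A real system would also have to cope with recognition errors, since a misheard word produces a timestamp pointing at the wrong spot.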
Google didn’t say what will happen to GAudi once it’s done indexing political speeches, but the service shows promise and is definitely worth checking out.

This definitely has incredible potential. Can’t wait until they start expanding beyond political videos.
I also agree that this would be incredible, but in the future I would like to see open submissions from key thought leaders, academics, etc. All of this would be an invaluable resource.
“Thanks Goog411!”
This is really amazing stuff. Looking forward to seeing the results.
I think Drew nailed this: all the training from GOOG411 (pretty good, actually) has built up their internal models enough to make this a useful testing ground. And hey, if this means we can get full transcripts (or searchable ones, at any rate) faster than the news channels can provide them, that’s a great pilot project as we head into November.
This is a big step toward the future of search.
This is awesome. We’ve known this would come, and now it’s here.
Foreign languages? Now that is something.
This is so cool. Searching screencasts and conferences will be easier now.
I agree with Fahad. Conference speakers’ ideas will be more accessible.
Daaaaamn, Google… I don’t like you but you RULE!
Even if you don’t want to like Google, you just gotta love them for some of the stuff they come out with. They have pioneered some of the most useful technological innovations of our time. Truly radical.
This is not new: mediaclipping.de / spactor.com have a search service based on voice recognition software that transcribes the audio to text. They cover about 80 TV and radio channels in Germany.
First Flash indexing, now this!? HOT.
Bye-bye, Delve Networks!
I wonder if this has anything to do with the fact that Adobe is getting ready to release some sort of workflow that converts speech to text from Flash videos, as reported by beet.tv here and here.
I imagine the full announcement will likely come next week, when Adobe is supposed to reveal the details of CS4.
However, I imagine that Adobe’s software is likely in a private beta right now and who better to beta test the software than Google with YouTube videos?
Or it could be that Google is ahead of the curve and that this is their own technology, which many more will have access to when Adobe releases their own version of it.
It is nice how Google uses the US election to cover up the shortcomings of the technology. From my point of view, it is highly likely that they picked politicians not because of the race to the White House but for a simple technical reason: politicians are trained to speak slowly and clearly, and about a fairly fixed set of names and topics.
Speech-to-text recognition is still very CPU-intensive (usually 0.5 to 1.5 times real time on a modern blade server), and the results are mostly disappointing. Accuracy is far from that of a human-generated transcript. It is good enough for tag clouds and the like, but not for transcribing content in a readable and accurate format. Google might be fine with having keywords for its search, but users will be very disappointed to see that 20% or so of their results are simply false detections.
For example, search for “sport” and listen to the two “matches” in the “Dr Chris Rothfuss at WY Democratic Convention Pt.2/3” clip (last hit on the first search page).
If you look at a couple of other transcript fragments (the words before and after the supposedly found search term), you can see that the algorithm mixes in quite a bunch of politician names, political topics, or random English words.
Speech2Text is based on dictionaries and voice training (which is why it works well in medical transcription for a doctor who uses it frequently, for example), but getting it to work for random videos with non-professional speakers talking slang over background music is still a loooong way off.
Old news. http://www.everyzing.com/
Been doing it for years now.
See a detailed review on this subject and who else is offering it:
Wednesday, July 16, 2008
Google Video Search via Speech Recognition
Finally a hint on the expected Google move to the speech recognition arena.
Google announced on the Official Google Blog the availability of a new video search capability based on speech recognition.
It was released as a gadget you can embed on your iGoogle homepage and is a good preview of things to come.
….
http://speechanalytics.blogspo.....peech.html
This is really old news, and something Google has been behind on. Others have been doing this for a while now. But the reality is, the technology is shoddy at best and far from being useful.
Wow. Scary good.
Maybe it’s only me, but I can’t hear the audio for videos found in GAudi…. which is a shame!
Fantastic potential. I’m very excited to see this go mainstream!
Has anyone considered that the ‘audio-to-text’ feature implies that audio CAPTCHAs could be translated to text?
Would a programmer be able to build that into a bot?
SEO Chatter
Didn’t Microsoft do something very similar a while back? They allowed searching for words in videos and showed ads based on the content… I think Google is a step behind on this one!!
i likee
Wouldn’t GAudi violate some sort of trademark law for Audi cars? It’s like Ali G backward: G-Ali. GAudi sounds like a sports car, vrrrrooommmmm! Get all suped up wit yo GAudi boi!! You got rims on that thing?
Really impressive… I am fairly sure GOOG411 helped them collect data samples…
Good one. I’m sure it will gain ground in no time.
It’s a great tool if you like political speeches.
Wonder if Google is going to acquire Audioo.
Hey guys!
A French version already exists; try it here: http://voxalead.labs.exalead.com/Politics
This was announced over a month ago by Google. For some thoughts on where the technology falls short, see this blog post: http://www.deakondesign.com/?p=50
Also, for a different approach to solving the transcription issue for videos check out Audio Puzzler, a game people play which has the side effect of producing 90% accurate transcripts: http://www.audiopuzzler.com/
mikx has it right. Also, it is highly unlikely Google is using GOOG-411 to train its speech-to-text engine, given the differences in quality between telephony and broadcast audio.