
Google has acquired reCAPTCHA, an open source technology that provides CAPTCHAs to prevent spam and fraud. CAPTCHAs are those security questions you find on Web sites that require you to decipher and type words or numbers in order to detect whether the user is a human.
Here’s what Google wrote in a blog post about the announcement:
CAPTCHAs are designed to allow humans in but prevent malicious programs from scalping tickets or obtain millions of email accounts for spamming. But there’s a twist — the words in many of the CAPTCHAs provided by reCAPTCHA come from scanned archival newspapers and old books. Computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.
Google says that reCAPTCHA’s technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). It sounds like Google will be using the technology to power massive scanning projects for Google Books and Google News Archive Search as well as for fraud and spam prevention.
In May, the New York Times reported that Google was developing its own type of CAPTCHA and had also taken notice of the potential of reCAPTCHA’s technology. Sounds like Google found it more effective to acquire reCAPTCHA’s technology instead of reinventing the wheel.









Awesome! I really like the recaptcha service, nice to see them get some of those delicious google dollars.
How many start ups exist with their ENTIRE focus devoted to being acquired by Google?
Very cool tho..
This was developed by Carnegie Mellon’s Computer Science department, it’s not a startup.
worse- it was developed with TAXPAYER MONEY!!!! so a professor makes millions on the side with a grant funded by our government? there’s a great scandal brewing here, please TC, sniff out the details. oh yeah, plus, google just bought the rights to view all of your comments in real time.
You must not be familiar with Bayh-Dole. If this story irks you, one can only imagine how you’d feel about that
Tax payer or Microsoft money? Luis von Ahn, the “executive producer” of reCaptcha is supported by a “Microsoft New Faculty Fellowship” (http://www.cs.cmu.edu/~biglou/)
Smart move, Google: make us all work for free.
reCaptcha your investment. Way to go. I love this deal.
Cool, they bought it not for the captchas, but for their OCR (kind of a captcha killer!).
It’s quite anticlimactic.
And BTW, Google, stop buying shit!
So how does the training work then? For a captcha to be useful, the website would need to already know the text that they want people to type in.
One word they feed you is known. Read more about that on captcha site.
That’s a very cool deal. Re-Captcha is a great service and I think Google will help it grow so much more than it would have on its own.
I’ve never understood how reCaptcha knows what to expect you to type if it couldn’t understand the word(s) it has taken from old books in the first place.
Surely someone has to manually teach reCaptcha what each word says in the first place which seems to defeat the purpose of the old book transliteration idea.
It’s always bugged me
Same.
It presents two words, one it knows and one it doesn’t. If what you type matches the word it knows, it uses your answer for the other word as a hint for what that word says. The thing is, you don’t know which word is the known one, and while it doesn’t know exactly what the other word is, it often knows something about it, like certain letters or the length. So you can’t confidently guess which word you can fudge, which satisfies the “are you human” part of why CAPTCHAs exist.
Also, I believe they use the same word on multiple people, and only count it as recognized if they get the same answer more than once.
That and it also confirms that the word it doesn’t know is interpreted the same way by several users.
Same concept as SETI@HOME giving the same work to multiple users, they confirm each other’s work.
That’s why it uses two words: one word it knows, and one word it needs help figuring out. If you get the word it already knows right, it assumes you got the other word right too, and it compares your answer with those of many other people who attempted the same word.
It’s really a genius idea!
Ok, I checked it out…
From their site:
‘Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.’
That’s just smart.
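For anyone who wants to see that flow spelled out, here is a toy Python sketch of the scheme described in the quote. It is not reCAPTCHA’s actual code: the class name `TwoWordCaptcha`, the image id `img-17`, the sample words, and the three-vote threshold are all made up for illustration, and the real service presumably weighs answers in more sophisticated ways.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching answers we want before
# accepting a transcription. The real service's value is not stated here.
MIN_AGREEING_VOTES = 3


class TwoWordCaptcha:
    """Toy model of the known-word / unknown-word scheme."""

    def __init__(self, min_votes=MIN_AGREEING_VOTES):
        self.min_votes = min_votes
        # unknown image id -> tally of normalized answers from users
        # who typed the known word correctly
        self.votes = defaultdict(Counter)

    @staticmethod
    def normalize(text):
        # Ignore case and surrounding whitespace when comparing answers.
        return text.strip().lower()

    def submit(self, known_answer, known_typed, unknown_id, unknown_typed):
        """Return True if the user passes; record their guess if so."""
        if self.normalize(known_typed) != self.normalize(known_answer):
            return False  # failed the known word: treat as not human
        self.votes[unknown_id][self.normalize(unknown_typed)] += 1
        return True

    def transcription(self, unknown_id):
        """Return the agreed reading, or None if there is no consensus yet."""
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        word, count = tally.most_common(1)[0]
        return word if count >= self.min_votes else None


captcha = TwoWordCaptcha()
for guess in ["morning", "Morning", "moming", "morning"]:
    captcha.submit("upon", "upon", "img-17", guess)
# This user got the known word wrong, so their guess is ignored.
captcha.submit("upon", "up0n", "img-17", "mourning")
print(captcha.transcription("img-17"))  # prints "morning"
```

The known word does the “are you human” check, and the tally only accepts a reading once enough independent users agree on it, which is the “number of other people” step from the quote.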
There are two words in the captcha, one that it knows, and one that it doesn’t.
If you enter the one that it knows correctly, it assumes you will enter the other one right and learns off what you wrote.
It will pair the one it doesn’t know with lots of different known words, and over time, the results will normalize.
From the ReCaptcha site:
But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
You can check the idea at http://recaptch.../learnmore.html
from reCAPTCHA’s website
“But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”
The idea is that the captcha contains two words: one is automatically generated by reCAPTCHA (hence already recognized) and the other is the one to decipher. The system assumes that if you typed the first one correctly, you also properly deciphered the other one.
And if you ever wonder what CAPTCHA stands for:
http://www.abbr...aspx?KEY=391280
Every time I have to type one of those damn words in, I have to remind myself not to get angry at the service that’s making me do it, but to get angry at the spammers who make the service required.
What was the acquisition amount?
This tech seems easy to duplicate. Google must want to implement their OCR capabilities ASAP. Maybe one day I’ll be able to upload my own documents and get them OCR’ed by Google. I have plenty of one-of-a-kind books I’d love to get digitally transcribed and hosted on my website http://www.prop...ayersrealty.com
GL with the acquisition Google.
Any idea what was the size of the deal ?
Congrats !
Genius! And congratulations reCAPTCHA.
This is something I’ve always wondered. If the computer can’t figure it out, how does it know whether or not you’ve typed the word in correctly?
percentages.
It shows you two words: one that it knows, one that it doesn’t. If you type the one it knows correctly, it will trust that you typed the other one correctly as well. After many of the same interpretations from the crowd, reCaptcha can more definitively say what the word is.
It uses the unknown word in a lot of separate captchas, and then decides what the word is based on how many people agree on it.
I love recaptcha.
Why does TechCrunch just repeat everything off the Google Blog? There was absolutely NO new information in this post compared to google’s. Quite annoying to read the same entry twice.
because its a blog. it got you hear didnt it? And your visit added another page impression on their ad sales didnt it.
You do realize that these blogs are a business, right?
and yes, before the gramma nazi’s show up, I know I mistyped “hear”. sue me.
lol, not to mention that you also misspelled grammar in a comment excusing poor grammar.
Oh, and you said “nazi’s”. It should’ve been “nazis”
Also the “gramma”
muahaha
And while I was fixing ur punctuations, I forgot to put fullstops in my comments. Sue me too!
There is value in aggregation. I rarely read Google’s blogs since I know that the really important stuff will get posted on TechCrunch! Saves me time.
But it’s not the important ones, it’s all of them from the main Google Blog.
reCAPTCHA can add an extra feature to Google’s platforms to prevent spamming and fraud, but I’m surprised: if Google is the tech god, why isn’t it creating a new and innovative system of its own for spam and fraud prevention?
Anyway, this is good news for genuine publishers and bad news for spammers.
reCAPTCHA has a big installed base.
Google could code its own captcha system to help with Google Books…. but again, it would have to take a shot, hoping to get a big volume of sites using it. Buying rC just gave them this.
They are actually buying the user base. The tech itself is something that Google has already been doing for long anyway.
Google = freee….
reCaptcha is already free!
Seems Google is on a buying spree. Saw this on Business Insider: http://www.busi...ightcove-2009-9
Google is the best …
I wonder what the price tag was. Yeah, Google is in a buying mood today.
Using CAPTCHAs to teach computers how to read CAPTCHAs – ironic.
How long until there is an ad underneath it on every website?
Not only did they choose not to reinvent the wheel, but now they won’t have to do any marketing or push a new product. reCaptcha is already on a huge number of web sites and is used by large social sites. It is cheaper for them to acquire not only the technology but also the presence on all of those sites and the organic traffic they have.
reCAPTCHA is a great service. Is there any information about the deal?
http://www.hoti...ianstartups.com
…and now Google has access to data from the sites where those captchas appear. They could easily pull usage data from where those captchas appear.
Leena,
>>reCAPTCHA, an open source technology
Not sure how reCAPTCHA is open source. The reCAPTCHA plugins are open source, yes, but I don’t think the source code for reCAPTCHA itself is available out there..
“reCAPTCHA is mostly powered by open source software.” — quote from the website of reCAPTCHA.
I think what this sentence means is that reCAPTCHA IS NOT open sourced.
However that is exactly what Google is.
reCAPTCHA is annoying, but you see it everywhere!
What else is Google going to take over?
Plus how is it open source and who wants to make them anyway?
Not Re-inventing the wheel. Just re-CAPTCHA!
Cool from a machine learning standpoint but does this freak you out since Google could potentially have registration and use data for a huge number of sites around the web that leverage reCAPTCHA? More thoughts here http://bit.ly/ggEWT.
recaptcha rocks. go Luis go. i hope he got paid handsomely
Another great move by Google, leveraging more free labor from the masses
“Sounds like Google found it more effective to acquire reCAPTCHA’s technology instead of reinventing the wheel. ”
Google probably bought reCAPTCHA for its user base, because it already had its own CAPTCHA technology.
What could be worse for users than captcha? Re-captcha.
This is really good news, as recaptcha images will now be Google powered… I am just curious to know the size of the deal. I think it is going to be a handsome amount.
How big is this deal?