Thursday, May 31, 2007

[web] reCAPTCHA

Absolutely brilliant use of crowdsourcing to digitize books and old text:

http://recaptcha.net/learnmore.html

1. When OCR scans the original text, it sets aside words it cannot read with high confidence and hands them to reCAPTCHA
2. Instead of a normal CAPTCHA, which asks for one text string, reCAPTCHA asks users to type two words: one it knows the answer to, and one it doesn't
3. If the user answers the known word correctly, their answer to the unknown word is recorded as a likely-winning candidate
4. The system collects a number of these candidate answers for each unknown word and determines what the right word most likely is
5. All this while serving double duty by stopping bots and spam!
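The steps above boil down to a simple voting scheme, so here is a minimal Python sketch of how it might work. Everything in it is my own assumption for illustration: the `WordVoter` class, its method names, the case-insensitive matching, and the `MIN_VOTES` / `MIN_AGREEMENT` thresholds are hypothetical, not reCAPTCHA's actual code or parameters.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds, NOT reCAPTCHA's real parameters.
MIN_VOTES = 3        # trusted answers needed before deciding on a word
MIN_AGREEMENT = 0.6  # fraction of votes the top answer must reach


class WordVoter:
    """Hypothetical sketch: collects crowd answers for words OCR couldn't read."""

    def __init__(self, known_words):
        # known_words maps a control-word id to its verified text (step 2)
        self.known = dict(known_words)
        self.votes = defaultdict(Counter)  # unknown-word id -> answer counts
        self.solved = {}                   # unknown-word id -> accepted text

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Handle one two-word challenge; return True if the user passes."""
        # Step 3: only trust the unknown-word answer if the known word is right
        if control_answer.strip().lower() != self.known[control_id].lower():
            return False  # failed the known word, so the vote is discarded
        if unknown_id not in self.solved:
            self.votes[unknown_id][unknown_answer.strip().lower()] += 1
            self._try_decide(unknown_id)
        return True

    def _try_decide(self, unknown_id):
        # Step 4: accept the most common answer once enough users agree
        counts = self.votes[unknown_id]
        total = sum(counts.values())
        if total < MIN_VOTES:
            return
        answer, top = counts.most_common(1)[0]
        if top / total >= MIN_AGREEMENT:
            self.solved[unknown_id] = answer


voter = WordVoter({"c1": "morning"})
voter.submit("c1", "morning", "u1", "upon")
voter.submit("c1", "Morning", "u1", "upon")
voter.submit("c1", "mourning", "u1", "spam")  # wrong known word: ignored
voter.submit("c1", "morning", "u1", "open")
voter.submit("c1", "morning", "u1", "upon")
print(voter.solved)  # {'u1': 'upon'}
```

The nice part is that the known word acts as a filter: a bot (or a careless human) that fails it never gets a say on the unknown word, so only answers from users who proved they can read end up in the vote.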

Just brilliant.