Over on Blue Hat SEO, an interesting little drama has been playing out that caught my attention. Let me say from the start that the site recommends both black-hat and white-hat solutions to Search Engine Optimization problems (hence the name, I guess). Recently the host published a guest post on ‘captcha breaking’.

A captcha is one of those little graphics with crooked letters in shades of gray, or in some other OCR-unfriendly style, intended to ensure that only ‘real people’, rather than automated scripts, enter data such as registration details. The guest post described a long, complicated programming technique for converting these graphics into something an OCR program can read with fair reliability, thus allowing black-hat hackers to overcome the obstacle.

Now, I have no use for breaking captchas; that just doesn’t fit my style. But I was delighted by the host’s follow-up post, where he gave his own technique for breaking captchas, because the solution was so simple and elegant: the kind of concept that can be applied to other problems.

Instead of writing a complicated program in C and implementing various steps and procedures, he uses a website that is already part of his web-empire. With a little PHP script, he displays the captcha he wants to ‘break’ on his own site, as a log-in requirement or as a condition for using some feature, just as if it were an ordinary captcha. The difference is that he records the visitor’s answer and feeds it back to the program that needs to know what the captcha says. As long as the site displaying the captcha gets plenty of traffic, the answer comes back fast enough to be used immediately. That urgency is particular to the captcha case; the technique itself does not depend on it.

This is a wonderful example of using a social-engineering approach to solve a problem. I’m sure it can be applied (or modified) to many other situations where there is no programmatic substitute for human intelligence. There are sites that use similar techniques to identify the content of photographs, or to distinguish galaxies from stars in high-resolution telescope images. How could you use this technique in building your web-empire?
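
For the benign uses mentioned above, such as labeling photographs or sorting galaxies from stars, the recurring question is how to turn several visitors’ answers into one trustworthy answer. Here is a minimal Python sketch of one common approach, a simple majority vote. It is not taken from any of the sites mentioned; the function name `aggregate_label` and the `min_votes` and `min_agreement` thresholds are my own hypothetical choices for illustration.

```python
from collections import Counter


def aggregate_label(responses, min_votes=3, min_agreement=0.6):
    """Combine several human answers into one consensus label.

    Hypothetical helper: returns the most common answer if enough
    people responded and enough of them agree, otherwise None.
    """
    # Normalize answers so "Galaxy" and " galaxy " count as the same vote.
    cleaned = [r.strip().lower() for r in responses if r and r.strip()]

    # Too few answers to trust any result yet.
    if len(cleaned) < min_votes:
        return None

    # Pick the most frequent answer and check how dominant it is.
    label, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return label
    return None


if __name__ == "__main__":
    # Two of three visitors agree, which clears the 60% threshold.
    print(aggregate_label(["Galaxy", " galaxy ", "star"]))
    # Only two answers so far, so no decision is made.
    print(aggregate_label(["galaxy", "star"]))
```

Asking several people and requiring agreement trades speed for reliability: a busy site can afford a higher `min_votes`, while a quiet one may have to accept fewer answers or wait longer.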

NOTE: if the link to bluehatseo.com does not work, it is because the site is currently under various hack attacks, supposedly because someone took exception to his revealing ‘secrets’ about captcha-hacking. That is the drama I referred to at the beginning. Most hackers endorse the idea that information should be free, but apparently some harbor IBM-style business concepts.