There are only a few things I like better than having to type a captcha before I can get to my porn or post to my favourite web2.0 site: things like getting kicked in the crotch, food poisoning and Alanis Morissette. Without further ado, here is how to break captchas, part one.
Wintercore – Breaking Gmail’s Audio captcha
Vorm – Defeating audio (voice) captchas
Always look at the audio version of the captcha you are trying to break.
Interesting video all around. Skip to the last few minutes for an audio captcha breaking approach, though you should really watch the whole thing.
Useful ideas and sample code (in python).
Various tricks for bypassing captchas, mostly without actually solving them.
How to break PHP-Nuke’s captcha (used by other applications as well) and Simple Machines Forum’s audio captcha. For the first one, he generated all possible images and stored their hashes in a database, so you only have to look up the hash of your captcha in the database to solve it. For SMF’s audio captcha he calculated Hamming distances, although perhaps he should have used the more flexible Levenshtein distance, which also copes with inserted and deleted samples.
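Here is a minimal sketch of both ideas, not the original author's code. It assumes the captcha generator is deterministic (the same text always gives a byte-identical image), and SHA-256 is my pick since the write-up does not say which hash was used. The function names are made up for illustration.

```python
import hashlib


def build_hash_table(rendered):
    """Map the hash of every pre-rendered captcha image to its solution.

    `rendered` is a dict of {solution_text: image_bytes} (hypothetical input).
    """
    return {hashlib.sha256(img).hexdigest(): text for text, img in rendered.items()}


def solve_by_hash(table, image_bytes):
    """Return the stored solution for this exact image, or None if unknown."""
    return table.get(hashlib.sha256(image_bytes).hexdigest())


def hamming(a, b):
    """Count positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]
```

The practical difference: Hamming only works when both samples have the same length and are aligned, while Levenshtein still gives a sensible score when one sample is shifted or has a few extra or missing values.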
DarkSeoProgramming – PHPBB3 Captcha is super easy
DarkSeoProgramming – Instant GOCR Training
DarkSeoProgramming – Letter Derotation
DarkSeoProgramming – GOCR to Neural Nets Pt 2
DarkSeoProgramming – 10 Steps to Solving a Captcha
DarkSeoProgramming – A custom floodfill routine
DarkSeoProgramming – Replacing GOCR part 1
Lots of good stuff, plus sample source code.
This howto will take you through using Captcha Breaker to break a given
captcha. This howto covers only how to use the solvers once you already have
image files. This howto does NOT cover how to extract that file from a web
site, nor how to enter the text back.
Author has decided to release the source code for PWNtcha. Old but probably worth examining.
Averaging a series of images can improve the image quality of captchas (reducing distortion, or improving the signal-to-noise ratio, so to speak) and hence make them more easily recognizable by OCR (optical character recognition) systems.
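A rough sketch of the averaging idea, assuming you can fetch several images of the same captcha text where only the noise changes, and that they are already aligned and the same size. Images here are plain nested lists of grayscale values (0 is black, 255 is white); the function names and the threshold of 128 are my own choices for illustration.

```python
def average_images(frames):
    """Pixel-wise mean of several same-sized grayscale images."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    n = len(frames)
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


def binarize(image, threshold=128):
    """Turn the averaged image back into pure black and white for OCR."""
    return [[0 if p < threshold else 255 for p in row] for row in image]
```

Noise that only shows up in some frames gets pulled toward the background, while pixels that are black in every frame stay black, so thresholding the average keeps mostly the text.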
The del.icio.us solver uses a useful and easy technique for removing noise, which I like very much and you should learn. It goes through each black pixel and counts how many of the surrounding pixels are black. If the number of black neighbours is below a certain threshold (4 in this case), that pixel is removed. You should calculate the neighbour counts for all pixels first, then change all those below the threshold to white in one pass. If you change them incrementally, each removal lowers the counts of the pixels next to it and bad things may happen, so keep that in mind.
You should experiment in each case to find what gives the best results. The optimal threshold value is usually 3 or 4. More importantly, you can experiment with multiple passes, for example using 2 for the first pass, then 3 for the second and 4 for the final pass.
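The noise removal described above can be sketched like this. This is my own version, not the del.icio.us solver's code. It assumes a black-and-white image stored as nested lists (0 is black, 255 is white) and treats "surrounding pixels" as the 8 neighbours, which is a guess on my part.

```python
BLACK, WHITE = 0, 255


def count_black_neighbours(img, x, y):
    """Count black pixels among the (up to) 8 neighbours of (x, y)."""
    height, width = len(img), len(img[0])
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the pixel itself
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and img[ny][nx] == BLACK:
                total += 1
    return total


def despeckle(img, threshold=4):
    """Whiten black pixels with fewer than `threshold` black neighbours."""
    # First collect every pixel to remove, using the untouched image...
    doomed = [(x, y)
              for y, row in enumerate(img)
              for x, p in enumerate(row)
              if p == BLACK and count_black_neighbours(img, x, y) < threshold]
    # ...then whiten them all at once on a copy.
    out = [row[:] for row in img]
    for x, y in doomed:
        out[y][x] = WHITE
    return out


def despeckle_passes(img, thresholds=(2, 3, 4)):
    """Run several passes with increasing thresholds."""
    for t in thresholds:
        img = despeckle(img, t)
    return img
```

Collecting the doomed pixels before touching anything is the "calculate first, change in one pass" rule: the counts never see a half-cleaned image.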
Another interesting idea is setting up some honeypots so you can get infected by botnets that solve captchas, for example the bots solving Live’s and Google’s captchas. You might be able to acquire the captcha breaking code or, perhaps even better, get access to the service that the bots send their captchas to for solving.
Even more related links:
Yahoo/Hotmail/Google CAPTCHA Extraction
Using AI to beat CAPTCHA
Old Yahoo Captcha solver (now fixed)
Breaking CAPTCHAs without using OCR