The single greatest vulnerability of captchas is the mechanical turk. If a human can solve a captcha, which is not always the case these days, you can find people who will solve a thousand of them for a dollar.
Captcha farm services
The following is a list of commercial captcha farms. A few of them seem to be tied to specific products, which probably means you can't use them with your own code. Prices are per thousand captchas solved.
- Decaptcher – $2
- SocialMarketServices – ~$2
- Captcha-Bypass – $5
- BypassCaptcha – $7
- BeatCaptchas – $8
- WebEmulator – $10
- ImageToText – $15
- Ocommunity – $25
- ByByeCaptcha – $30
- CaptchaKing – $40
Decaptcher seems to be the best choice for now, with low prices and an assortment of APIs you can use. SocialMarketServices rivals Decaptcher's prices, especially with their volume discounts; however, I haven't heard from anyone who uses them.
Advantages of commercial captcha farms
- 24/7 availability of captcha entry workers
- Quick deployment with off-the-shelf APIs
- Seamless scaling up to moderate volume
Disadvantages of commercial captcha farms
- Higher cost than hiring your own workers
- Slow when the service is overloaded
- Limited capacity at high volume
Running your own captcha farm
Having your own private captcha farm might be cheaper, but it is a more involved affair. First of all, you need a backend and an interface for your workers. SocialMarketServices and SolveMyCaptcha offer solutions that let you hire your own captcha solvers. You could also find ready-made captcha farm software by posting on one of the freelancer websites.
One important design decision is whether your workers initiate the captcha fetching scripts or pull captchas from a queue asynchronously. With the first approach, your scripts might run too fast, which leads either to immediate bans or to large groups of accounts being banned at the same time later on. Fetching from a queue could frustrate your workers unless you can provide consistent volume: if you pay them per captcha, they won't be happy when you can't give them as many captchas as they can handle. Keeping the queue full without overloading it is a fine balance to strike, unless of course you like submitting timed-out captchas.
Finding workers to solve captchas should be fairly easy. One example project on a freelance site drew 113 bids. There is more than an adequate supply of people who will solve captchas for $0.001 each.
You should assign your workers shifts that cover the whole 24 hours, so that you can spread your captcha fetching throughout the day. You might be inclined to divide the day into three eight-hour shifts, but typing captchas for eight hours straight is not an easy feat. I think workers would be more comfortable and efficient with two- or four-hour shifts.
Advantages of running your own captcha farm
- Cheaper cost per captcha solved
- Tailored to your use case; can include tasks other than captcha solving
- Can scale beyond what the commercial services can handle
Disadvantages of running your own captcha farm
- Time spent managing servers, software and workers
- Hard to keep workers occupied if you are doing low volume
- Takes time to deploy, whereas with a commercial service you can get started in a couple of hours
A better captcha farm design
If you are serious about setting up your own captcha farm, here are some ideas to consider. Please leave a comment if there’s something you would like to add.
- Ability to include simple tasks besides solving captchas
- Asynchronous queue for tasks
- When queue is empty, either try to increase the rate of captcha fetching or queue different tasks
- Monitor queue load, send warnings when things aren’t going smoothly
- Offload captchas to a commercial service when things are running hot
- Configurable fixed-time as well as free-running shifts
- Collect statistics on workers, such as: completed tasks, failed tasks, timeouts, task completion speed
- Allow master accounts for managers of worker teams; slave accounts don't receive payments and can't choose their shifts
- “Take a break” button
- Tasks pulled from queue via AJAX
- Optional timeout counter attached to tasks, input disabled when time runs out
- Optional input restraints/validation, such as allowed characters
- Move to next captcha input field on Enter
- Allow input field navigation with arrow keys
- Configurable number of tasks displayed simultaneously; a vertical list of tasks works better
- Fetch new tasks before the previous ones are finished
- Support translations of the interface
A useful trick
I borrowed this trick from BlackhatWorld. The idea is that you glue two captchas side by side and have workers treat the result as a single captcha with two words, not unlike reCaptcha. It's quite a brilliant idea, since I suspect it will not only lower your cost per captcha but could also increase overall throughput. It is somewhat complicated to implement, but I believe it would be time well spent.
Here are some possibly defunct services as well as some workers' login pages. On a side note, PixProfit is allegedly the other side of Decaptcher.