Screen scraper tricks (focus on iMacros)


Deciphering Decaptcher’s protocol

Decaptcher offers two APIs: a raw TCP socket API and an HTTP API. This post describes the socket API, as deciphered from Decaptcher’s official PHP client; you can also look at my own version of the Decaptcher PHP client, which I posted recently. The description will come in handy if you’d like to code your own client.

Decaptcher PHP API

Yesterday I rewrote Decaptcher’s monstrous official PHP client. My version is a single class and doesn’t rely on three dozen constants. You can find the source below:


Captcha Farms

The single greatest vulnerability of captchas is the mechanical turk: if a human can solve a captcha (which is not always the case these days), you can find people who will solve a thousand of them for a dollar.

Captcha farm services


Deferred chain

A few months ago, I wrote a small PHP class to help me create chainable interfaces (PHP people like to call these “fluent interfaces”) without having to retrofit old code. I call this DeferredChain.

Here is the source for DeferredChain.
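The following is not the author’s DeferredChain. It is a hypothetical sketch of one way such a wrapper could work, assuming PHP 7.4+: the class name `DeferredChainSketch`, its `run()` method and the `LegacyQuery` example class are all made up for illustration. The wrapper catches undefined method calls with `__call()`, queues them, and returns `$this`, so methods on old code that return nothing can still be chained; `run()` then replays the queue on the wrapped object.

```php
<?php
// Hypothetical sketch, NOT the author's DeferredChain source.
// Wraps an existing object whose methods don't return $this, records each
// call via __call(), and replays the queued calls later with run().
class DeferredChainSketch
{
    private object $target;
    /** @var array<int, array{0: string, 1: array}> queued [method, args] pairs */
    private array $calls = [];

    public function __construct(object $target)
    {
        $this->target = $target;
    }

    // Any call to a method not defined here is queued instead of executed,
    // and $this is returned so the calls can be chained fluently.
    public function __call(string $name, array $args): self
    {
        $this->calls[] = [$name, $args];
        return $this;
    }

    // Replays the queued calls in order on the wrapped object, clears the
    // queue, and returns each call's return value in order.
    public function run(): array
    {
        $results = [];
        foreach ($this->calls as [$name, $args]) {
            $results[] = $this->target->$name(...$args);
        }
        $this->calls = [];
        return $results;
    }
}

// Hypothetical "old code" whose methods return nothing, so it can't chain.
class LegacyQuery
{
    private array $parts = [];
    public function select(string $cols): void { $this->parts[] = "SELECT $cols"; }
    public function from(string $table): void { $this->parts[] = "FROM $table"; }
    public function sql(): string { return implode(' ', $this->parts); }
}

$results = (new DeferredChainSketch(new LegacyQuery()))
    ->select('id, name')
    ->from('users')
    ->sql()
    ->run();

echo $results[2], PHP_EOL; // prints "SELECT id, name FROM users"
```

One trade-off of this design: any method the wrapper defines itself (here `run()`) shadows a method of the same name on the wrapped object, so that method can’t be deferred.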


Content may be king, but distribution pays the king’s mortgage.

Google acquired reCaptcha about a month ago, so you might want to throttle your reCaptcha solving per IP address from now on.
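Throttling per IP address can be as simple as a rolling-window counter keyed by IP. The sketch below is a hypothetical illustration assuming PHP 7.4+: the class `PerIpThrottle`, its `allow()` method and the limits in the example are placeholders of my own, not values known to be safe with reCaptcha. The caller passes the current time in so the behaviour is easy to test.

```php
<?php
// Hypothetical per-IP throttle: allows at most $maxSolves captcha solves per
// IP address within any rolling window of $windowSeconds. The limits used
// below are placeholder assumptions, not known-safe values.
class PerIpThrottle
{
    private int $maxSolves;
    private float $windowSeconds;
    /** @var array<string, float[]> timestamps of recent solves, per IP */
    private array $history = [];

    public function __construct(int $maxSolves, float $windowSeconds)
    {
        $this->maxSolves = $maxSolves;
        $this->windowSeconds = $windowSeconds;
    }

    // Returns true (and records the solve) if $ip is under its limit at
    // time $now; returns false without recording anything otherwise.
    public function allow(string $ip, float $now): bool
    {
        $cutoff = $now - $this->windowSeconds;
        // Keep only the timestamps still inside the rolling window.
        $recent = array_values(array_filter(
            $this->history[$ip] ?? [],
            fn (float $t): bool => $t > $cutoff
        ));
        if (count($recent) >= $this->maxSolves) {
            $this->history[$ip] = $recent;
            return false;
        }
        $recent[] = $now;
        $this->history[$ip] = $recent;
        return true;
    }
}

$throttle = new PerIpThrottle(2, 60.0);
var_dump($throttle->allow('203.0.113.7', 0.0));   // bool(true)
var_dump($throttle->allow('203.0.113.7', 10.0));  // bool(true)
var_dump($throttle->allow('203.0.113.7', 20.0));  // bool(false): third solve within 60 s
var_dump($throttle->allow('198.51.100.4', 20.0)); // bool(true): different IP
var_dump($throttle->allow('203.0.113.7', 61.0));  // bool(true): the solve at 0.0 has expired
```

In practice you would pass `microtime(true)` as `$now` and persist the history somewhere shared if several scraper processes use the same IPs.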

Matt Cutts on how Google deals with spam.

Why you don’t want to shard.

Real World Web: Performance & Scalability.


Gearman is interesting.

