Deciphering Decaptcher’s protocol

Decaptcher offers both a TCP socket API and an HTTP API. This post describes the socket API, as deciphered from Decaptcher’s official PHP client. You can also look at my version of the Decaptcher PHP client that I posted recently. The following will come in handy if you’d like to code your own client.

The socket API uses a 6-byte header containing protocol_version/command_code/data_size. Let’s call this the cc-header.

protocol_version keeps old code working when the protocol changes. The current version is still 1.

command_code is used for actual commands as well as error codes.

data_size is the size in bytes of the data following the cc-header. It is often 0.
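Here is a minimal Python sketch of packing and unpacking a cc-header. The post only gives the total size (6 bytes) and the field order, so the field widths (1 byte, 1 byte, 4 bytes) and the little-endian byte order are assumptions; check them against the official client before relying on them. The names `pack_cc` and `unpack_cc_header` are made up for illustration.

```python
import struct

# Assumed layout (not stated in the post): 1-byte version, 1-byte command,
# 4-byte little-endian unsigned size, which adds up to 6 bytes.
CC_HEADER = struct.Struct("<BBI")
PROTOCOL_VERSION = 1


def pack_cc(command_code, payload=b""):
    """Return a cc-header followed by its payload."""
    return CC_HEADER.pack(PROTOCOL_VERSION, command_code, len(payload)) + payload


def unpack_cc_header(header):
    """Split a 6-byte cc-header into (version, command_code, data_size)."""
    return CC_HEADER.unpack(header)
```

For example, `pack_cc(1, b"bob")` yields a header with data_size=3 followed by the three payload bytes.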

If you are sending a picture or receiving picture text, a second header follows the cc-header. Let’s call this the pic-header.

The pic-header is 20 bytes long and contains pic_timeout/pic_type/data_size/major_id/minor_id.

pic_timeout tells Decaptcher how much time it has to get the captcha back to you.

pic_type serves as an affiliate application id, as far as I know.

data_size is the size in bytes that will follow the pic-header.

major_id and minor_id are sent to you when you get picture text; you need them when reporting picture text as bad.

When you are sending, you append your image binary after the pic-header.

When you are receiving, the picture text comes after the pic-header.
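The pic-header can be sketched the same way. Five fields in 20 bytes suggests five 4-byte integers, but that split and the little-endian byte order are assumptions, not something the post states. `pack_pic` and `unpack_pic` are hypothetical helper names.

```python
import struct

# Assumed layout: five 4-byte little-endian unsigned ints, in the order
# pic_timeout, pic_type, data_size, major_id, minor_id (20 bytes total).
PIC_HEADER = struct.Struct("<5I")


def pack_pic(body=b"", timeout=0, pic_type=0, major_id=0, minor_id=0):
    """Return a pic-header followed by the image bytes (or nothing)."""
    return PIC_HEADER.pack(timeout, pic_type, len(body), major_id, minor_id) + body


def unpack_pic(data):
    """Parse a pic-header plus trailing body into a dict."""
    timeout, pic_type, size, major_id, minor_id = PIC_HEADER.unpack_from(data)
    body = data[PIC_HEADER.size:PIC_HEADER.size + size]
    return {
        "timeout": timeout,
        "pic_type": pic_type,
        "major_id": major_id,
        "minor_id": minor_id,
        "body": body,
    }
```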

Logging in:

  1. Open socket to Decaptcher’s server.
  2. Send cc-header, command_code=1, followed by your username.
  3. Receive a 32-byte salt, with command_code=3.
  4. Using sha256, hash the salt, the md5 of your password, and your username (in this order).
  5. Send the hash along with a cc-header and command_code=4.
  6. Receive cc-header with command_code=7.
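The login steps above could look roughly like this in Python. Several details are assumptions the post does not settle: the cc-header field widths and byte order, that the md5 of the password is used as a lowercase hex string, and that the sha256 result is sent as raw bytes. All function names are made up for this sketch.

```python
import hashlib
import struct

CC_HEADER = struct.Struct("<BBI")  # assumed layout: version, command, size
CMD_LOGIN, CMD_RAND, CMD_HASH, CMD_OK = 1, 3, 4, 7


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def send_cc(sock, command_code, payload=b""):
    sock.sendall(CC_HEADER.pack(1, command_code, len(payload)) + payload)


def recv_cc(sock):
    """Return (command_code, payload) for the next message."""
    _version, command_code, size = CC_HEADER.unpack(recv_exact(sock, CC_HEADER.size))
    return command_code, recv_exact(sock, size)


def login_hash(salt, username, password):
    # Assumption: md5 as a lowercase hex string, sha256 as raw bytes.
    md5_hex = hashlib.md5(password.encode()).hexdigest().encode()
    return hashlib.sha256(salt + md5_hex + username.encode()).digest()


def login(sock, username, password):
    send_cc(sock, CMD_LOGIN, username.encode())
    command_code, salt = recv_cc(sock)
    if command_code != CMD_RAND:
        raise RuntimeError(f"expected salt, got command_code={command_code}")
    send_cc(sock, CMD_HASH, login_hash(salt, username, password))
    command_code, _ = recv_cc(sock)
    if command_code != CMD_OK:
        raise RuntimeError(f"login rejected, command_code={command_code}")
```

`sock` is any connected TCP socket; the server address is not given in this post, so it is left to the caller.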

Sending a picture:

  1. You must be logged in.
  2. Send cc-header with command_code=12, then a pic-header and then the picture binary.
  3. Wait on the socket until you get a cc-header with command_code=14. A pic-header and picture text will follow. You must store the major_id and minor_id you get in the pic-header preceding the picture text (in order to report bad picture text).
  4. If command_code is not 14 then it’s an error code.
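A sketch of building the picture request and parsing the reply, working on raw bytes so it stays independent of any socket code. Beyond the assumed header layouts, it also assumes that the cc-header's data_size covers the pic-header plus the image, and that the picture text is plain ASCII. The function names are hypothetical.

```python
import struct

CC_HEADER = struct.Struct("<BBI")  # assumed: version, command, size
PIC_HEADER = struct.Struct("<5I")  # assumed: timeout, type, size, major_id, minor_id
CMD_PICTURE2, CMD_TEXT2 = 12, 14
PTO_DEFAULT, PT_UNSPECIFIED = 0, 0


def build_picture_request(image, timeout=PTO_DEFAULT, pic_type=PT_UNSPECIFIED):
    """Return cc-header + pic-header + image bytes, ready to send."""
    pic = PIC_HEADER.pack(timeout, pic_type, len(image), 0, 0) + image
    # Assumption: data_size in the cc-header counts the pic-header too.
    return CC_HEADER.pack(1, CMD_PICTURE2, len(pic)) + pic


def parse_picture_reply(command_code, payload):
    """Return (text, major_id, minor_id), or raise on an error code."""
    if command_code != CMD_TEXT2:
        raise RuntimeError(f"server returned error code {command_code}")
    _timeout, _type, size, major_id, minor_id = PIC_HEADER.unpack_from(payload)
    text = payload[PIC_HEADER.size:PIC_HEADER.size + size]
    return text.decode("ascii", "replace"), major_id, minor_id
```

Keep the returned major_id and minor_id around; they are the only way to report the text as bad later.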

Notifying of bad picture text:

  1. You must be logged in.
  2. Send a cc-header with command_code=13, then a pic-header containing the major_id and minor_id that came back with the picture text.
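Reporting bad text needs only the two ids, so the message is a cc-header plus a pic-header with no body. As before, the header layouts are assumed, and so is leaving the other pic-header fields at zero. `build_bad_picture_report` is a hypothetical name.

```python
import struct

CC_HEADER = struct.Struct("<BBI")  # assumed: version, command, size
PIC_HEADER = struct.Struct("<5I")  # assumed: timeout, type, size, major_id, minor_id
CMD_PICTUREFL = 13


def build_bad_picture_report(major_id, minor_id):
    """Return the bytes that flag a previously received picture text as bad."""
    # Assumption: timeout, type and size can stay 0 for this command.
    pic = PIC_HEADER.pack(0, 0, 0, major_id, minor_id)
    return CC_HEADER.pack(1, CMD_PICTUREFL, len(pic)) + pic
```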

Getting your API credits balance:

  1. You must be logged in.
  2. Send a cc-header with command_code=10.
  3. Receive a cc-header with command_code=10 and the balance follows as text.
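The balance exchange is a bare cc-header out and a cc-header plus text back. This sketch assumes the same cc-header layout as before and returns the balance as a string, since the post only says it arrives as text and does not document the number format. The helper names are made up.

```python
import struct

CC_HEADER = struct.Struct("<BBI")  # assumed: version, command, size
CMD_BALANCE = 10


def build_balance_request():
    """Return the 6-byte balance request (no payload)."""
    return CC_HEADER.pack(1, CMD_BALANCE, 0)


def parse_balance_reply(message):
    """Return the balance text from a complete reply (header plus payload)."""
    _version, command_code, size = CC_HEADER.unpack_from(message)
    if command_code != CMD_BALANCE:
        raise RuntimeError(f"unexpected command_code={command_code}")
    payload = message[CC_HEADER.size:CC_HEADER.size + size]
    # Left as text on purpose: the numeric format is not documented here.
    return payload.decode("ascii", "replace").strip()
```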

Logging out:

  1. You must be logged in.
  2. Send a cc-header with command_code=2.
  3. Close the socket.
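Logging out is short enough to fit in one function. The cc-header layout is assumed as in the other sketches, and `logout` is a hypothetical name; the `finally` makes sure the socket is closed even if the send fails.

```python
import struct

CC_HEADER = struct.Struct("<BBI")  # assumed: version, command, size
CMD_BYE = 2


def logout(sock):
    """Send the end-of-session command, then close the socket."""
    try:
        sock.sendall(CC_HEADER.pack(1, CMD_BYE, 0))
    finally:
        sock.close()
```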

The command codes:

‘cmdCC_UNUSED’, 0
‘cmdCC_LOGIN’, 1 // login
‘cmdCC_BYE’, 2 // end of session
‘cmdCC_RAND’, 3 // random data for making hash with login+password
‘cmdCC_HASH’, 4 // hash data
‘cmdCC_PICTURE’, 5 // picture data, deprecated
‘cmdCC_TEXT’, 6 // text data, deprecated
‘cmdCC_OK’, 7 // ok
‘cmdCC_FAILED’, 8 // failed
‘cmdCC_OVERLOAD’, 9 // server overloaded
‘cmdCC_BALANCE’, 10 // zero balance
‘cmdCC_TIMEOUT’, 11 // time out occurred
‘cmdCC_PICTURE2’, 12 // picture data
‘cmdCC_PICTUREFL’, 13 // picture failure
‘cmdCC_TEXT2’, 14 // text data

Picture timeout codes:

‘ptoDEFAULT’, 0 // default timeout, server-specific
‘ptoLONG’, 1 // long timeout for picture, server-specific
‘pto30SEC’, 2 // 30 seconds timeout for picture
‘pto60SEC’, 3 // 60 seconds timeout for picture
‘pto90SEC’, 4 // 90 seconds timeout for picture

The default picture type:

‘ptUNSPECIFIED’, 0 // picture type unspecified
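If you are writing a client in Python, the three constant lists above map naturally onto `IntEnum` classes. The values are taken from the lists; the member names and the `command_name` helper are my own naming for this sketch, not part of the official client.

```python
from enum import IntEnum


class Command(IntEnum):
    UNUSED = 0
    LOGIN = 1
    BYE = 2
    RAND = 3
    HASH = 4
    PICTURE = 5      # deprecated
    TEXT = 6         # deprecated
    OK = 7
    FAILED = 8
    OVERLOAD = 9
    BALANCE = 10
    TIMEOUT = 11
    PICTURE2 = 12
    PICTUREFL = 13
    TEXT2 = 14


class PictureTimeout(IntEnum):
    DEFAULT = 0      # server-specific
    LONG = 1         # server-specific
    SEC_30 = 2
    SEC_60 = 3
    SEC_90 = 4


class PictureType(IntEnum):
    UNSPECIFIED = 0


def command_name(code):
    """Return a readable name for a command code, even an unknown one."""
    try:
        return Command(code).name
    except ValueError:
        return f"UNKNOWN({code})"
```

Because these are `IntEnum` members, they can be passed straight to `struct.pack` wherever an integer code is expected.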

Posted on Saturday, August 29th, 2009 at 2:17 pm under Rants.

One Comment

ruth Says:

well informed but i still can’t understand.. seems not easy to get the picture to type….

Copyright 2008, blackhat-seo.com