Mar 2, 2010

The daily /.

A few interesting stories from my daily /.-ing:

1. CAPTCHA troubles
"Ticketmaster used various means to try to thwart Wiseguy’s operation, at one point switching to a service called reCAPTCHA, which is also used by Facebook. It’s a third-party CAPTCHA that feeds a CAPTCHA challenge to a site’s visitors. When a customer tries to purchase tickets, Ticketmaster’s network sends a unique code to reCAPTCHA, which then transmits a CAPTCHA challenge to the customer.

But the perpetrators were able to thwart this as well. They wrote a script that impersonated users trying to access Facebook, and downloaded hundreds of thousands of possible CAPTCHA challenges from reCAPTCHA. They identified the file ID of each CAPTCHA challenge and created a database of CAPTCHA “answers” to correspond to each ID. The bot would then identify the file ID of a challenge at Ticketmaster and feed back the corresponding answer. The bot also mimicked human behavior by occasionally making mistakes in typing the answer, the authorities said."

After a chat with Aldwin on the topic, it seems this might be a serious flaw in CAPTCHA: a challenge can be mapped to its response simply by identifying the filename of the CAPTCHA image. After all, a CAPTCHA should let developers feed in arbitrary text that then gets rendered on the fly.
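To make the "rendered on the fly" idea concrete, here is a minimal sketch of what I have in mind. It is not how reCAPTCHA or Ticketmaster work; the function names and the in-memory store are made up for illustration. Each request gets a fresh random token that says nothing about the image behind it, and the token is valid only once.

```python
import secrets

# Hypothetical in-memory store mapping a one-time token to the expected answer.
# A real service would need expiry and shared storage; this is only a sketch.
pending = {}

def issue_challenge(text: str) -> str:
    """Register a challenge and return a random token unrelated to its content."""
    token = secrets.token_urlsafe(16)
    pending[token] = text
    return token

def verify(token: str, answer: str) -> bool:
    """Check an answer; the token is consumed whether or not the answer is right."""
    expected = pending.pop(token, None)
    if expected is None:
        return False
    return secrets.compare_digest(expected.encode(), answer.encode())

# The same text issued twice gets two unrelated tokens.
first = issue_challenge("morning overlooks")
second = issue_challenge("morning overlooks")
```

Since the token changes on every request, a database keyed on it is useless. The image itself would still have to be rendered differently each time, otherwise the content can be matched instead.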

The original article is here.

2. Nearly 60% of apps fail first security tests. Interesting number from Veracode; however, I wonder what phase of the SDLC those apps were in when they were tested. Although I agree with their argument that more work is required in educating developers, I must add that more tooling is also necessary (e.g. code annotations, code scanning when committing code to the repository, etc.) to enable developers to focus on the bigger security problems.

More on the topic here.

1 comment:

Gabriele Giuseppini said...

RE the Captcha issue: as far as I know, reCAPTCHA does not allow devs to specify their text - it's quite the opposite: reCAPTCHA is free because it shows users snippets of problematic digitizations of books, and uses the users' answers to improve the OCR capabilities of the digitization.

The issue described here is that there is a file ID (i.e. a different filename) for each image; however, even if reCAPTCHA used random GUIDs for the file names, you could still hash the image content and build the DB associating the hash with the answer...
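A rough sketch of that point, with made-up bytes standing in for an image and a made-up answer: renaming the file to a random GUID changes nothing, because the lookup key comes from the content rather than the name.

```python
import hashlib
import uuid

def content_key(image_bytes: bytes) -> str:
    """Derive a lookup key from the image content, ignoring the file name."""
    return hashlib.sha256(image_bytes).hexdigest()

# Placeholder bytes standing in for one CAPTCHA image (not a real image).
image = b"placeholder bytes standing in for one CAPTCHA image"

# The same image served twice under two random file names.
first_name = f"{uuid.uuid4()}.png"
second_name = f"{uuid.uuid4()}.png"

# A table built the first time the image was seen...
seen = {content_key(image): "example answer"}

# ...still matches the second time, whatever the file is called.
match = seen.get(content_key(image))
```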

The real problem is the (small) number of different captcha challenges. A possible solution would be for reCAPTCHA to randomly modify the images (e.g. adding noise, warping the images, etc.) so as to have a potentially infinite number of different captcha images.
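As a small sketch of the noise idea, assuming a grayscale image held as raw bytes (the helper names, the noise amplitude, and the fixed seed are illustrative choices, not anything reCAPTCHA is known to do): each served copy differs by a few intensity levels per pixel, so an exact content hash no longer repeats.

```python
import hashlib
import random

def add_noise(pixels: bytes, rng: random.Random, amplitude: int = 2) -> bytes:
    """Return a copy with each pixel shifted by a small random amount, clamped to 0..255."""
    noisy = bytearray(pixels)
    for i, value in enumerate(noisy):
        noisy[i] = max(0, min(255, value + rng.randint(-amplitude, amplitude)))
    return bytes(noisy)

def digest(pixels: bytes) -> str:
    """Exact-match fingerprint of the pixel data."""
    return hashlib.sha256(pixels).hexdigest()

# A flat gray 8x8 "image" used purely as stand-in data.
base = bytes([128] * 64)

# Seeded only so the example is repeatable; a server would not fix the seed.
rng = random.Random(0)
served_a = add_noise(base, rng)
served_b = add_noise(base, rng)

same_hash = digest(served_a) == digest(served_b)
```

This only defeats exact-match lookups. A similarity-based comparison could probably still pair up lightly perturbed copies, which is why stronger changes such as warping would likely be needed as well.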