Peck began with a general discussion of phishing and its relative importance in the web app security space today. He pointed out that while phishing is old news, and isn’t the latest and greatest threat to hit the headlines, it is still out there and still causes damage. He put up some stats showing that phishing is alive and well (especially targeting Indian firms, apparently), but that it constitutes only about 1% of overall cybercrime. And while the overall amount may have grown with time, there is a question of “diminishing returns”: how much effort is it worth spending to combat an issue of comparatively lesser impact?
It is unsurprising, then, that phishing detection remains largely unchanged since 2006: built on anti-spam technology (though not all spam is phishing), sender blacklists, and site reputation. But changes in the environment have made these older techniques less and less effective. For example, user mobility makes perimeter defenses such as an IPS impractical. Also, with the large turnover in domain names and the ease of registering new ones under the new top-level domains, blacklists and reputation databases are hard to keep up to date. And the new vector of social media is almost impossible to police.
To move forward with newer defenses, it is important to understand what makes phishing effective: the human factor. In Peck’s terms: humans are gullible, greedy, careless, and uninformed. To counter this problem, we should try to get the computer to see things the way we humans see things. One way to do this involves the use of perceptual hashing.
Perceptual hashing involves making a hash or “fingerprint” of images. Peck gave a brief overview of three hashes: the average hash, the discrete cosine transform hash (which uses methods similar to lossy compression to focus on salient detail), and the difference hash (very fast). Comparing hashes of two images (made with the same algorithm) uses the Hamming distance: the count of bit positions in which the two hashes differ.
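To make the idea concrete, here is a minimal sketch of the average hash and the Hamming distance comparison. This is an illustration, not the pHash library’s implementation: a real version would first resize the image to a small fixed grid (commonly 8×8 grayscale); here the image is simply a 2D list of 0–255 grayscale values.

```python
def average_hash(pixels):
    """Average hash ("aHash") sketch: one bit per pixel --
    1 if the pixel is brighter than the image's mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1, h2):
    """Count of bit positions in which two hashes differ."""
    return bin(h1 ^ h2).count("1")

# Two images that differ in a single pixel produce hashes that are
# only a bit or two apart, while unrelated images land far apart.
original = [[200, 200, 10, 10],
            [200, 200, 10, 10],
            [10, 10, 200, 200],
            [10, 10, 200, 200]]
tweaked = [row[:] for row in original]
tweaked[0][0] = 50  # small visual change

print(hamming_distance(average_hash(original), average_hash(tweaked)))
```

The key property is that, unlike a cryptographic hash, a small change to the input produces only a small change in the output, so “near duplicates” can be found by thresholding the distance.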
Phishing detection can utilize these hashes. A library of perceptual hashes of web pages is compiled with associated known good originators. Pages can also be broken down into discrete images that can be similarly hashed and cataloged. Then, when web pages or emails are encountered, those are hashed in the same way. If the hashes match or come close to those in the database, but the sender is different, a likely phishing attempt is flagged. Although I don’t think Peck described it as such, this is effectively a whitelist approach, making it much more maintainable than a list of constantly changing phishing sites.
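The matching step described above might be sketched as follows. The library contents, sender names, and distance threshold are all hypothetical placeholders of my own; the point is just the lookup logic: a page that looks like a known-good page but arrives from a different sender gets flagged.

```python
def hamming_distance(h1, h2):
    """Count of bit positions in which two hashes differ."""
    return bin(h1 ^ h2).count("1")

# Hypothetical library: perceptual hash of a page -> its legitimate sender.
KNOWN_GOOD = {
    0b1010110011010001: "bank.example.com",
}

# Maximum Hamming distance at which two pages "look the same";
# the value here is an arbitrary assumption for illustration.
THRESHOLD = 4

def looks_like_phishing(page_hash, sender):
    """Flag a page that visually matches a known page but has a
    different originator -- the core of the whitelist approach."""
    for good_hash, good_sender in KNOWN_GOOD.items():
        close = hamming_distance(page_hash, good_hash) <= THRESHOLD
        if close and sender != good_sender:
            return True
    return False

print(looks_like_phishing(0b1010110011010001, "evil.example.net"))  # lookalike, wrong sender
print(looks_like_phishing(0b1010110011010001, "bank.example.com"))  # the real page
```

Because the library only has to track legitimate pages, it changes slowly, which is what makes this more maintainable than chasing ever-changing phishing domains.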
I wonder how perceptual hashing could be used by NTOSpider, NTOBJECTives’ web application vulnerability scanner. Perhaps it could be added as a feature alongside the current malware detection. Of course, this would require either NTO maintaining its own list of hashes, or use of another database.
For a Barracuda Labs site that uses perceptual hashing, see http://www.threatglass.com/
For some perceptual hashing code, see http://www.phash.org/