Today I set up a new tool in our anti-spam system:
FuzzyOCR  (Dec 13, 2007: URL contains ads only now)
It’s OCR software that runs as a plugin for SpamAssassin.
OCR stands for “optical character recognition”, the process of recognizing characters and words in images. It’s quite useful for catching so-called “image spam”: mail whose text part looks harmless while the real message is hidden in images (inline GIFs, etc.).
The results are quite good and I’m confident  : )
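To give an idea of why the matching has to be “fuzzy”: OCR on distorted spam images rarely returns clean words, so an exact string comparison would miss most of them. The following is only a minimal Python sketch of approximate word matching with an edit distance. It is not FuzzyOCR’s actual code; the function names, the sample OCR text and the 25% error tolerance are all assumptions made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def find_spam_words(ocr_text, wordlist, max_ratio=0.25):
    """Return the wordlist entries that approximately occur in the OCR text.

    max_ratio is a hypothetical tolerance: the share of a word's
    characters that may be wrong and still count as a match.
    """
    tokens = ocr_text.lower().split()
    hits = []
    for word in wordlist:
        allowed = int(len(word) * max_ratio)
        if any(edit_distance(tok, word) <= allowed for tok in tokens):
            hits.append(word)
    return hits


# Made-up OCR output with typical recognition errors (1 for i, a dropped letter, ...)
ocr_output = "V1agra C1alis Xanaz cheap pharmcy online"
wordlist = ["viagra", "cialis", "xanax", "valium", "pharmacy"]
print(find_spam_words(ocr_output, wordlist))
# → ['viagra', 'cialis', 'xanax', 'pharmacy']
```

A tolerance that scales with word length keeps short words strict while still forgiving one or two misread characters in longer ones.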
In addition to the packages listed on the FuzzyOCR homepage, you’ll need one more piece of software (at least on openSuSE 10.0): giflib-progs-4.1.3-7.i586.rpm
Here’s an example that I’ve just received and that was correctly recognized as spam:
0.7 EXTRA_MPART_TYPE: Header has extraneous Content-type…
1.1 HTML_20_30             BODY: Message is 20% to 30% HTML
0.0 HTML_MESSAGE           BODY: HTML included in message
0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
0.8 SARE_GIF_ATTACH        FULL: Email has a inline gif
0.7 MY_CID_AND_STYLE       SARE: cid and style
 8.0 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
Words found:
“viagra” in 1 lines
“cialis” in 1 lines
“xanax” in 1 lines
“valium” in 1 lines
“pharmacy” in 1 lines
 (5 word occurrences found)
—
From: Antonia [mailto:ademakerzcxl@xxxx.xxx]
Sent: Wednesday, February 26, 2007 9:04 PM
To: xxxx@xxxx.xxx
Subject: *****SPAM***** How’s It Going
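To see how much the FUZZY_OCR rule contributed, here is a small Python sketch that sums the scores from the report above. It assumes the report lists every rule that fired, and it uses SpamAssassin’s default threshold of 5.0 (required_score); the threshold actually configured on our system may differ.

```python
import re

# Scores and rule names copied from the report above (descriptions omitted)
report = """\
0.7 EXTRA_MPART_TYPE
1.1 HTML_20_30
0.0 HTML_MESSAGE
0.0 BAYES_50
0.8 SARE_GIF_ATTACH
0.7 MY_CID_AND_STYLE
8.0 FUZZY_OCR
"""

REQUIRED_SCORE = 5.0  # assumption: SpamAssassin's default required_score


def parse_scores(text):
    """Map each rule name to its score, one 'score RULE_NAME' pair per line."""
    scores = {}
    for line in text.splitlines():
        m = re.match(r"\s*(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)", line)
        if m:
            scores[m.group(2)] = float(m.group(1))
    return scores


scores = parse_scores(report)
total = round(sum(scores.values()), 1)
without_ocr = round(total - scores["FUZZY_OCR"], 1)

print(f"total: {total}, without FUZZY_OCR: {without_ocr}")
# → total: 11.3, without FUZZY_OCR: 3.3
```

Under these assumptions the other rules alone add up to 3.3, which would stay below the threshold, so it is the OCR hit that pushes this message over the line.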
