[atlarge-discuss] Re: FC: Hexamail's Finn Johansen on how to filter naughty words
Declan, Finn and all,
Great stuff here, Finn. Well done. Thank you, Declan, yet
again for passing this along as well.
I can think of a number of folks who would gain a good
basic education from this, Finn. I am "CC'ing" one forum
I participate in that has a number of subscribers who could
learn quite a bit from this approach to filtering.
Declan McCullagh wrote:
> Previous Politech message:
> http://www.politechbot.com/p-04831.html
>
> ---
>
> From: "Finn Johansen" <finnj@hexamail.com>
> To: <declan@well.com>
> References: <1055311145.2fe7328.finna.net@[216.110.36.217]>
> Subject: Re: Interscan blocks musician's email due to use of "whore"
> Date: Thu, 12 Jun 2003 11:39:01 +0200
>
> Declan,
>
> I usually don't write this type of email, as readers may consider it
> spam. However, the problem described is very interesting and shows
> the lack of intelligence in various spam filtering solutions.
>
> Blocking emails on the basis of single terms in the email content is rather
> pointless. It may sound amusing in the situation below, but it is certainly
> not amusing to Linda or her contacts. It is, as Thomas also says, a bit
> scary. Leaving critical business correspondence to this type of content
> evaluation is a bit like gambling. If you're lucky, the information passes
> through to the recipient; if not, it may just "disappear" somewhere
> without anyone knowing where it is.
>
> New spam filtering solutions are emerging almost every day. But only a
> minority of these are able to use a contextual approach in evaluating
> emails. Even though reports show that the global ratio of spam reached
> the 50% mark in May 2003, there are still millions of legitimate emails
> passing among servers every day. Relying on solutions that analyze
> emails by single terms will certainly block a large number of these
> legitimate emails and leave behind frustrated people like Linda, whose
> business information is not delivered correctly.
>
> The only way to overcome the limitations of keyword inspection of emails
> is to analyze the content of the email in context. Words like f*ck have a
> pattern that is understandable to humans, but not to keyword searches unless
> they are explicitly told about it. Statistical pattern matching
> technology can 'understand' such a pattern as either good or bad given
> the patterns surrounding it. Using this technique, new patterns from
> spammers can be caught, as they are usually found together with other
> patterns that are already known to the system. The statistical approach
> cannot catch 100% of spam emails without also producing some false
> positives. However, our tests show that by accepting a block ratio of 96%,
> you end up with 0.01% false positives. Pretty good figures. And best of
> all - it doesn't block emails like this one containing single 'bad' terms
> scattered around the document.
>
> More reading on the method we use can be found in Gary Robinson's
> excellent article on spam filtering:
> http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
>
> Regards,
>
> Finn Johansen
> CEO
> Hexamail Ltd.
>
> Email: finnj@hexamail.com
> http://www.hexamail.com/
>
> -------------------------------------------------------------------------
> POLITECH -- Declan McCullagh's politics and technology mailing list
> You may redistribute this message freely if you include this notice.
> -------------------------------------------------------------------------
> To subscribe to Politech: http://www.politechbot.com/info/subscribe.html
> This message is archived at http://www.politechbot.com/
> Declan McCullagh's photographs are at http://www.mccullagh.org/
> Like Politech? Make a donation here: http://www.politechbot.com/donate/
> -------------------------------------------------------------------------
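
For those on the forum who want to see the difference Finn describes
between single-term blocking and statistical scoring, here is a minimal
Python sketch. It is not Hexamail's implementation. The per-token
smoothing and the geometric-mean combining step are loosely modeled on the
scheme in Gary Robinson's article linked above, and the blocklist, the
six training messages, the 0.5 cutoff, and all function and class names are
hypothetical, made up only for illustration.

```python
import math
import re
from collections import Counter

BAD_WORDS = {"whore"}  # hypothetical single-term blocklist


def tokenize(text):
    # Lowercase tokens; '*' is kept so that "f*ck" stays a single token.
    return re.findall(r"[a-z0-9*']+", text.lower())


def keyword_filter(text):
    """Return True (block) if any blocklisted term appears, ignoring context."""
    return any(tok in BAD_WORDS for tok in tokenize(text))


class StatisticalFilter:
    """Toy scorer: smoothed per-token spam probabilities, geometric-mean combining."""

    def __init__(self, s=1.0, x=0.5):
        self.s = s  # weight given to the prior (assumed value)
        self.x = x  # assumed spam probability for a never-seen token
        self.spam_docs = Counter()  # token -> number of spam messages containing it
        self.ham_docs = Counter()   # token -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_docs.update(tokens)
            self.n_spam += 1
        else:
            self.ham_docs.update(tokens)
            self.n_ham += 1

    def token_prob(self, tok):
        b = self.spam_docs[tok] / max(self.n_spam, 1)
        g = self.ham_docs[tok] / max(self.n_ham, 1)
        n = self.spam_docs[tok] + self.ham_docs[tok]
        p = b / (b + g) if (b + g) > 0 else self.x
        # Pull p toward the prior x when evidence is thin; result stays strictly in (0, 1).
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, text):
        """Return a value in (0, 1); higher means more spam-like."""
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        if not probs:
            return 0.5
        n = len(probs)
        spamminess = 1 - math.exp(sum(math.log(1 - f) for f in probs) / n)
        hamminess = 1 - math.exp(sum(math.log(f) for f in probs) / n)
        return (1 + (spamminess - hamminess) / (spamminess + hamminess)) / 2


# Tiny made-up training set, only to exercise the code.
flt = StatisticalFilter()
for msg in ("hot whore pics click here buy cheap pills now",
            "buy cheap pills now click here free offer",
            "free hot pics click now"):
    flt.train(msg, is_spam=True)
for msg in ("please find the tour dates and invoice attached",
            "meeting notes attached please review the invoice",
            "the band will play new songs on the tour"):
    flt.train(msg, is_spam=False)

legit = "please review the attached tour dates for the whore of babylon show"
spam = "buy cheap pills now hot whore pics"

for label, msg in (("legit", legit), ("spam", spam)):
    print(label, "keyword blocks:", keyword_filter(msg),
          "statistical score:", round(flt.score(msg), 3))
```

The keyword filter blocks both messages because each contains the
blocklisted term. The statistical scorer rates the legitimate message
below the 0.5 midpoint, since the surrounding tokens were seen only in
legitimate mail, and rates the spam message above it. With a training
set this small the exact scores mean little; the point is only that the
surrounding patterns, not one term, decide the outcome.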
Regards,
--
Jeffrey A. Williams
Spokesman for INEGroup LLA. - (Over 131k members/stakeholders strong!)
"Be precise in the use of words and expect precision from others" -
Pierre Abelard
===============================================================
CEO/DIR. Internet Network Eng. SR. Eng. Network data security
Information Network Eng. Group. INEG. INC.
E-Mail jwkckid1@ix.netcom.com
Contact Number: 214-244-4827 or 214-244-3801
---------------------------------------------------------------------
To unsubscribe, e-mail: atlarge-discuss-unsubscribe@lists.fitug.de
For additional commands, e-mail: atlarge-discuss-help@lists.fitug.de