Wednesday, March 19, 2003

R*d* W**ds


This morning, a client asked whether we could provide an obscenity filter for a chat forum. You know, one of those software routines that detects typos in words like spit, duck and count and renders the results as @#%! or something similar. This isn't a difficult programming task but it does raise some interesting issues.

From a technical point of view, there's not much to an obscenity filter - all it has to do is search text input for specific sequences of characters and replace them with whatever sequence of dingbats you're using to indicate that the user has entered a dysphemism. In Perl, for example, you can do the job with a single line of code:

$nice_text =~ s/$nasty_word/\@\#\%\!/gi;


The interesting part is identifying all the possible values of the variable $nasty_word. Sticking with the example of Perl code, do you use a constant regular expression hard-coded into your script, or do you use an external data file where users can add new obscenities as they identify them? If you take the first approach, you're stuck with breaking the first rule of polite programming, which is never to include offensive material in your source code. If you take the second, you're going to end up with an external text file or database table of obscene epithets which, if your organisation is infested with Mrs Grundy types, will have to be kept a closely guarded (and undocumented) secret within the IT department, lest someone get the wrong idea about the general character of the technical staff.
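For what it's worth, the external-file approach might be sketched roughly as follows. The helper names (build_filter, load_words, censor), the one-entry-per-line format with # comments, and the stand-in words darn and heck are all my own assumptions for illustration, and an in-memory filehandle takes the place of a real data file:

```perl
use strict;
use warnings;

# Hypothetical helper: build one case-insensitive pattern from a word list.
# quotemeta() escapes regex metacharacters so entries are matched literally.
sub build_filter {
    my @words = grep { length } @_;
    return undef unless @words;
    # Longest entries first, so a longer entry wins over a shorter prefix.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @words;
    return qr/(?:$alternation)/i;
}

# Hypothetical helper: read one entry per line, skipping blanks and # comments.
sub load_words {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '' or $line =~ /^#/;
        push @words, $line;
    }
    return @words;
}

# Replace every match with the usual string of dingbats.
sub censor {
    my ($text, $filter) = @_;
    $text =~ s/$filter/\@\#\%\!/g if defined $filter;
    return $text;
}

# Mild stand-ins keep the example itself polite. A real script would open
# the word list file instead of this in-memory string.
my $list = "# sample list\ndarn\nheck\n";
open(my $fh, '<', \$list) or die "cannot open in-memory list: $!";
my $filter = build_filter(load_words($fh));
close $fh;

print censor("Well, DARN it all to heck.", $filter), "\n";
```

With the sample list above, this prints "Well, @#%! it all to @#%!." and the source code itself stays free of anything Mrs Grundy could object to.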

In the end, writing a successful obscenity filter boils down to human and organisational factors: you need programmers who are able to swear like an old matelot with Korsakoff's syndrome. And with imagination. Otherwise, the filter is going to produce output which includes strings like mother@#%!er, or the somewhat ambiguous @#%!-licker, which is either Mark Latham's favourite synonym for Prime Minister or something other than Mark Latham's favourite synonym for Prime Minister.
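The half-masked compound is easy to reproduce. Here is a small sketch, with duck as a polite stand-in for a listed word and two subroutine names (censor_substring, censor_token) that are purely my own invention. The second version swallows the whole surrounding word, which is one possible workaround rather than a cure:

```perl
use strict;
use warnings;

# "duck" is a polite stand-in for a listed word.
my $word = quotemeta 'duck';

# Substring match: masks only the listed letters inside a longer word.
sub censor_substring {
    my ($text) = @_;
    $text =~ s/$word/\@\#\%\!/gi;
    return $text;
}

# Whole-token match: \w* on both sides swallows the rest of the word.
sub censor_token {
    my ($text) = @_;
    $text =~ s/\w*$word\w*/\@\#\%\!/gi;
    return $text;
}

print censor_substring("you motherducker"), "\n";  # you mother@#%!er
print censor_token("you motherducker"), "\n";      # you @#%!
print censor_token("a fluffy duckling"), "\n";     # a fluffy @#%!
```

The last line shows the price of the workaround: innocent words that happen to contain a listed sequence get masked too, so somebody still has to sit down and think about the word list.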

I suspect that somewhere in the English-speaking world there's a programmer who has written the ultimate obscenity filtering routine. Of course, we'll never see it in commercial use - I don't think you'll have any trouble working out why.
