5.4.1 ORF Online Help

Charset Blacklist


This help section describes the Charset Blacklist test and the related settings available under the Blacklists > Charset Blacklist page in the navigation.

General Information

The language of an email is often indicated by its declared "character set" or "charset", which may be used to detect whether the email was written in a specific language/script. This feature of ORF can blacklist unwanted emails written in languages you do not normally use.

Enabling or Disabling the Charset Blacklist

You can enable or disable the use of the Charset Blacklist on the Filtering > Tests page in the navigation.

Using the Charset Blacklist

Adding, modifying and deleting character sets

Click the New button to add a new charset to the list. To modify an existing charset, select it and click Modify or press Enter. To delete a charset, select it and click the Delete button or press the Delete key.

Sorting the Charset list

Click the column header of any column by which you wish to sort the charset list. To reverse sorting, click the column header again.

Exporting and importing the Charset list

Right-click the charset list and select "Import List..." or "Export List...". Alternatively, select File > Import > Charset Blacklist or File > Export > Charset Blacklist from the menu.

Notes

Character set search scope

ORF searches the email subject and all body parts for character set declarations. If any of the collected charsets matches a charset on the Charset Blacklist, the email will be blacklisted.
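The search scope described above can be approximated outside ORF with Python's standard email package. This is only an illustrative sketch, not ORF's actual implementation: the function names collect_charsets and is_blacklisted, the sample message, and the case-insensitive comparison are all assumptions made for the example.

```python
from email import message_from_string
from email.header import decode_header

# Hypothetical sample message: an encoded subject and two body parts.
SAMPLE = (
    "Subject: =?iso-8859-2?Q?P=F8=EDklad?=\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=koi8-r\n"
    "\n"
    "plain text part\n"
    "--b1\n"
    "Content-Type: text/html; charset=windows-1251\n"
    "\n"
    "<p>html part</p>\n"
    "--b1--\n"
)


def collect_charsets(raw_message):
    """Collect lowercase charset names from the subject and all body parts."""
    msg = message_from_string(raw_message)
    found = set()

    # Subject: each encoded word carries its own charset name.
    for _text, charset in decode_header(msg.get("Subject", "")):
        if charset:
            found.add(charset.lower())

    # Body parts: a part may declare a charset in its Content-Type header.
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            found.add(charset.lower())
    return found


def is_blacklisted(raw_message, blacklist):
    """Return True if any collected charset appears on the blacklist."""
    # Assumption: names are compared case-insensitively.
    return bool(collect_charsets(raw_message) & {c.lower() for c in blacklist})


print(sorted(collect_charsets(SAMPLE)))
print(is_blacklisted(SAMPLE, ["KOI8-R"]))
```

In this sketch a single match is enough to flag the message, which mirrors the "any of the collected charsets" rule stated above.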

Limitations

Charset-based blacklisting has two major limitations.

  • Use of Unicode: Unicode is a common international standard that removes the need for separate character sets by representing all major languages/scripts in a single standard. When an email is written in Unicode (typically with UTF-8 encoding), its language cannot be determined by charset detection.
  • Latin subset in various charsets: Most character sets also cover the Latin alphabet. As a result, when you blacklist, for example, the Central/Eastern European (iso-8859-2) charset, you may also blacklist emails written in English but sent by a person whose email client normally uses iso-8859-2. It is recommended to blacklist a charset only if you do not expect to receive emails from countries where that charset is used.
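Both limitations can be seen at the byte level. The following Python sketch is an illustration only; the helper name encodes_identically and the sample strings are assumptions chosen for the example.

```python
def encodes_identically(text, charsets):
    """Return True if text yields the same bytes in every listed charset."""
    return len({text.encode(cs) for cs in charsets}) == 1


english = "Please see the attached invoice."
polish = "Zażółć gęślą jaźń"

# Latin subset: plain English text is byte-for-byte identical in these
# charsets, so an iso-8859-2 label does not prove the text is not English.
print(encodes_identically(english, ["ascii", "iso-8859-2", "utf-8"]))

# Text that uses non-ASCII letters does differ between charsets.
print(encodes_identically(polish, ["iso-8859-2", "utf-8"]))

# Unicode: one charset name (utf-8) can carry many unrelated scripts,
# so the label alone reveals nothing about the language.
utf8_samples = {"Russian": "Привет", "Greek": "Γειά σου", "Japanese": "こんにちは"}
for language, text in utf8_samples.items():
    print(language, len(text.encode("utf-8")), "bytes as utf-8")
```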

Default list

The default list was compiled from Vamsoft spam samples and includes character sets we see most often in our foreign language spam. By default, no character set is enabled. This is because the list has to be customized for your location and email profile.

Finding charsets

Character sets are declared in the email header, email part headers and the subject. In Microsoft® Outlook®, you can view this information under the View > Options menu (see Internet Headers). In the email header and part headers, the charset declaration typically looks like this:

Content-Type: text/plain; charset=iso-8859-2

The value after "charset=" (iso-8859-2 in this example) is the character set name. In subjects, the charset is encoded as:

Subject: =?iso-8859-2?B?ASDFGH?=

Again, the character set name is the part between the opening "=?" and the next "?" (iso-8859-2 in this example).
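When reviewing headers copied from several sample emails, a rough tally of the declared charsets can help decide which names to consider. The Python sketch below is one possible approach and makes several assumptions: the two regular expressions are simplified patterns rather than a full MIME parser, and the function name charset_names and the pasted sample are hypothetical.

```python
import re
from collections import Counter

# charset=... parameter in Content-Type headers (optionally quoted).
PARAM_RE = re.compile(r'charset\s*=\s*"?([^\s";]+)', re.IGNORECASE)
# =?charset?B?...?= or =?charset?Q?...?= encoded words in subjects.
ENCODED_WORD_RE = re.compile(r'=\?([^?\s]+)\?[bq]\?', re.IGNORECASE)


def charset_names(header_text):
    """Return a Counter of lowercase charset names seen in pasted header text."""
    names = PARAM_RE.findall(header_text) + ENCODED_WORD_RE.findall(header_text)
    return Counter(name.lower() for name in names)


# Hypothetical header lines pasted from a few sample emails.
pasted_headers = """\
Content-Type: text/plain; charset=iso-8859-2
Subject: =?iso-8859-2?B?dGVzdA==?=
Content-Type: text/html; charset="KOI8-R"
"""

for name, count in charset_names(pasted_headers).most_common():
    print(name, count)
```

Names are lowercased before counting so that differently cased spellings of the same charset are grouped together.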

Before blacklisting a charset, it is recommended to check which languages/scripts it covers. Note that blacklisting the "utf-8" charset is not recommended under any circumstances, even when it is seen in foreign language spam. See the Limitations section for more information.

Copyright © Vamsoft Ltd. 2024. All rights reserved. Document ID adm-oa-charsetblacklist, version 1.