MailShell SpamCatcher Plugin for CommuniGate Pro
SpamCatcher Plugin Overview
The MailShell SpamCatcher Plugin runs as an External Filter and calculates a spam "score" for
each message being processed. Unlike tools with statically defined spam patterns, the
MailShell SpamCatcher Plugin dynamically retrieves new patterns from the SpamCatcher Network, thus
providing greater accuracy for new spam messages.
The score ranges from 0 to 100; the higher the score, the more
likely the message is spam. The score is added to the message headers so it can be
processed by Server-Wide, Domain-Wide and Account E-Mail Processing Rules.
By default the added header lines look like this:
X-Junk-Score: 87 [XXXX]
(26%) MESSAGE-ID: INVALID OUTLOOK MESSAGE-ID
(24%) MESSAGE-ID: contains a spam tool Message-ID (12-zeroes variant)
(50%) DATE: contains an invalid date header (timezone does not exist)
X-Alert: possible spam!
X-Color: red
Besides the digital score value, the header field contains a "bar score" to simplify automated message processing:
the more 'X' characters, the higher the score. By default, digital scores map to bar scores as follows:
| Digital score range | Bar score |
| 0 | [] |
| 1-39 | [X] |
| 40-76 | [XX] |
| 77-84 | [XXX] |
| 85-89 | [XXXX] |
| 90-99 | [XXXXX] |
| 100 | [XXXXXX] |
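You can mirror the mapping in the table above in a script, for example to post-process stored messages outside CommuniGate Pro. The following minimal Python sketch assumes the default `Header` setting. The function names `bar_score` and `parse_junk_score` are hypothetical. The sketch reads only the first `X-Junk-Score` line and ignores any reason lines that follow it.

```python
import re

# Hypothetical helpers, not part of the Plugin.
# Default ranges from the table: (lowest score, highest score, number of X's).
BAR_RANGES = [
    (0, 0, 0),
    (1, 39, 1),
    (40, 76, 2),
    (77, 84, 3),
    (85, 89, 4),
    (90, 99, 5),
    (100, 100, 6),
]

# Matches the first header line produced by the default Header setting,
# e.g. "X-Junk-Score: 87 [XXXX]".
JUNK_SCORE_RE = re.compile(r"^X-Junk-Score:\s*(\d+)\s*\[(X*)\]")


def bar_score(score: int) -> str:
    """Return the default bar score for a digital score, e.g. "[XXXX]" for 87."""
    for low, high, count in BAR_RANGES:
        if low <= score <= high:
            return "[" + "X" * count + "]"
    raise ValueError(f"score must be between 0 and 100, got {score}")


def parse_junk_score(line: str):
    """Return (digital score, number of X's), or None if the line does not match."""
    match = JUNK_SCORE_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), len(match.group(2))


if __name__ == "__main__":
    print(bar_score(87))                                # [XXXX]
    print(parse_junk_score("X-Junk-Score: 87 [XXXX]"))  # (87, 4)
```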
Every day at midnight the Plugin generates a report message about
the number of messages processed and their spam scores. By default the report message is mailed
to the postmaster address of the CommuniGate Pro Main Domain.
Note: The MailShell SpamCatcher Plugin is available only for some of the platforms
supported by the CommuniGate Pro server software. Before you order the SpamCatcher Plugin License,
make sure the plugin is available for your CommuniGate Pro Server platform.
Note: The MailShell SpamCatcher Plugin requires CommuniGate Pro version 5.2.3 or later.
Download the SpamCatcher Plugins
MailShell SpamCatcher plugins are available for certain platforms only.
| Operating System | CPU |
| Sun Solaris | Sparc |
| Sun Solaris | x86 |
| Linux (RedHat, SuSE, Debian) | x86 |
| Linux (RedHat, SuSE, Debian) | x86_64 |
| FreeBSD 6.2+ | x86 |
| FreeBSD 6.2+ | x86_64 |
| FreeBSD 7.0+ | x86 |
| Microsoft Windows NT/2000/XP/2003/Vista, Microsoft Windows 95/98/ME | x86 |
| Apple MacOS X (Darwin) version 10.4.x and greater | UB (Intel+PowerPC) |
The current version of the Plugin is 3.1.
Installing on a Unix System.
Installing on a MS Windows System.
- Change the current directory to the CommuniGate Pro base directory.
- Download the CGPSpamCatcher-Win32-Intel.zip Plugin archive file.
- Unpack the Plugin archive with any "unzip" tool.
pkunzip CGPSpamCatcher-*.zip
The CGPSpamCatcher directory will be created inside the base directory.
- Proceed with Testing the SpamCatcher Plugin.
Installing on a MacOS X (Darwin) System.
- Make sure you're running MacOS X version 10.2.x or later.
- Log in as a super-user (root).
- Unpack the CGPSpamCatcher-Darwin-UB.tgz archive using any uncompressing utility,
or start the Terminal application and use the shell tar command:
tar xzpf CGPSpamCatcher-Darwin-UB.tgz
The CGPSpamCatcher.pkg package directory will be created in the current directory.
- Install the software by double-clicking the CGPSpamCatcher.pkg icon.
The plugin software will be installed in the /var/CommuniGate/CGPSpamCatcher/
directory.
Note: If you're upgrading, make sure the old copy of the plugin
(the /var/CommuniGate/CGPSpamCatcher/CGPSpamCatcher application) is not running.
Stop it using the CommuniGate Pro WebAdmin Interface.
- Proceed with Testing the SpamCatcher Plugin.
Note: Alternatively, you can install the package from the Terminal application using this command:
installer -pkg ./CGPSpamCatcher.pkg -target /
Updating the SpamCatcher Plugin.
When updating the SpamCatcher Plugin to a newer version, do the following steps:
- If you made any changes to the default configuration, save the contents of the following files:
- CGPSpamCatcher.cfg
- data/spamcatcher.conf
- data/approvedsenders
- data/blockedsenders
- Stop the current copy of the plugin application via the CommuniGate Pro WebAdmin Interface.
- Delete all files from the CGPSpamCatcher directory.
- Install the new version of the Plugin as described above in this document.
- Revise the new CGPSpamCatcher.cfg, data/spamcatcher.conf and data/{approved|blocked}senders files and re-apply the necessary changes.
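You can script the step of saving these files before you delete the directory. The following minimal Python sketch copies whichever of the four files exist into a backup directory and keeps their relative paths. The function name `backup_config` and the directory locations in the usage note are assumptions. They are not part of the Plugin distribution.

```python
import shutil
from pathlib import Path

# Hypothetical helper, not part of the Plugin distribution.
# Files from the list above, relative to the plugin directory.
CONFIG_FILES = [
    "CGPSpamCatcher.cfg",
    "data/spamcatcher.conf",
    "data/approvedsenders",
    "data/blockedsenders",
]


def backup_config(plugin_dir: Path, backup_dir: Path) -> list:
    """Copy the existing configuration files to backup_dir, keeping relative paths.

    Returns the relative paths that were actually copied.
    """
    copied = []
    for rel in CONFIG_FILES:
        src = plugin_dir / rel
        if not src.is_file():
            continue  # this file does not exist in the installation
        dst = backup_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel)
    return copied
```

For example, `backup_config(Path("/var/CommuniGate/CGPSpamCatcher"), Path("/var/CommuniGate/CGPSpamCatcher-backup"))` returns the list of files it copied. These paths are only an illustration and depend on your installation.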
Testing the SpamCatcher Plugin.
On a Unix System:
- Change the current directory to the CommuniGate Pro base directory:
cd /var/CommuniGate
- Launch the CGPSpamCatcher application from its directory:
CGPSpamCatcher/CGPSpamCatcher
Important: Launch it exactly as shown above. Do not launch it from inside the CGPSpamCatcher directory as ./CGPSpamCatcher.
It should print out the following lines:
* CGPSpamCatcher plugin version n.m platform-processor build date started
* CGPSpamCatcher Engine initializing, please wait...
* CGPSpamCatcher engine x.y.z initialized
* Last Rules update time: date time
Note: If the extended_rules option is enabled,
initialization can take up to several minutes.
- Type:
1 FILE CGPSpamCatcher/test.msg
The plugin should answer with ADDHEADER followed by a message header line containing the spam score.
- Quit CGPSpamCatcher by pressing Ctrl-D.
Note: The distribution for some platforms contains the statically linked CGPSpamCatcher-Static executable.
If the regular CGPSpamCatcher executable doesn't work on your OS,
try CGPSpamCatcher-Static instead.
On a MS Windows System:
- Change the current directory to the CommuniGate Pro base directory:
cd "C:\CommuniGate Files"
- Launch the CGPSpamCatcher.exe application from its directory:
CGPSpamCatcher\CGPSpamCatcher.exe
Important: Launch it exactly as shown above. Do not launch it from inside the CGPSpamCatcher directory as CGPSpamCatcher.exe.
It should print out the following lines:
* CGPSpamCatcher plugin version n.m Win32-Intel build date started
* CGPSpamCatcher Engine initializing, please wait...
* CGPSpamCatcher engine x.y.z initialized
* Last Rules update time: date time
Note: If the extended_rules option is enabled,
initialization can take up to several minutes.
- Type:
1 FILE CGPSpamCatcher\test.msg
The plugin should answer with ADDHEADER followed by a message header line containing the spam score.
- Quit CGPSpamCatcher.exe by pressing Ctrl-Z and then Enter.
Integrating the SpamCatcher Plugin with CommuniGate Pro.
Step #1: Create the Helper
Please check the External Filters section of the CommuniGate Pro manual.
Open the General page in the Settings section of the WebAdmin Interface and click the Helpers link.
Create a Helper for the SpamCatcher Plugin:
Note: For a MS Windows system the Program Path should be
CGPSpamCatcher\CGPSpamCatcher.exe, with the backslash "\" as the path separator.
Note: For a FreeBSD system you may need to specify the absolute path to the plugin, e.g. /var/CommuniGate/CGPSpamCatcher/CGPSpamCatcher.
Step #2: Create the Scanning Rule
To invoke the SpamCatcher Helper, create a Server-Wide Rule
with the "ExternalFilter SpamCatcher" action. The Scanning Rule applies SpamCatcher to the
message, and the spam score is added to the message header.
Note: It must be a Server-Wide Rule, not Domain-Wide or Account-level.
The recommended Scanning Rule is as follows:
This rule skips messages from the MAILER-DAEMON address (such as non-delivery reports,
return receipts, etc.), skips messages from Client IP Addresses and from authenticated senders,
and includes only messages for local accounts and mailing lists.
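The Rule itself is configured in WebAdmin, so the following Python sketch only restates its conditions as a predicate to make the order of the checks explicit. The `Envelope` class and its fields are hypothetical stand-ins for the Rule conditions. They are not CommuniGate Pro names. This sketch also assumes that an empty envelope sender should be treated like a MAILER-DAEMON address.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Hypothetical summary of the facts the Scanning Rule looks at."""
    return_path: str        # envelope sender; "" is assumed for system messages
    from_client_ip: bool    # sender's IP is in the Client IP Addresses list
    authenticated: bool     # sender has authenticated
    local_only: bool        # recipients are local accounts or mailing lists


def should_scan(env: Envelope) -> bool:
    """Return True if the recommended Scanning Rule would pass the message to SpamCatcher."""
    sender = env.return_path.strip().lower()
    # 1. Skip MAILER-DAEMON messages (non-delivery reports, return receipts, ...).
    if sender == "" or sender.split("@")[0] == "mailer-daemon":
        return False
    # 2. Skip messages from Client IP Addresses and from authenticated senders.
    if env.from_client_ip or env.authenticated:
        return False
    # 3. Rate only messages addressed to local accounts and mailing lists.
    return env.local_only
```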
Note: The SpamCatcher License
limits the number of messages the Plugin can scan within any 60-minute
period. If the E-mail traffic exceeds the licensed limit, the Plugin (with the default OnLicenseLimitReached=Pass setting) lets the
messages go through unrated. Without a license you can rate up to 5 messages per hour.
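One way to picture the licensed limit is as a counter over a sliding 60-minute window. The following Python sketch models that idea with a hypothetical `LicenseWindow` class and the unlicensed limit of 5 messages per hour. This document does not describe how the Plugin actually counts messages internally. Treat the sketch as an illustration of the "Pass" behaviour only.

```python
from collections import deque


class LicenseWindow:
    """Hypothetical model of "N messages in any 60-minute period".

    Assumption: a sliding window over message timestamps (in seconds);
    this is not the Plugin's documented implementation.
    """

    def __init__(self, limit: int, window_seconds: int = 3600):
        self.limit = limit
        self.window = window_seconds
        self.stamps = deque()  # arrival times of messages rated in the window

    def try_rate(self, now: float) -> bool:
        """Return True if a message arriving at `now` may be rated,
        False if it should pass through unrated."""
        # Forget messages that have left the 60-minute window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True


if __name__ == "__main__":
    unlicensed = LicenseWindow(limit=5)
    print([unlicensed.try_rate(t) for t in range(7)])
    # [True, True, True, True, True, False, False]
    print(unlicensed.try_rate(3600))  # True: the message rated at t=0 has expired
```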
Step #3: Dealing with the Rated Messages
SpamCatcher by itself doesn't block spam, it only assigns a spam score to the messages.
To actually block spam you need to create yet another Rule which blocks messages according to their spam score.
Several scenarios are possible:
Scenario #1: suitable for small companies where you can assign one person (e.g. the postmaster) to
look through the spam messages daily to check for false positives, and redirect any false positives
found to the appropriate persons.
Create a Server-Wide Rule with the following contents:
This Rule moves the incoming messages with score 90 and greater to the "spam_box" mailbox
of the postmaster@domain.com account.
The "Discard" action is required to prevent the message from going to the
initially intended destination (INBOX mailbox).
Note: The priority of this Server-Wide Rule must be lower than the priority of the Scanning Rule.
Scenario #2: suitable for large companies and ISPs. Let users deal with spam on their own.
Create one Domain-Wide Rule, or an Account-level Rule in each account, with the following contents:
This Rule moves the incoming messages with score 90 and greater to the "Junk" mailbox
of the original recipient account. The users should regularly check and purge their "Junk" mailboxes.
Because the Rule checks the X-Junk-Score header, different users can set different spam thresholds
by matching a different number of X's. You can still check for the presence of the
X-Alert: possible spam! header as in the first example, but then you are tied to
the administrator's choice of the threshold value.
The "Discard" action is required to prevent the message from going to the
initially intended destination (INBOX mailbox).
Note: In the example above, the "*" in "[XXXXX*" is necessary to match all messages
with 5 or more X's. Without it, the rule will match only messages with exactly 5 X's.
The "Junk" mailbox from the above example must exist in every account in the domain. Otherwise the Rule will fail
and the message will be delivered into the user's INBOX.
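You can reproduce the effect of the trailing "*" with a small wildcard matcher. In the matcher, "*" stands for any substring and every other character, including "[", is literal. The following Python sketch implements that assumed semantics. It is not the Rules engine itself. Python's own `fnmatch` module is avoided here because it would treat "[" as the start of a character class. The function name `wildcard_match` is hypothetical.

```python
def wildcard_match(pattern: str, text: str) -> bool:
    """Hypothetical matcher: '*' matches any (possibly empty) substring,
    every other character must match literally."""
    parts = pattern.split("*")
    if len(parts) == 1:
        return pattern == text  # no wildcard: exact comparison
    # The first and last pieces are anchored to the start and end of the text.
    if not text.startswith(parts[0]) or not text.endswith(parts[-1]):
        return False
    pos = len(parts[0])
    end = len(text) - len(parts[-1])
    if pos > end:
        return False  # anchored pieces would overlap
    # Find each middle piece, in order, between the anchored pieces.
    for middle in parts[1:-1]:
        idx = text.find(middle, pos, end)
        if idx < 0:
            return False
        pos = idx + len(middle)
    return True
```

Under these assumptions, "*[XXXXX*" accepts both the five-X and the six-X bar score, while "*[XXXXX]" accepts only the five-X one.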
Scenario #3: suitable for large companies and ISPs for users who don't have access to
mailboxes other than INBOX, e.g. POP3 users.
Create one Domain-Wide Rule, or an Account-level Rule in each account, with the following contents:
This Rule marks the subjects of spam messages with the [SPAM] prefix.
Note: This scenario is only possible with CommuniGate Pro 4.3c4 and later.
Scenario #4: suitable for companies with relatively low incoming traffic; requires CommuniGate Pro version 5.1 or later.
In CommuniGate Pro version 5.1 and greater you can enqueue messages synchronously.
Use the WebAdmin Interface to configure the Enqueuer component: open the Queue page in the Settings->Mail realm
and clear the "Enqueue Asynchronously" checkbox.
See the CommuniGate Pro manual for details.
Create a Server-Wide Rule with the following contents:
When messages are enqueued synchronously, a message rejected by a Server-Wide Rule is rejected
at the SMTP level with a 5xx error code, rather than accepted and bounced.
In any scenario, it is not recommended to discard spam messages blindly without saving them, because
of possible false positives. It is also strongly discouraged to reject spam automatically
(unless you are in synchronous mode using Scenario #4), because the return addresses are usually
forged and the rejection notice will go to an innocent person or a spamtrap, which may
get your server blacklisted. When rejecting in synchronous mode,
the sending host gets an error during the SMTP transaction and no
bounce message is generated by your server.
The recommended threshold (the score at which you start treating messages as spam) is between 85 and 91.
If not enough spam is caught, lower the threshold to 85; if there are too many false positives,
raise the threshold to 90.
The Plugin Configuration File
On startup the SpamCatcher Plugin reads the contents of the CGPSpamCatcher.cfg file from the
current directory. The format of the file data elements is described in
http://www.communigate.com/CommuniGatePro/Data.html.
The data elements themselves are described in the CGPSpamCatcher.cfg file.
The default CGPSpamCatcher.cfg is available here.
The default CGPSpamCatcher.cfg has the following contents:
- Header="X-Junk-Score: ^1 [^2]";
- This line defines the header to be added to the rated messages.
The ^1 combination is replaced with the digital message score.
The ^2 combination is replaced with the bar score.
To create a multi-line header use the \e combination as a line breaker.
Make sure each line is an RFC-compliant header field; it is best to start each field name with the "X-" prefix.
Example: Header="X-SpamCatcher-Score: ^1\eX-Bar-Score: ^2"
- AlertLevel=90;
- This line defines the score at or above which the AlertHeader is
inserted into the message; the source and destination addresses of such messages
are listed in the daily reports as Spam Sources and Targets.
- AlertHeader="X-Alert: possible spam!\eX-Color: red";
- This line defines the header to be added to the rated messages
if the message score is equal to or greater than the value of AlertLevel.
The "X-Color: red" combination changes the message color
when viewed via CommuniGate Pro WebMail interface.
Note: To dispatch spam via Rules you may check for the AlertHeader presence instead
of checking the message scores, but this method is not flexible because different users
may want to use different scores as a threshold.
- SubmittedDirectory = "Submitted";
- This line defines the CommuniGate Pro Submitted directory
required for submitting the reports via the PIPE module.
The path can be relative or absolute, e.g. "/var/CommuniGate/Submitted".
- OnLicenseLimitReached=Pass;
- This line defines the behaviour of the Plugin when the number of messages exceeds the licensed limit.
When it is set to "Delay", the Plugin suspends the CommuniGate Pro Queue processing module
until the next window; when it is set to "Pass", the Plugin lets extra messages go through unrated.
Unrated messages do not get any X-Junk-Score header. A notification is also written
to the CommuniGate Pro log when your license has reached its limit.
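You can sketch the interplay of `Header`, `AlertLevel` and `AlertHeader` as a small template expansion. The following Python functions are hypothetical. They assume that the template text is taken exactly as written in the file, with `\e` still present as two characters. The caller passes the bar score as the string of X's that replaces `^2`.

```python
# Hypothetical sketch; default values copied from CGPSpamCatcher.cfg as raw text,
# so "\e" is still a backslash followed by "e".
DEFAULT_HEADER = r"X-Junk-Score: ^1 [^2]"
DEFAULT_ALERT_LEVEL = 90
DEFAULT_ALERT_HEADER = r"X-Alert: possible spam!\eX-Color: red"


def expand_template(template: str, score: int, bar: str) -> list:
    """Replace ^1 with the digital score and ^2 with the bar score,
    then split the text into header lines at each \\e."""
    text = template.replace("^1", str(score)).replace("^2", bar)
    return text.split("\\e")


def headers_for(score: int, bar: str,
                header: str = DEFAULT_HEADER,
                alert_level: int = DEFAULT_ALERT_LEVEL,
                alert_header: str = DEFAULT_ALERT_HEADER) -> list:
    """Return the header lines that would be added to a message with this score."""
    lines = expand_template(header, score, bar)
    if score >= alert_level:  # AlertHeader is added at or above AlertLevel
        lines += expand_template(alert_header, score, bar)
    return lines


if __name__ == "__main__":
    for line in headers_for(95, "XXXXX"):
        print(line)
    # X-Junk-Score: 95 [XXXXX]
    # X-Alert: possible spam!
    # X-Color: red
```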
The SpamCatcher Engine Configuration File
At initialization time the SpamCatcher Engine reads the configuration
options specified in the data/spamcatcher.conf file.
The valid options are listed below. Note that all arguments are specified as strings. If
an option is not explicitly set, it takes its default value.
- approved_domain_list
- This option allows specifying body domains and IPs which should be always approved.
Format: domain1,domain2,...
Default: none
- approved_ip_list
- This option allows specifying IPs which should be always approved.
Format is a comma delimited list of single IPs or ranges of IPs.
Ranges can be specified in three ways:
a) startingIP-endingIP
b) IP/netmask
c) IP.
If the first un-ignored IP in the Received: headers matches any entry in this list, the message
is scored 1 and no other checks are made.
Default: none
Example: approved_ip_list=2.3.4.5-2.3.4.8
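You can check the three range notations with Python's standard `ipaddress` module. The following sketch parses a list in the format described above and tests an address against it. The function names are hypothetical, and the sketch handles IPv4 only. It accepts either a prefix length or a dotted netmask after the "/", which is an assumption about what the Engine allows. blocked_ip_list and ignored_ip_list use the same notation.

```python
import ipaddress

# Hypothetical helpers illustrating the list format; IPv4 only.


def parse_ip_list(value: str) -> list:
    """Parse e.g. "2.3.4.5-2.3.4.8,2.3.4.0/24,5.6.7.8" into (first, last) address pairs."""
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:    # a) startingIP-endingIP
            start, end = (ipaddress.IPv4Address(p.strip()) for p in item.split("-", 1))
        elif "/" in item:  # b) IP/netmask (prefix length or dotted mask)
            net = ipaddress.IPv4Network(item, strict=False)
            start, end = net.network_address, net.broadcast_address
        else:              # c) a single IP
            start = end = ipaddress.IPv4Address(item)
        ranges.append((start, end))
    return ranges


def ip_in_list(ip: str, ranges: list) -> bool:
    """Return True if ip falls inside any of the parsed ranges."""
    addr = ipaddress.IPv4Address(ip)
    return any(start <= addr <= end for start, end in ranges)
```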
- auto_training_threshold
- Sets a threshold for auto-training.
If a message is scored at or above the high threshold, that message is considered a definite
spam and is then used to train all the enabled Bayesian modules (rules and/or word) but not
sender or fingerprint. If a message is scored at or below the low threshold, that message
is considered a definite ham and is then used to train all the enabled Bayesian modules
(rules and/or word) but not sender or fingerprint.
To disable auto training set this option to "-1:101".
Format: low:high
Example: auto_training_threshold=2:98
Default: -1:101
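The following short Python sketch illustrates the effect of the low:high pair. `auto_training_label` is a hypothetical helper that returns the kind of training a score would trigger. With the default -1:101, no score in the 0 to 100 range reaches either threshold, which is why that value disables auto-training.

```python
def auto_training_label(score: int, threshold: str = "-1:101"):
    """Hypothetical helper: return "spam", "ham" or None for a score
    under an auto_training_threshold value of the form low:high."""
    low_text, high_text = threshold.split(":")
    low, high = int(low_text), int(high_text)
    if score >= high:
        return "spam"  # would train the enabled Bayesian modules as spam
    if score <= low:
        return "ham"   # would train the enabled Bayesian modules as ham
    return None        # between the thresholds: no auto-training
```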
- blocked_charset_list
- Allows blocking by character set.
The format is a comma delimited list of "char-set" and "offset" pairs which are themselves
delimited by a colon. A char-set to foreign language map can be found at:
http://www.w3.org/International/O-charset-list.html
Offsets are optional and default to a value of 100.
Note that language to char-set mapping is not 100% accurate so blocking charsets can result in false positives.
Example: blocked_charset_list=cp-950:65,euc-tw:55
Default: none
- blocked_country_list
- Allows blocking by country.
The format is a comma delimited list of "country" and "offset" pairs which are themselves
delimited by a colon. Country is specified as two letter code (ISO-3166). Offsets are
optional and default to a value of 100.
If an IP address in a Received: header matches a listed country, then that offset is used
and no other IPs are checked. For example, if one of the IPs is in Russia and "ru:someOffset"
is set, that offset applies.
The country codes aren't applied to sender addresses. If you want to block From addresses
ending in .ru, you can use the email address block list.
Note that it is possible for a message to have travelled through various countries before
reaching the final destination.
Note that this option is currently only 98% accurate so blocking countries may result
in false positives.
Default: none
Example: blocked_country_list= CN:50
- blocked_domain_list
- This option allows specifying body domains and IPs which should always be blocked.
Format: domain1,domain2,...
Default: none
- blocked_ip_list
- This option allows specifying IPs which should be always blocked.
Format is a comma delimited list of single IPs or ranges of IPs.
Ranges can be specified in three ways:
a) startingIP-endingIP
b) IP/netmask
c) IP.
If any un-ignored IP in the Received: headers matches any entry in this list,
the message is scored 100 and no other checks are made.
Default: none
Example: blocked_ip_list=2.3.4.5-2.3.4.8
- convert_unicode
- Improves accuracy and throughput for email message bodies in Unicode
Default: yes
Valid values: yes,no
- custom_rules_list
- Allows user to specify a custom list of rules (i.e. spam, ham, or phishing words/phrases)
Default: none
Arguments: filename1, filename2, ...
In order to utilize the custom_rules_list option, you must first create one or more custom rule files in the "data" subdirectory.
Custom rules apply to Subject Line, Body, and attachments.
The custom rules list specifies a comma separated list of custom rule file names.
For example:
custom_rules_list=filename1, filename2
Another example:
custom_rules_list=spam_phrases.csv,phish_phrases.csv
Custom rules files contain phrases in the following format on separate lines:
phrase,type,confidence,caseSensitivity
phrase can be any text except commas. Any commas in the phrase should be deleted.
type is one of SPAM, PHISH, BOUNCE, ADULT, or FRAUD. If anything else is specified, the type is automatically
assumed to be SPAM.
Confidence can be from 1 to 100. Higher values mean higher confidence: for SPAM, that the phrase indicates spam; for PHISH,
that it indicates phishing; for BOUNCE, that it is related to bounce messages.
A higher confidence is more likely to impact the final score.
A value of 100 is a special case: if type is SPAM, PHISH, or BOUNCE, a phrase with confidence 100 scores the
message as 100. As always, any whitelist overrides any blacklist.
caseSensitivity value of 1 means that the phrase will be case sensitive; 0 means that the phrase will be case insensitive.
For example:
spamming is fun,SPAM,100,0
phishing is Phun,PHISH,90,1
return to sender,BOUNCE,80,0
The first line means that all case variations of "spamming is fun" are considered SPAM with a confidence of 100. The phrase is case insensitive.
The second line means that only the exact phrase "phishing is Phun" is considered PHISH with a confidence of 90. The phrase is case sensitive.
The third line means that all case variations of "return to sender" are considered BOUNCE with a confidence of 80. The phrase is case insensitive.
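Before you deploy a custom rules file, you can check it for malformed lines. The following Python sketch parses the four-field format described above. The `CustomRule` class and the substring test in `matches` are assumptions made for illustration. The sketch strips whitespace around every field, which may be more lenient than the Engine.

```python
from dataclasses import dataclass

KNOWN_TYPES = {"SPAM", "PHISH", "BOUNCE", "ADULT", "FRAUD"}


@dataclass
class CustomRule:
    """Hypothetical representation of one custom rules line."""
    phrase: str
    kind: str
    confidence: int
    case_sensitive: bool

    def matches(self, text: str) -> bool:
        """Assumed substring test that honours the caseSensitivity flag."""
        if self.case_sensitive:
            return self.phrase in text
        return self.phrase.lower() in text.lower()


def parse_custom_rules(lines) -> list:
    """Parse "phrase,type,confidence,caseSensitivity" lines into CustomRule objects."""
    rules = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        if len(fields) != 4:  # phrases must not contain commas
            raise ValueError(f"line {number}: expected 4 fields, got {len(fields)}")
        phrase, kind, confidence, case_flag = (f.strip() for f in fields)
        if kind not in KNOWN_TYPES:
            kind = "SPAM"  # unknown types are treated as SPAM
        value = int(confidence)
        if not 1 <= value <= 100:
            raise ValueError(f"line {number}: confidence must be between 1 and 100")
        rules.append(CustomRule(phrase, kind, value, case_flag == "1"))
    return rules
```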
- dbg_logfile
- Redirects log output to a file in the data directory.
Arguments: filename
Default: none
Example: dbg_logfile=spamcatcher.log
- dnsbl_list
- Specifies a list of DNS Blocklist (DNSBL) servers to query with
domains and IPs extracted from the message body.
Arguments: servername:response:offset,servername:response:offset,...
Default: none
- dnsbl_max_domains
- Allows limiting how many domains and IPs are queried against the DNS
Blocklist server.
Note that the total number of queries will be the number of domains and IPs
extracted from message body (up to a max of dnsbl_max_domains)
multiplied by the number of servers specified in dnsbl_list. Note that
domains which match against the "ignored_domain_list" option do not
count towards the dnsbl_max_domains limit.
Arguments: integer
Default: 4
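The resulting number of queries is simple to compute. In the following Python sketch, `dnsbl_query_count` is a hypothetical helper. It drops ignored domains first, caps the rest at dnsbl_max_domains, and multiplies the remainder by the number of servers. The sketch assumes exact-name matching against ignored_domain_list.

```python
def dnsbl_query_count(domains, ignored, servers, max_domains: int = 4) -> int:
    """Hypothetical estimate of the DNSBL queries made for one message.

    domains:     domains and IPs extracted from the message body
    ignored:     entries of ignored_domain_list (assumed exact-name match)
    servers:     entries of dnsbl_list
    max_domains: dnsbl_max_domains (default 4)
    """
    # Ignored entries are skipped and do not count towards the limit.
    checked = [d for d in domains if d not in ignored][:max_domains]
    # Every checked domain or IP is queried against every server.
    return len(checked) * len(servers)
```

For example, six extracted domains with one of them ignored, the default limit of 4, and two servers in dnsbl_list give 4 × 2 = 8 queries.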
- dnsbl_multihit
- Allows control over limiting further queries once a domain is found on
any query.
Valid values: yes,no
Default: no
- dnsbl_threshold
- Since DNSBL checks can introduce latency and decrease
performance, this option allows running DNSBL checks conditionally,
based on the score prior to the DNSBL checks.
If score is greater than the "high" value then only those DNSBL servers which
can bring score below "high" value are queried.
If score is less than the "low" value then only those DNSBL servers which
can bring score above "low" value are queried.
If score is between "low" and "high" then all DNSBL servers are queried.
Arguments: low:high
Default: 1:99
- dnsbl_timeout
- Allows setting a maximum timeout for finishing all DNSBL queries.
Responses are only used from those servers which responded in time.
If value is "0" then no timeout is enforced.
Arguments: integer
Default: 5
- enable_dnscache
- Enable internal caching of DNS requests.
Arguments: yes | no
Default: yes
- dnscache_enable_filecache
- If enabled, DNS cache will store
entries on disk on shutdown and read
from disk on initialization.
Arguments: yes | no
Default: yes
- dnscache_dns_server
- DNS servers can now be explicitly specified to override the default.
Arguments: Name of DNS server
Default: none
- dnscache_max_entries
- Limits number of entries in internal DNS cache.
Arguments: integer
Default: 100000
- dnscache_min_ttl
- This option allows setting a minimum TTL for entries in the SDK's internal DNS cache.
Notes: The option is specified in units of seconds. For
those DNS responses whose TTL value is less than dnscache_min_ttl,
the SDK's internal cache will instead use dnscache_min_ttl.
Arguments: integer
Default: 0 seconds
- enable_direct_dns
- When set to yes and if dnscache_dns_server is not
specified, then the SDK will make LiveFeed requests directly to
the Mailshell LiveFeed servers. This option is ignored if
dnscache_dns_server is specified as it has precedence.
Notes: This option should be set to yes when direct queries are
more efficient than the default DNS servers.
Arguments: auto|yes|no
Default: auto
- enable_all_spf
- Controls whether domains which are not in spf_list and sc15.bin will be
checked for SPF compliance. The enable_realtime_spf option must also be
set to yes. Otherwise, enable_all_spf has no effect.
Default: no
Valid values: yes,no
Note: This option is not supported in the Windows version.
- enable_auto_update_engine_thread
- Launches a thread to automatically check, download, and update new
engine versions. If no, then the engine version must be
manually updated.
Valid values: yes,no
Default: no
- enable_country_training
- Controls whether country routing information should be considered when
training and scoring messages.
Increases accuracy but also increases memory usage and decreases throughput.
enable_rules must be set to yes or this option is ignored.
Default: yes
Valid values: yes,no
- enable_domain_cache
- Enables usage of a domain reputation cache.
If enabled, domains are extracted from messages and compared against a domain reputation cache.
The domain reputation cache is stored in file sc8.bin.<date>.
Default: yes
Valid values: yes,no
- enable_fingerprint_cache
- Enables usage of a fingerprint cache.
Increases accuracy but also increases memory usage and decreases throughput.
Default: yes
Valid values: yes,no
- enable_legitrepute_cache
- Enables usage of a LegitRepute cache to reduce false positives especially for newsletters.
Default: yes
Valid values: yes,no
- enable_msf
- Allows for use of an alternate fingerprinting algorithm known as MSF.
This option is a technology preview for the Engine 3.1 and disabled by default.
Default: no
Valid values: yes,no
- enable_realtime_spf
- This option controls whether live DNS queries will be performed for SPF
checks. This may result in greater latency.
Default: no
Valid values: yes,no
- enable_rules
- Controls whether heuristic rules are used.
Increases accuracy but also increases memory usage.
Default: yes
Valid values: yes,no
- enable_spamcompiler
- Speeds up rules processing but requires a little bit more memory.
Default: yes
Valid values: yes,no
- enable_spamcompiler_cache
- If this option is set to yes, SpamCompiler will store the compiled data on disk instead of memory to reduce memory usage.
Default: yes
Valid values: yes,no
- enable_spamcompiler_v5
- Enables the new SpamCompiler 5.0. If disabled, then SpamCompiler 4.x is
used. SpamCompiler 5.0 is much faster but the results may not be the same as 4.x.
Default: yes
Valid values: yes,no
- enable_filecleanup_on_retrieve
- The SDK, by default, will clean up older rule files from the
configuration directory when a new file is retrieved from the Mailshell
SpamCatcher network. However, some users of the SDK will want to
archive older rule files. This can be done by disabling the cleanup feature.
Default: yes
Valid values: yes,no
- enable_filemerge_on_reload
- The SDK, by default, will merge multiple incr files and a full file into a
single updated full file. This is done to reduce file clutter in the
configuration directory.
Default: yes
Valid values: yes,no
- enable_auto_update_data_thread
- Launches a thread to automatically download and update the engine data.
If no, then the data download and update occurs within the computeScore()
function as in SDK 4.X and before.
Default: yes
Valid values: yes,no
- enable_spf
- Controls whether or not to use Sender Policy Framework (SPF).
If set to yes, then attempt to validate that the sender is allowed to send
from a particular domain based on the domain's published policy.
Default: yes
Valid values: yes,no
- enable_stat_file
- Logs IPs, Domains, URLs, suspicious words, etc. to the conf directory on the
file system. Logs can be automatically uploaded to Mailshell's analysis servers.
The logs can be converted to plain text for viewing.
Default: yes
Valid values: yes,no
- enable_stat_file_upload_thread
- Launches a thread to automatically upload stat files to Mailshell's
analysis servers. Requires enable_stat_file=yes.
Default: yes
Valid values: yes,no
- enable_training_updates
- Controls whether the word and rules database can be modified or is read-only after initial load.
A read-only training database is faster.
"yes" - the training databases can be modified.
"no" - the training databases are read-only.
Default: yes
Valid values: yes,no
- enable_word_training
- This option controls whether Bayesian Word Token analysis is used. Accuracy can be greatly improved but more memory is used and it is slightly slower.
Default: no
Valid values: yes,no
- extended_rules
- Enable the extended rule set.
This rule set extension is stored as file sc3.bin.<date>
Increases accuracy, but also increases memory usage and startup time.
Default: yes
Valid values: yes,no
- extended_rules2
- Enable use of a second extension to the rule set.
This rule set extension is stored as file sc6.bin.<date>
Increases accuracy, but also increases memory usage and startup time.
Default: yes
Valid values: yes,no
- full_training_weight
- Controls whether to give full weight to local training data.
If this option is set to "yes", then scoring will be based solely on
local training data. If option is "no", then both global and local training data will be used.
Default: no
Valid values: yes,no
- ham_threshold
- This option allows you to tell the SDK
to skip slow rule checks if the
message is likely to be ham.
Default: 0
- home_country_list
- This option allows specifying a list of countries which are
considered "home" countries. Messages routed through a country
which is not on this list will be scored more aggressively.
If this option is empty then no penalty will occur.
Countries are specified by their two-letter code as defined in ISO 3166
Default: none
Valid values: us,ca,kr,...
- home_language_list
- This option permits you to set languages which are preferred in your email
messages. The format is a comma delimited list of two character ISO 639 language
codes.
Default: none
Valid values: ko,zh,...
- ignored_domain_list
- This option allows specifying body domains and IPs which should
always be excluded from the DNSBL and MSBL checks and ignored.
Arguments: domain1,domain2,...
- ignored_ip_list
- This option allows specifying IPs which should be ignored when doing
RBL checks. You should include all internal IP addresses within the
firewall, not directly accessible from the Internet. This avoids unnecessary
checks and helps identify the actual connecting IP address.
Format is a comma delimited list of single IPs or ranges of IPs. The following ranges are
always implicitly ignored: 10.0.0.0/8, 127.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12.
Ranges can be specified in three ways:
a) startingIP-endingIP
b) IP/netmask
c) IP.
Default: none
Example: ignored_ip_list=2.3.4.5-2.3.4.8
- lbl_list
- The Last Connecting IP is queried
against the LBL server.
You can specify a DNS lookup for the last connecting incoming IP.
Example:
lbl_list=pbl.spamhaus.org
For the last connecting incoming IP, lbl_list is queried instead of rbl_list.
Arguments: servername:response:offset,servername:response:offset
Default: none
- lbl_skip_list
- List of IPs not to query against the
LBL server. If the Last Connecting IP
matches against an IP in lbl_skip_list,
then that IP is queried against RBL
server instead of LBL server.
Arguments: 1.2.3.4,2.3.4.5-2.3.4.8,2.3.4.0/24,5.6.0.0/16
Default: none
- livefeed
- Specifies which server to query for LiveFeed requests. LiveFeed provides real-time updates. If blank, LiveFeed
is disabled.
Arguments: servername
Default: mailshell.net
- livefeed_min_ttl
- This option allows setting a minimum TTL for entries in the SDK's internal LiveFeed cache.
Notes: The option is specified in units of seconds. For those LiveFeed
responses whose TTL value is less than livefeed_min_ttl, the SDK's
internal cache will instead use livefeed_min_ttl.
Arguments: integer
Default: 0 seconds
- max_incr_size
- In order to reduce cpu usage while rule files are updated, the on-disk cache files
(sc*.tmp) are no longer regenerated on every single rule update. Instead they
are regenerated when there is a newer sc*.bin.full file or when the sum of the
sc*.bin.incr grows beyond the number of bytes specified in max_incr_size.
Arguments: integer
Default: 100000
- max_word_entries
- This option specifies the number of word tokens to cache at any time.
The higher the number, the more memory is used but also the higher the
accuracy. enable_word_training must be set to yes or this option is ignored.
Default: 50000
- message_scansize
- Instructs the Engine not to read more than X bytes when processing a message.
Default: 20000
- min_training
- Initially, only the rule weights are used to compute the spam score.
Training data will only be considered once minimum set of training data has been reached.
The default minimum is 100, which means that you must train on at least 100
equivalent known ham messages and 100 equivalent spam messages for a total of 200 messages
before the training data replaces the rule weights. If the number is too low then
the accuracy could be poor due to insufficient data. If the number is too
high, then the training data will not be fully taken advantage of.
Default: 100
- msbl_list
- Specifies a list of Mailshell Specific Blocklist (MSBL) servers to query
with domains and IPs extracted from the message body.
Arguments: servername:response:offset,servername:response:offset
Default: none
- msf_bulk_threshold
- This option specifies how many similar messages are required in order
to consider a message bulk.
enable_msf must be set to yes or this option is ignored.
Default: 5
Valid values: an integer
- msf_cleanup_threshold
- This option specifies an internal variable which determines
how frequently the in-memory MSF cache is pruned.
enable_msf must be set to yes or this option is ignored.
Default: 5
Valid values: an integer
- msf_match_threshold
- This option specifies the match percent threshold for two fingerprints.
If the match percent is higher than this threshold then messages
are considered to be the same.
enable_msf must be set to yes or this option is ignored.
Default: 65
Valid values: an integer between 0 and 100
- msf_max_entries
- This option specifies the number of MSF fingerprints to keep in memory.
The higher the number, the more memory is used but also the higher the accuracy.
enable_msf must be set to yes or this option is ignored.
Default: 5000
- netcheck
- Whether to communicate with the Mailshell SpamLabs to determine scoring via the older, slower v4.x protocol.
Default: no
Valid values: yes,no,auto
Note: 'auto' setting allows the SDK to automatically use the netcheck feature as a fallback to LiveFeed queries.
- netcheck_threshold
- Allows running netchecks conditionally based on the score.
The network is only queried if the score is at or between the 'low' and 'high' values
specified via this option.
Network queries can introduce latency and decrease performance, hence the option for conditional checks.
Format: low:high
Example: netcheck_threshold=85:95
Default: 1:99
- nonexistent_user_list
- If the RCPT TO: address from the SMTP envelope matches an email address
in this list, then the statistics file will record the tokens in the email message as
being sent to a nonexistent address. Nonexistent addresses should not be
posted or submitted anywhere; no legitimate mail should be expected to
be sent to these addresses.
Addresses must match exactly, ignoring case; wildcard entries are not
supported.
Valid values: emailaddress1, emailaddress2, ...
Default: none
- proxy_host
- Specify the host name and port number of an HTTPS proxy to use when connecting to
the Mailshell servers.
Default: none
Example: proxy_host=squid.corp.com:8080
- proxy_userpwd
- Specify the user name and password used to authenticate with the HTTPS proxy,
in the form user:password.
Default: none
Example: proxy_userpwd=joe:mypassword
- proxy_authtype
- Specifies which type of HTTP Proxy authentication should be used.
Arguments: auto|basic|digest
Default: auto
- rbl_list
- Specifies a list of Realtime Blackhole List (RBL) servers to query when
analyzing messages.
Format: rbl_list=server:response:offset,server2:response2:offset2,...
rbl_list expects a comma separated list of RBL entries. In turn, each
RBL entry consists of up to 3 colon separated items. Those items are:
1) server - name of an RBL server
2) response - the response given by an RBL server when an IP address is
listed, e.g. 127.0.0.2, 127.0.0.3, 127.0.0.4, etc. This item is optional; by
default, all responses apply.
3) offset - the numeric offset to apply to the spam score if an IP
address is listed on this RBL server. This item is optional; the default is
an offset of 100.
Default: none
Example: rbl_list=bl.spamcop.net::40,bl.spamcop.net:127.0.0.3:75
- rbl_max_ips
- Allows limiting how many IP addresses are queried against the RBL server.
Note that the total number of RBL queries will be the number of IP addresses
in the Received: headers (up to a max of rbl_max_ips) multiplied by the number
of RBL servers specified in "rbl_list".
Note that IPs which match against the "ignored_ip_list" option do not count
towards the rbl_max_ips limit.
Default: 4
- rbl_multihit
- Controls whether further RBL queries are made once an IP address has been found on any RBL.
RBL servers are checked in the order listed.
"no" = no further queries are made once a positive hit is made on an RBL query.
"yes" = all RBL servers are checked in parallel and the hits which return are added together.
Default: no
Valid values: yes,no
- rbl_threshold
- Since RBL checks can introduce latency and decrease performance,
this option allows running RBL checks conditionally, based on the score
computed before the RBL checks.
If score is greater than the "high" value then only those RBL servers
which can bring score below "high" value are queried.
If score is less than the "low" value then only those RBL servers which
can bring score above "low" value are queried.
If score is between "low" and "high" then all RBL servers are queried.
Format: low:high
Example: rbl_threshold=40:99
Default: 1:99
- rbl_timeout
- Allows setting a maximum timeout for finishing all RBL queries.
RBL responses are only used from those RBL servers which responded in time.
If value is "0" then no timeout is enforced.
Default: 5
Valid values: 0 .. 2^32-1
- ruleupdate
- How often to retrieve new rules from the Mailshell SpamLabs. The
value is specified in units of integral seconds. Note that a value of "0" disables
this feature and rule files will not be updated.
Default: 300
Valid values: 0, 60 .. 2^32-1
- rule_weights
- Allows overriding the weights associated with individual Mailshell Rules.
The format is a comma-delimited list of "ruleid" and "weight" pairs; within each pair,
the rule ID and the weight are separated by a colon (ruleid:weight).
Default: none
- scan_attachments
- This controls whether the Engine will scan and consider attachments when
computing the spam score.
Usually spam is not sent as an attachment, hence the default is
"no". However, in some installations incoming mail passes through
services such as mail forwarders, virus scanners, etc., which encapsulate the
original email as a MIME attachment. Setting this option to
"yes" instructs the Engine to consider those attachments when
computing the spam score.
Default: no
Valid values: yes,no
- snhost
- This parameter configures which
SpamCatcher Network host to query
for rule updates and/or netchecks. If snhost is not set, requests will go to
db11.spamcatcher.net.
Default: none
Arguments: servername
- sntimeout
- Limits how long a single request to the Mailshell SpamLabs can take.
The value is specified in units of integral seconds.
Note that a value of "0" disables this feature and no limit will be placed.
Default: 5
Valid values: 0 .. 2^32-1
- spam_threshold
- This option allows you to tell the SDK to stop analyzing the message once a
score has been reached. This can reduce the number of rules and other
checks that are performed, thus improving throughput.
Default: 90
- spambait_user_list
- If the RCPT TO: address from the SMTP envelope matches an email address
in this list, then the statistics file will record the tokens in the email message as
being sent to a spambait address. Spambait addresses may be posted
on web pages or submitted somewhere; no legitimate mail
should be expected to be sent to these addresses.
Addresses must match exactly, ignoring case; wildcard entries are not
supported.
Default: none
Arguments: emailaddress1, emailaddress2, ...
- spamcompiler_version
- Specifies which SpamCompiler version to use. When set to "auto", the SDK chooses the best engine to use.
Notes: Improves accuracy and throughput.
Default: auto
Arguments: auto | 5.2 | 5.1
- spf_list
- Arguments: domain1:"weights1":"spf record1",domain2:"weights2":"spf record2",...
This option allows you to override a domain's SPF record. Through the "weights",
you can specify how a particular SPF result will affect the message score.
Example: spf_list=mydomain.com:"f=100 ff=100 fp=-1":"v=spf1 ip4:2.3.4.5 ip4:3.4.5.6 -all"
Default: none
- spf_fail_weight
- This option allows you to specify the default offset to apply to scores when the SPF result is Fail for the From domain.
Default: 10
- spf_neutral_weight
- Specify an offset for this SPF result code.
This corresponds to an SPF neutral result when checking the From domain.
Default: 0
- spf_none_weight
- Specify an offset for this SPF result code.
This corresponds to an SPF none result when checking the From domain.
Default: 0
- spf_pass_weight
- Specify an offset for this SPF result code.
This corresponds to an SPF pass result when checking the From domain.
Default: 0
- spf_permerror_weight
- Specify an offset for this SPF result code.
This corresponds to an SPF permanent error result when checking the From domain.
Default: 0
- spf_softfail_weight
- Specify an offset for this SPF result code.
This corresponds to an SPF SoftFail result when checking the From domain.
Default: 5
- spf_temperror_weight
- Specify an offset for this SPF result code.
This corresponds to an SPF temporary error result when checking the From domain.
Default: 0
- spoofed_sender_list
- Allows blocking of spammers who spoof selected domains.
For example, spammers often use the recipient's domain name as the
From: domain name. This list allows you to specify which mail servers are
allowed to use which domain names in the From: address.
Format: Address : IP range: offset, Address2 : IP range2: offset2
Example: spoofed_sender_list=corp1.com:223.34.122.1:100
Default: none
- stat_file_upload_url
- URL where statistics files will be uploaded.
Default: http://tisdk.mailshell.net/cgi-bin/mailsh.cgi
- target_throughput
- This option allows you to specify the desired throughput in messages per
second. The Mailshell SDK will attempt to reach that level by optimizing the
rules that are run. It is possible that accuracy may be reduced.
A value of 0 disables the option.
Default: 0
- training_write_buffer
- While training, the SpamCatcher Engine processes a configurable number of messages
before writing the training database to disk. This option determines how many messages
to process before writing to disk. Writing to disk is expensive, so this number should be
made as large as possible for maximum performance. If the program is unexpectedly terminated
before the buffer has been written to disk, then training performed since the last disk write
will be lost. The buffer is written to disk on normal termination.
Default: 1000
- use_both_mimesections
- The Engine will analyze both text/plain and text/html MIME sections in a
message. If additional performance is desired, it is possible to only analyze
one section. If this option is set to "no", then only one section will be analyzed.
Default: yes
Valid values: yes,no
- use_score_history
- Enable the tracking of historical scores for repeat senders.
Default: no
Valid values: yes,no
- use_score_offsets
- Enable the Training Database.
Default: no
Valid values: yes,no
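Several of the options above take structured string values. The following Python sketch parses four of the documented formats:

- the IP lists used by lbl_skip_list and the other *_ip_list options;
- rbl_list;
- the low:high thresholds of netcheck_threshold and rbl_threshold;
- rule_weights.

It is only an aid for checking a configuration by hand and is not part of the plugin or the SDK. The function names are hypothetical. Treating dash ranges as inclusive and rule weights as integers are assumptions.

```python
import ipaddress


def _items(spec):
    """Split a comma-separated option value, dropping empty items."""
    return [part.strip() for part in spec.split(",") if part.strip()]


def parse_ip_list(spec):
    """'1.2.3.4,2.3.4.5-2.3.4.8,2.3.4.0/24' -> list of inclusive (first, last) pairs."""
    ranges = []
    for item in _items(spec):
        if "/" in item:  # CIDR block
            net = ipaddress.ip_network(item, strict=False)
            ranges.append((net.network_address, net.broadcast_address))
        elif "-" in item:  # dash range, assumed inclusive at both ends
            first, last = item.split("-", 1)
            ranges.append((ipaddress.ip_address(first.strip()),
                           ipaddress.ip_address(last.strip())))
        else:  # single address
            addr = ipaddress.ip_address(item)
            ranges.append((addr, addr))
    return ranges


def ip_in_list(ip, ranges):
    addr = ipaddress.ip_address(ip)
    # Compare like with like: IPv4 and IPv6 addresses cannot be ordered together.
    return any(first.version == addr.version and first <= addr <= last
               for first, last in ranges)


def parse_rbl_list(spec):
    """'server:response:offset,...' -> (server, response or None, offset) tuples."""
    entries = []
    for item in _items(spec):
        fields = item.split(":")
        # An empty or missing response means "any response applies".
        response = fields[1] if len(fields) > 1 and fields[1] else None
        # A missing offset falls back to the documented default of 100.
        offset = int(fields[2]) if len(fields) > 2 and fields[2] else 100
        entries.append((fields[0], response, offset))
    return entries


def parse_low_high(spec):
    """'85:95' -> (85, 95), as used by netcheck_threshold and rbl_threshold."""
    low, high = (int(x) for x in spec.split(":"))
    return low, high


def parse_rule_weights(spec):
    """'ruleid:weight,...' -> {ruleid: weight}; integer weights are an assumption."""
    weights = {}
    for item in _items(spec):
        rule_id, weight = item.rsplit(":", 1)
        weights[rule_id.strip()] = int(weight)
    return weights
```

For example, parse_rbl_list("bl.spamcop.net::40,bl.spamcop.net:127.0.0.3:75") yields two entries:

- one that applies to any response, with offset 40;
- one that applies only to the 127.0.0.3 response, with offset 75.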
Recommended Changes from default
- Safest
| Option Name | Argument | Default | Recommended Setting |
| enable_auto_update_data_thread | yes,no | yes | no |
| enable_direct_dns | auto,yes,no | auto | no |
| enable_dnscache | yes,no | yes | no |
| enable_domain_cache | yes,no | yes | no |
| enable_fingerprint_cache | yes,no | yes | no |
| enable_legitrepute_cache | yes,no | yes | no |
| enable_spf | yes,no | yes | no |
| enable_stat_file | yes,no | yes | no |
| enable_stat_file_upload_thread | yes,no | yes | no |
| extended_rules | yes,no | yes | no |
| extended_rules2 | yes,no | yes | no |
| spamcompiler_version | auto,5.2,5.1 | auto | 5.1 |
- Fastest
| Option Name | Argument | Default | Recommended Setting |
| approved_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| approved_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| blocked_charset_list | charset1:offset1, | NONE | user specified extensive list |
| blocked_country_list | countryCode1:offset1, | NONE | user specified extensive list |
| blocked_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| blocked_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| custom_rules_list | filename1,filename2 | NONE | user specified extensive list |
| dnsbl_max_domains | integer | 4 | 2 |
| dnsbl_timeout | integer | 5 | 3 |
| enable_auto_update_engine_thread | yes,no | no | yes |
| enable_stat_file | yes,no | yes | no |
| ham_threshold | integer | 0 | 100 |
| home_country_list | us,ca,kr,... | NONE | user specified extensive list |
| ignored_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| ignored_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| lbl_skip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| max_incr_size | integer | 100000 | 10000000 |
| nonexistent_user_list | emailaddress1,emailaddress2, ... | NONE | user specified extensive list |
| rbl_timeout | integer | 5 | 3 |
| spam_threshold | integer | 90 | 0 |
| spambait_user_list | emailaddress1,emailaddress2, ... | NONE | user specified extensive list |
| target_throughput | integer | 0 | 100 |
- Most Accurate
| Option Name | Argument | Default | Recommended Setting |
| approved_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| approved_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| blocked_charset_list | charset1:offset1, | NONE | user specified extensive list |
| blocked_country_list | countryCode1:offset1, | NONE | user specified extensive list |
| blocked_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| blocked_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| custom_rules_list | filename1,filename2 | NONE | user specified extensive list |
| dnsbl_max_domains | integer | 4 | 8 |
| enable_all_spf | yes,no | no | yes |
| enable_auto_update_engine_thread | yes,no | no | yes |
| enable_realtime_spf | yes,no | no | yes |
| home_country_list | us,ca,kr,... | NONE | user specified extensive list |
| home_language_list | lang1,lang2 | NONE | user specified extensive list |
| ignored_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| ignored_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| lbl_skip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| nonexistent_user_list | emailaddress1,emailaddress2, ... | NONE | user specified extensive list |
| rbl_max_ips | integer | 4 | 8 |
| spam_threshold | integer | 90 | 100 |
| spambait_user_list | emailaddress1,emailaddress2, ... | NONE | user specified extensive list |
| spoofed_sender_list | address1:iprange1:offset1, | NONE | user specified extensive list |
- Least Memory
| Option Name | Argument | Default | Recommended Setting |
| enable_auto_update_data_thread | yes,no | yes | no |
| enable_auto_update_engine_thread | yes,no | no | yes |
| enable_dnscache | yes,no | yes | no |
| enable_domain_cache | yes,no | yes | no |
| enable_fingerprint_cache | yes,no | yes | no |
| enable_legitrepute_cache | yes,no | yes | no |
| enable_spf | yes,no | yes | no |
| enable_stat_file | yes,no | yes | no |
| enable_stat_file_upload_thread | yes,no | yes | no |
| extended_rules | yes,no | yes | no |
| extended_rules2 | yes,no | yes | no |
| livefeed | servername | mailshell.net | blank (disables LiveFeed) |
| max_incr_size | integer | 100000 | 50000 |
| spamcompiler_version | auto,5.2,5.1 | auto | 5.1 |
Notes:
* Removing training files can lower memory usage: mv data/scw* data/scw.bak, mv data/scr* data/scr.bak
* Turning off the SpamCompiler cache (enable_spamcompiler_cache=no) reduces peak memory usage but raises average memory usage.
- Least CPU
| Option Name | Argument | Default | Recommended Setting |
| approved_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| approved_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| blocked_charset_list | charset1:offset1, | NONE | user specified extensive list |
| blocked_country_list | countryCode1:offset1, | NONE | user specified extensive list |
| blocked_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| blocked_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| custom_rules_list | filename1,filename2 | NONE | user specified extensive list |
| dnscache_min_ttl | integer | 0 | 3600 |
| enable_auto_update_engine_thread | yes,no | no | yes |
| enable_stat_file | yes,no | yes | no |
| ham_threshold | integer | 0 | 100 |
| home_country_list | us,ca,kr,... | NONE | user specified extensive list |
| ignored_domain_list | domain1,domain2,... | NONE | user specified extensive list |
| ignored_ip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| lbl_skip_list | 1.2.3.4,2.3.4.5-2.3.4.8, | NONE | user specified extensive list |
| livefeed_min_ttl | integer | 0 | 3600 |
| max_incr_size | integer | 100000 | 10000000 |
| nonexistent_user_list | emailaddress1,emailaddress2, ... | NONE | user specified extensive list |
| spam_threshold | integer | 90 | 0 |
| spambait_user_list | emailaddress1,emailaddress2, ... | NONE | user specified extensive list |
| target_throughput | integer | 0 | 100 |
Additional Notes:
- Any rules in rule_weights will override rule training for that rule.
- IP address priority: approved_ip_list, blocked_ip_list, ignored_ip_list, blocked_country_list, rbl_list/lbl_list, netcheck, extended_rules.
- By default, all approved_* and blocked_* lists will call auto-training if triggered.
- If enable_rules, LiveFeed, rbl_list, and dnsbl_list are off, then all scores will be zero (0) until min_training is reached.
- Country information is added to the training database only if enable_word_training is on.
- The country database is not loaded if blocked_country_list is empty and enable_word_training is off or enable_country_training=no.
- Score of 0 is reserved for approved lists and a score of 100 is reserved for blocked lists.
- If you do not want to use any of the Mailshell training data files or the country data file, you can delete them. This can save memory and improve performance.
- If all of the RBLs in the rbl_list have the same offset and rbl_multihit is off, then all Received headers up to rbl_max_ips are queried against each RBL in the list in parallel, and the first hit is the final result. Otherwise, some optimizations can be performed.
- If any of the RBLs in the rbl_list have a different offset and rbl_multihit is off, then all Received headers up to rbl_max_ips are queried against each RBL in the list in parallel. The first hit in the same offset block as RBL entry #1 (the contiguous run of entries with the same offset as entry #1) is the final result. If the timeout is reached, the first hit nearest the beginning of the list is the final result.
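The RBL options interact: rbl_max_ips, rbl_threshold and rbl_multihit, together with the notes above, determine what is queried. The Python sketch below models that interaction in a simplified form. It assumes rbl_list entries have already been parsed into (server, response, offset) tuples.

This is a hypothetical model for reasoning about a configuration, not the SDK's implementation. It makes these assumptions and simplifications:

- IPs on ignored_ip_list are skipped entirely.
- A server listed twice is queried once.
- rbl_threshold uses strict comparisons.
- With rbl_multihit off, the first hit in list order wins; the offset-block and timeout refinements described above are ignored.

```python
def rbl_query_count(received_ips, ignored_ips, rbl_max_ips, rbl_entries):
    """Upper bound on the RBL lookups made for one message."""
    # Assumption: IPs on ignored_ip_list are skipped entirely, so they are
    # neither queried nor counted towards rbl_max_ips.
    candidates = [ip for ip in received_ips if ip not in ignored_ips]
    # Assumption: a server listed twice (with different responses) is queried once.
    servers = {server for server, _response, _offset in rbl_entries}
    return min(len(candidates), rbl_max_ips) * len(servers)


def select_rbl_servers(score, rbl_entries, low=1, high=99):
    """rbl_threshold: keep only the entries whose offset could still matter."""
    if score > high:
        # Only servers that can bring the score back below "high".
        return [e for e in rbl_entries if score + e[2] < high]
    if score < low:
        # Only servers that can bring the score above "low".
        return [e for e in rbl_entries if score + e[2] > low]
    return list(rbl_entries)  # between low and high: every server is queried


def combine_rbl_hits(rbl_entries, hits, multihit=False):
    """hits[i] is True when the IP was listed on rbl_entries[i]."""
    offsets = [entry[2] for entry, hit in zip(rbl_entries, hits) if hit]
    if not offsets:
        return 0
    # rbl_multihit=yes adds all hits; rbl_multihit=no keeps the first hit in list order.
    return sum(offsets) if multihit else offsets[0]
```

With the example rbl_list=bl.spamcop.net::40,bl.spamcop.net:127.0.0.3:75, an IP listed on both entries adds 40 with rbl_multihit=no and 115 with rbl_multihit=yes.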
Approved Senders List
The Mailshell Spam Engine accepts a list of sender addresses or domains
whose messages will never be considered spam.
The addresses are compared against the address in the From: header of the message.
The control file for Approved Senders is located at data/approvedsenders
and contains one line per sender. Each line can contain an email address or
a domain. Addresses are of the format mailbox@domain and domains are
simply of the format domain.
Examples:
user@isp.com ## accept user@isp.com
mycompany.net ## accept all emails from @mycompany.net
Leading and trailing white space is ignored. Lines beginning with the #
character are considered comments. Wildcard characters and regular expressions are not supported.
After making any changes to the Approved Senders or Blocked Senders list files,
tell the plugin to restart the SpamCatcher Engine so that the changes take effect:
create an "update.sig" file with any contents in the current directory
of the plugin, for example: echo >update.sig
The Plugin will delete that file.
Blocked Senders List
The Mailshell Spam Engine accepts a list of sender addresses or domains whose
messages are always considered spam.
The addresses are compared against the address in the From: header of the message.
The control file for Blocked Senders is located at data/blockedsenders
and contains one line per sender. Each line can contain an email address or a domain.
Addresses are of the format mailbox@domain and domains are simply of the
format domain.
Examples:
user@isp.com ## block user@isp.com
spammer.net ## block all emails from @spammer.net
cn ## block all emails from China (.cn)
Leading and trailing white space is ignored. Lines beginning with the # character
are considered comments. Wildcard characters and regular expressions are not supported.
After you have made changes to the Approved Senders or Blocked Senders list files,
tell the plugin to restart the SpamCatcher Engine so that the changes take effect:
create an "update.sig" file with any contents in the current directory
of the plugin, for example: echo >update.sig
The Plugin will delete that file.
Precedence of Approved and Blocked Addresses
When an address matches entries in both the Approved Senders and Blocked Senders lists,
the following priority is observed.
Email addresses take precedence over domains: for example, if you block the domain host.net
but approve the specific address joe@host.net, mail from joe@host.net will be approved.
In addition, approved addresses take precedence over blocked addresses if identical
entries exist on both the Approved Senders and Blocked Senders lists.
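The file format and precedence rules above can be modelled in a few lines. This Python sketch uses hypothetical helpers, not the Engine's code. It reads entries in the data/approvedsenders / data/blockedsenders format and classifies a From: address.

It relies on three assumptions:

- Matching is case-insensitive.
- A domain entry such as cn also covers subdomains.
- When an approved domain and a blocked domain both match, the approved entry wins. The documentation does not specify this last case.

```python
from pathlib import Path


def parse_sender_list(text):
    """Entries from an approvedsenders/blockedsenders file, one per line."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()  # leading and trailing white space is ignored
        if line and not line.startswith("#"):  # lines beginning with # are comments
            entries.add(line.lower())  # assumption: matching ignores case
    return entries


def load_sender_list(path):
    return parse_sender_list(Path(path).read_text(encoding="utf-8", errors="replace"))


def domain_matches(domain, entry):
    # Assumption: a domain entry also covers its subdomains, as the "cn" example suggests.
    return domain == entry or domain.endswith("." + entry)


def classify_sender(from_address, approved, blocked):
    """Return "approved", "blocked" or None for the From: address."""
    address = from_address.strip().lower()
    domain = address.rsplit("@", 1)[-1]
    # Exact address entries are checked first (addresses beat domains), and
    # approved is checked before blocked, so identical entries resolve to approved.
    if address in approved:
        return "approved"
    if address in blocked:
        return "blocked"
    # Then domain entries, i.e. entries without "@".
    for verdict, entries in (("approved", approved), ("blocked", blocked)):
        if any("@" not in e and domain_matches(domain, e) for e in entries):
            return verdict
    return None
```

For example, with host.net blocked and joe@host.net approved, classify_sender() approves joe@host.net and blocks any other host.net sender, matching the rule described above.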
Rule File Descriptions:
| File Name | Description | To enable |
| sc1.bin.full.* | Basic rules | enable_rules=yes |
| sc2.bin.full.* | Basic rules | |
| sc3.bin.full.* | Extended Rules | extended_rules=yes |
| sc4.bin.full.* | Fingerprint cache | enable_fingerprint_cache=yes |
| sc5.bin.full.* | IP-Country database | use any country option |
| sc6.bin.full.* | Extra extended cache | extended_rules2=yes |
| sc7.bin.full.* | Extended rule cache | extended_rules=yes |
| sc8.bin.full.* | SpamRepute domain cache | enable_domain_cache=yes |
| sc9.bin.full.* | Legit Repute domain cache | enable_legitrepute_cache=yes |
| sc9.bin.tmp | Legit Repute domain compiled cache | enable_spamcompiler_cache=yes |
| sc10.bin.full.* | Fingerprint cache | enable_fingerprint_cache=yes |
| sc10.bin.tmp | Fingerprint cache | enable_spamcompiler_cache=yes |
| sc11.bin.full.* | Extended rule cache | extended_rules=yes |
| sc11.bin.tmp | Extended rule compiled cache | enable_spamcompiler_cache=yes |
| sc12.bin.full.* | SpamRepute domain cache | enable_domain_cache=yes |
| sc12.bin.tmp | SpamRepute domain cache | enable_spamcompiler_cache=yes |
| sc14.bin.full.* | Language database | specify a value for home_language_list |
| sc15.bin.full.* | Sender Policy Framework (SPF) cache | enable_spf=yes |
| scmsf.bin.full | MSF database | enable_msf=yes |
| scoffset.bin.full | Fingerprint training database | use_score_offsets=yes |
| scrh.bin.full | Bayesian rule training database | |
| scrw.bin.full | Bayesian word training database | enable_word_training=yes |
There are also "incremental" files, named for example:
sc7.bin.incr.date
One or more of these files may be updated every 15 minutes.
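The incremental files interact with the max_incr_size option described earlier. Compiled caches (sc*.bin.tmp) are rebuilt only when a newer full rule file appears or when the incremental files grow too large. The following Python sketch expresses that rule for one numbered rule set.

It approximates the rule for reasoning about disk activity and is not the plugin's code. The following are assumptions:

- sc<N>.bin.tmp is the compiled cache of sc<N>.bin.full.*, as the table above suggests.
- "Newer" is judged by file modification time.
- All sc<N>.bin.incr* files are summed.

```python
from pathlib import Path


def needs_regeneration(data_dir, rule_set, max_incr_size=100000):
    """Decide whether the compiled cache sc<N>.bin.tmp should be rebuilt."""
    data = Path(data_dir)
    cache = data / f"sc{rule_set}.bin.tmp"
    if not cache.exists():
        return True  # nothing compiled yet
    cache_mtime = cache.stat().st_mtime
    # Assumption: "newer" means a full rule file modified after the cache was built.
    if any(p.stat().st_mtime > cache_mtime
           for p in data.glob(f"sc{rule_set}.bin.full*")):
        return True
    # Otherwise rebuild only when the incremental files exceed max_incr_size bytes.
    incr_total = sum(p.stat().st_size for p in data.glob(f"sc{rule_set}.bin.incr*"))
    return incr_total > max_incr_size
```

Raising max_incr_size, as the Fastest and Least CPU profiles recommend, makes the last comparison succeed less often. The caches are therefore rebuilt less frequently, at the cost of keeping more incremental data around.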
Launching the Plugin from the Command Shell.
The Plugin is a text-only application that can be launched from a command shell,
and it accepts some command line options.
When used with CommuniGate Pro as a helper application, the Plugin does not
require any command line options.
Rating Message Files.
The Plugin can be used to calculate the spam scores of a number of message files in a directory.
The syntax of the program is: CGPSpamCatcher RATE [options] <directory>
The <directory> must contain message files in RFC822 format
or in CommuniGate format (RFC822 with the envelope info), one message per file.
You can use messages from CommuniGate mailboxes in .mdir format.
Messages in .EML, .MSG, .mbox and other formats must be converted to the RFC822 format.
Be careful not to alter the messages: for example, "Received" headers are often accidentally
removed. Note that mail clients very often modify messages when they save them.
- The options are:
- -v
- Optional. Flag to output more info about the scoring.
Example: ./CGPSpamCatcher RATE -v /var/CommuniGate/Queue
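Messages stored in an mbox file must be split into one RFC822 message per file before they can be rated or used for training. The Python sketch below does this with the standard mailbox module. It reads each message's stored bytes with get_bytes() instead of re-serializing the message, so header lines such as Received: are written back unchanged. Two things are not preserved or undone:

- the mbox "From " separator line is dropped;
- any ">From " escaping added by the program that wrote the mbox stays in the body.

The function name and the output file names are arbitrary.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write each message of an mbox file to its own file; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for count, key in enumerate(box.iterkeys(), start=1):
            # get_bytes() returns the stored message without the "From " line and
            # without re-serializing its headers, so Received: lines stay intact.
            (out / f"{count:06d}").write_bytes(box.get_bytes(key))
    finally:
        box.close()
    return count
```

For example, after split_mbox("Junk.mbox", "spamdir") (hypothetical file names), the directory can be rated with ./CGPSpamCatcher RATE -v spamdir.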
Training the SpamCatcher Engine.
The Plugin application can add a directory of messages to the Training Database,
which is stored in the data/scoffset.bin.full and data/scoffset.bin.incr files.
Note: By default, the Training Database is disabled and ignored. If you want the Engine
to use the Training Database, the option use_score_offsets must be set to
yes. If it is enabled, the Engine will read the Training Database files the next time it is initialized.
To use the training program, you must first collect a directory of spam messages and/or a
directory of legitimate messages. Each message must be in RFC822 format or in CommuniGate
format (RFC822 with the envelope info), one message per file.
You can use messages from CommuniGate mailboxes in .mdir format.
Messages in .EML, .MSG, .mbox and other formats must be converted to the RFC822
format. Then you can use the training program to analyze the directories.
The syntax of the program is: CGPSpamCatcher TRAIN [options] <directory>
- The options are:
- -forget
- Optional. Specify this if you wish to remove the scoring offset set previously.
By default, the program will add the messages to the Training Database.
- -o <offset>
- Optional. If you are adding messages, specify the scoring offset as this parameter.
The value should be between -200 and 200.
-200 will cause the message to be treated as approved, while 200 will cause it to be treated as blocked.
- -score
- Optional. Compute scores of messages and factor them into future scoring of messages from the senders.
- -v
- Optional. Flag to output status of add and delete operations.
- -spam
- Optional. Indicates message is spam. Equivalent to specifying -o 200
- -ham
- Optional. Indicates message is not spam. Equivalent to specifying -o -200
- -clear
- Optional. Remove all entries. The Training Database files will be cleared.
- Examples:
- Approve all messages in the directory named messagedir:
./CGPSpamCatcher TRAIN -ham messagedir
- Block all messages in the directory named messagedir:
./CGPSpamCatcher TRAIN -spam messagedir
- Compute scores for messages in directory dir2. If the messages were sent
by the recipients of approved messages (as set by Example 1) then these scores
will be used in the analysis of future messages from those senders.
This can help reduce false positives.
./CGPSpamCatcher TRAIN -score dir2
- Forget about messages in a directory.
./CGPSpamCatcher TRAIN -forget messagedir
- Clear the database. All data set by previous training sessions
along with scoring history will be deleted.
./CGPSpamCatcher TRAIN -clear
Note: After running the CGPSpamCatcher application from the command shell to train
the database, you need to restart the CGPSpamCatcher process launched by CommuniGate Pro
so that the changes take effect for that process.
Either restart SpamCatcher from the CommuniGate WebAdmin interface, or create an "update.sig"
file with any contents in the current directory of the plugin,
for example: echo >update.sig
The Plugin will delete that file.
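The training commands and the restart signal can be combined in a small wrapper. This Python sketch is a hypothetical convenience script, not something shipped with the plugin. It does three things:

- it assumes the CGPSpamCatcher executable and its data directory live in the plugin's current directory (for example a hypothetical /opt/spamcatcher);
- it runs the TRAIN -spam and TRAIN -ham commands shown above;
- it then creates update.sig so that the process launched by CommuniGate Pro restarts its engine.

```python
import subprocess
from pathlib import Path


def train_and_reload(plugin_dir, spam_dir=None, ham_dir=None, dry_run=False):
    """Run CGPSpamCatcher TRAIN for spam/ham folders, then create update.sig."""
    plugin_dir = Path(plugin_dir)
    # Assumption: the executable lives in the plugin's current directory.
    exe = str(plugin_dir / "CGPSpamCatcher")
    commands = []
    if spam_dir is not None:
        commands.append([exe, "TRAIN", "-spam", str(spam_dir)])
    if ham_dir is not None:
        commands.append([exe, "TRAIN", "-ham", str(ham_dir)])
    if dry_run:
        return commands  # only report what would be run
    for cmd in commands:
        # Run from the plugin directory so that the data/ training files are found.
        subprocess.run(cmd, cwd=plugin_dir, check=True)
    # Ask the plugin process started by CommuniGate Pro to restart its engine;
    # the plugin deletes update.sig after noticing it.
    (plugin_dir / "update.sig").write_text("\n")
    return commands
```

With dry_run=True the function only returns the command lines, which is a convenient way to check the paths before any training is performed.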
Warning: Usage of training may result in worse accuracy than without usage of training.
This is due to efforts by spammers to poison training databases with
words which are deliberately the opposite of their intended message.
Reporting misclassified messages to MailShell
Important: If a message has a score of 0 or 100, it means the message was whitelisted
by data/approvedsenders or blacklisted by data/blockedsenders, respectively. There is no point in reporting
such messages, because the wrong score is the result of your whitelisting/blacklisting rather than a fault of the SpamCatcher Engine.
- The message being reported must be attached to the email (as a
message/rfc822 MIME attachment). This allows MailShell to get the
message in its original form, as it was when MailShell scanned the
message at the gateway.
The feedback messages should be mailed to one of the following addresses:
- false.positive@communigate.mailshell.com - for false positives
- missed.spam@communigate.mailshell.com - for false negatives
Feedback messages that are not submitted as an RFC822 attachment will
not be forwarded into the MailShell Service for evaluation. However,
these feedback messages will be tracked for statistical analysis purposes.
To report a message using Microsoft Outlook:
- Launch Outlook
- Open a new message window by clicking on the New button on the
Outlook toolbar or choose File > New > Message from the menu
options.
- Drag the misclassified message(s) onto the new message window to
attach them.
- Send the new message containing the attachments to one of the above listed feedback addresses.
To report a message using the CommuniGate Pro WebUser Interface:
- Open the misclassified message from the message list in a separate window
- Click "Forward" link (or icon, depending on the skin you use) to compose a feedback message
- Enter one of the above listed feedback addresses into "To:" input field
- Click "Send" button (or icon).
Evaluating the required license type.
The SpamCatcher License
limits the number of messages the Plugin can rate within any 60 minute
period of time. If the E-mail traffic exceeds the licensed limit, the Plugin will let
the messages go through unrated. Without the license you can rate up to 5
messages per hour.
To evaluate the required license type:
- create a Perl script with the contents below
- run it instead of the real Plugin, e.g. '/usr/bin/perl /home/user/license_count.pl'
as described above with a scanning Rule.
- set the logging level for the Content Filtering helper to All Info
- watch the CommuniGate Pro log on an hourly basis.
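#!/usr/bin/perl
# License-count stub: answers OK to every request and counts FILE requests.
use strict;
use warnings;

$| = 1;    # flush each response immediately; CommuniGate Pro waits for it
my $count = 0;
while (my $request = <STDIN>) {
  chomp $request;
  my ($seq, $command) = split(' ', $request);
  next unless defined $seq;    # ignore empty lines
  $count++ if defined $command && $command eq 'FILE';
  print "$seq OK $count messages scanned.\n";
}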
CommuniGate® Pro Guide. Copyright © 1998-2009, Stalker Software, Inc.