Cleaning up Keyword Blacklist filters
@scooter133:
Hello scooter133,
Unfortunately, there is no option in ORF that would generate such a report. However, the reports generated by the ORF Reporting Tool do include the list of keywords which had the most hits (ORF Reporting Tool: Navigation & Information pane > Tests > Keyword Blacklist). This might prove useful to you.
Having said that, we always recommend relying on the automated tests of ORF as much as possible. If you need hundreds of manual entries to keep spam at bay, that is usually a good indication of a sub-optimal configuration. If you receive excessive amounts of spam without the manual filters, please let us know and we will be happy to review your logs and configuration and provide suggestions.
Wow, 524 entries would be a luxury for us. We've been using ORF since the beginning, and several people here have done a lot of "send to > [ip/sender] blacklist" over time.
our ip blacklist: 8,530 entries (most are /32)
keyword blacklist: 1,173 (half are regex)
sender blacklist: 8,003 (most are wildcards)
Emails still flow through instantly, with no performance hit except when saving new entries :)
@scooter133:
You know, I was thinking about your question: what if you craft a log search for a timeframe, limited to events triggered by the keyword blacklist, then export that log result to Excel and sort by the comment column? Each line SHOULD say "comment: something", where "something" uniquely matches one of your keyword blacklist entries.
Then in Excel just have it count occurrences of each comment, and put the counts into a table to the side.
If your keyword blacklist entries don't have comments, you can export them to CSV, open them in Excel, add comments (even if just incrementing 1...2...3...4 etc.), save back to CSV, then import-overwrite back into the keyword blacklist. This would only work for tallying from today forward, since the comments didn't exist before.
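The same tally could also be done with a short script instead of Excel. This is only a rough sketch: the CSV layout, the "Message" column name, the exact "comment: ..." wording, and the sample rows are all made up for illustration, so adjust them to whatever your actual log export looks like.

```python
import csv
import io
import re
from collections import Counter

# Made-up sample of an exported log search. The "Message" column name and the
# "(comment: ...)" wording are assumptions, not the real ORF export format.
SAMPLE_EXPORT = """Time,Message
2015-03-01 10:02,"Blacklisted by keyword filter (comment: kw-0012)"
2015-03-01 10:05,"Blacklisted by keyword filter (comment: kw-0007)"
2015-03-02 08:41,"Blacklisted by keyword filter (comment: kw-0012)"
"""

# Capture everything after "comment:" up to a closing parenthesis or quote.
COMMENT_RE = re.compile(r'comment:\s*([^)"]+)', re.IGNORECASE)


def tally_comments(csv_file):
    """Count how many exported log rows mention each filter comment."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        match = COMMENT_RE.search(row.get("Message") or "")
        if match:
            counts[match.group(1).strip()] += 1
    return counts


counts = tally_comments(io.StringIO(SAMPLE_EXPORT))
for comment, hits in counts.most_common():
    print(comment, hits)
```

For a real export, pass `open("export.csv", newline="")` (a hypothetical file name) instead of the `io.StringIO` sample.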
@Bryon: ... you could even have it pivot a table that references the date column into a range for each keyword comment... and optionally do "text to columns" to split out the actual comment from the rest of that line...
@Bryon: We log to a Syslog server, and the Syslog server then logs to a SQL Server DB. I have crafted queries that pull out occurrences of comment strings, though the result is pretty incomplete, as the only entries that show up are the ones with hits. I could then assume that if an entry is not there, it is not being used. I was hoping for a HitCount like in the Statistics page to show us which of the entries are more popular and which are not used.
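One way to get the "not there, so not used" list directly is to load the exported filter comments into a table of their own and left-join it against the hits. The sketch below uses an in-memory SQLite database so that it runs anywhere; the table names, column names, and sample rows are hypothetical, and the query would need adapting to a real SQL Server schema.

```python
import sqlite3


def unused_filters(conn, since):
    """Return comments of filters with no logged hit on or after `since`.

    `since` is an ISO date string such as "2015-05-01".
    """
    rows = conn.execute(
        """
        SELECT f.comment
        FROM keyword_filters AS f
        LEFT JOIN log_hits AS h
               ON h.comment = f.comment AND h.logged_at >= ?
        WHERE h.comment IS NULL      -- no matching hit in the window
        ORDER BY f.comment
        """,
        (since,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keyword_filters (comment TEXT PRIMARY KEY);
    CREATE TABLE log_hits (comment TEXT, logged_at TEXT);
    INSERT INTO keyword_filters VALUES ('kw-0001'), ('kw-0002'), ('kw-0003');
    INSERT INTO log_hits VALUES ('kw-0001', '2015-06-01'),
                                ('kw-0002', '2015-01-15');
    """
)

print(unused_filters(conn, "2015-05-01"))  # ['kw-0002', 'kw-0003']
```

The date condition sits in the join rather than the WHERE clause, so a filter whose only hits are older than the cutoff is still reported as unused.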
@scooter133:
Since ORF does not do it for us, the only way I know of is to dump the ORF logs into a database table and run your own queries to see which rules are firing on a regular basis, then use that result to infer which other rules are no longer useful. I have not gotten around to this yet, since our rule counts are not yet out of control.
I respectfully disagree with Vamsoft's position on this. I think there are times when we need to add our own rules to handle new, unique blasts of spam that are not caught initially even in a well-configured ORF system, to stop them in the early days (hours?) before (SU)RBLs or other services might tag them for us. Over time these one-off rules accumulate. So your idea of a cleanup is a good one, but there is no built-in mechanism for this.
@Bryon:
Just a tip:
Rather than adding /32 addresses, next time you may want to look up the IP address on a service like tcpiputils.com, see what the network range is, go back to your ORF log, use a setting like 10 days of logs, and check that entire range to look at the message history. This will clearly show you when you have a snowshoe spammer, for example.
Then, it's up to you, but it may be easier to start entering IP ranges of at least /24 when you are confident that there is no good mail coming from that range.
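If you already have a list of sender IPs pulled from the logs, a few lines of script can show which /24 networks keep turning up with many different addresses. This is a sketch only: the addresses are documentation-range placeholders, the threshold of 3 is arbitrary, and it does not replace checking the range for legitimate mail before blocking it.

```python
import ipaddress
from collections import defaultdict


def summarize_by_network(ips, prefix=24):
    """Group IPv4 addresses by their enclosing /prefix network."""
    groups = defaultdict(set)
    for ip in ips:
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[network].add(ipaddress.ip_address(ip))
    return groups


# Placeholder addresses standing in for spam senders found in the logs.
spam_sources = [
    "192.0.2.17", "192.0.2.44", "192.0.2.201", "192.0.2.17",
    "198.51.100.9",
]

groups = summarize_by_network(spam_sources)
for network, hosts in sorted(groups.items()):
    if len(hosts) >= 3:  # arbitrary threshold for "worth a closer look"
        print(network, len(hosts), "distinct senders")
```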
I use a separate database fed by ORF logs so I have a more precise history of the activity but that takes work to set up.
If you have any clever scripters you could perhaps consolidate your IP blacklist down by combining the individual IP addresses into ranges but that also would take a bit of work.
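For the consolidation part, Python's standard `ipaddress` module already does most of the work. A minimal sketch, assuming the blacklist can be exported as one CIDR entry per line (the entries below are placeholders): `collapse_addresses` merges adjacent blocks and drops entries already covered by a wider range. It never widens a range to cover addresses that are not on the list.

```python
import ipaddress


def consolidate(entries):
    """Merge adjacent and overlapping IPv4 CIDR entries into fewer ranges."""
    networks = [ipaddress.ip_network(entry, strict=False) for entry in entries]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


# Placeholder entries standing in for an exported IP blacklist.
entries = [
    "192.0.2.0/32", "192.0.2.1/32", "192.0.2.2/32", "192.0.2.3/32",
    "198.51.100.7/32",   # already covered by the /24 below
    "198.51.100.0/24",
]

print(consolidate(entries))  # ['192.0.2.0/30', '198.51.100.0/24']
```

IPv4 and IPv6 entries would have to be collapsed in separate calls, since `collapse_addresses` raises a `TypeError` on mixed versions.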
Looking at the reports, I ran one from January 1, 2015 to now: 1,485,805 emails.
There is a section "Report Tests Keyword Blacklist".
You can list the top 100 Keyword Blacklist filters.
The last several had only 9 hits, out of 503,969 checks and 29,016 hits overall.
They start to drop off at #70, to fewer than 30 hits.
So I can take those top 100 and make sure I keep them, and/or aggregate them where possible into better regexes.
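As a rough idea of what aggregating could look like, several simple text entries can be folded into a single alternation. The sketch below uses Python's `re` syntax and made-up keywords; ORF's regex dialect may differ, so treat the generated pattern as a starting point to check against the ORF documentation rather than something to paste in blindly. It says nothing about whether the combined form is faster.

```python
import re


def combine_keywords(keywords):
    """Build one alternation pattern that matches any of the literal keywords."""
    # Longest first, so a longer phrase is tried before a shorter one it contains.
    unique = sorted(set(keywords), key=lambda k: (-len(k), k))
    # re.escape keeps characters like "%" or "$" literal inside the pattern.
    return "(?:" + "|".join(re.escape(k) for k in unique) + ")"


# Made-up simple text entries, for illustration only.
simple_entries = ["cheap meds", "100% free", "$$$", "cheap meds"]

pattern = re.compile(combine_keywords(simple_entries), re.IGNORECASE)

print(bool(pattern.search("Buy CHEAP MEDS today")))    # True
print(bool(pattern.search("Quarterly report attached")))  # False
```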
I have 524 Keyword Blacklist filters. Is there a way to get a report on them to see which ones have NOT been used in X number of days? My original list was imported from many people's configs over time, and I think there is a lot of bloat.
Is a regex more efficient than simple text? I have many simple text entries that could possibly be combined with others and converted to a regex.
Thanks,