Emoji REGEX Keyword Blacklist
@SteveC:
Hello SteveC,
There is no script because you can use regex to match emojis by their Unicode code point. For example, to block emoticons you create a character class to match the entire "Emoticons" Unicode block (U+1F600 - U+1F64F) in the "Supplementary Multilingual Plane" (see: https://www.compart.com/en/unicode/plane/U+10000). The regex pattern would look like this:
.*[\x{1F600}-\x{1F64F}]
If you want to block the "Miscellaneous Symbols and Pictographs" block (U+1F300 - U+1F5FF) as well, you just add its range to the character class:
.*[\x{1F600}-\x{1F64F}\x{1F300}-\x{1F5FF}]
And so on.
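To try the same ranges outside ORF, here is a minimal Python sketch. It is only an illustration: Python's `re` module does not accept the `\x{...}` escape used in the patterns above, so the code points are written as `\U000XXXXX` escapes instead, and the helper name `contains_emoji` is made up for this example.

```python
import re

# The same two ranges as the second pattern above (Emoticons plus
# Miscellaneous Symbols and Pictographs), in Python's escape syntax.
EMOJI_CLASS = re.compile("[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF]")

def contains_emoji(text: str) -> bool:
    """Return True if text contains at least one character from the two blocks."""
    return EMOJI_CLASS.search(text) is not None

print(contains_emoji("Win a prize \U0001F600"))  # grinning face, U+1F600
print(contains_emoji("Plain text only"))
```

The leading `.*` from the ORF patterns is left out here because `search` already scans the whole string.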
If you want to block emoji-like characters, consider blocking characters listed in the Unicode blocks below:
Miscellaneous Symbols and Pictographs:
https://www.compart.com/en/unicode/block/U+1F300
Emoticons:
https://www.compart.com/en/unicode/block/U+1F600
Supplemental Symbols and Pictographs:
https://www.compart.com/en/unicode/block/U+1F900
Miscellaneous Symbols:
https://www.compart.com/en/unicode/block/U+2600
Transport and Map Symbols:
https://www.compart.com/en/unicode/block/U+1F680
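As a rough sketch, the five blocks listed above can be turned into one pattern in the same `\x{...}` style as the earlier examples. The ranges are the full Unicode block ranges; the function name `build_pattern` is hypothetical, and whether ORF accepts the combined pattern exactly as generated is an assumption based on the patterns shown earlier in this thread.

```python
# (block name, first code point, last code point) for the blocks listed above
BLOCKS = [
    ("Miscellaneous Symbols and Pictographs", 0x1F300, 0x1F5FF),
    ("Emoticons", 0x1F600, 0x1F64F),
    ("Supplemental Symbols and Pictographs", 0x1F900, 0x1F9FF),
    ("Miscellaneous Symbols", 0x2600, 0x26FF),
    ("Transport and Map Symbols", 0x1F680, 0x1F6FF),
]

def build_pattern(blocks):
    """Join the block ranges into a single character class, prefixed with '.*'."""
    ranges = "".join(f"\\x{{{lo:X}}}-\\x{{{hi:X}}}" for _, lo, hi in blocks)
    return f".*[{ranges}]"

print(build_pattern(BLOCKS))
```

Note that "Miscellaneous Symbols" (U+2600 - U+26FF) also contains characters that are not emoji, so it may be worth testing that range against legitimate mail before blocking on it.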
If you have any questions, just let me know.
@Daniel Novak (Vamsoft): That reply is ideal, thank you. You have confirmed exactly what I was after.
Hi Daniel,
It would be good to see the actual symbol in the test text field. Maybe that is something that could be added in a future version.
Further to my previous issue: unfortunately, that has not worked. It turns out the emoji is not an official release, hence the issue I faced. Below is an example of the email code in question. The emails are originally sent from Hotmail accounts. My only option is submitting copies of the abuse to Microsoft, who are taking action.
Here is sample code from the emails:
<div style="text-align: center; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span id="🎁">🎁</span><span id="💳">💳</span><span id="💰">💰</span><br>
</span></div>
<div style="text-align: center; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><span id="💲">💲</span>6994.13<span id="💲">💲</span><br>
</span></span></div>
Other examples include £ instead of $. There are many with emoji ticks, diamonds and suchlike.
The text seems to be AI-generated and varies in the language used, e.g. English, Russian, etc.
All emails have PDF attachments with randomly generated word names, e.g. the.pdf, just.pdf, what.pdf, etc.
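One way to check which Unicode block the characters in the sample belong to is to print their code points. The sketch below assumes the characters shown in the sample are the same characters that appear in the raw message; a message may instead carry them as HTML entities or inside an encoded MIME body, which this sketch does not cover. The helper name `in_block` is made up for this example.

```python
# Characters copied from the sample markup above.
SAMPLE = "🎁💳💰💲"

# Miscellaneous Symbols and Pictographs (U+1F300 - U+1F5FF), the range
# added in Daniel's second pattern.
BLOCK_START, BLOCK_END = 0x1F300, 0x1F5FF

def in_block(ch: str) -> bool:
    """Return True if the character's code point falls inside the block."""
    return BLOCK_START <= ord(ch) <= BLOCK_END

for ch in SAMPLE:
    print(f"U+{ord(ch):04X} in block: {in_block(ch)}")
```

If the code points do fall inside a range that is already blocked and the message still gets through, the difference is more likely in how the message encodes the characters than in the range itself, so comparing against the raw message source would be the next step.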
@SteveC:
Hi Steve,
Daniel's example does not include any Unicode icon. You can extend his regex yourself. Take a look here:
https://unicodelookup.com/
and compare those symbols with Daniel's links.
For example, with a Transport and Map Symbols range added:
.*[\x{1F600}-\x{1F64F}\x{1F300}-\x{1F5FF}\x{1F680}-\x{1F6FA}]
@SteveC:
Well, that is absolutely normal: you can't stop all spam with keywords alone (at least not without a lot of false positives ;))
Without more information, nobody will be able to help. I suggest that you contact Vamsoft customer support, send them some examples, and let them help you out.
HTH
@NorbertFe:
Hi Norbert,
With regard to your request: it's already in the works :) Actually, it was supposed to be an R0x update for the 6.4 Beta, but we have decided to postpone it and add Unicode character support not only to the test fields, but to all fields where possible. You can expect this improvement in the first minor update after v6.4 is released.
Thank you for your patience.
Daniel, excellent news.
@NorbertFe thanks for the additional code. Yes, it is impossible to do everything with just keywords. I have created a range of very effective REGEX scripts which are intelligent, and I have not seen any false positives. We use them as part of a multilayered filtration process.
@SteveC:
Hello SteveC,
Could you send some spam samples to ? We will help you to come up with a regex pattern for the Keyword Blacklist.
In addition, please send us your ORF configuration file for a review. There might be a setting or test that you have not enabled yet that could solve the whole spam problem. The ORF configuration file is called "orfent.ini" and can be found in the "C:\ProgramData\ORF Fusion" folder.
I am trying to create an emoji REGEX keyword blacklist script. Is ORF able to cope with this? If yes, has anyone got a script that works?