Emoji REGEX Keyword Blacklist - ORF Forums

1

I am trying to create an emoji REGEX keyword blacklist script. Is ORF able to cope with this? If yes, has anyone got a script that works?

by SteveC 3 years ago
2

@SteveC: Hello SteveC,

There is no ready-made script, but you do not need one: regex can match emojis by their Unicode code points. For example, to block emoticons, create a character class that matches the entire "Emoticons" Unicode block (U+1F600 - U+1F64F) in the "Supplementary Multilingual Plane" (see: https://www.compart.com/en/unicode/plane/U+10000). The regex pattern would look like this:

.*[\x{1F600}-\x{1F64F}]

If you want to block the "Miscellaneous Symbols and Pictographs" block (U+1F300 - U+1F5FF) as well, you just add its range to the character class:

.*[\x{1F600}-\x{1F64F}\x{1F300}-\x{1F5FF}]

And so on.
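
To try the same ranges outside ORF, here is a minimal Python sketch. It is only an illustration, not ORF's own engine: Python's `re` module does not accept the `\x{...}` notation used above, so the code points are written as `\U0001F600`-style escapes instead. The names `EMOJI_PATTERN` and `contains_blocked_emoji` are hypothetical, and the leading `.*` is dropped because `search()` already scans the whole string.

```python
import re

# Same two ranges as the pattern above, written with Python's \UXXXXXXXX escapes
# (Python's re module does not understand the \x{...} form).
EMOJI_PATTERN = re.compile(
    "[\U0001F600-\U0001F64F"   # Emoticons
    "\U0001F300-\U0001F5FF]"   # Miscellaneous Symbols and Pictographs
)


def contains_blocked_emoji(text: str) -> bool:
    """Return True if text contains a character from either block."""
    return EMOJI_PATTERN.search(text) is not None


if __name__ == "__main__":
    print(contains_blocked_emoji("Hello \U0001F600"))  # contains U+1F600
    print(contains_blocked_emoji("Plain ASCII text"))  # no emoji at all
```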

If you want to block emoji-like characters, consider blocking characters listed in the Unicode blocks below:

Miscellaneous Symbols and Pictographs:
https://www.compart.com/en/unicode/block/U+1F300

Emoticons:
https://www.compart.com/en/unicode/block/U+1F600

Supplemental Symbols and Pictographs:
https://www.compart.com/en/unicode/block/U+1F900

Miscellaneous Symbols:
https://www.compart.com/en/unicode/block/U+2600

Transport and Map Symbols:
https://www.compart.com/en/unicode/block/U+1F680
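
If you would rather not type the ranges by hand, the pattern can be generated from the block boundaries. The following Python sketch assumes the standard Unicode ranges for the five blocks listed above; `BLOCKS` and `build_pattern` are hypothetical names, and the output uses the same `\x{...}` notation as the patterns in this thread.

```python
# Block boundaries as published in the Unicode block charts.
BLOCKS = [
    ("Miscellaneous Symbols and Pictographs", 0x1F300, 0x1F5FF),
    ("Emoticons", 0x1F600, 0x1F64F),
    ("Supplemental Symbols and Pictographs", 0x1F900, 0x1F9FF),
    ("Miscellaneous Symbols", 0x2600, 0x26FF),
    ("Transport and Map Symbols", 0x1F680, 0x1F6FF),
]


def build_pattern(blocks):
    """Build a '.*[...]' pattern in the \\x{...} notation used in this thread."""
    ranges = "".join(
        f"\\x{{{start:X}}}-\\x{{{end:X}}}" for _name, start, end in blocks
    )
    return ".*[" + ranges + "]"


if __name__ == "__main__":
    # Prints one pattern covering all five blocks.
    print(build_pattern(BLOCKS))
```

Note that "Miscellaneous Symbols" (U+2600 - U+26FF) also contains non-emoji characters such as card suits and weather symbols, so it may be worth testing that block separately before blacklisting it.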

If you have any questions, just let me know.

by Daniel Novak (Vamsoft) 3 years ago
(in reply to this post)

3

@Daniel Novak (Vamsoft): That reply is ideal, thank you. You have confirmed exactly what I was after.

by SteveC 3 years ago
(in reply to this post)

4

@SteveC: I am glad I was able to help :)

by Daniel Novak (Vamsoft) 3 years ago
(in reply to this post)

5

Hi Daniel,

It would be good to see the actual symbol within the test text field. Maybe that is something that could be added in a future version.

by NorbertFe 3 years ago
6

Further to my previous issue, unfortunately that has not worked. It turns out the emoji is not an official release, hence the issue I faced. Below is an example of the email markup in question. The emails were originally sent from Hotmail accounts. My only option is submitting copies of the abuse to Microsoft, who are taking action.

Here is sample code from the emails:

<div style="text-align: center; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span id="🎁">🎁</span><span id="💳">💳</span><span id="💰">💰</span><br>
</span></div>
<div style="text-align: center; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><span id="💲">💲</span>6994.13<span id="💲">💲</span><br>
</span></span></div>
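
When a pattern does not match as expected, one way to narrow it down is to list the code points a sample really contains and compare them with the ranges in the pattern. The Python sketch below does that for the four emoji in the sample above; `SAMPLE` and `describe_non_ascii` are hypothetical names, and the emoji are written as escapes so the snippet stays plain ASCII. It only inspects decoded text and makes no assumption about how the raw message is encoded.

```python
import unicodedata

# The four emoji from the sample above, written as escapes.
SAMPLE = "\U0001F381\U0001F4B3\U0001F4B0\U0001F4B2 6994.13"


def describe_non_ascii(text):
    """Return (code point, Unicode name) pairs for each distinct
    non-ASCII character, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 0x7F and ch not in seen:
            seen.append(ch)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in seen
    ]


if __name__ == "__main__":
    for code_point, name in describe_non_ascii(SAMPLE):
        print(code_point, name)
```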

by SteveC 3 years ago
7

Other examples include £ instead of $. There are many with emoji ticks, diamonds and such like.

The text seems to be AI generated and varies in the languages used, e.g. English, Russian etc.

All emails have PDF attachments with randomly generated single-word names, e.g. the.pdf, just.pdf, what.pdf etc.

by SteveC 3 years ago
9

@SteveC: Hi Steve,

Daniel's example does not include any Unicode icon. You can extend his regex yourself. Look up the symbols here:
https://unicodelookup.com/
and compare them with Daniel's links.
For example:
.*[\x{1F600}-\x{1F64F}\x{1F300}-\x{1F5FF}\x{1F680}-\x{1F6FA}]

by NorbertFe 3 years ago
(in reply to this post)

10

@SteveC: Well, it is absolutely normal that you can't stop all spam with keywords alone (at least not without a lot of false positives ;))
Without more information, nobody will be able to help. I suggest that you contact customer support at Vamsoft, send them some examples, and let them help you out.

HTH

by NorbertFe 3 years ago
(in reply to this post)

11

@NorbertFe: Hi Norbert,

With regard to your request: it's already in the works :) Actually, it was supposed to be an R0x update for the 6.4 Beta, but we have decided to postpone it and add Unicode character support not only to the test fields, but to all fields where possible. You can expect this improvement in the first minor update after v6.4 is released.

Thank you for your patience.

by Daniel Novak (Vamsoft) 3 years ago
(in reply to this post)

12

Daniel, excellent news.

@NorbertFe thanks for the additional code. Yes, it is impossible to do everything with just keywords. I have created a range of very effective REGEX scripts that are intelligent, and I have not seen any false positives. We use them as part of a multilayered filtration process.

by SteveC 3 years ago
13

@SteveC: Hello SteveC,

Could you send some spam samples to ? We will help you to come up with a regex pattern for the Keyword Blacklist.

In addition, please send us your ORF configuration file for a review. There might be a setting or test that you have not enabled yet that could solve the whole spam problem. The ORF configuration file is called "orfent.ini" and can be found in the "C:\ProgramData\ORF Fusion" folder.

by Daniel Novak (Vamsoft) 3 years ago
(in reply to this post)
