6.2.1 ORF Online Help

Regular Expressions in ORF


Perl-compatible regular expressions are supported by many ORF features, such as the sender and recipient email address whitelists and blacklists, keyword and attachment filtering, URL domain blacklisting, and the Log Viewer Find feature.

This help section provides a brief introduction to regular expressions and their implementation in ORF. Due to the complexity of the topic, this help file cannot teach you how to write regular expressions, but it can point you in the right direction.

Perl-compatible Regular Expressions

Introduction

Regular expressions may sound familiar to Unix/Linux/BSD administrators and software developers, as they are widely used in both worlds. They form a powerful string-matching toolkit that allows defining complex text masks such as "any word beginning with the letter "t" or the sequence "zorro", followed by a sequence of at least 5, but at most 8 digits" (this expression is .*\b(t|zorro)\d{5,8}\b.* by the way).

ORF uses case-insensitive regular expression matching, except where case sensitivity can be configured.
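
The introductory expression can be tried out with a short Python sketch. Note the assumptions: it uses Python's built-in re module rather than ORF's PCRE engine (the two agree on the basic syntax used here), the re.IGNORECASE flag stands in for ORF's default case-insensitive matching, and the sample strings are made up for illustration.

```python
import re

# The expression from the introduction. re.IGNORECASE approximates ORF's
# default case-insensitive matching (an assumption made for this sketch).
PATTERN = re.compile(r".*\b(t|zorro)\d{5,8}\b.*", re.IGNORECASE)


def matches(text: str) -> bool:
    """Return True if the whole text satisfies the expression."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from ORF.
    for sample in ["order t12345 shipped", "ZORRO123456", "t1234", "t123456789"]:
        print(sample, "->", matches(sample))
```

The last two samples fail because they carry 4 and 9 digits respectively, which falls outside the 5 to 8 range required by \d{5,8}.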

Engine

ORF uses the PCRE engine written by Philip Hazel. This engine provides high compatibility with Perl 5's regular expression engine and is used by projects like Python, Apache, PHP or Postfix.

The PCRE man pages are available at http://www.pcre.org/pcre.txt

Regex Basics

Commonly used wildcards

The most common regex wildcards are listed below:

Wildcard   Matches                           Negative   Matches
.          Any character
^          Beginning of a string
$          End of a string
\w         Any alphanumeric character        \W         Any non-alphanumeric character
\s         Any whitespace character          \S         Any character which is not a whitespace
\d         Any digit                         \D         Any character which is not a digit
\b         The beginning or end of a word    \B         A position that is NOT the beginning or end of a word

Used on their own, the character wildcards match a single occurrence. For example, \s matches a single whitespace character. The same applies to their negative versions: \D matches a single character which is not a digit. Note that ^, $, \b and \B match a position rather than a character.
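
The single-occurrence behavior can be observed with a minimal Python sketch. It assumes Python's re module in place of ORF's PCRE engine, and the input strings are invented for the example.

```python
import re

# \s consumes exactly one whitespace character per match.
whitespace_hits = re.findall(r"\s", "a b\tc")

# \D consumes exactly one character that is not a digit.
non_digit_hits = re.findall(r"\D", "a1b2")

# \b consumes nothing: it only asserts a word boundary, so "cat" is
# found as a whole word but not inside "concatenate".
word_hits = re.findall(r"\bcat\b", "cat concatenate cat")

# ^ and $ tie the expression to the beginning and end of the string.
starts_with_spam = re.search(r"^spam", "spam offer") is not None
ends_with_spam = re.search(r"spam$", "spam offer") is not None

print(whitespace_hits, non_digit_hits, word_hits)
print(starts_with_spam, ends_with_spam)
```

Each element returned for \s and \D is one character long, which is the single-occurrence behavior described above.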

Escaping characters

As you can see above, the dot character (.) is a wildcard. But what if you want to match the dot character itself? In this case, it has to be "escaped" by putting a backslash in front of it: \. will match the dot character, while . alone matches any character.
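
The difference between the escaped and unescaped dot is easy to check in a small Python sketch. Python's re module is assumed here instead of ORF's PCRE engine, and domainXcom is a made-up test string.

```python
import re

unescaped = re.compile(r"domain.com")   # "." matches any single character
escaped = re.compile(r"domain\.com")    # "\." matches only a literal dot

# The unescaped dot also accepts "X" in place of the dot.
loose_hit = unescaped.search("domainXcom") is not None

# The escaped dot rejects "X" and accepts only the real dot.
strict_miss = escaped.search("domainXcom") is None
strict_hit = escaped.search("domain.com") is not None

# Python's re.escape adds the required backslashes to a literal string.
auto_escaped = re.escape("domain.com")

print(loose_hit, strict_miss, strict_hit, auto_escaped)
```

Forgetting the backslash rarely causes missed matches, but it does make an expression match more than intended, which matters for whitelists and blacklists.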

Repetitions

If you want to match more than a single occurrence of any character or wildcard, you can do so by adding any of the following:

Repetition wildcard   Meaning
*                     Any number of repetitions (including none)
+                     One or more repetitions
?                     Zero or one repetition
{n}                   Exactly n repetitions
{n,m}                 At least n, but no more than m repetitions
{n,}                  At least n repetitions

For example, the expression

johnny\d{2}@domain\.com

will match addresses in which "johnny" is followed by exactly two digits, such as johnny07@domain.com and johnny42@domain.com, but not johnny7@domain.com.
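
The repetition wildcards can be exercised with the following Python sketch. It assumes Python's re module rather than ORF's PCRE engine; the helper names is_match and repeats, and all sample addresses, are hypothetical.

```python
import re

# The expression from the example above, matched case-insensitively.
ADDRESS = re.compile(r"johnny\d{2}@domain\.com", re.IGNORECASE)


def is_match(address: str) -> bool:
    """Return True if the expression is found anywhere in the address."""
    return ADDRESS.search(address) is not None


def repeats(expression: str, text: str) -> bool:
    """Return True if the expression matches the entire text."""
    return re.fullmatch(expression, text) is not None


if __name__ == "__main__":
    for address in ["johnny07@domain.com", "johnny7@domain.com", "johnny123@domain.com"]:
        print(address, "->", is_match(address))

    # One line per repetition wildcard, applied to the letter "b".
    print(repeats(r"ab*", "a"))         # * allows zero repetitions
    print(repeats(r"ab+", "a"))         # + needs at least one
    print(repeats(r"ab?", "abb"))       # ? allows at most one
    print(repeats(r"ab{2,}", "abbb"))   # {2,} needs two or more
    print(repeats(r"ab{1,2}", "abbb"))  # {1,2} allows at most two
```

A repetition wildcard applies only to the character or wildcard directly before it, so in ab* only the "b" is repeated.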

Alternatives

By using the pipe character (|), you can define an OR relation in your expression. For example, the expression

(john|mary)@domain\.com

will match both john@domain.com and mary@domain.com, but not an address with another name before the @ sign, such as peter@domain.com.
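
The role of the parentheses around the alternatives can be sketched in Python. This assumes Python's re module in place of ORF's PCRE engine, and the addresses used are invented examples.

```python
import re

grouped = re.compile(r"(john|mary)@domain\.com", re.IGNORECASE)
ungrouped = re.compile(r"john|mary@domain\.com", re.IGNORECASE)


def found(pattern: "re.Pattern[str]", address: str) -> bool:
    """Return True if the pattern is found anywhere in the address."""
    return pattern.search(address) is not None


if __name__ == "__main__":
    # With parentheses, the alternatives are "john" or "mary", each
    # followed by "@domain.com".
    print(found(grouped, "john@domain.com"))
    print(found(grouped, "mary@domain.com"))
    print(found(grouped, "peter@domain.com"))

    # Without parentheses, the alternatives are "john" (anywhere) or
    # "mary@domain.com", so an unrelated domain slips through.
    print(found(ungrouped, "john@example.org"))
    print(found(grouped, "john@example.org"))
```

Without the parentheses the pipe splits the whole expression in two, which is why grouping is needed to limit the alternatives to the name part.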

The above should be sufficient for constructing basic regular expressions. For more information about advanced regex techniques (such as positive and negative lookarounds, grouping, matching character classes, etc.), see the resources below.

Resources

A few online resources

Book

  • "Mastering Regular Expressions", by Jeffrey E.F. Friedl, published by O'Reilly, ISBN 1-56592-257-3

Tools

Copyright © Vamsoft Ltd. 2024. All rights reserved. Document ID regexs, version 1.