regex matches when it shouldn't?

1

Hello,

I just discovered that our sender-blacklist regex:
.*@*ebay*

seems to match ANYTHING that has "eba" anywhere before or after the @ sign... disregarding the Y completely.

The goal was to block "anything followed by an @ sign followed by anything which includes the letters ebay anywhere after the @ sign"

Is the Y sacred or something? Why does the regex disregard it?

http://img257.imageshack.us/img257/791/ebayhuh.th.jpg

by Bryon Humphrey 7 years ago
2

oh wrong screenshot link, here's the full size one:
http://imageshack.us/photo/my-images/257/ebayhuh.jpg/

by Bryon Humphrey 7 years ago
3

@Bryon Humphrey: actually, the current expression means "any character (any number of repetitions), followed by @ (any number of repetitions), followed by eba followed by y (any number of repetitions)". You should try

.*@.*ebay.*

instead.
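
To compare the two patterns side by side, here is a minimal sketch in Python. It assumes the filter does an unanchored (substring) search, which is what the behaviour described in this thread suggests; `re.search` stands in for ORF's own matcher, and the sample addresses are made up for illustration.

```python
import re

# The pattern from the first post and the suggested replacement.
original = re.compile(r".*@*ebay*")
corrected = re.compile(r".*@.*ebay.*")

# Hypothetical sender addresses, chosen only to exercise the patterns.
samples = [
    "seller@ebay.com",          # "ebay" right after the @
    "someone@mail.ebay.co.uk",  # "ebay" further to the right of the @
    "ebay-fan@example.com",     # "ebay" only to the left of the @
    "rebate@example.com",       # contains "eba" but never "ebay"
]

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

# Map each sample to (matched by original, matched by corrected).
results = {s: (matches(original, s), matches(corrected, s)) for s in samples}

for sample, (old, new) in results.items():
    print(f"{sample:28} original={old!s:5} corrected={new}")
```

Under these assumptions the original pattern matches all four samples, while the corrected one matches only the two where "ebay" appears after the @.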

by Krisztian Fekete (Vamsoft) 7 years ago
4

I see, I will give that a whirl.

Curious though: why does it match when I don't include a Y in the sample and the "eba" is left of the @?

by bryon 7 years ago
5

Because * in regular expressions means "any number of repetitions" including zero repetitions.

* means zero or more repetitions
+ means one or more repetitions
? means zero or one repetition

So even if the @ character is missing, the original expression will match any text that includes "eba".
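
The three quantifiers can be tried out with a small Python sketch. This is only an illustration using Python's `re` module, not ORF itself; `re.fullmatch` is used here so that the whole test string has to fit the pattern, and the test strings are made up.

```python
import re

def full(pattern, text):
    """Return True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# Strings with zero, one and three trailing "y" characters.
texts = ("eba", "ebay", "ebayyy")

# In each pattern the quantifier applies only to the final "y".
table = {
    "ebay*": [full(r"ebay*", t) for t in texts],  # zero or more y
    "ebay+": [full(r"ebay+", t) for t in texts],  # one or more y
    "ebay?": [full(r"ebay?", t) for t in texts],  # zero or one y
}

for pattern, row in table.items():
    print(pattern, dict(zip(texts, row)))
```

Only `ebay*` and `ebay?` accept "eba" with no "y" at all, which is why the original expression ignores the Y.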

by Krisztian Fekete (Vamsoft) 7 years ago
6

So if I understand correctly (and I don't mean to turn this into a regex class, I just like to learn everything I can):

.*@*ebay* would mean anything including nothing, with an @ sign after it, followed by anything including nothing, followed by the characters ebay, followed by anything including nothing.

Doesn't that force "ebay" to be to the right of the @ sign? How does it take the Y out of the equation? Or does the final * modify the Y (and in that case the middle * modifies the @)?

by Bryon 7 years ago
7

The expression can be broken into 4 parts:

.* means any character, any number of repetitions (zero or more) followed by
@* meaning @ character, any number of repetitions (zero or more) followed by
eba, followed by
y* meaning the y character, any number of repetitions (zero or more)

The last part takes the Y out of the equation because the * quantifier (any repetitions, zero or more) always applies to the element immediately before it, which here is the single character y. The @ is taken out in the same way, so "eba" will match regardless of its position relative to the @ character; moreover, it will also match if the @ is absent.
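
The four-part breakdown can be made visible by wrapping each part in a capturing group. This is a sketch in Python's `re` module (an assumption; ORF's engine is not being run here), the capturing groups are added purely for illustration, and the sample strings are hypothetical.

```python
import re

# The original expression, with each of the four parts in its own group.
pattern = re.compile(r"(.*)(@*)(eba)(y*)")

def parts(text):
    """Return what each of the four parts matched, or None if no match."""
    m = pattern.search(text)
    return m.groups() if m else None

no_at_no_y = parts("rebate")           # no @ and no y anywhere
with_both = parts("seller@ebay.com")   # both @ and y present
no_eba = parts("seller@example.com")   # "eba" never appears

print(no_at_no_y)
print(with_both)
print(no_eba)
```

For "rebate" the `@*` and `y*` groups both come back as empty strings, so the match rests entirely on the literal "eba". In "seller@ebay.com" the @ is consumed by the greedy `.*`, and `@*` again matches nothing.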

by Krisztian Fekete (Vamsoft) 7 years ago
8

Oh, I see: so the period turns on a general wildcard for the asterisk, but without the period, the asterisk modifies the previous character.

Thanks for taking the time to explain that to me.

by Bryon Humphrey 7 years ago
9

@Bryon Humphrey: you are welcome :) Yes, basically in regular expressions the dot character is the wildcard for any single character, and the number of repetitions is controlled by the trailing * character. If you want to match the dot character itself in a regex, you should "escape" it using a backslash, like:

.*@vamsoft\.com
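
The effect of the backslash can be checked with a short Python sketch. As before, `re.search` is an assumed stand-in for the filter's matcher, and the second address is a made-up example of something the unescaped pattern would catch by accident.

```python
import re

escaped = re.compile(r".*@vamsoft\.com")   # the dot must be a literal "."
unescaped = re.compile(r".*@vamsoft.com")  # the dot matches any one character

real_domain = "user@vamsoft.com"
lookalike = "user@vamsoftxcom.example"  # hypothetical: "x" where the dot should be

escaped_hits = [escaped.search(s) is not None for s in (real_domain, lookalike)]
unescaped_hits = [unescaped.search(s) is not None for s in (real_domain, lookalike)]

print("escaped:  ", escaped_hits)
print("unescaped:", unescaped_hits)
```

Both patterns match the real domain, but only the unescaped one also matches the lookalike address.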

by Krisztian Fekete (Vamsoft) 7 years ago
10

Your regexp should be .*@.*ebay.* if you want to filter any string that contains @ebay or @(any number of characters)ebay.

by sungpill Han 7 years ago
