regex matches when it shouldnt? - ORF Forums


1

Hello,

I just discovered that our sender-blacklist regex:
.*@*ebay*

seems to match ANYTHING that has "eba" anywhere before or after the @ sign... disregarding the Y completely.

The goal was to block "anything followed by an @ sign followed by anything which includes the letters ebay anywhere after the @ sign"

Is the Y sacred or something? why does the regex disregard it?

http://img257.imageshack.us/img257/791/ebayhuh.th.jpg

by Bryon Humphrey 7 years ago
2

oh wrong screenshot link, here's the full size one:
http://imageshack.us/photo/my-images/257/ebayhuh.jpg/

by Bryon Humphrey 7 years ago
3

@Bryon Humphrey: actually, the current expression means "any character (any number of repetitions), followed by @ (any number of repetitions), followed by eba followed by y (any number of repetitions)". You should try

.*@.*ebay.*

instead.
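A quick way to see the difference between the two expressions is to run both against a few sample senders. The sketch below uses Python's `re` module and unanchored `re.search`, which is an assumption: ORF's own regex engine and matching mode may differ in details. The sample addresses are made up for illustration.

```python
import re

OLD = r".*@*ebay*"      # the original blacklist expression
NEW = r".*@.*ebay.*"    # the suggested replacement

# Hypothetical sender addresses, chosen to exercise both expressions.
samples = [
    "seller@ebay.com",
    "news@mail.ebay.co.uk",
    "rebate@example.com",   # "eba" left of the @, no "y"
    "ebay@example.com",     # "ebay" left of the @ only
    "alice@example.com",
]

def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

results = {s: (matches(OLD, s), matches(NEW, s)) for s in samples}

for sender, (old_hit, new_hit) in results.items():
    print(f"{sender:24} old={old_hit!s:5} new={new_hit}")
```

Under these assumptions the old expression hits every sample containing "eba", while the new one only hits senders with "ebay" somewhere after the @.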

by Krisztian Fekete (Vamsoft) 7 years ago

4

I see, i will give that a whirl

Curious tho, why does it match when i don't include a Y in the sample and the eba is left of the @?

by bryon 7 years ago
5

Because * in regular expressions means "any number of repetitions" including zero repetitions.

* means zero or more
+ means one or more
? means zero or one

So even if the @ character is missing, the original expression will match any text including "eba"
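The three quantifiers can be compared side by side with a small check. This is only a sketch in Python's `re` module (an assumption, not necessarily the engine ORF uses); `full` is a helper name made up for this example and requires the whole text to match.

```python
import re

def full(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

texts = ["", "y", "yyy"]

for pattern in ["y*", "y+", "y?"]:
    row = [full(pattern, text) for text in texts]
    print(pattern, dict(zip(texts, row)))

# Because both @* and y* may match zero characters, the original
# expression is satisfied by any text that merely contains "eba".
matches_without_at_or_y = re.search(r".*@*ebay*", "rebate") is not None
print(matches_without_at_or_y)
```

Only `y+` rejects the empty text, and only `y?` rejects "yyy".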

by Krisztian Fekete (Vamsoft) 7 years ago
6

so if i understand correctly - and i don't mean to turn this into a regex class, i just like to learn everything i can:

.*@*ebay* would mean anything including nothing, with an @ sign after it, followed by anything including nothing, followed by the characters ebay, followed by anything including nothing

doesn't that force "ebay" to be to the right of the @ sign? how does it take the Y out of the equation? or does the final * modify the Y (and in that case the middle * modifies the @ ?)

by Bryon 7 years ago
7

The expression can be broken into 4 parts:

.* means any character, any number of repetitions (zero or more) followed by
@* meaning @ character, any number of repetitions (zero or more) followed by
eba, followed by
y* meaning the y character, any number of repetitions (zero or more)

The last part takes the y out of the equation because the * quantifier (zero or more repetitions) always applies to the element immediately before it, here the single character y. The @ is taken out the same way, so "eba" will match regardless of where it sits relative to the @ character; it will even match if @ is absent.
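To make the four parts visible, each one can be wrapped in a capture group and inspected after a match. The groups are added purely for illustration and are not part of the original expression; the sketch assumes Python's `re` module with unanchored search, which may not be exactly how ORF evaluates the expression.

```python
import re

# Same expression as .*@*ebay*, with one capture group per part.
PARTS = re.compile(r"(.*)(@*)(eba)(y*)")

def split_match(text):
    """Return what each of the four parts consumed, or None if no match."""
    m = PARTS.search(text)
    return m.groups() if m else None

for sample in ["seller@ebay.com", "rebate@example.com", "alice@example.com"]:
    print(sample, "->", split_match(sample))
```

In both matching samples the `@*` group comes back empty. For "seller@ebay.com" the greedy leading `.*` has already consumed the @, and for "rebate@example.com" the match ends before the @ is ever reached.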

by Krisztian Fekete (Vamsoft) 7 years ago
8

oh i see, so the period turns on a general wildcard for the asterisk... but without the period, the asterisk modifies the previous character

thanks for taking the time to explain that to me

by Bryon Humphrey 7 years ago
9

@Bryon Humphrey: you are welcome :) Yes, basically in regular expressions the dot character is the wildcard for any character and the number of repetitions is controlled by the trailing * character. If you want to match the dot character itself in a regex, you should "escape" it using backslash like:

.*@vamsoft\.com
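The effect of the backslash can be checked with a sender that has some other character where the dot should be. This is a sketch using Python's `re` module (an assumption about the engine); the second sample address is invented for the test.

```python
import re

UNESCAPED = r".*@vamsoft.com"    # the dot matches any character
ESCAPED = r".*@vamsoft\.com"     # the dot matches only a literal "."

def found(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

for sender in ["info@vamsoft.com", "info@vamsoftxcom"]:
    print(sender, found(UNESCAPED, sender), found(ESCAPED, sender))

# re.escape builds the escaped form of a literal string.
escaped_domain = re.escape("vamsoft.com")
print(escaped_domain)
```

Both expressions accept the real address, but only the unescaped one also accepts "info@vamsoftxcom".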

by Krisztian Fekete (Vamsoft) 7 years ago

10

your regexp should be .*@.*ebay.* if you want to filter any string that contains @ebay or @(any number of characters)ebay

by sungpill Han 7 years ago
