regex matches when it shouldn't? - ORF Forums


1

Hello,

I just discovered that our sender-blacklist regex:
.*@*ebay*

seems to match ANYTHING that has "eba" anywhere before or after the @ sign... disregarding the Y completely.

The goal was to block "anything followed by an @ sign followed by anything which includes the letters ebay anywhere after the @ sign".

Is the Y sacred or something? Why does the regex disregard it?

http://img257.imageshack.us/img257/791/ebayhuh.th.jpg

by Bryon Humphrey more than 10 years ago
2

Oh, wrong screenshot link; here's the full-size one:
http://imageshack.us/photo/my-images/257/ebayhuh.jpg/

by Bryon Humphrey more than 10 years ago
3

@Bryon Humphrey: actually, the current expression means "any character (zero or more repetitions), followed by @ (zero or more repetitions), followed by eba, followed by y (zero or more repetitions)". Nothing in it requires the @ or the y to be present. You should try

.*@.*ebay.*

instead.
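
To see the difference side by side, here is a small sketch in Python. It assumes the filter's regex engine treats these constructs the same way Python's re module does and looks for a match anywhere in the address; the sample addresses are made up for illustration.

```python
import re

original = re.compile(r".*@*ebay*")     # the blacklist entry from the first post
corrected = re.compile(r".*@.*ebay.*")  # the suggested replacement

# Hypothetical sender addresses, not taken from any real log.
samples = [
    "seller@ebay.com",       # "ebay" after the @
    "rebate@example.com",    # "eba" before the @, no "ebay" at all
    "ebay-fan@example.com",  # "ebay" only before the @
    "no-at-sign-eba",        # no @ anywhere
]

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

results = {s: (matches(original, s), matches(corrected, s)) for s in samples}
for s, (old, new) in results.items():
    print(f"{s!r}: original={old} corrected={new}")
```

Under those assumptions the original pattern matches all four samples, while the corrected one matches only the first.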

by Krisztian Fekete (Vamsoft) more than 10 years ago

4

I see, I will give that a whirl.

Curious though, why does it match when I don't include a Y in the sample and the eba is left of the @?

by bryon more than 10 years ago
5

Because * in regular expressions means "any number of repetitions", including zero repetitions.

* means zero or more repetitions
+ means one or more
? means zero or one

So even if the @ character is missing, the original expression will match any text that includes "eba".
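
The three quantifiers can be compared with a short Python sketch. This assumes Python's re module as a stand-in for the filter's engine, and uses fullmatch so the whole test string has to fit the pattern.

```python
import re

def full(pattern, text):
    """Return True only if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each pattern applies a different quantifier to the trailing "y".
quantifiers = {"*": r"ebay*", "+": r"ebay+", "?": r"ebay?"}
inputs = ["eba", "ebay", "ebayyy"]

table = {q: [full(p, t) for t in inputs] for q, p in quantifiers.items()}
for q, row in table.items():
    print(q, dict(zip(inputs, row)))
```

Only the * and ? versions accept "eba" with no y at all, and only * and + accept a run of several y characters.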

by Krisztian Fekete (Vamsoft) more than 10 years ago
6

So if I understand correctly (and I don't mean to turn this into a regex class, I just like to learn everything I can):

.*@*ebay* would mean anything including nothing, with an @ sign after it, followed by anything including nothing, followed by the characters ebay, followed by anything including nothing

Doesn't that force "ebay" to be to the right of the @ sign? How does it take the Y out of the equation? Or does the final * modify the Y (and in that case the middle * modifies the @)?

by Bryon more than 10 years ago
7

The expression can be broken into 4 parts:

.* means any character, any number of repetitions (zero or more) followed by
@* meaning @ character, any number of repetitions (zero or more) followed by
eba, followed by
y* meaning the y character, any number of repetitions (zero or more)

The last part takes Y out of the equation because the * quantifier (zero or more repetitions) always applies to the preceding character. The @ is taken out the same way, so "eba" will match regardless of its position relative to the @ character; moreover, it will also match if the @ is absent.
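
The four parts can be made visible by wrapping each one in a capturing group. This is only an illustrative sketch using Python's re module (an assumption, not necessarily the engine the filter uses), and the two addresses are invented.

```python
import re

# Same expression as the blacklist entry, with one group per part.
parts = re.compile(r"(.*)(@*)(eba)(y*)")

def split_parts(text):
    """Return what each of the four parts consumed, or None if no match."""
    m = parts.search(text)
    return m.groups() if m else None

print(split_parts("rebate@example.com"))  # ('r', '', 'eba', '')
print(split_parts("seller@ebay.com"))     # ('seller@', '', 'eba', 'y')
```

In both cases the @* part consumes nothing: in the second address the greedy .* has already swallowed the @, which shows that @* never forces an @ to appear in front of "eba".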

by Krisztian Fekete (Vamsoft) more than 10 years ago
8

Oh I see, so the period turns on a general wildcard for the asterisk... but without the period, the asterisk modifies the previous character.

Thanks for taking the time to explain that to me.

by Bryon Humphrey more than 10 years ago
9

@Bryon Humphrey: you are welcome :) Yes, basically in regular expressions the dot character is the wildcard for any single character, and the number of repetitions is controlled by the trailing * character. If you want to match the dot character itself in a regex, you should "escape" it with a backslash, like:

.*@vamsoft\.com
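
A quick way to check what the escape changes, sketched in Python under the assumption that the filter's engine handles the backslash the same way; both test addresses are hypothetical.

```python
import re

unescaped = re.compile(r".*@vamsoft.com")   # dot matches any character
escaped = re.compile(r".*@vamsoft\.com")    # dot must be a literal "."

real = "info@vamsoft.com"        # hypothetical address with a real dot
lookalike = "info@vamsoftxcom"   # "x" sits where the dot should be

checks = {
    "unescaped": (bool(unescaped.fullmatch(real)), bool(unescaped.fullmatch(lookalike))),
    "escaped": (bool(escaped.fullmatch(real)), bool(escaped.fullmatch(lookalike))),
}
for name, (hits_real, hits_lookalike) in checks.items():
    print(f"{name}: real={hits_real} lookalike={hits_lookalike}")
```

The unescaped version accepts both strings; the escaped version accepts only the one with an actual dot.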

by Krisztian Fekete (Vamsoft) more than 10 years ago

10

Your regexp should be .*@.*ebay.* if you want to filter any string that contains @ebay or @(any number of characters)ebay.

by sungpill Han more than 10 years ago
