creed
Veteran Dog


Joined: 08 Nov 2003 Age: 99 Posts: 6371
Location: Exiled
Posted: Wed May 20, 2009 1:40 pm
Post subject: Issues with regular expressions
Hello all,
I'm working on a project where I read a flat file and use various regex patterns to determine whether certain content is present, then process it accordingly. For the most part this works beautifully. However, for one area of the file, no pattern I try will find a match. Here is the data being analyzed:
S_Eriksson GK | GK J_Caola
P_Dinning DF | DF L_Tunstall
A_Mohlin DF | DF N_Bawden
B_Squance DF | DF P_Sulley
L_Titcombe DF | DF H_Jose
J_Farelo DM | DM C_van_Kuyt
A_Frandson MF | MF P_Silva
L_Alves MF | MF K_Padgit
B_Cumberland MF | MF C_Holton
Z_Densem FW | FW R_Bautista
C_Nesling FW | FW N_Arnold
|
E_Eayrs SUB | SUB M_Sinclair
M_Dumas SUB | SUB S_Thwaites
P_Verri SUB | SUB E_Pretty
F_Daud SUB | SUB J_Pople
J_Tipping SUB | SUB J_da_Silva
and here is the pattern that I am using to look for this data.
/(GK|DF|DM|MF|AM|FW|SUB)\s\|\s(GK|DF|DM|MF|AM|FW|SUB)/
I am trying to match the two- to three-letter capital codes that flank the |, with a space on each side. If a line matches, I want to grab the entire line. However, neither preg_match nor preg_grep finds a match (the data is in an array by default, and I convert it to a string for preg_match), even though regex validators (like the one in Eclipse) say the pattern should work.
I'm using PHP 5.1 and Apache 2.2 on FreeBSD 6.2, if that helps at all.
Thanks to anyone who can help out here.
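A minimal sketch of the approach described above, assuming the file has already been read into an array of lines. The `$lines` sample is hypothetical: three entries as they render in this post, plus one padded with extra spaces the way a fixed-width flat file might be. It suggests the pattern itself is sound against single-spaced text. If the real file pads its columns with tabs or runs of spaces, a single `\s` on each side of the pipe would not be enough; that is one possible explanation, not a confirmed diagnosis.

```php
<?php
// Hypothetical sample data: lines as rendered in the post, plus one
// line padded with extra spaces (an assumption about the real file).
$lines = array(
    'S_Eriksson GK | GK J_Caola',
    'E_Eayrs SUB | SUB M_Sinclair',
    '|',
    'P_Dinning   DF  |  DF   L_Tunstall',
);

// The pattern from the post: exactly one whitespace character per side.
$strict  = '/(GK|DF|DM|MF|AM|FW|SUB)\s\|\s(GK|DF|DM|MF|AM|FW|SUB)/';
// Same alternation, but tolerating one or more whitespace characters.
$relaxed = '/(GK|DF|DM|MF|AM|FW|SUB)\s+\|\s+(GK|DF|DM|MF|AM|FW|SUB)/';

// preg_grep() returns the whole matching lines, keyed by original index.
$strictHits  = preg_grep($strict, $lines);
$relaxedHits = preg_grep($relaxed, $lines);

echo count($strictHits), ' strict, ', count($relaxedHits), " relaxed\n";
// prints "2 strict, 3 relaxed"
```

Because preg_grep works on the array directly, there is no need to join the lines into one string first; each returned element is already the entire matching line.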
_________________ The Seven faces of Creed
CMTG
Leg Humper


Joined: 23 Feb 2002 Posts: 5449
Location: /var/log/cabin
Posted: Wed May 20, 2009 3:46 pm
Post subject: Re: Issues with regular expressions
creed wrote: and here is the pattern that I am using to look for this data.
/(GK|DF|DM|MF|AM|FW|SUB)\s\|\s(GK|DF|DM|MF|AM|FW|SUB)/
Eek! I would have generalised that to something like:
/([A-Z]{2,3})\s\|\s([A-Z]{2,3})/
Or if you want to make sure the characters are the same on either side of the pipe you could do:
/([A-Z]{2,3})\s\|\s(\1)/
But anyway, I feel your pain: I've encountered bugs in PHP's Perl-compatible regex implementation a few times before now. Its support for regular expressions in general is somewhat less than stellar. One day they'll finish PHP and it might be a nice language. (A man can dream...)
Unfortunately, I don't have a workaround. Have you tried the POSIX functions?
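To make the difference between the two suggested patterns concrete, here is a small sketch. The sample line is made up for illustration. The generalised pattern accepts any two or three capitals on each side, while the `\1` backreference additionally requires the right-hand side to repeat whatever the first group captured.

```php
<?php
// Generalised: any 2-3 capital letters on each side of the pipe.
$general = '/([A-Z]{2,3})\s\|\s([A-Z]{2,3})/';
// Backreference: the right side must equal the text captured by group 1.
$same    = '/([A-Z]{2,3})\s\|\s(\1)/';

// Hypothetical line where the two positions differ.
$mixed = 'X_Example DF | MF Y_Example';

$generalResult = preg_match($general, $mixed); // 1: both sides are capitals
$sameResult    = preg_match($same, $mixed);    // 0: DF is not repeated

// With matching sides, the captured position is available in $m[1].
preg_match($same, 'X_Example GK | GK Y_Example', $m);
echo $generalResult, ' ', $sameResult, ' ', $m[1], "\n";
// prints "1 0 GK"
```

Note that both patterns would also accept a pair such as AB | AB, since neither restricts the letters to a known list of positions.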
_________________ Pie. I wish I could
constrain my hungry greed but...
Sadly, defeated.
Charlene's Law: There's no such thing as can't.
Charlene's Corollary: Unless it's followed by be arsed.
I write more quotes than a fucking big book of quotes. - Scroobius Pip
http://fedoraproject.org/get-fedora
creed
Posted: Wed May 20, 2009 4:04 pm
Post subject: Re: Issues with regular expressions
CMTG wrote: creed wrote: and here is the pattern that I am using to look for this data.
/(GK|DF|DM|MF|AM|FW|SUB)\s\|\s(GK|DF|DM|MF|AM|FW|SUB)/
Eek! I would have generalised that to something like:
/([A-Z]{2,3})\s\|\s([A-Z]{2,3})/
Or if you want to make sure the characters are the same on either side of the pipe you could do:
/([A-Z]{2,3})\s\|\s(\1)/
But anyway, I feel your pain: I've encountered bugs in PHP's Perl compatible implementation a few times before now. Its support for regular expressions in general is somewhat less than stellar. One day they'll finish PHP and it might be a nice language. (A man can dream...)
Unfortunately, I don't have a workaround. Have you tried the Posix functions?
I might have to do that if worst comes to worst. The reason it is the way it is is that I want it to match only the items listed. That is, GK | GK would match, but AB | AB wouldn't.
And ya, I hear ya. After coding in PHP professionally again over the last month, I'm really thinking that maybe Java is the way to go.
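The two ideas in this exchange can be combined: keep the explicit list of positions, and use a backreference so the same position must appear on both sides. This is only a sketch under two assumptions that the thread does not confirm: that a "match" means the same listed position on each side, and that the columns may be padded (hence `\s+`). The helper name `is_listed_pair` is made up for the example.

```php
<?php
// Whitelist of positions, captured once and repeated via \1.
// \b keeps the codes from matching inside a longer word.
$pattern = '/\b(GK|DF|DM|MF|AM|FW|SUB)\s+\|\s+\1\b/';

// Hypothetical helper: true when the line holds a listed position
// repeated on both sides of the pipe.
function is_listed_pair($line, $pattern)
{
    return preg_match($pattern, $line) === 1;
}

var_dump(is_listed_pair('J_Farelo DM | DM C_van_Kuyt', $pattern)); // bool(true)
var_dump(is_listed_pair('AB | AB', $pattern));                     // bool(false)
var_dump(is_listed_pair('GK | DF', $pattern));                     // bool(false)
```

If differing positions on the two sides should also count, the second alternation from the original pattern would be needed in place of `\1`.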
creed
Posted: Thu May 28, 2009 1:59 pm
With a bit of help from co-workers I found a pattern that worked nicely for my needs. For those interested, it was ((?:[A-Z][A-Z]+))(\\s+)(\\|)(\\s+)((?:[A-Z][A-Z]+)). This site (http://txt2re.com/) is quite handy.
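For anyone wanting to try that pattern from PHP, here is a rough sketch. It assumes the word groups are the non-capturing form `(?:[A-Z][A-Z]+)`, and it treats the doubled backslashes in the post as string-literal escaping: inside a PHP single-quoted string a single backslash is enough. The sample line and its padding are invented for the example.

```php
<?php
// Groups: 1 = left code, 2 = whitespace, 3 = pipe, 4 = whitespace, 5 = right code.
$pattern = '/((?:[A-Z][A-Z]+))(\s+)(\|)(\s+)((?:[A-Z][A-Z]+))/';

// Hypothetical line with padded columns.
$line = 'Z_Densem   FW  |  FW   R_Bautista';

if (preg_match($pattern, $line, $m)) {
    echo $m[1], ' / ', $m[5], "\n"; // prints "FW / FW"
}

// The pattern accepts any run of two or more capitals, not just the
// listed positions, so an unlisted pair matches as well.
$loose = preg_match($pattern, 'AB | CD'); // 1
```

The `\s+` is the notable difference from the pattern in the first post, which allowed exactly one whitespace character on each side of the pipe. The trade-off is that this version no longer restricts matches to the known position codes.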