Simplified example:
#!/usr/bin/perl
use warnings;
use strict;
local ($,, $\) = ("\t", "\n");
my $x;
$_ = "abc 123 def 123 ghi";
$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/ # by
4 # FOUR
5 # FIVE
6 # SIX
/gsx; # global, single line, extended format
print 'Made', $x, 'replacements.';
print;
This printed:
Made 2 replacements.
abc # by
4 # FOUR
5 # FIVE
6 # SIX
def # by
4 # FOUR
5 # FIVE
6 # SIX
ghi
I expected: abc 456 def 456 ghi
--
Affijn, Ruud & perl, v5.8.6 built for i386-freebsd-64int
"Gewoon is een tijger."
The changes by /x only affect the regex proper. The replacement part
is still an ordinary double-quotish string.
Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
Your expectations were incorrect.
The /x modifier causes whitespace in the *pattern match* to be ignored.
The replacement portion of a s/// operation is not a pattern match -
it is a double-quoted string. /x has no effect on this replacement.
Paul Lalli
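A minimal sketch of the difference, using the string from the original example. The variable names $fixed and $padded are made up for illustration:

```perl
use strict;
use warnings;

my $text = "abc 123 def 123 ghi";

# Under /x the pattern may be spread out and commented ...
(my $fixed = $text) =~ s/
        1   # ONE
        2   # TWO
        3   # THREE
    /456/gx;

# ... but the replacement is an ordinary double-quotish string,
# so every space in it ends up in the result.
(my $padded = $text) =~ s/ 1 2 3 / 4 5 6 /gx;

print "$fixed\n";    # abc 456 def 456 ghi
print "$padded\n";   # each 123 becomes " 4 5 6 ", spaces included
```

The comments inside the pattern must not contain the delimiter, since Perl looks for the closing / before it applies /x.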
Try:
$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/456/gx; # global, extended format
Note that the /s modifier is redundant (see "perldoc perlre").
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
> [s/ 123 #one-two-three / 456 #four-five-six /x]
>
> Try:
>
> $x = s/ # Replace
> 1 # ONE
> 2 # TWO
> 3 # THREE
> /456/gx; # global, extended format
Of course what is in my real and working code is a lot more like that.
But I like the commented format much better and was real disappointed
that it didn't work.
> Note that the /s modifier is redundant (see "perldoc perlre").
I don't consider the /s modifier redundant. It was not needed in my
example, so maybe you meant "redundant here"?
--
Affijn, Ruud
"Gewoon is een tijger."
Redundant would be if you had something in your pattern match like:
/stuff(?:.|\n)stuff/s
Here, I think /s is simply extraneous.
Paul Lalli
Okay, redundant (or extraneous...) here. I mentioned it because people
misunderstand the meaning of it all the time, and I believe one reason
for that is that "perldoc perlre" - unlike e.g. "perldoc perlop" - is
the only place in the docs (to my knowledge) where its meaning is
properly explained.
>>> Note that the /s modifier is redundant (see "perldoc perlre").
>>
>> I don't consider the /s modifier redundant. It was not needed in my
>> example, so maybe you meant "redundant here"?
>
> Okay, redundant (or extraneous...) here. I mentioned it because people
> misunderstand the meaning of it all the time, and I believe one reason
> for that is that "perldoc perlre" - unlike e.g. "perldoc perlop" - is
> the only place in the docs (to my knowledge) where its meaning is
> properly explained.
OK. It would be nice to have an educational piece of code about /m and
/s.
Let me make a start:
# a.1: without /s, the .* will match up to the first \n
$ echo 'first
second
third' | perl -pe 's/.*/#/'
#
#
#
# a.2: with /s, the .* will match until the very end
$ echo 'first
second
third' | perl -pe 's/.*/#/s'
###
# b.1: without /s or /m, the .$ will match nothing if there are
# two newlines at the end
$ echo 'first
second
third' | perl -pe '$_.="\n"; s/.$/#/'
first
second
third
# b.2: with /s, the .$ will match anything before the last \n
$ echo 'first
second
third' | perl -pe '$_.="\n"; s/.$/#/s'
first#
second#
third#
# b.3: with /m, the .$ will match anything before the first \n
$ echo 'first
second
third' | perl -pe '$_.="\n"; s/.$/#/m'
firs#
secon#
thir#
Damian makes a good argument in PBP to always use /s and /m.
I don't think it's worth raising your finger if someone uses /s or /m
on a regex where it doesn't matter. It's like complaining someone uses
'use warnings' on a piece of code where it didn't matter.
Abigail
--
perl -we 'print split /(?=(.*))/s => "Just another Perl Hacker\n";'
> The changes by /x only affect the regex proper. The replacement part
> is still an ordinary double-quotish string.
OK. I am still trying to think up why it was chosen to not affect the
replacement part. I have no doubt that there is a simple explanation why
it is not feasible, but I just can't think it up (tired of working some
very long days, but very satisfied with the results and very happy with
Perl).
What's PBP?
> I don't think it's worth raising your finger if someone uses /s or /m
> on a regex where it doesn't matter. It's like complaining someone uses
> 'use warnings' on a piece of code where it didn't matter.
A better parallel IMO is that it's like complaining when someone calls a
function using '&' without knowing the implications of doing so. It
'works' most of the time, but not always...
(Not saying that Dr. Ruud doesn't know the implications of using the /s
modifier. It's now obvious that he does.)
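For readers who have not run into the '&' implication mentioned above, here is a small sketch of the best-known one: calling &name without parentheses hands the caller's @_ to the callee. The sub names are invented for the example.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub outer {
    my $explicit = &count_args();   # explicit empty argument list
    my $implicit = &count_args;     # no parens: outer's @_ is passed along
    return ($explicit, $implicit);
}

my ($explicit, $implicit) = outer(1, 2, 3);
print "explicit: $explicit, implicit: $implicit\n";   # explicit: 0, implicit: 3
```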
Yup, I agree on that one. If I see &sub, I assume that the user requires
the & there. Same with /s or /m. It confuses me if it's just there and adds
line noise.
--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
I ploink googlegroups.com :-)
Because spaces are _supposed_ to matter when they are in a string.
--
Tad McClellan SGML consulting
ta...@augustmail.com Perl programming
Fort Worth, Texas
Tad, what I think he might be getting at is whether there is some possibility
of having a modifier on literal strings to allow comments. I can see how
doing that might not make a lot of sense in many ways (it's a string, for
cryin' out loud!), but I just thought I'd point out that his thinking
seems to be hinting in that general direction.
--
Stan
> Damian makes a good argument in PBP to always use /s and /m.
I'd better go read it.
> I don't think it's worth raising your finger if someone uses /s or /m
> on a regex where it doesn't matter.
To me, modifiers mean "something out of the ordinary here, pay attention!".
I feel tricked when I try to figure out why the programmer wanted dot
to match newline, only to find that there isn't even a dot in the pattern.
> It's like complaining someone uses
> 'use warnings' on a piece of code where it didn't matter.
'use warnings' always matters.[1] (heh)
[1] Message-ID: <slrn99mn0h.n9t....@gdndev32.lido-tech>
> What's PBP?
Peanut Butter Perl? :-)
Or "Perl Best Practices":
http://www.oreilly.com/catalog/perlbp/
It's a good read. One of the best Perl books published by O'Reilly.
)) > I don't think it's worth raising your finger if someone uses /s or /m
)) > on a regex where it doesn't matter.
))
))
)) To me, modifiers mean "something out of the ordinary here, pay attention!".
))
)) I feel tricked when I try to figure out why the programmer wanted dot
)) to match newline, only to find that there isn't even a dot in the pattern.
That could be, but that's _your_ problem. That's not a reason at all why
said programmer shouldn't use /s or /m. I don't expect you to program in
a style that suits me, so I don't expect you to demand that from someone
else. Your inability to understand code is something you have to solve
yourself. (Practise! ;-))
Damian's argument is that most programmers expect "." to match any
character. And for "^" and "$" to match the beginning and end of a
line. He says that if you always use /sm, you _never_ have to wonder
whether "." matches a newline or not.
Abigail
--
BEGIN {print "Just " }
INIT {print "Perl " }
END {print "Hacker\n"}
CHECK {print "another "}
The recommendation is to use /xms on all regular expressions, whether
the modifiers make a difference or not. It is not an invitation to add
combinations of /x, /m and /s at random.
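A rough sketch of what each modifier changes, and of what a pattern looks like once /xms is applied across the board. The sample text and variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# No modifiers: ^ and $ anchor to the whole string and "." stops at "\n",
# so nothing matches here.
my @plain = $text =~ /^(.+)$/g;

# /m: ^ and $ match at every line boundary.
my @per_line = $text =~ /^(.+)$/mg;

# /s: "." also matches "\n", so one match swallows everything.
my ($whole) = $text =~ /\A(.+)\z/s;

# /xms everywhere: say [^\n] when newlines must be excluded.
my @xms = $text =~ / ^ ( [^\n]+ ) $ /xmsg;

print scalar(@plain), " ", scalar(@per_line), " ", scalar(@xms), "\n";   # 0 3 3
```

With /xms as the fixed convention, \A and \z take over the "start/end of string" role that ^ and $ have by default.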
> I don't think it's worth raising your finger if someone uses /s or /m
> on a regex where it doesn't matter. It's like complaining someone uses
> 'use warnings' on a piece of code where it didn't matter.
...or like using "sort keys ..." where "keys ..." would have done?
It really depends on what the rest of the code is like -- context. If
the general quality of the code is good, a redundant /m is, of course,
no big deal. In code that is clearly written by a beginner, it is a
sign of insecurity and/or cargo culting and ought to be pointed out.
As a reader of a piece of code, it is important to develop a feeling
for the author's competence -- how far can you trust the code. Redundant
constructs are an important indicator *against* the author's competence.
That's why it is generally a good idea to avoid them.
> Damian's argument is that most programmers expect "." to match any
> character. And for "^" and "$" to match the beginning and end of a
> line. He says that if you always use /sm, you _never_ have to wonder
> whether "." matches a newline or not.
/sm would be a nice default. But then you need a way to disable it: /SM.
>>> The changes by /x only affect the regex proper. The replacement
>>> part is still an ordinary double-quotish string.
>>
>> OK. I am still trying to think up why it was chosen to not affect the
>> replacement part.
>
> Because spaces are _supposed_ to matter when they are in a string.
There can also be spaces in the regex, and there are several ways to
present them.
I use \s where possible, and also "\x{0020}", "\x{20}", even "[ ]", "\ ",
depending on the context.
So I still see no reason why unprotected spaces should not be ignored in
the replacement part.
"\x{20}" and "\ " would work fine there too.
Or use a variable with a run of spaces, like $space42 = ' 'x42, and an
o-modifier.
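A short sketch of the spellings listed above, checked against a string containing a single space. Only the pattern side is shown, since /x does not touch the replacement. The hash %matched is just a convenient way to collect the results.

```perl
use strict;
use warnings;

my $text = "a b";

my %matched = (
    bare    => ($text =~ /a b/x      ? 1 : 0),   # 0: /x drops the space, pattern is "ab"
    escaped => ($text =~ /a\ b/x     ? 1 : 0),   # 1
    class   => ($text =~ /a[ ]b/x    ? 1 : 0),   # 1
    hex     => ($text =~ /a\x{20}b/x ? 1 : 0),   # 1
    generic => ($text =~ /a\s b/x    ? 1 : 0),   # 1: \s matches, the literal space is ignored
);

print "$_: $matched{$_}\n" for sort keys %matched;
```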
--
Affijn, Ruud (gimme a \X)
"Gewoon is een tijger."
As noted in another thread, PBP recommends /xsm for all regexes. That
is also the standard in Perl 6.
> Redundant constructs are an important indicator *against* the authors
> competence. That's why it is generally a good idea to avoid them.
I generally agree.
The Posting Guidelines say: "Do not provide too much information", so I
did cut down my code to an example of a few lines. But I forgot to toss
the s-modifier.
And now that I have inserted the m- and x-modifiers in all the
appropriate places (I had read that chapter of PBP before but had
forgotten about it), I won't even have to do that anymore. :)
>>> Damian's argument is that most programmers expect "." to match any
>>> character. And for "^" and "$" to match the beginning and end of a
>>> line. He says that if you always use /sm, you _never_ have to wonder
>>> whether "." matches a newline or not.
>>
>> /sm would be a nice default. But then you need a way to disable it:
>> /SM.
>
> As noted in another thread, PBP recommends /xsm for all regexes. That
> is also the standard in Perl 6.
OK, great. I tend to write it as /msx, so in alphabetical order.
What is the way to make '.' not match "\n" in Perl6?
I guess Perl-5-mode, or convert to something like "[^\n]".
I'm still hesitant about adopting this, but if I do, I'll use /xms,
following the pattern in PBP. I'd like to make it as clear as possible
just what convention I'm following.
> What is the way to make '.' not match "\n" in Perl6?
>
> I guess Perl-5-mode, or convert to something like "[^\n]".
I guess it's /[^\n]/. The simplification is that the dot *always* matches
all characters, no exceptions. That won't be broken. Switching to Perl 5
for the purpose would be obscure once people have forgotten the quirks
/./ used to have.
You *can't* make . not match \n in Perl 6.
But there is a new metacharacter that matches "everything except \n".
Here's the table that illustrates the underlying pattern:
                 Match...    Match anything but...
    Whitespace     \s              \S
    Word char      \w              \W
    Digit          \d              \D
    Newline        \n              \N
Damian
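For what it is worth, the same \N escape is also usable in Perl 5 regexes; as far as I know it arrived with 5.12, so treat the version requirement below as an assumption. A small sketch comparing it with the dot under /s:

```perl
use 5.012;      # assumed minimum for \N as "anything but newline"
use strict;
use warnings;

my $text = "first\nsecond\n";

my ($dot)     = $text =~ /(.+)/s;       # "." matches "\n" under /s
my ($not_nl)  = $text =~ /(\N+)/s;      # \N never matches "\n"
my ($negated) = $text =~ /([^\n]+)/s;   # the spelling that works on any perl

print length($dot), " ", $not_nl, " ", $negated, "\n";   # 13 first first
```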
> "Gewoon is een tijger."
"Habit is a tiger"?
Yeah, a sleepy one, but when you want him to move he's got teeth.
>> "Gewoon is een tijger."
>
> "Habit is a tiger"?
>
> Yeah, a sleepy one, but when you want him to move he's got teeth.
A bit like that yes. It covers adjectives like 'normal', 'usual',
'habitual', 'customary', 'ordinary', 'general', 'common', 'simple',
'just', and most related adverbs too.
It is something my 3 year old answered when she got fed up with me
asking her several times in a row what she meant with 'Gewoon.' (I knew
that she meant "Just because." and she knew that I knew).
Sure you can. Just use a 'p5' prefix and tell Perl 6 you're using Perl 5
style regexes. ;-)
Abigail
--
($;,$_,$|,$\)=("\@\x7Fy~*kde~box*Zoxf*Bkiaox","X"x25,1,"\r");
s/./ /;{vec($_=>1+$"=>8)=ord($/^substr$;=>$"=int rand 24=>1);
print&&select$,,$,,$,,$|/($|+tr/X//c);redo if y/X//};sleep 1;
> OK. I am still trying to think up why it was chosen to not affect the
> replacement part. I have no doubt that there is a simple explanation why
> it is not feasible, but I just can't think it up (tired of working some
> very long days, but very satisfied with the results and very happy with
> Perl).
I don't think the reason is that it's not feasible, but rather that it's
not intuitive. Regular expressions can be messy, so having an option to
add comments, and 'beautify' them is a good idea. The replacement part
of an s/// is simply a string, and won't really benefit much from such
an option.
Moreover, you CAN add comments in the replacement part if you want to.
You just need to modify your code slightly, and use the /e modifier.
From your example:
$_ = "abc 123 def 123 ghi";
$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/ # by
4 . # FOUR
5 . # FIVE
6 # SIX
/gsex; # global sex
But, I think this can be less readable than the alternative if the
replacement part is a simple string.
--Ala
> Regular expressions can be messy, so having an
> option to add comments, and 'beautify' them is a good idea. The
> replacement part of an s/// is simply a string, and won't really
> benefit much from such an option.
I met this with some rather lengthy strings of \x{####} in both the
search and the replacement part.
> Moreover, you CAN add comments in the replacement part if you want to.
> You just need to modify your code slightly, and use the /e modifier.
> From your example:
>
> $_ = "abc 123 def 123 ghi";
>
> $x = s/ # Replace
> 1 # ONE
> 2 # TWO
> 3 # THREE
> / # by
> 4 . # FOUR
> 5 . # FIVE
> 6 # SIX
> /gsex; # global sex
Thanks, that looks workable. Will the o-modifier make up for any lost
performance? I'll test it.
> But, I think this can be less readable than the alternative if the
> replacement part is a simple string.
Yes, but in my case it often isn't. The algorithm needs to be checked by
linguists. They rather read the Unicode character names and such, so I
like to use the \N{name} format, but (without that e-modifier) that
would give very lengthy lines. I was going to store everything in
variables, but I'll test this format too.
Short example (without backreferences):
$x = s/(?<=\x{0020})
\x{0111}\x{0123}\x{0222}\x{02AA}\x{0123}\x{0223}\x{0221}\x{0241}\x{0247}
\x{02E2}\x{0223}(?=\x{0020})
/\x{0117}\x{000D}\x{0223}\x{02AA}\x{000D}\x{0223}\x{0221}\x{0221}\x{0223}/gmsx;
(actual codes munged)
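One way to keep both halves commentable without /e is to build the replacement as an ordinary string from a commented list. This is only a sketch: the Greek code points are placeholders chosen for the example, not the characters from the real data.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{...} in strings on older perls

# Pattern: /x lets each code point carry a comment.
my $search = qr/
    (?<= \x{0020} )   # preceded by SPACE
    \x{03B1}          # GREEK SMALL LETTER ALPHA
    \x{03B2}          # GREEK SMALL LETTER BETA
    (?= \x{0020} )    # followed by SPACE
/x;

# Replacement: an ordinary string, assembled piece by piece.
my $replacement = join '',
    "\N{GREEK SMALL LETTER GAMMA}",   # U+03B3
    "\N{GREEK SMALL LETTER DELTA}";   # U+03B4

my $text = "x \x{03B1}\x{03B2} y";
(my $out = $text) =~ s/$search/$replacement/g;

# Show the result as code points to avoid wide-character output.
printf "U+%04X\n", ord($_) for split //, $out;
```

The replacement is computed once, outside the substitution, so nothing is evaluated per match.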
Probably not.
perldoc -q /o
John
--
use Perl;
program
fulfillment
>> $x = s/ # Replace
>> 1 # ONE
>> 2 # TWO
>> 3 # THREE
>> / # by
>> 4 . # FOUR
>> 5 . # FIVE
>> 6 # SIX
>> /gsex; # global sex
>
> Thanks, that looks workable. Will the o-modifier make up for any lost
> performance?
Of course not.
s///o is a no-op when there are no variables in the pattern part.
s///o has no effect whatsoever on the replacement string part.
but
/gosex; # go have sex
would be cute to have in code. :-)
No. /o only matters if you have a variable inside regexp, and then only
if you encounter the regex more than once with a different value in the
variable. And then only if you want to keep using the old value.
My advice is to *never* use /o. There's no point in using it for speed,
and when it matters for speed, the effect may not be what you want - and
even if you want it, it may confuse anyone else looking at the code.
for (qw /foo bar/) {
print /$_/ ? "Yes 1\n" : "No 1\n";
print /$_/o ? "Yes 2\n" : "No 2\n";
}
__END__
Yes 1
Yes 2
Yes 1
No 1
Abigail
--
INIT {print "Perl " }
CHECK {print "another "}
END {print "Hacker\n"}
BEGIN {print "Just " }
> /o only matters if you have a variable inside regexp, and then
> only
> if you encounter the regex more than once with a different value in
> the variable. And then only if you want to keep using the old value.
I have series of substitutions that have to be tried in order on every
line of many files.
To make the code more readable, I can store these substitutions in a
hash (with keys like 'A01' meaning phase A, first substitution).
It is no problem to unloop the code for speed, so it might look like:
$x = s/$re{'A01'}[SRCH]/$re{'A01'}[REPL]/gsx; # or /gosx
print STDERR $re{'A01'}[NAME], $x if ($x > $re{'A01'}[MIN]);
$x = s/$re{'A02'}[SRCH]/$re{'A02'}[REPL]/gsx;
print STDERR $re{'A02'}[NAME], $x if ($x > $re{'A02'}[MIN]);
(and then dozens more)
If possible, I would like the modifiers to be in $re{'key'}[MODS].
(yes, this is all totally untested code yet)
OK, let me first try and test the alternatives. I still have a few days.
The question you should ask here is: does "$re{A01}[SRCH]" change?
And if it does, do you want to keep using the *old* value? If the
answer to both questions is yes, you could use /o (although I would
use qr//). If the latter question is answered with 'no', using /o will
make your program produce the wrong results. If the first
question is answered with 'no', then using /o doesn't matter.
-: $x = s/$re{'A02'}[SRCH]/$re{'A02'}[REPL]/gsx;
-: print STDERR $re{'A02'}[NAME], $x if ($x > $re{'A02'}[MIN]);
-:
-: (and then dozens more)
-:
Suppose you have @lines containing all the lines you want to inspect,
and @regexes with all the regexes (as strings), there is a gigantic
difference between:
for my $line (@lines) {
for my $regex (@regexes) {
$line =~ /$regex/
}
}
and
for my $regex (@regexes) {
for my $line (@lines) {
$line =~ /$regex/
}
}
The first code snippet means that you will be doing
scalar (@lines) * scalar (@regexes)
regex compilations, while in the latter case, you only will be
doing
scalar (@regexes)
compilations. (Except if you have only one regex, then you will be
compiling only once, in both code snippets).
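If the line-by-line loop order has to stay, compiling the patterns up front with qr// should sidestep the repeated compilation. A minimal sketch with made-up data; @compiled and $hits are names introduced here:

```perl
use strict;
use warnings;

my @lines   = ("foo bar", "baz", "quux");
my @regexes = ('fo+', 'ba[rz]');

# Compile each pattern exactly once, up front.
my @compiled = map { qr/$_/ } @regexes;

my $hits = 0;
for my $line (@lines) {
    for my $re (@compiled) {
        # A qr// object used on its own should not need recompiling.
        $hits++ if $line =~ $re;
    }
}

print "$hits\n";   # 3
```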
-: If possible, I would like the modifiers to be in $re{'key'}[MODS].
-: (yes, this is all totally untested code yet)
s/(?$re{key}[MODS])$re{key}[SRCH]/$re{key}[REPL]/
ought to do the trick.
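A small sketch of the inline-modifier idea, with the fields copied into plain scalars first so that interpolation inside the pattern stays unambiguous. The rule itself is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rule: modifiers, pattern and replacement kept as plain strings.
my ($mods, $srch, $repl) = ('xi', 'a b c', 'XYZ');

# (?xi) switches the modifiers on from inside the pattern.
my $re = qr/(?$mods)$srch/;

# /g has to stay on the operator.
(my $out = "x ABC y abc z") =~ s/$re/$repl/g;

print "$out\n";   # x XYZ y XYZ z
```

As far as I know only pattern modifiers such as i, m, s and x can be embedded this way. /g, /e and /o belong to the operator and would have to be handled separately.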
Abigail
--
map{${+chr}=chr}map{$_=>$_^ord$"}$=+$]..3*$=/2;
print "$J$u$s$t $a$n$o$t$h$e$r $P$e$r$l $H$a$c$k$e$r\n";
> does "$re{A01}[SRCH]" change?
No, it's a constant.
> If the first
> question is answered with 'no', then using /o doesn't matter.
OK. I still doubt that /o really makes no difference, because I would
expect that a test needs to be done to find out whether the variable has
changed, but even with such a (fast) test it can hardly matter.
>> If possible, I would like the modifiers to be in $re{'key'}[MODS].
>> (yes, this is all totally untested code yet)
>
> s/(?$re{key}[MODS])$re{key}[SRCH]/$re{key}[REPL]/
>
> ought to do the trick.
Ah, nice. Just another thing that I had read about but hadn't used yet.
It doesn't actually check whether a variable has changed - it just tests
whether, after interpolation, the regex has changed. And compared to
actually executing a regex, this test takes insignificant time. It
doesn't weigh up against the hard-to-trace bugs if the variable does
change and the regex doesn't because you used /o.
Abigail
--
use lib sub {($\) = split /\./ => pop; print $"};
eval "use Just" || eval "use another" || eval "use Perl" || eval "use Hacker";
I should have known better than to base a translation on the "Dutch is
like German" theory.
> It is something my 3 year old answered when she got fed up with me
> asking her several times in a row what she meant with 'Gewoon.' (I knew
> that she meant "Just because." and she knew that I knew).
Oh well... Pedantic educative questions instead of getting on with
whatever you were doing. You had it coming.
> Pedantic educative questions instead of getting on with
> whatever you were doing. You had it coming.
It's just me.
> [regex without /o]
> It doesn't actually check whether a variable has changed - it just
> tests whether, after interpolation, the regex has changed. And
> compared to actually executing a regex, this test takes insignificant
> time. It doesn't weigh up against the hard-to-trace bugs if the
> variable does change and the regex doesn't because you used /o.
OK, thanks for confirming that.
My /o's meant that the variables will never change after setup.
Is there an efficient way to use constants in regexes? Maybe not useful
when constants are actually subs.
--
Affijn, Ruud (flip-flop)
"Gewoon is een tijger."
using qr//?
"Since Perl may compile the pattern at the moment of execution of qr()
operator, using qr() may have speed advantages in some situations, notably
if the result of qr() is used standalone:"
(perlop)
--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
I ploink googlegroups.com :-)
>> Is there an efficient way to use constants in regexes? Maybe not
>> useful when constants are actually subs.
>
> using qr//?
>
> "Since Perl may compile the pattern at the moment of execution of qr()
> operator, using qr() may have speed advantages in some situations,
> notably if the result of qr() is used standalone:"
>
> (perlop)
Thanks John. Abigail already mentioned it, but I didn't look into it
right away and then I just didn't, so now at last I did.
$re{'A01'}[SRCH] = 'some regex, grouping allowed';
$re{'A01'}[REPL] = 'some replacement, backtracking allowed';
$re{'A01'}[MODS] = 'xsg';
:
:
$re{'A01'}[QREX] = qr/(?$re{'A01'}[MODS])$re{'A01'}[SRCH]/; # qompiled regex
:
:
s/$re{'A01'}[QREX]/$re{'A01'}[REPL]/;
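A runnable sketch along those lines, under a few assumptions: the index constants are declared with "use constant", the rules are invented, /g stays on the operator rather than in MODS, and each field is copied into a scalar before it is interpolated.

```perl
use strict;
use warnings;
use constant { NAME => 0, SRCH => 1, REPL => 2, MODS => 3, MIN => 4, QREX => 5 };

# Hypothetical rules; the real patterns are not shown in the thread.
my %re = (
    A01 => [ 'digits',  '1 2 3', '456', 'x',  1 ],
    A02 => [ 'letters', 'd e f', 'DEF', 'xi', 0 ],
);

# Compile every rule once, copying the fields into plain scalars first.
for my $rule (values %re) {
    my ($srch, $mods) = @{$rule}[SRCH, MODS];
    $rule->[QREX] = qr/(?$mods)$srch/;
}

my $line = "abc 123 def 123 ghi";
my %count;
for my $key (sort keys %re) {
    my ($name, $repl, $min, $qrex) = @{ $re{$key} }[NAME, REPL, MIN, QREX];
    $count{$key} = ($line =~ s/$qrex/$repl/g) || 0;
    print STDERR "$name\t$count{$key}\n" if $count{$key} > $min;
}

print "$line\n";   # abc 456 DEF 456 ghi
```

A replacement stored as a string is interpolated only once, so a $1 inside it would not be expanded. Handling backreferences would need /ee or a code reference, which this sketch leaves out.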
I get "No 2" on the end, not "No 1"
$ perl -e 'for (qw /foo bar/) {
print /$_/ ? "Yes 1\n" : "No 1\n";
print /$_/o ? "Yes 2\n" : "No 2\n";
}'
Yes 1
Yes 2
Yes 1
No 2
I guess you didn't actually run the code you posted, or you typed from
memory :-P
--
Stan
Indeed.
""
"" $ perl -e 'for (qw /foo bar/) {
"" print /$_/ ? "Yes 1\n" : "No 1\n";
"" print /$_/o ? "Yes 2\n" : "No 2\n";
"" }'
"" Yes 1
"" Yes 2
"" Yes 1
"" No 2
""
"" I guess you didn't actually run the code you posted, or you typed from
"" memory :-P
Oh, I ran it. Then copied into my posting. Then modified it, and typed the
last line by hand instead of using the mouse.
Abigail
--
perl -wle'print"Кхуф бопфиет Ретм Ибглет"^"\x80"x24'
>Anno Siegel:
>
>> Pedantic educative questions instead of getting on with
>> whatever you were doing. You had it coming.
>
>It's just me.
Where's the original thread?
Why don't you email each other, ahhh whatever it is
lovers do.