PikeTheCrow February 2016

Perl Regex: Non-Greedy

Okay - newbie with Perl and Regex. Looked over the previous problems / solutions on offer - none really compare.

I need to write a script which does the following:

$ cat testdata.txt 
this is my file containing data
for checking pattern matching with a patt on the back!
only one line contains the p word.

$ ./mygrep5 pat th testdata.txt
this is my file containing data
for checking PATTERN MATCHING WITH a PATT ON THe back!
only one line contains the p word.

I have been able to print the amended line, but with the "a" capitalized as well. I have no idea how to uppercase only the parts that are needed.

I have been messing around (below is my script so far) and all I manage to return is the "PATT ON TH" part.

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';
use Data::Dump 'pp';

my ( $f, $s, $t ) = @ARGV;
my @output_lines;

open( my $fh, '<', $t );

while ( my $line = <$fh> ) {
    if ( $line =~ /$f/ && $line =~ /$s/ ) {
        $line =~ s/($f.+?$s)/$1/g;
        my $sub_phrase = uc $1;
        $line =~ s/$1/$sub_phrase/g;
        print $line;
    }
    #else {
    #       print $line;
    #}
}

close($fh);

which returns: "for checking pattern matching with a PATT ON THe back!"

Please help

Answers


ikegami February 2016

So you want to capitalize from pat to th except for instances of a surrounded by spaces? The easiest way is to uppercase the whole thing, then fix any instances of A surrounded by spaces.

sub capitalize {
    my $s = shift;
    my $uc = uc($s);
    $uc =~ s/ \s \K A (?=\s) /a/xg;
    return $uc;
}

s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;

The downside is that it will replace any existing A surrounded by spaces with a. The following is more complicated, but doesn't suffer from that problem:

sub capitalize {
    my $s = shift;
    my @parts = $s =~ m{ \G ( \s+ | \S+ ) }xg;
    for (@parts) {
       $_ = uc($_) if $_ ne "a";
    }

    return join('', @parts);
}

s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
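
To make the difference between the two versions concrete, here is a minimal sketch that runs both on a string that already contains an uppercase A. The names capitalize_simple and capitalize_by_word and the sample string are made up for this illustration; the sub bodies follow the two versions above. It assumes Perl 5.10 or later, since \K is used.

```perl
use strict;
use warnings;

# Version 1: uppercase everything, then lowercase any standalone "A".
sub capitalize_simple {
    my $text = shift;
    my $uc = uc($text);
    $uc =~ s/ \s \K A (?=\s) /a/xg;
    return $uc;
}

# Version 2: split into runs of whitespace and non-whitespace, then
# uppercase every run except a standalone lowercase "a".
sub capitalize_by_word {
    my $text = shift;
    my @parts = $text =~ m{ \G ( \s+ | \S+ ) }xg;
    for (@parts) {
        $_ = uc($_) if $_ ne "a";
    }
    return join('', @parts);
}

my $sample = "pat A and a th";    # hypothetical input with an existing "A"
print capitalize_simple($sample),  "\n";    # PAT a AND a TH
print capitalize_by_word($sample), "\n";    # PAT A AND a TH
```

The first version cannot tell an original A from one it produced itself, so both end up lowercase; the second only skips words that were a lowercase "a" to begin with.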

The rest of the code can be simplified:

#!/usr/bin/perl

use strict;
use warnings;

sub capitalize { ... }

my $f = shift;
my $s = shift;

while (<>) {
    s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
    print;
}
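
One thing that may be worth checking against the thread title: the pattern above uses a greedy .* where the question uses a non-greedy .+?. The sketch below compares the two on the sample line from the question, with a plain uc as the replacement. The helper upcase_span and its $greedy flag are hypothetical, introduced only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase every span from $from to $to in $line.
# $greedy chooses between .* (longest span) and .+? (shortest span).
sub upcase_span {
    my ($line, $from, $to, $greedy) = @_;
    if ($greedy) {
        $line =~ s{ ( \Q$from\E .* \Q$to\E ) }{ uc($1) }xseg;
    }
    else {
        $line =~ s{ ( \Q$from\E .+? \Q$to\E ) }{ uc($1) }xseg;
    }
    return $line;
}

my $line = "for checking pattern matching with a patt on the back!";
print upcase_span($line, 'pat', 'th', 1), "\n";
# for checking PATTERN MATCHING WITH A PATT ON THe back!
print upcase_span($line, 'pat', 'th', 0), "\n";
# for checking PATTERN MATCHING WITH a PATT ON THe back!
```

With the greedy form there is a single match running from the first pat to the last th, so the "a" in the middle is uppercased too. With the non-greedy form there are two separate matches and the "a" sits between them untouched.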


jcaron February 2016

So, if you want to match each sequence that starts with pat and ends with th, non-greedily, and uppercase that sequence, you can simply use an expression on the right side of your substitution:

$line =~ s/($f.+?$s)/uc($1)/eg;

And that's it.
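
As a rough illustration of why the two-step attempt in the question only changes the last span, the sketch below runs both approaches on the sample line. The variable names are made up, and \Q...\E is added on the assumption that the two arguments are meant as literal strings rather than patterns.

```perl
use strict;
use warnings;

my ($f, $s) = ('pat', 'th');
my $original = "for checking pattern matching with a patt on the back!";

# Two-step attempt, as in the question: once s///g has finished, $1 holds
# only the capture from the last successful match, so only that span is
# uppercased by the second substitution.
my $two_step = $original;
$two_step =~ s/(\Q$f\E.+?\Q$s\E)/$1/g;
my $last  = $1;                       # "patt on th"
my $upper = uc $last;
$two_step =~ s/\Q$last\E/$upper/g;

# Single substitution with /e: uc($1) is evaluated once for every match.
my $one_step = $original;
$one_step =~ s/(\Q$f\E.+?\Q$s\E)/uc($1)/eg;

print "$two_step\n";    # for checking pattern matching with a PATT ON THe back!
print "$one_step\n";    # for checking PATTERN MATCHING WITH a PATT ON THe back!
```

The /e flag makes Perl treat the replacement as code, so each match gets its own uc($1) instead of sharing whatever $1 happened to hold at the end.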

Post Status

Asked in February 2016
Viewed 2,493 times
Voted 12
Answered 2 times
