Developers Planet

Dexter February 2016

Trouble with preg_replace() in Foreach loop

Long story short, my client lost access to their server because of a dispute, and they need all their club photos so I can build them a new site. I'm having to download them by URL, and they are served through a PHP script that outputs different sizes to reduce server load.

There are over 3000 of them, and I'm not about to waste time doing this one by one.

So, I decided to write a quick and [very] dirty PHP script that crawls the pages using DOMDocument, looking for the links to the images across each album and then across the album sub-pages.

Everything works fine, except this one particular part of the script that looks on the album page for:

(1) a link to an image, which is

<div class='imagethumb'>
    <a href="/gallery/index.php?album=blowout1&image=blahblah.jpg" title="Blahblah">
        <img src="/gallery/index.php?album=blowout1&image=blahblah_thumb.jpg" />
    </a>
</div>

(2) a link to a subsequent page, which is

<li>
    <a href="/gallery/index.php?album=beginning&amp;page=2" title="Page 2">2</a>
</li>

(3) a link to the album "Last Page" or "..."

<li>
    <a href="/gallery/index.php?album=recognition&page=9" title="Page 9">...</a>
</li>

Here's the relevant part of the script:

//$url is an argument in the function wrapping this script

//look on albums for links
foreach ($album_links as $a_url) {
    $album_html = file_get_contents($a_url['url']);
    $album = new DOMDocument;
    $album->loadHTML($album_html);
    $i_links = $album->getElementsByTagName('a');
    $album_title = $album->getElementsByTagName('title')->item(0)->textContent;

    //to keep track of the number of sub-page links found, exclude page 1
    $num_page_lnks = 1;

    //search through all links on the page, look for:
        

Answers


RomanPerekhrest February 2016

The problem is in your last nested loop:

//Last Page links appear when greater than 7 pages, so start at 8 ($num_page_lnks + 1)
for ($count = ($num_page_lnks + 1); $count < ($url_parse['page'] + 1); $count++) {
    array_push($image_page_links, "http://" . parse_url($url, PHP_URL_HOST) . preg_replace("/[^\=]\d+$/", $count, $link->getAttribute('href')));
}

When you reach the 7th sub-link (the one with text content "..."), the $num_page_lnks variable has the value 7 and $url_parse['page'] has the value 9. So there will be two iterations: $count will be 8, then 9.
But those two links remain the same:

"http://club.website.com/gallery/index.php?album=recognition&page=9"
"http://club.website.com/gallery/index.php?album=recognition&page=9"

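because your regex pattern doesn't make the expected replacement: [^\=] requires a character other than "=" immediately before the digits, but in page=9 the only digit is preceded directly by "=", so nothing matches.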

var_dump(preg_replace("/[^\=]\d+$/",8,"/gallery/index.php?album=recognition&amp;page=9"));
// will output:
string(47) "/gallery/index.php?album=recognition&amp;page=9"
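
One quick way to see the pattern's behaviour is to compare a single-digit and a two-digit page number. This is only a small check, and both hrefs are made-up examples shaped like the links in the question:

```php
<?php
// Pattern from the question: one non-"=" character, then digits, at the end of the string.
$pattern = '/[^\=]\d+$/';

// Hypothetical hrefs, shaped like the album links in the question.
$single = '/gallery/index.php?album=recognition&page=9';
$double = '/gallery/index.php?album=recognition&page=12';

// "9" is preceded directly by "=", so [^\=] cannot match and nothing is replaced.
$single_result = preg_replace($pattern, '8', $single);

// In "12" the "1" satisfies [^\=] and the "2" satisfies \d+, so "12" is replaced as a whole.
$double_result = preg_replace($pattern, '8', $double);

var_dump($single_result === $single); // true: the string is unchanged
var_dump($double_result);             // ends in "page=8"
```

So the pattern only appears to work once the page number has two or more digits, which is why the links for pages 8 and 9 come out identical.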

Change your regex pattern to this one: /\d+$/ or consider some other logic.
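
Here is a rough sketch of the loop with the simpler pattern. The values of $url, $href and $num_page_lnks are hypothetical stand-ins for what your script already has, and reading the last page number with parse_str() is my assumption about how $url_parse is filled:

```php
<?php
// Hypothetical stand-ins for values the question's script already holds.
$url = 'http://club.website.com/gallery/index.php';
$href = '/gallery/index.php?album=recognition&page=9'; // the "..." (last page) link
$num_page_lnks = 7;                                     // numbered page links found so far

// Assumption: the last page number is read from the href's query string.
parse_str(parse_url($href, PHP_URL_QUERY), $url_parse);
$last_page = (int) $url_parse['page'];

$image_page_links = [];
for ($count = $num_page_lnks + 1; $count <= $last_page; $count++) {
    // /\d+$/ replaces only the trailing digits, whatever character precedes them.
    $image_page_links[] = 'http://' . parse_url($url, PHP_URL_HOST)
        . preg_replace('/\d+$/', (string) $count, $href);
}

print_r($image_page_links);
```

With these sample values the array holds the links for page=8 and page=9. Note that this relies on the page number being the last thing in the href; if the parameter order can vary, rebuilding the query string instead of using a regex may be safer.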

Post Status

Asked in February 2016
Viewed 1,885 times
Voted 8
Answered 1 times
