Scrape web contents faster

When scraping websites, I usually use the file_get_contents function. However, there are times when we only need a specific portion of a page, for instance its title or its description, which normally appear near the top of the HTML.
In those cases, instead of file_get_contents, we can use the built-in fopen function together with a length-limited read, so that only the beginning of the page is fetched:

<?php
$url = 'http://www.tildemark.com/';
$fp = fopen($url, 'r');                          // 'r' opens the URL for reading (requires allow_url_fopen)
if ($fp === false) {
    die("Could not open $url");
}
$buffer = trim(stream_get_contents($fp, 1024));  // read at most the first 1024 bytes
fclose($fp);                                     // close the stream; the rest of the page is never read
print '<pre>' . htmlspecialchars($buffer) . '</pre>';
?>
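Since the usual goal is the page title, the length-limited read can be wrapped in a small helper. This is a minimal sketch, assuming the <title> tag falls within the first $maxBytes bytes of the page; the function name read_title_from_stream is hypothetical, and the regular expression is a shortcut rather than a real HTML parser. It works on any readable stream, so it can be tried on an in-memory stream without touching the network.

```php
<?php
// Hypothetical helper: read at most $maxBytes from an open stream and
// return the text inside the first <title>...</title> pair, or null.
function read_title_from_stream($fp, int $maxBytes = 1024): ?string
{
    $buffer = stream_get_contents($fp, $maxBytes); // stops after $maxBytes bytes
    if ($buffer === false) {
        return null;
    }
    // A simple regex is enough for a sketch; it is not a full HTML parser.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $buffer, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null; // no title, or it does not end within the first $maxBytes bytes
}
```

To use it on a live page, pass it the handle returned by fopen() on the URL (this requires allow_url_fopen to be enabled) and fclose() the handle afterwards.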

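Using the cURL functions can be more efficient still, because we can ask the server to send only part of the document. We will use CURLOPT_RANGE to request a specific range of bytes from the URL. CURLOPT_RANGE takes the range(s) of data to retrieve in the format "X-Y", where either X or Y may be omitted; byte positions are zero-based and inclusive. HTTP transfers also support several intervals, separated by commas, in the format "X-Y,N-M". This only saves bandwidth when the server supports range requests; a server that does not will simply send the whole page.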
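Because the range format is easy to get off by one, it may help to build the string in one place. The helper names build_range and first_bytes_range below are hypothetical, not part of the cURL extension; they just produce strings in the "X-Y" and "X-Y,N-M" formats, treating positions as zero-based and inclusive.

```php
<?php
// Hypothetical helper: build a CURLOPT_RANGE string from [start, end] pairs.
// Positions are zero-based and inclusive, so [0, 1023] is the first 1024 bytes.
// A null start gives a suffix range ("-500" = last 500 bytes);
// a null end gives an open-ended range ("500-" = byte 500 to the end).
function build_range(array $intervals): string
{
    $parts = [];
    foreach ($intervals as [$start, $end]) {
        if ($start === null && $end === null) {
            throw new InvalidArgumentException('At least one bound is required');
        }
        $parts[] = ($start ?? '') . '-' . ($end ?? '');
    }
    return implode(',', $parts); // several intervals are comma-separated
}

// Hypothetical convenience wrapper: the first $n bytes of a document.
function first_bytes_range(int $n): string
{
    return build_range([[0, $n - 1]]);
}
```

With a single range, a server that honors the request replies with just those bytes; with several ranges it typically replies with a multipart/byteranges body, so one range is simpler when all we want is the top of the page.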

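<?php
$url = 'http://www.tildemark.com/';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RANGE, '0-1023');       // bytes 0 to 1023 inclusive, i.e. the first 1024 bytes
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the body as a string instead of printing it
$content = curl_exec($curl);
if ($content === false) {
    die('cURL error: ' . curl_error($curl));
}
curl_close($curl);
echo '<pre>' . htmlspecialchars($content) . '</pre>';
?>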
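A server is free to ignore the Range header, and HTTP defines range requests only for GET, which is why a POST request returns the whole page, as one commenter below noticed. The sketch below, built around the hypothetical helpers limit_to_range and fetch_head_bytes, checks the status code: 206 Partial Content means the range was honored, while 200 OK means the full document came back and is trimmed locally.

```php
<?php
// Hypothetical helper: never hand back more than $n bytes.
// 206 Partial Content: the server honored the Range header.
// 200 OK: the server ignored it and sent the whole document.
function limit_to_range(int $status, string $body, int $n): string
{
    if ($status === 206) {
        return $body;            // already just the requested bytes
    }
    return substr($body, 0, $n); // Range was ignored: trim locally
}

// Hypothetical sketch: fetch the first $n bytes of $url.
// Requires the cURL extension and network access.
function fetch_head_bytes(string $url, int $n = 1024): ?string
{
    $curl = curl_init($url);
    if ($curl === false) {
        return null;
    }
    curl_setopt($curl, CURLOPT_RANGE, '0-' . ($n - 1));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($curl);
    $status = (int) curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
    curl_close($curl);
    if ($body === false) {
        return null;
    }
    return limit_to_range($status, $body, $n);
}
```

Trimming locally keeps the result small but does not save the download itself; to stop a transfer early, cURL lets a CURLOPT_WRITEFUNCTION callback abort by returning a byte count different from the length of the chunk it was given.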

Written by tildemark

Alfredo Sanchez, Jr. is an internet technologist who spends very little time away from his computer, and most of that time goes to teaching college students at a nearby university. He has been writing about technology since 2005, and his posts combined might be enough to fill a book.

Comments

2 Comments on "Scrape web contents faster"

  1. This range thing doesn't work when we are using POST :s
    It downloads the whole page...

  2. I typically use the file get contents method myself. This is an interesting way to just get portions of data. Thx for the tip, Richard

