Recently in PHP Category

2007 Feb 20

Here's a function from the PHP manual that returns everything between two tags or between any two strings. substr() needs an integer start position and a length, so it is awkward to use on its own when we only know the start and end strings and not where they sit in the text.


function substring_between($haystack, $start, $end) {
    $start_position = strpos($haystack, $start);
    if ($start_position === false) {
        return false;
    }
    // Skip past the start string, then search for the end string after it
    $start_position += strlen($start);
    $end_position = strpos($haystack, $end, $start_position);
    if ($end_position === false) {
        return false;
    }
    return substr($haystack, $start_position, $end_position - $start_position);
}

$content = '<title>Tildemark blogs</title>';
$title = substring_between($content, '<title>', '</title>'); // 'Tildemark blogs'

The code above returns the title of an HTML page, here 'Tildemark blogs'.
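When a page contains the same pair of markers more than once, the same strpos() and substr() approach can be repeated in a loop. The sketch below is only an illustration: substring_between_all() is a hypothetical name, not something from the PHP manual, and it assumes both markers are non-empty strings.

```php
<?php
// Hypothetical helper (not from the PHP manual): returns every substring
// found between $start and $end, in order of appearance.
function substring_between_all($haystack, $start, $end) {
    $results = array();
    if ($start === '' || $end === '') {
        return $results; // empty markers would never advance the search
    }
    $offset = 0;
    while (($from = strpos($haystack, $start, $offset)) !== false) {
        $from += strlen($start);                 // first byte after the start marker
        $to = strpos($haystack, $end, $from);    // end marker must come after it
        if ($to === false) {
            break;                               // start marker without an end marker
        }
        $results[] = substr($haystack, $from, $to - $from);
        $offset = $to + strlen($end);            // continue after this end marker
    }
    return $results;
}

$items = substring_between_all('<li>one</li><li>two</li>', '<li>', '</li>');
print_r($items); // Array ( [0] => one [1] => two )
```

Returning an empty array instead of false when nothing is found keeps the result safe to pass straight to foreach.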

2007 Feb 8

To enable the curl library in XAMPP, we need to modify the php.ini files in the XAMPP folder.

1) Locate the following files:
C:\Program Files\xampp\apache\bin\php.ini
C:\Program Files\xampp\php\php.ini
C:\Program Files\xampp\php\php4\php.ini

2) In each php.ini file, uncomment the following line by removing the leading semicolon:

;extension=php_curl.dll

3) Restart your Apache server.

4) Check the output of phpinfo() to confirm that curl was enabled.
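Besides reading through the phpinfo() page, the check in step 4 can be scripted. This is a minimal sketch; curl_is_enabled() is a hypothetical helper name, and the script should be requested through Apache so that it reports on the same PHP setup the web server uses.

```php
<?php
// Hypothetical helper: true when the curl extension is available
// in the PHP process that runs this script.
function curl_is_enabled() {
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_enabled()) {
    $info = curl_version(); // array describing the libcurl build in use
    echo 'curl is enabled, libcurl version ' . $info['version'] . "\n";
} else {
    echo "curl is not enabled; check that the right php.ini was edited\n";
}
```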






2006 Oct 2

I often get this error when I'm installing MySQL server and running phpMyAdmin. In case you are experiencing the same, here is how to fix it.

Find your php.ini file: open your httpd.conf file and look for the PHPIniDir directive. If it is set to C:\PHP, then the php.ini in that folder is the one to edit.

Open that php.ini file (usually in C:\PHP) and look for this line:


;extension=php_mysql.dll

Uncomment the line by removing the semicolon (;).

Then look for the extension_dir setting, which names the directory that loadable extension modules are read from, and set it to:


extension_dir = "c:\php\ext"

You might want to enable the mbstring and curl modules as well by uncommenting the following lines in your php.ini file:

extension=php_mbstring.dll
extension=php_curl.dll
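To confirm that the edits landed in the php.ini that PHP really reads, a short script can report the loaded file, the extension directory, and the state of each module. This is a sketch under a few assumptions: extension_status() is a hypothetical helper name, php_ini_loaded_file() needs PHP 5.2.4 or later, and on PHP 7 and later the old mysql extension no longer exists, so it will always show as not loaded there.

```php
<?php
// Report which php.ini was loaded and where PHP looks for extension DLLs.
$ini_file = php_ini_loaded_file(); // false when no php.ini was loaded
echo 'Loaded php.ini: ' . ($ini_file === false ? '(none)' : $ini_file) . "\n";
echo 'extension_dir: ' . ini_get('extension_dir') . "\n";

// Hypothetical helper: maps each extension name to true (loaded) or false.
function extension_status(array $names) {
    $status = array();
    foreach ($names as $name) {
        $status[$name] = extension_loaded($name);
    }
    return $status;
}

foreach (extension_status(array('mysql', 'mbstring', 'curl')) as $name => $loaded) {
    echo $name . ': ' . ($loaded ? 'loaded' : 'not loaded') . "\n";
}
```

Run it through the web server rather than the command line, since the two can be configured to load different php.ini files.
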
2006 Oct 1

When scraping websites, I usually use file_get_contents(). However, there are times when we only need a specific portion of a page, for instance its title or description.

Instead of downloading the whole page with file_get_contents(), we can use the built-in fopen() and fgets() functions like this:

<?php
$url = 'http://www.tildemark.com/';
$fp = fopen($url, 'r');              // 'r' opens the URL for reading
if ($fp === false) {
    die("Could not open $url");
}
$buffer = trim(fgets($fp, 1024));    // read the first line, up to 1023 bytes
fclose($fp);
print "<pre>$buffer</pre>";
?>

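cURL can do this more precisely, and it is usually faster when the server cooperates because less data is transferred. We can use CURLOPT_RANGE to request only part of the resource at a given URL. CURLOPT_RANGE takes the range(s) of data to retrieve in the format "X-Y", where X or Y may be omitted. HTTP transfers also support several intervals separated by commas, in the format "X-Y,N-M". Ranges are zero-based and inclusive, so "0-1023" asks for the first 1024 bytes. A server that does not support range requests will send the whole page anyway.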

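<?php
$url = 'http://www.tildemark.com/';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RANGE, '0-1023');       // request the first 1024 bytes
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the data instead of printing it
$content = curl_exec($curl);
if ($content === false) {
    die('Request failed: ' . curl_error($curl));
}
echo "<pre>$content</pre>";
?>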
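
Because a server is free to ignore the Range header, it can help to check the HTTP status code and trim the result yourself. The following is a rough sketch, not a definitive implementation: fetch_first_bytes() and range_was_honoured() are hypothetical names, and the code assumes a plain HTTP URL with no redirects.

```php
<?php
// A server that honours the Range header answers 206 (Partial Content);
// one that ignores it answers 200 and sends the whole document.
function range_was_honoured($status) {
    return $status === 206;
}

// Hypothetical helper: fetches at most $limit bytes from the start of $url.
// Returns false on failure, otherwise an array with 'partial' and 'body'.
function fetch_first_bytes($url, $limit = 1024) {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RANGE, '0-' . ($limit - 1)); // ranges are inclusive
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($curl);
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($body === false) {
        return false;
    }
    return array(
        'partial' => range_was_honoured($status),
        'body'    => substr($body, 0, $limit), // trim in case the range was ignored
    );
}
```

Calling fetch_first_bytes('http://www.tildemark.com/') would then give back no more than 1024 bytes either way, and the 'partial' flag tells you whether the server actually saved you the bandwidth.
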
2006 Oct 1

Here's a handy function to get the contents of a specific tag. You can change which tag is scraped by changing the $start_tag and $end_tag variables, which makes it useful for pulling data out of different HTML tags.

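<?php
function get_tag_contents($start_tag, $end_tag, $url) {
    $data = file_get_contents($url);
    if ($data === false) {
        return false; // the page could not be downloaded
    }
    // Quote the tags so they are matched literally; .*? stops at the first end tag
    $pattern = '|' . preg_quote($start_tag, '|') . '(.*?)' . preg_quote($end_tag, '|') . '|s';
    if (preg_match($pattern, $data, $match)) {
        return $match[1];
    }
    return false;
}

$start_tag = '<p>';
$end_tag = '</p>';
$url = 'http://www.tildemark.com/';
$tag_contents = get_tag_contents($start_tag, $end_tag, $url);
print $tag_contents;
?>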


Feel free to modify this code, and don't forget to post your changes here.
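
As one possible modification, the same idea can be applied with preg_match_all() to collect every occurrence of a tag rather than only one. This is just a sketch: get_all_tag_contents() is a hypothetical name, and it takes HTML that has already been downloaded so that it can be tried without a network connection.

```php
<?php
// Hypothetical variant: works on an HTML string and returns the contents
// of every $start_tag ... $end_tag pair, in document order.
function get_all_tag_contents($start_tag, $end_tag, $html) {
    // preg_quote() makes the tags literal; .*? keeps each match as short as possible
    $pattern = '|' . preg_quote($start_tag, '|') . '(.*?)' . preg_quote($end_tag, '|') . '|s';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$html = '<p>first</p><div>skip</div><p>second</p>';
print_r(get_all_tag_contents('<p>', '</p>', $html));
// Array ( [0] => first [1] => second )
```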

2006 Sep 21

Getting a website's title and description is easy. PHP's built-in file_get_contents() function together with a regex pattern lets us capture any website's title and description without complex methods, provided the site has a title or a description. In case a site has no description, a simple excerpt function is also provided.

Getting the site title:

function getMetaTitle($content){
$pattern = "|<[\s]*title[\s]*>([^<]+)<[\s]*/[\s]*title[\s]*>|Ui";
if(preg_match($pattern, $content, $match))
return $match[1];
else
return false;
}

The code above returns the title of the site enclosed by the tags <title> and </title>. The function would return a boolean false in case there was none.

Getting the meta description:

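function getMetaDescription($content) {
    $metaDescription = false;
    // NOTE: the original patterns did not survive; these two are a reconstruction
    // that assumes name="description" comes directly before content="..."
    $metaDescriptionPatterns = array(
        '/<meta\s+name="description"\s+content="([^"]*)"[^>]*>/Ui',
        "/<meta\s+name='description'\s+content='([^']*)'[^>]*>/Ui",
    );
    foreach ($metaDescriptionPatterns as $pattern) {
        if (preg_match($pattern, $content, $match)) {
            $metaDescription = $match[1];
            break; // stop at the first pattern that matches
        }
    }
    return $metaDescription;
}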

The code above returns the meta description of the site, whether its attributes are enclosed in single or double quotes. It returns a boolean false if there isn't one. If that happens, we could take an excerpt, such as the first sentence on the page, to serve as the description instead. Getting an excerpt is not very efficient, though, and I had some trouble with my code. Please feel free to leave a comment if you can optimize it.

Getting the first website sentence:

function getExcerpt($content) {
    // Match each <p>...</p> element; the s flag lets a paragraph span several lines
    if (!preg_match_all('|<p\b[^>]*>(.*?)</p>|is', $content, $paragraphs)) {
        return false;
    }
    foreach ($paragraphs[1] as $paragraph) {
        // Strip nested markup first, then decode entities such as &amp;
        $text = trim(html_entity_decode(strip_tags($paragraph)));
        // Return the first run of text that ends with a period
        if (preg_match('/([^.]+\.)/', $text, $matches)) {
            return trim($matches[1]);
        }
    }
    return false;
}

The code above scans the page for <p> tags, strips any HTML inside them, and returns the first phrase that ends with a period.

Here's some sample code to test our script:

<?php
$url = 'http://www.tildemark.com/';
$content = file_get_contents($url);
$title = getMetaTitle($content);
$description = getMetaDescription($content);
$excerpt = getExcerpt($content);
print "title: $title ";
print "<br />";
print "description: $description ";
print "<br />";
print "excerpt: $excerpt";
?>

You may download a working copy of the title and description scraper script.

Thank you for the comment:
Yes, indeed. We could use the built-in get_meta_tags() function to get the website description without any knowledge of regular expressions. Here's how:

<?php
$meta_data = get_meta_tags('http://www.tildemark.com/');
echo $meta_data['description'];
?>

Aside from the description, you can also get the author, keywords, and geo position meta data from the array that get_meta_tags() returns.
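
To see what that array looks like, here is a small sketch that runs get_meta_tags() against a temporary local file instead of a live site. The sample page and its values are made up for illustration. Note that the keys come back in lowercase and that characters such as the dot in geo.position are replaced with an underscore.

```php
<?php
// Made-up sample page, written to a temporary file so no network is needed.
$sample = "<html><head>\n"
    . "<meta name=\"Author\" content=\"Jane Doe\">\n"
    . "<meta name=\"description\" content=\"A sample page\">\n"
    . "<meta name=\"geo.position\" content=\"49.33;-86.59\">\n"
    . "</head><body></body></html>\n";
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $sample);

$meta = get_meta_tags($file); // accepts a local path as well as a URL
unlink($file);

echo $meta['author'] . "\n";       // Jane Doe
echo $meta['description'] . "\n";  // A sample page
echo $meta['geo_position'] . "\n"; // 49.33;-86.59
```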
