<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>Tildemark blogs</title>
        <link>http://www.tildemark.com/</link>
        <description>Blogging on uniquely random things. </description>
        <language>en</language>
        <copyright>Copyright 2008</copyright>
        <lastBuildDate>Tue, 20 Feb 2007 09:58:01 +0800</lastBuildDate>
        <generator>http://www.sixapart.com/movabletype/</generator>
        <docs>http://www.rssboard.org/rss-specification</docs>
        
        <item>
            <title>Getting data between html tags or string phrases</title>
            <description><![CDATA[<p>Here's a function from the <a href="http://www.php.net/manual/en/function.substr.php#68936">PHP manual</a> that returns everything between two tags or between any two strings. <a href="http://www.php.net/manual/en/function.substr.php">substr</a> needs the integer offset and length of the text to return, so it is awkward to use on its own when all we know are the start and end strings rather than their positions. </p>

<div class="module-code"><code><br>
function substring_between($haystack,$start,$end) {<br>
   // locate the start string first<br>
   $start_position = strpos($haystack,$start);<br>
   if ($start_position === false) {<br>
       return false;<br>
   }<br>
   $start_position += strlen($start);<br>
   // look for the end string only after the start string<br>
   $end_position = strpos($haystack,$end,$start_position);<br>
   if ($end_position === false) {<br>
       return false;<br>
   }<br>
   return substr($haystack,$start_position,$end_position-$start_position);<br>
}<br>
$content = '&lt;title&gt;Tildemark blogs&lt;/title&gt;';<br>
$title = substring_between($content,'&lt;title&gt;','&lt;/title&gt;'); // 'Tildemark blogs'<br>
</code></div>

<p>The code above returns the title of an HTML page.</p>]]></description>
            <link>http://www.tildemark.com/programming/php/getting-data-between-html-tags-or-string-phrases.html</link>
            <guid>http://www.tildemark.com/programming/php/getting-data-between-html-tags-or-string-phrases.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">PHP</category>
            
            
            <pubDate>Tue, 20 Feb 2007 09:58:01 +0800</pubDate>
        </item>
        
        <item>
            <title>Enable curl with XAMPP on Windows XP</title>
            <description><![CDATA[<p>To enable the curl library in XAMPP, we need to modify the php.ini files in the XAMPP folder.</p>

<p>1) Locate the following files:<br />
C:\Program Files\xampp\apache\bin\php.ini<br />
C:\Program Files\xampp\php\php.ini<br />
C:\Program Files\xampp\php\php4\php.ini</p>

<p>2) Uncomment the following line in each php.ini file by removing the leading semicolon.</p>

<p>;extension=php_curl.dll</p>

<p>3) Restart your Apache server.</p>

<p>4) Check the output of phpinfo() to confirm that curl is enabled.</p>
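
<p>The check in step 4 can also be scripted. The snippet below is a minimal sketch: the helper name <code>curl_status()</code> is made up for illustration, while <code>extension_loaded()</code> is PHP's built-in function. Run it through the web server, since the command-line PHP may read a different php.ini.</p>

```php
<?php
// Hypothetical helper (not part of XAMPP or PHP): report whether the
// curl extension is loaded in the PHP build that runs this script.
function curl_status() {
    return extension_loaded('curl')
        ? 'curl is enabled'
        : 'curl is NOT enabled; check extension=php_curl.dll in php.ini';
}

echo curl_status(), "\n";
```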

]]></description>
            <link>http://www.tildemark.com/programming/php/enable-curl-with-xampp-on-windows-xp.html</link>
            <guid>http://www.tildemark.com/programming/php/enable-curl-with-xampp-on-windows-xp.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">PHP</category>
            
            
            <pubDate>Thu, 08 Feb 2007 14:46:31 +0800</pubDate>
        </item>
        
        <item>
            <title>Cannot Load Mysql extension please check PHP configuration in phpMyAdmin</title>
            <description><![CDATA[<p>I often get this error when I install MySQL server and run phpMyAdmin. In case you run into the same problem, here is how to fix it.</p>

<p>First, find the php.ini file that Apache actually uses. Open your httpd.conf file and look for the PHPIniDir directive. If it is set to C:\PHP, then edit the php.ini file in that folder. </p>
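
<p>If it is still unclear which php.ini is being read, PHP itself can report it. This is a minimal sketch, assuming PHP 5.2.4 or later (where <code>php_ini_loaded_file()</code> is available); the helper name <code>describe_php_config()</code> is hypothetical. Run it through the web server rather than the command line, since the two may load different files.</p>

```php
<?php
// Hypothetical helper: collect the configuration facts relevant to this fix.
function describe_php_config() {
    return array(
        // Full path of the loaded php.ini, or false if none was loaded
        'ini_file'      => php_ini_loaded_file(),
        // Directory PHP searches for loadable extension modules
        'extension_dir' => ini_get('extension_dir'),
        // True once extension=php_mysql.dll has taken effect
        'mysql_loaded'  => extension_loaded('mysql'),
    );
}

print_r(describe_php_config());
```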

<p>Open the php.ini file in your PHP folder, usually C:\PHP. Look for this line: <br />
<div class="module-code"><br />
;extension=php_mysql.dll<br />
</div><br />
Uncomment the line by removing the leading semicolon ';'.</p>

<p>Next, look for the extension_dir setting, which is the directory PHP loads extension modules from, and set it to <br />
<div class="module-code"><br />
extension_dir = "c:\php\ext"<br />
</div></p>

<p>You might want to enable the mbstring and curl modules as well by uncommenting the following lines in your php.ini file.</p>

<div class="module-code">
extension=php_mbstring.dll<br />
extension=php_curl.dll
</div>]]></description>
            <link>http://www.tildemark.com/programming/mysql/cannot-load-mysql-extension-please-check-php-confi.html</link>
            <guid>http://www.tildemark.com/programming/mysql/cannot-load-mysql-extension-please-check-php-confi.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">MySQL</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">PHP</category>
            
            
            <pubDate>Mon, 02 Oct 2006 07:12:49 +0800</pubDate>
        </item>
        
        <item>
            <title>Scrape web contents faster</title>
            <description><![CDATA[<p>When scraping websites, I usually use the <a href="http://www.php.net/file_get_contents">file_get_contents</a> function. However, there are times when we only need a small portion of a page, for instance when <a href="http://www.tildemark.com/programming/php/getting-website-title-and-description.html">getting the title of the site or the description</a>.  </p>

<p>Instead of file_get_contents, we can use the built-in <a href="http://www.php.net/manual/en/function.fopen.php">fopen</a> and <a href="http://www.php.net/manual/en/function.fgets.php">fgets</a> functions like this:</p>

<div class="module-code"><pre>&lt;?php
$url = 'http://www.tildemark.com/';
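$fp = fopen( $url, 'r' );          // r means open the site for reading
if ($fp === false) {
    die("could not open $url");
}
// fgets stops at the first newline or after 1023 bytes, whichever comes first
$buffer = trim(fgets($fp, 1024));
fclose($fp);
print "&lt;pre&gt;" . htmlspecialchars($buffer) . "&lt;/pre&gt;";
?&gt;</pre>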
</div>
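
<p>Keep in mind that fgets returns at most one line, so the call above may return far less than 1024 bytes if the page starts with a short line. The sketch below contrasts it with fread on an in-memory stream, so it runs without network access. The helper name <code>stream_from_string()</code> and the sample page are made up for illustration; <code>php://memory</code> is a built-in PHP stream wrapper.</p>

```php
<?php
// Hypothetical helper: wrap a string in an in-memory stream.
function stream_from_string($data) {
    $fp = fopen('php://memory', 'r+');
    fwrite($fp, $data);
    rewind($fp);
    return $fp;
}

$page = "<html>\n<head><title>Example</title></head>\n</html>";

$fp = stream_from_string($page);
$line = fgets($fp, 1024);    // stops at the first newline
fclose($fp);

$fp = stream_from_string($page);
$chunk = fread($fp, 1024);   // reads up to 1024 bytes, newlines included
fclose($fp);

var_dump($line === "<html>\n");  // bool(true)
var_dump($chunk === $page);      // bool(true)
```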

<p>However, the <a href="http://www.php.net/manual/en/function.curl-setopt.php">curl</a> functions can be a lot faster, because we can ask the server for just the bytes we need. We will use CURLOPT_RANGE to request a specific range of bytes from a URL. CURLOPT_RANGE takes the range(s) of data to retrieve in the format "X-Y", where either X or Y may be omitted. HTTP transfers also support several intervals, separated by commas, in the format "X-Y,N-M". The range is inclusive at both ends, and a server that does not support range requests may ignore it and send the whole page. </p>

<div class="module-code"><pre>&lt;?php
$url = 'http://www.tildemark.com/';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RANGE, "0-1023");   // bytes 0 through 1023, i.e. the first 1024 bytes
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($curl);
curl_close($curl);
echo "&lt;pre&gt;" . htmlspecialchars($content) . "&lt;/pre&gt;";
?&gt;</pre></div>]]></description>
            <link>http://www.tildemark.com/programming/php/scrape-web-contents-faster.html</link>
            <guid>http://www.tildemark.com/programming/php/scrape-web-contents-faster.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">PHP</category>
            
            
            <pubDate>Sun, 01 Oct 2006 14:59:28 +0800</pubDate>
        </item>
        
        <item>
            <title>Getting specific tag contents</title>
            <description><![CDATA[<p>Here's a handy function to get the contents of a specific tag. You can choose the tag to scrape by changing the $start_tag and $end_tag variables. It is useful for getting data from different HTML tags. </p>

<div class="module-code">&lt;?php<br />
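function get_tag_contents($start_tag, $end_tag, $url){<br />
  $data = file_get_contents($url);<br />
  if ($data === false) {<br />
    return false;<br />
  }<br />
  // quote the tags so that characters such as "|" are matched literally;<br />
  // (.*?) is non-greedy, so the match stops at the first end tag<br />
  $pattern = '|' . preg_quote($start_tag, '|') . '(.*?)' . preg_quote($end_tag, '|') . '|s';<br />
  if (!preg_match($pattern, $data, $match)) {<br />
    return false;<br />
  }<br />
  return $match[1];<br />
}<br />
$start_tag = '&lt;p&gt;';<br />
$end_tag = '&lt;/p&gt;';<br />
$url = 'http://www.tildemark.com/';<br />
$tag_contents = get_tag_contents($start_tag, $end_tag, $url);<br />
print $tag_contents;<br />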
?&gt;</div>
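
<p>To collect the contents of every matching tag pair instead of a single match, preg_match_all can be used. This is a minimal sketch: the helper name <code>get_all_tag_contents()</code> is hypothetical, and it takes the page contents as a string (rather than a URL) so that it can be tried without network access.</p>

```php
<?php
// Hypothetical helper: return the contents of every $start_tag ... $end_tag
// pair found in $data, rather than only one.
function get_all_tag_contents($start_tag, $end_tag, $data) {
    $pattern = '|' . preg_quote($start_tag, '|') . '(.*?)' . preg_quote($end_tag, '|') . '|s';
    preg_match_all($pattern, $data, $matches);
    return $matches[1];   // one entry per matched pair, empty array if none
}

$html = '<p>first</p><div>skip</div><p>second</p>';
print_r(get_all_tag_contents('<p>', '</p>', $html));
```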

<p><br />
Feel free to modify this code, and don't forget to post your changes here. <br />
</p>]]></description>
            <link>http://www.tildemark.com/programming/php/getting-specific-tag-contents.html</link>
            <guid>http://www.tildemark.com/programming/php/getting-specific-tag-contents.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">PHP</category>
            
            
            <pubDate>Sun, 01 Oct 2006 13:56:36 +0800</pubDate>
        </item>
        
        <item>
            <title>Getting website title and description</title>
            <description><![CDATA[<p>Getting a website's title and description is easy. PHP's built-in file_get_contents function together with a regex pattern lets us capture the title and description of any website without complex methods, provided the site has a title or a description. In case a site has no description, a simple excerpt function is also provided. </p>

<p>Getting the site title:<br />
<div class="module-powered module"><div class="module-content"><code>function getMetaTitle($content){<br />
  $pattern = "|&lt;[\s]*title[\s]*&gt;([^&lt;]+)&lt;[\s]*/[\s]*title[\s]*&gt;|Ui";<br />
  if(preg_match($pattern, $content, $match))<br />
  return $match[1];<br />
  else<br />
  return false;<br />
}</code></div></div><br />
The code above returns the title of the site, that is, the text enclosed by the tags &lt;title&gt; and &lt;/title&gt;. The function returns boolean false if there is none. </p>

<p>Getting the meta description:<br />
<div class="module-powered module"><div class="module-content"><code>function getMetaDescription($content) {<br />
  $metaDescription = false;<br />
  // one pattern for double-quoted content attributes, one for single-quoted<br />
  $metaDescriptionPatterns = array("/&lt;meta.+description.+content[\s]*=[\s]*\"([^\"]+)\"[^&gt;]*&gt;/Ui", "/&lt;meta.+description.+content[\s]*=[\s]*'([^']+)'[^&gt;]*&gt;/Ui");<br />
  foreach ($metaDescriptionPatterns as $pattern) {<br />
    if (preg_match($pattern, $content, $match)) {<br />
      $metaDescription = $match[1];<br />
      break; // stop at the first pattern that matches<br />
    }<br />
  }<br />
  return $metaDescription;<br />
}</code></div></div><br />
The code above returns the content of the site's meta description, whether it is enclosed in single or double quotes. It returns boolean false if there isn't one. In that case we could take an excerpt, such as the first sentence on the page, to serve as the description instead. Getting an excerpt is not very efficient, though, and I had some trouble with my code. Please feel free to leave a comment on how to optimize it. </p>

<p>Getting the first website sentence:<br />
<div class="module-powered module"><div class="module-content"><code>function getExcerpt($content) {<br />
  $text = html_entity_decode($content);<br />
  //match all tags<br />
  preg_match_all("|&lt;[^&gt;]+&gt;(.*)&lt;/[^&gt;]+&gt;|", $text, $p, PREG_PATTERN_ORDER);<br />
  for ($x = 0; $x &lt; sizeof($p[0]); $x++) {<br />
    //only look at matches that contain a paragraph tag<br />
    if (preg_match('/&lt;p\b/i', $p[0][$x])) {<br />
      $strip = strip_tags($p[0][$x]);<br />
      //keep the text up to and including the first period<br />
      if (preg_match("/([^.]+\.)/", $strip, $matches))<br />
        return $matches[1];<br />
    }<br />
  }<br />
  return false;<br />
}</code></div></div><br />
The code above reads the entire page, looks for &lt;p&gt; tags, and returns the first phrase that ends with a period, with all HTML tags stripped out.</p>

<p>Here's some sample code to test our script:<br />
<div class="module-powered module"><div class="module-content"><code>&lt;?php<br />
$url = 'http://www.tildemark.com/';<br />
$content = file_get_contents($url);<br />
$title = getMetaTitle($content);<br />
$description = getMetaDescription($content);<br />
$excerpt = getExcerpt($content);<br />
print "title: $title ";<br />
print "&lt;br /&gt;";<br />
print "description: $description ";<br />
print "&lt;br /&gt;";<br />
print "excerpt: $excerpt";<br />
?&gt;</code></div></div></p>

<p>You may download a working copy of the <a href="http://www.tildemark.com/downloads/title-and-description-scraper.txt">title and description scraper script</a>.</p>

<p>Thank you for the comment:<br />
Yes, indeed. We could use the built-in get_meta_tags function to get the website description without any knowledge of regular expressions. Here's how:</p>

<div class="module-powered module"><div class="module-content"><code>&lt;?php
$meta_data= get_meta_tags('http://www.tildemark.com/');
echo $meta_data['description'];
?&gt;</code></div></div>
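
<p>get_meta_tags also accepts a local file, which makes it easy to see how the array keys are named. The following is a minimal sketch; the sample page and its values are made up for illustration.</p>

```php
<?php
// Write a small sample page to a temporary file so the example runs offline.
$html = "<html><head>\n"
      . "<meta name=\"Author\" content=\"Jane Doe\">\n"
      . "<meta name=\"description\" content=\"A sample page\">\n"
      . "<meta name=\"geo.position\" content=\"49.33;-86.59\">\n"
      . "</head><body></body></html>\n";
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

// Keys are lowercased, and special characters such as "." become "_".
$tags = get_meta_tags($file);
unlink($file);

echo $tags['author'], "\n";        // Jane Doe
echo $tags['description'], "\n";   // A sample page
echo $tags['geo_position'], "\n";  // 49.33;-86.59
```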

<p>Aside from the description, you could also get the author, keywords and geo position meta data with get_meta_tags(), using the keys 'author', 'keywords' and 'geo_position', if the page defines them. </p>]]></description>
            <link>http://www.tildemark.com/programming/php/getting-website-title-and-description.html</link>
            <guid>http://www.tildemark.com/programming/php/getting-website-title-and-description.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">PHP</category>
            
            
            <pubDate>Thu, 21 Sep 2006 19:04:53 +0800</pubDate>
        </item>
        
    </channel>
</rss>






